-
Jina AI
- Berlin, Germany
- @michael_g_u
- @michael-g-u.bsky.social
-
translation-align Public
LLM-based translation and translation comparison
TypeScript Apache License 2.0 Updated Jan 22, 2025
-
acl-anthology Public
Forked from acl-org/acl-anthology
Data and software for building the ACL Anthology.
Python Apache License 2.0 Updated Dec 11, 2024
-
fast_minh Public
Python package for fast MinHash calculation and operations
-
table-embeddings Public
Tools for training schema-aware Web table embedding for unsupervised and supervised machine learning on tabular data
-
NLP-OSS Public
Forked from nlposs/NLP-OSS
Democratizing NLP!
Creative Commons Zero v1.0 Universal Updated Nov 26, 2023
-
mteb Public
Forked from embeddings-benchmark/mteb
MTEB: Massive Text Embedding Benchmark
Python Apache License 2.0 Updated Sep 13, 2023
-
test-gradient-cache Public
Small test script of gradient cache (https://github.com/luyug/GradCache) applied to train a model for a retrieval task on the SciFact dataset (https://allenai.org/data/scifact)
-
Script to import data from the Open Food Facts to PostgreSQL (Dataset URL: https://www.kaggle.com/openfoodfacts/world-food-facts)
-
docarray Public
Forked from docarray/docarray
🧬 The data structure for unstructured multimodal data · Neural Search · Vector Search · Document Store
Python Apache License 2.0 Updated Dec 16, 2022
-
postgres-word2vec Public
Utilities for using word embedding models such as word2vec in a PostgreSQL database
-
postgres-retrofit Public
Tools to create database-specific text value embeddings from word embedding datasets
-
the-movie-database-import Public
Script to import data from The Movie Database to PostgreSQL (Dataset URL: https://www.kaggle.com/rounakbanik/the-movies-dataset)
-
google-play-dataset-import Public
Script to import data from a Google Play Store Apps dataset to a PostgreSQL database (Dataset URL: https://www.kaggle.com/lava18/google-play-store-apps)
-
SimilarityMeasure Public
Compute the most similar node for a given node in a graph
C++ Updated Jan 22, 2017