Stars
Morphological Analyser of Uzbek text based on affixes
SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, xDC replica…
Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search
Embedding Atlas is a tool that provides interactive visualizations for large embeddings. It allows you to visualize, cross-filter, and search embeddings and metadata.
Custom Russian tokenizer for spaCy
Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages
A neural word aligner based on multilingual BERT
An easy-to-use library to linguistically compare one sentence and its words to another, in the same language or a different one. For instance useful for comparing a translation with the original te…
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
Renderer for the harmony response format to be used with gpt-oss
SPT-VR brings the immersive, intense experience of Tarkov into the realm of virtual reality. Engage in intense firefights, loot dangerous environments, and survive the unforgiving world of Tarkov—a…
RuPAWS: A Russian Adversarial Dataset for Paraphrase Identification
SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders.
GPU cluster manager for optimized AI model deployment
Python3 bindings for the Compact Language Detector v3 (CLD3)
Library for studying Cayley graphs and Schreier coset graphs
✂️ Sentence segmentation with wtpsplit's state-of-the-art Segment any Text (SaT) models
Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way.
The most accurate natural language detection library for Python, suitable for short text and mixed-language text
A .NET library to access files and directories with more than 260 characters length.