Hello, we're Minish!

About us

We're an open-source lab focused on Natural Language Processing. The lab was founded by @pringled and @stephantul, and is currently maintained by @pringled.

We believe that if you make models fast enough, you unlock new possibilities.

Using our models and packages, you can:

  • Embed the entire English Wikipedia in 5 minutes
  • Classify tens of thousands of documents per second on a CPU
  • Approximately deduplicate extremely large datasets in minutes
  • Build the fastest RAG application in the world
  • Easily evaluate which ANN algorithm works best for your data

Our projects:

  • model2vec: tiny static embedding models with state-of-the-art performance.
  • potion: the best small models in the world. 100-500x faster than a sentence-transformer, and almost as good.
  • vicinity: consistent interfaces to many approximate nearest neighbor algorithms.
  • semhash: lightning-fast, super accurate semantic deduplication and filtering for your text datasets.
  • model2vec-rs: a Rust port of model2vec.

Pinned

  1. model2vec Public

    Fast State-of-the-Art Static Embeddings

    Python, 2k stars, 115 forks

  2. semhash Public

    Fast Semantic Text Deduplication & Filtering

    Python, 861 stars, 54 forks

  3. vicinity Public

    Lightweight Nearest Neighbors with Flexible Backends

    Python, 324 stars, 10 forks

  4. tokenlearn Public

    Pre-train Static Word Embeddings

    Python, 94 stars, 8 forks

  5. model2vec-rs Public

    Official Rust Implementation of Model2Vec

    Rust, 146 stars, 13 forks

Repositories

Showing 10 of 10 repositories
  • semhash Public

    Fast Semantic Text Deduplication & Filtering

    Python, 861 stars, MIT license, 54 forks, 2 open issues, 0 open pull requests, Updated Jan 5, 2026
  • vicinity Public

    Lightweight Nearest Neighbors with Flexible Backends

    Python, 324 stars, MIT license, 10 forks, 1 open issue, 1 open pull request, Updated Dec 30, 2025
  • model2vec Public

    Fast State-of-the-Art Static Embeddings

    Python, 1,966 stars, MIT license, 115 forks, 3 open issues, 0 open pull requests, Updated Dec 30, 2025
  • docs Public
    MDX, 0 stars, 2 forks, 0 open issues, 0 open pull requests, Updated Nov 24, 2025
  • model2vec-rs Public

    Official Rust Implementation of Model2Vec

    Rust, 146 stars, MIT license, 13 forks, 1 open issue, 0 open pull requests, Updated Sep 29, 2025
  • evaluation Public

    Code to evaluate performance for embeddings

    Python, 12 stars, MIT license, 0 forks, 0 open issues, 0 open pull requests, Updated Sep 20, 2025
  • .github Public

    Readme

    0 stars, 0 forks, 0 open issues, 0 open pull requests, Updated Sep 14, 2025
  • tokenlearn Public

    Pre-train Static Word Embeddings

    Python, 94 stars, MIT license, 8 forks, 1 open issue, 0 open pull requests, Updated Sep 9, 2025
  • minishlab.github.io
    SCSS, 0 stars, MIT license, 1 fork, 0 open issues, 0 open pull requests, Updated Jun 1, 2025
  • watertemplate Public template

    Template

    Makefile, 4 stars, MIT license, 3 forks, 0 open issues, 1 open pull request, Updated Dec 9, 2024