Stars
PostgreSQL extension for BM25 relevance-ranked full-text search. Postgres OSS licensed.
A server implementation for Wikidata API using the Model Context Protocol (MCP).
DuckLake is an integrated data lake and catalog format
Get your documents ready for gen AI
Send mails at a given interval using SendGrid or MSGraph, and store sent mail
⚡ TabPFN: Foundation Model for Tabular Data ⚡
Fast, accurate, lightweight Python library for generating state-of-the-art embeddings
Open Lakehouse Format for Multimodal AI. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, a…
Databricks SQL Connector for Python
Open, Multi-modal Catalog for Data & AI
Together Mixture-Of-Agents (MoA) – 65.1% on AlpacaEval with OSS models
Low-effort linking and easy de-duplication. Databricks ARC provides a simple, automated, lakehouse-integrated entity resolution solution for linking within and across datasets.
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
Master the command line, in one page
📚 Freely available programming books
Interactive roadmaps, guides and other educational content to help developers grow in their careers.
A tool for writing an SCD2 Delta Lake table for use in data lakes
Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting
🧙 Build, run, and manage data pipelines for integrating and transforming data.
data load tool (dlt) is an open source Python library that makes data loading easy 🛠️