Stars
Qforia: best for specific industries, with respect to WordLift and iPullRank
This solution accelerator leverages Azure AI Foundry, Azure AI Content Understanding, Azure OpenAI Service, and Azure AI Search to enable organizations to derive insights from volumes of conversati…
Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 2024
The best open-source Python library to generate and process SAT's CFDI
PHP Common utilities for Mexican CFDI 3.2, 3.3 & 4.0
Code for my "Efficient Data Processing in SQL" book.
R package of automated tools to retrieve, parse, clean, and analyze documents from the United States Supreme Court, including oral argument transcripts, motions, applications, orders, and decisions.
MTEB: Massive Text Embedding Benchmark
This repository goes over how to handle massive variety in data engineering
This is a repo with links to everything you'd ever want to learn about data engineering
data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
OWASP Juice Shop: Probably the most modern and sophisticated insecure web application
Google Tag Manager Variable Template for fetching nested properties from objects using dot notation
Set of functions and operators for executing similarity queries
"1 + 1 = 1 or Record Deduplication with Python" Jupyter Notebook
Rapid fuzzy string matching in Python using various string metrics
Collect, aggregate, and visualize a data ecosystem's metadata
Data Pipeline Framework using the singer.io spec
Distributed query engine providing simple and reliable data processing for any modality and scale
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
Source files used for an introduction to Twisted
Prefect tasks and subflows for interacting with shell commands.
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
Event-driven networking engine written in Python.
dataform-ga4-sessions is a Dataform package to prepare session and event tables from Google Analytics 4 (GA4) BigQuery raw data