Stars
The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface.
This project records the code from the 2024 Ironman Contest (鐵人賽).
Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.
Get your documents ready for gen AI
Python tool for converting files and office documents to Markdown.
A modular graph-based Retrieval-Augmented Generation (RAG) system
[EMNLP2025] "LightRAG: Simple and Fast Retrieval-Augmented Generation"
The interactive graphing library for Python ✨
Convert PDF to markdown + JSON quickly with high accuracy
✨✨ Latest Advances on Multimodal Large Language Models
We present MocapNET, a real-time method that estimates the 3D human pose directly in the popular Bio Vision Hierarchy (BVH) format, given estimations of the 2D body joints originating from monocula…
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Python version of the Playwright testing and automation library.
An open-source framework for detecting, redacting, masking, and anonymizing sensitive data (PII) across text, images, and structured data. Supports NLP, pattern matching, and customizable pipelines.
GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
Large-scale, Informative, and Diverse Multi-round Chat Data (and Models)
A large-scale, fine-grained, diverse preference dataset (and models).
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
🦜🔗 The platform for reliable agents.
Generative Agents: Interactive Simulacra of Human Behavior
Code and documentation to train Stanford's Alpaca models, and generate the data.
The RedPajama-Data repository contains code for preparing large datasets for training large language models.
Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"
😎 Awesome lists about all kinds of interesting topics
Library for fast text representation and classification.
The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Tiled Diffusion and VAE optimization, licensed under CC BY-NC-SA 4.0