Stars
A survey of research on AI scientists, AI researchers, AI engineers, and AI-driven research studies.
Data and software for building the ACL Anthology.
How do you train retrievers to find inspirations? [ACL 2025]
MultiCite code and data. Models are available on Hugging Face.
Python PDF parser for scientific publications: content and figures
This repository delivers end-to-end, code-first tutorials covering every layer of production-grade GenAI agents, guiding you from spark to scale with proven patterns and reusable blueprints for re…
This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. RAG systems combine information retrieval with generative models to provide accurate and cont…
Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search
Distribute and run LLMs with a single file.
Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / Cloudflare IUAM)
Code base for ICLR 2024 "Fast-DetectGPT: Efficient Zero-Shot Detection of Machine-Generated Text via Conditional Probability Curvature".
[ICML 2024] Binoculars: Zero-Shot Detection of LLM-Generated Text
Deploy headless browsers in Docker. Run on our cloud or bring your own. Free for non-commercial uses.
A knowledge graph unifying computational and experimental data for MOFs
Software repo for Team Expander Agent (Oregon State U., Institute for Systems Biology, and Penn State U.)
Robin: A multi-agent system for automating scientific discovery
Python client for GROBID Web services
Knowledge Base is important [Accepted in NeurIPS 2024]
Curated resources for discovering, reading, and working with arXiv papers
Evaluation dataset for AI systems intended to benchmark capabilities foundational to scientific research in biology
LitQA Eval: A difficult set of scientific questions that require context of full-text research papers to answer
Pyzotero: a Python client for the Zotero API
Cloud-native search engine for observability. An open-source alternative to Datadog, Elasticsearch, Loki, and Tempo.
Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust
Data and tools for generating and inspecting OLMo pre-training data.