- San Francisco, CA, United States
- in/andy-lee-b68302232
Starred repositories
A playground repo to play around with OpenAI Codex.
The world's first System Design Engineer AI Agent.
Dataset of BioService code containing signatures and docstrings.
techandy42 / bioservices
Forked from cokelaer/bioservices
Access to Biological Web Services from Python.
A Test Repo for MLGit; Python; Relative Imports.
A Test Repo for MLGit; Python; Absolute Imports.
Benchmark that tests LLMs to find semantic bugs in large Python code.
MLGit: Index Codebase into Natural Language Descriptions; Works Just Like Git.
WAT.ai x Hamming.ai Joint Project for Building Code Debugging Benchmarks and Models.
A Python import visualization program.
Open-source datasets & models for LLM Judges to find and describe bugs in LLM-generated code.
A new benchmark for measuring LLMs' capability to detect bugs in large codebases.
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
🟣 LLMs interview questions and answers to help you prepare for your next machine learning and data science interview in 2025.
A new benchmark for measuring LLMs' capability to detect bugs in large codebases.
Can LLMs find bugs that compilers can't?: A benchmark for measuring LLMs' capabilities in debugging large source code.
An open-source framework that makes evaluating LLMs & prompt engineering x10 easier!
Open-source ECCC repository for notebooks and documentation for the Webcam project by Hokyung (Andy) Lee.
Open-source ECCC repository for notebooks and documentation for the Hail Forecasting project by Hokyung (Andy) Lee.
techandy42 / LVEval
Forked from infinigence/LVEval
Repository of LV-Eval Benchmark
techandy42 / babilong
Forked from booydar/babilong
BABILong is a benchmark for LLM evaluation using the needle-in-a-haystack approach.
BABILong is a benchmark for LLM evaluation using the needle-in-a-haystack approach.
Simple retrieval from LLMs at various context lengths to measure accuracy.
Various examples on how to use Hamming for evals + observability