☀️ Being cracked.

Starred repositories

A playground repo for experimenting with OpenAI Codex.

Python 1 Updated Jun 29, 2025

The world's first System Design Engineer AI Agent.

TypeScript 1 Updated Jul 12, 2025

Dataset of BioService code containing signatures and docstrings.

Jupyter Notebook 1 Updated Jun 4, 2025

Access to Biological Web Services from Python.

Python 1 Updated May 19, 2025

A Test Repo for MLGit; Python; Relative Imports.

Python 1 Updated May 18, 2025

A Test Repo for MLGit; Python; Absolute Imports.

Python 1 Updated May 18, 2025

A benchmark that tests LLMs' ability to find semantic bugs in large Python codebases.

Python 2 Updated Jun 11, 2025

MLGit: Index Codebase into Natural Language Descriptions; Works Just Like Git.

Python 1 Updated May 18, 2025
Python 68 4 Updated Mar 24, 2025

WAT.ai x Hamming.ai Joint Project for Building Code Debugging Benchmarks and Models.

Python 5 Updated May 3, 2025

A Python import visualization program.

Jupyter Notebook 1 Updated Sep 14, 2024
TypeScript 1 Updated Sep 16, 2024

Open-source datasets & models for LLM Judges to find and describe bugs in LLM-generated code.

Jupyter Notebook 2 Updated Nov 9, 2024

A new benchmark for measuring LLMs' capability to detect bugs in large codebases.

Jupyter Notebook 3 5 Updated May 3, 2025

SimpleKit

TypeScript 5 34 Updated Oct 2, 2025

Qwen3 is the large language model series developed by the Qwen team at Alibaba Cloud.

Python 25,285 1,764 Updated Oct 13, 2025

🟣 LLM interview questions and answers to help you prepare for your next machine learning and data science interview in 2025.

615 75 Updated May 19, 2025

A new benchmark for measuring LLMs' capability to detect bugs in large codebases.

Jupyter Notebook 32 4 Updated Jun 5, 2024

Can LLMs find bugs that compilers can't? A benchmark for measuring LLMs' capabilities in debugging large codebases.

Jupyter Notebook 1 Updated May 29, 2024

An open-source framework that makes evaluating LLMs & prompt engineering 10x easier!

Python 3 Updated Mar 20, 2024

Open-source ECCC repository of notebooks and documentation for the Webcam project by Hokyung (Andy) Lee.

Jupyter Notebook 1 Updated Apr 26, 2024

Open-source ECCC repository of notebooks and documentation for the Hail Forecasting project by Hokyung (Andy) Lee.

Jupyter Notebook 1 Updated Apr 26, 2024

Repository of the LV-Eval benchmark.

Jupyter Notebook 1 Updated Apr 15, 2024

Repository of the LV-Eval benchmark.

Python 70 9 Updated Aug 31, 2024

BABILong is a benchmark for LLM evaluation using the needle-in-a-haystack approach.

Jupyter Notebook 1 Updated Apr 15, 2024

BABILong is a benchmark for LLM evaluation using the needle-in-a-haystack approach.

Jupyter Notebook 215 21 Updated Sep 2, 2025

Doing simple retrieval from LLMs at various context lengths to measure accuracy.

Python 106 4 Updated Sep 19, 2025

m3 dataset with hamming

TypeScript 1 Updated Apr 26, 2024

Various examples of how to use Hamming for evals + observability.

TypeScript 6 1 Updated Jan 27, 2025
Python 2 Updated Mar 31, 2024