Inductra

Lightweight mech-interp sandbox for probing ICL, tracing induction heads, and stress-testing distilled vs base LMs.

Learn by building: Chain-of-Thought (CoT) probing, in-context learning (ICL) emergence, induction-head scans, and distilled vs. base model comparisons.

Works out of the box with sshleifer/tiny-gpt2 for CPU demos; swap in larger causal LMs for deeper dives.


Quickstart

```bash
pip install -e .
python -m inductra.runner icl --model sshleifer/tiny-gpt2 --ctx 8 16 32 64
python -m inductra.runner cot --model sshleifer/tiny-gpt2 --layer -1
python -m inductra.runner compare --base gpt2 --distilled distilgpt2
```
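
For a feel of what the `icl` run above measures, here is a rough standalone sketch of a copy-task accuracy scan over context lengths. It is illustrative only (plain `transformers` + `torch`, helper names made up here), not the package API:

```python
# Illustrative sketch only (not the package API): next-token accuracy on a
# synthetic copy task as context length grows, in the spirit of the `icl` runner.
import torch
from transformers import AutoModelForCausalLM

MODEL = "sshleifer/tiny-gpt2"  # tiny CPU-friendly GPT-2; swap in "gpt2" for real signal
model = AutoModelForCausalLM.from_pretrained(MODEL).eval()

def copy_task_accuracy(ctx_len: int, n_trials: int = 20, seed: int = 0) -> float:
    """Feed a random token sequence twice; score predictions over the repeated half."""
    gen = torch.Generator().manual_seed(seed)
    correct = total = 0
    for _ in range(n_trials):
        seq = torch.randint(0, model.config.vocab_size, (ctx_len,), generator=gen)
        ids = torch.cat([seq, seq]).unsqueeze(0)  # shape [1, 2 * ctx_len]
        with torch.no_grad():
            logits = model(ids).logits[0]
        # positions ctx_len-1 .. 2*ctx_len-2 are the ones predicting the repeated tokens
        preds = logits[ctx_len - 1 : 2 * ctx_len - 1].argmax(dim=-1)
        correct += (preds == seq).sum().item()
        total += ctx_len
    return correct / total

for ctx in (8, 16, 32, 64):
    print(f"ctx={ctx:3d}  copy-task accuracy={copy_task_accuracy(ctx):.3f}")
```

Expect near-chance accuracy from the tiny demo model; the interesting part is watching the curve move as you swap in trained or larger models.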

What’s inside

  • ICL emergence grid — synthetic tasks (copy, reverse, modular add, parity) across context lengths; plot accuracy vs context.
  • Induction heads — shifted-copy attention score for detecting induction-like heads (scoring sketch below).
  • CoT probing — linear probe on residual streams to classify the presence of a CoT prefix (“Let’s think step by step”); see the probe sketch below.
  • Distilled vs base — quick accuracy delta on modular arithmetic (comparison sketch below).
  • Hooks & probes — ~100 LOC each, minimal and hackable.

Resources

This repo pairs well with “How to Become a Mechanistic Interpretability Researcher” by Neel Nanda (Alignment Forum, 2025).

Distilled notes from the guide:

  • Stage 1 (≤1 month): breadth-first basics, code a transformer, replicate toy mech interp tasks. Don’t wait—experiment early.
  • Stage 2 (1–5 day projects): maximize info gain per unit time, run fast probes, debug aggressively.
  • Stage 3 (1–2 week sprints): develop skepticism, set real baselines, and always write up. Public write-ups are the best credential.

Skill growth rates:

  • Fast → coding, debugging, plotting.
  • Medium → taste & prioritization, built by doing multiple projects.
  • Slow → idea generation, emerges over months of exploration.

Practical tips: use LLMs as collaborators (they’re decent at mech interp tasks), don’t block on mentorship, and focus on “doing” over endless reading.


Roadmap

  • Hook adapters for Llama/Mistral/Qwen
  • Token-shifted causal tracing
  • Per-head ranking plots
  • GSM8K CoT token supervision
  • Path patching experiments

License

MIT
