Our ML infra lets us steer trillion-parameter frontier models in real time:
- live, mid-chain-of-thought edits to internal activations
- directly altering how the model reasons (not just outputs)
- stackable edits
- no added latency
We can make models more Gen Z, more concise, or anything else. Thanks for having us, Gopal Raman and South Park Commons!
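For readers new to this idea, here is a minimal sketch of the basic mechanism behind this kind of intervention: adding a vector to a layer's activations at inference time via a PyTorch forward hook. It is a generic illustration on gpt2 with a random placeholder direction; the layer index and scale are arbitrary assumptions, and this is not Goodfire's infrastructure or a real feature vector.

```python
# Minimal illustration of activation steering with a PyTorch forward hook.
# NOT Goodfire's Ember infrastructure -- just a generic sketch of the idea:
# add a vector to a hidden layer's output while the model generates.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small stand-in model for the sketch
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

layer_idx = 6                       # which block to steer (arbitrary choice)
hidden_size = model.config.hidden_size
# Hypothetical steering direction; in practice this would come from an
# interpretability method (e.g. a learned feature direction), not randn.
steer = torch.randn(hidden_size)
steer = steer / steer.norm() * 4.0  # scale controls steering strength

def add_steering(module, inputs, output):
    # GPT-2 blocks return a tuple; the first element is the residual stream.
    hidden_states = output[0] + steer.to(output[0].dtype)
    return (hidden_states,) + output[1:]

handle = model.transformer.h[layer_idx].register_forward_hook(add_steering)

prompt = "The best way to explain this is"
ids = tok(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**ids, max_new_tokens=30, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))

handle.remove()  # remove the edit when done steering
```

Because each edit is just a hook adding a vector, several hooks can be registered at once (which is what makes edits stackable), and the extra work is one vector addition per steered layer, so an intervention of this form adds essentially no latency.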
Goodfire
Software Development
San Francisco, CA 5,363 followers
AI interpretability research company building safe and powerful AI systems
About us
Our mission is to advance humanity's understanding of AI by examining the inner workings of advanced AI models (or “AI Interpretability”). As a research-driven product organization, we bridge the gap between theoretical science and practical applications of interpretability. We're building critical infrastructure that empowers developers to understand, edit, and debug AI models at scale, ensuring the creation of safer and more reliable systems. Goodfire is a public benefit corporation headquartered in San Francisco.
- Website
- https://goodfire.ai/
- Industry
- Software Development
- Company size
- 11-50 employees
- Headquarters
- San Francisco, CA
- Type
- Privately Held
- Founded
- 2024
Locations
- Primary: San Francisco, CA, US
Updates
What counts as an explanation of how an LLM works? In our last Stanford guest lecture, Ekdeep Singh Lubana explains the different levels of analysis in interpretability, and outlines his neuro-inspired "model systems approach". Plus, how in-context learning and many-shot jailbreaking are explained by LLM representations changing in-context (as a case study for that approach).
00:33 - What counts as an explanation?
04:47 - Levels of analysis & standard interpretability approaches
18:19 - The "model systems" approach to interp [Case study on in-context learning]
23:36 - How LLM representations change in-context
44:10 - Modeling ICL with rational analysis
1:10:54 - Conclusion & questions
Thanks again to Surya Ganguli for having us in his class!
What kinds of algorithmic building blocks do transformer models use to do different tasks? Jack Merullo recently gave a guest lecture at Stanford on "computational motifs" - the basic algorithmic primitives that show up again and again when we peer inside different circuits and models - and how they can help us understand models more fully. If you're interested in learning more about this approach to studying neural networks, check out this lecture (and our previous one, on causal approaches) - and stay tuned for our final lecture and reading list. Thanks again to Surya Ganguli for having us in his class! Watch the full lecture on YouTube: https://lnkd.in/g8n44bPs
Goodfire reposted this
I'll be at NeurIPS next week with Myra Deng, Jack Merullo, Mark Bissell, and others from the Goodfire team. If you want to chat about AI interpretability for scientific discovery, monitoring, or our recent research, fill out this form and we'll get in touch! https://lnkd.in/ejFFJbdj
We believe some high-priority directions for interpretability research are neglected by existing educational resources - so we've made some to get the community up to speed. If you're interested in AI but not caught up on interpretability, check out the Stanford guest lectures + reading lists we're releasing this month!
We just posted the first lecture: Atticus Geiger on causal approaches to interpretability - applying frameworks and tools from causal modeling to understand LLMs and other neural networks. The video link is in the comments!
Thanks to Surya Ganguli for having us in his course!
00:00 - Intro
01:51 - Activation steering (e.g. Golden Gate Claude)
10:23 - Causal mediation analysis (understanding the contribution of an intermediate component)
21:42 - Causal abstraction methods (explaining a complex causal system with a simple one)
54:54 - Lookback mechanisms: a case study in designing counterfactuals
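As a companion to the causal mediation segment, here is a minimal sketch of activation patching, one common way to measure the contribution of an intermediate component. The model (gpt2), prompt pair, and layer choice are illustrative assumptions, not taken from the lecture.

```python
# Sketch of activation patching: run a "clean" and a "corrupted" prompt, then
# splice the clean run's activations into the corrupted run at one layer and
# check how much of the clean behaviour is restored. All choices illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

clean = tok("The Eiffel Tower is in the city of", return_tensors="pt")
corrupt = tok("The Colosseum is in the city of", return_tensors="pt")
paris_id = tok(" Paris", add_special_tokens=False).input_ids[0]

layer = model.transformer.h[6]  # intermediate component under study
cache = {}

def save_hook(module, inputs, output):
    cache["clean"] = output[0].detach()

def patch_hook(module, inputs, output):
    # Patch only the final token position, so prompt lengths need not match.
    hidden = output[0].clone()
    hidden[:, -1, :] = cache["clean"][:, -1, :]
    return (hidden,) + output[1:]

# 1) Cache clean activations at the chosen layer.
h = layer.register_forward_hook(save_hook)
with torch.no_grad():
    model(**clean)
h.remove()

# 2) Re-run the corrupted prompt with the clean activations patched in.
h = layer.register_forward_hook(patch_hook)
with torch.no_grad():
    logits = model(**corrupt).logits
h.remove()

# If this layer mediates the "which city" information, patching should
# raise the probability of the clean answer token.
prob_paris = logits[0, -1].softmax(-1)[paris_id].item()
print(f"P(' Paris') after patching layer 6: {prob_paris:.4f}")
```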
Sparse autoencoders (SAEs) assume that model representations are static. But LLM features drift & evolve across context! Our new paper introduces a neuroscience-inspired method to capture these dynamic representations: Temporal Feature Analysis.
Applying it reveals rich structure in model activations over the course of a conversation. E.g. in the video below, we plot 100 different conversations with Gemma 2 2B. Initial user prompts show similar trajectories (purple), but as the conversations continue, they diverge into different topic clusters (blue/green/yellow).
This opens up a new "attack surface" for interpretability - information about the geometry of model representations that previous methods largely neglected. That information is crucial for faithfully understanding & manipulating models!
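The sketch below is not Temporal Feature Analysis itself, just a generic illustration of the underlying observation that representations move through activation space as a conversation unfolds: it collects per-token hidden states from a small open model (gpt2, an assumption for the sketch) and projects the trajectory to 2D with PCA.

```python
# Generic illustration (not the paper's method): track how residual-stream
# activations move through representation space over a conversation by
# projecting per-token hidden states to 2D with PCA. Model/layer are assumed.
import torch
from sklearn.decomposition import PCA
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

conversation = (
    "User: How do sparse autoencoders work?\n"
    "Assistant: They learn an overcomplete dictionary of features...\n"
    "User: And how is that used for interpretability?\n"
)
ids = tok(conversation, return_tensors="pt")

with torch.no_grad():
    out = model(**ids, output_hidden_states=True)

# Hidden states at a mid-depth layer: shape (seq_len, hidden_size).
acts = out.hidden_states[6][0].numpy()

# Consecutive rows are consecutive tokens, so the projected points trace
# how the representation drifts across the context.
traj = PCA(n_components=2).fit_transform(acts)
for t, (x, y) in enumerate(traj[:10]):
    print(f"token {t:3d}: ({x:+.2f}, {y:+.2f})")
```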
Goodfire reposted this
Prompting and activation steering are just two sides of the same coin! New research from Goodfire researchers. Pretty insane how well their Bayesian belief model predicts empirical results. Read more here: https://lnkd.in/eSGMjrwP
Goodfire reposted this
LLMs memorize a lot of their training data. But where do "memories" live inside models? How are they stored? How much are they involved in different tasks?
📌 Jack Merullo and Srihita Vatsavaya's new paper investigates all of these questions! They found a signature of memorization in model weights, and used it to edit models, generally removing the ability to recite text verbatim.
This reveals a spectrum of different model capabilities - some of which rely heavily on memorization (like factual recall), and others that do perfectly fine without it (pure reasoning).
In addition to answering fundamental scientific questions, this points to new applications - like being able to turn memorization up or down for different tasks, or making lightweight agents that excel at reasoning rather than encyclopedic knowledge.
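For context, a simple behavioral proxy for the "recite text verbatim" capability the edit targets is to check whether a model reproduces a known passage from a prefix under greedy decoding. The sketch below does exactly that with gpt2 and an illustrative passage; it is not the paper's weight-space analysis.

```python
# Behavioral check for verbatim memorization (not the paper's weight-space
# method): feed the model a prefix of a known passage and measure how much of
# the true continuation greedy decoding reproduces token-for-token.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

# Illustrative passage; in practice you would use documents from training data.
passage = ("We hold these truths to be self-evident, that all men are "
           "created equal, that they are endowed by their Creator with "
           "certain unalienable Rights")
ids = tok(passage, return_tensors="pt").input_ids[0]
split = len(ids) // 2
prefix, target = ids[:split], ids[split:]

with torch.no_grad():
    out = model.generate(prefix.unsqueeze(0),
                         max_new_tokens=len(target),
                         do_sample=False,
                         pad_token_id=tok.eos_token_id)
completion = out[0, split:]

# Fraction of continuation tokens reproduced exactly: 1.0 means the model
# recites the rest of the passage verbatim from the prefix alone.
n = min(len(completion), len(target))
match = (completion[:n] == target[:n]).float().mean().item()
print(f"verbatim match rate: {match:.2f}")
```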
Goodfire reposted this
44M real users’ data secured by our AI interpretability platform Ember. ⤵️
We partnered with Rakuten AI to translate frontier interpretability research into real production value - keeping personal user data safe without slowing down Rakuten’s AI agents.
PII detection is a common concern in enterprise AI systems. In production, it requires methods that are:
- Lightweight enough to run efficiently at scale
- High-recall, so no sensitive data slips through
- Trained only on synthetic data, since customer data can’t be used
Using Ember, we built interpretability-based classifiers to catch PII with techniques that outperform black-box guardrails on recall, latency, and cost. Our methods were 15–500× cheaper than state-of-the-art LLM-as-a-judge approaches.
Huge thanks to Nam Nguyen, Dhruvil Gala, Myra Deng, Michael Byun and Daniel Balsam for leading the charge on this project at Goodfire, and to our collaborators at Rakuten - Yusuke Kaji, Kenta Naruse, Felix Giovanni Virgo, Mio Takei, and others who were early believers in Goodfire and our vision of interpretable AI.
We’re excited about helping enterprises build safe, intentionally designed AI systems. If you’re interested in exploring what a partnership could look like, I’d love to chat.
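As a rough picture of what an "interpretability-based classifier" can look like, the sketch below trains a linear probe on a small model's hidden activations using synthetic labeled text. The model choice, layer, and toy data are assumptions for illustration; this is not Ember's implementation.

```python
# Generic sketch of a probe-style PII classifier: a linear model trained on a
# network's hidden activations, using only synthetic examples. Illustrative
# of the general approach, not Goodfire's Ember implementation.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

# Synthetic training data (no real customer data): (text, has_PII) pairs.
synthetic = [
    ("My phone number is 555-0172, call me anytime.", 1),
    ("Please ship it to 42 Imaginary Lane, Springfield.", 1),
    ("Her email is jane.doe@example.com if you need it.", 1),
    ("The weather in March is usually mild here.", 0),
    ("Our quarterly revenue grew faster than expected.", 0),
    ("The recipe needs two eggs and a cup of flour.", 0),
]

def embed(text: str) -> torch.Tensor:
    """Mean-pooled hidden state from a mid-depth layer."""
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    return out.hidden_states[6][0].mean(dim=0)

X = torch.stack([embed(t) for t, _ in synthetic]).numpy()
y = [label for _, label in synthetic]

# A linear probe is cheap to run at scale: one matrix-vector product per text
# on top of activations the serving stack already computes.
probe = LogisticRegression(max_iter=1000).fit(X, y)

test = embed("Reach me at 555-0199 after 5pm.").numpy().reshape(1, -1)
print(probe.predict_proba(test))
```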
Goodfire reposted this
Are you a high-agency, early- to mid-career researcher or engineer who wants to work on AI interpretability? We're looking for several Research Fellows and Research Engineering Fellows to start this fall. Fellows will work on areas like interp for scientific discovery, causal analysis, representational structure of memorization/generalization, dynamics of representations, and more. We're looking for a range of skillsets - e.g. RL, Bayesian inference, distributed systems, signal processing, and API infrastructure. Fellows will collaborate with senior members of our technical staff, contribute to core projects, and work full time in person in our San Francisco office. Full post and links to apply: https://lnkd.in/eum9VZhq