Snorkel AI

Software Development

Redwood City, California 62,281 followers

Expert Data. Unparalleled Quality.

About us

Snorkel AI is building the data layer for specialized AI, enabling frontier labs, enterprises, and government agencies to develop AI tailored to their unique workloads. Born from pioneering research at the Stanford AI Lab, Snorkel combines cutting-edge programmatic data development technology with deep domain expertise to accelerate AI from prototype to production. Backed by Addition, Greylock, GV, In-Q-Tel, Lightspeed Venture Partners, and funds and accounts managed by BlackRock, Snorkel AI is headquartered in Redwood City, California. Learn more at snorkel.ai or follow @SnorkelAI.

Website
https://snorkel.ai
Industry
Software Development
Company size
51-200 employees
Headquarters
Redwood City, California
Type
Privately Held
Founded
2019
Specialties
enterprise ai, weak supervision, programmatic labeling, artificial intelligence, machine learning, data science, technology, software, foundation models, LLM, Generative AI, GPT-3, ChatGPT, NLP, computer vision, and document intelligence

Updates

  • One of the world’s leading LLM developers needed a dataset that could push frontier models to their limits. We developed a multiple-choice Q&A dataset that tests for PhD-level understanding across thousands of domains, from the humanities to STEM to professional fields. The outcome? 📊 <20% pass rate by two frontier LLMs 📚 1,000+ PhD-level sub-domains covered 👉 Learn more about how Snorkel is redefining data-centric AI: https://lnkd.in/erH9TdrE

  • Evals are evolving: from off-the-shelf to yours. Snorkel helps teams create custom datasets, domain evals, and production feedback loops, so you can measure what moves the product. Interested in learning more? 👉 https://lnkd.in/eWMfBtiQ

    Alexander Ratner, Co-founder and CEO at Snorkel AI:

    Some tweets/posts about "evals are dead" this weekend 🤯 What I think many mean to say: generic, off-the-shelf (OTS) evals are dead, and I agree! I'd liken this to people attacking a standardized test like the SAT: they're not actually attacking the notion of systematically measuring and monitoring human performance (that would be absurd). They're questioning a certain set of generic (and potentially overrepresented) evals, likely in favor of a much more custom, nuanced way of doing evaluation. We're at a similar moment in the AI space: generic OTS evals and public benchmarks are going to have minimal (and naturally diminishing) value. Watching a set of pre-canned LLMAJ scores on a monitoring dashboard, no matter how slick, has close to zero additive value today. Quoting how your model/agents perform on public benchmarks: same, soon enough. Instead, successful AI products will rely on custom, specialized evals and benchmark datasets. Whether these are supported by formal platforms or less formalized error analysis/feedback loops, it's all about specialized evals. So are "evals dead"? Definitely not. In fact, we're only just getting started on the real way to do meaningful evaluation in AI!

  • Bigger models aren’t the answer. Better data is. Enterprises that can rapidly create, customize, and evaluate training data will define the next wave of AI. Fortune 500 banks, government agencies, and healthcare leaders are already proving it, transforming expert knowledge into scalable impact. Watch the full discussion with Snorkel co-founder Braden Hancock: https://lnkd.in/e3r5nTXu #AI #DataCentricAI #EnterpriseAI #LLMs

  • Daniel Xu, our Head of Product, will take the stage at AI Infra Connect’s AI Infra Summit on Sept 11, from 1:50–2:30 PM PST, to discuss “Optimized Infrastructure: The Full-Stack Challenge.” Big thanks to Amazon Web Services (AWS) for the invite and Amazon SageMaker (moderator Ankur Mehrotra) for hosting. Looking forward to sharing the stage with leaders from Walgreens Boots Alliance, Datadog, and Lablup Inc. See you there!

  • Part 2 of our 5-part series on AI evaluation rubrics just dropped! 📚 The right tool for the job: an A-Z of rubrics. In this post, we dive deeper into the types of rubrics: dataset-level rubrics that apply to all prompts, and instance-specific rubrics designed alongside, and applied to, a specific prompt. We also discuss process evaluation (trace level) versus outcome evaluation, and cover notes on LLM-based evals vs. code-based evals. Read more 👉 https://lnkd.in/gsn6mfsE

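    The two rubric types described above can be sketched as plain Python (a hypothetical, code-based-eval illustration; the function names and criteria are invented, not Snorkel's API):

    ```python
    # Dataset-level rubric: one check applied to every response in the dataset.
    def dataset_level_rubric(response: str) -> bool:
        """Applies to all prompts, e.g., the response must be non-empty."""
        return bool(response.strip())

    # Instance-specific rubric: designed alongside a single prompt.
    def make_instance_rubric(required_terms: list[str]):
        def rubric(response: str) -> bool:
            return all(t.lower() in response.lower() for t in required_terms)
        return rubric

    # Deterministic, code-based evaluation over a small batch.
    examples = [
        {"response": "Plants convert light into chemical energy via chlorophyll.",
         "rubric": make_instance_rubric(["light", "chlorophyll"])},
        {"response": "",
         "rubric": make_instance_rubric(["disorder"])},
    ]
    scores = [dataset_level_rubric(ex["response"]) and ex["rubric"](ex["response"])
              for ex in examples]
    print(scores)  # [True, False]: the second response fails the dataset-level check
    ```

    An LLM-based eval would replace these deterministic checks with a judge model scoring each criterion; the dataset-level vs. instance-specific split stays the same.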
  • Part 2 of our five-part blog series on rubrics is live ⬇️

    Armin Parchami, Director of Research Engineering at Snorkel AI:

    Part 2 of our 5-part series on AI evaluation rubrics just dropped! 📚 The right tool for the job: an A-Z of rubrics. In this post, we dive deeper into the types of rubrics: - Dataset-level rubrics that apply to all prompts - Instance-specific rubrics designed alongside a specific prompt We also talk about process evaluation (trace level) versus outcome evaluation, and cover some notes on LLM-based evals vs. code-based evals. Read on! https://lnkd.in/gsn6mfsE

  • Many real-world apps need LLMs to handle multi-step reasoning: following connections, combining information, and solving problems that aren’t just about predicting the next word. That’s exactly what SnorkelGraph tests. We ask models natural-language questions about graphs (node + edge lists), with answers we can verify. Two ways we dial up the challenge: Graph size 👉 more nodes/edges to keep track of Operator type 👉 different “math-like” questions to stress reasoning. Built with Snorkel’s procedurally generated, expert-verified datasets, SnorkelGraph is part of our benchmark suite for evaluating reasoning in LLMs. 👉 Check out the full leaderboard: https://lnkd.in/e5bwFBcQ

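    The idea behind a benchmark like this can be sketched in a few lines (an illustrative toy, not SnorkelGraph's actual code; the degree question stands in for the "operator type" dial, and `n_nodes`/`n_edges` for the "graph size" dial):

    ```python
    import random

    def make_graph(n_nodes: int, n_edges: int, seed: int = 0):
        """Procedurally generate an undirected graph as node + edge lists."""
        rng = random.Random(seed)
        nodes = list(range(n_nodes))
        edges = set()
        while len(edges) < n_edges:
            a, b = rng.sample(nodes, 2)
            edges.add((min(a, b), max(a, b)))
        return nodes, sorted(edges)

    def degree_question(nodes, edges, node):
        """Pose a natural-language question whose answer is exactly checkable."""
        prompt = (f"Graph with nodes {nodes} and edges {edges}. "
                  f"What is the degree of node {node}?")
        gold = sum(node in e for e in edges)  # ground truth from the edge list
        return prompt, gold

    nodes, edges = make_graph(n_nodes=6, n_edges=8)
    prompt, gold = degree_question(nodes, edges, node=0)

    def verify(model_answer: int) -> bool:
        return model_answer == gold
    ```

    Scaling `n_nodes`/`n_edges` makes the graph harder to track, and swapping the degree operator for, say, shortest-path length stresses deeper reasoning, while the answer stays programmatically verifiable.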
  • ICYMI from our CEO, Alexander Ratner ⬇️ RL envs aren’t “easy,” won’t be commoditized, and aren’t the whole story. The useful ones are domain-specific and need human data + evals. We’re building with leading LLM teams; let’s talk.

    Alexander Ratner, Co-founder and CEO at Snorkel AI:

    Lots of chatter about agentic/RL simulation environments recently! Some key misconceptions (slightly caricatured):

    >> "Building RL envs is easy, because you just code up a verifier quickly and let the model do the tough data generation on its own!"
    - Usually, this boils down to over-indexing on environments where verification is easy.
    - For example: you might need a chess expert to generate realistic expert gameplay traces, but anyone with a basic chess rulebook could verify a win easily.
    - However, there are many, many settings where verification is not at all trivial. The simplest examples are settings with nuanced, domain-specific evaluation rubrics (e.g., most real-world enterprise settings). An extreme example: verify whether a program will halt :)

    >> "Building RL envs will get commoditized as the 'standard' environments get rapidly solved."
    - RL environments effectively encode a complete product spec, including unique tools, data resources, constraints, rubrics/verifiers, and human/agent simulators, and as such are as diverse as the space of all possible AI products.
    - Yes, certain generic RL envs will rapidly commoditize ("web browsing," "computer OS"), but these are not the useful ones anyway!
    - The useful RL envs will be deeply domain- and product-specific, and will require corresponding human expertise and customization to build and evolve over time.

    >> "RL (and RL envs) will be all that you need!"
    - Current evidence suggests that RL / RL envs will be one part of the overall AI development loop, which will continue to require golden human annotations/traces for initial SFT, ongoing human evals, and more.
    - Just like trial-and-error learning is only one part of human learning, RL will likely be one tool/phase of many.

    In summary: (1) Building the components of an RL environment is usually highly non-trivial. (2) RL envs effectively describe a product spec; there will be a wide range of unique ones, requiring deep product/domain expertise. (3) RL (and RL envs) will be one component of a rich ecosystem of tools for model learning, including human data, rubrics, evals, and more. If you're interested in some of the work the Snorkel AI team is doing in partnership with leading LLM developers here, shoot us a note! It's an exciting time to build in this space :)
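    The verification asymmetry in point (1) can be made concrete with a toy sketch (hypothetical code; sorting plays the role of chess, where checking is trivial even if generating is hard, and the rubric verifier shows the shape of the non-trivial case):

    ```python
    # Easy case: a cheap, rule-based verifier needs no domain expertise.
    def verify_sorted(task_input: list[int], proposed: list[int]) -> bool:
        """Correct iff the proposed output is the input, sorted."""
        return proposed == sorted(task_input)

    assert verify_sorted([3, 1, 2], [1, 2, 3])
    assert not verify_sorted([3, 1, 2], [3, 1, 2])

    # Hard case: most enterprise settings have no such rule. A rubric-based
    # verifier must aggregate graded, domain-specific criteria; in practice
    # each score would come from an expert or an LLM judge, so here the
    # scores are simply pre-assigned to show the shape of the check.
    def rubric_verify(rubric: dict[str, float], threshold: float = 0.7) -> bool:
        return sum(rubric.values()) / len(rubric) >= threshold

    print(rubric_verify({"accuracy": 0.9, "tone": 0.6, "compliance": 0.8}))  # True
    ```

    The point of the sketch: `verify_sorted` is complete and objective, while `rubric_verify` is only as good as the expertise encoded in its criteria and scores.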
