One of the world’s leading LLM developers needed a dataset that could push frontier models to their limits. We developed a multiple-choice Q&A dataset that tests for PhD-level understanding across thousands of domains, from the humanities to STEM to professional fields. The outcome?
📊 <20% pass rate by two frontier LLMs
📚 1,000+ PhD-level sub-domains covered
👉 Learn more about how Snorkel is redefining data-centric AI: https://lnkd.in/erH9TdrE
Snorkel AI
Software Development
Redwood City, California 62,281 followers
Expert Data. Unparalleled Quality.
About us
Snorkel AI is building the data layer for specialized AI, enabling frontier labs, enterprises, and government agencies to develop AI tailored to their unique workloads. Born from pioneering research at the Stanford AI Lab, Snorkel combines cutting-edge programmatic data development technology with deep domain expertise to accelerate AI from prototype to production. Backed by Addition, Greylock, GV, In-Q-Tel, Lightspeed Venture Partners, and funds and accounts managed by BlackRock, Snorkel AI is headquartered in Redwood City, California. Learn more at snorkel.ai or follow @SnorkelAI.
- Website: https://snorkel.ai
- Industry: Software Development
- Company size: 51-200 employees
- Headquarters: Redwood City, California
- Type: Privately Held
- Founded: 2019
- Specialties: enterprise AI, weak supervision, programmatic labeling, artificial intelligence, machine learning, data science, technology, software, foundation models, LLM, generative AI, GPT-3, ChatGPT, NLP, computer vision, and document intelligence
Locations
- Primary: 55 Perry St, Redwood City, California 94063, US
Updates
-
Evals are evolving: from off-the-shelf to yours. Snorkel helps teams create custom datasets, domain evals, and production feedback loops—so you can measure what moves the product. Interested in learning more? 👉 https://lnkd.in/eWMfBtiQ
Some tweets/posts about "evals are dead" this weekend 🤯 What I think many mean to say: generic, off-the-shelf (OTS) evals are dead - and I agree!

I'd liken this to people attacking a standardized test like the SAT - they're not actually attacking the notion of systematically measuring and monitoring human performance (that would be absurd). They're questioning a certain set of generic (and potentially overrepresented) evals - likely in favor of a much more custom, nuanced way of doing evaluation.

We're at a similar moment in the AI space: generic OTS evals and public benchmarks are going to have minimal (and naturally diminishing) value. Watching a set of pre-canned LLMAJ scores on a monitoring dashboard - no matter how slick - has close to zero additive value today. Quoting how your model/agents perform on public benchmarks - same, soon enough.

Instead: successful AI products will rely on custom, specialized evals and benchmark datasets. Whether these are supported by formal platforms or less formalized error analysis/feedback loops - it's all about specialized evals.

So are "evals dead"? Definitely not - in fact, we're only just getting started on the real way to do meaningful evaluation in AI!
-
Bigger models aren’t the answer. Better data is. Enterprises that can rapidly create, customize, and evaluate training data will define the next wave of AI. Fortune 500 banks, government agencies, and healthcare leaders are already proving it, transforming expert knowledge into scalable impact. Watch the full discussion with Snorkel co-founder Braden Hancock: https://lnkd.in/e3r5nTXu #AI #DataCentricAI #EnterpriseAI #LLMs
Building Enterprise-Grade AI: Lessons from Snorkel AI’s Fundraising and Expansion | The SaaS CFO
-
Thrilled to welcome Penelope Talbot-Kelly (GM, DaaS) and Dennis Panos (Head of Strategic AI Solutions) to Snorkel! Excited to have you here to lead our mission of partnering with enterprises to build specialized AI at scale.
-
Daniel Xu, our Head of Product, will take the stage at AI Infra Connect’s AI Infra Summit on Sept 11, from 1:50–2:30 PM PT, to discuss “Optimized Infrastructure: The Full-Stack Challenge.” Big thanks to Amazon Web Services (AWS) for the invite and to Amazon SageMaker (moderator Ankur Mehrotra) for hosting. Looking forward to sharing the stage with leaders from Walgreens Boots Alliance, Datadog, and Lablup Inc. See you there!
-
Part 2 of our 5-part series on AI evaluation rubrics just dropped! 📚 The right tool for the job: an A-Z of rubrics

In this post, we dive deeper into types of rubrics:
- Dataset-level rubrics that apply to all prompts
- Instance-specific rubrics designed alongside, and applied to, a specific prompt

We also discuss process (trace-level) vs. outcome evaluation, and share notes on LLM-based vs. code-based evals. Read more 👉 https://lnkd.in/gsn6mfsE
-
Part 2 of our five-part blog series on rubrics is live ⬇️ https://lnkd.in/gsn6mfsE
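For a concrete feel of the distinction, here's a minimal, hypothetical Python sketch of the code-based flavor of these two rubric types; the function names and pass criteria are our own illustration, not from the series:

```python
# Hypothetical sketch: dataset-level vs. instance-specific rubrics,
# both implemented as simple code-based checks.
import re

# Dataset-level rubric: applies to every prompt in the eval set.
def dataset_level_rubric(response: str) -> bool:
    # e.g., a blanket non-empty / non-refusal check
    return len(response) > 0 and "I cannot" not in response

# Instance-specific rubric: written alongside one particular prompt.
prompt = "List three prime numbers under 10."

def instance_rubric(response: str) -> bool:
    # This check only makes sense for the prompt above.
    found = {int(n) for n in re.findall(r"\d+", response)}
    return len(found & {2, 3, 5, 7}) >= 3

response = "Three primes under 10 are 2, 3, and 5."
print(dataset_level_rubric(response), instance_rubric(response))  # True True
```

An LLM-based eval would replace these functions with a judge-model call scoring the same criteria; the dataset-level/instance-specific split stays the same.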
-
Many real-world apps need LLMs to handle multi-step reasoning: following connections, combining information, and solving problems that aren’t just about the next word. That’s exactly what SnorkelGraph tests. We ask models natural-language questions about graphs (node + edge lists), with answers we can verify.

Two ways we dial up the challenge:
Graph size 👉 more nodes/edges to keep track of
Operator type 👉 different “math-like” questions to stress reasoning

Built with Snorkel’s procedurally generated, expert-verified datasets, SnorkelGraph is part of our benchmark suite for evaluating reasoning in LLMs. 👉 Check out the full leaderboard: https://lnkd.in/e5bwFBcQ
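To make the setup concrete, here's a minimal, hypothetical Python sketch of a procedurally generated, verifiable graph question - the generator, the "degree" operator, and all names are our illustration, not SnorkelGraph's actual code:

```python
# Hypothetical sketch of a SnorkelGraph-style item: generate a random
# graph, ask a natural-language question about it, compute ground truth.
import random

def make_graph(num_nodes: int, num_edges: int, seed: int = 0):
    """Procedurally generate a random undirected graph as an edge list."""
    rng = random.Random(seed)
    nodes = list(range(num_nodes))
    edges = set()
    while len(edges) < num_edges:
        u, v = rng.sample(nodes, 2)
        edges.add((min(u, v), max(u, v)))  # canonical order = no duplicates
    return nodes, sorted(edges)

def degree_question(nodes, edges, target):
    """One 'operator type': ask for a node's degree; answer is verifiable."""
    prompt = (
        f"Nodes: {nodes}\nEdges: {edges}\n"
        f"How many edges touch node {target}? Answer with a single integer."
    )
    answer = sum(target in e for e in edges)  # exact ground truth
    return prompt, answer

nodes, edges = make_graph(num_nodes=8, num_edges=12)
prompt, gold = degree_question(nodes, edges, target=3)
# Score an LLM by parsing an integer from its reply and comparing to `gold`.
# Scaling num_nodes/num_edges or swapping the operator dials up difficulty.
```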
-
ICYMI from our CEO, Alexander Ratner ⬇️ RL envs aren’t “easy,” won’t be commoditized, and aren’t the whole story. The useful ones are domain-specific and need human data + evals. We’re building with leading LLM teams—let’s talk.
Lots of chatter about agentic/RL simulation environments recently! Some key misconceptions (slightly caricatured):

>> Building RL envs is easy, because you just code up a verifier quickly and let the model do the tough data generation on its own!
- Usually, this boils down to over-indexing on environments where verification is easy.
- For example: you might need a chess expert to generate realistic expert gameplay traces, but anyone with a basic chess rulebook could verify a win easily.
- However: there are many, many settings where verification is not at all trivial. The simplest examples are settings with nuanced, domain-specific evaluation rubrics (e.g. most real-world enterprise settings). An extreme example: verify whether a program will halt :)

>> Building RL envs will get commoditized as the "standard" environments get rapidly solved.
- RL environments effectively encode a complete product spec - including unique tools, data resources, constraints, rubrics/verifiers, and human/agent simulators - and as such, are as diverse as the space of all possible AI products.
- Yes, certain generic RL envs will rapidly commoditize ('web browsing', 'computer OS') - but these are not the useful ones anyway!
- The useful RL envs will be deeply domain- and product-specific, and will require corresponding human expertise and customization to build and evolve over time.

>> RL (and RL envs) will be all that you need!
- Current evidence suggests that RL / RL envs will be one part of the overall AI development loop - which will continue to require golden human annotations/traces for initial SFT, ongoing human evals, and more.
- Just like trial-and-error learning is only one part of human learning, RL will likely be one tool/phase of many.

In summary:
- (1) Building the components of an RL environment is usually highly non-trivial.
- (2) RL envs effectively describe a product spec - there will be a wide range of unique ones, requiring deep product/domain expertise.
- (3) RL (and RL envs) will be one component of a rich ecosystem of tools for model learning, including human data, rubrics, evals, and more.

If you're interested in some of the work the Snorkel AI team is doing in partnership with leading LLM developers here - shoot us a note! It's an exciting time to build in this space :)
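As a rough illustration of point (2) - that an RL env bundles tools, constraints, and a verifier much like a product spec - here's a minimal, hypothetical Python sketch; the RLEnvSpec class and every field name are our assumptions, not Snorkel's or any lab's actual API:

```python
# Hypothetical sketch of what an RL environment "spec" bundles:
# tools, constraints, a rubric, and a domain-specific verifier.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class RLEnvSpec:
    name: str
    tools: dict[str, Callable[..., str]]             # actions the agent may call
    constraints: list[str]                           # e.g. budget or safety rules
    verifier: Callable[[str], float]                 # scores a final transcript
    rubric: list[str] = field(default_factory=list)  # human-readable criteria

# Verification is trivial here (substring match), but as the post argues,
# most enterprise settings need nuanced, rubric-driven verifiers instead.
env = RLEnvSpec(
    name="toy-arithmetic",
    tools={"calculator": lambda expr: str(eval(expr))},  # toy only; eval is unsafe
    constraints=["at most 3 tool calls"],
    verifier=lambda transcript: 1.0 if "42" in transcript else 0.0,
    rubric=["final answer must equal 42"],
)
print(env.verifier("the answer is 42"))  # 1.0
```

Every field above is product- and domain-specific, which is the post's argument for why useful envs resist commoditization.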
-
Snorkel is growing, and we’re on the hunt for bold thinkers and builders to help shape the future of AI. 🚀 Know someone who’s ready to make their mark? Send them our way! Check out the open roles across engineering, sales, operations, marketing, and beyond 👉 https://lnkd.in/gqrA9g2D