Hugging Face reposted this
Storybook generation with nano-banana and the all-new gr.Walkthrough 🔥 😉
How do I choose the best open LLM for my project? Heard this too many times, so we've done something about it!
1. A curated list of models and tags in AI Sheets
2. A general but hopefully useful guide: https://lnkd.in/g9TjgnGq
🚀 Challenge launch! $60k up for grabs! 🚀
We're thrilled to launch the Antibody Developability Prediction Competition, hosted in collaboration between Ginkgo Datapoints and Hugging Face.
🔬 The challenge: predict critical antibody developability properties using the GDPa1 dataset (246 antibodies):
💧 Hydrophobicity
🎯 Polyreactivity
🧲 Self-association
🔥 Thermostability
🧪 Titer
💰 Up to $60,000 in prizes is available, with separate prizes for each property.
📅 Submissions close November 1, 2025.
This is the first open benchmark for antibody developability, and we can't wait to see how the community rises to the challenge. Whether you're competing, sharing insights, or just following along, your participation helps push the boundaries of machine learning for biotech.
👉 Register your team and join the competition here: https://lnkd.in/erEnD2ZH
Let's advance the future of therapeutic antibody design together, so share with whoever needs to know!
(+ special shoutout to Lood van Niekerk for putting so much into this 🤗)
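For anyone warming up for the challenge, here is a minimal, dependency-free sketch of one plausible per-property scoring setup: ranking predictions for a single property and scoring with Spearman correlation. This is purely illustrative — the post does not state the official leaderboard metric, and the toy numbers below are invented, not from GDPa1:

```python
def rank(values):
    """Average ranks (1-based), with tied values sharing the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(a, b):
    """Spearman rho = Pearson correlation computed on the ranks."""
    ra, rb = rank(a), rank(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra) ** 0.5
    vb = sum((y - mb) ** 2 for y in rb) ** 0.5
    return cov / (va * vb)

# Toy targets standing in for one property (e.g. thermostability):
y_true = [62.1, 70.4, 65.0, 68.2]
y_pred = [63.0, 69.5, 64.2, 69.0]  # a real model's predictions would go here
rho = spearman(y_true, y_pred)
```

Rank correlation is a common choice for developability-style benchmarks because it rewards getting the ordering of candidates right rather than matching absolute assay values.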
upgrade your transformers 🔥 it comes with insanely capable models like SAM2, KOSMOS-2.5, Florence-2 and more 🫡 I built a notebook you can run on a free Colab T4 to walk through the API for the new models 🙋🏻♀️ fine-tuning will follow soon! notebooks in comments 💬
Paging frontend devs! ✨ if you’ve been meaning to dive into AI open source, we’re welcoming contributions! 🔥 We’ve just shipped the Gradio Dataframe as a standalone npm package, and you can now plug it right into your Svelte projects. We’re working on making more of our frontend components standalone. What Gradio component should we publish next?
So happy to release smol course on the Hub! If you're building with or learning about post-training AI models right now, we have a new FREE and CERTIFIED course.
🔗 Follow the org to join in: https://lnkd.in/e4XA7zQd
The course builds on smol course v1, which was the fastest way to learn to train your custom AI models. It now has:
- A leaderboard for students to submit models to
- Certification based on exams and leaderboards
- Prizes based on leaderboards
- Up-to-date content on TRL and SmolLM3
- Deep integration with the Hub's compute for model training and evaluation
We will release chapters every few weeks, so follow the org to stay updated.
We are introducing 📄 FinePDFs: the largest PDF dataset ever released, spanning over half a billion documents!
- Long context: documents are 2x longer than web text
- 3T tokens from high-demand domains like legal and science
- Heavily improves over SoTA when mixed with the FW-EDU & DCLM web corpora
Since the beginning of this year, many have started asking: what happens when we run out of web pages to train on? Have we really hit the data wall? 💥
Yet only a few knew about a data source that everyone avoided for ages due to its incredible extraction cost and complexity: PDFs. While PDFs are definitely hard to extract, unlike HTML-based datasets they're prevalent in high-demand, high-quality domains such as legal and science: valuable but hard to find. Even the largest existing corpus (CC-PDF) is only scratching 🤏 the surface of what's possible to extract from CommonCrawl PDFs.
That's why we built a 2-tier pipeline to unlock all those "imprisoned tokens":
1️⃣ Extractable-text PDFs → 🦆 Docling → efficient, good quality
2️⃣ Scanned PDFs → 🤖 rolmOCR → higher cost, great quality
Once extracted, we refined the data with model-based filtering + deduplication, resulting in documents that are on average 2x longer than web text and substantially higher quality.
Finally, to confirm FinePDFs' quality, we compared it against major HTML-based corpora:
- Despite minimal filtering, our dataset nearly matches the heavily filtered FW-EDU & DCLM.
- More importantly, when mixed together, it achieves a new SoTA, with substantial improvements over other mixtures 📈.
As is tradition, the dataset is fully reproducible and released under the ODC-By 1.0 license. Link in comments 👇
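The 2-tier routing described above can be sketched in plain Python. Everything here is a stand-in: `has_text_layer`, the Docling and rolmOCR calls, and the dedup step are hypothetical stubs that only mirror the routing-plus-dedup shape described in the post, not the real pipeline's interfaces:

```python
import hashlib

def has_text_layer(pdf_bytes: bytes) -> bool:
    # Heuristic stand-in: real pipelines inspect the PDF content streams.
    # Here we just look for an embedded font marker (assumption for the sketch).
    return b"/Font" in pdf_bytes

def extract_with_docling(pdf_bytes: bytes) -> str:
    # Stub for tier 1: fast extraction from PDFs with a text layer (e.g. Docling).
    return "text-layer extraction"

def extract_with_ocr(pdf_bytes: bytes) -> str:
    # Stub for tier 2: OCR for scanned pages (e.g. rolmOCR). Costlier, great quality.
    return "ocr extraction"

def route(pdf_bytes: bytes) -> str:
    """Tier 1 for extractable-text PDFs, tier 2 (OCR) for scanned ones."""
    if has_text_layer(pdf_bytes):
        return extract_with_docling(pdf_bytes)
    return extract_with_ocr(pdf_bytes)

def dedupe(docs: list[str]) -> list[str]:
    """Exact dedup via content hashing; the real pipeline also does
    model-based quality filtering on top of this."""
    seen, kept = set(), []
    for doc in docs:
        h = hashlib.sha256(doc.encode()).hexdigest()
        if h not in seen:
            seen.add(h)
            kept.append(doc)
    return kept
```

The key design point the post highlights is cost routing: cheap extraction wherever a text layer exists, and the expensive OCR path reserved for scanned documents only.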
Community Spotlight is a fun way to nominate your favourite open source AI community members to win prizes like compute credits, merch, or HF PRO subscriptions. https://lnkd.in/eUc92Gmq Also, it's a super useful way to learn who's active in the community. I've learnt about a few exciting people already.
The open source AI community is made of people who are passionate about and care about their work. So we thought it would be cool to celebrate our favourite icons of the community with a fun award. Winners get free Hugging Face Pro subscriptions, merchandise, or compute credits for the Hub.
🔗 Follow and nominate here: https://lnkd.in/ebEkqAeC
This is a new initiative to recognise and celebrate the incredible work being done by community members. It's all about inspiring more collaboration and innovation in the world of machine learning and AI. We're highlighting contributors in four key areas:
- Model creators: building and sharing innovative, state-of-the-art models.
- Educators: sharing knowledge through posts, articles, demos, and events.
- Tool builders: creating the libraries, frameworks, and applications we all use.
- Community champions: supporting and mentoring others in forums.
Know someone who deserves recognition? Nominate them by opening a post in the Hugging Face community forum.
🚨 Feature Alert! 🚨
Ever wanted to:
✨ Try multiple models to respond to the same prompt?
✨ Compare different prompts for translation, classification, extraction, or summarization?
With Column Duplication, you can do it very easily. Here's how it works:
1️⃣ Import your dataset (or describe it, and we'll generate one for you)
2️⃣ Add a column → write your prompt (or select from templates)
3️⃣ Select from 1000+ models, or use the default option
4️⃣ Hit the Egg button to generate
5️⃣ Duplicate the column → tweak the prompt or switch the model
Then you can compare results systematically and make an informed decision.
Try it for free: https://lnkd.in/dymybmWf
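Under the hood, the workflow amounts to mapping (model, prompt) pairs over the same rows and comparing the resulting columns side by side. A minimal sketch, assuming a hypothetical `run_model` stand-in (AI Sheets itself is a no-code UI; none of these names are its real API):

```python
def run_model(model: str, prompt: str, row: str) -> str:
    # Stand-in: a real implementation would call an inference API here.
    # We just tag the output so each column's (model, prompt) is visible.
    return f"{model}:{prompt.format(text=row)}"

def add_column(table: dict, name: str, model: str, prompt: str) -> None:
    """Fill a new column by applying (model, prompt) to every source row."""
    table[name] = [run_model(model, prompt, row) for row in table["text"]]

def duplicate_column(table: dict, dst: str, model: str, prompt: str) -> None:
    """Mirror the duplicate-then-tweak step: same rows, new model or prompt."""
    add_column(table, dst, model, prompt)

# Same rows, two prompt/model variants, ready for side-by-side comparison.
table = {"text": ["hola", "merci"]}
add_column(table, "v1", "model-a", "Translate: {text}")
duplicate_column(table, "v2", "model-b", "Translate to English: {text}")
```

Because every variant column is computed over identical rows, differences between columns are attributable only to the prompt or model change, which is what makes the comparison systematic.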