Ultravox.ai

Software Development

Seattle, WA · 5,346 followers

Ultravox is a real-time voice AI infrastructure layer that delivers low-latency, intelligent voice agents.

About us

We're building AIs that can communicate as naturally as humans. Check out our realtime model work at https://ultravox.ai

Website
https://ultravox.ai
Industry
Software Development
Company size
11-50 employees
Headquarters
Seattle, WA
Type
Privately Held
Founded
2022

Updates

  • As the end of the year approaches, we're taking a moment to reflect on the incredible year we've had: 17x growth in our customer base since January, 38x growth of our busiest call day compared to 2024, and 231 record-setting days for call volume on the platform during 2025. We couldn't have done it without our amazing users sharing feedback, reporting issues, and building incredible conversational experiences with Ultravox. Happy holidays from all of us at Ultravox, and we can't wait to see what we can build together in 2026! https://lnkd.in/g4mwNMqj

  • Ultravox.ai reposted this

    On this week's episode of The Voice Loop, I jammed with Zach Koch from our friends at Ultravox.ai. We got into what it actually takes to ship speech-to-speech systems. We also discussed:
    - Why Ultravox moved from Llama to GLM 4.6 for Ultravox 0.7
    - The real reason cascading stacks still dominate (and what changes when the tradeoffs disappear)
    - Where most "audio sensitivity" failures really come from
    - Why "failed experiments" are a superpower in model development
    If you're building voice agents or voice models, this one's for you. Links for Spotify and YouTube in the comments!

  • ICYMI: Ultravox v0.7 is the smartest speech understanding model on the market, ranking #1 among speech models on VoiceBench with a score of 87.1. The v0.7 model delivers significant improvements to both instruction following and tool calling while keeping the speed and conversational experience users love, all at the same $0.05/min price point (including speech generation).

  • Ultravox.ai reposted this

    We just released Ultravox v0.7, the newest version of our native speech model. It's very good! On Big Bench Audio it scores an industry-leading 91.8 (compared to gpt-realtime's score of 82.8). On VoiceBench it's now #1 among speech models with a score of 87.1.

    But evals are only part of the story. The real question is whether it's helping customers build more effective agents. Here's what one of our early beta testers said: "Ultravox v0.7 was the breakthrough we needed... The combination of more natural conversation flow and better reasoning means our law firm clients are finally launching with confidence. Tool calls execute flawlessly, the agents understand end users at a completely different level, and we're not spending hours debugging edge cases anymore."

    Under the hood, v0.7 is trained on top of GLM-4.6, which is really just an incredible open-weight model. It excels at instruction following and tool calling, even as conversations grow. We also introduced new training techniques, further expanding our lead in speech understanding. Until now, most speech models improved conversation quality but forced you onto a weaker "brain." With Ultravox v0.7, that tradeoff is gone. And we're keeping pricing the same: $0.05/min, all-in (including speech generation).

    Give it a try and let me know what you think! The model is available on app.ultravox.ai (via the agent builder and API). And of course, the model weights are available on Hugging Face; a minimal loading sketch follows below.

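For readers who want to poke at the open weights directly, here is a minimal sketch of what loading an Ultravox checkpoint from Hugging Face might look like. It assumes the weights load through the standard transformers pipeline with trust_remote_code=True, as earlier Ultravox releases published under the fixie-ai organization did; the repo id, audio file, and prompt below are placeholders, not confirmed names.

```python
# Rough, illustrative sketch (not official Ultravox sample code): load an
# Ultravox checkpoint from Hugging Face and run one spoken turn.
# The repo id, audio file, and system prompt are placeholders / assumptions.
import librosa
import transformers

pipe = transformers.pipeline(
    model="fixie-ai/ultravox-v0_7",  # placeholder repo id; check Hugging Face for the real one
    trust_remote_code=True,          # Ultravox ships custom pipeline code with its weights
)

# Load a short audio clip at 16 kHz (the rate earlier Ultravox releases used).
audio, sr = librosa.load("caller_question.wav", sr=16000)

# Conversation so far; the model answers the audio as the next turn.
turns = [
    {"role": "system", "content": "You are a concise, friendly voice assistant."},
]

result = pipe(
    {"audio": audio, "turns": turns, "sampling_rate": sr},
    max_new_tokens=64,
)
print(result)
```
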
  • We're pleased to announce the release of Ultravox v0.7, the smartest, most capable speech understanding model available today! After a comprehensive evaluation of some of the best open-weight models, we chose GLM 4.6 to serve as our new LLM backbone. Thanks to this change, v0.7 delivers better instruction following and more reliable tool calling, without sacrificing the speed and conversational experience that Ultravox users love (and, as a bonus, we've improved inference performance by roughly 20%). You don't have to take our word for it, though: you can create an Ultravox Realtime account for free and try it yourself! (link in comments; a rough API sketch follows below)

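If you would rather try the hosted Realtime platform than run the weights locally, the sketch below guesses at what creating a call over REST might look like. Nothing here is official documentation: the endpoint path, the X-API-Key header, the body fields (systemPrompt, model, voice), and the joinUrl field in the response are all assumptions, so check the Ultravox docs for the actual contract.

```python
# Hypothetical sketch of starting an Ultravox Realtime call over REST.
# Endpoint, header, and field names are assumptions; consult the official docs.
import os
import requests

API_KEY = os.environ["ULTRAVOX_API_KEY"]  # assumed environment variable name

response = requests.post(
    "https://api.ultravox.ai/api/calls",  # assumed endpoint
    headers={"X-API-Key": API_KEY},       # assumed auth header
    json={
        "systemPrompt": "You are a helpful receptionist for a small law firm.",
        "model": "fixie-ai/ultravox",     # assumed model identifier
        "voice": "Mark",                  # assumed voice name
    },
    timeout=30,
)
response.raise_for_status()

# The response is assumed to include a join URL that a client SDK or
# WebSocket/WebRTC client uses to attach to the live conversation.
call = response.json()
print(call.get("joinUrl"))
```
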
  • We routinely run into user questions about how to make cloned voices sound more natural. As it turns out, there are a number of factors that can influence the degree to which your clone sounds like you. This week, we decided to take a deeper dive into the overall process, covering how TTS models are trained and how voices are cloned with the zero-shot method. We've also got some tips on getting a more natural-sounding voice clone. https://lnkd.in/gk63Y645
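
To make the zero-shot idea concrete: instead of fine-tuning a model on hours of a target speaker's audio, the TTS model conditions on a short reference clip at inference time. The sketch below is purely illustrative and uses the open-source Coqui XTTS v2 model rather than Ultravox's own cloning pipeline; the file paths are placeholders.

```python
# Illustrative zero-shot voice cloning with the open-source Coqui XTTS v2 model.
# This is not Ultravox's pipeline; it just shows the general shape of zero-shot
# cloning: no fine-tuning, only a short reference clip supplied at inference time.
from TTS.api import TTS

# Download/load the multilingual XTTS v2 checkpoint.
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

# A clean reference clip of the target speaker works best:
# quiet room, no music, natural speaking pace.
tts.tts_to_file(
    text="Thanks for calling! How can I help you today?",
    speaker_wav="reference_clip.wav",  # placeholder path to the reference recording
    language="en",
    file_path="cloned_output.wav",
)
```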

  • Ultravox.ai reposted this

    Most people in the Voice AI space misunderstand why speech-to-speech models are important. It's not about better latency or emotional understanding (though these are both nice side effects of the model). It's really about building a world model for real-time speech that is accurate and useful enough to predict the next most likely token for a given state of the conversation.

    Ilya Sutskever, co-founder and former Chief Scientist of OpenAI, gave my favorite explanation of what is actually happening in transformers when they are predicting the next token: "When we train a large neural network to accurately predict the next word in lots of different texts...it is learning a world model.... This text is actually a projection of the world.... What the neural network is learning is more and more aspects of the world, of people, of the human conditions, their hopes, dreams, and motivations...the neural network learns a compressed, abstract, usable representation of that."

    Current Voice AI systems do not have this world model for speech. Instead, they're built on a hodgepodge of systems, tools, and models that we try to stitch together and coordinate to create the mirage of a real-time conversation. But because these systems are a mirage, they are bound to fail when exposed to real-world variability.

    In his most recent newsletter, Davit Baghdasaryan (Krisp CEO) wrote about the two big problems holding back Voice AI adoption: 1) real-world conditions cause the agent to fail (background speech, noises, broken turn-taking), and 2) inaccurate speech-to-text, especially for things that fall outside of standard training distributions (addresses, phone numbers, etc.).

    Speech-to-speech models are the only path to solving these problems at scale. This is why we started training the Ultravox.ai models back in early 2024. We've made a bunch of progress since then, and I'm excited to share our work on foundational speech models later this year.
