Corti’s new Symphony Speech-to-Text model outperforms OpenAI in medical terminology accuracy, highlighting the value of specialized AI.

Today, Copenhagen-based healthcare AI company Corti is launching Symphony for Speech-to-Text, a new generation of clinical-grade speech recognition models designed specifically for real-time dictation, conversational transcription, and batch audio processing, with the highest accuracy yet recorded for this use case.

"We are focused on ensuring that our AI authors can be trusted by doctors, clinicians and patients…the entire healthcare system," said Andreas Cleve, founder and CEO of Corti, in an exclusive video call interview with VentureBeat.

The performance data the company brings to the table paints a stark picture of the current state of enterprise AI: in highly regulated, specialized industries, domain-specific models can beat the foundation model providers.

In a recently published research paper, Corti reports that its new clinical-grade speech models reduce word error rates (WER) by up to 93% on medical terminology compared with the best general-purpose speech models and APIs.

On English medical terminology, Symphony for Speech-to-Text achieved a notably low WER of 1.4%. By comparison, OpenAI’s speech model registered a WER of 17.7%, ElevenLabs 18.1%, Whisper 17.4%, and Parakeet 18.9%.
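For readers unfamiliar with the metric, the sketch below shows the standard definition of word error rate (word-level edit distance divided by the number of reference words) and how a relative reduction is derived from two WER figures. The function names and the sample sentence are illustrative assumptions, not Corti's evaluation code, and the text normalization here (lowercasing and whitespace splitting) is deliberately simplistic.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, new: float) -> float:
    """Fraction by which `new` is lower than `baseline`."""
    return (baseline - new) / baseline


# Made-up sentence: two substituted words out of nine reference words.
ref = "patient has hypothyroidism and takes levothyroxine 50 micrograms daily"
hyp = "patient has hyperthyroidism and takes levothyroxine 15 micrograms daily"
print(f"WER: {word_error_rate(ref, hyp):.3f}")

# WER figures (in percent) as reported in the article.
reported = {"OpenAI": 17.7, "ElevenLabs": 18.1, "Whisper": 17.4, "Parakeet": 18.9}
for name, wer in reported.items():
    print(f"{name}: {relative_reduction(wer, 1.4):.1%} relative reduction")
```

Under these figures the largest relative reduction is against Parakeet, at roughly 92.6%, which is consistent with the "up to 93%" headline number after rounding.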

Corti’s announcement marks an important turning point for healthcare providers. While general-purpose APIs like OpenAI’s Whisper are adequate for broad-domain transcription, they often stumble over medical acronyms, complex drug dosages, shorthand, and noisy emergency room environments. Symphony for Speech-to-Text aims to solve this by providing developers with a highly specialized, production-grade API designed from the ground up for clinical workflows.

The agent era requires flawless data entry

The launch of Symphony for Speech-to-Text highlights a fundamental shift in the way healthcare uses voice technology. For decades, medical speech recognition was primarily about producing a static text document for human doctors to review: a digital replacement for the notepad.

But as the healthcare industry embraces what experts call the "agent era," in which autonomous AI agents actively assist with clinical decision making, EHR navigation, and real-time support, transcription is no longer the end product. It is the foundational data layer.

“Speech has always been one of the most important concepts in healthcare,” Cleve said in a statement provided to VentureBeat. “What changes is what happens after the words are captured. In the agent era, speech recognition needs to do more than just generate text; it needs to give AI systems accurate clinical facts to reason about. If the model doesn’t get the drug, dose, or symptom right, each downstream step becomes less reliable. Symphony for Speech-to-Text provides a sufficient layer of speech for healthcare professionals.”

This is where the compounding risk of high word error rates comes into play. If a general-purpose AI model garbles the transcript, turning "hyperthyroidism" into "hypothyroidism" or misinterpreting the dose of an important drug, every subsequent AI agent that relies on that transcript will operate on corrupted data. Corti’s architecture mitigates this risk by producing structured, clinically usable output from the API, helping downstream AI applications reason over clean facts rather than messy, unformatted text.

Nowhere is this more evident than in Corti’s benchmarks. Symphony for Speech-to-Text reached a 98.3% recall rate for formatted clinical entities such as doses, measurements, and dates. In contrast, Corti reported that a general-purpose foundation model achieved only 44.3% on the same entities.

For developers building ambient AI documentation tools, that gap of 54 percentage points is the difference between a tool that saves a doctor time and a tool that creates medical liability.
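To make the entity recall figure concrete, here is a minimal sketch of how recall over formatted clinical entities could be scored. It assumes exact string matching on already-extracted entities; the article does not describe Corti's actual scoring rules, and the function name and sample entities below are hypothetical.

```python
from collections import Counter


def entity_recall(reference_entities, predicted_entities):
    """Share of reference entities that appear, exactly as formatted, in the prediction."""
    ref = Counter(reference_entities)
    pred = Counter(predicted_entities)
    total = sum(ref.values())
    if total == 0:
        raise ValueError("reference must contain at least one entity")
    # Multiset intersection: each reference entity can be matched at most once.
    matched = sum((ref & pred).values())
    return matched / total


# Hypothetical entities: the blood pressure reading was transcribed as words,
# so it does not match the expected formatted value.
reference = ["50 mg", "120/80 mmHg", "2024-03-14", "2.5 mL"]
predicted = ["50 mg", "120 over 80", "2024-03-14", "2.5 mL"]
print(f"Entity recall: {entity_recall(reference, predicted):.0%}")

# Gap between the two recall figures reported in the article, in percentage points.
gap_points = round(98.3 - 44.3, 1)
print(f"Reported gap: {gap_points} percentage points")
```

The example also shows why the gap is best read in percentage points: it is the difference between two recall percentages, not a relative change.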

Taking on the industry incumbents

While Corti’s benchmarks against modern AI model builders like OpenAI and ElevenLabs are impressive, the company is also aiming to take on the legacy giants of medical documentation.

For years, the gold standard for dedicated medical dictation has been Dragon Medical One. However, these legacy systems were historically developed strictly for physician dictation, not as foundational infrastructure for ambient AI, complex multi-party conversations, or real-time clinical support tools.

In real-world English medical dictation tests, Corti achieved a WER of 4.6%, compared with Dragon’s 5.7% (a 19% relative improvement).

In addition, Corti showed higher medical word recall than Dragon (93.5% versus 92.9%).
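As a quick check on the Dragon comparison, the relative improvement can be worked out from the two reported WER figures. This is ordinary relative-change arithmetic, not a formula taken from Corti's paper:

```latex
\[
\text{relative improvement}
  = \frac{\mathrm{WER}_{\text{Dragon}} - \mathrm{WER}_{\text{Corti}}}{\mathrm{WER}_{\text{Dragon}}}
  = \frac{5.7 - 4.6}{5.7}
  \approx 0.193 \approx 19\%
\]
```

The recall difference, by contrast, is an absolute one: 93.5% versus 92.9% is a gap of 0.6 percentage points.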

By providing this level of accuracy through an API endpoint, Corti allows third-party developers, EHR vendors, and virtual care platforms to build their own dictation and ambient listening tools that outperform the industry’s legacy ones.

"We want people to build apps on top of our models," Cleve said. "The goal is to spread the technology as widely as necessary to be as useful as possible to patients and their doctors and specialists."

For Cleve and his co-founders, the goal is personal: Cleve’s mother was a healthcare professional who suffered a stroke and spent years struggling to recover. He wanted to improve healthcare practices as a way to honor her sacrifice.

Solving the multilingual healthcare puzzle

Healthcare needs extend beyond English-speaking hospitals, and global health systems have long been underserved by clinical NLP models. Developers are already using Corti’s new models in linguistically demanding regions, proving the technology’s effectiveness in complex international markets.

Switzerland, for example, requires the delivery of care in multiple languages, often simultaneously in a single medical facility. That makes it one of the world’s most demanding proving grounds for multilingual medical speech models. Corti’s Symphony models showed significant performance gains on these non-English tests, achieving a WER of 2.4% for German (compared to 13.0% for the next best system) and a WER of 3.9% for French (compared to 10.6%).

“In the clinical conversation, every word counts – a missed drug name, a misheard dose, or a misspelled symbol can change the meaning of an encounter,” said Pierre Corboz, Head of Solutions & Business Development at Voicepoint, a Swiss healthcare technology provider, in a statement provided to VentureBeat. “Symphony’s precision in clinical terminology gives us the foundation to bring more reliable AI capabilities to clinical workflows through our Voicepoint Xenon platform. When Corti develops the speech layer, the workflow we build together becomes sharper, safer, and more useful for Swiss doctors.”

AI verticalization and specialization are paying off

Today’s announcement of Symphony for Speech-to-Text is not an isolated event; it’s the culmination of the strategic narrative that Corti has been pushing so hard for the past few weeks.

With its broader Symphony platform, which powers the clinical and administrative applications of a global network of EHR vendors and life sciences organizations, Corti has been systematically making the case for vertical AI labs against horizontal technology giants.

This marks the third major benchmark that Corti has released in just six weeks, touching on different layers of healthcare AI functionality.

In April, the company revealed that its Symphony for Medical Coding system outperformed general-purpose models by 25% in clinical accuracy benchmarks, addressing one of healthcare’s most notoriously complex workflows.

And last week, Corti announced that its clinical-grade model surpassed OpenAI on HealthBench Professional, OpenAI’s own health benchmark.

Taken together, these three data points (medical coding, clinical reasoning, and speech-to-text accuracy) point to a growing consensus in enterprise technology: general-purpose models are falling short in regulated industries.

Models used in hospitals must natively understand complex acronyms, sudden interruptions, medical abbreviations, specialized language, and strict compliance restrictions. By training specifically for these edge cases, vertical AI labs like Corti are building a substantial moat that companies relying solely on API calls to large language models can’t easily cross.

Availability and product planning

Developers are clearly aware of the performance gap. According to momentum data provided to VentureBeat, Corti is seeing 30% growth in new registrations for its platform quarter-to-date, indicating that healthcare developers and builders are gravitating toward purpose-built, clinical-grade models rather than general-purpose APIs.

Corti, which already serves more than 100 million patients a year across major health systems including the UK’s National Health Service (NHS), is positioning Symphony for Speech-to-Text as the automation engine for the next generation of healthcare software.

It is important to note that Corti is not launching Symphony itself today; rather, Symphony for Speech-to-Text serves as a new, distinct capability within that broader ecosystem, accessible through its own API endpoints.

Symphony for Speech-to-Text is generally available as of today. Developers and enterprise builders can access the models through the Corti API console, with full technical documentation available to help integrate a clinical-grade speech layer into their existing systems.

In an effort toward research transparency, Corti has also published its full research paper describing its methodology, as well as a unique benchmarking tool designed to support transparent evaluation of medical speech recognition systems across the industry.

As the healthcare industry continues its rapid adoption of AI-driven automation, the underlying data layer has never been more critical. Corti’s latest launch is a stark reminder that in medicine, general-purpose AI is not good enough. The future belongs to the specialists.
