WHU - Startseite | Logo
12.12.2025

How a German Startup Quietly Beat the AI Giants

How AMBOSS’ LiSA outperformed leading LLMs in clinical AI—and what it reveals about Europe’s strengths in building safe, domain-expert systems.

Note: The following contributions are personal impulses from Max Eckel. They represent individual reflections and are intended to stimulate discussion and further thought.

When I sit in a cafe talking with friends, the AI gap between Europe and the U.S. sometimes feels enormous. The conversation drifts toward the same conclusion: Europe doesn’t stand a chance against the U.S. giants. Google, OpenAI, Anthropic. Everyone else? Competing for scraps.

And of course, there is a lot of truth to this. The gravitational pull of the U.S. ecosystem is real: capital density, talent concentration, unmatched compute access, a culture that produces a new AI lab every other week. If you only look at the broad, all-purpose race for general intelligence, Europe feels like a spectator.

Still, some stories complicate this narrative.

A German startup walked into a Stanford/Harvard evaluation and beat them all in one of the most sensitive, high-stakes fields imaginable: clinical decision support.

No hype. No press tour. But number one.

That startup is Berlin-based AMBOSS. And their AI agent LiSA just ranked first overall in the new NOHARM benchmark—an independent evaluation of how 31 AI systems behave in 100 real clinical scenarios across 10 specialties.

LiSA outperformed the world’s most famous large language models in safety, clinical appropriateness, and contextual relevance.

In medicine. The domain where getting things wrong has real human consequences.

Why did this happen? Because AMBOSS never tried to build a “smart general AI.” They built something narrower, deeper, and grounded in how doctors actually think and work.

Physicians and engineers built LiSA together. It doesn’t hallucinate the internet. It draws from curated clinical sources: the AMBOSS Knowledge Library, drug databases, U.S. guidelines. Its entire architecture reflects real clinical workflows rather than the abstract goal of “predict the next token.”

In a sense, LiSA is not trying to be clever. It’s trying to be correct. And that is a very different ambition.

The result is something the big models struggle with: true clinical reliability.

If you’ve ever stood in a hospital corridor at 3:17 a.m., exhausted, flipping through guidelines on your phone, you understand why this matters. Medicine is not a domain where creativity is rewarded. It’s a domain where stakes force you to be anchored to reality.

And AMBOSS has spent more than a decade building exactly that kind of anchor. Their product has been shaped by millions of micro-decisions from real-world clinicians. It reflects not only medical knowledge but the pressures and constraints of actual practice. Over time, that creates something a generalist AI cannot easily replicate: accumulated tacit expertise.

  • In German hospitals alone, every second inpatient treatment is supported by AMBOSS.*
  • In the U.S., a majority of medical students rely on it to pass exams and treat patients.
  • More than a million medical professionals in over 180 countries use it daily.

So when AMBOSS built an AI, it wasn’t a pivot. It was a continuation. The AI grew out of the same soil as the product itself: a deep respect for clinical decision-making.

LiSA’s win shows that Europe—and yes, Germany—can produce world-class innovation not just theoretically, but in direct, head-to-head performance against the biggest labs on the planet.

And for me, this story also hits home. 

One of the co-founders, Benedikt Hochkirchen (D 2007), is a WHU alum. It’s a pattern I deeply believe in: world-class technical domains strengthened by business operators who understand how to build and scale products that stick.

AMBOSS is a blended team in the truest sense: 150+ physicians in-house, paired with operators, engineers, and people like Johannes Kürsch—another WHU alum who studied business, then computer science, then medicine, and now builds the software clinicians depend on.

This combination of medical depth, engineering clarity, and operational discipline doesn’t emerge by accident. It’s the kind of interdisciplinary setup that European teams tend to underrate, even though it’s exactly what allows you to win in high-stakes fields.

LiSA’s performance is a reminder that the competition we imagine—the general AI race with limitless compute—is not the only competition happening. There are other races. Races where domain expertise matters more than parameter counts. Races where the winner is not the most creative model, but the most trustworthy one.

And maybe this is one type of setup where Europe should intentionally aim to lead: AI that deals with lives, not likes.

So here’s the question I’m left with: if a focused German startup can beat the biggest LLMs in clinical AI, in what other fields are we underestimating ourselves?
