Sarvam AI Challenges Global Giants, But Only in Select Areas

Sarvam AI proves Indian startups can outshine global tech giants in specialised AI tasks, despite limited compute and smaller models.
It is not every day that an Indian startup grabs headlines for outpacing some of the world’s most powerful technology companies. Yet that is exactly what Sarvam AI has managed to do. Over the past week, the homegrown company has sparked widespread conversation after claiming its latest AI tools performed better than ChatGPT and Google Gemini on certain benchmarks.
The buzz began when Sarvam introduced two products — Vision and Bulbul — and early results suggested they could outperform global heavyweights in niche tasks. That quickly led to celebratory posts online and bold headlines declaring that an Indian firm had “beaten” Silicon Valley’s best.
But the real story, as with most breakthroughs, is more layered.
Sarvam’s cofounder Pratyush Kumar announced on February 5 that the company’s Vision model topped the olmOCR-Bench, a benchmark designed to test optical character recognition (OCR). In simple terms, the benchmark measures how well an AI model can read text from scanned documents, handwriting, complex fonts, and images.
On this test, Sarvam Vision posted an accuracy of 84.3 per cent, higher than OpenAI’s ChatGPT, Google’s Gemini 3 Pro, and even China’s DeepSeek OCR v2. On another benchmark, OmniDocBench v1.5, it scored an impressive 93.28 per cent, performing particularly well on technical tables, dense layouts, and mathematical formulas.
What gives Sarvam the edge is focus. Unlike global models trained broadly for multiple tasks, Vision has been fine-tuned specifically for Indic scripts and Indian documents. From regional languages to mixed-language forms, it understands the quirks of Indian writing systems better than most international competitors.
Then there is Bulbul V3, Sarvam’s text-to-speech tool. In tests involving Indian accents and languages, it reportedly outperformed ElevenLabs, a well-known global voice AI platform. Again, the advantage lies in localisation — Bulbul is built for how Indians actually speak.
So yes, Sarvam is ahead — but only in these specialised areas.
Outside of OCR and voice generation, the comparison changes. ChatGPT and Gemini are general-purpose AI systems capable of coding, tutoring, analysing medical images, generating creative writing, or holding long conversations. Sarvam’s tools simply aren’t built for those broader tasks.
Scale also plays a role. Sarvam Vision has a relatively modest 3 billion parameters, while the largest Gemini models reportedly have parameter counts in the trillions. Larger models generally offer broader capabilities but demand massive computing power, something Indian startups often lack due to limited access to high-end GPUs and large data centres.
Still, Sarvam’s achievement is significant. It shows that innovation does not always require size. With careful optimisation and deep local knowledge, smaller teams can outperform tech giants on the tasks they choose to target.
In many ways, Vision and Bulbul are proof that India’s AI potential is not limited by talent — only by infrastructure. And that alone makes Sarvam’s success worth celebrating.