
Taalas Raises $169M to Build Model-Specific AI Chips That Outperform Nvidia H200 at 1/10th the Power

Toronto-based Taalas raises $169 million to etch AI model weights directly into silicon, claiming 73x more output tokens per second than Nvidia's H200 GPU at one-tenth the power consumption for its Llama-optimized inference chip.


TechDrop Editorial


Taalas, a Toronto-based AI chip startup, emerged from stealth with a $169 million funding round on February 19, 2026, bringing total capital raised to over $200 million. The company takes a fundamentally different approach to AI inference: rather than running models on general-purpose GPUs, Taalas etches specific model weights directly into silicon, trading flexibility for extraordinary efficiency.

The Technology

Taalas's approach is model-specific chip design. Rather than building a general-purpose processor that can run any model, Taalas customizes only 2 of a chip's 100+ layers using what the company calls "mask ROM recall fabric" modules — circuit structures that use a single transistor per 4-bit storage unit for matrix multiplication. The result is that the model's weights are physically encoded in the chip's circuitry, eliminating the memory-bandwidth bottleneck that limits GPU-based inference performance.
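To get a feel for the scale involved, here is a back-of-envelope sketch of the transistor budget for weight storage alone. The parameter count and the assumption of 4-bit quantization are ours, not Taalas's; the one-transistor-per-4-bit-cell figure comes from the company's description above.

```python
# Rough sketch of weight-storage transistor count (assumptions, not Taalas data):
# Llama 3.1 8B has roughly 8.03e9 parameters; we assume each weight is
# quantized to 4 bits, matching the 4-bit storage unit described above.

PARAMS = 8.03e9            # approximate Llama 3.1 8B parameter count
BITS_PER_WEIGHT = 4        # assumed quantization
TRANSISTORS_PER_CELL = 1   # one transistor per 4-bit cell, per the article

cells = PARAMS * BITS_PER_WEIGHT / 4          # one 4-bit cell per weight
transistors = cells * TRANSISTORS_PER_CELL

print(f"~{transistors / 1e9:.1f}B transistors just for weight storage")
```

Even under these rough assumptions, hard-wiring an 8B-parameter model is within reach of modern processes, where flagship chips routinely exceed 50 billion transistors — which helps explain why Taalas starts with a small model rather than a frontier one.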

The first product is a chip optimized for Meta's Llama 3.1 8B model. Taalas claims this chip produces 17,000 output tokens per second — 73x more than Nvidia's H200 GPU — while consuming one-tenth the power. If these numbers hold up under independent testing, they represent a step-change in inference economics: the same workload running on Taalas silicon would cost a fraction of what it costs on Nvidia hardware, with dramatically lower energy consumption.
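The claims above imply a concrete baseline and efficiency multiple, which can be derived directly. This is simple arithmetic on the article's stated figures, not independent measurement:

```python
# Arithmetic implied by the claims above (derived, not measured):
taalas_tps = 17_000    # claimed output tokens per second
speedup = 73           # claimed multiple over Nvidia H200
power_fraction = 0.1   # claimed fraction of H200 power draw

h200_tps = taalas_tps / speedup               # implied H200 baseline
energy_advantage = speedup / power_fraction   # implied tokens-per-joule gain

print(f"Implied H200 baseline: ~{h200_tps:.0f} tokens/sec")
print(f"Implied energy-per-token advantage: ~{energy_advantage:.0f}x")
```

The implied ~233 tokens/sec H200 baseline and ~730x energy-per-token advantage are the numbers that would need to survive independent benchmarking, since both depend heavily on batch size, quantization, and serving configuration on the GPU side.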

The Founders

Taalas was founded by Ljubisa Bajic, who previously founded Tenstorrent, another AI chip company. Co-founders Drago Ignjatovic and Lejla Bajic were early Tenstorrent engineers. The team of 25 employees has deep semiconductor design expertise and a track record of building AI-specific silicon. Investors include Quiet Capital, Fidelity, and Pierre Lamond, a veteran semiconductor investor who was an early backer of National Semiconductor and other foundational chip companies.

Roadmap and Limitations

A chip optimized for the Llama 20B model is expected by mid-2026, and a more advanced "HC2" processor targeting frontier-class models will follow. The fundamental trade-off of Taalas's approach is obvious: a chip optimized for Llama 3.1 8B cannot run GPT-5 or Claude. When a new model version is released, a new chip must be fabricated. Taalas's bet is that the inference cost savings are large enough to justify model-specific silicon for the most widely deployed models, and that the fabrication cycle can keep pace with model release cadences.

For cloud providers and enterprises running specific models at massive scale — where inference compute is a multi-billion-dollar annual expense — model-specific chips could fundamentally change the cost structure of AI deployment. The question is whether the performance claims survive independent verification and whether the model-specific constraint is acceptable for production deployments that may need to upgrade models on shorter timelines than chip fabrication allows.
