Taalas is replacing programmable GPUs with hardwired AI chips to achieve 17,000 tokens per second for ubiquitous inference

Toronto startup Taalas is pioneering hardwired AI chips that can process 17,000 tokens per second, challenging the dominance of programmable GPUs in AI inference.

In a move that could reshape the AI hardware landscape, Toronto-based startup Taalas is challenging the long-held assumption that flexible, programmable GPUs are essential for AI inference. The industry has largely embraced general-purpose silicon so it can keep pace with rapidly evolving AI models. Taalas is betting instead that hardwired chips can deliver far higher inference performance.

Performance at Scale

The company claims its custom-designed chips can process 17,000 tokens per second, which it presents as a dramatic leap over current GPU-based systems. The goal is to make AI inference as ubiquitous as conventional computing. According to Taalas, that is currently held back by the latency and scalability limits of existing hardware.
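Converting the 17,000 tokens-per-second figure into time per token and time per response makes it easier to picture. The short Python sketch below does that arithmetic under stated assumptions. It assumes the rate is sustained for a single stream of generated text, and it ignores prompt processing. The article does not say whether the figure is per user or aggregated across many concurrent requests. The constant and function names are illustrative only.

```python
# Back-of-envelope arithmetic for the claimed 17,000 tokens/s figure.
# Assumption: the rate is sustained for one generation stream; prompt
# processing and network overhead are ignored.
CLAIMED_TOKENS_PER_SECOND = 17_000


def per_token_latency_us(tokens_per_second: float) -> float:
    """Average time between consecutive tokens, in microseconds."""
    return 1_000_000 / tokens_per_second


def response_time_ms(num_tokens: int, tokens_per_second: float) -> float:
    """Time to emit num_tokens at a constant rate, in milliseconds."""
    return num_tokens / tokens_per_second * 1_000


if __name__ == "__main__":
    print(f"{per_token_latency_us(CLAIMED_TOKENS_PER_SECOND):.1f} µs per token")       # 58.8 µs per token
    print(f"{response_time_ms(1_000, CLAIMED_TOKENS_PER_SECOND):.1f} ms for 1,000 tokens")  # 58.8 ms for 1,000 tokens
```

At that rate, a 1,000-token answer would arrive in well under a tenth of a second. If the figure is instead an aggregate across many simultaneous users, each individual stream would be correspondingly slower.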

Why Hardwired Chips?

GPUs offer flexibility, but they often trade efficiency for adaptability. Taalas argues that dedicated hardware can outperform general-purpose processors by orders of magnitude for many AI workloads, especially repetitive ones such as language processing. By designing chips solely for AI inference, Taalas aims to cut power consumption and increase speed. It also wants to make AI more practical in edge computing and mobile environments.

Industry Implications

This approach could significantly impact how AI is deployed across industries. From autonomous vehicles to smart home devices, the ability to perform real-time inference at scale could unlock new applications. However, it also raises questions about whether the AI community is willing to trade flexibility for raw performance, especially as models continue to evolve at a rapid pace.

As Taalas moves toward commercial deployment, the startup’s strategy will be closely watched by both hardware vendors and AI researchers. If successful, it could mark a turning point in how we think about AI infrastructure — prioritizing speed and efficiency over adaptability.
