Thinking Machines Lab, the company led by Mira Murati, has unveiled a new architecture for human-AI interaction alongside TML-Interaction-Small, a 276-billion-parameter Mixture-of-Experts model. Designed for real-time collaboration, the system processes audio, video, and text simultaneously in 200-millisecond chunks.
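The article does not describe how the streams are aligned, so the following Python is only a minimal sketch under stated assumptions. Every input is modeled as a timestamped event, and events from all modalities are grouped into shared 200 ms windows ("micro-turns"). The `Event` and `MicroTurn` classes and the `align_into_micro_turns` function are hypothetical illustrations, not an API published by Thinking Machines Lab.

```python
# Hypothetical sketch: only the 200 ms chunk length comes from the article.
from collections import defaultdict
from dataclasses import dataclass, field

CHUNK_MS = 200  # reported chunk length for TML-Interaction-Small


@dataclass
class Event:
    """One timestamped input from a single stream (illustrative)."""
    t_ms: int        # arrival time in milliseconds since the session started
    modality: str    # e.g. "audio", "video", or "text"
    payload: str


@dataclass
class MicroTurn:
    """Everything that arrived during one fixed window, grouped by modality."""
    index: int
    start_ms: int
    streams: dict = field(default_factory=lambda: defaultdict(list))


def align_into_micro_turns(events, chunk_ms=CHUNK_MS):
    """Place events from every stream into shared, contiguous time windows."""
    if not events:
        return []
    last = max(e.t_ms for e in events) // chunk_ms
    # Create every window from session start to the last event so silent gaps stay visible.
    turns = [MicroTurn(index=i, start_ms=i * chunk_ms) for i in range(last + 1)]
    for ev in sorted(events, key=lambda e: e.t_ms):
        turns[ev.t_ms // chunk_ms].streams[ev.modality].append(ev.payload)
    return turns
```

For example, events at 30 ms (audio), 120 ms (video), 250 ms (text), and 610 ms (audio) yield four micro-turns starting at 0, 200, 400, and 600 ms, with the 400 ms window left empty. The sketch keeps empty windows on the assumption that a real-time model advances every 200 ms even when nothing new arrives.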
Real-Time Multimodal Processing
The core innovation is a multi-stream, time-aligned micro-turn architecture that removes the need for external voice-activity detection tools. Conventional models stop taking in new input while they generate a response; TML-Interaction-Small instead runs two components in parallel. One operates in real time, enabling full-duplex interaction in which the model can listen and respond at once, while the other handles asynchronous reasoning and tool use. Both components share the same conversation context throughout.
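The two-component design can be illustrated with a toy `asyncio` program. This is an assumption-laden sketch, not Thinking Machines Lab's implementation: a fast loop handles every incoming chunk on schedule, questions are handed off to a slower background task, and both write to one shared history. `SharedContext`, `realtime_loop`, and `reasoner` are hypothetical names, and `asyncio.sleep` stands in for model inference and tool calls.

```python
# Hypothetical sketch: names, timings, and the "?" trigger are illustrative only.
import asyncio


class SharedContext:
    """Conversation state that both components read and write."""

    def __init__(self):
        self.history = []  # (source, content) tuples in the order they were added

    def append(self, source, content):
        # No lock needed: asyncio runs one coroutine at a time on a single thread.
        self.history.append((source, content))


async def reasoner(ctx, query, reason_s):
    """Slow path: stands in for sustained reasoning or a tool call."""
    await asyncio.sleep(reason_s)  # placeholder for long-running work
    ctx.append("reasoner", f"answer to {query}")


async def realtime_loop(ctx, chunks, chunk_s, reason_s):
    """Fast path: handles every chunk on schedule without waiting for the reasoner."""
    pending = []
    for chunk in chunks:
        ctx.append("user", chunk)
        if chunk.endswith("?"):
            # Hand the question to the background component instead of pausing perception.
            pending.append(asyncio.create_task(reasoner(ctx, chunk, reason_s)))
        ctx.append("realtime", f"ack {chunk}")
        await asyncio.sleep(chunk_s)  # wait for the next chunk to arrive
    if pending:
        await asyncio.gather(*pending)  # let outstanding reasoning finish before returning
    return ctx.history


if __name__ == "__main__":
    chunks = ["hi", "sum?", "also", "thanks", "bye"]
    for entry in asyncio.run(realtime_loop(SharedContext(), chunks, chunk_s=0.05, reason_s=0.125)):
        print(entry)
```

With these illustrative timings, the background answer lands in the shared history only after later user chunks ("also", "thanks") have already been acknowledged, which mirrors the property the article describes: perception keeps running while reasoning is in progress.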
Implications for AI Collaboration
This advance addresses a key limitation of current AI systems: perception and generation are disconnected during interaction. By processing multiple modalities continuously and simultaneously, the system could make human-AI collaboration feel more natural. Its ability to keep the full conversation context while performing sustained reasoning could benefit applications such as customer service, education, and creative assistance.
Looking Ahead
TML-Interaction-Small is still a research preview, but it represents a major step toward truly interactive AI systems. As the field moves toward more seamless human-AI collaboration, approaches like this one may change how people interact with AI in everyday settings.