Thinking Machines unveils AI models that respond in real time
Thinking Machines is testing whether AI can feel more like a live call than a stop-and-start chat, with models built to listen, think and act at once.

Thinking Machines is betting that the next leap in AI is not a bigger chatbot, but a system that can keep up with a conversation as it happens. In a May 11 research preview, the company said its new interaction models continuously take in audio, video and text in real time, using a multi-stream, micro-turn design intended to make AI respond while the exchange is still unfolding.
That distinction matters because the current generation of AI products still behaves like a turn-based machine: a user speaks or types, the model waits, then it answers. Thinking Machines wants to replace that rhythm with something closer to a phone call, where listening and responding overlap. The company says the models are designed to handle interaction natively, rather than through external scaffolding, and to move beyond interfaces built around isolated prompts.
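To make the contrast concrete, here is a minimal, hypothetical sketch of what a micro-turn loop could look like, written in Python with asyncio. Nothing here reflects Thinking Machines' actual architecture or any public API; the event types, the 200-millisecond micro-turn budget and the respond-on-pause policy are all illustrative assumptions. The point is the shape of the loop: input keeps arriving while the system repeatedly decides, on a short cadence, whether to respond or keep listening, rather than waiting for a completed turn.

```python
import asyncio
import itertools
import random
import time

# Hypothetical event types; the research preview documents no public API.
AUDIO, VIDEO, TEXT = "audio", "video", "text"

async def input_streams(queue: asyncio.Queue) -> None:
    """Simulate interleaved audio/video/text events arriving continuously."""
    for i in itertools.count():
        kind = random.choice([AUDIO, AUDIO, VIDEO, TEXT])  # audio-heavy mix
        await queue.put((kind, f"chunk-{i}", time.monotonic()))
        await asyncio.sleep(0.05)  # roughly 20 events per second

async def micro_turn_loop(queue: asyncio.Queue, budget_s: float = 0.2) -> None:
    """Consume events in short 'micro-turns' instead of whole user turns:
    gather whatever arrived inside the budget, then decide whether to
    speak or keep listening."""
    while True:
        window = [await queue.get()]          # block until something arrives
        deadline = time.monotonic() + budget_s
        while time.monotonic() < deadline:    # collect the rest of this micro-turn
            try:
                window.append(queue.get_nowait())
            except asyncio.QueueEmpty:
                await asyncio.sleep(0.01)
        # Placeholder policy: respond only when the user has paused,
        # approximated here as "no audio seen during the micro-turn".
        if not any(kind == AUDIO for kind, _, _ in window):
            print(f"respond after {len(window)} events")
        else:
            print(f"keep listening ({len(window)} events)")

async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue()
    producer = asyncio.create_task(input_streams(queue))
    try:
        await asyncio.wait_for(micro_turn_loop(queue), timeout=1.0)
    except asyncio.TimeoutError:
        producer.cancel()
        await asyncio.gather(producer, return_exceptions=True)

asyncio.run(main())
```

A production system would replace the pause heuristic with a learned policy and stream partial responses, but the structural difference from a request-response chatbot, where nothing happens until the user finishes, is already visible.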
The practical test is simple to describe and hard to deliver. Real-time conversation has to stay responsive without talking over the user, missing context or freezing when the input shifts from speech to an image or a text prompt. It also has to prove that speed does not come at the expense of accuracy, privacy or reliability. A model that listens continuously may feel more natural, but it only changes behavior if it keeps latency low enough to preserve the flow of human conversation and stays accurate enough to be trusted within it.
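For a rough sense of that constraint: studies of human conversation put typical turn-taking gaps near 200 milliseconds, which is the end-to-end budget a "phone call" experience has to fit inside. The component figures in the back-of-the-envelope sketch below are illustrative assumptions, not measurements from the preview.

```python
# Hypothetical end-to-end latency budget for conversational responsiveness.
# The ~200 ms target reflects typical human turn-taking gaps; the component
# numbers are illustrative assumptions, not figures from Thinking Machines.
BUDGET_MS = 200

components_ms = {
    "audio capture and encoding": 30,
    "network round trip": 50,
    "model time-to-first-token": 80,
    "speech synthesis of first chunk": 30,
}

total = sum(components_ms.values())
print(f"total {total} ms of {BUDGET_MS} ms budget "
      f"({'within' if total <= BUDGET_MS else 'over'} budget)")
```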
Thinking Machines said the research preview demonstrated qualitatively new interaction capabilities and state-of-the-art combined performance in intelligence and responsiveness. The company framed that as part of a broader effort to make AI understandable, customizable and collaborative, and said it plans to keep publishing technical blog posts, papers and code. That publication strategy suggests the company wants the field to judge the work by measurable progress, not by product theater.

The timing also places the announcement inside a larger buildup around the startup. On March 10, Thinking Machines announced a long-term strategic partnership with NVIDIA to deploy at least one gigawatt of next-generation Vera Rubin systems, with deployment targeted for early next year. If the company can pair that compute scale with interaction models that genuinely feel continuous, it would mark more than a cleaner interface. It would signal a shift in how AI is used, from a tool that answers after the fact to one that participates while decisions are still being made.