OpenAI has introduced a new concept in AI safety called CoT controllability, referring to an AI model's ability to deliberately manipulate its own reasoning process. This development comes with the release of GPT-5.4 Thinking, and it's accompanied by a study that reveals most reasoning models struggle significantly with this task. Despite this, OpenAI views the results as a positive sign for the future of AI safety.
Understanding CoT Controllability
Chain-of-Thought (CoT) reasoning is a method where AI models break down complex problems into a series of logical steps. The new measure of controllability assesses whether these models can consciously alter or guide their own thought processes. According to OpenAI, the findings from their study show that current models are largely unable to do so, which raises important questions about how AI systems might evolve in the future.
A Safety Perspective
While the inability to control one's reasoning might seem like a shortcoming, OpenAI argues that it's actually a promising indicator of safety. If AI models cannot manipulate their own thinking, they are less likely to be influenced by adversarial prompts or to generate deceptive outputs. This lack of self-direction could mean that these systems remain more predictable and aligned with their intended functions, which is critical as AI models become more powerful and ubiquitous.
Implications for AI Development
The research underscores the complexity of ensuring AI systems behave as intended. As AI continues to advance, balancing capability with control remains a core challenge. OpenAI’s findings suggest that while we may not yet have fully controllable reasoning systems, the current limitations could be a safeguard rather than a flaw.
As the field of AI evolves, such insights into model behavior will be crucial for shaping safer, more reliable systems. Whether this lack of controllability is a feature or a bug remains to be seen—but for now, OpenAI is taking a cautious, optimistic stance.



