OpenAI has announced significant performance improvements for its Codex agent workflows by adding WebSocket support to the Responses API. The change addresses a key bottleneck in agentic systems: it reduces per-request API overhead and improves model response latency, making automated workflows more efficient and scalable.
Reducing Latency Through Connection-Scoped Caching
The new approach uses WebSockets to maintain persistent connections between clients and OpenAI's servers, enabling real-time communication without the cost of establishing a new HTTP connection for every call. According to OpenAI, this method incorporates connection-scoped caching that stores intermediate results and context for the lifetime of the connection, eliminating redundant computation and significantly cutting latency.
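OpenAI has not published the internals of its connection-scoped cache, so the following is only an illustrative sketch of the general idea: results computed earlier in a session are reused for the lifetime of the connection and discarded when it closes. The class name, the `submit` method, and the placeholder compute step are all assumptions, not OpenAI's API.

```python
import hashlib


class Connection:
    """Illustrative stand-in for a persistent WebSocket session.

    Hypothetical sketch only: the `submit` method and the placeholder
    compute step are assumptions, not part of any real OpenAI client.
    """

    def __init__(self):
        # The cache lives only as long as this connection: results
        # computed earlier in the session are reused, not recomputed.
        self._cache = {}
        self.compute_calls = 0

    def _key(self, payload: str) -> str:
        # Hash the request payload to get a stable cache key.
        return hashlib.sha256(payload.encode()).hexdigest()

    def submit(self, payload: str) -> str:
        key = self._key(payload)
        if key in self._cache:       # cache hit: skip the expensive step
            return self._cache[key]
        self.compute_calls += 1      # cache miss: do the work once
        result = payload.upper()     # placeholder for model inference
        self._cache[key] = result
        return result

    def close(self):
        # Connection-scoped: the cache vanishes with the connection.
        self._cache.clear()
```

Repeating the same request within one session touches the expensive step only once; a new connection starts with an empty cache.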
Enhanced Agent Loop Performance
The implementation specifically targets the Codex agent loop, which is crucial for complex automated workflows involving multiple API calls and decision-making processes. By streamlining this loop, OpenAI reports that developers can now execute agentic workflows up to 40% faster than before. This improvement is particularly valuable for applications requiring rapid response times, such as chatbots, automated customer service systems, and real-time data processing tools.
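Why a persistent connection helps an agent loop in particular can be seen with a simple latency model. This is not OpenAI's published benchmark methodology; the numbers below (a 100 ms handshake, a 400 ms model call, a 10-step loop) are assumed values chosen only to show how per-request connection setup is amortized.

```python
def loop_cost(n_calls: int, setup_ms: float, call_ms: float,
              persistent: bool) -> float:
    """Total latency for an agent loop of n_calls model requests.

    Hypothetical model: each request costs call_ms; connection setup
    (TCP/TLS handshake, auth) costs setup_ms. A persistent WebSocket
    pays setup once; per-request HTTP pays it on every call.
    """
    setups = 1 if persistent else n_calls
    return setups * setup_ms + n_calls * call_ms


# Assumed figures: 10-step loop, 100 ms handshake, 400 ms per model call.
http_total = loop_cost(10, 100, 400, persistent=False)  # 5000 ms
ws_total = loop_cost(10, 100, 400, persistent=True)     # 4100 ms
```

The more steps the loop takes and the heavier the handshake, the larger the share of total latency the persistent connection removes, which is why the gain shows up most in multi-call agentic workflows.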
Implications for Developers and Enterprises
The update positions OpenAI's platform as a more competitive option for businesses relying on complex AI workflows. Developers building sophisticated agent-based applications will benefit from reduced infrastructure costs and improved user experience. This advancement also reflects a broader industry trend toward optimizing AI systems for real-time performance, as organizations increasingly demand responsive and efficient automated solutions.
The improvements mark a significant step forward in making AI-powered agent systems more practical for enterprise deployment, where latency and efficiency are critical factors in system success.