NVIDIA Releases Polar, a Token-Faithful Rollout Framework for GRPO Training Across Codex, Claude Code, and Qwen Code

May 27, 2026

NVIDIA introduces Polar, a token-faithful rollout framework for GRPO training that raises SWE-Bench Verified scores under several coding-agent harnesses without altering the harnesses themselves.

NVIDIA has released Polar, a rollout framework for reinforcement learning training of language agents that leaves their existing agent harnesses unmodified. Polar places an API proxy layer between the agent harness and the inference server. The proxy enables token-level tracking and trajectory reconstruction, both of which are key for effective training. This matters most in code generation, where training efficiency and fidelity are paramount.
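The proxy idea can be illustrated with a small, self-contained sketch. It is not Polar's actual code or API: the names `RecordingProxy`, `Call`, and `reconstruct`, the backend signature, the prefix-matching rule, and the loss mask are all assumptions made for illustration. The sketch records the exact token IDs that pass through each inference call, then stitches consecutive calls into one trajectory when a later prompt extends the tokens seen so far.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical backend signature (an assumption, not a real server API):
# prompt token ids -> (completion token ids, per-token logprobs)
Backend = Callable[[List[int]], Tuple[List[int], List[float]]]


@dataclass
class Call:
    """One recorded inference request/response pair, stored as token ids."""
    prompt_ids: List[int]
    completion_ids: List[int]
    logprobs: List[float]


@dataclass
class RecordingProxy:
    """Sits between a harness and a backend and logs every call unchanged."""
    backend: Backend
    calls: List[Call] = field(default_factory=list)

    def complete(self, prompt_ids: List[int]) -> List[int]:
        completion_ids, logprobs = self.backend(prompt_ids)
        # Store copies so later mutation by the caller cannot alter the log.
        self.calls.append(Call(list(prompt_ids), list(completion_ids), list(logprobs)))
        return completion_ids


def reconstruct(calls: List[Call]) -> List[Tuple[List[int], List[int]]]:
    """Merge recorded calls into (tokens, loss_mask) segments.

    Mask is 1 for tokens sampled by the model and 0 for context tokens
    supplied by the harness. A call whose prompt does not extend the
    tokens accumulated so far starts a new segment.
    """
    segments: List[Tuple[List[int], List[int]]] = []
    tokens: List[int] = []
    mask: List[int] = []
    for call in calls:
        if tokens and call.prompt_ids[:len(tokens)] == tokens:
            # Prompt extends the running trajectory: keep only the new context.
            new_context = call.prompt_ids[len(tokens):]
        else:
            # Unrelated prompt: close the current segment and start fresh.
            if tokens:
                segments.append((tokens, mask))
            tokens, mask = [], []
            new_context = call.prompt_ids
        tokens = tokens + new_context + call.completion_ids
        mask = mask + [0] * len(new_context) + [1] * len(call.completion_ids)
    if tokens:
        segments.append((tokens, mask))
    return segments
```

In this sketch the harness would call `proxy.complete(...)` exactly as it would call the backend, and the trainer would later read `reconstruct(proxy.calls)`. Because the recorded completions are the sampled token IDs rather than re-tokenized text, the training sequence matches what the model actually emitted, which is one plausible reading of "token-faithful".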

Enhancing GRPO Training Across Multiple Platforms

The framework supports Group Relative Policy Optimization (GRPO) and has been tested using a Qwen3.5-4B base model. The reported gains in SWE-Bench Verified pass@1 are 22.6 points under the Codex harness, 4.8 points under Claude Code, and 6.2 points under Pi. Improvements under three different harnesses suggest that Polar works with a range of coding agents rather than a single one, which makes it useful to both developers and researchers.
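For readers unfamiliar with GRPO, its defining step is to score each rollout relative to the other rollouts sampled for the same prompt, instead of using a learned value function. The sketch below shows the commonly used normalization (reward minus group mean, divided by group standard deviation). The function name and the `eps` parameter are illustrative assumptions, and the article does not say whether Polar's training setup uses exactly this variant.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Return one advantage per rollout in a group sampled from the same prompt.

    Each advantage is the rollout's reward, centered on the group mean and
    scaled by the group's population standard deviation. `eps` (an assumed
    constant) avoids division by zero when every rollout gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

With binary pass/fail rewards such as those from a test suite, a group in which every rollout passes or every rollout fails yields all-zero advantages, so that prompt contributes no gradient signal for the step.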

Open Source and Scalable

Polar is integrated as a NeMo Gym environment and is available as open source in the ProRL Agent Server repository. The release aligns with NVIDIA’s broader mission to democratize advanced AI training methods. Because it gives fine-grained control over the rollout process without disrupting existing workflows, Polar is a step toward scalable, efficient reinforcement learning for code agents.

As AI models continue to evolve, tools like Polar help bridge the gap between research and practical application. Its token-faithful approach is intended not only to improve performance but also to preserve the integrity of the training data used for code-generation agents.

Source: MarkTechPost
