Reinforcement Learning from Code Execution: The Missing Axis in AI Scaling

"There was a missing axis of scaling that wasn't being discussed, and it's frankly why we started this company. It was the axis of scaling through the use of reinforcement learning," reveals Eiso Kant, CTO of frontier AI company poolside, challenging the dominant narrative in artificial intelligence development. End of Miles reports that while the industry races to build larger models with more data, some leading researchers have identified a potentially more efficient path to advancing AI capabilities.
In a wide-ranging interview exploring the future of AI, Kant articulated a fundamental shift in how we should think about improving AI systems. "Scaling of next token prediction is the equivalent of imitation learning. Reinforcement learning is the equivalent of trial and error learning," the AI leader explained, drawing a clear distinction between the two approaches.
Beyond Size and Data
For years, the prevailing wisdom in AI development has focused almost exclusively on two scaling axes: increasing model size (parameters) and expanding training data. This approach has produced remarkable results but may be hitting diminishing returns.
"You will never hear me argue against scale. Scaling of compute and scaling of data is critical for us to close the gap between where models are today and where we believe they can be. But that doesn't necessarily mean that the axes of scaling today are the same axes of scaling that they were what people thought they were two years ago." Eiso Kant, CTO of poolside
The poolside executive argues that reinforcement learning from code execution provides a stronger training signal than imitation-style next-token prediction alone. Under this method, models learn through trial and error: they attempt coding tasks, execute the resulting code, and receive feedback, such as test results, on whether it worked.
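To make the pattern concrete, a reward loop of this kind might look roughly like the sketch below. This is an illustration of the general technique, not poolside's actual system; the names (propose_patch, apply_and_test, the reward shaping) are hypothetical assumptions.

```python
# Hypothetical sketch of a reinforcement-learning loop driven by code
# execution feedback. None of these names come from poolside; the model,
# task, and reward shaping are illustrative assumptions only.
from dataclasses import dataclass

@dataclass
class ExecutionResult:
    compiled: bool       # did the patched code build / import cleanly?
    tests_passed: int    # how many tests in the suite passed
    tests_total: int     # size of the test suite

def reward(result: ExecutionResult) -> float:
    """Turn raw execution feedback into a scalar reward signal."""
    if not result.compiled:
        return -1.0                                  # hard failure: broken code
    return result.tests_passed / max(result.tests_total, 1)

def training_step(model, task, sandbox) -> None:
    """One trial-and-error step: propose a change, run it, learn from the outcome."""
    patch = model.propose_patch(task.repo, task.description)        # model rollout
    result = sandbox.apply_and_test(task.repo, task.commit, patch)  # execute the code
    model.update(task, patch, reward(result))                       # policy update
```

The key point the sketch tries to capture is that the learning signal comes from running the code, not from matching a reference answer token by token.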
A Million-Repository Testing Ground
To implement this approach, poolside has built an impressive infrastructure of containerized code repositories that allows their models to practice and improve.
"We're very known for our work on reinforcement learning from code execution feedback... we have close to a million repositories that are fully containerized with their test suite and many millions or tens of millions of revisions. We can say, hey, at this commit hash in this repository I want to change this code and then I want to execute it and see what comes back." Kant
This capability enables models to explore potential solutions and learning paths that would be difficult to capture through traditional training methods, creating what the technologist describes as "more signal in terms of what you can provide" compared to supervised learning approaches.
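A minimal sketch of what such an execution harness could look like follows, assuming a local Docker daemon, a pre-built container image per repository, and pytest as the test runner. The image name, directory layout, and test command are assumptions for illustration, not details poolside has shared.

```python
# Hypothetical harness: pin a repository to a specific commit, apply a
# model-generated change, and run the test suite inside a container.
# Paths, image names, and the pytest runner are illustrative assumptions.
import subprocess
from pathlib import Path

def run_at_commit(repo_dir: Path, commit: str, patch_file: Path,
                  image: str) -> subprocess.CompletedProcess:
    # Pin the working tree to the requested revision.
    subprocess.run(["git", "-C", str(repo_dir), "checkout", commit], check=True)
    # Apply the proposed code change.
    subprocess.run(["git", "-C", str(repo_dir), "apply", str(patch_file)], check=True)
    # Execute the test suite in an isolated container and capture the outcome.
    return subprocess.run(
        ["docker", "run", "--rm",
         "-v", f"{repo_dir}:/workspace",
         "-w", "/workspace",
         image, "pytest", "-q"],
        capture_output=True, text=True,
    )

# The return code and captured output ("what comes back") are what a
# learning loop like the one sketched earlier would turn into a reward.
```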
The Path to Human-Level AI
According to the poolside CTO, this reinforcement learning axis could dramatically accelerate the timeline for achieving human-level AI capabilities.
"I personally think now is eighteen to thirty-six months away, where human-level intelligence across the vast majority of knowledge work is achieved," states the AI researcher, adding that "you don't fine-tune your way to AGI."
While major AI labs like OpenAI, Anthropic and Google continue to push the boundaries with ever-larger models, this alternative scaling path suggests that the most efficient route to advanced AI might not simply be bigger models with more data, but models that can effectively learn from their own successes and failures in real-world execution environments.