The AI Coding Loop Hits Runtime
Developers found flow in the IDE, now agents are chasing truth in execution.
Until recently, “AI coding” meant autocomplete, snippets, and assisted development inside the IDE. But models are evolving fast. They’re not just generating code anymore; they’re learning to run, test, and debug what they build.
That shift moves AI development from assistive to autonomous, extending the coding loop all the way to runtime.
These last few months in AI coding have been intense. New models, new operating modes, and new form factors keep pushing more developers from assisted-development workflows toward (more) autonomous programming.
At the individual level, developers are discovering a new flow state: one that isn’t interrupted by web searches and Stack Overflow detours, but sustained by continuous implementation and iteration in the IDE and CLI. Runtime was always there locally; developers have long had local environments wired, partially or fully, to their pipelines, databases, and workloads. Debugging locally with AI is a natural extension of that setup.
The team level is an entirely different story. Building reliable staging and pre-prod environments, let alone ephemeral environments with reliable connectivity, requires non-trivial platform engineering. In past cycles, we were obsessed with test coverage and with crafting scenarios that emulate real-life user flows. Agents can not only continuously increase coverage across unit, integration, and end-to-end tests, they can go beyond input/output checks, as sketched below.
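To make that concrete, here is a minimal sketch of the kind of test an agent could generate: it exercises a full user flow against an ephemeral pre-prod environment and asserts on side effects, not just the response. The EPHEMERAL_BASE_URL variable and the /signup and /admin/audit-log endpoints are assumptions for illustration, not part of any real product.

```python
# A minimal sketch of a test that goes beyond input/output checks.
# Assumptions: an ephemeral pre-prod environment reachable via
# EPHEMERAL_BASE_URL, a hypothetical /signup endpoint, and a
# hypothetical /admin/audit-log endpoint exposing side effects.
import os
import uuid

import requests

BASE_URL = os.environ["EPHEMERAL_BASE_URL"]  # injected by the CI job that spins up the env


def test_signup_flow_produces_expected_side_effects():
    email = f"agent-{uuid.uuid4().hex[:8]}@example.com"

    # Exercise the user flow, not just a single function.
    resp = requests.post(f"{BASE_URL}/signup", json={"email": email}, timeout=10)
    assert resp.status_code == 201

    # Verify side effects in the environment, not just the response body.
    audit = requests.get(
        f"{BASE_URL}/admin/audit-log", params={"email": email}, timeout=10
    )
    assert audit.status_code == 200
    events = [entry["event"] for entry in audit.json()]
    assert "user_created" in events
    assert "welcome_email_queued" in events
```

The interesting part is the second half: the assertion isn’t about what the endpoint returned, but about what actually happened in the environment afterwards.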
For years, our “safe-shipping” rituals were built around our fears of downtime. We put guardrails and chokepoints in place in the form of CI steps, PR checks, and feature flags. Continuous delivery tools thrived because they promised safety through smaller, slower releases. Don’t get me wrong, these are extremely effective means of safeguarding prod, but now the tides have shifted. Coding agents mean more code, and more code means more time spent satisfying past controls.
That’s the paradox though: in trying to prevent bad code from shipping, we stopped getting closer to understanding it.
One more exciting plot twist in this story is the re-emergence of the CLI form factor. After being crammed into a tab in most common IDEs, the CLI has been revitalized by Cursor, Claude Code, and others as an effective operating system for the modern coding agent. While many credit its fast-paced, transactional character, I believe the main reason we went back to loving the CLI is that it’s the most effective way to proxy runtime into developer workflows.
The holy grail for code review is one where a reviewer drops whatever they are doing and steps into the pull request author’s shoes to validate and verify their work. There are many ways to go about it, but across hundreds of developers we surveyed over the past few months, only a handful work this way. Most (like me) were satisfied with reading the code, seeing that tests passed, and of course seeing that the Baz AI Code reviewer didn’t find any bugs.
But we all know that’s NOT effective code review: effective code review inspects intent, specifications, and requirements against outcomes. The loop has to be extended.
Agents can analyze diffs in the context of environment variables, API tokens, and pre-prod data.
They can reason through “what happens next” — not just “what changed.” The result is a review that feels more like validation than inspection. The agent becomes a shared team-level resource that enforces standards while freeing engineers to build the thing.
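As a rough illustration (not a description of any specific product’s implementation), here is a sketch of a pre-merge step that gathers runtime evidence for a pull request: it takes the diff, runs the tests that touch the changed code against a pre-prod environment, and reports what it observed. The test-mapping convention and the TARGET_ENV variable are assumptions for the example.

```python
# Sketch of a runtime-aware review step: turn "what changed" into
# "what happens next" by executing the affected code paths against
# a pre-prod environment and attaching the evidence to the review.
import os
import subprocess


def changed_files(base: str = "origin/main") -> list[str]:
    """Files touched by the pull request relative to the base branch."""
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        check=True, capture_output=True, text=True,
    )
    return [line for line in out.stdout.splitlines() if line]


def run_affected_tests(files: list[str]) -> subprocess.CompletedProcess:
    """Run tests corresponding to the changed modules, pointed at pre-prod."""
    # Naive convention for the sketch: src/foo.py -> tests/test_foo.py.
    targets = [
        f"tests/test_{os.path.basename(f)}"
        for f in files
        if f.startswith("src/") and f.endswith(".py")
    ]
    env = {**os.environ, "TARGET_ENV": "pre-prod"}  # hypothetical switch read by the tests
    return subprocess.run(
        ["pytest", "-q", *targets], env=env, text=True, capture_output=True
    )


if __name__ == "__main__":
    result = run_affected_tests(changed_files())
    print(result.stdout)
    # A reviewer agent would attach this runtime evidence to the pull request
    # alongside the static diff analysis.
    raise SystemExit(result.returncode)
```

The point of the sketch is the shape of the loop, diff in, observed runtime behavior out, rather than the specific mapping from files to tests, which every team would wire differently.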
There are quite a few big unsolved problems to get this right:
Solving these problems means redefining code review around runtime truth — not static checks. The next generation of reviewers won’t just read code; they’ll execute it safely, observe the outcomes, and decide what’s truly safe to merge.
More on this coming soon.