
Building an AI Code Review Agent: Advanced Diffing, Parsing, and Agentic Workflows

Enhancing AI-driven code reviews with Git Diff, Difftastic, and Tree-Sitter. Discover how syntax-aware diffing, structured parsing, and historical tracking improve AI’s ability to understand and optimize code changes.

Blog
3.25.2025
Guy Eisenkot, CEO & Co-Founder
3 min

When it comes to providing data for AI code review, the shortcomings of traditional tools are hard to ignore. They are blind to syntax, grammar, structure, and how code is used across the application. With these limited inputs, AI is ill-equipped to make the code-reviewing decisions we need. Closing this gap is what drives our work at Baz. Today’s software infrastructure does not meet the needs of developers in the AI era. That is why our focus goes beyond code review: we are building infrastructure for modeling code that is designed specifically for AI.

Our approach revolves around agentic workflows: structured, AI-driven review processes in which agents are equipped with predefined tools to execute specific tasks. Unlike fully autonomous AI agents, these workflows provide controlled autonomy, which keeps reviews accurate and efficient. By integrating advanced diffing and parsing tools, our AI agents can perform better-informed code reviews with minimal manual intervention.
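As a rough illustration of controlled autonomy, the sketch below limits an agent to a fixed registry of tools and rejects anything else. Every name in it (`ToolRegistry`, `run_step`, the two sample tools, `SAMPLE_DIFF`) is hypothetical and chosen for this example. It is not the Baz Reviewer implementation.

```python
from typing import Callable, Dict


class ToolRegistry:
    """Holds the fixed set of tools an agent is allowed to call."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[[str], str]] = {}

    def register(self, name: str, fn: Callable[[str], str]) -> None:
        self._tools[name] = fn

    def run_step(self, tool_name: str, argument: str) -> str:
        # The agent proposes a tool by name; anything outside the
        # registry is rejected instead of being executed.
        if tool_name not in self._tools:
            raise PermissionError(f"tool not allowed: {tool_name}")
        return self._tools[tool_name](argument)


# Hypothetical tools; real ones would wrap a diff engine and a parser.
def count_changed_lines(diff_text: str) -> str:
    changed = [
        line for line in diff_text.splitlines()
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]
    return str(len(changed))


def list_touched_files(diff_text: str) -> str:
    prefix = "+++ b/"
    files = [
        line[len(prefix):] for line in diff_text.splitlines()
        if line.startswith(prefix)
    ]
    return ",".join(files)


registry = ToolRegistry()
registry.register("count_changed_lines", count_changed_lines)
registry.register("list_touched_files", list_touched_files)

SAMPLE_DIFF = """--- a/app.py
+++ b/app.py
@@ -1,2 +1,2 @@
-x = 1
+x = 2
 y = 3
"""
```

Calling `registry.run_step("count_changed_lines", SAMPLE_DIFF)` runs the registered tool, while a request for an unregistered tool name raises `PermissionError`. The model chooses among predefined tools, but the set of possible actions is decided ahead of time.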

In this first article of a series, we’ll discuss what we learned from building Baz Reviewer, an AI agent tasked with performing meticulous code reviews on complex pull requests.

Versioning

Git is a foundational tool for version control, and its standard output, the git diff, provides a standard textual representation of code changes. It’s great for answering the question, “What changed in this commit?” It shows additions, deletions, and modifications line by line. Diffing approaches build on one another, from plain text diffs, to Git diffs, to AST-based diffs.

While git diff helps human developers save their work and revert to previously working snapshots, it falls short in AI-driven workflows due to its limited contextual scope. When an AI model processes a diff, it sees the code modifications but lacks a full understanding of the surrounding context, such as function definitions or dependencies that haven’t changed but remain critical to interpretation. A git diff implicitly assumes that the reader already knows the original and modified code. That assumption does not hold for an AI agent working in a sandboxed coding environment.
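To make the context problem concrete, here is a small sketch that uses Python’s standard `difflib` module as a stand-in for git diff. The file contents and the names `OLD`, `NEW`, `unified`, `narrow`, and `wide` are made up for this example. Three context lines matches Git’s default for unified diffs.

```python
import difflib

OLD = """\
RATE = 0.2

def helper(x):
    return x * RATE

def a():
    return 1

def b():
    return 2

def total(items):
    return sum(helper(i) for i in items)
"""

# Only the last line of the file changes.
NEW = OLD.replace(
    "sum(helper(i) for i in items)",
    "sum(helper(i) for i in items) + 1",
)


def unified(old: str, new: str, context: int) -> str:
    """Return a unified diff with the given number of context lines."""
    lines = difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile="a/pricing.py", tofile="b/pricing.py",
        n=context, lineterm="",
    )
    return "\n".join(lines)


narrow = unified(OLD, NEW, context=3)   # Git's default is 3 context lines
wide = unified(OLD, NEW, context=20)    # enough to cover this whole file
```

The `narrow` diff shows that `total` calls `helper`, but the definition of `helper` and the `RATE` constant fall outside the three context lines. A reviewer who sees only `narrow` has to guess what `helper` does. The `wide` diff includes both, at the cost of more tokens.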

Syntax

To address git diff’s inherent limitations, we turned to Difftastic, an open source, language-aware diff tool that understands code structure and grammar. Unlike Git’s diff, which treats code as text, Difftastic parses code according to the language’s rules.

By leveraging Difftastic, we create more efficient AI workflows, enabling models to focus on substantial code modifications rather than being overwhelmed by irrelevant changes. Additionally, this optimization conserves token space, which is crucial for maintaining context within limited AI context windows.

Some specifics about how Difftastic improves LLMs’ ability to analyze code changes:

  • Grammar-aware parsing: Difftastic understands the structure of code, not just its text. For instance, it reports a changed function signature as a change to that function’s parameters rather than as a set of unrelated edited lines. This allows the AI to reason about the intent and impact of changes, such as the effect on call sites, rather than just their surface-level appearance.
  • Reduced noise: By ignoring changes that do not alter the parsed structure (e.g., reformatting, reindentation, or other whitespace-only edits), Difftastic focuses on what’s relevant to the code’s functionality. This reduces the token footprint, making it easier for AI models to process and understand the changes within their context window.
  • Improved contextual understanding: Difftastic provides richer inputs to the AI by highlighting meaningful changes in the context of the code’s structure. For example, it can show exactly which expression inside a function changed, or where a new variable appears in each file of the diff. This enables the LLM to provide more accurate and actionable feedback.
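The difference between a text diff and a structure-aware comparison can be sketched in a few lines. This example does not use Difftastic or reproduce its algorithm. It uses Python’s standard `ast` module as a stand-in for a syntax-aware tool, and the sample snippets and helper names (`text_diff_size`, `same_structure`) are assumptions made for illustration.

```python
import ast
import difflib

BEFORE = "def area(w, h):\n    return w * h\n"

# Same logic, reformatted: different spacing, line breaks, and parentheses.
REFORMATTED = "def area(\n    w,\n    h,\n):\n    return (w * h)\n"

# A real behavior change: multiplication became addition.
CHANGED = "def area(w, h):\n    return w + h\n"


def text_diff_size(old: str, new: str) -> int:
    """Count added and removed lines in a plain line-based diff."""
    diff = difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm="")
    return sum(
        1 for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )


def same_structure(old: str, new: str) -> bool:
    """True when both sources parse to the same syntax tree."""
    return ast.dump(ast.parse(old)) == ast.dump(ast.parse(new))
```

Here `text_diff_size(BEFORE, REFORMATTED)` is nonzero even though `same_structure(BEFORE, REFORMATTED)` is true, so a line-based diff would spend tokens on a change with no functional effect. For `CHANGED`, `same_structure` returns false, which is the kind of change worth sending to the model.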

Perhaps the key distinction between diff and Difftastic is this: Where diff highlights textual changes, Difftastic parses grammar. Because grammar signifies logic, this adds crucial context to the outputs.

However, something important is still missing: the wider context of how code changes across versions over time, and of its overarching structure. Here’s where Tree-Sitter comes in.

Parsing

Tree-Sitter is an incremental parsing library that builds syntax trees for source code and updates them efficiently as the code is edited. It delivers a structured, hierarchical view of the codebase to the LLM, greatly improving upon line-by-line diffs.
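As a minimal sketch of what a structured, hierarchical view looks like, the example below turns a source file into an indented outline of its classes and functions. It uses Python’s standard `ast` module in place of Tree-Sitter so that it runs without extra dependencies. The sample source and the `outline` helper are hypothetical.

```python
import ast
from typing import List

SOURCE = """\
class Cart:
    def add(self, item):
        self.items.append(item)

    def total(self):
        return sum(i.price for i in self.items)

def checkout(cart):
    return cart.total()
"""


def outline(source: str) -> List[str]:
    """List classes and functions, indented by nesting depth."""
    result: List[str] = []

    def visit(node: ast.AST, depth: int) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
                kind = "class" if isinstance(child, ast.ClassDef) else "def"
                result.append(f"{'  ' * depth}{kind} {child.name} (line {child.lineno})")
                visit(child, depth + 1)
            else:
                # Keep walking so definitions nested in other statements are found.
                visit(child, depth)

    visit(ast.parse(source), 0)
    return result
```

For `SOURCE`, `outline` returns four entries: `class Cart (line 1)`, its two methods indented beneath it, and `def checkout (line 8)`. An outline like this lets a model place a changed line inside its enclosing function and class, which a line-by-line diff does not show.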

Difftastic also parses code into a syntax tree, but it uses that tree only to compare two versions of a file. Working with Tree-Sitter directly lets us assign unique identifiers to code elements (e.g., functions, variables). These IDs can be persisted across versions (and even modules), making it possible to track the evolution of code elements over time, for example when a function is renamed or moved.
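One way to picture identifiers that survive a rename or a move is to derive the ID from an element’s structure rather than from its name or position. The sketch below does that with Python’s `ast` module and a hash. This ID scheme is an assumption made for illustration, not a description of how Baz builds its identifiers, and the names `element_ids` and `renames` are hypothetical.

```python
import ast
import copy
import hashlib
from typing import Dict

V1 = """\
def calc(price, qty):
    return price * qty

def greet(name):
    return "hi " + name
"""

# Version 2: `calc` was renamed and moved below `greet`.
V2 = """\
def greet(name):
    return "hi " + name

def line_total(price, qty):
    return price * qty
"""


def element_ids(source: str) -> Dict[str, str]:
    """Map a name-independent ID to each top-level function name."""
    ids: Dict[str, str] = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            anonymous = copy.deepcopy(node)
            anonymous.name = "_"  # ignore the name so renames keep the ID
            digest = hashlib.sha256(ast.dump(anonymous).encode()).hexdigest()
            ids[digest[:12]] = node.name
    return ids


def renames(old: str, new: str) -> Dict[str, str]:
    """Return {old_name: new_name} for functions whose ID survived a rename."""
    old_ids, new_ids = element_ids(old), element_ids(new)
    return {
        old_ids[i]: new_ids[i]
        for i in old_ids.keys() & new_ids.keys()
        if old_ids[i] != new_ids[i]
    }
```

With these inputs, `renames(V1, V2)` returns `{"calc": "line_total"}`. A purely content-based ID like this one changes as soon as the function body changes, so a production system would need a more durable scheme, such as storing the assigned ID and carrying it forward between versions.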

This is crucial for tasks like root cause analysis, where the agent needs to identify when and where a performance issue was introduced. With Tree-Sitter, we can get answers to previously unanswerable questions about code history, tying code elements to their runtime states and versions.
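The root cause question, “in which version did this element actually change?”, can be sketched as a walk over a file’s history that compares the structure of one function between consecutive versions. The history below is invented, `ast` again stands in for a Tree-Sitter based parser, and `function_shape` and `versions_that_changed` are hypothetical helpers.

```python
import ast
from typing import List, Optional, Tuple

# Hypothetical history of one file, oldest first.
HISTORY: List[Tuple[str, str]] = [
    ("v1", "def load(ids):\n    return [fetch(i) for i in ids]\n"),
    # v2: whitespace-only edit
    ("v2", "def load(ids):\n    return [fetch(i)  for i in ids]\n"),
    # v3: an extra nested loop is introduced
    ("v3", "def load(ids):\n    return [fetch(i) for i in ids for _ in ids]\n"),
    # v4: only a comment is added
    ("v4", "def load(ids):\n    # TODO: optimize\n"
           "    return [fetch(i) for i in ids for _ in ids]\n"),
]


def function_shape(source: str, name: str) -> Optional[str]:
    """Return a structural dump of the named top-level function, if present."""
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef) and node.name == name:
            return ast.dump(node)
    return None


def versions_that_changed(history: List[Tuple[str, str]], name: str) -> List[str]:
    """List versions where the function's structure differs from the previous one."""
    changed: List[str] = []
    previous: Optional[str] = None
    for version, source in history:
        shape = function_shape(source, name)
        if previous is not None and shape != previous:
            changed.append(version)
        previous = shape
    return changed
```

For this history, `versions_that_changed(HISTORY, "load")` returns `["v3"]`. The whitespace edit in v2 and the comment in v4 are skipped, which narrows the search for a performance regression to the one version where the logic changed.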

Building the foundation for AI-powered code reviews

Together, Git’s diffing, Difftastic, and Tree-Sitter form the backbone of our agentic workflows. These tools provide AI models with the context, structure, and historical insights they need to perform tasks like code review with precision and depth.

The next piece of the puzzle is log event data. We’ll cover that in the next article, looking at CI/CD and observability logs.

For more technical details, read the docs.
