Copilot vs. Code Quality: Are Current Metrics Enough to Define AI Success?

Blog
11.12.2024
Shachar Azriel, VP Product
3 min

Last week marked Baz’s first time attending GitHub Universe, a notable shift from our regular presence at cybersecurity conferences. Engaging with the developer community in San Francisco gave us the space to reflect on significant changes in the dev tool landscape and to assess our own path for 2025.

Photo: No getting around SF having the best micro-climate.


One thought dominated my reflections throughout the event: How effective is Copilot, really?

Copilot’s rise is remarkable: it now counts 1.3 million paid subscribers, a scale that surpasses GitHub’s own reach when Microsoft acquired it. At GitHub Universe, many organizations shared their enthusiasm for Copilot, reporting substantial license increases and citing improved productivity, better team morale, and anticipation for features like Copilot Workspace.

However, as discussions moved beyond the keynotes and into technical sessions, the numbers told a more nuanced story. One major engineering organization disclosed that only 28% of Copilot’s suggestions were accepted, while GitHub’s own internal testing showed a 60% acceptance rate for code review suggestions.

This brought up an essential question: Is this level of accuracy enough to drive real change in development workflows across the industry? Can it set the standard for AI in development at leading tech companies?

The metrics presented consistently across sessions included:

  1. Number of engaged users
  2. Number of new engaged users
  3. Number of accepted suggestions

These metrics highlight engagement and adoption, but they also prompted us to think critically: Do they measure the real value of an AI tool in development? Should metrics like “engaged users” be the primary indicators of success, or do we need a different approach to understand AI’s impact in this context?

At Baz, we define success differently. Our mission in AI-powered code review is to make a tangible impact on how engineering teams work, especially in terms of efficiency and quality. While adoption rates are important, we focus on metrics that show clear value to engineers and their teams.

Here’s how we evaluate our impact at Baz:

Time Saved per Review

Time is the most valuable resource for engineering teams, so our primary metric is time saved during code reviews. We analyze time reductions across repositories, languages, and teams, which helps us pinpoint areas where Baz offers the greatest value. By focusing on efficiency, we aim to reduce time spent on repetitive review tasks so engineers can focus on critical problem-solving.
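
To make this concrete, here is a minimal sketch of what a time-saved metric can look like: comparing median review durations before and after AI-assisted review, grouped by repository or language. The record fields, values, and grouping keys are illustrative assumptions, not Baz’s actual data model.

```python
# Hypothetical sketch: estimating time saved per review by comparing
# median review durations with and without AI assistance, grouped by
# a chosen dimension (repo, language, team, ...).
from statistics import median
from collections import defaultdict

# Illustrative review records; field names are assumptions.
reviews = [
    {"repo": "payments", "language": "go", "duration_minutes": 42, "with_ai": False},
    {"repo": "payments", "language": "go", "duration_minutes": 25, "with_ai": True},
    {"repo": "web", "language": "typescript", "duration_minutes": 30, "with_ai": False},
    {"repo": "web", "language": "typescript", "duration_minutes": 21, "with_ai": True},
]

def time_saved_by(reviews, key):
    """Median minutes saved per review, grouped by the given dimension."""
    groups = defaultdict(lambda: {"before": [], "after": []})
    for r in reviews:
        bucket = "after" if r["with_ai"] else "before"
        groups[r[key]][bucket].append(r["duration_minutes"])
    return {
        k: median(v["before"]) - median(v["after"])
        for k, v in groups.items()
        if v["before"] and v["after"]
    }

print(time_saved_by(reviews, "repo"))      # {'payments': 17, 'web': 9}
print(time_saved_by(reviews, "language"))  # {'go': 17, 'typescript': 9}
```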

Improvement in Code Quality

We track code quality improvements, specifically noting the frequency of critical issues caught and fixed during reviews. Our system also uses tracing data to monitor reductions in system downtime, indicating Baz’s impact on overall reliability and stability in production. Unlike general engagement metrics, these quality-focused measures allow us to directly assess how well Baz helps teams produce robust, maintainable code.
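
One simple way to express a quality-focused measure like this is a fix rate for critical findings: of the critical issues raised in review, how many were actually resolved before merge? The sketch below is illustrative; the record shape and severity labels are assumptions, not Baz’s real schema.

```python
# Hypothetical sketch: share of critical review findings that were
# fixed before the change merged.
from dataclasses import dataclass

@dataclass
class Finding:
    severity: str            # "critical", "major", or "minor" (assumed labels)
    fixed_before_merge: bool

def critical_fix_rate(findings):
    """Fraction of critical findings resolved before merge, or None if none were raised."""
    critical = [f for f in findings if f.severity == "critical"]
    if not critical:
        return None
    return sum(f.fixed_before_merge for f in critical) / len(critical)

findings = [
    Finding("critical", True),
    Finding("critical", False),
    Finding("major", True),
    Finding("critical", True),
]
print(f"{critical_fix_rate(findings):.0%}")  # 67%
```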


Feedback Loop Efficiency

Our goal is to improve team effectiveness over time. We monitor whether development teams are repeating mistakes previously flagged in reviews. This feedback loop efficiency metric helps us identify recurring issues and target suggestions to prevent them, fostering continuous improvement in code quality and team practices.
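
A feedback-loop metric like this can be as simple as a repeat rate: the fraction of review flags whose issue category was already raised in an earlier review. The event log and category names below are hypothetical, purely to show the shape of the calculation.

```python
# Hypothetical sketch: measuring whether a team repeats mistakes that
# earlier reviews already flagged.
def repeat_rate(flags):
    """Fraction of flags whose category appeared in an earlier review."""
    seen, repeats = set(), 0
    for _, category in flags:
        if category in seen:
            repeats += 1
        seen.add(category)
    return repeats / len(flags)

# Chronological log of (review_id, issue_category) flags for one team.
flags = [
    (101, "missing-error-handling"),
    (102, "hardcoded-secret"),
    (103, "missing-error-handling"),  # repeat
    (104, "n-plus-one-query"),
    (105, "missing-error-handling"),  # repeat
]
print(f"repeat rate: {repeat_rate(flags):.0%}")  # repeat rate: 40%
```

A falling repeat rate over time suggests the review feedback is actually sticking, not just being acknowledged.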

Photo: Some of the best advice, just hiding in plain sight.

These metrics capture what we believe is the core of effective AI-driven development tools. They’re designed not only to boost adoption but also to show engineers and teams the clear, measurable value they gain by integrating Baz into their workflows.

The developer tool landscape will continue to evolve rapidly, and we’re excited to see where these advancements take us. For Baz, this means a commitment to refining our approach, applying AI to solve real challenges, and delivering meaningful outcomes that engineers can count on.

For those interested in a new way to approach code review with AI, we invite you to join our waitlist. Our goal is simple: to empower teams with tools that save time, enhance quality, and support continuous improvement in development.

Join us on this journey and be the first to experience how Baz can transform your code review process.


Get on the waitlist → https://baz.co/