Since its preview launch in 2021, GitHub Copilot has revolutionized the software development world.
According to GitHub’s AI in Software Development Survey, nearly all respondents had used Copilot, and a majority reported a perceived increase in code quality. Another GitHub study, conducted with the IT consultancy Accenture, found that “80% of Accenture participants successfully adopted GitHub Copilot, with a 96% success rate among initial users.” Success here simply means that developers accepted Copilot’s suggestions.
The reality is that many of the perceived efficiency gains from Copilot usage are just that: perceived. Objective metrics tell a different story. An analysis by the engineering analytics firm Uplevel found that while overall efficiency barely increased for teams using Copilot, bug rates rose by 41%. This could pose a much larger problem than the inconvenience of manually writing boilerplate code.
What’s Missing? Context.
Copilot’s output is constrained by the context it has. As a result, AI-generated code often fails to scale effectively in multi-repo, multi-language projects. While AI can accelerate code generation, overall productivity is still hindered by other critical development processes like code reviews.
Context Boundaries in Complex Projects
AI assistants like Copilot have improved their ability to provide context-aware code suggestions. Copilot now considers not just the document you’re working on but also other open tabs in your IDE. However, this remains far short of the full context required for large-scale systems spanning multiple repositories – the context window is simply too small. Copilot’s capabilities also fall short when it comes to project-specific workflows.
One major limitation is the lack of tracing and observability data in the code generation process. As we explored in a previous post, tracing enhances visibility, optimizes workflows, and supports data-driven decisions. Without real-time contextual awareness, AI tools struggle to predict how new code will integrate with existing systems, often producing suggestions that misalign with broader project requirements.
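To make that signal concrete, here is a minimal, hypothetical sketch of what tracing looks like at the code level, using OpenTelemetry’s TypeScript API (the service, span, and attribute names are invented for illustration):

```typescript
import { trace, SpanStatusCode } from "@opentelemetry/api";

// Illustrative tracer name; in practice this identifies the instrumented service.
const tracer = trace.getTracer("checkout-service");

export async function handleCheckout(orderId: string): Promise<void> {
  // startActiveSpan creates a span and makes it the active context for the callback,
  // so downstream calls are recorded as children of this operation.
  await tracer.startActiveSpan("checkout.process", async (span) => {
    try {
      span.setAttribute("order.id", orderId);
      // ... call pricing, inventory, and payment services here ...
    } catch (err) {
      span.recordException(err as Error);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      span.end();
    }
  });
}
```

Spans like this reveal which services and code paths a change actually exercises at runtime, which is context a diff alone never carries and exactly the kind of signal an AI tool needs in order to reason about how new code will integrate.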
The context window for Copilot is constrained to what sits directly before or after the cursor and, potentially, other documents open in the IDE. When generating code, Copilot relies primarily on the general programming patterns its underlying large language model (LLM) learned during training, not on the specifics of your project. That makes the tool flexible, but it often overlooks critical project-specific elements like naming conventions, architectural patterns, or dependencies between components spread across multiple repositories.
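As a purely hypothetical illustration (the function names, conventions, and signatures below are invented), the mismatch often looks like this: the project already has a shared helper in another repository that encodes a team convention, and a suggestion generated without that context reinvents it under different assumptions:

```typescript
// What already exists in a shared package in another repository (hypothetical):
// the team convention is that money moves through the system as integer cents.
export function formatAmountCents(amountCents: number, currency: string): string {
  return new Intl.NumberFormat("en-US", { style: "currency", currency }).format(
    amountCents / 100
  );
}

// What a context-limited suggestion might produce in a new service: a local
// duplicate that takes a float, uses a different name, and diverges from the
// formatting and rounding behavior the rest of the system assumes.
export function formatPrice(price: number, currency: string): string {
  return `${currency} ${price.toFixed(2)}`;
}
```

Both versions “work” in isolation; the divergence only surfaces later, in review or in production, which is exactly where the time saved during generation gets spent again.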
Furthermore, unless your AI tool is highly customized and deeply integrated with your project (a resource-intensive endeavor for most teams), it cannot retain knowledge about the project’s history, evolution, or previous commits. This can lead to inconsistent or misaligned code suggestions, which are costly to fix later.
Productivity Bottlenecks and Code Review Challenges
Even when AI-powered code generation saves developers time, that time is often given back during code reviews. Suggestions that fail to align with the project architecture risk introducing inconsistencies, which reviewers must uncover and resolve by hand. Code that initially appears functional often turns into technical debt or hidden bugs, increasing the workload for reviewers, or worse, causing production failures that must be fixed under the pressure of downtime.
Existing code review tools exacerbate the problem. Just as AI assistants lack the ability to account for multi-repository dependencies, code review tools often fail to provide a project-wide view. These limitations increase workloads for reviewers, who lack efficient ways to identify changes, spot dependencies, or assess the broader impacts of code modifications.
The rapid adoption of AI tools has outpaced the development of frameworks to ensure code quality. Until more sophisticated tooling becomes standard, the rise in technical debt, production issues, and bugs is likely to continue. This places additional pressure on development teams, who must balance faster deployments with intensified quality control.
Evolving Code Reviews with Contextual AI
The good news? A new generation of AI code assistants is emerging. With first-hand knowledge of the shortcomings of AI-assisted coding, Baz is leading the way, bringing the deep contextual awareness of observability tooling into code review.
Tracing and observability are productivity multipliers for code reviews, helping developers understand complex, multi-repo, multi-language environments. Baz builds on this by integrating data from across the tech stack, such as telemetry and tracing, directly into the review process. Cross-repo and cross-language visibility should be the baseline for large-scale projects; for today’s distributed applications it is non-negotiable. Tools that prioritize these capabilities will redefine code generation and review workflows, allowing AI to produce truly context-aware code tailored to modern software environments.
Until AI code generation is supported by end-to-end contextual insights, the much-heralded productivity gains will remain elusive. Baz is here to make them real.
Want to elevate your code reviews? Check out the latest updates in our changelog and join the waitlist → https://baz.co