How AI-Generated PRs Are Breaking Code Review
Teams using Copilot and Cursor are generating 3x more PRs. Review capacity hasn't scaled. Here's what's breaking and what to do about it.
There's a pattern playing out across engineering teams right now, and most of them don't realize it until the symptoms get painful.
The team adopts Copilot, or Cursor, or some combination of AI coding tools. Developers get faster. PR volume goes up — sometimes dramatically. Everyone feels productive. For about three weeks.
Then review queues start backing up. PRs sit for days. Developers start merging without thorough review because the queue is too long. Bugs that would've been caught in review make it to production. Someone raises it in the retro and the team realizes: they optimized for code generation without scaling code review.
The volume problem
The numbers are striking. Teams adopting AI coding tools consistently report a 2-3x increase in PR volume. GitHub's own research on Copilot found developers completing a benchmark coding task roughly 55% faster with the tool. Some teams describe even larger jumps, going from 1-2 PRs per day to 3-5.
But review capacity is fixed. The same five developers who used to review 5 PRs a day now need to review 10-15. And reviewing AI-generated code isn't faster — if anything, it's slower, because AI code often requires more careful scrutiny (more on that below).
The result is a bottleneck shift. The constraint was "we can't write code fast enough." Now it's "we can't review code fast enough." And the second bottleneck is harder to solve, because you can't throw an AI tool at it the same way.
Some teams try. They set up AI-powered code review bots. But as any senior engineer will tell you, the value of code review isn't catching syntax errors or style violations — it's architectural judgment, context-specific decisions, and knowledge sharing. Those require humans.
The quality triage problem
Not all PRs are created equal. A one-line config change and a core database migration both show up in the review queue. In the pre-AI world, developers had an intuitive sense of which PRs were high-risk — they wrote the code themselves, so they knew where the complexity lived.
AI-generated code breaks this intuition. A developer might generate a 300-line PR using Cursor that looks clean and passes tests but contains subtle issues:
- Hallucinated patterns. The AI used an API method that doesn't exist in your version of the library, or mixed conventions from different frameworks.
- Plausible but wrong logic. The code handles the happy path perfectly but misses edge cases that only someone who knows the domain would catch (see the sketch after this list).
- Over-engineered solutions. AI tends to produce more code than necessary — extra abstractions, unnecessary error handling for cases that can't occur, verbose patterns where simpler ones exist.
- Context-blind decisions. The AI doesn't know about your team's conventions, your scaling constraints, or the incident you had last month with a similar pattern.
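To make the "plausible but wrong" failure mode concrete, here's the kind of code that sails through a quick review. It's a made-up illustration, not from any real PR: the function reads cleanly and passes a happy-path test, but silently drops data whenever the input doesn't divide evenly.

```python
def split_into_batches(items, batch_size):
    """Split a list of items into batches of batch_size."""
    batches = []
    # Looks reasonable, and a test with 10 items and batch_size=5 passes.
    # But integer division ignores the remainder, so with 11 items the
    # 11th is silently dropped. Nothing errors; the data just goes missing.
    for i in range(len(items) // batch_size):
        batches.append(items[i * batch_size:(i + 1) * batch_size])
    return batches
```

A reviewer skimming a 300-line diff full of code like this has to notice the missing remainder handling on their own. The tests won't flag it.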
As one engineering manager put it: "The problem isn't that AI code doesn't work. It's that it works well enough that reviewers let their guard down, and the issues it introduces are exactly the kind that slip through a quick review."
What this looks like in practice
Here's a typical week on a team that hasn't adapted their review workflow:
Monday: 12 PRs opened, 3 reviewed. Everyone's catching up from the weekend.
Tuesday: 8 new PRs. Monday's backlog persists. Total in queue: 17. Two developers start rubber-stamping small PRs to clear the queue.
Wednesday: The PR that refactors the payment service has been sitting since Monday. The author pings in Slack. The reviewer says "I'll get to it today." They don't.
Thursday: 22 PRs in the queue. The team lead declares "review day" and everyone stops coding to burn down the backlog. Productivity on new features drops to zero.
Friday: The queue is clear but 4 PRs were approved without meaningful review. One of them introduced a race condition that'll show up in production next week.
This isn't hypothetical. It's the pattern we hear repeatedly from engineering managers.
Why existing tools don't help
GitHub's built-in review tools were designed for a world where PR volume was manageable. The notification system broadcasts events to email or to a Slack channel. Neither helps a reviewer prioritize.
Most AI code review tools (CodeRabbit, Sourcery, etc.) focus on the wrong layer. They're good at catching style issues and obvious bugs, but they can't replace the human judgment that matters most in review: "Is this the right approach? Does this fit our system? What are the second-order effects?"
What's missing is the workflow layer between "PR opened" and "PR reviewed." The part where:
- The right reviewer is notified immediately (not via email, not via channel noise)
- They can see the PR's risk level and size without opening GitHub
- They can prioritize across multiple pending reviews
- Status changes are communicated without manual follow-up
What to do about it
Teams that are handling the AI-generated PR volume well tend to follow a few principles:
Separate triage from review. Don't ask reviewers to do both at once. Have a quick triage step — someone (or something) categorizes PRs by risk level. Low-risk PRs (config changes, typo fixes, generated boilerplate) get fast-tracked. High-risk PRs (core logic, security-sensitive code, new patterns) get dedicated review time.
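What that triage step can look like, in rough sketch form: classify each PR from its changed paths and diff size. The path patterns and size threshold below are assumptions to adapt to your own repo, and the changed-path list would come from GitHub's "list pull request files" endpoint.

```python
from fnmatch import fnmatch

# Illustrative patterns only. Tune these to where the risk actually lives in your codebase.
HIGH_RISK_PATTERNS = ["*payment*", "*auth*", "*/migrations/*", "*.sql"]
LOW_RISK_PATTERNS = ["*.md", "docs/*", "*.lock", ".github/*", "*.yaml"]

def triage(changed_paths: list[str], additions: int, deletions: int) -> str:
    """Bucket a PR into fast-track, normal, or dedicated-review."""
    if any(fnmatch(path, pat) for path in changed_paths for pat in HIGH_RISK_PATTERNS):
        return "dedicated-review"
    if additions + deletions > 400:
        # Big diffs get dedicated review time regardless of what they touch.
        return "dedicated-review"
    if changed_paths and all(
        any(fnmatch(path, pat) for pat in LOW_RISK_PATTERNS) for path in changed_paths
    ):
        return "fast-track"  # config, docs, lockfiles, typo-level changes
    return "normal"
```

Whether a person or a bot runs this, the point is the same: the decision about how much scrutiny a PR deserves happens before a reviewer opens the diff, not while they're reading it.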
Set review expectations per PR type. Not every PR needs the same depth of review. A boilerplate CRUD endpoint generated by Cursor needs a sanity check. A hand-written authentication refactor needs line-by-line review. Make the expectation explicit.
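One way to make it explicit is to attach expectations to each triage bucket, so "sanity check" versus "line-by-line" is written down rather than assumed. The depths and SLAs here are placeholders, not recommendations:

```python
# Bucket names match the triage sketch above; values are placeholders.
# The point is that the expectation exists in writing, not in anyone's head.
REVIEW_EXPECTATIONS = {
    "fast-track":       {"depth": "sanity check",            "sla_hours": 4},
    "normal":           {"depth": "standard review",         "sla_hours": 24},
    "dedicated-review": {"depth": "line-by-line, fresh eyes", "sla_hours": 48},
}

def expectation_for(bucket: str) -> dict:
    """Look up the agreed review depth and response SLA for a triage bucket."""
    return REVIEW_EXPECTATIONS[bucket]
```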
Batch reviews. Instead of reviewing PRs as they come in (constant context-switching), dedicate blocks for review. Morning and afternoon review windows keep the queue moving without fragmenting focus.
Track the right metrics. PR volume is a vanity metric. Track review queue depth (how many PRs are waiting at any given time), time to first response (how long before a reviewer engages), and stale PR rate (PRs with no activity for 24+ hours). These tell you whether your review process is keeping up.
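All three are measurable from the GitHub API with no extra tooling. A rough sketch, assuming a GITHUB_TOKEN environment variable and the requests library; the 24-hour staleness threshold mirrors the definition above.

```python
import os
from datetime import datetime, timedelta, timezone
import requests

OWNER, REPO = "your-org", "your-repo"  # placeholders
API = f"https://api.github.com/repos/{OWNER}/{REPO}"
HEADERS = {
    "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
    "Accept": "application/vnd.github+json",
}

def parse(ts: str) -> datetime:
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def queue_metrics() -> dict:
    now = datetime.now(timezone.utc)
    open_prs = requests.get(
        f"{API}/pulls", headers=HEADERS, params={"state": "open", "per_page": 100}
    ).json()

    # Stale: no activity on the PR for 24+ hours.
    stale = [pr for pr in open_prs if now - parse(pr["updated_at"]) > timedelta(hours=24)]

    # Time to first response: earliest submitted review after the PR opened.
    # (One extra API call per open PR; fine for a daily report.)
    hours_to_first_review = []
    for pr in open_prs:
        reviews = requests.get(f"{API}/pulls/{pr['number']}/reviews", headers=HEADERS).json()
        submitted = [parse(r["submitted_at"]) for r in reviews if r.get("submitted_at")]
        if submitted:
            delta = min(submitted) - parse(pr["created_at"])
            hours_to_first_review.append(delta.total_seconds() / 3600)

    return {
        "queue_depth": len(open_prs),
        "stale_pr_rate": len(stale) / len(open_prs) if open_prs else 0.0,
        "median_hours_to_first_review": (
            sorted(hours_to_first_review)[len(hours_to_first_review) // 2]
            if hours_to_first_review else None
        ),
    }
```

Run it on a schedule and chart the three numbers. If queue depth and stale rate climb week over week, your review process is losing ground no matter how many PRs you're merging.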
Fix the notification layer. This is the lowest-hanging fruit and the one most teams skip. If your reviewers learn about PRs through email or a noisy Slack channel, you've already lost hours to notification lag. A direct message to the right person with the right context is worth more than any process change.
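This is also the easiest piece to prototype before buying anything. A minimal sketch of the idea (not how Tenpace works internally): a webhook endpoint that DMs each requested reviewer on Slack the moment a PR opens, with the size and link attached. The GitHub-to-Slack user mapping is an assumption you'd maintain yourself, and the Slack app needs permission to open DMs.

```python
import os
import requests
from flask import Flask, request

app = Flask(__name__)

# Assumption: a mapping from GitHub logins to Slack user IDs that you keep up to date.
GITHUB_TO_SLACK = {"alice-gh": "U012ABCDEF", "bob-gh": "U034GHIJKL"}

def dm_reviewer(slack_user_id: str, text: str) -> None:
    # Slack's chat.postMessage accepts a user ID as the channel to open a DM.
    requests.post(
        "https://slack.com/api/chat.postMessage",
        headers={"Authorization": f"Bearer {os.environ['SLACK_BOT_TOKEN']}"},
        json={"channel": slack_user_id, "text": text},
    )

@app.route("/github-webhook", methods=["POST"])
def github_webhook():
    payload = request.get_json()
    # Fire on PR open and on later reviewer requests.
    if payload.get("action") in {"opened", "review_requested"} and "pull_request" in payload:
        pr = payload["pull_request"]
        summary = (
            f"New PR for you to review: {pr['title']} ({pr['html_url']}) "
            f"+{pr['additions']}/-{pr['deletions']}, {pr['changed_files']} files changed"
        )
        for reviewer in pr.get("requested_reviewers", []):
            slack_id = GITHUB_TO_SLACK.get(reviewer["login"])
            if slack_id:
                dm_reviewer(slack_id, summary)
    return "", 204
```

Even this crude version beats an email digest: the right person hears about the PR within seconds, with enough context to decide whether it's a two-minute sanity check or something to schedule real time for.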
The new workflow
The teams shipping fastest right now aren't the ones generating the most code. They're the ones who've rebuilt their review workflow around the new reality: more code, same reviewers, higher stakes.
Tenpace is building the infrastructure for this workflow. Smart PR notifications that reach the right reviewer immediately, with context they can act on — PR size, status, who else is reviewing. Updates that consolidate instead of stack. No channel noise, no email black holes.
If your review queue is growing faster than your review capacity, see how Tenpace handles PR routing.