What Your PR Review Metrics Actually Tell You
Review time, first-response time, stale PR rate — these aren't productivity metrics. They're attention allocation data. Here's how to read them.
Engineering teams love metrics. DORA metrics, cycle time, deployment frequency — there's no shortage of numbers to track. But when it comes to code review, most teams either measure nothing or measure the wrong things for the wrong reasons.
The problem with review metrics isn't the data. It's the framing. When you track review time as a productivity metric — "are our developers reviewing fast enough?" — you create perverse incentives. Developers rubber-stamp to hit the SLA. Review quality drops. The metric improves while the code gets worse.
But when you frame the same data as attention allocation — "where is our review capacity going, and is it going to the right places?" — it becomes genuinely useful.
The metrics that matter
Time to first response
What it measures: The time between a PR being opened and the first reviewer engaging with it (comment, approval, or change request).
What it actually tells you: How effective your notification system is.
A long time to first response almost never means "reviewers are lazy." It means "reviewers didn't know the PR existed until hours later." The PR was opened, GitHub sent an email, the email was buried, and the reviewer didn't see it until they happened to check GitHub.
Healthy range: Under 2 hours during working hours.
What to do when it's high: Fix the notification path before blaming reviewers. Route PR notifications to where developers actually are (Slack DMs, not email). If first-response time is high even with good notifications, you have a capacity problem — more PRs than your team can triage.
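If you want to measure this yourself, here's a minimal sketch against the GitHub REST API. The endpoint paths and field names (`created_at`, `submitted_at`, `user.login`) are from the documented v3 API; the function name, token handling, and the decision to count reviews and non-author comments as "first response" are our assumptions, and pagination and error handling are omitted.

```python
# A rough sketch: hours from PR open to first reviewer engagement.
# Uses wall-clock hours; adjust for working hours if you're tracking the 2-hour target.
from datetime import datetime
import requests

def _parse(ts):
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def first_response_hours(owner, repo, number, token):
    headers = {"Authorization": f"Bearer {token}"}
    base = f"https://api.github.com/repos/{owner}/{repo}"

    pr = requests.get(f"{base}/pulls/{number}", headers=headers).json()
    opened = _parse(pr["created_at"])

    # First engagement: a submitted review or a non-author comment, whichever came first.
    reviews = requests.get(f"{base}/pulls/{number}/reviews", headers=headers).json()
    comments = requests.get(f"{base}/issues/{number}/comments", headers=headers).json()

    timestamps = [r["submitted_at"] for r in reviews if r.get("submitted_at")]
    timestamps += [c["created_at"] for c in comments
                   if c["user"]["login"] != pr["user"]["login"]]
    if not timestamps:
        return None  # no response yet

    return (min(_parse(t) for t in timestamps) - opened).total_seconds() / 3600
```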
Review cycle time
What it measures: The total time from PR opened to PR merged (or closed), including all back-and-forth.
What it actually tells you: How well your team communicates asynchronously.
Long cycle times usually come down to one of two causes: long gaps between rounds of feedback (a communication problem) or too many rounds of feedback (a clarity problem — the PR's intent wasn't clear, or the reviewer and author have different mental models of the change).
Healthy range: 4-24 hours for typical PRs, depending on timezone distribution. Under 4 hours if your team is co-located.
What to do when it's high: Look at where the time is spent. Is it all in the first response? (Notification problem.) Is it in the gap between "changes requested" and the author's update? (Author context-switching problem.) Is it in multiple review rounds? (PR clarity or scope problem — PRs might be too large or under-described.)
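One way to see where the time went: collect every event on the PR, sort by timestamp, and look at the biggest gaps. A sketch, assuming you've already gathered events as small dicts with a `kind` label and an ISO-8601 `timestamp` (reviews, comments, pushes) from calls like the ones above:

```python
# A sketch: the largest gaps in a PR's timeline and what they sit between.
# The biggest gap usually points straight at the bottleneck.
from datetime import datetime

def _parse(ts):
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def largest_gaps(opened_at, merged_at, events, top=3):
    points = [("opened", _parse(opened_at))]
    points += [(e["kind"], _parse(e["timestamp"])) for e in events]
    points += [("merged", _parse(merged_at))]
    points.sort(key=lambda p: p[1])

    gaps = [((t - prev_t).total_seconds() / 3600, f"{prev_kind} -> {kind}")
            for (prev_kind, prev_t), (kind, t) in zip(points, points[1:])]
    return sorted(gaps, reverse=True)[:top]  # list of (hours, transition)
```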
Stale PR rate
What it measures: The percentage of open PRs with no activity for 24+ hours.
What it actually tells you: Where attention is failing.
This is the most important review metric and the one fewest teams track. A stale PR is a PR that everyone forgot about. The author moved on. The reviewer never engaged. It sits there accumulating merge conflicts and becoming harder to review every day.
Healthy range: Under 10% of open PRs at any time.
What to do when it's high: Stale PRs are a workflow failure, not a people failure. They usually mean: (1) the notification was missed, (2) the reviewer wasn't sure if they were responsible, or (3) the PR seemed too large to tackle without a dedicated block of time. Fix notifications, clarify assignment, and push for smaller PRs.
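Computing this is cheap if you already list open PRs: GitHub's list endpoint returns an `updated_at` timestamp per PR, which is a reasonable (if slightly generous) proxy for last activity. A sketch, with the 24-hour threshold as a parameter:

```python
# A sketch: share of open PRs with no update in the last N hours.
# Note that updated_at also moves on label changes and pushes, so this
# slightly undercounts true staleness.
from datetime import datetime, timedelta, timezone

def stale_rate(open_prs, threshold_hours=24):
    if not open_prs:
        return 0.0
    cutoff = datetime.now(timezone.utc) - timedelta(hours=threshold_hours)
    stale = [pr for pr in open_prs
             if datetime.fromisoformat(pr["updated_at"].replace("Z", "+00:00")) < cutoff]
    return len(stale) / len(open_prs)
```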
Review load distribution
What it measures: How many reviews each developer handles per week.
What it actually tells you: Whether your team's review burden is balanced.
In most teams, review load follows a power law. One or two senior developers handle 50-70% of all reviews. They're the ones who "know the codebase" and get tagged on everything. This is a single point of failure — when they go on vacation, the entire review pipeline stalls.
Healthy range: No single developer handling more than 30% of total reviews.
What to do when it's skewed: Expand the reviewer pool deliberately. Use CODEOWNERS to distribute ownership. Pair junior developers with seniors on reviews to build expertise. The goal is resilience, not equity.
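To see your own distribution, count submitted reviews per reviewer over a window and look at the shares. A sketch, assuming review records shaped like the GitHub reviews payload (`user.login` per review); the 30% threshold is the rough ceiling from above, not a magic number:

```python
# A sketch: each developer's share of total reviews, plus anyone over the ceiling.
from collections import Counter

def review_shares(reviews):
    counts = Counter(r["user"]["login"] for r in reviews)
    total = sum(counts.values())
    return {login: n / total for login, n in counts.most_common()}

def overloaded(reviews, threshold=0.30):
    return {login: share for login, share in review_shares(reviews).items()
            if share > threshold}
```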
Review depth
What it measures: Average number of comments per review, or ratio of "approved without comment" to "approved with feedback."
What it actually tells you: Whether reviews are meaningful.
This is the metric that tells you if your other metrics are lying. If time-to-first-response is great and cycle time is low but 80% of reviews are approved without any comments, your team isn't reviewing — they're rubber-stamping.
Healthy range: Context-dependent, but if more than 40% of PRs are approved with zero comments, something is off. Either the PRs are all trivial (unlikely) or reviews are superficial.
What to do when it's low: This is often a symptom of volume overload. Developers rubber-stamp because the queue is too long for thorough review. Address the volume problem (better triage, smaller PRs, more reviewers) rather than mandating more comments.
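A rough way to track this is the share of approved PRs that got zero review comments. The sketch below assumes you've already rolled each PR up into a small summary dict (`approved`, `comment_count`); how you build that depends on where your review data lives.

```python
# A sketch over per-PR summaries: {"approved": bool, "comment_count": int}.
def silent_approval_rate(pr_summaries):
    approved = [p for p in pr_summaries if p["approved"]]
    if not approved:
        return 0.0
    return sum(1 for p in approved if p["comment_count"] == 0) / len(approved)
```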
The metrics that don't matter (as much as you think)
- Lines of code reviewed per day. Meaningless without context. 500 lines of generated CRUD code is easier to review than 50 lines of concurrency logic.
- Number of PRs reviewed per developer. Gameable and misleading. A developer who reviews 10 trivial PRs isn't contributing more than one who spends an hour on a critical security review.
- Approval rate. A high approval rate doesn't mean code quality is high. It might mean review standards are low.
Metrics as a diagnostic, not a scorecard
The right way to use review metrics is as a diagnostic tool. Something feels off — PRs are taking too long, bugs are slipping through, developers are frustrated. You look at the metrics to understand where the problem is:
- High first-response time → notification problem
- High cycle time but fast first response → communication or PR size problem
- High stale rate → assignment or awareness problem
- Skewed review load → knowledge silo problem
- Low review depth → volume or motivation problem
Each diagnosis leads to a different intervention. The metrics don't tell you what to do. They tell you where to look.
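If you want to turn that lookup into something you can run weekly, a toy version might look like the following. The thresholds are the rough ranges from this article, not universal constants, and the function and parameter names are illustrative.

```python
# A toy diagnostic over the metrics computed above.
# Times are in hours; rates and shares are fractions (0.12 == 12%).
def diagnose(first_response_h, cycle_h, stale, max_share, silent):
    findings = []
    if first_response_h > 2:
        findings.append("Slow first response: look at the notification path.")
    elif cycle_h > 24:
        findings.append("Fast first response but slow cycles: look at PR size and feedback gaps.")
    if stale > 0.10:
        findings.append("High stale rate: look at assignment and awareness.")
    if max_share > 0.30:
        findings.append("Skewed review load: look for knowledge silos.")
    if silent > 0.40:
        findings.append("Shallow reviews: look at volume overload before motivation.")
    return findings or ["Nothing obvious in the review metrics; the problem may be elsewhere."]
```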
Attention data, not surveillance
There's a reason developers recoil at review metrics: too many managers use them as surveillance. "Bob only reviewed 3 PRs this week" becomes a performance conversation instead of a workflow investigation.
The framing matters. Review metrics should answer "is our team's review attention well-allocated?" not "who's being lazy?" The former leads to process improvements. The latter leads to resentment and rubber-stamping.
When we built Tenpace, we thought about this carefully. We surface the signals that help teams improve — first-response time, stale PR rate, notification effectiveness — without turning them into a leaderboard. The goal is better workflow, not better surveillance.