
Productivity & Metrics

AI coding tools promise big productivity gains — but how much do they really help?

The answer is nuanced: it depends on the task, the team, and the discipline around the tool.


1. What the Data Actually Says

Speed & Output

| Finding | Source |
| --- | --- |
| Developers completed tasks 55.8% faster with Copilot (simple HTTP server task) | Peng et al., MIT/Microsoft (2023) |
| 26% more tasks completed in a fixed time window; 84% more successful builds | Accenture/GitHub RCT (2024) |
| Experienced OSS developers were 19% slower using AI on their own repos | METR Randomized Trial (2025) |
| Productivity gains shrink to <10% on high-complexity tasks | McKinsey (2024) |

Adoption

| Finding | Source |
| --- | --- |
| 84% of developers use or plan to use AI tools (up from 76% in 2024) | Stack Overflow Developer Survey (2025) |
| 51% use AI tools daily | Stack Overflow (2025) |
| 90% of DORA survey respondents use AI at work | Google DORA Report (2025) |
| ~30% of Copilot suggestions are immediately accepted; 88% are kept long-term | GitHub Research (2024) |

What Big Tech Reports

  • Google: 25% of code is AI-assisted; ~10% increase in engineering velocity (Pichai, Q3 2024)
  • Microsoft: 20–30% of code written by AI (Nadella, 2025)
  • Amazon: ~25% of code written by AI (2025)

Warning

The perception gap is real. In the METR study, developers believed AI made them 20% faster — when it actually made them 19% slower. That’s a 39-point gap between perception and reality. Always measure, don’t guess.

2. Where AI Helps Most (and Least)

High impact (30–55% time savings)

| Task | Why AI is good at it |
| --- | --- |
| Boilerplate & glue code | CRUD endpoints, DTOs, serializers, config files — repetitive and pattern-heavy |
| Test scaffolding | Setup, teardown, assertions from existing code — AI is faster than writing by hand |
| Documentation | Docstrings, README drafts, API examples — AI captures the “obvious” parts well |
| Repetitive refactors | Renaming, pattern updates across many files — mechanical work AI handles reliably |

Low impact or risky (0–10%, sometimes negative)

| Task | Why AI struggles |
| --- | --- |
| Architecture & system design | Requires deep understanding of tradeoffs, constraints, business context |
| Complex business logic | Subtle domain rules that aren’t in the training data |
| Security-critical flows | Auth, payments, privacy — AI generates plausible but insecure code |
| Large mature codebases | METR showed experienced devs were slower — existing context is hard for AI to grasp |

Tip

For low-impact tasks, use AI as a brainstorming partner, not a code generator. Ask it to explain options, not write the implementation.

3. The Quality Trade-off

Speed comes at a cost if you’re not careful:

| Finding | Source |
| --- | --- |
| AI-generated PRs have 1.7x more issues than human-written PRs | CodeRabbit (2025) |
| 4x growth in code duplication (2021–2024); copy-pasted lines rose from 8.3% to 12.3% | GitClear (2025) |
| Refactoring dropped from 25% to <10% of changed lines — devs generate instead of refactor | GitClear (2025) |
| Code revised within 2 weeks of commit grew from 3.1% to 5.7% — more premature commits | GitClear (2025) |
| AI adoption reduces delivery stability by 7.2% at the org level | Google DORA Report (2024) |

Caution

More code ≠ better code. The GitClear data shows developers are generating more, refactoring less, and committing faster — with more defects. Speed without discipline creates debt.

4. Developer Sentiment

Trust is declining even as adoption rises:

| Finding | Source |
| --- | --- |
| Only 29% trust AI output to be accurate (down from 40% in 2024) | Stack Overflow (2025) |
| 66% say the biggest frustration is “solutions that are almost right, but not quite” | Stack Overflow (2025) |
| 45% say debugging AI code is more time-consuming | Stack Overflow (2025) |
| 90% of Accenture devs felt more fulfilled using Copilot | Accenture/GitHub (2024) |

The pattern: developers use AI tools they don’t fully trust. This makes review discipline even more important.


5. The DORA Insight

The 2025 DORA Report's most important finding about AI in teams:

AI acts as a multiplier of existing conditions.
It strengthens high-performing teams and exposes weaknesses in struggling teams.

This means:

  • If your team already has good review, testing, and deployment practices → AI makes you faster
  • If your team skips reviews, has no tests, deploys manually → AI makes the mess bigger, faster

AI doesn’t fix broken processes. It accelerates them.


6. What to Measure

Avoid vanity metrics like “lines of code generated” or “number of prompts.” Track what matters:

Delivery (DORA metrics)

| Metric | What it tells you |
| --- | --- |
| Lead time for changes | Idea to production — are we actually shipping faster? |
| Deployment frequency | How often do we ship? |
| Change failure rate | What % of releases cause incidents? |
| Mean time to recovery | How fast do we fix production issues? |
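
To make these concrete, here is a minimal Python sketch that derives all four DORA metrics from a list of deployment records. The `Deploy` shape (`committed_at`, `deployed_at`, `failed`, `restored_at`) is an assumption; map it to whatever your CI/CD pipeline actually exports.

```python
# Minimal sketch: the four DORA metrics from deployment records.
# The Deploy shape is an assumption; adapt it to your CI/CD export.
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional

@dataclass
class Deploy:
    committed_at: datetime   # first commit in the change set
    deployed_at: datetime    # reached production
    failed: bool             # caused an incident or rollback
    restored_at: Optional[datetime] = None  # service restored (if failed)

def dora_metrics(deploys: list[Deploy], window_days: int = 30) -> dict:
    if not deploys:
        return {}
    lead_times = [d.deployed_at - d.committed_at for d in deploys]
    failures = [d for d in deploys if d.failed]
    recoveries = [d.restored_at - d.deployed_at
                  for d in failures if d.restored_at]
    return {
        "lead_time_median": median(lead_times),
        "deploys_per_week": len(deploys) / (window_days / 7),
        "change_failure_rate": len(failures) / len(deploys),
        "mean_time_to_recovery": (sum(recoveries, timedelta()) / len(recoveries)
                                  if recoveries else None),
    }
```

Compare the results against your own pre-AI baseline quarter over quarter, not against industry benchmarks.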

Quality

| Metric | What it tells you |
| --- | --- |
| Bug rate per release | Are we introducing more defects? |
| PR size and review time | Are PRs still reviewable, or are they giant AI dumps? |
| Test coverage trend | Is coverage going up or down with AI? |
| Code churn | How much code gets rewritten within 2 weeks? (should be low) |
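
Code churn can be approximated from git history alone. The sketch below uses a rough file-level proxy: how often a file is re-edited within 14 days of its previous change. This is much coarser than GitClear's line-level definition, but it is free to compute and trends the same direction. It assumes it is run inside a git checkout.

```python
# Rough churn proxy: how often is a file re-edited within 14 days of its
# previous change? Coarser than GitClear's line-level churn, but free.
# Assumption: run from inside a git checkout.
import subprocess
from collections import defaultdict
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(days=14)

def file_touches():
    """Yield (commit_time, path) for every file change in history."""
    log = subprocess.run(
        ["git", "log", "--name-only", "--pretty=format:%ct"],
        capture_output=True, text=True, check=True,
    ).stdout
    when = None
    for line in log.splitlines():
        line = line.strip()
        if line.isdigit() and len(line) >= 9:   # a %ct timestamp line
            when = datetime.fromtimestamp(int(line), tz=timezone.utc)
        elif line:                              # a file path line
            yield when, line

def churn_rate() -> float:
    touches = defaultdict(list)
    for when, path in file_touches():
        touches[path].append(when)
    fast, total = 0, 0
    for times in touches.values():
        times.sort()
        for prev, cur in zip(times, times[1:]):
            total += 1
            if cur - prev <= WINDOW:
                fast += 1
    return fast / total if total else 0.0

print(f"re-edits within 14 days: {churn_rate():.1%}")
```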

Team Health

| Metric | What it tells you |
| --- | --- |
| Time on boilerplate vs. real problems | Are devs spending time on interesting work? |
| Onboarding speed | Can new members contribute faster with AI support? |
| AI suggestion acceptance rate | Are devs critically evaluating, or auto-accepting? (too high = red flag) |
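
Some of these numbers can be pulled straight from your code host. The sketch below measures median PR size and time-to-merge (from the quality table above) via GitHub's REST API; `your-org`/`your-repo` are placeholders, and a `GITHUB_TOKEN` environment variable is assumed.

```python
# Median PR size and time-to-merge via the GitHub REST API.
# Assumptions: GITHUB_TOKEN is set; OWNER/REPO are placeholders.
import os
from datetime import datetime
from statistics import median
import requests

API = "https://api.github.com"
OWNER, REPO = "your-org", "your-repo"
HEADERS = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
           "Accept": "application/vnd.github+json"}

def merged_pr_stats(limit: int = 30) -> dict:
    prs = requests.get(f"{API}/repos/{OWNER}/{REPO}/pulls",
                       params={"state": "closed", "per_page": limit},
                       headers=HEADERS, timeout=30).json()
    sizes, merge_hours = [], []
    for pr in prs:
        if not pr["merged_at"]:
            continue
        # additions/deletions only appear on the single-PR endpoint
        detail = requests.get(pr["url"], headers=HEADERS, timeout=30).json()
        sizes.append(detail["additions"] + detail["deletions"])
        opened = datetime.fromisoformat(pr["created_at"].rstrip("Z"))
        merged = datetime.fromisoformat(pr["merged_at"].rstrip("Z"))
        merge_hours.append((merged - opened).total_seconds() / 3600)
    return {"median_pr_size": median(sizes),
            "median_hours_to_merge": median(merge_hours)}

print(merged_pr_stats())
```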

7. Simple ROI Model

```
Developer cost:          €80,000/year (~€50/hour)
Work hours per month:    160h
Realistic AI uplift:     20–30% on routine tasks (conservative)
Routine task share:      ~50% of work

Hours saved per month  = 160h × 0.50 × 0.25 = 20h
Value of saved time    = 20h × €50 = €1,000
Tool cost per month    = ~€20–40

Net benefit per dev    ≈ €960–980/month
```

Even conservative estimates pay for the tools 25–50x over. But only if quality doesn’t regress; otherwise you spend the saved time fixing AI-generated bugs.
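
To plug in your own numbers, here is the same model as a small Python function; every default mirrors the hypothetical figures above, not measured values.

```python
# The ROI model above as a function. All defaults are the hypothetical
# numbers from this section; replace them with your own measurements.
def ai_roi(hourly_rate=50.0, hours_per_month=160,
           routine_share=0.50, uplift=0.25, tool_cost=30.0):
    hours_saved = hours_per_month * routine_share * uplift
    value = hours_saved * hourly_rate
    return {
        "hours_saved": hours_saved,             # 20.0
        "value_eur": value,                     # 1000.0
        "net_eur": value - tool_cost,           # 970.0
        "payback_multiple": value / tool_cost,  # ~33x
    }

print(ai_roi())
```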


8. Adoption Timeline

What to expect when a team starts using AI tools:

| Phase | Timeline | What happens |
| --- | --- | --- |
| Exploration | Months 1–2 | People experiment; productivity may dip while learning |
| Integration | Months 3–4 | Shared rules and workflows take hold; 20–30% gains on routine work |
| Optimization | Months 5–6 | Teams refine prompts, rules, CI checks; AI becomes the default for boilerplate |
| Mastery | Month 7+ | AI becomes a force multiplier; gains of 40–50% on suitable tasks |

Track metrics across all phases — not just after a single pilot week.


Checklist: Are We Getting Real Value?

Ask quarterly:

  • Do we ship more features at the same or higher quality?
  • Did our incident rate stay flat or go down?
  • Are PRs still reviewable in size and scope?
  • Do developers spend more time on hard problems and less on boilerplate?
  • Are we using AI to improve tests and docs, not just generate code?
  • Is our code churn (rewrites within 2 weeks) stable or decreasing?

If the answer to several of these is “no”, the problem is workflow and discipline, not the tools.

