Productivity & Metrics
AI coding tools promise big productivity gains — but how much do they really help?
The answer is nuanced: it depends on the task, the team, and the discipline around the tool.
1. What the Data Actually Says
Speed & Output
| Finding | Source |
|---|---|
| Developers completed tasks 55.8% faster with Copilot (simple HTTP server task) | Peng et al., MIT/Microsoft (2023) |
| 26% more tasks completed in fixed time window; 84% more successful builds | Accenture/GitHub RCT (2024) |
| Experienced OSS developers were 19% slower using AI on their own repos | METR Randomized Trial (2025) |
| Productivity gains shrink to <10% on high-complexity tasks | McKinsey (2023) |
Adoption
| Finding | Source |
|---|---|
| 84% of developers use or plan to use AI tools (up from 76% in 2024) | Stack Overflow Developer Survey (2025) |
| 51% use AI tools daily | Stack Overflow (2025) |
| 90% of DORA survey respondents use AI at work | Google DORA Report (2025) |
| ~30% of Copilot suggestions are immediately accepted; 88% are kept long-term | GitHub Research (2024) |
What Big Tech Reports
- Google: more than 25% of new code is AI-generated; ~10% increase in engineering velocity (Pichai, Q3 2024)
- Microsoft: 20–30% of code written by AI (Nadella, 2025)
- Amazon: ~25% of code written by AI (2025)
2. Where AI Helps Most (and Least)
High impact (30–55% time savings)
| Task | Why AI is good at it |
|---|---|
| Boilerplate & glue code | CRUD endpoints, DTOs, serializers, config files — repetitive and pattern-heavy |
| Test scaffolding | Setup, teardown, assertions from existing code — AI is faster than writing by hand |
| Documentation | Docstrings, README drafts, API examples — AI captures the “obvious” parts well |
| Repetitive refactors | Renaming, pattern updates across many files — mechanical work AI handles reliably |
Low impact or risky (0–10%, sometimes negative)
| Task | Why AI struggles |
|---|---|
| Architecture & system design | Requires deep understanding of tradeoffs, constraints, business context |
| Complex business logic | Subtle domain rules that aren’t in the training data |
| Security-critical flows | Auth, payments, privacy — AI generates plausible but insecure code |
| Large mature codebases | METR showed experienced devs were slower — existing context is hard for AI to grasp |
3. The Quality Trade-off
Speed comes at a cost if you’re not careful:
| Finding | Source |
|---|---|
| AI-generated PRs have 1.7x more issues than human-written PRs | CodeRabbit (2025) |
| 4x growth in code duplication (2021–2024); copy-pasted lines rose from 8.3% to 12.3% | GitClear (2025) |
| Refactoring dropped from 25% to <10% of changed lines — devs generate instead of refactor | GitClear (2025) |
| Code revised within 2 weeks of commit grew from 3.1% to 5.7% — more premature commits | GitClear (2025) |
| AI adoption reduces delivery stability by 7.2% at the org level | Google DORA Report (2024) |
4. Developer Sentiment
Trust is declining even as adoption rises:
| Finding | Source |
|---|---|
| Only 29% trust AI output to be accurate (down from 40% in 2024) | Stack Overflow (2025) |
| 66% say biggest frustration is “solutions that are almost right, but not quite” | Stack Overflow (2025) |
| 45% say debugging AI code is more time-consuming | Stack Overflow (2025) |
| 90% of Accenture devs felt more fulfilled using Copilot | Accenture/GitHub (2024) |
The pattern: developers use AI tools they don’t fully trust. This makes review discipline even more important.
5. The DORA Insight
The 2025 DORA Report's most important finding about AI in teams:
AI acts as a multiplier of existing conditions.
It strengthens high-performing teams and exposes weaknesses in struggling teams.
This means:
- If your team already has good review, testing, and deployment practices → AI makes you faster
- If your team skips reviews, has no tests, deploys manually → AI makes the mess bigger, faster
AI doesn’t fix broken processes. It accelerates them.
6. What to Measure
Avoid vanity metrics like “lines of code generated” or “number of prompts.” Track what matters:
Delivery (DORA metrics)
| Metric | What it tells you |
|---|---|
| Lead time for changes | Commit to production — are we actually shipping faster? |
| Deployment frequency | How often do we ship? |
| Change failure rate | What % of releases cause incidents? |
| Mean time to recovery | How fast do we fix production issues? |
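If you do not already have a delivery-metrics pipeline, these four are straightforward to compute from deployment records. A minimal sketch, assuming a hypothetical `Deploy` record shape and a 30-day window (neither is a standard schema; adapt both to whatever your CI/CD system actually exports):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Deploy:
    """Hypothetical deployment record; adapt to what your CI/CD system exports."""
    committed_at: datetime                 # first commit in the release
    deployed_at: datetime                  # when it reached production
    caused_incident: bool                  # did this release break something?
    recovered_at: datetime | None = None   # when service was restored, if it broke

def dora_metrics(deploys: list[Deploy], window_days: int = 30) -> dict[str, float]:
    """Compute the four DORA metrics over one observation window."""
    def mean_hours(deltas):
        return sum(d.total_seconds() for d in deltas) / len(deltas) / 3600

    failures = [d for d in deploys if d.caused_incident]
    recoveries = [d.recovered_at - d.deployed_at for d in failures if d.recovered_at]
    return {
        "lead_time_hours": mean_hours([d.deployed_at - d.committed_at for d in deploys]),
        "deploys_per_week": len(deploys) / (window_days / 7),
        "change_failure_rate": len(failures) / len(deploys),
        "recovery_time_hours": mean_hours(recoveries) if recoveries else 0.0,
    }

# Toy data: two deploys, one of which caused an incident fixed within an hour.
now = datetime(2025, 6, 1)
history = [
    Deploy(now - timedelta(hours=30), now - timedelta(hours=2), False),
    Deploy(now - timedelta(hours=49), now - timedelta(hours=1), True, recovered_at=now),
]
print(dora_metrics(history))
```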
Quality
| Metric | What it tells you |
|---|---|
| Bug rate per release | Are we introducing more defects? |
| PR size and review time | Are PRs still reviewable, or are they giant AI dumps? |
| Test coverage trend | Is coverage going up or down with AI? |
| Code churn | How much code gets rewritten within 2 weeks? (should be low) |
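Everything above except code churn falls out of your issue tracker or code host; churn usually needs custom tooling. Below is a toy sketch of the idea, modeling each change as a (file, line, timestamp) event. This is a deliberate simplification: real churn analysis, like GitClear's, tracks lines across renames and offset shifts.

```python
from datetime import datetime, timedelta

CHURN_WINDOW = timedelta(days=14)

def churn_rate(changes: list[tuple[str, int, datetime]]) -> float:
    """Fraction of line changes rewritten again within CHURN_WINDOW.

    `changes` holds (file_path, line_number, timestamp) events, e.g. parsed
    from `git log -p`. O(n^2) and blind to renames/offsets -- fine for a
    first signal, not a substitute for a real churn tool.
    """
    events = sorted(changes, key=lambda c: c[2])
    churned = sum(
        1
        for i, (path, line, ts) in enumerate(events)
        if any(p == path and n == line and ts < t <= ts + CHURN_WINDOW
               for p, n, t in events[i + 1:])
    )
    return churned / len(events) if events else 0.0

d = datetime(2025, 1, 1)
sample = [
    ("api/users.py", 42, d),                        # rewritten 3 days later -> churn
    ("api/users.py", 42, d + timedelta(days=3)),
    ("api/orders.py", 7, d),                        # never touched again -> stable
]
print(f"{churn_rate(sample):.0%}")  # 33%
```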
Team Health
| Metric | What it tells you |
|---|---|
| Time on boilerplate vs. real problems | Are devs spending time on interesting work? |
| Onboarding speed | Can new members contribute faster with AI support? |
| AI suggestion acceptance rate | Are devs critically evaluating, or auto-accepting? (too high = red flag) |
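The acceptance-rate check is easy to automate if your tool exports per-developer counts (GitHub, for instance, exposes aggregate acceptance data via its Copilot metrics API). The 60% threshold and the input shape in this sketch are illustrative assumptions, not an established benchmark:

```python
def flag_auto_acceptors(stats: dict[str, tuple[int, int]],
                        threshold: float = 0.6,
                        min_shown: int = 50) -> list[str]:
    """Flag developers whose acceptance rate suggests uncritical use.

    `stats` maps developer -> (suggestions_shown, suggestions_accepted).
    The 0.6 threshold is an illustrative assumption: published acceptance
    rates hover around 30%, so a sustained 60%+ deserves a conversation.
    """
    return [
        dev for dev, (shown, accepted) in stats.items()
        if shown >= min_shown and accepted / shown > threshold  # skip tiny samples
    ]

print(flag_auto_acceptors({"alice": (200, 64), "bob": (180, 141)}))  # ['bob']
```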
7. Simple ROI Model
- Developer cost: €80,000/year base salary (~€50/hour fully loaded)
- Work hours per month: 160h
- Realistic AI uplift: 20–30% on routine tasks (conservative; use the 25% midpoint)
- Routine task share: ~50% of work

Hours saved per month = 160h × 0.50 × 0.25 = 20h
Value of saved time = 20h × €50 = €1,000
Tool cost per month ≈ €20–40
Net benefit per dev ≈ €960–980/month

Even conservative estimates pay for the tools 25–50x over. But only if quality doesn’t regress — otherwise you spend the saved time fixing AI-generated bugs.
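The same model as a small function, so the assumptions can be varied instead of argued about; the defaults mirror the numbers above.

```python
def ai_roi(hourly_rate: float = 50.0,     # EUR, fully loaded
           hours_per_month: float = 160.0,
           routine_share: float = 0.50,   # fraction of work that is routine
           uplift: float = 0.25,          # midpoint of the 20-30% range
           tool_cost: float = 30.0) -> dict[str, float]:
    """Net monthly benefit per developer under the model above."""
    hours_saved = hours_per_month * routine_share * uplift
    value = hours_saved * hourly_rate
    return {
        "hours_saved": hours_saved,            # 20.0
        "value_eur": value,                    # 1000.0
        "net_benefit_eur": value - tool_cost,  # 970.0
        "roi_multiple": value / tool_cost,     # ~33x
    }

print(ai_roi())
print(ai_roi(uplift=0.10))  # pessimistic case: still ~13x the tool cost
```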
8. Adoption Timeline
What to expect when a team starts using AI tools:
| Phase | Timeline | What happens |
|---|---|---|
| Exploration | Months 1–2 | People experiment; productivity may dip while learning |
| Integration | Months 3–4 | Shared rules and workflows take hold; 20–30% gains on routine work |
| Optimization | Months 5–6 | Teams refine prompts, rules, CI checks; AI becomes the default for boilerplate |
| Mastery | Month 7+ | AI becomes a force multiplier; gains of 40–50% on suitable tasks |
Track metrics across all phases — not just after a single pilot week.
Checklist: Are We Getting Real Value?
Ask quarterly:
- Do we ship more features at the same or higher quality?
- Did our incident rate stay flat or go down?
- Are PRs still reviewable in size and scope?
- Do developers spend more time on hard problems and less on boilerplate?
- Are we using AI to improve tests and docs, not just generate code?
- Is our code churn (rewrites within 2 weeks) stable or decreasing?
If the answer to several of these is “no”, the problem is workflow and discipline, not the tools.
Sources
- Peng et al. — The Impact of AI on Developer Productivity (2023)
- GitHub/Accenture — Quantifying Copilot’s Impact in the Enterprise (2024)
- METR — AI Tools for Experienced OSS Developers (2025)
- McKinsey — Unleashing Developer Productivity with Generative AI (2023)
- Google DORA Reports (2024, 2025)
- Stack Overflow Developer Survey (2025)
- GitClear — AI Code Quality Research (2025)
- CodeRabbit — AI vs Human Code Generation (2025)