Productivity & Metrics
AI coding tools promise big productivity gains — but how much do they really help?
The answer is nuanced: it depends on the task, the team, and the discipline around the tool.
1. What the Data Actually Says
Speed & Output
| Finding | Source |
|---|---|
| Developers completed tasks 55.8% faster with Copilot (simple HTTP server task) | Peng et al., MIT/Microsoft (2023) |
| 26% more tasks completed in fixed time window; 84% more successful builds | Accenture/GitHub RCT (2024) |
| Experienced OSS developers were 19% slower using AI on their own repos | METR Randomized Trial (2025) |
| Productivity gains shrink to <10% on high-complexity tasks | McKinsey (2023) |
Adoption
| Finding | Source |
|---|---|
| 84% of developers use or plan to use AI tools (up from 76% in 2024) | Stack Overflow Developer Survey (2025) |
| 51% use AI tools daily | Stack Overflow (2025) |
| 90% of DORA survey respondents use AI at work | Google DORA Report (2025) |
| ~30% of Copilot suggestions are immediately accepted; 88% are kept long-term | GitHub Research (2024) |
What Big Tech Reports
- Google: more than 25% of new code is AI-generated; ~10% increase in engineering velocity (Pichai, Q3 2024)
- Microsoft: 20–30% of code written by AI (Nadella, 2025)
- Amazon: ~25% of code written by AI (2025)
2. Where AI Helps Most (and Least)
High impact (30–55% time savings)
| Task | Why AI is good at it |
|---|---|
| Boilerplate & glue code | CRUD endpoints, DTOs, serializers, config files — repetitive and pattern-heavy |
| Test scaffolding | Setup, teardown, assertions from existing code — AI is faster than writing by hand |
| Documentation | Docstrings, README drafts, API examples — AI captures the “obvious” parts well |
| Repetitive refactors | Renaming, pattern updates across many files — mechanical work AI handles reliably |
Low impact or risky (0–10%, sometimes negative)
| Task | Why AI struggles |
|---|---|
| Architecture & system design | Requires deep understanding of tradeoffs, constraints, business context |
| Complex business logic | Subtle domain rules that aren’t in the training data |
| Security-critical flows | Auth, payments, privacy — AI generates plausible but insecure code |
| Large mature codebases | METR showed experienced devs were slower — existing context is hard for AI to grasp |
3. The Quality Trade-off
Speed comes at a cost if you’re not careful:
| Finding | Source |
|---|---|
| AI-generated PRs have 1.7x more issues than human-written PRs | CodeRabbit (2025) |
| 4x growth in code duplication (2021–2024); copy-pasted lines rose from 8.3% to 12.3% | GitClear (2025) |
| Refactoring dropped from 25% to <10% of changed lines — devs generate instead of refactor | GitClear (2025) |
| Code revised within 2 weeks of commit grew from 3.1% to 5.7% — more premature commits | GitClear (2025) |
| AI adoption reduces delivery stability by 7.2% at the org level | Google DORA Report (2024) |
4. Developer Sentiment
Trust is declining even as adoption rises:
| Finding | Source |
|---|---|
| Only 29% trust AI output to be accurate (down from 40% in 2024) | Stack Overflow (2025) |
| 66% say biggest frustration is “solutions that are almost right, but not quite” | Stack Overflow (2025) |
| 45% say debugging AI code is more time-consuming | Stack Overflow (2025) |
| 90% of Accenture devs felt more fulfilled using Copilot | Accenture/GitHub (2024) |
The pattern: developers use AI tools they don’t fully trust. This makes review discipline even more important.
5. The DORA Insight
The 2025 DORA Report's most important finding about AI in teams:
AI acts as a multiplier of existing conditions.
It strengthens high-performing teams and exposes weaknesses in struggling teams.
This means:
- If your team already has good review, testing, and deployment practices → AI makes you faster
- If your team skips reviews, has no tests, deploys manually → AI makes the mess bigger, faster
AI doesn’t fix broken processes. It accelerates them.
6. What to Measure
Avoid vanity metrics like “lines of code generated” or “number of prompts.” Track what matters:
Delivery (DORA metrics)
| Metric | What it tells you |
|---|---|
| Lead time for changes | Commit to production — are we actually shipping faster? |
| Deployment frequency | How often do we ship? |
| Change failure rate | What % of releases cause incidents? |
| Mean time to recovery | How fast do we fix production issues? |
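If you do not already have a delivery-metrics pipeline, these four are straightforward to compute from deployment records. A minimal sketch, assuming a hypothetical `Deploy` record shape and a 30-day window (neither is a standard schema; adapt both to whatever your CI/CD system actually exports):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Deploy:
    """Hypothetical deployment record; adapt to what your CI/CD system exports."""
    committed_at: datetime                 # first commit in the release
    deployed_at: datetime                  # when it reached production
    caused_incident: bool                  # did this release break something?
    recovered_at: datetime | None = None   # when service was restored, if it broke

def dora_metrics(deploys: list[Deploy], window_days: int = 30) -> dict[str, float]:
    """Compute the four DORA metrics over one observation window."""
    def mean_hours(deltas):
        return sum(d.total_seconds() for d in deltas) / len(deltas) / 3600

    failures = [d for d in deploys if d.caused_incident]
    recoveries = [d.recovered_at - d.deployed_at for d in failures if d.recovered_at]
    return {
        "lead_time_hours": mean_hours([d.deployed_at - d.committed_at for d in deploys]),
        "deploys_per_week": len(deploys) / (window_days / 7),
        "change_failure_rate": len(failures) / len(deploys),
        "recovery_time_hours": mean_hours(recoveries) if recoveries else 0.0,
    }

# Toy data: two deploys, one of which caused an incident fixed within an hour.
now = datetime(2025, 6, 1)
history = [
    Deploy(now - timedelta(hours=30), now - timedelta(hours=2), False),
    Deploy(now - timedelta(hours=49), now - timedelta(hours=1), True, recovered_at=now),
]
print(dora_metrics(history))
```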
Quality
| Metric | What it tells you |
|---|---|
| Bug rate per release | Are we introducing more defects? |
| PR size and review time | Are PRs still reviewable, or are they giant AI dumps? |
| Test coverage trend | Is coverage going up or down with AI? |
| Code churn | How much code gets rewritten within 2 weeks? (should be low) |
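Everything above except code churn falls out of your issue tracker or code host; churn usually needs custom tooling. Below is a toy sketch of the idea, modeling each change as a (file, line, timestamp) event. This is a deliberate simplification: real churn analysis, like GitClear's, tracks lines across renames and offset shifts.

```python
from datetime import datetime, timedelta

CHURN_WINDOW = timedelta(days=14)

def churn_rate(changes: list[tuple[str, int, datetime]]) -> float:
    """Fraction of line changes rewritten again within CHURN_WINDOW.

    `changes` holds (file_path, line_number, timestamp) events, e.g. parsed
    from `git log -p`. O(n^2) and blind to renames/offsets -- fine for a
    first signal, not a substitute for a real churn tool.
    """
    events = sorted(changes, key=lambda c: c[2])
    churned = sum(
        1
        for i, (path, line, ts) in enumerate(events)
        if any(p == path and n == line and ts < t <= ts + CHURN_WINDOW
               for p, n, t in events[i + 1:])
    )
    return churned / len(events) if events else 0.0

d = datetime(2025, 1, 1)
sample = [
    ("api/users.py", 42, d),                        # rewritten 3 days later -> churn
    ("api/users.py", 42, d + timedelta(days=3)),
    ("api/orders.py", 7, d),                        # never touched again -> stable
]
print(f"{churn_rate(sample):.0%}")  # 33%
```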
Team Health
| Metric | What it tells you |
|---|---|
| Time on boilerplate vs. real problems | Are devs spending time on interesting work? |
| Onboarding speed | Can new members contribute faster with AI support? |
| AI suggestion acceptance rate | Are devs critically evaluating, or auto-accepting? (too high = red flag) |
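The acceptance-rate check is easy to automate if your tool exports per-developer counts (GitHub, for instance, exposes aggregate acceptance data via its Copilot metrics API). The 60% threshold and the input shape in this sketch are illustrative assumptions, not an established benchmark:

```python
def flag_auto_acceptors(stats: dict[str, tuple[int, int]],
                        threshold: float = 0.6,
                        min_shown: int = 50) -> list[str]:
    """Flag developers whose acceptance rate suggests uncritical use.

    `stats` maps developer -> (suggestions_shown, suggestions_accepted).
    The 0.6 threshold is an illustrative assumption: published acceptance
    rates hover around 30%, so a sustained 60%+ deserves a conversation.
    """
    return [
        dev for dev, (shown, accepted) in stats.items()
        if shown >= min_shown and accepted / shown > threshold  # skip tiny samples
    ]

print(flag_auto_acceptors({"alice": (200, 64), "bob": (180, 141)}))  # ['bob']
```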
7. Simple ROI Model
- Developer cost: €80,000/year base salary (~€50/hour fully loaded)
- Work hours per month: 160h
- Realistic AI uplift: 20–30% on routine tasks (conservative; use the 25% midpoint)
- Routine task share: ~50% of work

Hours saved per month = 160h × 0.50 × 0.25 = 20h
Value of saved time = 20h × €50 = €1,000
Tool cost per month ≈ €20–40
Net benefit per dev ≈ €960–980/month

Even conservative estimates pay for the tools 25–50x over. But only if quality doesn’t regress — otherwise you spend the saved time fixing AI-generated bugs.
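The same model as a small function, so the assumptions can be varied instead of argued about; the defaults mirror the numbers above.

```python
def ai_roi(hourly_rate: float = 50.0,     # EUR, fully loaded
           hours_per_month: float = 160.0,
           routine_share: float = 0.50,   # fraction of work that is routine
           uplift: float = 0.25,          # midpoint of the 20-30% range
           tool_cost: float = 30.0) -> dict[str, float]:
    """Net monthly benefit per developer under the model above."""
    hours_saved = hours_per_month * routine_share * uplift
    value = hours_saved * hourly_rate
    return {
        "hours_saved": hours_saved,            # 20.0
        "value_eur": value,                    # 1000.0
        "net_benefit_eur": value - tool_cost,  # 970.0
        "roi_multiple": value / tool_cost,     # ~33x
    }

print(ai_roi())
print(ai_roi(uplift=0.10))  # pessimistic case: still ~13x the tool cost
```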
8. Adoption Timeline
What to expect when a team starts using AI tools:
| Phase | Timeline | What happens |
|---|---|---|
| Exploration | Months 1–2 | People experiment; productivity may dip while learning |
| Integration | Months 3–4 | Shared rules and workflows take hold; 20–30% gains on routine work |
| Optimization | Months 5–6 | Teams refine prompts, rules, CI checks; AI becomes the default for boilerplate |
| Mastery | Month 7+ | AI becomes a force multiplier; gains of 40–50% on suitable tasks |
Track metrics across all phases — not just after a single pilot week.
Checklist: Are We Getting Real Value?
Ask quarterly:
- Do we ship more features at the same or higher quality?
- Did our incident rate stay flat or go down?
- Are PRs still reviewable in size and scope?
- Do developers spend more time on hard problems and less on boilerplate?
- Are we using AI to improve tests and docs, not just generate code?
- Is our code churn (rewrites within 2 weeks) stable or decreasing?
If the answer to several of these is “no”, the problem is workflow and discipline, not the tools.
Sources
- Peng et al. — The Impact of AI on Developer Productivity (2023)
- GitHub/Accenture — Quantifying Copilot’s Impact in the Enterprise (2024)
- METR — AI Tools for Experienced OSS Developers (2025)
- McKinsey — Unleashing Developer Productivity with Generative AI (2023)
- Google DORA Reports (2024, 2025)
- Stack Overflow Developer Survey (2025)
- GitClear — AI Code Quality Research (2025)
- CodeRabbit — AI vs Human Code Generation (2025)