AI-Assisted Development

Productivity & Metrics

Workshop — Part 11: What the data actually says

AI coding tools promise big productivity gains.

But how much do they really help?

What the Studies Say

Optimistic findings:

55.8% faster on simple HTTP server tasks
(MIT/Microsoft, 2023)
26% more tasks completed in a fixed time window
(Accenture/GitHub RCT, 2024)
~25% of all code is AI-assisted at Google, Microsoft, Amazon

Sobering findings:

Experienced OSS developers were 19% slower using AI on their own repos
(METR, 2025)
Gains shrink to <10% on high-complexity tasks
(McKinsey, 2024)
84% of developers use or plan to use AI tools, but only 29% trust AI output to be accurate
(Stack Overflow, 2025)

In the METR study, developers believed AI made them 20% faster — when it actually made them 19% slower.

That’s a 39-percentage-point gap between perception and reality.

Always measure. Don’t guess.
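The gap is plain arithmetic on percentage points. A minimal sketch of comparing a self-reported estimate with a measured result follows; the function and parameter names are hypothetical, and a slowdown is treated as a negative speedup.

```python
def perception_gap(perceived_speedup_pct: float, measured_speedup_pct: float) -> float:
    """Return perceived minus measured change, in percentage points.

    Positive values mean "faster"; a slowdown is passed as a negative number.
    """
    return perceived_speedup_pct - measured_speedup_pct


# METR figures from the slide: developers felt 20% faster, measured 19% slower.
gap = perception_gap(perceived_speedup_pct=20.0, measured_speedup_pct=-19.0)
print(f"Perception gap: {gap:.0f} percentage points")
```

The same comparison works for a team: collect the estimate before looking at the measured data, so the two numbers stay independent.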

Where AI Helps Most and Least

High impact (30–55% time savings)

Boilerplate & glue code — CRUD, DTOs, config files
Test scaffolding — setup, teardown, assertions
Documentation — docstrings, README drafts, API examples
Repetitive refactors — renaming and pattern updates across many files

Low impact or risky (0–10%, sometimes negative)

Architecture & system design — requires deep understanding of tradeoffs
Complex business logic — subtle domain rules not in training data
Security-critical flows — auth, payments, privacy
Large mature codebases — existing context is hard for AI to grasp

For low-impact tasks, use AI as a brainstorming partner, not a code generator.

The Quality Trade-off

Speed comes at a cost if you’re not careful:

AI-generated PRs have 1.7x more issues than human-written PRs
(CodeRabbit, 2025)
4x growth in code duplication from 2021 to 2024
Refactoring dropped from 25% to <10% of changed lines: developers generate new code instead of refactoring
(GitClear, 2025)

Code revised within 2 weeks grew from 3.1% to 5.7%, a sign of more premature commits
A 25% increase in AI adoption is associated with an estimated 7.2% drop in delivery stability at the org level
(Google DORA, 2024)

More code ≠ better code.

Speed without discipline creates debt.

The DORA Insight

The 2025 DORA Report found the most important thing about AI in teams:

AI acts as a multiplier of existing conditions.

High-performing teams with good practices → AI makes you faster
Teams that skip reviews, have no tests, deploy manually → AI makes the mess bigger, faster

AI doesn’t fix broken processes. It accelerates them.

What to Measure

Delivery (DORA metrics)

Lead time for changes
Deployment frequency
Change failure rate
Mean time to recovery
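As an illustration, the four delivery metrics can be derived from a plain list of deployment records. This is a sketch under assumptions: the `Deployment` record, its field names, and the choice of median for lead time are hypothetical, not a standard schema, and real data would come from your CI/CD and incident tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime                  # first commit of the change
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # caused an incident or rollback
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_metrics(deployments: list[Deployment], period_days: int) -> dict[str, float]:
    """Summarise the four delivery metrics for one reporting period."""
    if not deployments:
        raise ValueError("need at least one deployment")
    lead_times_h = [
        (d.deployed_at - d.committed_at).total_seconds() / 3600 for d in deployments
    ]
    failures = [d for d in deployments if d.failed]
    recoveries_h = [
        (d.restored_at - d.deployed_at).total_seconds() / 3600
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "median_lead_time_hours": median(lead_times_h),
        "deployments_per_week": len(deployments) / (period_days / 7),
        "change_failure_rate": len(failures) / len(deployments),
        "mean_time_to_recovery_hours": (
            sum(recoveries_h) / len(recoveries_h) if recoveries_h else 0.0
        ),
    }


# Illustrative data only: four deployments in a two-week period, one of which failed.
start = datetime(2025, 1, 6, 9, 0)
sample = [
    Deployment(start, start + timedelta(hours=24)),
    Deployment(start, start + timedelta(hours=48), failed=True,
               restored_at=start + timedelta(hours=50)),
    Deployment(start, start + timedelta(hours=12)),
    Deployment(start, start + timedelta(hours=36)),
]
print(dora_metrics(sample, period_days=14))
```

Tracking the same four numbers before and after AI adoption gives a baseline that does not depend on how productive people feel.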

Quality

Bug rate per release
PR size and review time
Test coverage trend
Code churn (rewrites within 2 weeks)
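Code churn can be approximated as the share of changed lines that get rewritten within the two-week window. The sketch below assumes a hypothetical `LineChange` record already extracted from version control; how you attribute a later rewrite to an earlier change is tool-specific and not shown.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class LineChange:
    lines: int                         # number of lines added or modified
    authored_on: date
    revised_on: Optional[date] = None  # first later rewrite or deletion, if any


def churn_rate(changes: list[LineChange], window_days: int = 14) -> float:
    """Share of changed lines that were rewritten within the window."""
    total = sum(c.lines for c in changes)
    if total == 0:
        return 0.0
    window = timedelta(days=window_days)
    churned = sum(
        c.lines
        for c in changes
        if c.revised_on is not None and c.revised_on - c.authored_on <= window
    )
    return churned / total


# Illustrative data only.
changes = [
    LineChange(120, date(2025, 3, 3)),                               # never revised
    LineChange(40, date(2025, 3, 3), revised_on=date(2025, 3, 10)),  # within 2 weeks
    LineChange(40, date(2025, 3, 4), revised_on=date(2025, 4, 20)),  # later refactor
]
print(f"Churn: {churn_rate(changes):.1%}")
```

The trend matters more than the absolute value: a rising rate after AI adoption suggests code is being committed before it is ready.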

Team Health

Time on boilerplate vs. real problems
Onboarding speed for new members
AI suggestion acceptance rate (too high is a red flag: it suggests auto-accepting without review)

Avoid vanity metrics:

Lines of code generated
Number of prompts
AI “tokens saved”

Simple ROI Model

Developer cost:         €80,000/year (~€50/hour)
Work hours per month:   160h
Realistic AI uplift:    20–30% on routine tasks (conservative)
Routine task share:     ~50% of work

Hours saved per month = 160h × 0.50 × 0.25 = 20h
Value of saved time   = 20h × €50 = €1,000
Tool cost per month   = ~€20–40

Net benefit per dev   ≈ €960–980/month

Even conservative estimates pay for the tools 25–50× over.

But only if quality doesn’t regress — otherwise you spend the saved time fixing AI-generated bugs.
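The model above is easy to turn into a small calculator for testing your own numbers. This is a sketch, not a validated cost model: the defaults are the slide's figures, the hourly rate is taken as given, and the `rework_hours` parameter is an added assumption that represents time spent fixing AI-generated defects.

```python
def monthly_roi(
    hours_per_month: float = 160.0,
    routine_share: float = 0.50,
    uplift: float = 0.25,
    hourly_cost_eur: float = 50.0,
    tool_cost_eur: float = 30.0,
    rework_hours: float = 0.0,  # assumption: hours lost to fixing AI-generated bugs
) -> dict[str, float]:
    """Estimate the monthly value of AI tooling for one developer."""
    hours_saved = hours_per_month * routine_share * uplift
    net_hours = hours_saved - rework_hours
    value_eur = net_hours * hourly_cost_eur
    return {
        "hours_saved": hours_saved,
        "net_hours": net_hours,
        "value_eur": value_eur,
        "net_benefit_eur": value_eur - tool_cost_eur,
        "payback_multiple": value_eur / tool_cost_eur,
    }


# Reproduce the slide's range of tool costs, then a case where rework eats the savings.
for tool_cost in (20.0, 40.0):
    result = monthly_roi(tool_cost_eur=tool_cost)
    print(tool_cost, result["net_benefit_eur"], result["payback_multiple"])

print(monthly_roi(tool_cost_eur=40.0, rework_hours=20.0)["net_benefit_eur"])
```

With 20 hours of rework per month the saved time is gone and the net benefit turns negative, which is the quality caveat expressed in numbers.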

Adoption Timeline

Months 1–2: Exploration
People experiment. Productivity may dip while learning.

Months 3–4: Integration
Shared rules and workflows take hold.
20–30% gains on routine work.

Months 5–6: Optimization
Teams refine prompts, rules, CI checks.
AI becomes the default for boilerplate.

Month 7+: Mastery
AI becomes a force multiplier.
Gains of 40–50% on suitable tasks.

Are We Getting Real Value?

Ask quarterly:

Do we ship more features at the same or higher quality?
Did our incident rate stay flat or go down?
Are PRs still reviewable in size and scope?
Do developers spend more time on hard problems and less on boilerplate?
Are we using AI to improve tests and docs, not just generate code?
Is our code churn (rewrites within 2 weeks) stable or decreasing?

If the answer to several of these is “no” — the problem is workflow and discipline, not the tools.
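The quantifiable questions above can be checked with a simple quarter-over-quarter comparison. In this sketch the metric names, the desired directions, and the sample values are all hypothetical; qualitative questions, such as whether AI is used to improve tests and docs, still need a team discussion.

```python
# Direction in which each metric should move: +1 = higher is better, -1 = lower is better.
DESIRED_DIRECTION = {
    "features_shipped": +1,
    "incident_rate": -1,
    "median_pr_size_lines": -1,
    "boilerplate_time_share": -1,
    "churn_rate": -1,
}


def regressions(previous: dict[str, float], current: dict[str, float]) -> list[str]:
    """Return the metrics that moved the wrong way since the previous quarter."""
    flagged = []
    for name, direction in DESIRED_DIRECTION.items():
        delta = current[name] - previous[name]
        if delta * direction < 0:  # moved against the desired direction
            flagged.append(name)
    return flagged


# Illustrative data only.
q1 = {"features_shipped": 14, "incident_rate": 0.08, "median_pr_size_lines": 180,
      "boilerplate_time_share": 0.35, "churn_rate": 0.04}
q2 = {"features_shipped": 19, "incident_rate": 0.11, "median_pr_size_lines": 420,
      "boilerplate_time_share": 0.25, "churn_rate": 0.06}
print(regressions(q1, q2))
```

In this example more features shipped, but incidents, PR size, and churn all worsened, which is the pattern that points to workflow and discipline problems.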