
AI Makes Programmers 19% Slower, METR Study Finds

The scientific study that contradicts everything AI companies have been selling us

Sarah Chen
January 28, 2026 · 10 min read

Photo by Mohammad Rahmani on Unsplash

Key takeaways

A controlled METR study reveals expert developers are 19% slower using Cursor and Claude, despite believing they're 20% faster. The perception gap is nearly 40 percentage points.

Let me break this down: imagine you're told a magical tool will make you 55% more productive. You install it, use it for months, and swear it's helping. But when someone measures your actual performance with a stopwatch, it turns out you're 19% slower than before you started using it.

That's exactly what METR's study published in July 2025 discovered, and the data is so uncomfortable for Silicon Valley that almost no one wants to talk about it.

The study no one wanted to see

METR (Model Evaluation and Threat Research) is a nonprofit AI safety organization based in Berkeley, California. It was founded by Beth Barnes, a former alignment researcher at OpenAI. They don't have stock in Cursor or Anthropic. They don't sell coding tools. They just wanted to measure the truth.

And the truth is uncomfortable.

The numbers don't lie

The study gathered 16 expert developers with one special characteristic: they all worked on their own repositories. These weren't made-up lab tasks. These were real bugs, real features, real refactors in projects they themselves maintained.

| Metric | Value |
| --- | --- |
| Developers participating | 16 |
| Tasks completed | 246 |
| Average experience in repos | 5 years |
| Average repository size | 1+ million lines of code |
| Average GitHub stars | 22,000+ |
| AI tools used | Cursor Pro + Claude 3.5/3.7 Sonnet |

Each task was randomly assigned: some completed with AI, others without. Developers recorded their screens and reported their times. A textbook randomized controlled trial (RCT).
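To make that design concrete, here's a minimal, illustrative sketch of how a slowdown estimate can be pulled out of data like this. It is not METR's actual analysis code: the task times below are invented, and the ratio of geometric means with a stratified bootstrap is just one reasonable way to summarize randomized completion times.

```python
# Illustrative sketch only (not METR's analysis code): estimate how much longer
# tasks take with AI from randomized completion times, with a rough bootstrap CI.
import random
import statistics

# Hypothetical per-task completion times in minutes, split by condition.
ai_times = [95, 120, 70, 88, 104]
no_ai_times = [62, 80, 55, 91, 73]

def time_ratio(ai, no_ai):
    """Geometric mean of AI times over non-AI times; >1 means slower with AI."""
    return statistics.geometric_mean(ai) / statistics.geometric_mean(no_ai)

point_estimate = time_ratio(ai_times, no_ai_times)

# Stratified bootstrap: resample within each condition to get an uncertainty range.
resamples = sorted(
    time_ratio(
        random.choices(ai_times, k=len(ai_times)),
        random.choices(no_ai_times, k=len(no_ai_times)),
    )
    for _ in range(10_000)
)
low, high = resamples[250], resamples[9_750]  # approximate 95% interval

print(f"AI/no-AI time ratio: {point_estimate:.2f} (about 95% CI {low:.2f} to {high:.2f})")
```

The point of the sketch is the shape of the analysis, not the numbers: because tasks were randomly assigned, a simple ratio of completion times (plus an uncertainty estimate) is enough to tell whether "feels faster" matches "is faster."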

The 40% paradox: belief vs. reality

Here's what most guides won't tell you: the developers weren't just slower—they were convinced they were faster.

  • Before starting: predicted they'd be 24% faster with AI
  • After finishing: believed they'd been 20% faster
  • Actual measured result: they were 19% SLOWER

The gap between perception and reality is nearly 40 percentage points. It's like believing you ran a marathon in 3 hours when you actually took 4 and a half.

Why do we believe AI helps when it doesn't?

One participant explained it perfectly:

"I thought AI was speeding me up. But we don't focus on all the time we actually spend—just on how pleasant it was to go back and forth with the AI."

The trick is that coding with AI feels easier, even if it's not faster. It requires less cognitive effort. It's like preferring to drive even though it takes longer than the subway: the journey feels less taxing.

In fact, 69% of participants kept using Cursor after the study, despite knowing it slowed them down.

Why does AI slow down experts?

The researchers identified five main factors:

1. Over-optimism

We've all seen demos where AI writes a complete app in minutes. The problem is those demos use new, small projects—not million-line codebases with years of implicit architectural decisions.

2. The developers knew too much

When you've spent 5 years working on a repository, you know every corner. You know why that if statement is there, why that function has that weird name, which dependencies not to touch. AI knows none of this.

One developer compared it to "a new contributor who doesn't understand the codebase yet." Imagine having to explain all the context to a junior for every task. Sometimes it's faster to just do it yourself.

3. Repositories too large and complex

AI tools shine with small, well-documented projects. But when you have over a million lines of code, AI simply can't process all the necessary context.

4. Low acceptance rate

Developers accepted fewer than 44% of AI suggestions. That means more than half of the generated code was rejected outright or needed rework before it could be used.

5. Hidden time costs

Look at how time was distributed when using AI:

  • 9% reviewing and cleaning AI outputs
  • 4% waiting for AI to generate responses
  • Less time actively programming
  • Less time searching for information

That 13% of time "lost" to AI interaction adds up task after task.

The exception that proves the rule

There was one developer who was actually faster with AI: approximately 20% more productive. The difference? They had over 50 hours of intensive experience with Cursor.

This suggests something important: the learning curve for these tools might require hundreds of hours before you see real benefits. And most studies (including those from the companies themselves) don't measure this.

The elephant in the room: industry promises

Now let's compare these results with what companies are selling us:

| Source | Claim | Context |
| --- | --- | --- |
| GitHub/Microsoft (2023) | 55% faster | Simple task (HTTP server), more benefit for juniors |
| Google DORA (2025) | Higher throughput | But stability concerns |
| METR (2025) | 19% slower | Seniors, mature repos, real tasks |

The GitHub study everyone cites ("55% faster") had a problem: developers completed a simple, artificial task. It's like measuring a new car's speed only on an empty race track.

Other independent studies aren't optimistic either

Uplevel Data Labs measured 800 developers with objective metrics and found no productivity gain and 41% more bugs. Bain reported that time savings in real enterprise adoption were "not notable."

The reactions: from denial to recognition

Critics of the study

Emmett Shear, former interim CEO of OpenAI and founder of Twitch, was direct:

"METR's analysis is tremendously misleading. The results indicate that people who essentially NEVER used AI tools are less productive while learning to use them, and say nothing about experienced AI users."

Shear has a valid point: only 1 of 16 developers had more than a week of experience with Cursor specifically. But that also reveals something: most companies adopt these tools without giving adequate learning time.

The developer community

On Hacker News and Reddit, reactions were mixed. One backend developer summarized many people's frustration:

"I hate fixing AI-written code. It solves the task, yes. But it has no vision. AI code lacks a sense of architecture, intent, or care."

Stack Overflow data confirms the skepticism

The Stack Overflow 2025 survey showed:

  • Trust in AI dropped from 43% to 33%
  • Positive sentiment dropped from 70% to 60%
  • But adoption rose to 84%

In other words: more and more people use AI for coding, but fewer and fewer trust it. An interesting paradox.

What does this mean for you?

If you're a programmer, I'm not telling you to uninstall Cursor or Claude Code. But I am telling you to measure your real productivity, not your feeling.

Where AI actually helps

  • MVPs and prototypes: When code quality matters less than speed
  • Boilerplate: Repetitive code anyone could write
  • Unit tests: Generating basic test cases
  • Documentation: Explaining existing code
  • Unfamiliar codebases: When you're the newbie, not the AI

Where AI probably slows you down

  • Your own mature codebase: Where you're already the expert
  • High-standard code: Where every detail matters
  • Complex architecture: Where implicit context is key
  • Deep debugging: Where you need to understand the "why"

For companies: beware of phantom ROI

If you're a tech lead or CTO, this study should make you rethink how you measure the impact of AI tools on your team.

Common mistakes

  1. Measuring by perceptions: "Do you feel more productive?" isn't a valid metric
  2. Adoption without training: Installing Copilot isn't the same as integrating it correctly
  3. Ignoring the learning curve: 50+ hours of intensive practice doesn't happen in a week
  4. Scaling before validating: Using AI where it doesn't make sense

What actually works

  • Adopt a portfolio mindset: Use AI where it augments cognition (docs, boilerplate), not where human expertise dominates
  • Measure objectively: Real time per task, not satisfaction surveys (a minimal sketch follows this list)
  • Identify real use cases: Not all tasks benefit equally
  • Give learning time: If you expect results in the first week, you'll be disappointed
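
The sketch below is one way to start acting on the "measure objectively" point: a tiny timer that logs real wall-clock minutes per task, tagged by whether AI was used. The file name, field layout, and helper names are assumptions for illustration; the idea is simply to compare distributions of recorded times after a few weeks instead of trusting recollection.

```python
# Minimal sketch of objective measurement: log real wall-clock time per task,
# tagged by whether AI assistance was used, then compare medians.
# The CSV file name and fields are illustrative assumptions, not a standard.
import csv
import statistics
import time
from contextlib import contextmanager

LOG_FILE = "task_times.csv"

@contextmanager
def timed_task(description: str, used_ai: bool):
    """Record how long a task actually took, not how long it felt."""
    start = time.monotonic()
    try:
        yield
    finally:
        minutes = (time.monotonic() - start) / 60
        with open(LOG_FILE, "a", newline="") as f:
            csv.writer(f).writerow([description, used_ai, round(minutes, 1)])

def summarize():
    """Compare median minutes per task with and without AI."""
    with_ai, without_ai = [], []
    with open(LOG_FILE, newline="") as f:
        for description, used_ai, minutes in csv.reader(f):
            (with_ai if used_ai == "True" else without_ai).append(float(minutes))
    if with_ai and without_ai:
        print(f"Median with AI:    {statistics.median(with_ai):.1f} min")
        print(f"Median without AI: {statistics.median(without_ai):.1f} min")

# Usage: wrap each real task and let the numbers accumulate over weeks.
# with timed_task("fix flaky auth test", used_ai=True):
#     ...  # do the work
```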

Market context: valuations vs. reality

Meanwhile, AI coding company valuations keep climbing:

| Company | Valuation | Date |
| --- | --- | --- |
| Cursor | $29.3 billion | November 2025 |
| Cognition (Devin) | $10.2 billion | September 2025 |

Cursor reported over $1 billion in annualized revenue in November 2025. The market is paying for adoption, not proven productivity.

This doesn't mean these companies have no value. It means the value the market assigns them is based on promises, not rigorous evidence.

The uncomfortable question

If AI really made developers 55% faster, why haven't software companies cut their engineering teams in half? Why are they still hiring at the same rate?

The likely answer: because CTOs who use these tools daily know that marketing numbers don't translate to real productivity. They know that a senior developer with experience in the codebase is still irreplaceable.

My takeaway

After analyzing this study and dozens of sources, here's what I think:

AI coding tools have real value, just not the value they're selling us. They're useful for specific tasks, for certain developer profiles, in certain contexts. They're not a magic wand that multiplies everyone's productivity.

The problem isn't the technology. It's the narrative. We've been sold that AI is the future of software development, when in reality it's one more tool in the programmer's arsenal. A tool that, like all tools, has its place and its limitations.

If you use Cursor or Claude Code, keep doing so. But measure your real productivity. Don't be fooled by the feeling that everything is easier. Easier doesn't always mean faster.

And if someone tells you AI will make you 55% more productive, ask them: in what context? With what prior experience? On what types of tasks?

The METR data suggests the honest answer is more complicated than the industry wants to admit.


Frequently Asked Questions

Does the METR study prove that AI for coding is useless?

Not exactly. The study shows that expert developers in their own repositories are slower with AI. But AI can be useful in other contexts: unfamiliar codebases, boilerplate tasks, rapid prototypes, and documentation.

How many developers participated in the METR study?

The study included 16 experienced developers who completed 246 real tasks in their own open-source repositories. On average, they had about five years of experience in their projects.

What AI tools were tested in the study?

Developers primarily used Cursor Pro with Claude 3.5 and 3.7 Sonnet. Cursor is one of the most popular and highly-valued AI coding tools on the market.

Why do developers believe they're faster when they're not?

The study suggests that coding with AI requires less cognitive effort, which feels like greater productivity. Developers enjoy interacting with AI even though it takes longer to complete tasks.

Should I stop using Cursor or Claude Code?

Not necessarily. The study suggests AI works better for certain tasks (boilerplate, prototypes, documentation) and worse for others (code in mature repositories where you're already an expert). The key is to measure your real productivity, not your perception.

Written by

Sarah Chen

Tech educator focused on AI tools. Making complex technology accessible since 2018.

Tags: artificial intelligence, productivity, cursor, claude, software development, studies
