Paul's setup is simpler than you'd expect. He uses one tool: Cursor.
No stack of five AI assistants. No rotating between tools depending on the task. Just Cursor, running OpenAI's Codex models, open in his editor all day.
Paul, Web Developer at hosting.com:
I only use AI in work. I generally try and use the Codex models. I feel they are better for coding, and I don't really like how Anthropic is run as a company, and they close source everything.
You won't read that opinion in most tool comparison articles. But it's the kind of honest preference that shapes real tool choices. Developers don't just pick tools on benchmarks. They pick them on trust, values, and what fits their workflow. The feature Paul uses most isn't code generation:
Recently I have been using Plan Mode a lot, especially if something we've been asked to do is complex and risky. That way we can write down a lot of our reasoning for going down a path, and also highlighting risks.
The biggest value Paul gets from his AI coding tool isn't the AI writing code. It's the AI helping him think through complex changes before touching anything. Planning, risk assessment, documenting reasoning. The stuff that happens before a single line gets written.
We currently don't have any AI rules set up in an Agent.md or in Cursor rules, so I do rewrite a lot of stuff to get more into our way, but this happens over many iterations.
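If a team did want to codify that "our way" so the AI picks it up, Cursor reads project-level rules from a `.cursorrules` file or files under `.cursor/rules/`, and many agents read an `AGENTS.md`. The conventions below are invented for illustration, not Paul's actual house style:

```markdown
# AGENTS.md — example house-style rules (illustrative only)

- Prefer CSS custom properties over inline styles.
- Every new component gets a matching test file.
- Never touch files under /legacy without flagging it in the plan first.
- Match existing naming: kebab-case filenames, camelCase functions.
```

Rules like these front-load the rewriting Paul describes: instead of correcting the same stylistic drift over many iterations, you state it once and let the tool apply it on every request.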
He's not unusual. 66% of developers told Stack Overflow that "almost right, but not quite" is their single biggest frustration with AI code. Close enough to look useful. Wrong enough to cost you time fixing it.
We followed up and asked: "Has AI-generated code ever introduced a bug that made it past review?"
Not yet. We're quite thorough with testing. Edge cases can obviously slip through, but most of our work is on static websites. So most code issues can be easily and quickly resolved even when it's a manually coded bug.
Not a blanket endorsement of AI reliability. Just an observation that good testing practices still catch problems regardless of where the code came from. As you'll see later, the research backs him up on this.
What developers have quietly dropped
Remember vibe coding? Describing an app in plain English and letting AI build the whole thing? Great demo. Less great in production, because nobody on the team can read or maintain what comes out. Stack Overflow found nearly 77% of respondents said vibe coding is not part of their professional development work.
Autonomous agents haven't landed either.
Only 31% of developers currently use them, and 38% say they've got no plans to. The pitch is compelling: an AI that plans, executes, and ships without you hovering over it. The reality is that most developers still want to see what's happening before it goes anywhere near production.
It seems that while developers have adopted the assistive features (autocomplete, chat, inline suggestions), they're skeptical of the autonomous ones, or rejecting them outright.
There are dozens of AI development tools available right now. However, here are the ones we think are worth knowing about, with honest assessments rather than just bullet point feature lists. If you're looking for broader trends beyond AI tooling, we covered the full web development landscape for 2026 separately.
Cursor
Cursor has an interesting history: it took VS Code, forked it, and rebuilt the whole thing around AI. Not a plugin. Not an extension. A full IDE where AI is part of every interaction:
Tab completions that predict your next edit
Inline edits triggered by natural language
Composer for multi-file changes in a single prompt
Background agents that can run tasks autonomously
Plan Mode for mapping out complex changes before writing code
The growth has been rapid, with Cursor becoming one of the most talked about AI coding tools in a short space of time. It's earned a strong reputation among developers who want deep AI integration in their editor.
Paul runs Codex models within Cursor rather than the default Anthropic models. He's also aware of the standalone Codex CLI tool but prefers staying in his editor: "I've heard good things about the Codex harness over Claude Code, so maybe if I was to use one of those I'd go that way, but I still prefer to be in my editor, so I'm sticking with Cursor for now."
The catch? In June 2025, Cursor moved from a simple "500 fast requests per month" model to a credit-based system. Costs now vary depending on which AI model your request touches and how complex it is. Cursor's own guidance suggests that daily tab completion users usually stay within the included usage, but daily agent users often land around $60 to $100 per month in total usage. Power users regularly blow past the $20/month plan before the month ends.
I use Cursor at work and it helps me a lot, especially Plan Mode. It's not just the writing of code, Plan Mode does help with planning out tasks and understanding risks of making changes, so this has been a big help for me.
GitHub Copilot
Copilot remains the most widely adopted AI coding assistant, and what started as a glorified autocomplete has grown into something more interesting:
Inline code suggestions across all major IDEs
Copilot Chat for natural language questions about your code
Agent Mode, now available to paid users in VS Code
MCP support, rolled out separately to VS Code users
GitHub's coding agent, available to paid Copilot users for autonomous task handling
Workspace-level context awareness
A free tier that's actually useful (2,000 completions + 50 chat or agent requests per month)
At $10 a month for the Pro plan, Copilot is the best value entry point in this space. Nothing else comes close at that price. Agent Mode is still playing catch-up to Cursor on raw capability, but for most developers who just want solid autocomplete and the ability to ask questions about their codebase, it does the job. Overages on premium requests cost $0.04 each, so at least the pricing surprises stay small.
Windsurf
Cognition AI (the team behind Devin) agreed to acquire Windsurf in July 2025, though the financial terms were not publicly disclosed. The feature that sets it apart is Cascade, which blends the chat interface and the editor into one flow. The AI reads your entire codebase, plans changes, explains its reasoning, and executes while you steer.
It sits at #1 in the LogRocket AI Dev Tool Power Rankings as of February 2026. Windsurf's public pricing now lists Pro at $20/month and Teams at $40/user/month, with a March 2026 move to new self-serve usage-based plans.
Good first AI coding tool if you're getting started with AI-assisted development. The model selection is more restricted than Cursor's, custom rules are more basic, and the community is smaller than Cursor's or Copilot's. The Cognition acquisition also raises questions about long-term direction that nobody can answer yet.
v0 by Vercel
Different category to the others. v0 generates React and Next.js components from natural language prompts. It rebranded from v0.dev to v0.app in August 2025, and its February 2026 update added a new sandbox runtime, Git panel, and database integrations.
The output quality is good. Really good. Components look like they were built by a senior frontend developer with good design taste: proper architecture, Tailwind utilities, shadcn/ui patterns. You can drop them straight into a Next.js project. Vercel now presents v0 as handling UI, backend logic, and team collaboration, so it's broader than the original frontend-only tool.
The limitation: a full-stack generation can burn through your $20/month credits in a handful of prompts. Brilliant for prototyping. Less suited as a primary production workflow.
CodeRabbit (AI code review)
CodeRabbit is an AI-first pull request reviewer. Line-by-line feedback. Most-installed AI app on GitHub. Running on over 2 million repositories. Free for public and open-source repos, with a broader free plan also available. Paid plans start at $24/dev/month (billed annually) or $30 month-to-month.
Not sure we can use AI Code Review because we self-host GitLab behind IP restrictions, not sure our infrastructure team would allow that yet. A couple of YouTubers that I follow rave about CodeRabbit, especially as it's free if your GitHub repo is public and open source.
Worth noting: this isn't just Paul being cautious. Plenty of agencies and in-house teams operate under security accreditations that dictate exactly what software can touch the codebase, and AI tools are still being assessed against those frameworks.
Useful as a first-pass reviewer, especially for teams where PRs sit in queues. Business logic review has limitations, since AI review tools work primarily from the diff context, and self-hosted Git behind network restrictions remains a genuine blocker that cloud-based review tools can't easily reach.
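For teams that can use it, CodeRabbit is configured per repository via a `.coderabbit.yaml` file. The sketch below shows the general shape; the path pattern and instruction text are made up, and key names should be checked against CodeRabbit's current schema:

```yaml
# .coderabbit.yaml — illustrative sketch, verify keys against current docs
reviews:
  auto_review:
    enabled: true          # review every new pull request automatically
  path_instructions:
    - path: "src/**/*.ts"  # hypothetical path pattern
      instructions: "Flag any direct DOM access; we go through our wrapper."
```

Per-path instructions are the part worth exploring first, since they let you encode the house rules a human reviewer would otherwise repeat on every PR.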
Claude Code
Claude Code scores near the top of benchmarks (80.9% on SWE-bench Verified using Claude Opus 4.5) and is used by about 10% of developers among the newly tracked AI-enabled IDE tools in Stack Overflow's survey, but Paul avoids it on principle. Its closest rival, Codex CLI, is a terminal-based agent, though Codex also exists via web, app, and IDE routes. Both are legitimate tools, but neither fits how Paul prefers to work.
Pricing comparison: what you'll actually pay
Flat monthly fees are mostly gone. Credits, tokens, request-based billing. Here's what you're actually looking at as of March 2026:
| Tool | Free tier | Individual | Team | Best for |
| --- | --- | --- | --- | --- |
| Cursor | Hobby (limited) | $20/mo (Pro) | $40/user/mo | Daily driver for VS Code users |
| GitHub Copilot | 2,000 completions + 50 chat/agent requests | $10/mo (Pro) | $19/user/mo | Best value entry point |
| Windsurf | Limited free tier | $20/mo (Pro) | $40/user/mo | AI beginners, guided workflow |
| v0 by Vercel | $5/mo in credits | $20/mo (Premium) | $30/user/mo | UI prototyping, React/Next.js |
| CodeRabbit | Free (public repos + broader free plan) | $24/dev/mo (annual) | Custom | Code review automation |
Sources: Cursor, GitHub Copilot, Windsurf, v0, CodeRabbit. All prices verified March 2026.
Watch the real cost, not the sticker price. Two years ago, $20/month meant $20/month. Now that same plan uses credits that drain faster depending on which AI model your request touches. Cursor's own documentation notes that heavy agent users often end up at $60 to $100 per month. Multiply that across a team of five or ten developers and budgeting becomes a proper headache.
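To make the budgeting point concrete, here's a back-of-envelope calculation using the article's own figures (the $80 is simply the midpoint of Cursor's $60 to $100 guidance, not a quoted price):

```python
def monthly_team_cost(devs: int, per_dev: float) -> float:
    """Total monthly spend for a team at a given per-developer cost."""
    return devs * per_dev

# What the $20/mo sticker price implies for a five-person team:
sticker = monthly_team_cost(5, 20)
# What heavy agent usage actually costs at the midpoint of $60-$100:
realistic = monthly_team_cost(5, 80)

print(f"sticker: ${sticker:.0f}/mo, realistic: ${realistic:.0f}/mo")
```

A 4x gap between the sticker price and realistic usage is exactly why running a full month of real usage before committing is worth the wait.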
For freelancers, it's annoying. For agencies and teams trying to budget across multiple developers, it's worth running your actual usage for a full month before committing.