ArXiv Paper Digest: How Do AI Agents Spend Your Money? Analyzing and Predicting Token Consumption in Agentic Coding Tasks


2026-04-27

Authors: Longju Bai, Zhemin Huang, Xingyao Wang, Jiao Sun

If you've used an AI coding agent (Cursor, Claude Code, or a similar tool that reads your codebase, plans changes, and writes code across multiple files), you've probably noticed it can burn through a lot of tokens. Tokens are the chunks of text a language model reads and writes, and model providers bill usage per token. Agentic workflows consume far more of them than a simple chat because the agent runs in long loops: it reads files, reasons, calls tools, reads more files, and revises. This paper is the first systematic attempt to figure out exactly where all those tokens go.

The researchers analyzed execution traces from AI agents working on real coding tasks and asked three pointed questions:

Where do the tokens actually go? They broke down consumption by category — reading code context, generating plans, writing code, handling tool calls, processing error messages, and retrying failed attempts. The picture that emerges is that a surprisingly large share of tokens isn't spent on the "useful" part (writing code) but on context-gathering and recovery from mistakes.
Which models are more token-efficient? Not all language models burn tokens at the same rate for the same task. The study compares models to see which ones get the job done with less waste — an important practical consideration when the bill arrives.
Can we predict token usage before a task starts? This is the most ambitious part. The authors explore whether you can look at a task description and estimate how many tokens an agent will need, much like a contractor giving a quote before starting work. Early results suggest certain task features — complexity, number of files involved, ambiguity of requirements — are reasonable predictors.
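To make the prediction question concrete, here is a minimal sketch of one way to turn task features into a token estimate: an ordinary least-squares fit, solved with the normal equations in plain Python. It is an illustration under assumptions, not the paper's method. The feature encoding (a 1-5 complexity score, a count of files touched, a 0-3 ambiguity score) and the training rows are hypothetical, generated from a made-up linear rule so the fit can be checked.

```python
from typing import List, Sequence

# Hypothetical training data: [complexity (1-5), files_touched, ambiguity (0-3)].
# Targets come from a made-up rule, tokens = 50k + 40k*c + 15k*f + 30k*a,
# so the fit below should recover those coefficients.
TRAIN_X = [[1, 1, 0], [2, 3, 1], [3, 2, 2], [4, 6, 1], [5, 8, 3], [2, 1, 0], [3, 4, 3]]
TRAIN_Y = [105_000, 205_000, 260_000, 330_000, 460_000, 145_000, 320_000]


def fit_ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Return [intercept, w1, ..., wk] minimizing squared error, via (X^T X) w = X^T y."""
    rows = [[1.0, *map(float, x)] for x in X]  # prepend an intercept column
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(A[i][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        if abs(p) < 1e-12:
            raise ValueError("features are linearly dependent")
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Estimated token count for one task's feature vector."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


if __name__ == "__main__":
    w = fit_ols(TRAIN_X, TRAIN_Y)
    print("weights:", [round(v) for v in w])
    print("estimate for (3, 5, 2):", round(predict(w, [3, 5, 2])))
```

A linear fit is only the simplest baseline; a real predictor would be trained on logged agent runs and might need a log transform or a more flexible model to cope with the occasional runaway session.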

The key insight is that agentic AI usage has a fundamentally different cost profile than conversational AI. A simple chatbot exchange might cost fractions of a cent. An agentic coding session tackling a hard bug can consume millions of tokens, running up real bills. Understanding why helps both tool builders and users. Tool builders can optimize their agent loops to reduce redundant context loading and unnecessary retries. Users and engineering managers can make better decisions about when to deploy an agent versus doing the work manually.
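Where the money goes in a single run can be tallied with a few lines of code. The sketch below assumes a hypothetical trace format in which each agent step is tagged with one of the categories listed earlier and records its input and output tokens. The per-million-token prices and the example trace are invented for illustration and are not numbers from the paper. Input and output are priced separately because many model APIs bill them at different rates.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical prices in USD per million tokens; real prices depend on provider and model.
PRICE_IN_PER_M = 3.00
PRICE_OUT_PER_M = 15.00


@dataclass
class Step:
    category: str       # e.g. "context", "planning", "code", "error", "retry"
    input_tokens: int   # tokens sent to the model on this step (including re-sent context)
    output_tokens: int  # tokens the model generated on this step


def step_cost(step: Step) -> float:
    """Dollar cost of one model call under the hypothetical prices above."""
    return (step.input_tokens * PRICE_IN_PER_M + step.output_tokens * PRICE_OUT_PER_M) / 1_000_000


def cost_by_category(trace: List[Step]) -> Dict[str, float]:
    """Sum the dollar cost of every step, grouped by its category label."""
    totals: Dict[str, float] = defaultdict(float)
    for step in trace:
        totals[step.category] += step_cost(step)
    return dict(totals)


# Invented example run, only to exercise the code.
EXAMPLE_TRACE = [
    Step("context", 200_000, 2_000),
    Step("planning", 50_000, 4_000),
    Step("code", 60_000, 8_000),
    Step("error", 120_000, 1_000),
    Step("retry", 150_000, 6_000),
]

if __name__ == "__main__":
    totals = cost_by_category(EXAMPLE_TRACE)
    grand_total = sum(totals.values())
    for category, cost in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{category:<10} ${cost:.3f}  ({cost / grand_total:.0%})")
    print(f"{'total':<10} ${grand_total:.3f}")
```

Because an agent loop typically re-sends the accumulated conversation as input on every step, input tokens can dominate the bill even when the model writes little, which is one reason redundant context loading is a natural target for optimization.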

The prediction angle is especially practical: if a system could flag "this task will likely be expensive" before execution, teams could set budgets, choose cheaper models for simpler tasks, or restructure a request to reduce cost before spending a single token on it.
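A pre-execution check along those lines might look like the following sketch. Everything in it is an assumption for illustration: `estimate_tokens` is a hypothetical stand-in for whatever predictor a team actually has, and the model tier names, blended per-million-token prices, complexity cutoffs, and budget are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelTier:
    name: str                # hypothetical model label
    usd_per_m_tokens: float  # hypothetical blended price per million tokens
    max_complexity: int      # highest task complexity this tier is trusted with (made-up policy)


# Hypothetical tiers, ordered cheapest first.
TIERS = [
    ModelTier("small-model", 0.50, 2),
    ModelTier("large-model", 5.00, 5),
]


@dataclass
class Decision:
    model: Optional[str]
    est_tokens: int
    est_cost_usd: float
    approved: bool  # False means "flag for review before running"


def estimate_tokens(complexity: int, files_touched: int) -> int:
    """Placeholder predictor with made-up coefficients."""
    return 50_000 + 40_000 * complexity + 15_000 * files_touched


def plan_run(complexity: int, files_touched: int, budget_usd: float) -> Decision:
    tokens = estimate_tokens(complexity, files_touched)
    # Route to the cheapest tier allowed to handle this complexity.
    tier = next((t for t in TIERS if complexity <= t.max_complexity), None)
    if tier is None:
        return Decision(None, tokens, float("inf"), False)
    cost = tokens * tier.usd_per_m_tokens / 1_000_000
    return Decision(tier.name, tokens, cost, cost <= budget_usd)


if __name__ == "__main__":
    for task in [(1, 2), (4, 6)]:
        print(task, plan_run(*task, budget_usd=1.00))
```

Routing to the cheapest tier trusted with the task, then comparing the estimate against a budget, maps directly onto two of the levers above: cheaper models for simpler tasks and per-task budget limits.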

Why it matters: As AI coding agents move from novelty to daily tooling, understanding and predicting their token costs is essential for making them economically sustainable at scale.
