2026-04-27
If you've used an AI coding agent (something like Cursor, Claude Code, or a similar tool that reads your codebase, plans changes, and writes code across multiple files), you've probably noticed it can burn through a lot of tokens. Tokens are the units that model providers bill by, and agentic workflows consume far more of them than a simple chat because the agent reads files, reasons, calls tools, reads more files, and revises in long loops, usually carrying a growing context forward at every step. This paper is the first systematic attempt to figure out exactly where all those tokens go.
The researchers analyzed execution traces from AI agents working on real coding tasks and asked three pointed questions:
The key insight is that agentic AI usage has a fundamentally different cost profile from conversational AI. A simple chatbot exchange might cost a fraction of a cent, while an agentic coding session tackling a hard bug can consume millions of tokens and run up a real bill. Understanding why helps both tool builders and users: tool builders can optimize their agent loops to reduce redundant context loading and unnecessary retries, and users and engineering managers can make better decisions about when to deploy an agent versus doing the work manually.
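To make the gap between the two cost profiles concrete, here is a rough back-of-the-envelope sketch in Python. It is not the paper's method. It assumes a simplified agent loop that re-sends its entire accumulated context as input on every step, with no prompt caching or context compaction. The function names, the token counts, and the per-million-token prices are all hypothetical placeholders, not any provider's actual rates.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    input_tokens: int = 0
    output_tokens: int = 0


def chat_usage(prompt_tokens: int, reply_tokens: int) -> Usage:
    """A single conversational exchange: one prompt in, one reply out."""
    return Usage(prompt_tokens, reply_tokens)


def agent_usage(system_tokens: int, steps: int,
                tool_result_tokens: int, reply_tokens: int) -> Usage:
    """Simplified agent loop (an assumption, not a real tool's behavior):
    every step re-sends the whole context, then the model's reply and the
    tool result are appended to the context for the next step."""
    usage = Usage()
    context = system_tokens
    for _ in range(steps):
        usage.input_tokens += context        # full context billed as input again
        usage.output_tokens += reply_tokens  # model output for this step
        context += reply_tokens + tool_result_tokens
    return usage


def cost_usd(usage: Usage, input_price_per_mtok: float,
             output_price_per_mtok: float) -> float:
    """Cost in USD given prices per million tokens."""
    total = (usage.input_tokens * input_price_per_mtok
             + usage.output_tokens * output_price_per_mtok)
    return total / 1_000_000


# Hypothetical placeholder prices, in USD per million tokens.
INPUT_PRICE = 3.0
OUTPUT_PRICE = 15.0

if __name__ == "__main__":
    chat = chat_usage(prompt_tokens=200, reply_tokens=300)
    agent = agent_usage(system_tokens=2_000, steps=50,
                        tool_result_tokens=3_000, reply_tokens=500)
    print("chat :", chat, cost_usd(chat, INPUT_PRICE, OUTPUT_PRICE))
    print("agent:", agent, cost_usd(agent, INPUT_PRICE, OUTPUT_PRICE))
```

With these made-up numbers, the single chat exchange costs about half a cent, while the 50-step agent run bills roughly 4.4 million input tokens and lands around 13.5 USD. Almost all of that comes from re-reading context rather than from generated output, which is why redundant context loading is an obvious optimization target. Real agents with caching or context trimming would come in lower.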
The prediction angle is especially practical: if a system could flag "this task will likely be expensive" before execution, teams could set budgets, choose cheaper models for simpler tasks, or restructure a request to reduce cost, much as you'd get an estimate before hiring a contractor.
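As an illustration of how such a flag might be used, the sketch below shows a minimal pre-execution budget gate in Python. It assumes a predicted token count is already available from some estimator, which is left out here. The `gate` function, the `Decision` names, and the thresholds are hypothetical and are not taken from the paper.

```python
from enum import Enum


class Decision(Enum):
    RUN_CHEAP_MODEL = "run on a cheaper model"
    RUN_DEFAULT_MODEL = "run on the default model"
    NEEDS_REVIEW = "over budget: restructure the request or get approval"


def gate(predicted_tokens: int, budget_tokens: int,
         simple_task_threshold: int) -> Decision:
    """Decide what to do with a task before running it.

    predicted_tokens comes from a hypothetical estimator; both thresholds
    are team-chosen values, not recommendations.
    """
    if predicted_tokens > budget_tokens:
        return Decision.NEEDS_REVIEW       # flagged as likely expensive
    if predicted_tokens <= simple_task_threshold:
        return Decision.RUN_CHEAP_MODEL    # simple task, cheaper model suffices
    return Decision.RUN_DEFAULT_MODEL


if __name__ == "__main__":
    for estimate in (5_000, 200_000, 4_000_000):
        decision = gate(estimate, budget_tokens=1_000_000,
                        simple_task_threshold=50_000)
        print(f"{estimate:>9} tokens -> {decision.value}")
```

The gate is deliberately trivial. The hard part is the estimate itself, and a gate like this is only as useful as that prediction is accurate.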
