How Much Does It Cost to Run an AI Agent?

The honest answer is: it depends on the workflow, not the agent label.

A daily research agent, a coding agent, an inbox triage agent, and a customer-support agent can all use the same model and still have very different monthly bills. The difference comes from token volume, retries, tool calls, context length, and how much of the prompt can be cached.

Use the LLM cost calculator to plug in your own scenario. It reads from TLDL's verified pricing API, so the calculator and pricing pages use the same source data.

The Formula

At the simplest level:

monthly cost =
  (monthly input tokens / 1,000,000 * input price)
+ (monthly output tokens / 1,000,000 * output price)
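As a minimal sketch, the base formula can be written as one small function. The function name, the workload, and the prices below are placeholders for illustration, not real provider rates.

```python
def monthly_cost(input_tokens: float, output_tokens: float,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Base monthly cost; prices are in dollars per 1,000,000 tokens."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


# Hypothetical workload and placeholder prices (not real provider rates).
example = monthly_cost(
    input_tokens=50_000_000,
    output_tokens=5_000_000,
    input_price_per_m=1.00,
    output_price_per_m=4.00,
)
print(f"${example:.2f} per month")
```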

That formula is useful but incomplete. In a real agent, four factors determine the token counts and effective prices that go into it:

Context size: The instructions, retrieved documents, previous messages, tool outputs, and state you include on every run.
Run frequency: How often the agent wakes up, receives a task, retries, or performs background checks.
Model mix: Whether every step uses the same model or cheap routing handles simple work.
Cache hit rate: Whether stable prompt sections, long system instructions, or repeated context qualify for cached-input pricing.

That is why two teams can both say "we run an AI agent" and mean very different cost profiles.

Start With Workflows, Not Models

Before choosing a model, write down the agent loop:

Question	Why it matters
How many tasks run per month?	Frequency drives total usage more than any single call.
How many model calls happen per task?	Agents often plan, call tools, inspect results, then answer.
How large is the average input?	Long context and retrieved docs usually dominate cost.
How large is the average output?	Reports, code diffs, summaries, and emails can be output-heavy.
What percent of input is cacheable?	Stable instructions and recurring context can change the economics.
Which calls need the strongest model?	Routing simple steps to cheaper models can matter more than provider choice.
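One way to keep the answers honest is to record them in a single structure before touching any pricing. This is only a sketch: the class name, field names, and numbers are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class AgentWorkload:
    """Answers to the workflow questions; all field names are illustrative."""
    tasks_per_month: int
    calls_per_task: float
    avg_input_tokens: int
    avg_output_tokens: int
    cacheable_input_fraction: float    # 0.0 to 1.0
    strong_model_call_fraction: float  # share of calls that need the strongest model

    def monthly_calls(self) -> float:
        """Total model calls per month across all tasks."""
        return self.tasks_per_month * self.calls_per_task


# Hypothetical numbers for a small agent.
workload = AgentWorkload(
    tasks_per_month=3_000,
    calls_per_task=4,
    avg_input_tokens=6_000,
    avg_output_tokens=400,
    cacheable_input_fraction=0.5,
    strong_model_call_fraction=0.25,
)
print(workload.monthly_calls())
```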

Once you have rough numbers, use the calculator instead of guessing.

A Practical Estimation Method

1. Count agent runs

Start with monthly runs:

monthly runs = daily runs * active days

If the agent runs on demand, estimate tasks instead:

monthly tasks = active users * tasks per user per month

Do not base the estimate on the best day or the worst day. Measure a normal week, then scale it to a month.
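The run counts above can be sketched as three small helpers. The function names and sample numbers are hypothetical, and the 52/12 weeks-per-month factor is an assumption you may want to replace with your own calendar.

```python
def monthly_runs_scheduled(daily_runs: float, active_days: int) -> float:
    """Scheduled agent: runs per day times active days in the month."""
    return daily_runs * active_days


def monthly_tasks_on_demand(active_users: int, tasks_per_user_per_month: float) -> float:
    """On-demand agent: users times tasks per user per month."""
    return active_users * tasks_per_user_per_month


def monthly_runs_from_week(runs_in_normal_week: float,
                           weeks_per_month: float = 52 / 12) -> float:
    """Scale a typical week to a month (52/12 weeks per month is an assumption)."""
    return runs_in_normal_week * weeks_per_month


# Hypothetical examples.
scheduled = monthly_runs_scheduled(daily_runs=24, active_days=22)
on_demand = monthly_tasks_on_demand(active_users=200, tasks_per_user_per_month=15)
from_week = monthly_runs_from_week(runs_in_normal_week=120)
print(scheduled, on_demand, round(from_week))
```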

2. Count model calls per run

Many agents do more than one model call:

planning call
retrieval or tool-selection call
one or more tool-result interpretation calls
final answer call
retry or repair call

If you only estimate the final answer, you will undercount.
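A rough way to see the undercount is to add up the average number of calls each step contributes per run. The step names mirror the list above; the averages are made-up values for illustration.

```python
# Hypothetical average number of model calls per run for each step.
calls_per_step = {
    "planning": 1.0,
    "tool_selection": 1.0,
    "tool_result_interpretation": 2.5,
    "final_answer": 1.0,
    "retry_or_repair": 0.3,  # e.g. 30% of runs need one repair call
}

calls_per_run = sum(calls_per_step.values())

# How far off the estimate is if only the final answer call is counted.
undercount_factor = calls_per_run / calls_per_step["final_answer"]

print(f"{calls_per_run:.1f} calls per run, {undercount_factor:.1f}x the final-answer-only estimate")
```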

3. Split input and output

Input and output are priced separately by most providers. Track them separately:

monthly input tokens = runs * calls per run * average input tokens
monthly output tokens = runs * calls per run * average output tokens

The split matters most when the workload is lopsided: monitoring agents read a lot and write little, while report generators write a lot.
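The two token formulas can be sketched as one helper that returns both totals. The two workloads below are hypothetical and only meant to show how different the input-to-output balance can be.

```python
def monthly_tokens(runs: int, calls_per_run: int,
                   avg_input_tokens: int, avg_output_tokens: int) -> tuple[int, int]:
    """Return (monthly input tokens, monthly output tokens)."""
    calls = runs * calls_per_run
    return calls * avg_input_tokens, calls * avg_output_tokens


# Hypothetical read-heavy monitoring agent.
monitor_in, monitor_out = monthly_tokens(10_000, 2, 8_000, 100)

# Hypothetical write-heavy report generator.
report_in, report_out = monthly_tokens(500, 4, 3_000, 2_500)

print(monitor_in, monitor_out)
print(report_in, report_out)
```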

4. Estimate cacheable input

Stable prompt sections are often repeated:

system instructions
policy and style guides
tool schemas
fixed project context
recurring customer or account context

If a provider supports cached-input pricing for the model you use, estimate what percent of input is stable enough to cache. The LLM cost calculator includes a cache hit-rate field for this reason.
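A simplified sketch of how a cache hit rate changes input cost is shown here. It assumes a single discounted price for cached input and ignores any cache-write surcharge, expiry window, or minimum cacheable length a provider may apply. The prices are placeholders.

```python
def input_cost_with_cache(input_tokens: float, cache_hit_rate: float,
                          input_price_per_m: float, cached_price_per_m: float) -> float:
    """Monthly input cost when a fraction of input is billed at a cached rate."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    cached = input_tokens * cache_hit_rate
    uncached = input_tokens - cached
    return (uncached * input_price_per_m + cached * cached_price_per_m) / 1_000_000


# Hypothetical: 100M input tokens, placeholder prices per 1M tokens.
without_cache = input_cost_with_cache(100_000_000, 0.0, 2.00, 0.20)
with_cache = input_cost_with_cache(100_000_000, 0.6, 2.00, 0.20)
print(f"${without_cache:.2f} without caching, ${with_cache:.2f} at a 60% hit rate")
```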

5. Model expensive steps separately

Do not average everything into one model if the workflow has distinct steps.

A common pattern:

cheap or fast model for classification and routing
stronger model for reasoning, coding, or final synthesis
specialized model for embeddings, search, or reranking

Calculate the expensive step separately. Then calculate the cheap steps. Add them.
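As a sketch of that per-step calculation, each step can carry its own token volume and prices. The step names, volumes, and prices are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One workflow step with its own model pricing (dollars per 1M tokens)."""
    name: str
    input_tokens: int
    output_tokens: int
    input_price_per_m: float
    output_price_per_m: float

    def cost(self) -> float:
        return (self.input_tokens * self.input_price_per_m
                + self.output_tokens * self.output_price_per_m) / 1_000_000


# Hypothetical monthly volumes and placeholder prices.
steps = [
    Step("routing (cheap model)", 40_000_000, 1_000_000, 0.10, 0.40),
    Step("synthesis (strong model)", 10_000_000, 2_000_000, 3.00, 15.00),
    Step("embeddings (specialized model)", 20_000_000, 0, 0.02, 0.0),
]

for step in steps:
    print(f"{step.name}: ${step.cost():.2f}")

total = sum(step.cost() for step in steps)
print(f"total: ${total:.2f}")
```

In this made-up mix the strong model handles the smallest share of tokens and still accounts for most of the bill, which is why averaging the steps together hides where the money goes.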

Example: Inbox Triage Agent

An inbox agent usually has high input and modest output.

The input includes email text, sender metadata, labels, previous thread context, and user preferences. The output may be a short decision: archive, reply later, draft response, escalate, or ignore.

Cost drivers:

number of messages checked per month
whether every message uses an LLM or only uncertain cases do
how much thread history is included
whether the same user preferences are cached
whether drafts are generated for every email or only selected emails

This is a good candidate for routing. A small model can classify obvious messages, while a stronger model handles ambiguous threads and reply drafts.
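To make the routing argument concrete, here is a rough comparison of a routed setup against sending every message to the stronger model. Message counts, token sizes, the escalation rate, and prices are all assumed values.

```python
def call_cost(input_tokens: int, output_tokens: int,
              input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost of one model call; prices are dollars per 1M tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


messages = 20_000        # hypothetical messages checked per month
escalation_rate = 0.2    # hypothetical share the small model cannot decide

# Small model: short classification of every message (placeholder prices).
small_call = call_cost(1_500, 20, 0.10, 0.40)
# Strong model: more thread history in, a reply draft out (placeholder prices).
strong_call = call_cost(4_000, 300, 3.00, 15.00)

routed = messages * small_call + messages * escalation_rate * strong_call
all_strong = messages * strong_call

print(f"routed: ${routed:.2f}, strong model for everything: ${all_strong:.2f}")
```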

Example: Research Agent

A research agent often sends more context per call and makes more tool calls.

It may search, read several documents, summarize sources, compare claims, and produce a final memo. The final memo might be short, but the intermediate input can be large.

Cost drivers:

number of sources read
whether full documents or excerpts are passed to the model
number of synthesis passes
whether retrieved context is deduplicated
whether the final answer needs citations or structured output

The fastest way to reduce cost is usually not switching providers. It is sending less irrelevant context.
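One cheap way to send less context is to drop duplicate retrieved chunks before they reach the model. The sketch below deduplicates on normalized text and uses a very rough 4-characters-per-token estimate, which is an assumption. Use your provider's tokenizer for real numbers.

```python
import hashlib


def dedupe_chunks(chunks: list[str]) -> list[str]:
    """Keep the first copy of each chunk, ignoring case and extra whitespace."""
    seen: set[str] = set()
    unique: list[str] = []
    for chunk in chunks:
        normalized = " ".join(chunk.lower().split())
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(chunk)
    return unique


def rough_tokens(text: str) -> int:
    """Crude estimate: about 4 characters per token (assumption)."""
    return max(1, len(text) // 4)


# Hypothetical retrieved chunks; the first two are the same passage.
chunks = [
    "Pricing changed in March.",
    "pricing   changed in  march.",
    "Latency improved after the update.",
]
unique = dedupe_chunks(chunks)
tokens_before = sum(rough_tokens(c) for c in chunks)
tokens_after = sum(rough_tokens(c) for c in unique)
print(len(chunks), "->", len(unique), "chunks;", tokens_before, "->", tokens_after, "tokens (rough)")
```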

Example: Coding Agent

A coding agent can be output-heavy and retry-heavy.

It reads files, reasons about a change, writes code, runs tests, reads errors, and patches again. Every loop adds tokens.

Cost drivers:

repository context included per task
number of files read
test and error output size
number of repair loops
whether diffs are small or whole files are regenerated
whether the strongest model is used for every edit

For coding workflows, reliability can be cheaper than raw token price. A low-price model that needs repeated repair loops can cost more in total than a stronger model that finishes cleanly.
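A small worked comparison shows how this can happen. Token sizes per loop, loop counts, and prices are assumed for illustration. With different numbers the cheaper model can still win, so measure your own repair rate.

```python
def loop_cost(input_tokens: int, output_tokens: int,
              input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost of one edit-test-repair loop; prices are dollars per 1M tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Each loop re-reads about 20k tokens of context and writes about 2k tokens (hypothetical).
cheap_loop = loop_cost(20_000, 2_000, 1.00, 4.00)     # placeholder prices
strong_loop = loop_cost(20_000, 2_000, 3.00, 12.00)   # placeholder prices

cheap_total = cheap_loop * 5      # hypothetical: 5 loops on average to finish
strong_total = strong_loop * 1.5  # hypothetical: 1.5 loops on average to finish

# Average loop count at which the cheap model costs the same as the strong one.
break_even_loops = strong_total / cheap_loop

print(f"cheap model: ${cheap_total:.3f} per task, strong model: ${strong_total:.3f} per task")
print(f"break-even for the cheap model: {break_even_loops:.1f} loops")
```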

What to Measure in Production

Track these numbers per task type:

input tokens
cached input tokens
output tokens
model name
retries
tool calls
latency
success rate
human correction rate

Token cost alone is not enough. A cheap run that produces unusable work is not cheap.
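One way to capture this is to log a record per task and divide total spend by the tasks that were actually usable. The record schema, the model name, and the log entries below are illustrative assumptions, not a standard format.

```python
from dataclasses import dataclass


@dataclass
class TaskRecord:
    """One logged task; field names are illustrative, not a standard schema."""
    task_type: str
    model: str
    input_tokens: int
    cached_input_tokens: int
    output_tokens: int
    retries: int
    tool_calls: int
    latency_s: float
    success: bool
    human_corrected: bool
    cost_usd: float


def cost_per_usable_task(records: list[TaskRecord]) -> float:
    """Total spend divided by tasks that succeeded without human correction."""
    total = sum(r.cost_usd for r in records)
    usable = sum(1 for r in records if r.success and not r.human_corrected)
    return total / usable if usable else float("inf")


# Hypothetical log: four triage tasks at $0.01 each, two of them usable as-is.
records = [
    TaskRecord("triage", "model-a", 1500, 1000, 20, 0, 1, 0.8, True, False, 0.01),
    TaskRecord("triage", "model-a", 1500, 1000, 20, 1, 1, 1.5, True, True, 0.01),
    TaskRecord("triage", "model-a", 1500, 1000, 20, 2, 2, 2.9, False, False, 0.01),
    TaskRecord("triage", "model-a", 1500, 1000, 20, 0, 1, 0.7, True, False, 0.01),
]
raw_cost_per_task = sum(r.cost_usd for r in records) / len(records)
usable_cost_per_task = cost_per_usable_task(records)
print(f"raw: ${raw_cost_per_task:.3f}, per usable task: ${usable_cost_per_task:.3f}")
```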

The Useful Rule

For most teams, the monthly cost is not mysterious. It is just hidden because the workload has not been written down.

Start with:

monthly runs
model calls per run
average input tokens
average output tokens
cache hit rate
model mix

Then calculate the bill.
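Put together, the six inputs are enough for a back-of-the-envelope estimate. The sketch below assumes every model in the mix sees the same average token sizes and cache hit rate, which is a simplification. The model labels, call shares, and prices are placeholders.

```python
def estimate_monthly_bill(runs: int, calls_per_run: float,
                          avg_input_tokens: int, avg_output_tokens: int,
                          cache_hit_rate: float,
                          model_mix: list[dict]) -> float:
    """Estimate the monthly bill in dollars.

    Each model_mix entry has: share (fraction of calls), input, cached_input,
    and output prices in dollars per 1M tokens. Shares should sum to 1.
    """
    calls = runs * calls_per_run
    total = 0.0
    for model in model_mix:
        model_calls = calls * model["share"]
        input_tokens = model_calls * avg_input_tokens
        output_tokens = model_calls * avg_output_tokens
        cached = input_tokens * cache_hit_rate
        uncached = input_tokens - cached
        total += (uncached * model["input"]
                  + cached * model["cached_input"]
                  + output_tokens * model["output"]) / 1_000_000
    return total


# Hypothetical workload with placeholder prices (not real provider rates).
bill = estimate_monthly_bill(
    runs=3_000, calls_per_run=4,
    avg_input_tokens=6_000, avg_output_tokens=400,
    cache_hit_rate=0.5,
    model_mix=[
        {"share": 0.75, "input": 0.10, "cached_input": 0.01, "output": 0.40},
        {"share": 0.25, "input": 3.00, "cached_input": 0.30, "output": 15.00},
    ],
)
print(f"${bill:.2f} per month")
```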

Use the LLM cost calculator for the arithmetic, and check the LLM API pricing comparison when you need the underlying source data and verification dates.
