How Much Does It Cost to Run an AI Agent?

By TLDL

A practical method for estimating AI agent API costs from token volume, tool calls, retries, cache hit rate, and model choice.

The honest answer is: it depends on the workflow, not the agent label.

A daily research agent, a coding agent, an inbox triage agent, and a customer-support agent can all use the same model and still have very different monthly bills. The difference comes from token volume, retries, tool calls, context length, and how much of the prompt can be cached.

Use the LLM cost calculator to plug in your own scenario. It reads from TLDL's verified pricing API, so the calculator and pricing pages use the same source data.

The Formula

At the simplest level:

monthly cost =
  (monthly input tokens / 1,000,000 * input price)
+ (monthly output tokens / 1,000,000 * output price)
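
As a minimal sketch, the same formula can be written as a small Python function. The function name, the token volumes, and the prices below are hypothetical placeholders, not figures from any provider.

```python
def monthly_cost(input_tokens: float, output_tokens: float,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Base monthly cost in dollars; prices are per 1,000,000 tokens."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


# Hypothetical workload and prices, for illustration only.
base_cost = monthly_cost(
    input_tokens=50_000_000,
    output_tokens=5_000_000,
    input_price_per_m=3.00,
    output_price_per_m=15.00,
)
print(f"${base_cost:.2f}")
```

With these placeholder numbers, input accounts for two thirds of the total even though the output price is five times higher, because the input volume is ten times larger.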

That formula is useful, but incomplete. Real agents add four factors that change the token volume or the effective price:

  1. Context size: The instructions, retrieved documents, previous messages, tool outputs, and state you include on every run.
  2. Run frequency: How often the agent wakes up, receives a task, retries, or performs background checks.
  3. Model mix: Whether every step uses the same model or cheap routing handles simple work.
  4. Cache hit rate: Whether stable prompt sections, long system instructions, or repeated context qualify for cached-input pricing.

That is why two teams can both say "we run an AI agent" and mean very different cost profiles.

Start With Workflows, Not Models

Before choosing a model, write down the agent loop:

Question | Why it matters
How many tasks run per month? | Frequency drives total usage more than any single call.
How many model calls happen per task? | Agents often plan, call tools, inspect results, then answer.
How large is the average input? | Long context and retrieved docs usually dominate cost.
How large is the average output? | Reports, code diffs, summaries, and emails can be output-heavy.
What percent of input is cacheable? | Stable instructions and recurring context can change the economics.
Which calls need the strongest model? | Routing simple steps to cheaper models can matter more than provider choice.

Once you have rough numbers, use the calculator instead of guessing.

A Practical Estimation Method

1. Count agent runs

Start with monthly runs:

monthly runs = daily runs * active days

If the agent runs on demand, estimate tasks instead:

monthly tasks = active users * tasks per user per month

Do not use the best day or worst day. Measure a normal week, then scale it to a month (a month is about 4.3 weeks).

2. Count model calls per run

Many agents do more than one model call:

  • planning call
  • retrieval or tool-selection call
  • one or more tool-result interpretation calls
  • final answer call
  • retry or repair call

If you only estimate the final answer, you will undercount.

3. Split input and output

Input and output are priced separately by most providers. Track them separately:

monthly input tokens = runs * calls per run * average input tokens
monthly output tokens = runs * calls per run * average output tokens
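
Steps 1 through 3 can be chained in a few lines. This is a rough sketch: the helper name and every number in the example (daily runs, active days, calls per run, average token counts) are assumptions chosen for illustration.

```python
def monthly_tokens(runs: int, calls_per_run: float,
                   avg_input_tokens: float,
                   avg_output_tokens: float) -> tuple[float, float]:
    """Return (monthly input tokens, monthly output tokens)."""
    total_calls = runs * calls_per_run
    return total_calls * avg_input_tokens, total_calls * avg_output_tokens


# Hypothetical agent: 40 runs per day on 22 active days,
# 4 model calls per run (plan, tool selection, interpretation, final answer).
monthly_runs = 40 * 22
monthly_input, monthly_output = monthly_tokens(
    runs=monthly_runs,
    calls_per_run=4,
    avg_input_tokens=6_000,
    avg_output_tokens=500,
)
print(monthly_runs, monthly_input, monthly_output)
```

Averages hide variation between calls. If the planning call and the final answer call differ a lot in size, it may be more accurate to estimate each call type on its own and add the results.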

This is especially important for agents that read a lot and write little, such as monitoring agents, or agents that write a lot, such as report generators.

4. Estimate cacheable input

Stable prompt sections are often repeated:

  • system instructions
  • policy and style guides
  • tool schemas
  • fixed project context
  • recurring customer or account context

If a provider supports cached-input pricing for the model you use, estimate what percent of input is stable enough to cache. The LLM cost calculator includes a cache hit-rate field for this reason.
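
One way to fold a cache hit rate into the input side of the estimate is sketched below. It assumes a single discounted price for cached input tokens and ignores any cache-write surcharge or cache expiry a provider may apply. The function name and all prices are hypothetical.

```python
def input_cost_with_cache(input_tokens: float, cache_hit_rate: float,
                          input_price_per_m: float,
                          cached_price_per_m: float) -> float:
    """Monthly input cost in dollars when a share of input is billed at a cached rate."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    cached_tokens = input_tokens * cache_hit_rate
    uncached_tokens = input_tokens - cached_tokens
    return (uncached_tokens * input_price_per_m
            + cached_tokens * cached_price_per_m) / 1_000_000


# Hypothetical: 20M input tokens per month, half of them served from cache,
# with cached input priced at one tenth of the normal input price.
without_cache = input_cost_with_cache(20_000_000, 0.0, 3.00, 0.30)
with_cache = input_cost_with_cache(20_000_000, 0.5, 3.00, 0.30)
print(f"${without_cache:.2f} vs ${with_cache:.2f}")
```

The hit rate here is the share of billed input tokens that qualify for the cached price, not the share of requests that touch the cache, so measure it from provider usage reports where possible.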

5. Model expensive steps separately

Do not average everything into one model if the workflow has distinct steps.

A common pattern:

  • cheap or fast model for classification and routing
  • stronger model for reasoning, coding, or final synthesis
  • specialized model for embeddings, search, or reranking

Calculate the expensive step separately. Then calculate the cheap steps. Add them.
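
A per-step breakdown might look like the following sketch. The `Step` class, the step names, the call volumes, and the prices are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    calls_per_month: int
    avg_input_tokens: int
    avg_output_tokens: int
    input_price_per_m: float   # dollars per 1,000,000 input tokens
    output_price_per_m: float  # dollars per 1,000,000 output tokens

    def cost(self) -> float:
        """Monthly cost of this step in dollars."""
        input_tokens = self.calls_per_month * self.avg_input_tokens
        output_tokens = self.calls_per_month * self.avg_output_tokens
        return (input_tokens * self.input_price_per_m
                + output_tokens * self.output_price_per_m) / 1_000_000


# Hypothetical workflow: a cheap model routes every task,
# and a stronger model handles the 20% that need synthesis.
steps = [
    Step("routing", 10_000, 1_000, 50, 0.20, 0.80),
    Step("synthesis", 2_000, 8_000, 1_000, 3.00, 15.00),
]

for step in steps:
    print(f"{step.name}: ${step.cost():.2f}")

total_cost = sum(step.cost() for step in steps)
print(f"total: ${total_cost:.2f}")
```

Keeping the steps separate shows which one dominates the bill. In this made-up case the routing step handles five times as many calls but contributes only a small fraction of the total.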

Example: Inbox Triage Agent

An inbox agent usually has high input and modest output.

The input includes email text, sender metadata, labels, previous thread context, and user preferences. The output may be a short decision: archive, reply later, draft response, escalate, or ignore.

Cost drivers:

  • number of messages checked per month
  • whether every message uses an LLM or only uncertain cases do
  • how much thread history is included
  • whether the same user preferences are cached
  • whether drafts are generated for every email or only selected emails

This is a good candidate for routing. A small model can classify obvious messages, while a stronger model handles ambiguous threads and reply drafts.

Example: Research Agent

A research agent often sends more context per call and makes more tool calls.

It may search, read several documents, summarize sources, compare claims, and produce a final memo. The final memo might be short, but the intermediate input can be large.

Cost drivers:

  • number of sources read
  • whether full documents or excerpts are passed to the model
  • number of synthesis passes
  • whether retrieved context is deduplicated
  • whether the final answer needs citations or structured output

The fastest way to reduce cost is usually not switching providers. It is sending less irrelevant context.

Example: Coding Agent

A coding agent can be output-heavy and retry-heavy.

It reads files, reasons about a change, writes code, runs tests, reads errors, and patches again. Every loop adds tokens.

Cost drivers:

  • repository context included per task
  • number of files read
  • test and error output size
  • number of repair loops
  • whether diffs are small or whole files are regenerated
  • whether the strongest model is used for every edit

For coding workflows, reliability can be cheaper than raw token price. A low-price model that needs repeated repair loops can cost more in total than a stronger model that finishes cleanly.
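
The comparison can be made concrete with a simplified sketch. It assumes every loop sends the same number of tokens, which understates real agents whose context grows with each repair attempt. The loop counts, token sizes, and prices are hypothetical.

```python
def cost_per_task(loops: int, input_tokens_per_loop: int,
                  output_tokens_per_loop: int,
                  input_price_per_m: float,
                  output_price_per_m: float) -> float:
    """Cost in dollars of one task that takes `loops` model calls to finish."""
    per_loop = (input_tokens_per_loop * input_price_per_m
                + output_tokens_per_loop * output_price_per_m) / 1_000_000
    return loops * per_loop


# Hypothetical: both models see 30,000 input and 2,000 output tokens per loop.
# The low-price model needs five edit-and-repair loops; the stronger one needs one.
cheap_total = cost_per_task(5, 30_000, 2_000, 1.00, 5.00)
strong_total = cost_per_task(1, 30_000, 2_000, 3.00, 15.00)

print(f"low-price model: ${cheap_total:.2f} per task")
print(f"stronger model:  ${strong_total:.2f} per task")
```

With these placeholder inputs the model that is three times more expensive per token ends up cheaper per finished task. The break-even point depends entirely on the measured loop counts, so treat the numbers as a template rather than a result.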

What to Measure in Production

Track these numbers per task type:

  • input tokens
  • cached input tokens
  • output tokens
  • model name
  • retries
  • tool calls
  • latency
  • success rate
  • human correction rate

Token cost alone is not enough. A cheap run that produces unusable work is not cheap.
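
One possible way to make that concrete is to divide total spend by the number of usable results rather than by the number of runs. The record layout and the definition of "usable" below (succeeded and needed no human correction) are assumptions for illustration, and the sample records are invented.

```python
from dataclasses import dataclass


@dataclass
class TaskRecord:
    task_type: str
    model: str
    input_tokens: int
    cached_input_tokens: int
    output_tokens: int
    retries: int
    tool_calls: int
    latency_s: float
    succeeded: bool
    human_corrected: bool
    cost_usd: float


def cost_per_usable_task(records: list[TaskRecord]) -> float:
    """Total spend divided by the number of runs that produced usable work."""
    usable = [r for r in records if r.succeeded and not r.human_corrected]
    if not usable:
        return float("inf")  # money was spent but nothing usable came out
    return sum(r.cost_usd for r in records) / len(usable)


# Invented sample data: three runs at the same cost, one of which failed.
records = [
    TaskRecord("triage", "model-a", 4_000, 2_000, 200, 0, 1, 1.8, True, False, 0.10),
    TaskRecord("triage", "model-a", 4_200, 2_000, 180, 1, 2, 3.1, True, False, 0.10),
    TaskRecord("triage", "model-a", 3_900, 2_000, 150, 2, 2, 4.0, False, False, 0.10),
]
usable_cost = cost_per_usable_task(records)
print(f"${usable_cost:.2f} per usable task")
```

Grouping the same calculation by `task_type` and `model` gives a per-workflow view that can be compared against the estimate made before launch.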

The Useful Rule

For most teams, the monthly cost is not mysterious. It is just hidden because the workload has not been written down.

Start with:

  1. monthly runs
  2. model calls per run
  3. average input tokens
  4. average output tokens
  5. cache hit rate
  6. model mix

Then calculate the bill.

Use the LLM cost calculator for the arithmetic, and check the LLM API pricing comparison when you need the underlying source data and verification dates.
