LLM API Pricing Comparison — April 2026
Last updated: April 9, 2026 — Claude Opus 4.6 now available with 1M context
TL;DR — LLM API Pricing as of April 2026
- Cheapest: Gemini 2.0 Flash-Lite — $0.075/$0.30 per 1M tokens
- Best Value: DeepSeek V3.2 — $0.28/$0.42 per 1M tokens
- Best Overall: GPT-5.4 — $2.50/$10 per 1M tokens
- Best Mid-tier: Claude Sonnet 4.6 ($3/$15) or GPT-5.2 ($1.75/$14)
- Premium: Claude Opus 4.6 — $5/$25 (now with 1M context) | GPT-5.2 Pro — $21/$168
- Free Tier: Google Gemini (free on most models)
Three years ago, a million tokens of GPT-4 cost about as much as a decent dinner. Today the same workload runs for less than a cup of coffee — and on some models, less than a stick of gum. The race that started with OpenAI's first rate card has turned into a sustained price war between four labs (OpenAI, Anthropic, Google, DeepSeek) and a long tail of open-weight challengers willing to undercut anyone on raw cost-per-token.
This guide breaks down every major pricing tier as of April 2026, with the trade-offs that matter when you're picking a default model for production. If you're shipping new features today, also see our OpenClaw use cases for 2026 for examples of how teams blend cheap and premium models in the same pipeline.
Why pricing changed faster than anyone predicted
The drop from GPT-4's launch price ($30/$60 per 1M tokens in March 2023) to GPT-5.4's $2.50/$10 today is a 12x reduction in input cost, and a 6x reduction in output cost, in roughly three years. A few forces converged:
- Mixture-of-Experts at scale. DeepSeek V3.2, Mistral, and Google's Gemini 2.5 line all activate only a fraction of their parameters per token. Inference cost falls without sacrificing benchmark scores.
- Hardware contracts. Anthropic's deal with AWS for Trainium capacity and OpenAI's expanded GB200 capacity created room for repeated price cuts in late 2025.
- Open-weight pressure. Once DeepSeek V3 hit $0.14 input in early 2025, every closed-source provider had to justify charging a 10-20x premium over an open-weight model of comparable quality.
- Cached input pricing. Anthropic's prompt caching, OpenAI's automatic prefix cache, and Gemini's context caching mean repeat traffic is now 2-10x cheaper than the headline rate.
The practical result: most teams are over-paying. They picked a model in 2024, never revisited the choice, and are now spending 3-5x more than necessary on workloads that a smaller model would handle fine.
Flagship tier — the models you'd reach for first
| Model | Input ($/1M) | Output ($/1M) | Context | Cached input |
|---|---|---|---|---|
| GPT-5.4 | $2.50 | $10.00 | 400K | $0.625 |
| GPT-5.2 | $1.75 | $14.00 | 400K | $0.44 |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 1M | $0.30 |
| Claude Opus 4.6 (1M) | $5.00 | $25.00 | 1M | $0.50 |
| Gemini 2.5 Pro | $1.25 | $10.00 | 2M | $0.31 |
| Grok 4 | $3.00 | $15.00 | 256K | $0.75 |
GPT-5.4 has quietly become the default for general-purpose agentic work — it costs less than Claude Sonnet on input, handles tool calls reliably, and the 400K context fits most realistic codebases. Claude Sonnet 4.6 still wins for long-document reasoning and code generation, especially with the 1M context window now standard on Anthropic's API. Gemini 2.5 Pro is the cheapest flagship and the only one with a 2M context window, but its tool-calling behavior remains less predictable than the OpenAI or Anthropic equivalents.
For premium use cases (legal review, complex agents, multi-file refactors) Claude Opus 4.6 with 1M context is the model people are willing to pay for. At $5/$25 it's a third of the $15/$75 that Opus commanded a year ago.
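The cached-input column changes the ranking more than the headline rates suggest. The sketch below blends the base and cached rates from the table for a given cache hit rate. The `hit_rate` values are hypothetical, and the function ignores cache-write surcharges and cache storage fees, which some providers bill separately.

```python
def effective_input_rate(base: float, cached: float, hit_rate: float) -> float:
    """Blended $/1M input tokens when `hit_rate` of input tokens are cache hits."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cached + (1.0 - hit_rate) * base


# (base, cached) input rates in $/1M tokens, copied from the flagship table
FLAGSHIP_INPUT = {
    "GPT-5.4": (2.50, 0.625),
    "Claude Sonnet 4.6": (3.00, 0.30),
    "Gemini 2.5 Pro": (1.25, 0.31),
}

if __name__ == "__main__":
    # 0.8 is an assumed hit rate, not a measured one
    for model, (base, cached) in FLAGSHIP_INPUT.items():
        rate = effective_input_rate(base, cached, hit_rate=0.8)
        print(f"{model}: ${rate:.3f} per 1M input tokens")
```

Under these assumptions, an 80% hit rate brings GPT-5.4 to about $1.00 and Claude Sonnet 4.6 to about $0.84 per 1M input tokens, so the model with the higher headline rate can end up cheaper on cache-heavy traffic.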
Budget tier — when speed and price beat capability
| Model | Input ($/1M) | Output ($/1M) | Context | Notes |
|---|---|---|---|---|
| GPT-5.4 mini | $0.25 | $2.00 | 400K | Default for chat UIs |
| GPT-5.4 nano | $0.05 | $0.40 | 128K | Classification, routing |
| Claude Haiku 4.5 | $1.00 | $5.00 | 200K | Fastest Claude |
| Gemini 2.0 Flash | $0.10 | $0.40 | 1M | Free tier available |
| Gemini 2.0 Flash-Lite | $0.075 | $0.30 | 1M | Cheapest mainstream model |
| DeepSeek V3.2 | $0.28 | $0.42 | 128K | MoE, open weights |
| DeepSeek R1.5 (reasoning) | $0.55 | $2.19 | 128K | Hidden reasoning included |
| Mistral Small 3.2 | $0.20 | $0.60 | 128K | EU data residency |
| Llama 4 Maverick (Together) | $0.27 | $0.85 | 1M | Best open-source flagship |
The budget tier is where 80% of production traffic should probably live. A typical SaaS chat product (RAG over docs, light tool use, no multi-step planning) runs perfectly well on Gemini 2.0 Flash or GPT-5.4 mini, at roughly 1/25th and 1/10th of GPT-5.4's input rate respectively. DeepSeek V3.2 is the wild card: it's still the cheapest "GPT-4-class" model, but expect higher latency from its public API. If throughput matters, consider serving the open weights through a hosted provider such as Together or Fireworks, or self-hosting them.
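To see what the tier gap means on a real bill, here is a minimal sketch that prices one workload across several models using the rates from the tables in this guide. The workload shape (1M requests per month, 3,000 input and 300 output tokens each) is a made-up example, not a benchmark.

```python
# (input, output) rates in $/1M tokens, copied from the tables in this guide
RATES = {
    "GPT-5.4": (2.50, 10.00),
    "GPT-5.4 mini": (0.25, 2.00),
    "Gemini 2.0 Flash": (0.10, 0.40),
    "DeepSeek V3.2": (0.28, 0.42),
}


def monthly_cost(model: str, requests: int, in_tokens: int, out_tokens: int) -> float:
    """Dollars per month for `requests` calls of the given average size."""
    in_rate, out_rate = RATES[model]
    per_request = (in_tokens * in_rate + out_tokens * out_rate) / 1_000_000
    return requests * per_request


if __name__ == "__main__":
    # Hypothetical workload: adjust to your own traffic
    for model in RATES:
        cost = monthly_cost(model, requests=1_000_000, in_tokens=3_000, out_tokens=300)
        print(f"{model}: ${cost:,.0f}/month")
```

For this assumed workload the totals come to about $10,500 on GPT-5.4, $1,350 on GPT-5.4 mini, $966 on DeepSeek V3.2, and $420 on Gemini 2.0 Flash.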
Reasoning models — pay for thinking, not output
| Model | Input ($/1M) | Output ($/1M) | Notes |
|---|---|---|---|
| GPT-5.2 Pro | $21.00 | $168.00 | Hardest reasoning, long horizons |
| OpenAI o4 | $5.00 | $20.00 | Cheaper o-series |
| Claude Opus 4.6 (extended thinking) | $5.00 | $25.00 | Visible thinking tokens billed as output |
| Gemini 2.5 Pro Deep Think | $2.50 | $20.00 | Cheapest closed-weight reasoning model |
| DeepSeek R1.5 | $0.55 | $2.19 | Open-weight reasoning |
A common mistake is forgetting that reasoning models bill every hidden thinking token at the output rate. A single GPT-5.2 Pro call can burn 50K output tokens, about $8.40 at $168 per 1M, before producing a one-paragraph answer. Always cap max_output_tokens and prefer batched inference for non-interactive workloads. OpenAI's Batch API and Anthropic's Message Batches both knock 50% off these rates.
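The arithmetic is worth doing once before enabling a reasoning model. This sketch prices a single call where thinking tokens are billed as output, with an optional 50% batch discount applied to the whole call. The token counts are illustrative, and the function name and parameters are hypothetical.

```python
def reasoning_call_cost(
    in_tokens: int,
    thinking_tokens: int,
    answer_tokens: int,
    in_rate: float,
    out_rate: float,
    batch: bool = False,
) -> float:
    """Dollar cost of one call; rates are in $/1M tokens.

    Thinking tokens are billed at the output rate alongside the visible answer.
    `batch=True` applies an assumed flat 50% discount to input and output.
    """
    billed_output = thinking_tokens + answer_tokens
    cost = (in_tokens * in_rate + billed_output * out_rate) / 1_000_000
    return cost * 0.5 if batch else cost


if __name__ == "__main__":
    # GPT-5.2 Pro rates from the table: $21 input, $168 output
    live = reasoning_call_cost(2_000, 50_000, 200, in_rate=21.0, out_rate=168.0)
    batched = reasoning_call_cost(2_000, 50_000, 200, 21.0, 168.0, batch=True)
    print(f"interactive: ${live:.2f}, batched: ${batched:.2f}")
```

With these assumed token counts the call costs roughly $8.48 interactively and $4.24 batched, and over 99% of that is the thinking tokens rather than the answer the user sees.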
What to actually pick
A rough decision tree that holds up well as of April 2026:
- Building an agent with tools? GPT-5.4 if you want reliability, Claude Sonnet 4.6 if you need long context, Gemini 2.5 Pro if cost is the binding constraint.
- High-volume classification, extraction, or routing? GPT-5.4 nano or Gemini 2.0 Flash-Lite. Both are <$0.10 input.
- Code generation and refactors? Claude Sonnet 4.6 still has an edge on code quality benchmarks; Opus 4.6 for the hardest tasks.
- EU data residency required? Mistral Small 3.2 or Mistral Large via Azure EU regions.
- Self-hosted or air-gapped? DeepSeek V3.2 or Llama 4 Maverick weights, served through vLLM or SGLang.
For most teams, the right move is to put a cheap router model in front of a flagship — classify the request, downshift simple ones to a $0.10/M model, escalate the rest. Done well, this knocks 60-80% off a typical bill without users noticing.
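A quick way to sanity-check the routing claim is to compute the blended input rate. The sketch below is input-only and pessimistic: it assumes the router reads every request in full and that output costs scale the same way. The 70% share of "simple" requests is a hypothetical figure, so measure your own traffic before relying on it.

```python
def routed_input_rate(
    router_rate: float,
    cheap_rate: float,
    flagship_rate: float,
    simple_share: float,
) -> float:
    """Blended $/1M input tokens when every request passes through a router first."""
    if not 0.0 <= simple_share <= 1.0:
        raise ValueError("simple_share must be between 0 and 1")
    downstream = simple_share * cheap_rate + (1.0 - simple_share) * flagship_rate
    return router_rate + downstream


def savings_vs_flagship(blended_rate: float, flagship_rate: float) -> float:
    """Fraction of the flagship-only bill that routing removes."""
    return 1.0 - blended_rate / flagship_rate


if __name__ == "__main__":
    # Rates from this guide: GPT-5.4 nano as router, Gemini 2.0 Flash, GPT-5.4
    blended = routed_input_rate(0.05, 0.10, 2.50, simple_share=0.7)
    print(f"blended: ${blended:.2f}/1M, savings: {savings_vs_flagship(blended, 2.50):.0%}")
```

Under these assumptions the blended rate is about $0.87 per 1M input tokens, a saving of roughly 65% against sending everything to GPT-5.4. The saving depends almost entirely on `simple_share`, so the router's accuracy matters more than its price.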
Hidden costs that don't show up in headline rates
Token rates are only part of the picture. The line items that quietly dominate real-world bills:
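- Output:input ratio. Output is 4-8x more expensive than input for most of the closed-source models listed here; open-weight models such as DeepSeek V3.2 (1.5x) are the exception. A chatty model that answers in 800 words instead of 200 quadruples your output spend.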
- Tool-call loops. Each tool call adds the full message history back into the prompt. A 5-step agent with a 30K-token system prompt can pay for that prompt 5+ times per request.
- Image and audio tokens. GPT-5.4 charges roughly $0.01 per high-res image; long audio inputs on Gemini are billed per second.
- Egress and storage. Vector databases and file APIs (OpenAI Files, Anthropic Files) charge separately. Easy to overlook until you're storing 100GB of embeddings.
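The tool-call loop item is the easiest of these to quantify. The sketch below totals the input cost of one agent request in which every step resends the full history, with and without prefix caching. It assumes each step appends a fixed number of new tokens and that, when caching is on, everything sent in the previous step is a cache hit. Both are simplifications, and output tokens are left out entirely.

```python
from typing import Optional


def agent_input_cost(
    system_tokens: int,
    steps: int,
    added_per_step: int,
    in_rate: float,
    cached_rate: Optional[float] = None,
) -> float:
    """Dollar input cost of one multi-step agent request; rates are in $/1M tokens."""
    total = 0.0  # accumulates tokens multiplied by $/1M rate
    prompt = system_tokens
    for step in range(steps):
        if step == 0 or cached_rate is None:
            # First step, or no caching: the whole prompt is billed at full rate
            total += prompt * in_rate
        else:
            # Assumed: the previous prompt is cached, only new tokens are full price
            fresh = added_per_step
            total += (prompt - fresh) * cached_rate + fresh * in_rate
        prompt += added_per_step  # tool result and reply join the history
    return total / 1_000_000


if __name__ == "__main__":
    # 30K-token system prompt, 5 steps, GPT-5.4 rates from the flagship table
    plain = agent_input_cost(30_000, 5, 1_000, in_rate=2.50)
    cached = agent_input_cost(30_000, 5, 1_000, in_rate=2.50, cached_rate=0.625)
    print(f"uncached: ${plain:.3f}, cached: ${cached:.3f} per request")
```

With these assumed numbers a single request costs about $0.40 in input tokens uncached and about $0.16 with caching, which is why prompt caching tends to pay off first on agent workloads.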
For a deeper look at where teams burn budget without realizing it, see our AI cost optimization patterns write-up — most of the wins come from prompt caching, batch inference, and aggressive output truncation rather than picking a different model.
How long these prices will hold
Expect another round of cuts before the end of 2026. OpenAI has signaled a GPT-5.5 launch in summer that will likely reset the flagship tier; Google's TPU v6 capacity is coming online; and DeepSeek's V4 release would re-anchor the budget tier. Cached input and batch pricing are the most stable numbers on this page — those are tied to infrastructure economics, not competitive positioning.
Bookmark this page; it gets a refresh roughly monthly. If a number here disagrees with a provider's official rate card, trust the rate card — and let us know so we can update.