LLM API Pricing 2026 — Compare GPT-5, Claude 4, Gemini 2.5, DeepSeek Costs

March 2026: GPT-5.4 $2.50/M, Claude Sonnet $3/$15, Gemini Flash $0.30, DeepSeek $0.14. Compare 30+ LLM prices. Find the cheapest API for your app.


LLM API Pricing Comparison — March 2026

Last updated: March 5, 2026 · GPT-5.4 now rolling out

TL;DR — LLM API Pricing as of March 2026

  • Cheapest: Gemini 2.0 Flash-Lite — $0.075/$0.30 per 1M tokens
  • Best Value: DeepSeek V3.2 — $0.28/$0.42 per 1M tokens
  • Best Overall: GPT-5.4 — $2.50/$10 per 1M tokens (new!)
  • Best Mid-tier: Claude Sonnet 4.6 ($3/$15) or GPT-5.2 ($1.75/$14)
  • Premium: Claude Opus 4.6 — $5/$25 | GPT-5.2 Pro — $21/$168
  • Free Tier: Google Gemini (free on most models)

The LLM pricing landscape has shifted dramatically. DeepSeek undercut everyone. OpenAI slashed flagship prices 80% year-over-year. Google offers a generous free tier. Choosing the wrong model can cost you 100x more than necessary for the same quality output.

This guide covers every major API with real cost examples so you can pick the right model for your budget. Updated weekly with official pricing.

Quick Answer: Which Model Should You Use?

Before diving into tables, here's what most developers actually need:

  • Cheapest option that works: Gemini 2.0 Flash-Lite at $0.075/$0.30 per million tokens. Hard to beat for simple tasks.
  • Best bang for the buck: DeepSeek V3.2 at $0.28/$0.42. Still very capable for the price, with 90% cache discounts.
  • Best mid-tier all-rounder: Claude Sonnet 4.6 ($3/$15) or GPT-5.2 ($1.75/$14). Both handle complex tasks well.
  • When you need the absolute best: Claude Opus 4.6 ($5/$25) or GPT-5.2 Pro ($21/$168). Use sparingly.
  • Free and open: Llama 4 and Gemini's free tier cost nothing for prototyping.

All Provider Pricing (March 2026)

OpenAI

Source: openai.com/api/pricing

| Model | Input/M | Output/M | Cached Input/M | Context | Best For |
|---|---|---|---|---|---|
| GPT-5.2 Pro | $21.00 | $168.00 | $2.10 | 200K | Hardest reasoning tasks |
| GPT-5.2 | $1.75 | $14.00 | $0.175 | 200K | Coding, agents |
| GPT-5 | $1.25 | $10.00 | $0.125 | 128K | General flagship |
| GPT-5 Mini | $0.25 | $2.00 | $0.025 | 200K | Fast, affordable |
| GPT-5 Nano | $0.05 | $0.40 | $0.005 | 128K | High-volume simple tasks |
| o4-mini | $1.10 | $4.40 | $0.275 | 200K | Best value reasoning |
| o3 | $2.00 | $8.00 | $1.00 | 200K | Mid-tier reasoning |
| o3-pro | $20.00 | $80.00 | — | 200K | Strong reasoning |
| o1 | $15.00 | $60.00 | $7.50 | 200K | Legacy reasoning |
| GPT-4.1 | $2.00 | $8.00 | $0.20 | 1M | Previous gen |
| GPT-4.1 Mini | $0.40 | $1.60 | $0.04 | 1M | Previous gen budget |
| GPT-4.1 Nano | $0.10 | $0.40 | $0.01 | 1M | Previous gen fast |

OpenAI Batch API gives 50% off all models for async workloads processed within 24 hours. Cached input tokens cost 10% of standard input price.

Anthropic

Source: claude.com/pricing

| Model | Input/M | Output/M | Cached Input/M | Context | Best For |
|---|---|---|---|---|---|
| Claude Opus 4.6 | $5.00 | $25.00 | $0.50 | 200K | Complex analysis, research |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $0.30 | 200K | Coding, balanced tasks |
| Claude Haiku 4.5 | $1.00 | $5.00 | $0.10 | 200K | Fast classification, chat |

Opus 4.6 dropped 67% from the previous Opus 4.1 ($15/$75). Batch API saves another 50%. Prompt caching saves 90% on input tokens, stackable with batch for up to 95% total savings. Legacy Claude 3 Haiku ($0.25/$1.25) is deprecated and retiring April 2026.

Google Gemini

Source: ai.google.dev/pricing

| Model | Input/M | Output/M | Cached Input/M | Context | Best For |
|---|---|---|---|---|---|
| Gemini 3.1 Pro (preview) | $2.00 (≤200K) / $4.00 | $12.00 (≤200K) / $18.00 | — | 200K+ | Next-gen flagship |
| Gemini 3 Flash (preview) | $0.50 | $3.00 | — | — | Fast next-gen |
| Gemini 2.5 Pro (≤200K) | $1.25 | $10.00 | $0.31 | 2M | Long documents, analysis |
| Gemini 2.5 Pro (>200K) | $2.50 | $15.00 | $0.625 | 2M | Very long context |
| Gemini 2.5 Flash | $0.30 | $2.50 | $0.075 | 1M | Fast mid-tier |
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 | — | 1M | Cheap current-gen |
| Gemini 2.0 Flash | $0.10 | $0.40 | $0.025 | 1M | Ultra cheap, proven |
| Gemini 2.0 Flash-Lite | $0.075 | $0.30 | — | 1M | Cheapest mainstream |

Free tier available on most models (Gemini 2.5 Flash, Flash-Lite, 2.0 Flash, etc.). Great for prototyping and low-traffic apps.

DeepSeek

Source: api-docs.deepseek.com/pricing

| Model | Input/M | Output/M | Cached Input/M | Context | Best For |
|---|---|---|---|---|---|
| DeepSeek V3.2 (Chat) | $0.28 | $0.42 | $0.028 | 128K | General tasks, very cheap |
| DeepSeek V3.2 (Reasoner) | $0.28 | $0.42 | $0.028 | 128K | Reasoning, same price |

DeepSeek V3.2 unified chat and reasoning into one model at one price. Cache hits save 90%.

xAI (Grok)

Source: docs.x.ai/developers/models

| Model | Input/M | Output/M | Context | Best For |
|---|---|---|---|---|
| Grok 4 | $3.00 | $15.00 | 2M | Large context reasoning |
| Grok 4.1 Fast | $0.20 | $0.50 | 2M | Budget with huge context |

New users get $25 in free credits. The 2M context window is the joint-largest available.

Mistral

Source: mistral.ai/pricing

| Model | Input/M | Output/M | Context | Best For |
|---|---|---|---|---|
| Mistral Large 3 | $2.00 | $6.00 | 128K | European hosting, GDPR |
| Mistral Medium 3 | $0.40 | $2.00 | 128K | Mid-tier tasks |
| Mistral Nemo | $0.02 | $0.02 | 128K | Cheapest Mistral option |
| Ministral 8B | $0.10 | $0.10 | 128K | Lightweight tasks |

Meta Llama (Open Weights — Self-Hosted)

| Model | API Cost | Context | Notes |
|---|---|---|---|
| Llama 4 | Free | 200K | Host yourself or use a provider |
| Llama 3.3 | Free | 128K | Proven, well-supported |

Llama models are free to download but you pay for compute. Typical hosted pricing through providers like Together, Fireworks, or Groq ranges from $0.05–$0.90/M tokens depending on model size and provider.

Price Ranking: Cheapest to Most Expensive

Sorted by blended cost (assuming 1:1 input-to-output ratio):

| Rank | Model | Input/M | Output/M | Blended $/M |
|---|---|---|---|---|
| 1 | Mistral Nemo | $0.02 | $0.02 | $0.02 |
| 2 | Ministral 8B | $0.10 | $0.10 | $0.10 |
| 3 | Gemini 2.0 Flash-Lite | $0.075 | $0.30 | $0.19 |
| 4 | GPT-5 Nano | $0.05 | $0.40 | $0.23 |
| 5 | Gemini 2.0 Flash | $0.10 | $0.40 | $0.25 |
| 6 | Gemini 2.5 Flash-Lite | $0.10 | $0.40 | $0.25 |
| 7 | DeepSeek V3.2 | $0.28 | $0.42 | $0.35 |
| 8 | Grok 4.1 Fast | $0.20 | $0.50 | $0.35 |
| 9 | GPT-5 Mini | $0.25 | $2.00 | $1.13 |
| 10 | Mistral Medium 3 | $0.40 | $2.00 | $1.20 |
| 11 | Gemini 2.5 Flash | $0.30 | $2.50 | $1.40 |
| 12 | o4-mini / o3-mini | $1.10 | $4.40 | $2.75 |
| 13 | Mistral Large 3 | $2.00 | $6.00 | $4.00 |
| 14 | o3 / GPT-4.1 | $2.00 | $8.00 | $5.00 |
| 15 | GPT-5 | $1.25 | $10.00 | $5.63 |
| 16 | Gemini 2.5 Pro | $1.25 | $10.00 | $5.63 |
| 17 | GPT-5.2 | $1.75 | $14.00 | $7.88 |
| 18 | Claude Sonnet 4.6 | $3.00 | $15.00 | $9.00 |
| 19 | Grok 4 | $3.00 | $15.00 | $9.00 |
| 20 | Claude Opus 4.6 | $5.00 | $25.00 | $15.00 |
| 21 | o3-pro | $20.00 | $80.00 | $50.00 |
| 22 | GPT-5.2 Pro | $21.00 | $168.00 | $94.50 |
| 23 | o1-pro | $150.00 | $600.00 | $375.00 |
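The blended column is just the simple average of the input and output prices (the 1:1 assumption stated above). A minimal sketch, with an optional knob for other traffic mixes:

```python
def blended_cost(input_per_m: float, output_per_m: float, input_ratio: float = 0.5) -> float:
    """Blended $/M tokens for a given input:output mix (0.5 = the 1:1 assumption above)."""
    return input_per_m * input_ratio + output_per_m * (1 - input_ratio)

# DeepSeek V3.2: $0.28 in / $0.42 out
print(f"${blended_cost(0.28, 0.42):.2f}/M")                    # $0.35/M
# Real chat traffic is often input-heavy, e.g. 80% input tokens:
print(f"${blended_cost(0.28, 0.42, input_ratio=0.8):.3f}/M")   # $0.308/M
```

Note how an input-heavy mix shifts the ranking toward models with cheap input, which is why the 1:1 blend is only a rough guide.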

Real Cost Examples

How much does it actually cost to do common tasks? These estimates use typical token counts.

Summarizing a 10-page document (~4,000 input tokens, ~500 output tokens)

| Model | Cost per doc | Cost for 1,000 docs |
|---|---|---|
| GPT-5 Nano | $0.0004 | $0.40 |
| Gemini 2.0 Flash | $0.0006 | $0.60 |
| DeepSeek V3.2 | $0.0013 | $1.33 |
| Claude Haiku 4.5 | $0.0065 | $6.50 |
| GPT-5.2 | $0.0140 | $14.00 |
| Claude Opus 4.6 | $0.0325 | $32.50 |
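Every per-task figure in these tables comes from the same one-line formula: tokens divided by a million, times the per-million price. A minimal sketch using the summarization numbers above:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float, output_per_m: float) -> float:
    """Cost in dollars for one request at per-million-token prices."""
    return input_tokens / 1e6 * input_per_m + output_tokens / 1e6 * output_per_m

# Summarizing one 10-page doc (~4,000 in / ~500 out) with DeepSeek V3.2 ($0.28/$0.42):
per_doc = request_cost(4_000, 500, 0.28, 0.42)
print(f"${per_doc:.4f} per doc, ${per_doc * 1_000:.2f} per 1,000 docs")
# $0.0013 per doc, $1.33 per 1,000 docs
```

Swap in your own token counts and any pricing row above to reproduce the rest of this section.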

Chatbot conversation (avg ~800 input tokens, ~400 output tokens per turn)

| Model | Cost per turn | Monthly (100 users × 20 turns/day) |
|---|---|---|
| Gemini 2.0 Flash | $0.00024 | $14/mo |
| DeepSeek V3.2 | $0.00039 | $23/mo |
| GPT-5 Mini | $0.001 | $60/mo |
| Claude Haiku 4.5 | $0.0028 | $168/mo |
| Claude Sonnet 4.6 | $0.0084 | $504/mo |

Code generation (avg ~2,000 input tokens, ~1,500 output tokens per request)

| Model | Cost per request | Monthly (500 requests/day) |
|---|---|---|
| GPT-5 Nano | $0.0007 | $10.50/mo |
| DeepSeek V3.2 | $0.0012 | $18.00/mo |
| GPT-5.2 | $0.0245 | $367/mo |
| Claude Sonnet 4.6 | $0.0285 | $427/mo |
| Claude Opus 4.6 | $0.0475 | $712/mo |

RAG pipeline (retrieval-augmented generation: ~8,000 input tokens, ~800 output tokens)

| Model | Cost per query | 50K queries/month |
|---|---|---|
| DeepSeek V3.2 | $0.0026 | $129/mo |
| GPT-5 Mini | $0.0036 | $180/mo |
| Gemini 2.5 Flash | $0.0044 | $220/mo |
| Gemini 2.5 Pro | $0.018 | $900/mo |
| Claude Sonnet 4.6 | $0.036 | $1,800/mo |

Best Model by Use Case

High-volume chatbots and customer support

Pick: Gemini 2.0 Flash or DeepSeek V3.2

At $0.10–$0.28/M input, these handle simple Q&A and routing at pennies per thousand conversations. Use a smarter model as fallback for edge cases only.

Coding assistants and code generation

Pick: GPT-5.2 or Claude Sonnet 4.6

Both excel at code. GPT-5.2 is slightly cheaper ($1.75 vs $3.00 input). Claude Sonnet tends to follow complex instructions more precisely. For budget coding, DeepSeek V3.2 is still very capable at $0.28/M input.

Document summarization and extraction

Pick: Gemini 2.5 Pro

The 2M context window means you can stuff entire documents without chunking. At $1.25/M input (≤200K), it's cheaper than Claude or GPT for long-context work. Gemini 2.5 Flash at $0.30/M is a solid cheaper alternative.

Research and complex reasoning

Pick: Claude Opus 4.6 or GPT-5.2 Pro

For tasks where accuracy justifies the cost — legal analysis, scientific review, complex multi-step reasoning. Opus 4.6 at $5/$25 is dramatically cheaper than GPT-5.2 Pro at $21/$168 and often matches it.

Prototyping and experimentation

Pick: Gemini free tier or Llama 4 (self-hosted)

Gemini gives 1,000 free requests/day. Llama 4 costs nothing to run if you have the hardware. Both eliminate cost as a barrier during development.

Classification, tagging, and routing

Pick: GPT-5 Nano or Ministral 8B

Simple decision-making tasks don't need big models. At $0.05–$0.10/M input, you can classify millions of items for under $10.
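A quick sanity check of that claim. The ~80 input tokens per item is an illustrative assumption, not a figure from this guide; output for a single-label classification is a few tokens and is ignored here:

```python
def bulk_classification_cost(items: int, tokens_per_item: int, input_per_m: float) -> float:
    """Input-side cost of classifying `items` short texts (output assumed negligible)."""
    return items * tokens_per_item / 1e6 * input_per_m

# 2M items at ~80 input tokens each on GPT-5 Nano ($0.05/M input):
print(f"${bulk_classification_cost(2_000_000, 80, 0.05):.2f}")   # $8.00
```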

Cost Optimization Tips

1. Prompt caching (saves 75–90%)

Every major provider now offers prompt caching. If your system prompt or few-shot examples stay the same across requests, cached tokens cost a fraction of the base price:

| Provider | Cache Savings | Cached Input Cost (flagship) |
|---|---|---|
| OpenAI | 90% off | $0.175/M (GPT-5.2) |
| Anthropic | 90% off | $0.50/M (Opus 4.6) |
| Google | 75% off | $0.31/M (Gemini 2.5 Pro) |
| DeepSeek | 90% off | $0.028/M (V3.2) |

With a 2,000-token system prompt sent 100K times (200M tokens total): uncached, that costs $350 with GPT-5.2; cached, $35. That's $315 saved per 100K requests on the system prompt alone.
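Spelled out as code. The 2,000-token prompt and 100K-request volume are this example's assumptions; the prices are GPT-5.2's list and cached input rates from the OpenAI table:

```python
def system_prompt_cost(prompt_tokens: int, requests: int, input_per_m: float,
                       cached_per_m: float) -> tuple[float, float]:
    """(uncached, cached) total cost of resending a fixed system prompt."""
    total_m = prompt_tokens * requests / 1e6   # total prompt tokens, in millions
    return total_m * input_per_m, total_m * cached_per_m

# GPT-5.2: $1.75/M input, $0.175/M cached input
uncached, cached = system_prompt_cost(2_000, 100_000, 1.75, 0.175)
print(f"uncached ${uncached:.0f}, cached ${cached:.0f}, saved ${uncached - cached:.0f}")
# uncached $350, cached $35, saved $315
```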

2. Batch API (saves 50%)

OpenAI's Batch API processes requests asynchronously within 24 hours at half price. Perfect for:

  • Nightly data processing
  • Bulk content generation
  • Evaluation pipelines
  • Anything that doesn't need real-time responses

3. Model routing (saves 60–80%)

Don't send every request to your best model. Route by complexity:

Simple query → GPT-5 Nano ($0.05/M)
Medium query → GPT-5 Mini ($0.25/M)
Hard query   → GPT-5.2 ($1.75/M)

If 70% of your traffic is simple, 20% medium, and 10% hard, your effective cost drops from $1.75/M to about $0.26/M, an 85% reduction.
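A sketch of that effective-cost calculation for the 70/20/10 split (input prices only, tier names are just labels for the models listed above):

```python
def routed_cost(mix: dict[str, float], prices: dict[str, float]) -> float:
    """Effective $/M input given a traffic mix (fractions summing to 1) and per-model prices."""
    return sum(frac * prices[tier] for tier, frac in mix.items())

prices = {"nano": 0.05, "mini": 0.25, "flagship": 1.75}   # GPT-5 Nano / Mini / GPT-5.2
mix = {"nano": 0.70, "mini": 0.20, "flagship": 0.10}
print(f"${routed_cost(mix, prices):.2f}/M vs ${prices['flagship']:.2f}/M flagship-only")
# $0.26/M vs $1.75/M flagship-only
```

The router itself (a cheap classifier or heuristic) adds a small cost per request, so the real saving is slightly below the raw arithmetic.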

4. Output token management

Output tokens cost 4–8x more than input tokens across most providers. Reduce output costs by:

  • Asking for structured JSON instead of verbose prose
  • Setting max_tokens limits
  • Requesting bullet points instead of paragraphs
  • Using "be concise" in your system prompt (it works)

5. Maximize cache hits (DeepSeek)

DeepSeek V3.2 offers cache hit pricing at $0.028/M (90% off) for repeated context. Structure your prompts to maximize cache hits on system prompts and few-shot examples.

6. Stack discounts

Combine prompt caching + batch API + model routing for maximum savings. Example with Anthropic:

  • Base Opus 4.6: $5.00/M input
  • With caching: $0.50/M input (90% off)
  • With batch: $0.25/M input (additional 50% off)
  • Effective cost: 95% cheaper than list price
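The discounts stack multiplicatively, not additively. A small sketch assuming the 90% cache and 50% batch figures listed above:

```python
def stacked_price(base_per_m: float, cache_discount: float = 0.90,
                  batch_discount: float = 0.50) -> float:
    """Effective $/M input with prompt caching and Batch API stacked multiplicatively."""
    return base_per_m * (1 - cache_discount) * (1 - batch_discount)

# Claude Opus 4.6 list input price: $5.00/M
print(f"${stacked_price(5.00):.2f}/M")   # $0.25/M, i.e. 95% below list price
```

Multiplying (1 − 0.90) × (1 − 0.50) gives 0.05 of list price, which is where the "95% cheaper" figure comes from.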

Frequently Asked Questions

What is the cheapest LLM API in 2026?

Gemini 2.0 Flash-Lite at $0.075/$0.30 per million tokens is the cheapest mainstream option. For even lower costs, Mistral Nemo costs just $0.02/M tokens.

Which LLM has the best price-to-performance ratio?

DeepSeek V3.2 offers strong value at $0.28/$0.42 per million tokens with unified chat and reasoning capabilities at one price.

How much does GPT-5 cost?

GPT-5 starts at $1.25/$10.00 per million input/output tokens. The premium GPT-5.2 Pro costs $21/$168 but offers the highest capability.

Is Claude cheaper than GPT?

Claude Sonnet 4.6 at $3/$15 is more expensive than GPT-5 ($1.25/$10) but competitive with GPT-5.2 ($1.75/$14). Claude Haiku 4.5 at $1/$5 is the budget Claude option.

Does Google offer free LLM API?

Yes, Google offers free input/output tokens on most Gemini models (2.5 Flash, Flash-Lite, 2.0 Flash, etc.) — great for prototyping.

How can I save money on LLM API costs?

Use prompt caching (saves 90% on repeated context), batch API (50% off for async tasks), and match the model to the task: don't use GPT-5.2 Pro for simple queries.


The Bottom Line

LLM API prices dropped roughly 80% across the board from 2025 to 2026. The gap between "cheap" and "premium" now exceeds 10,000x (Mistral Nemo at $0.02/M vs o1-pro at $375/M blended). For most production apps, models in the $0.10–$3.00/M range handle the job. Save the expensive models for tasks where quality directly impacts revenue.



Prices from official provider websites as of March 2026. LLM pricing changes fast — verify current rates at OpenAI, Anthropic, Google, DeepSeek, xAI, and Mistral before committing.