LLM API Pricing Comparison — March 2026
Last updated: March 5, 2026 — GPT-5.4 now rolling out
TL;DR — LLM API Pricing as of March 2026
- Cheapest: Gemini 2.0 Flash-Lite — $0.075/$0.30 per 1M tokens
- Best Value: DeepSeek V3.2 — $0.28/$0.42 per 1M tokens
- Best Overall: GPT-5.4 — $2.50/$10 per 1M tokens (new!)
- Best Mid-tier: Claude Sonnet 4.6 ($3/$15) or GPT-5.2 ($1.75/$14)
- Premium: Claude Opus 4.6 — $5/$25 | GPT-5.2 Pro — $21/$168
- Free Tier: Google Gemini (free on most models)
The LLM pricing landscape has shifted dramatically. DeepSeek undercut everyone. OpenAI slashed flagship prices 80% year-over-year. Google offers a generous free tier. Choosing the wrong model can cost you 100x more than necessary for the same quality output.
This guide covers every major API with real cost examples so you can pick the right model for your budget. Updated weekly with official pricing.
Quick Answer: Which Model Should You Use?
Before diving into tables, here's what most developers actually need:
- Cheapest option that works: Gemini 2.0 Flash-Lite at $0.075/$0.30 per million tokens. Hard to beat for simple tasks.
- Best bang for the buck: DeepSeek V3.2 at $0.28/$0.42. Still very capable for the price, with 90% cache discounts.
- Best mid-tier all-rounder: Claude Sonnet 4.6 ($3/$15) or GPT-5.2 ($1.75/$14). Both handle complex tasks well.
- When you need the absolute best: Claude Opus 4.6 ($5/$25) or GPT-5.2 Pro ($21/$168). Use sparingly.
- Free and open: Llama 4 and Gemini's free tier cost nothing for prototyping.
All Provider Pricing (March 2026)
OpenAI
Source: openai.com/api/pricing
| Model | Input/M | Output/M | Cached Input/M | Context | Best For |
|---|---|---|---|---|---|
| GPT-5.2 Pro | $21.00 | $168.00 | $2.10 | 200K | Hardest reasoning tasks |
| GPT-5.2 | $1.75 | $14.00 | $0.175 | 200K | Coding, agents |
| GPT-5 | $1.25 | $10.00 | $0.125 | 128K | General flagship |
| GPT-5 Mini | $0.25 | $2.00 | $0.025 | 200K | Fast, affordable |
| GPT-5 Nano | $0.05 | $0.40 | $0.005 | 128K | High-volume simple tasks |
| o4-mini | $1.10 | $4.40 | $0.275 | 200K | Best value reasoning |
| o3 | $2.00 | $8.00 | $1.00 | 200K | Mid-tier reasoning |
| o3-pro | $20.00 | $80.00 | — | 200K | Strong reasoning |
| o1 | $15.00 | $60.00 | $7.50 | 200K | Legacy reasoning |
| GPT-4.1 | $2.00 | $8.00 | $0.20 | 1M | Previous gen |
| GPT-4.1 Mini | $0.40 | $1.60 | $0.04 | 1M | Previous gen budget |
| GPT-4.1 Nano | $0.10 | $0.40 | $0.01 | 1M | Previous gen fast |
OpenAI Batch API gives 50% off all models for async workloads processed within 24 hours. Cached input tokens cost 10% of standard input price.
Anthropic
Source: claude.com/pricing
| Model | Input/M | Output/M | Cached Input/M | Context | Best For |
|---|---|---|---|---|---|
| Claude Opus 4.6 | $5.00 | $25.00 | $0.50 | 200K | Complex analysis, research |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $0.30 | 200K | Coding, balanced tasks |
| Claude Haiku 4.5 | $1.00 | $5.00 | $0.10 | 200K | Fast classification, chat |
Opus 4.6 dropped 67% from the previous Opus 4.1 ($15/$75). Batch API saves another 50%. Prompt caching saves 90% on input tokens, stackable with batch for up to 95% total savings. Legacy Claude 3 Haiku ($0.25/$1.25) is deprecated and retiring April 2026.
Google Gemini
Source: ai.google.dev/pricing
| Model | Input/M | Output/M | Cached Input/M | Context | Best For |
|---|---|---|---|---|---|
| Gemini 3.1 Pro (preview) | $2.00 (≤200K) / $4.00 | $12.00 (≤200K) / $18.00 | — | 200K+ | Next-gen flagship |
| Gemini 3 Flash (preview) | $0.50 | $3.00 | — | — | Fast next-gen |
| Gemini 2.5 Pro (≤200K) | $1.25 | $10.00 | $0.125 | 2M | Long documents, analysis |
| Gemini 2.5 Pro (>200K) | $2.50 | $15.00 | $0.25 | 2M | Very long context |
| Gemini 2.5 Flash | $0.30 | $2.50 | $0.03 | 1M | Fast mid-tier |
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 | — | 1M | Cheap current-gen tasks |
| Gemini 2.0 Flash | $0.10 | $0.40 | $0.025 | 1M | Ultra cheap, proven |
| Gemini 2.0 Flash-Lite | $0.075 | $0.30 | — | 1M | Cheapest mainstream |
Free tier available on most models (Gemini 2.5 Flash, Flash-Lite, 2.0 Flash, etc.). Great for prototyping and low-traffic apps.
DeepSeek
Source: api-docs.deepseek.com/pricing
| Model | Input/M | Output/M | Cached Input/M | Context | Best For |
|---|---|---|---|---|---|
| DeepSeek V3.2 (Chat) | $0.28 | $0.42 | $0.028 | 128K | General tasks, very cheap |
| DeepSeek V3.2 (Reasoner) | $0.28 | $0.42 | $0.028 | 128K | Reasoning, same price |
DeepSeek V3.2 unified chat and reasoning into one model at one price. Cache hits save 90%.
xAI (Grok)
Source: docs.x.ai/developers/models
| Model | Input/M | Output/M | Context | Best For |
|---|---|---|---|---|
| Grok 4 | $3.00 | $15.00 | 2M | Large context reasoning |
| Grok 4.1 Fast | $0.20 | $0.50 | 2M | Budget with huge context |
New users get $25 in free credits. The 2M context window is the joint-largest available.
Mistral
Source: mistral.ai/pricing
| Model | Input/M | Output/M | Context | Best For |
|---|---|---|---|---|
| Mistral Large 3 | $2.00 | $6.00 | 128K | European hosting, GDPR |
| Mistral Medium 3 | $0.40 | $2.00 | 128K | Mid-tier tasks |
| Mistral Nemo | $0.02 | $0.02 | 128K | Lightweight tasks |
| Ministral 8B | $0.10 | $0.10 | 128K | Cheapest Mistral option |
Meta Llama (Open Weights — Self-Hosted)
| Model | API Cost | Context | Notes |
|---|---|---|---|
| Llama 4 | Free | 200K | Host yourself or use a provider |
| Llama 3.3 | Free | 128K | Proven, well-supported |
Llama models are free to download but you pay for compute. Typical hosted pricing through providers like Together, Fireworks, or Groq ranges from $0.05–$0.90/M tokens depending on model size and provider.
Price Ranking: Cheapest to Most Expensive
Sorted by blended cost (assuming 1:1 input-to-output ratio):
| Rank | Model | Input/M | Output/M | Blended $/M |
|---|---|---|---|---|
| 1 | Mistral Nemo | $0.02 | $0.02 | $0.02 |
| 2 | Ministral 8B | $0.10 | $0.10 | $0.10 |
| 3 | Gemini 2.0 Flash-Lite | $0.075 | $0.30 | $0.19 |
| 4 | GPT-5 Nano | $0.05 | $0.40 | $0.23 |
| 5 | Gemini 2.0 Flash | $0.10 | $0.40 | $0.25 |
| 6 | Gemini 2.5 Flash-Lite | $0.10 | $0.40 | $0.25 |
| 7 | Grok 4.1 Fast | $0.20 | $0.50 | $0.35 |
| 8 | DeepSeek V3.2 | $0.28 | $0.42 | $0.35 |
| 9 | GPT-5 Mini | $0.25 | $2.00 | $1.13 |
| 10 | Mistral Medium 3 | $0.40 | $2.00 | $1.20 |
| 11 | Gemini 2.5 Flash | $0.30 | $2.50 | $1.40 |
| 12 | o4-mini / o3-mini | $1.10 | $4.40 | $2.75 |
| 13 | Mistral Large 3 | $2.00 | $6.00 | $4.00 |
| 14 | o3 / GPT-4.1 | $2.00 | $8.00 | $5.00 |
| 15 | Gemini 2.5 Pro | $1.25 | $10.00 | $5.63 |
| 16 | GPT-5 | $1.25 | $10.00 | $5.63 |
| 17 | GPT-5.2 | $1.75 | $14.00 | $7.88 |
| 18 | Claude Sonnet 4.6 | $3.00 | $15.00 | $9.00 |
| 19 | Grok 4 | $3.00 | $15.00 | $9.00 |
| 20 | Claude Opus 4.6 | $5.00 | $25.00 | $15.00 |
| 21 | o3-pro | $20.00 | $80.00 | $50.00 |
| 22 | GPT-5.2 Pro | $21.00 | $168.00 | $94.50 |
| 23 | o1-pro | $150.00 | $600.00 | $375.00 |
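The blended column is just a weighted average of the two prices, which you can recompute for your own input-to-output ratio. A quick sketch (the 1:1 default matches the table; the model list is a sample of rows from above):

```python
# Blended cost per 1M tokens, assuming a given fraction of output tokens.
PRICES = {  # model: (input $/M, output $/M), taken from the tables above
    "Mistral Nemo": (0.02, 0.02),
    "GPT-5 Nano": (0.05, 0.40),
    "Gemini 2.0 Flash-Lite": (0.075, 0.30),
    "DeepSeek V3.2": (0.28, 0.42),
    "Claude Sonnet 4.6": (3.00, 15.00),
}

def blended(input_per_m: float, output_per_m: float, out_ratio: float = 0.5) -> float:
    """Weighted $/1M tokens; out_ratio is the fraction of tokens that are output."""
    return input_per_m * (1 - out_ratio) + output_per_m * out_ratio

ranked = sorted(PRICES, key=lambda m: blended(*PRICES[m]))
for model in ranked:
    print(f"{model}: ${blended(*PRICES[model]):.2f}/M blended")
```

If your workload is input-heavy (long documents, short answers), drop `out_ratio` and the ranking can shift noticeably.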
Real Cost Examples
How much does it actually cost to do common tasks? These estimates use typical token counts.
Summarizing a 10-page document (~4,000 input tokens, ~500 output tokens)
| Model | Cost per doc | Cost for 1,000 docs |
|---|---|---|
| GPT-5 Nano | $0.0004 | $0.40 |
| Gemini 2.0 Flash | $0.0006 | $0.60 |
| DeepSeek V3.2 | $0.0013 | $1.33 |
| Claude Haiku 4.5 | $0.0065 | $6.50 |
| GPT-5.2 | $0.0140 | $14.00 |
| Claude Opus 4.6 | $0.0325 | $32.50 |
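Every row in these examples comes from the same arithmetic: tokens times rate, divided by a million. A minimal estimator you can point at any row of the pricing tables:

```python
def task_cost(input_tokens: int, output_tokens: int,
              input_per_m: float, output_per_m: float) -> float:
    """Dollar cost of one request at the given $/1M-token rates."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000

# Summarization workload from the table: ~4,000 input, ~500 output tokens.
per_doc = task_cost(4_000, 500, 0.28, 0.42)   # DeepSeek V3.2 rates
print(f"${per_doc:.4f} per doc, ${per_doc * 1_000:.2f} per 1,000 docs")
```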
Chatbot conversation (avg ~800 input tokens, ~400 output tokens per turn)
| Model | Cost per turn | Monthly (60,000 turns) |
|---|---|---|
| Gemini 2.0 Flash | $0.00024 | $14/mo |
| DeepSeek V3.2 | $0.00039 | $23/mo |
| GPT-5 Mini | $0.001 | $60/mo |
| Claude Haiku 4.5 | $0.0028 | $168/mo |
| Claude Sonnet 4.6 | $0.0084 | $504/mo |
Code generation (avg ~2,000 input tokens, ~1,500 output tokens per request)
| Model | Cost per request | Monthly (500 requests/day) |
|---|---|---|
| GPT-5 Nano | $0.0007 | $10.50/mo |
| DeepSeek V3.2 | $0.0012 | $18.00/mo |
| GPT-5.2 | $0.0245 | $367/mo |
| Claude Sonnet 4.6 | $0.0285 | $427/mo |
| Claude Opus 4.6 | $0.0475 | $712/mo |
RAG pipeline (retrieval-augmented generation: ~8,000 input tokens, ~800 output tokens)
| Model | Cost per query | 50K queries/month |
|---|---|---|
| DeepSeek V3.2 | $0.0026 | $128/mo |
| GPT-5 Mini | $0.0036 | $180/mo |
| Gemini 2.5 Flash | $0.0044 | $220/mo |
| Gemini 2.5 Pro | $0.018 | $900/mo |
| Claude Sonnet 4.6 | $0.036 | $1,800/mo |
Best Model by Use Case
High-volume chatbots and customer support
Pick: Gemini 2.0 Flash or DeepSeek V3.2
At $0.10–$0.28/M input, these handle simple Q&A and routing at pennies per thousand conversations. Use a smarter model as fallback for edge cases only.
Coding assistants and code generation
Pick: GPT-5.2 or Claude Sonnet 4.6
Both excel at code. GPT-5.2 is slightly cheaper ($1.75 vs $3.00 input). Claude Sonnet tends to follow complex instructions more precisely. For budget coding, DeepSeek V3.2 is still very capable at $0.28/M input.
Document summarization and extraction
Pick: Gemini 2.5 Pro
The 2M context window means you can stuff entire documents without chunking. At $1.25/M input (≤200K), it's cheaper than Claude or GPT for long-context work. Gemini 2.5 Flash at $0.30/M is a solid cheaper alternative.
Research and complex reasoning
Pick: Claude Opus 4.6 or GPT-5.2 Pro
For tasks where accuracy justifies the cost — legal analysis, scientific review, complex multi-step reasoning. Opus 4.6 at $5/$25 is dramatically cheaper than GPT-5.2 Pro at $21/$168 and often matches it.
Prototyping and experimentation
Pick: Gemini free tier or Llama 4 (self-hosted)
Gemini gives 1,000 free requests/day. Llama 4 costs nothing to run if you have the hardware. Both eliminate cost as a barrier during development.
Classification, tagging, and routing
Pick: GPT-5 Nano or Ministral 8B
Simple decision-making tasks don't need big models. At $0.05–$0.10/M input, you can classify millions of items for under $10.
Cost Optimization Tips
1. Prompt caching (saves 75–90%)
Every major provider now offers prompt caching. If your system prompt or few-shot examples stay the same across requests, cached tokens cost a fraction of the base price:
| Provider | Cache Savings | Cached Input Cost (flagship) |
|---|---|---|
| OpenAI | 90% off | $0.175/M (GPT-5.2) |
| Anthropic | 90% off | $0.50/M (Opus 4.6) |
| Google | 75% off | $0.31/M (Gemini 2.5 Pro) |
| DeepSeek | 90% off | $0.028/M (V3.2) |
With a 2,000-token system prompt sent 100K times, that's 200M prompt tokens: uncached costs $350 with GPT-5.2, cached costs $35. That's $315 saved per 100K requests on the system prompt alone.
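To see what caching does at scale, take a 2,000-token system prompt sent 100K times at GPT-5.2 input rates. A sketch of the arithmetic:

```python
def prompt_cost(tokens: int, requests: int, price_per_m: float) -> float:
    """Total cost of sending the same prompt prefix `requests` times."""
    return tokens * requests * price_per_m / 1_000_000

uncached = prompt_cost(2_000, 100_000, 1.75)   # GPT-5.2 standard input rate
cached = prompt_cost(2_000, 100_000, 0.175)    # with the 90% cache discount
print(f"uncached ${uncached:.0f}, cached ${cached:.0f}, "
      f"saved ${uncached - cached:.0f}")
```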
2. Batch API (saves 50%)
OpenAI's Batch API processes requests asynchronously within 24 hours at half price. Perfect for:
- Nightly data processing
- Bulk content generation
- Evaluation pipelines
- Anything that doesn't need real-time responses
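For OpenAI, a batch job is a JSONL file with one self-contained request per line, uploaded and then submitted with a 24-hour completion window. A sketch of building the input file; the model name and prompts are placeholders, and no SDK is needed for this step:

```python
import json

# Each line of the batch input file is one self-contained request.
docs = ["First document text...", "Second document text..."]
with open("batch_input.jsonl", "w") as f:
    for i, doc in enumerate(docs):
        request = {
            "custom_id": f"doc-{i}",       # your ID, echoed in the results file
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-5-mini",     # placeholder model name
                "messages": [{"role": "user", "content": f"Summarize: {doc}"}],
                "max_tokens": 500,
            },
        }
        f.write(json.dumps(request) + "\n")
# Upload this file with purpose="batch", then create the batch job with
# completion_window="24h" to get the 50% discount.
```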
3. Model routing (saves 60–80%)
Don't send every request to your best model. Route by complexity:
Simple query → GPT-5 Nano ($0.05/M)
Medium query → GPT-5 Mini ($0.25/M)
Hard query → GPT-5.2 ($1.75/M)

If 70% of your traffic is simple, 20% medium, and 10% hard, your effective cost drops from $1.75/M to about $0.26/M, an 85% reduction.
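A routing layer can be as simple as a heuristic in front of your API client. The sketch below is illustrative only: the keyword list and length threshold are assumptions, not a benchmarked policy:

```python
# Illustrative three-tier router. Prices are $/1M input tokens from above.
TIERS = [
    ("gpt-5-nano", 0.05),   # simple
    ("gpt-5-mini", 0.25),   # medium
    ("gpt-5.2", 1.75),      # hard
]

def route(query: str) -> str:
    """Crude complexity heuristic: a few 'hard task' keywords, then length."""
    hard_markers = ("prove", "refactor", "multi-step", "analyze")
    if any(m in query.lower() for m in hard_markers):
        return TIERS[2][0]
    if len(query.split()) > 50:
        return TIERS[1][0]
    return TIERS[0][0]

def effective_cost(traffic_mix: dict[str, float]) -> float:
    """Blended $/1M input tokens for a {model: traffic share} mix."""
    prices = dict(TIERS)
    return sum(prices[m] * share for m, share in traffic_mix.items())

print(effective_cost({"gpt-5-nano": 0.7, "gpt-5-mini": 0.2, "gpt-5.2": 0.1}))
```

In production, a small classifier model often does the routing itself; the blended-cost arithmetic is still how you check whether routing pays off.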
4. Output token management
Output tokens cost 4–8x more than input tokens across most providers. Reduce output costs by:
- Asking for structured JSON instead of verbose prose
- Setting max_tokens limits
- Requesting bullet points instead of paragraphs
- Using "be concise" in your system prompt (it works)
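A request shaped for cheap output might look like the sketch below (an OpenAI-style payload; the model name is a placeholder, and `max_tokens` is the classic name for the output cap, which some newer endpoints call `max_completion_tokens`):

```python
# A chat request tuned to minimize output tokens: capped length,
# structured format, concise instruction.
request = {
    "model": "gpt-5-mini",   # placeholder model name
    "max_tokens": 300,       # hard cap on the expensive side of the bill
    "messages": [
        {"role": "system",
         "content": "Be concise. Respond as compact JSON, no prose."},
        {"role": "user", "content": "Summarize the attached report."},
    ],
}

# At GPT-5 Mini output rates ($2.00/M), capping at 300 tokens instead of
# letting a 1,200-token answer through saves $1.80 per 1,000 requests.
savings_per_1k = (1_200 - 300) * 2.00 / 1_000_000 * 1_000
```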
5. Maximize cache hits (DeepSeek)
DeepSeek V3.2 charges just $0.028/M on cache hits (90% off standard input). Structure your prompts so static content, such as system prompts and few-shot examples, comes first to maximize cache hits.
6. Stack discounts
Combine prompt caching + batch API + model routing for maximum savings. Example with Anthropic:
- Base Opus 4.6: $5.00/M input
- With caching: $0.50/M input (90% off)
- With batch: $0.25/M input (additional 50% off)
- Effective cost: 95% cheaper than list price
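Stacked discounts multiply rather than add, which is why 90% plus 50% lands at 95% off rather than 140%. The arithmetic:

```python
base = 5.00                    # Opus 4.6 list price, $/M input
cached = base * (1 - 0.90)     # prompt caching takes 90% off -> $0.50
batched = cached * (1 - 0.50)  # batch API halves what's left -> $0.25
total_discount = 1 - batched / base
print(f"${batched:.2f}/M effective, {total_discount:.0%} off list")
```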
Frequently Asked Questions
What is the cheapest LLM API in 2026?
Gemini 2.0 Flash-Lite at $0.075/$0.30 per million tokens is the cheapest mainstream option. For even lower costs, Mistral Nemo costs just $0.02/M tokens.
Which LLM has the best price-to-performance ratio?
DeepSeek V3.2 offers strong value at $0.28/$0.42 per million tokens with unified chat and reasoning capabilities at one price.
How much does GPT-5 cost?
GPT-5 starts at $1.25/$10.00 per million input/output tokens. The premium GPT-5.2 Pro costs $21/$168 but offers the highest capability.
Is Claude cheaper than GPT?
Claude Sonnet 4.6 at $3/$15 is more expensive than GPT-5 ($1.25/$10) but competitive with GPT-5.2 ($1.75/$14). Claude Haiku 4.5 at $1/$5 is the budget Claude option.
Does Google offer free LLM API?
Yes, Google offers free input/output tokens on most Gemini models (2.5 Flash, Flash-Lite, 2.0 Flash, etc.) — great for prototyping.
How can I save money on LLM API costs?
Use prompt caching (saves 90% on repeated context), batch API (50% off for async tasks), and choose the right model for each task; don't use GPT-5.2 Pro for simple queries.
Cost Calculator Resources
- PricePerToken — Compare 300+ models side by side
- LLM Pricing — 72+ models with filtering
- CostGoat — Calculator with usage projections
The Bottom Line
LLM API prices dropped roughly 80% across the board from 2025 to 2026. The gap between "cheap" and "premium" now spans four orders of magnitude (Mistral Nemo at $0.02/M vs o1-pro at $375/M blended). For most production apps, models in the $0.10–$3.00/M range handle the job. Save the expensive models for tasks where quality directly impacts revenue.
More AI Resources
- AI Trends 2026 — What's next in AI
- Best AI Coding Tools — Dev tools comparison
- AI Companies Landscape — Provider directory
Prices from official provider websites as of March 2026. LLM pricing changes fast — verify current rates at OpenAI, Anthropic, Google, DeepSeek, xAI, and Mistral before committing.