LLM API Pricing Comparison — March 2026
Last updated: March 5, 2026 — GPT-5.4 now rolling out
TL;DR — LLM API Pricing as of March 2026
- Cheapest: Gemini 2.0 Flash-Lite — $0.075/$0.30 per 1M tokens
- Best Value: DeepSeek V3.2 — $0.28/$0.42 per 1M tokens
- Best Overall: GPT-5.4 — $2.50/$10 per 1M tokens (new!)
- Best Mid-tier: Claude Sonnet 4.6 ($3/$15) or GPT-5.2 ($1.75/$14)
- Premium: Claude Opus 4.6 — $5/$25 | GPT-5.2 Pro — $21/$168
- Free Tier: Google Gemini (free on most models)
The LLM pricing landscape has shifted dramatically. DeepSeek undercut everyone. OpenAI slashed flagship prices 80% year-over-year. Google offers a generous free tier. Choosing the wrong model can cost you 100x more than necessary for the same quality output.
This guide covers every major API with real cost examples so you can pick the right model for your budget. Updated weekly with official pricing.
Quick Answer: Which Model Should You Use?
Before diving into tables, here's what most developers actually need:
- Cheapest option that works: Gemini 2.0 Flash-Lite at $0.075/$0.30 per million tokens. Hard to beat for simple tasks.
- Best bang for the buck: DeepSeek V3.2 at $0.28/$0.42. Still very capable for the price, with 90% cache discounts.
- Best mid-tier all-rounder: Claude Sonnet 4.6 ($3/$15) or GPT-5.2 ($1.75/$14). Both handle complex tasks well.
- When you need the absolute best: Claude Opus 4.6 ($5/$25) or GPT-5.2 Pro ($21/$168). Use sparingly.
- Free and open: Llama 4 and Gemini's free tier cost nothing for prototyping.
All Provider Pricing (March 2026)
OpenAI
Source: openai.com/api/pricing
| Model | Input/M | Output/M | Cached Input/M | Context | Best For |
|---|---|---|---|---|---|
| GPT-5.2 Pro | $21.00 | $168.00 | $2.10 | 200K | Hardest reasoning tasks |
| GPT-5.2 | $1.75 | $14.00 | $0.175 | 200K | Coding, agents |
| GPT-5 | $1.25 | $10.00 | $0.125 | 128K | General flagship |
| GPT-5 Mini | $0.25 | $2.00 | $0.025 | 200K | Fast, affordable |
| GPT-5 Nano | $0.05 | $0.40 | $0.005 | 128K | High-volume simple tasks |
| o4-mini | $1.10 | $4.40 | $0.275 | 200K | Best value reasoning |
| o3 | $2.00 | $8.00 | $1.00 | 200K | Mid-tier reasoning |
| o3-pro | $20.00 | $80.00 | — | 200K | Strong reasoning |
| o1 | $15.00 | $60.00 | $7.50 | 200K | Legacy reasoning |
| GPT-4.1 | $2.00 | $8.00 | $0.20 | 1M | Previous gen |
| GPT-4.1 Mini | $0.40 | $1.60 | $0.04 | 1M | Previous gen budget |
| GPT-4.1 Nano | $0.10 | $0.40 | $0.01 | 1M | Previous gen fast |
OpenAI Batch API gives 50% off all models for async workloads processed within 24 hours. Cached input tokens cost 10% of standard input price.
Anthropic
Source: claude.com/pricing
| Model | Input/M | Output/M | Cached Input/M | Context | Best For |
|---|---|---|---|---|---|
| Claude Opus 4.6 | $5.00 | $25.00 | $0.50 | 200K | Complex analysis, research |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $0.30 | 200K | Coding, balanced tasks |
| Claude Haiku 4.5 | $1.00 | $5.00 | $0.10 | 200K | Fast classification, chat |
Opus 4.6 dropped 67% from the previous Opus 4.1 ($15/$75). Batch API saves another 50%. Prompt caching saves 90% on input tokens, stackable with batch for up to 95% total savings. Legacy Claude 3 Haiku ($0.25/$1.25) is deprecated and retiring April 2026.
Google Gemini
Source: ai.google.dev/pricing
| Model | Input/M | Output/M | Cached Input/M | Context | Best For |
|---|---|---|---|---|---|
| Gemini 3.1 Pro (preview) | $2.00 (≤200K) / $4.00 | $12.00 (≤200K) / $18.00 | — | 200K+ | Next-gen flagship |
| Gemini 3 Flash (preview) | $0.50 | $3.00 | — | — | Fast next-gen |
| Gemini 2.5 Pro (≤200K) | $1.25 | $10.00 | $0.125 | 2M | Long documents, analysis |
| Gemini 2.5 Pro (>200K) | $2.50 | $15.00 | $0.25 | 2M | Very long context |
| Gemini 2.5 Flash | $0.30 | $2.50 | $0.03 | 1M | Fast mid-tier |
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 | — | 1M | Cheap current-gen tasks |
| Gemini 2.0 Flash | $0.10 | $0.40 | $0.025 | 1M | Ultra cheap, proven |
| Gemini 2.0 Flash-Lite | $0.075 | $0.30 | — | 1M | Cheapest mainstream |
Free tier available on most models (Gemini 2.5 Flash, Flash-Lite, 2.0 Flash, etc.). Great for prototyping and low-traffic apps.
DeepSeek
Source: api-docs.deepseek.com/pricing
| Model | Input/M | Output/M | Cached Input/M | Context | Best For |
|---|---|---|---|---|---|
| DeepSeek V3.2 (Chat) | $0.28 | $0.42 | $0.028 | 128K | General tasks, very cheap |
| DeepSeek V3.2 (Reasoner) | $0.28 | $0.42 | $0.028 | 128K | Reasoning, same price |
DeepSeek V3.2 unified chat and reasoning into one model at one price. Cache hits save 90%.
xAI (Grok)
Source: docs.x.ai/developers/models
| Model | Input/M | Output/M | Context | Best For |
|---|---|---|---|---|
| Grok 4 | $3.00 | $15.00 | 2M | Large context reasoning |
| Grok 4.1 Fast | $0.20 | $0.50 | 2M | Budget with huge context |
New users get $25 in free credits. The 2M context window is the joint-largest available.
Mistral
Source: mistral.ai/pricing
| Model | Input/M | Output/M | Context | Best For |
|---|---|---|---|---|
| Mistral Large 3 | $2.00 | $6.00 | 128K | European hosting, GDPR |
| Mistral Medium 3 | $0.40 | $2.00 | 128K | Mid-tier tasks |
| Mistral Nemo | $0.02 | $0.02 | 128K | Lightweight tasks |
| Ministral 8B | $0.10 | $0.10 | 128K | Cheapest Mistral option |
Meta Llama (Open Weights — Self-Hosted)
| Model | API Cost | Context | Notes |
|---|---|---|---|
| Llama 4 | Free | 200K | Host yourself or use a provider |
| Llama 3.3 | Free | 128K | Proven, well-supported |
Llama models are free to download but you pay for compute. Typical hosted pricing through providers like Together, Fireworks, or Groq ranges from $0.05–$0.90/M tokens depending on model size and provider.
Price Ranking: Cheapest to Most Expensive
Sorted by blended cost (assuming 1:1 input-to-output ratio):
| Rank | Model | Input/M | Output/M | Blended $/M |
|---|---|---|---|---|
| 1 | Mistral Nemo | $0.02 | $0.02 | $0.02 |
| 2 | Ministral 8B | $0.10 | $0.10 | $0.10 |
| 3 | Gemini 2.0 Flash-Lite | $0.075 | $0.30 | $0.19 |
| 4 | GPT-5 Nano | $0.05 | $0.40 | $0.23 |
| 5 | Gemini 2.0 Flash | $0.10 | $0.40 | $0.25 |
| 6 | Gemini 2.5 Flash-Lite | $0.10 | $0.40 | $0.25 |
| 7 | Grok 4.1 Fast | $0.20 | $0.50 | $0.35 |
| 8 | DeepSeek V3.2 | $0.28 | $0.42 | $0.35 |
| 9 | GPT-5 Mini | $0.25 | $2.00 | $1.13 |
| 10 | Mistral Medium 3 | $0.40 | $2.00 | $1.20 |
| 11 | Gemini 2.5 Flash | $0.30 | $2.50 | $1.40 |
| 12 | o4-mini / o3-mini | $1.10 | $4.40 | $2.75 |
| 13 | Mistral Large 3 | $2.00 | $6.00 | $4.00 |
| 14 | o3 / GPT-4.1 | $2.00 | $8.00 | $5.00 |
| 15 | Gemini 2.5 Pro | $1.25 | $10.00 | $5.63 |
| 16 | GPT-5 | $1.25 | $10.00 | $5.63 |
| 17 | GPT-5.2 | $1.75 | $14.00 | $7.88 |
| 18 | Claude Sonnet 4.6 | $3.00 | $15.00 | $9.00 |
| 19 | Grok 4 | $3.00 | $15.00 | $9.00 |
| 20 | Claude Opus 4.6 | $5.00 | $25.00 | $15.00 |
| 21 | o3-pro | $20.00 | $80.00 | $50.00 |
| 22 | GPT-5.2 Pro | $21.00 | $168.00 | $94.50 |
| 23 | o1-pro | $150.00 | $600.00 | $375.00 |
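The blended column is just a weighted average of the two prices, which you can recompute for your own input-to-output ratio. A quick sketch (the 1:1 default matches the table; the model list is a sample of rows from above):

```python
# Blended cost per 1M tokens, assuming a given fraction of output tokens.
PRICES = {  # model: (input $/M, output $/M), taken from the tables above
    "Mistral Nemo": (0.02, 0.02),
    "GPT-5 Nano": (0.05, 0.40),
    "Gemini 2.0 Flash-Lite": (0.075, 0.30),
    "DeepSeek V3.2": (0.28, 0.42),
    "Claude Sonnet 4.6": (3.00, 15.00),
}

def blended(input_per_m: float, output_per_m: float, out_ratio: float = 0.5) -> float:
    """Weighted $/1M tokens; out_ratio is the fraction of tokens that are output."""
    return input_per_m * (1 - out_ratio) + output_per_m * out_ratio

ranked = sorted(PRICES, key=lambda m: blended(*PRICES[m]))
for model in ranked:
    print(f"{model}: ${blended(*PRICES[model]):.2f}/M blended")
```

If your workload is input-heavy (long documents, short answers), drop `out_ratio` and the ranking can shift noticeably.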
Real Cost Examples
How much does it actually cost to do common tasks? These estimates use typical token counts.
Summarizing a 10-page document (~4,000 input tokens, ~500 output tokens)
| Model | Cost per doc | Cost for 1,000 docs |
|---|---|---|
| GPT-5 Nano | $0.0004 | $0.40 |
| Gemini 2.0 Flash | $0.0006 | $0.60 |
| DeepSeek V3.2 | $0.0013 | $1.33 |
| Claude Haiku 4.5 | $0.0065 | $6.50 |
| GPT-5.2 | $0.0140 | $14.00 |
| Claude Opus 4.6 | $0.0325 | $32.50 |
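Every row in these examples comes from the same arithmetic: tokens times rate, divided by a million. A minimal estimator you can point at any row of the pricing tables:

```python
def task_cost(input_tokens: int, output_tokens: int,
              input_per_m: float, output_per_m: float) -> float:
    """Dollar cost of one request at the given $/1M-token rates."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000

# Summarization workload from the table: ~4,000 input, ~500 output tokens.
per_doc = task_cost(4_000, 500, 0.28, 0.42)   # DeepSeek V3.2 rates
print(f"${per_doc:.4f} per doc, ${per_doc * 1_000:.2f} per 1,000 docs")
```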
Chatbot conversation (avg ~800 input tokens, ~400 output tokens per turn)
| Model | Cost per turn | Monthly (60,000 turns) |
|---|---|---|
| Gemini 2.0 Flash | $0.00024 | $14/mo |
| DeepSeek V3.2 | $0.00039 | $23/mo |
| GPT-5 Mini | $0.001 | $60/mo |
| Claude Haiku 4.5 | $0.0028 | $168/mo |
| Claude Sonnet 4.6 | $0.0084 | $504/mo |
Code generation (avg ~2,000 input tokens, ~1,500 output tokens per request)
| Model | Cost per request | Monthly (500 requests/day) |
|---|---|---|
| GPT-5 Nano | $0.0007 | $10.50/mo |
| DeepSeek V3.2 | $0.0012 | $18.00/mo |
| GPT-5.2 | $0.0245 | $367/mo |
| Claude Sonnet 4.6 | $0.0285 | $427/mo |
| Claude Opus 4.6 | $0.0475 | $712/mo |
RAG pipeline (retrieval-augmented generation: ~8,000 input tokens, ~800 output tokens)
| Model | Cost per query | 50K queries/month |
|---|---|---|
| DeepSeek V3.2 | $0.0026 | $128/mo |
| GPT-5 Mini | $0.0036 | $180/mo |
| Gemini 2.5 Flash | $0.0044 | $220/mo |
| Gemini 2.5 Pro | $0.018 | $900/mo |
| Claude Sonnet 4.6 | $0.036 | $1,800/mo |
Best Model by Use Case
High-volume chatbots and customer support
Pick: Gemini 2.0 Flash or DeepSeek V3.2
At $0.10–$0.28/M input, these handle simple Q&A and routing at pennies per thousand conversations. Use a smarter model as fallback for edge cases only.
Coding assistants and code generation
Pick: GPT-5.2 or Claude Sonnet 4.6
Both excel at code. GPT-5.2 is slightly cheaper ($1.75 vs $3.00 input). Claude Sonnet tends to follow complex instructions more precisely. For budget coding, DeepSeek V3.2 is still very capable at $0.28/M input.
Document summarization and extraction
Pick: Gemini 2.5 Pro
The 2M context window means you can stuff entire documents without chunking. At $1.25/M input (≤200K), it's cheaper than Claude or GPT for long-context work. Gemini 2.5 Flash at $0.30/M is a solid cheaper alternative.
Research and complex reasoning
Pick: Claude Opus 4.6 or GPT-5.2 Pro
For tasks where accuracy justifies the cost — legal analysis, scientific review, complex multi-step reasoning. Opus 4.6 at $5/$25 is dramatically cheaper than GPT-5.2 Pro at $21/$168 and often matches it.
Prototyping and experimentation
Pick: Gemini free tier or Llama 4 (self-hosted)
Gemini gives 1,000 free requests/day. Llama 4 costs nothing to run if you have the hardware. Both eliminate cost as a barrier during development.
Classification, tagging, and routing
Pick: GPT-5 Nano or Ministral 8B
Simple decision-making tasks don't need big models. At $0.05–$0.10/M input, you can classify millions of items for under $10.
Cost Optimization Tips
1. Prompt caching (saves 75–90%)
Every major provider now offers prompt caching. If your system prompt or few-shot examples stay the same across requests, cached tokens cost a fraction of the base price:
| Provider | Cache Savings | Cached Input Cost (flagship) |
|---|---|---|
| OpenAI | 90% off | $0.175/M (GPT-5.2) |
| Anthropic | 90% off | $0.50/M (Opus 4.6) |
| Google | 75% off | $0.31/M (Gemini 2.5 Pro) |
| DeepSeek | 90% off | $0.028/M (V3.2) |
With a 2,000-token system prompt sent 100K times, that's 200M prompt tokens: uncached costs $350 with GPT-5.2, cached costs $35. That's $315 saved per 100K requests on the system prompt alone.
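To see what caching does at scale, take a 2,000-token system prompt sent 100K times at GPT-5.2 input rates. A sketch of the arithmetic:

```python
def prompt_cost(tokens: int, requests: int, price_per_m: float) -> float:
    """Total cost of sending the same prompt prefix `requests` times."""
    return tokens * requests * price_per_m / 1_000_000

uncached = prompt_cost(2_000, 100_000, 1.75)   # GPT-5.2 standard input rate
cached = prompt_cost(2_000, 100_000, 0.175)    # with the 90% cache discount
print(f"uncached ${uncached:.0f}, cached ${cached:.0f}, "
      f"saved ${uncached - cached:.0f}")
```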
2. Batch API (saves 50%)
OpenAI's Batch API processes requests asynchronously within 24 hours at half price. Perfect for:
- Nightly data processing
- Bulk content generation
- Evaluation pipelines
- Anything that doesn't need real-time responses
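For OpenAI, a batch job is a JSONL file with one self-contained request per line, uploaded and then submitted with a 24-hour completion window. A sketch of building the input file; the model name and prompts are placeholders, and no SDK is needed for this step:

```python
import json

# Each line of the batch input file is one self-contained request.
docs = ["First document text...", "Second document text..."]
with open("batch_input.jsonl", "w") as f:
    for i, doc in enumerate(docs):
        request = {
            "custom_id": f"doc-{i}",       # your ID, echoed in the results file
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-5-mini",     # placeholder model name
                "messages": [{"role": "user", "content": f"Summarize: {doc}"}],
                "max_tokens": 500,
            },
        }
        f.write(json.dumps(request) + "\n")
# Upload this file with purpose="batch", then create the batch job with
# completion_window="24h" to get the 50% discount.
```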
3. Model routing (saves 60–80%)
Don't send every request to your best model. Route by complexity:
Simple query → GPT-5 Nano ($0.05/M)
Medium query → GPT-5 Mini ($0.25/M)
Hard query → GPT-5.2 ($1.75/M)

If 70% of your traffic is simple, 20% medium, and 10% hard, your effective cost drops from $1.75/M to about $0.26/M, an 85% reduction.
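A routing layer can be as simple as a heuristic in front of your API client. The sketch below is illustrative only: the keyword list and length threshold are assumptions, not a benchmarked policy:

```python
# Illustrative three-tier router. Prices are $/1M input tokens from above.
TIERS = [
    ("gpt-5-nano", 0.05),   # simple
    ("gpt-5-mini", 0.25),   # medium
    ("gpt-5.2", 1.75),      # hard
]

def route(query: str) -> str:
    """Crude complexity heuristic: a few 'hard task' keywords, then length."""
    hard_markers = ("prove", "refactor", "multi-step", "analyze")
    if any(m in query.lower() for m in hard_markers):
        return TIERS[2][0]
    if len(query.split()) > 50:
        return TIERS[1][0]
    return TIERS[0][0]

def effective_cost(traffic_mix: dict[str, float]) -> float:
    """Blended $/1M input tokens for a {model: traffic share} mix."""
    prices = dict(TIERS)
    return sum(prices[m] * share for m, share in traffic_mix.items())

print(effective_cost({"gpt-5-nano": 0.7, "gpt-5-mini": 0.2, "gpt-5.2": 0.1}))
```

In production, a small classifier model often does the routing itself; the blended-cost arithmetic is still how you check whether routing pays off.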
4. Output token management
Output tokens cost 4–8x more than input tokens across most providers. Reduce output costs by:
- Asking for structured JSON instead of verbose prose
- Setting max_tokens limits
- Requesting bullet points instead of paragraphs
- Using "be concise" in your system prompt (it works)
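A request shaped for cheap output might look like the sketch below (an OpenAI-style payload; the model name is a placeholder, and `max_tokens` is the classic name for the output cap, which some newer endpoints call `max_completion_tokens`):

```python
# A chat request tuned to minimize output tokens: capped length,
# structured format, concise instruction.
request = {
    "model": "gpt-5-mini",   # placeholder model name
    "max_tokens": 300,       # hard cap on the expensive side of the bill
    "messages": [
        {"role": "system",
         "content": "Be concise. Respond as compact JSON, no prose."},
        {"role": "user", "content": "Summarize the attached report."},
    ],
}

# At GPT-5 Mini output rates ($2.00/M), capping at 300 tokens instead of
# letting a 1,200-token answer through saves $1.80 per 1,000 requests.
savings_per_1k = (1_200 - 300) * 2.00 / 1_000_000 * 1_000
```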
5. Maximize cache hits (DeepSeek)
DeepSeek V3.2 charges just $0.028/M on cache hits (90% off standard input). Structure your prompts so static content, such as system prompts and few-shot examples, comes first to maximize cache hits.
6. Stack discounts
Combine prompt caching + batch API + model routing for maximum savings. Example with Anthropic:
- Base Opus 4.6: $5.00/M input
- With caching: $0.50/M input (90% off)
- With batch: $0.25/M input (additional 50% off)
- Effective cost: 95% cheaper than list price
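Stacked discounts multiply rather than add, which is why 90% plus 50% lands at 95% off rather than 140%. The arithmetic:

```python
base = 5.00                    # Opus 4.6 list price, $/M input
cached = base * (1 - 0.90)     # prompt caching takes 90% off -> $0.50
batched = cached * (1 - 0.50)  # batch API halves what's left -> $0.25
total_discount = 1 - batched / base
print(f"${batched:.2f}/M effective, {total_discount:.0%} off list")
```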
Frequently Asked Questions
What is the cheapest LLM API in 2026?
Gemini 2.0 Flash-Lite at $0.075/$0.30 per million tokens is the cheapest mainstream option. For even lower costs, Mistral Nemo costs just $0.02/M tokens.
Which LLM has the best price-to-performance ratio?
DeepSeek V3.2 offers strong value at $0.28/$0.42 per million tokens with unified chat and reasoning capabilities at one price.
How much does GPT-5 cost?
GPT-5 starts at $1.25/$10.00 per million input/output tokens. The premium GPT-5.2 Pro costs $21/$168 but offers the highest capability.
Is Claude cheaper than GPT?
Claude Sonnet 4.6 at $3/$15 is more expensive than GPT-5 ($1.25/$10) but competitive with GPT-5.2 ($1.75/$14). Claude Haiku 4.5 at $1/$5 is the budget Claude option.
Does Google offer free LLM API?
Yes, Google offers free input/output tokens on most Gemini models (2.5 Flash, Flash-Lite, 2.0 Flash, etc.) — great for prototyping.
How can I save money on LLM API costs?
Use prompt caching (saves 90% on repeated context), batch API (50% off for async tasks), and choose the right model for each task; don't use GPT-5.2 Pro for simple queries.
Cost Calculator Resources
- PricePerToken — Compare 300+ models side by side
- LLM Pricing — 72+ models with filtering
- CostGoat — Calculator with usage projections
The Bottom Line
LLM API prices dropped roughly 80% across the board from 2025 to 2026. The gap between "cheap" and "premium" now spans four orders of magnitude (Mistral Nemo at $0.02/M vs o1-pro at $375/M blended). For most production apps, models in the $0.10–$3.00/M range handle the job. Save the expensive models for tasks where quality directly impacts revenue.
More AI Resources
- AI Trends 2026 — What's next in AI
- Best AI Coding Tools — Dev tools comparison
- AI Companies Landscape — Provider directory
Prices from official provider websites as of March 2026. LLM pricing changes fast — verify current rates at OpenAI, Anthropic, Google, DeepSeek, xAI, and Mistral before committing.