LLM API Pricing (July 2026): GPT-5.4 $2.50/M · Claude Sonnet 5 $2/M · Full Table

Compare 23 verified LLM API prices across 7 providers. The lowest input price is LFM2 24B A2B on Together at $0.03/M input tokens. The table covers GPT, Claude, Gemini, DeepSeek, Groq, and xAI pricing, with links to the official sources.

Verified pricing dataset

LLM API prices from tracked source data

Prices are shown only after source verification. Pending providers are tracked but excluded from rankings and calculators.

Update cadence: Checked daily; published only after source verification.
Source policy: Official provider pricing pages or APIs only.
Machine-readable data: /api/pricing.json

Compare monthly costs in the LLM cost calculator.

Quick pricing answers

Cheapest input

LFM2 24B A2B on Together

$0.03/M input · $0.12/M output

Cheapest output

Llama 3.1 8B Instant on Groq

$0.08/M output

Lowest first-party lab price

Gemini 2.5 Flash-Lite

Google · $0.10/M input

Best cached-input price

DeepSeek V4 Flash

$0.0028/M cached input

These answers are generated from 23 verified models across 7 providers as of July 2026. Use the calculator for workload-specific totals because output tokens and cache hit rate can change the cheapest choice.
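To illustrate why the cheapest choice depends on the workload, here is a minimal sketch of a monthly cost estimate using three rows from the table on this page. The `Price` class, `monthly_cost`, and `cheapest` are hypothetical names, not part of the TLDL calculator. The billing rule is also an assumption: cache hits are charged at the cached-input rate instead of the regular input rate, and models without a cached price charge every input token at the regular rate. Check each provider's rate card for its actual caching rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Price:
    """USD per million tokens, copied from the table on this page."""
    input_per_m: float
    output_per_m: float
    cached_input_per_m: Optional[float] = None  # None where the table shows n/a


def monthly_cost(price: Price, input_m: float, output_m: float,
                 cache_hit_rate: float = 0.0) -> float:
    """Estimate monthly spend in USD; token volumes are in millions.

    Assumption (not from the dataset): a cache hit is billed at the cached
    rate instead of the regular input rate, and a model with no cached price
    bills every input token at the regular rate.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    if price.cached_input_per_m is None:
        cached_m, cached_rate = 0.0, 0.0
    else:
        cached_m, cached_rate = input_m * cache_hit_rate, price.cached_input_per_m
    fresh_m = input_m - cached_m
    return (fresh_m * price.input_per_m
            + cached_m * cached_rate
            + output_m * price.output_per_m)


PRICES = {
    "LFM2 24B A2B on Together": Price(0.03, 0.12),
    "Llama 3.1 8B Instant on Groq": Price(0.05, 0.08),
    "DeepSeek V4 Flash": Price(0.14, 0.28, cached_input_per_m=0.0028),
}


def cheapest(input_m: float, output_m: float, cache_hit_rate: float = 0.0) -> str:
    """Return the name of the lowest-cost model in PRICES for a workload."""
    return min(PRICES, key=lambda name: monthly_cost(
        PRICES[name], input_m, output_m, cache_hit_rate))


if __name__ == "__main__":
    print(cheapest(100, 10))                       # input-heavy, no caching
    print(cheapest(10, 100))                       # output-heavy
    print(cheapest(100, 10, cache_hit_rate=0.95))  # input-heavy, mostly cached
```

Under these assumptions, the input-heavy workload favors LFM2 24B A2B on Together, the output-heavy one favors Llama 3.1 8B Instant on Groq, and a 95% cache hit rate makes DeepSeek V4 Flash the cheapest of the three for the input-heavy case.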

| Provider | Model | Input / 1M | Cached input / 1M | Output / 1M | Context | Verified |
| --- | --- | --- | --- | --- | --- | --- |
| DeepSeek | DeepSeek V4 Flash | $0.14 | $0.0028 | $0.28 | 1,000,000 | July 2026 |
| DeepSeek | DeepSeek V4 Pro | $0.435 | $0.0036 | $0.87 | 1,000,000 | July 2026 |
| OpenAI | GPT-5.5 | $5.00 | $0.50 | $30.00 | n/a | July 2026 |
| OpenAI | GPT-5.4 | $2.50 | $0.25 | $15.00 | n/a | July 2026 |
| OpenAI | GPT-5.4 mini | $0.75 | $0.075 | $4.50 | n/a | July 2026 |
| OpenAI | GPT-5.4 nano | $0.20 | $0.02 | $1.25 | n/a | July 2026 |
| Anthropic | Claude Opus 4.8 | $5.00 | $0.50 | $25.00 | n/a | July 2026 |
| Anthropic | Claude Sonnet 5 | $2.00 | $0.20 | $10.00 | n/a | July 2026 |
| Anthropic | Claude Sonnet 4.6 | $3.00 | $0.30 | $15.00 | n/a | July 2026 |
| Anthropic | Claude Haiku 4.5 | $1.00 | $0.10 | $5.00 | n/a | July 2026 |
| Google | Gemini 3 Flash Preview | $0.50 | $0.05 | $3.00 | n/a | July 2026 |
| Google | Gemini 2.5 Pro | $1.25 | $0.125 | $10.00 | 1,000,000 | July 2026 |
| Google | Gemini 2.5 Flash | $0.30 | $0.03 | $2.50 | 1,000,000 | July 2026 |
| Google | Gemini 2.5 Flash-Lite | $0.10 | $0.01 | $0.40 | n/a | July 2026 |
| xAI | Grok Build 0.1 | $1.00 | n/a | $2.00 | 256,000 | July 2026 |
| Together AI | DeepSeek V4 Pro on Together | $1.74 | $0.20 | $3.48 | n/a | July 2026 |
| Together AI | MiniMax M3 on Together | $0.30 | $0.06 | $1.20 | n/a | July 2026 |
| Together AI | gpt-oss-120B on Together | $0.15 | n/a | $0.60 | n/a | July 2026 |
| Together AI | LFM2 24B A2B on Together | $0.03 | n/a | $0.12 | n/a | July 2026 |
| Groq | GPT OSS 20B on Groq | $0.075 | $0.038 | $0.30 | 128,000 | July 2026 |
| Groq | GPT OSS 120B on Groq | $0.15 | $0.075 | $0.60 | 128,000 | July 2026 |
| Groq | Llama 4 Scout on Groq | $0.11 | n/a | $0.34 | 128,000 | July 2026 |
| Groq | Llama 3.1 8B Instant on Groq | $0.05 | n/a | $0.08 | 128,000 | July 2026 |

Pending source verification

Mistral

Official model docs are tracked, but the fetched docs did not contain a stable per-model token pricing table. Mistral prices stay unpublished until they are verified from an official pricing table.

https://docs.mistral.ai/getting-started/models/models_overview/

Fireworks

The official pricing page exposes serverless, fine-tuning, and GPU-hour pricing, but the fetched content refers readers to a separate blog for per-token estimates. Token pricing stays pending until a stable official per-model token table is available.

https://fireworks.ai/pricing

OpenRouter

OpenRouter exposes model pricing through its models API, but ingestion needs a deterministic model-selection policy before aggregate OpenRouter prices can be published.

https://openrouter.ai/api/v1/models

Pricing questions

What is the cheapest LLM API in July 2026?

LFM2 24B A2B on Together has the lowest verified input-token price at $0.03/M input tokens.

Which model is cheapest for generation-heavy workloads?

Llama 3.1 8B Instant on Groq has the lowest verified output-token price at $0.08/M output tokens.

Why are some providers still pending?

TLDL publishes prices only when the official source exposes stable per-model token pricing. Pending rows keep tracked sources visible without mixing guessed prices into the table.

Pricing data changelog

2026-07-04 · DeepSeek

Verified DeepSeek V4 Flash and V4 Pro pricing from the official DeepSeek API pricing page.

https://api-docs.deepseek.com/quick_start/pricing

2026-07-04 · OpenAI

Verified GPT-5.5 and GPT-5.4 family standard API pricing from the official OpenAI pricing page.

https://platform.openai.com/docs/pricing

2026-07-04 · Anthropic

Verified current Claude first-party API pricing from the official Anthropic pricing page.

https://docs.anthropic.com/en/docs/about-claude/pricing

2026-07-04 · Google

Verified Gemini 3 Flash and Gemini 2.5 family standard text pricing from the official Gemini API pricing page.

https://ai.google.dev/gemini-api/docs/pricing

2026-07-04 · Groq

Verified Groq on-demand LLM token pricing from the official Groq pricing page.

https://groq.com/pricing/

2026-07-04 · Together AI

Verified selected Together AI serverless inference prices from the official Together AI pricing page.

https://www.together.ai/pricing

2026-07-04 · xAI

Verified Grok Build 0.1 token pricing from the official xAI docs overview.

https://docs.x.ai/overview

2026-07-04 · TLDL

Kept Mistral, Fireworks token inference, and OpenRouter aggregate pricing marked as source-pending: for each of them, this dataset does not yet have a stable official per-model token table.

https://www.tldl.io/api/pricing.json

This page compares LLM API prices from TLDL's shared pricing dataset. The table above is generated from src/data/llm-pricing.json, the same source used by the public pricing API and the LLM cost calculator.

Prices are included only after they are verified from official provider sources. Providers that are still pending stay visible in the source-tracking list, but they are excluded from rankings and calculators until the data is verified.
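The exclusion rule described above can be sketched as a filter applied before any ranking. This is an illustration only: the field names (`status`, `input_per_m`, `output_per_m`) and the inline snapshot are assumptions, not the real schema of src/data/llm-pricing.json or /api/pricing.json. The three verified rows reuse prices from the table on this page, and the pending row deliberately carries no price.

```python
import json

# Hypothetical snapshot: the field names ("status", "input_per_m", ...) are
# assumptions for illustration, not the real schema of the pricing dataset.
SNAPSHOT = json.loads("""
[
  {"provider": "Together AI", "model": "LFM2 24B A2B on Together",
   "status": "verified", "input_per_m": 0.03, "output_per_m": 0.12},
  {"provider": "Groq", "model": "Llama 3.1 8B Instant on Groq",
   "status": "verified", "input_per_m": 0.05, "output_per_m": 0.08},
  {"provider": "Google", "model": "Gemini 2.5 Flash-Lite",
   "status": "verified", "input_per_m": 0.10, "output_per_m": 0.40},
  {"provider": "Mistral", "model": null,
   "status": "pending", "input_per_m": null, "output_per_m": null}
]
""")


def split_by_status(rows):
    """Separate verified rows (rankable) from pending rows (display only)."""
    verified = [r for r in rows if r["status"] == "verified"]
    pending = [r for r in rows if r["status"] != "verified"]
    return verified, pending


def rank_by(rows, field):
    """Rank verified rows by a price field, cheapest first.

    Pending rows are dropped before sorting, so a missing price can never
    be compared against a verified one.
    """
    verified, _ = split_by_status(rows)
    return sorted(verified, key=lambda r: r[field])


if __name__ == "__main__":
    for row in rank_by(SNAPSHOT, "input_per_m"):
        print(row["model"], row["input_per_m"])
    _, pending = split_by_status(SNAPSHOT)
    print("pending:", [r["provider"] for r in pending])
```

Filtering before sorting keeps pending providers visible in a separate list while guaranteeing they never appear in a ranking or a calculator input.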

Why the dataset is strict

LLM pricing pages drift quickly. Model names change, cached-token rules change, context tiers split by prompt length, and provider pages move behind bot protection. TLDL treats the official rate card as the source of truth and avoids publishing copied numbers when a source cannot be verified.

How to read the table

  • Input / 1M is the provider's verified input-token price per million tokens.
  • Cached input / 1M is shown only when the provider exposes a separate cached-input price; otherwise the cell reads n/a.
  • Output / 1M is the provider's verified output-token price per million tokens.
  • Context and Verified show the model's context window in tokens (n/a where the dataset does not record one) and the verification month from the shared dataset.
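For readers who copy rows out of the table rather than using the JSON data, the column rules above can be sketched as a small parser. The function names are hypothetical, and mapping n/a to `None` is a choice made for this example, not something the dataset specifies. `Decimal` is used so that small prices such as $0.0028 are kept exactly.

```python
from decimal import Decimal
from typing import Optional


def parse_price(cell: str) -> Optional[Decimal]:
    """Convert a price cell such as "$0.0028" to Decimal; "n/a" becomes None."""
    cell = cell.strip()
    if cell.lower() == "n/a":
        return None
    return Decimal(cell.lstrip("$").replace(",", ""))


def parse_context(cell: str) -> Optional[int]:
    """Convert a context cell such as "1,000,000" to int; "n/a" becomes None."""
    cell = cell.strip()
    if cell.lower() == "n/a":
        return None
    return int(cell.replace(",", ""))


def parse_row(cells):
    """Map one table row (seven cell strings, in column order) to a dict."""
    provider, model, inp, cached, out, context, verified = cells
    return {
        "provider": provider.strip(),
        "model": model.strip(),
        "input_per_m": parse_price(inp),
        "cached_input_per_m": parse_price(cached),  # None: no separate cached price
        "output_per_m": parse_price(out),
        "context_tokens": parse_context(context),
        "verified": verified.strip(),
    }


if __name__ == "__main__":
    row = parse_row(["xAI", "Grok Build 0.1", "$1.00", "n/a", "$2.00",
                     "256,000", "July 2026"])
    print(row)
```

Keeping n/a as `None` rather than zero avoids treating a missing cached-input price as a free one in later arithmetic.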

Source status

The pending section shows providers where TLDL already tracks the official pricing URL but has not verified current token prices. This is intentional: pending rows are better than stale prices because they make the missing verification explicit.
