Which LLM API is cheapest for long generated outputs?

Llama 3.1 8B Instant on Groq has the lowest verified output-token price in TLDL's dataset at $0.08 per 1M output tokens.

Where does TLDL pricing data come from?

TLDL pricing data comes from official provider pricing pages or APIs. A model price is published only when the source URL and verification date are recorded in the shared pricing dataset.

Why are some providers marked source pending?

Source-pending providers have an official pricing source tracked, but the current token prices have not been verified in the dataset. TLDL does not publish guessed or stale prices.

Can agents use the pricing data directly?

Yes. The pricing dataset is available as JSON at https://www.tldl.io/api/pricing.json with open CORS and cache headers.
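For agents that want the raw data, a minimal Python sketch of fetching the endpoint might look like the following. It uses only the standard library. The function names `fetch_pricing` and `parse_pricing` are illustrative, and because the JSON schema is not described on this page, the code assumes no field names and simply returns the parsed value.

```python
import json
from urllib.request import Request, urlopen

# URL taken from this page; everything else here is an illustrative sketch.
PRICING_URL = "https://www.tldl.io/api/pricing.json"


def parse_pricing(raw: bytes):
    """Decode a UTF-8 JSON payload into Python objects."""
    return json.loads(raw.decode("utf-8"))


def fetch_pricing(url: str = PRICING_URL, timeout: float = 10.0):
    """Download the pricing dataset and return the parsed JSON value.

    No schema is assumed: the caller should inspect the result
    (for example its top-level keys) before relying on any field.
    """
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return parse_pricing(response.read())
```

Calling `fetch_pricing()` once and inspecting the shape of the result is a reasonable first step before wiring any particular field into an agent.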

LLM API Pricing (July 2026): GPT-5.4 $2.50/M · Claude Sonnet 5 $2/M · Full Table

What is the cheapest LLM API in July 2026?

The cheapest verified input-token price in TLDL's dataset is LFM2 24B A2B on Together at $0.03 per 1M input tokens.

Verified pricing dataset

LLM API prices from tracked source data

Prices are shown only after source verification. Pending providers are tracked but excluded from rankings and calculators.

As of: July 4, 2026
Update cadence: Checked daily; published only after source verification.
Source policy: Official provider pricing pages or APIs only.
Machine-readable data: /api/pricing.json
Tracked sources: https://api-docs.deepseek.com/quick_start/pricing https://platform.openai.com/docs/pricing https://docs.anthropic.com/en/docs/about-claude/pricing https://ai.google.dev/gemini-api/docs/pricing https://docs.x.ai/overview https://www.together.ai/pricing https://groq.com/pricing/ https://docs.mistral.ai/getting-started/models/models_overview/ https://fireworks.ai/pricing https://openrouter.ai/api/v1/models

Compare monthly costs in the LLM cost calculator

Quick pricing answers

Cheapest input

LFM2 24B A2B on Together

$0.03/M input · $0.12/M output

Cheapest output

Llama 3.1 8B Instant on Groq

$0.08/M output

Lowest first-party lab price

Gemini 2.5 Flash-Lite

Google · $0.10/M input

Best cached-input price

DeepSeek V4 Flash

$0.0028/M cached input

These answers are generated from 23 verified models across 7 providers as of July 2026. Use the calculator for workload-specific totals because output tokens and cache hit rate can change the cheapest choice.
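To illustrate how the output share can flip the cheapest choice, the sketch below compares the cheapest-input and cheapest-output models using the prices listed on this page. The helper names and the example token counts are assumptions made for illustration, and caching is ignored.

```python
# Prices in USD per 1M tokens, copied from this page.
LFM2 = {"input": 0.03, "output": 0.12}      # LFM2 24B A2B on Together
LLAMA_8B = {"input": 0.05, "output": 0.08}  # Llama 3.1 8B Instant on Groq


def cost_usd(price, input_tokens, output_tokens):
    """Total cost in USD for a workload, without caching."""
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


def breakeven_output_ratio(a, b):
    """Output/input token ratio r at which models a and b cost the same.

    Solves a_in + r * a_out = b_in + r * b_out for r.
    Undefined (division by zero) when both output prices are equal.
    """
    return (b["input"] - a["input"]) / (a["output"] - b["output"])


ratio = breakeven_output_ratio(LFM2, LLAMA_8B)  # roughly 0.5

# Input-heavy workload (output is 10% of input): LFM2 is cheaper.
input_heavy = (cost_usd(LFM2, 1_000_000, 100_000),
               cost_usd(LLAMA_8B, 1_000_000, 100_000))

# Balanced workload (output equals input): Llama 3.1 8B Instant is cheaper.
balanced = (cost_usd(LFM2, 1_000_000, 1_000_000),
            cost_usd(LLAMA_8B, 1_000_000, 1_000_000))
```

Under these two price points, the crossover sits at about one output token for every two input tokens: below that ratio LFM2 24B A2B costs less, above it Llama 3.1 8B Instant does.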

Provider	Model	Input / 1M tokens	Cached input / 1M tokens	Output / 1M tokens	Context (tokens)	Verified
DeepSeek	DeepSeek V4 Flash	$0.14	$0.0028	$0.28	1,000,000	July 2026
DeepSeek	DeepSeek V4 Pro	$0.435	$0.0036	$0.87	1,000,000	July 2026
OpenAI	GPT-5.5	$5.00	$0.50	$30.00	n/a	July 2026
OpenAI	GPT-5.4	$2.50	$0.25	$15.00	n/a	July 2026
OpenAI	GPT-5.4 mini	$0.75	$0.075	$4.50	n/a	July 2026
OpenAI	GPT-5.4 nano	$0.20	$0.02	$1.25	n/a	July 2026
Anthropic	Claude Opus 4.8	$5.00	$0.50	$25.00	n/a	July 2026
Anthropic	Claude Sonnet 5	$2.00	$0.20	$10.00	n/a	July 2026
Anthropic	Claude Sonnet 4.6	$3.00	$0.30	$15.00	n/a	July 2026
Anthropic	Claude Haiku 4.5	$1.00	$0.10	$5.00	n/a	July 2026
Google	Gemini 3 Flash Preview	$0.50	$0.05	$3.00	n/a	July 2026
Google	Gemini 2.5 Pro	$1.25	$0.125	$10.00	1,000,000	July 2026
Google	Gemini 2.5 Flash	$0.30	$0.03	$2.50	1,000,000	July 2026
Google	Gemini 2.5 Flash-Lite	$0.10	$0.01	$0.40	n/a	July 2026
xAI	Grok Build 0.1	$1.00	n/a	$2.00	256,000	July 2026
Together AI	DeepSeek V4 Pro on Together	$1.74	$0.20	$3.48	n/a	July 2026
Together AI	MiniMax M3 on Together	$0.30	$0.06	$1.20	n/a	July 2026
Together AI	gpt-oss-120B on Together	$0.15	n/a	$0.60	n/a	July 2026
Together AI	LFM2 24B A2B on Together	$0.03	n/a	$0.12	n/a	July 2026
Groq	GPT OSS 20B on Groq	$0.075	$0.038	$0.30	128,000	July 2026
Groq	GPT OSS 120B on Groq	$0.15	$0.075	$0.60	128,000	July 2026
Groq	Llama 4 Scout on Groq	$0.11	n/a	$0.34	128,000	July 2026
Groq	Llama 3.1 8B Instant on Groq	$0.05	n/a	$0.08	128,000	July 2026

Pending source verification

Mistral

Official model docs are tracked, but a stable per-model token pricing table was not found in the fetched docs. Mistral prices stay unpublished until they are verified from an official pricing table.

https://docs.mistral.ai/getting-started/models/models_overview/

Fireworks

The official pricing page exposes serverless, fine-tuning, and GPU-hour pricing, but the fetched content points per-token estimates to a separate blog. Token pricing stays pending until a stable official per-model token table is available.

https://fireworks.ai/pricing

OpenRouter

The provider exposes model pricing through its models API, but ingestion needs a deterministic model-selection policy before aggregate OpenRouter prices can be published.

https://openrouter.ai/api/v1/models

Pricing questions

What is the cheapest LLM API in July 2026?

LFM2 24B A2B on Together has the lowest verified input-token price at $0.03/M input tokens.

Which model is cheapest for generation-heavy workloads?

Llama 3.1 8B Instant on Groq has the lowest verified output-token price at $0.08/M output tokens.

Why are some providers still pending?

TLDL publishes prices only when the official source exposes stable per-model token pricing. Pending rows keep tracked sources visible without mixing guessed prices into the table.

Pricing data changelog

2026-07-04 · DeepSeek

Verified DeepSeek V4 Flash and V4 Pro pricing from the official DeepSeek API pricing page.

https://api-docs.deepseek.com/quick_start/pricing

2026-07-04 · OpenAI

Verified GPT-5.5 and GPT-5.4 family standard API pricing from the official OpenAI pricing page.

https://platform.openai.com/docs/pricing

2026-07-04 · Anthropic

Verified current Claude first-party API pricing from the official Anthropic pricing page.

https://docs.anthropic.com/en/docs/about-claude/pricing

2026-07-04 · Google

Verified Gemini 3 Flash and Gemini 2.5 family standard text pricing from the official Gemini API pricing page.

https://ai.google.dev/gemini-api/docs/pricing

2026-07-04 · Groq

Verified Groq on-demand LLM token pricing from the official Groq pricing page.

https://groq.com/pricing/

2026-07-04 · Together AI

Verified selected Together AI serverless inference prices from the official Together AI pricing page.

https://www.together.ai/pricing

2026-07-04 · xAI

Verified Grok Build 0.1 token pricing from the official xAI docs overview.

https://docs.x.ai/overview

2026-07-04 · TLDL

Kept Mistral, Fireworks token inference, and OpenRouter aggregate pricing as source-pending: none of them has a stable per-model token table verified in this dataset yet.

https://www.tldl.io/api/pricing.json

This page ranks only models with verified pricing in TLDL's shared dataset. The table above is generated from the same source used by the public pricing API, the LLM API pricing comparison, and the LLM cost calculator.

Pending providers are tracked but excluded from cheapest-model claims until their official source data is verified. That rule prevents stale copied numbers from being ranked as if they were current.

Current verified budget options

The verified table above is the source of truth. Sort by input price for routing, extraction, classification, and batch-analysis workloads. Sort by output price for summarization, chat, long-form generation, and agent workflows where responses can become large.
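The two sort orders can be sketched in a few lines of Python. The `Row` type and `cheapest` helper are hypothetical names, and `ROWS` is only a five-row excerpt of the verified table, so the rankings below apply to that excerpt rather than the full dataset.

```python
from typing import NamedTuple


class Row(NamedTuple):
    provider: str
    model: str
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens


# Excerpt of the verified table on this page (not the full dataset).
ROWS = [
    Row("DeepSeek", "DeepSeek V4 Flash", 0.14, 0.28),
    Row("Google", "Gemini 2.5 Flash-Lite", 0.10, 0.40),
    Row("Together AI", "LFM2 24B A2B on Together", 0.03, 0.12),
    Row("Groq", "GPT OSS 20B on Groq", 0.075, 0.30),
    Row("Groq", "Llama 3.1 8B Instant on Groq", 0.05, 0.08),
]


def cheapest(rows, key, n=3):
    """Return the n cheapest rows by the named price field."""
    return sorted(rows, key=lambda row: getattr(row, key))[:n]


# Input-heavy work (routing, extraction, classification): sort by input price.
by_input = [row.model for row in cheapest(ROWS, "input_per_m")]

# Generation-heavy work (summarization, chat, agents): sort by output price.
by_output = [row.model for row in cheapest(ROWS, "output_per_m")]
```

Even within this small excerpt the two orderings differ, which is why the sort key should follow the workload's token mix.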

When cheap is not actually cheap

Low token prices can be offset by retries, weaker tool calling, latency, lower success rates, or longer generated outputs. Test cost per completed task, not just cost per token.
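One rough way to put a number on cost per completed task is to divide the cost of a single attempt by the success rate, which assumes each attempt succeeds independently and failed attempts are retried until one succeeds. In the sketch below, the two price points are borrowed from the table on this page only for scale; the token counts and success rates are invented for illustration and are not measurements of any model.

```python
def cost_per_attempt(input_price, output_price, input_tokens, output_tokens):
    """Cost in USD of one call; prices are USD per 1M tokens."""
    total = input_tokens * input_price + output_tokens * output_price
    return total / 1_000_000


def cost_per_completed_task(attempt_cost, success_rate):
    """Expected cost until one success, retrying independent attempts.

    The expected number of attempts is 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return attempt_cost / success_rate


# Hypothetical task: 2,000 input tokens and 500 output tokens per attempt.
cheap_attempt = cost_per_attempt(0.03, 0.12, 2_000, 500)    # $0.03 / $0.12 per 1M
pricier_attempt = cost_per_attempt(0.10, 0.40, 2_000, 500)  # $0.10 / $0.40 per 1M

# Invented success rates: 25% for the cheaper option, 90% for the pricier one.
cheap_task = cost_per_completed_task(cheap_attempt, 0.25)
pricier_task = cost_per_completed_task(pricier_attempt, 0.90)
```

With these made-up rates the option that is cheaper per token ends up costing more per completed task. Real success rates have to come from testing on the actual workload.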

How to compare models

Use the LLM cost calculator with your expected input tokens, output tokens, and cache behavior. A small model can be cheaper for simple routing and more expensive for hard tasks if it needs repeated calls.
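As a rough stand-in for that kind of calculation, the sketch below estimates a monthly bill from input tokens, output tokens, and a cache hit rate. It is not the calculator's actual logic: `monthly_cost_usd` is a hypothetical helper, the cache hit rate is assumed to be the fraction of input tokens billed at the cached price, and cache-write fees, pricing tiers, and minimums are ignored. Prices are taken from the table on this page; the workload sizes are invented.

```python
def monthly_cost_usd(input_tokens, output_tokens, input_price, output_price,
                     cached_price=None, cache_hit_rate=0.0):
    """Estimated monthly cost in USD; prices are USD per 1M tokens.

    cached_price=None models a table entry of "n/a": all input tokens
    are then billed at the standard input price.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    if cached_price is None:
        cached_price = input_price
        cache_hit_rate = 0.0
    cached_tokens = input_tokens * cache_hit_rate
    fresh_tokens = input_tokens - cached_tokens
    total = (fresh_tokens * input_price
             + cached_tokens * cached_price
             + output_tokens * output_price)
    return total / 1_000_000


# Invented workload: 1B input tokens and 100M output tokens per month.
INPUT_TOKENS, OUTPUT_TOKENS = 1_000_000_000, 100_000_000

# LFM2 24B A2B on Together: $0.03 input, no cached price listed, $0.12 output.
lfm2 = monthly_cost_usd(INPUT_TOKENS, OUTPUT_TOKENS, 0.03, 0.12)

# DeepSeek V4 Flash: $0.14 input, $0.0028 cached input, $0.28 output.
deepseek_80 = monthly_cost_usd(INPUT_TOKENS, OUTPUT_TOKENS, 0.14, 0.28,
                               cached_price=0.0028, cache_hit_rate=0.80)
deepseek_95 = monthly_cost_usd(INPUT_TOKENS, OUTPUT_TOKENS, 0.14, 0.28,
                               cached_price=0.0028, cache_hit_rate=0.95)
```

Under these assumptions LFM2 comes to about $42.00, while DeepSeek V4 Flash comes to about $58.24 at an 80% hit rate and about $37.66 at 95%, so the cache hit rate alone can change which model is cheapest.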