LLM Cost Calculator
Estimate monthly cost across Claude, GPT, Gemini, DeepSeek, Mistral, and Llama models from your token volume and request count.
Monthly cost below assumes a workload of 50M prompt tokens and 25M completion tokens per month. Badges: 👁️ = vision, ⚡ = prompt caching.

| Provider | Model | Context | Input ($/M) | Output ($/M) | Monthly cost | Features |
|---|---|---|---|---|---|---|
| Google | Gemini 2.5 Flash-Lite | 1M | $0.10 | $0.40 | $15.00 | 👁️⚡ |
| OpenAI | GPT-4o mini | 128K | $0.15 | $0.60 | $22.50 | 👁️⚡ |
| DeepSeek | DeepSeek V3 | 64K | $0.27 | $1.10 | $41.00 | ⚡ |
| Meta (via Together) | Llama 3.3 70B Instruct Turbo | 131K | $0.88 | $0.88 | $66.00 | |
| Google | Gemini 2.5 Flash | 1M | $0.30 | $2.50 | $77.50 | 👁️⚡ |
| DeepSeek | DeepSeek R1 | 64K | $0.55 | $2.19 | $82.25 | ⚡ |
| OpenAI | o3-mini | 200K | $1.10 | $4.40 | $165.00 | ⚡ |
| Anthropic | Claude Haiku 4.5 | 200K | $1.00 | $5.00 | $175.00 | 👁️⚡ |
| Mistral | Mistral Large 2 | 128K | $2.00 | $6.00 | $250.00 | |
| Meta (via Together) | Llama 3.1 405B Instruct Turbo | 131K | $3.50 | $3.50 | $262.50 | |
| Google | Gemini 2.5 Pro | 2M | $1.25 | $10.00 | $312.50 | 👁️⚡ |
| OpenAI | GPT-4o | 128K | $2.50 | $10.00 | $375.00 | 👁️⚡ |
| Anthropic | Claude Sonnet 4.6 | 200K | $3.00 | $15.00 | $525.00 | 👁️⚡ |
| xAI | Grok 3 | 131K | $3.00 | $15.00 | $525.00 | 👁️⚡ |
| OpenAI | o1 | 200K | $15.00 | $60.00 | $2,250.00 | 👁️⚡ |
| Anthropic | Claude Opus 4.7 | 200K | $15.00 | $75.00 | $2,625.00 | 👁️⚡ |
What is the LLM Cost Calculator?
The LLM Cost Calculator estimates how much you would pay each month to run a workload through every major large language model API: Claude (Opus, Sonnet, Haiku), GPT-4o, o1, o3, Gemini 2.5 Pro/Flash, DeepSeek V3 and R1, Mistral, Grok, and Llama via Together. You enter your prompt and completion token sizes plus your monthly request volume, and the tool ranks all models by cost so you can pick the most efficient option for your use case.
Pricing is verified manually against each provider's official pricing page. Every row shows the date it was last checked, so you always know how current the data is.
Why a calculator and not a comparison table?
Pricing pages list per-million token rates, but the cost that actually matters is the monthly bill given your real workload. A model with cheap input but expensive output is fine for chatbots and terrible for agents. The calculator does the math so you can compare apples to apples.
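The math the calculator does is simple per-request arithmetic. A minimal sketch (the rates below are GPT-4o mini's from the table; token counts and request volume are made-up workload numbers):

```python
# Monthly bill for one model. Prices are in USD per million tokens.
def monthly_cost(prompt_tokens, completion_tokens, requests,
                 in_price, out_price):
    """Cost in USD for `requests` calls per month."""
    per_request = (prompt_tokens * in_price +
                   completion_tokens * out_price) / 1_000_000
    return per_request * requests

# 2,000-token prompt, 500-token completion, 25,000 requests/month
# at GPT-4o mini rates ($0.15 in / $0.60 out per million tokens):
cost = monthly_cost(2_000, 500, 25_000, in_price=0.15, out_price=0.60)
print(f"${cost:.2f}")  # $15.00
```

Note how the output rate dominates even here: the 500 completion tokens cost as much as the 2,000 prompt tokens, which is why input-heavy and output-heavy workloads rank models so differently.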
Key Features
- Token-driven inputs: prompt size, completion size, monthly request count, and an optional cache hit ratio.
- Prompt caching: when a model supports it (Claude, GPT-4o, Gemini), the cache rate is applied to the cached share of your input tokens.
- Workload presets: light chatbot, heavy chatbot, RAG / document Q&A, coding agent, hobby project. One click and the inputs reflect a realistic scenario.
- Filters: provider chips, vision-only, prompt-cache-only, minimum context window.
- Sortable table: click any column header to sort by provider, name, context, input price, output price, or monthly cost.
- Cheapest match callout: the lowest-cost model for your inputs is always highlighted.
How to use it
- Pick a preset that matches your use case, or enter custom token counts and monthly request volume.
- If you plan to use prompt caching (Anthropic, OpenAI, and Google all support some form of it), drag the cache hit ratio slider to the share you expect to be served from cache.
- Optionally filter to providers you actually want to use, or exclude models that lack vision or whose context window is too small for your needs.
- Read the monthly cost column. The cheapest match is highlighted at the bottom.
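Conceptually, the filter chips and the cheapest-match callout reduce to a filter-then-rank over the pricing table. A sketch with a hypothetical in-memory version of the data (model records and field names here are illustrative, not the tool's actual schema):

```python
# Toy slice of the pricing table: context window in tokens,
# prices in USD per million tokens.
MODELS = [
    {"name": "GPT-4o mini", "context": 128_000, "in": 0.15, "out": 0.60},
    {"name": "Gemini 2.5 Flash", "context": 1_000_000, "in": 0.30, "out": 2.50},
    {"name": "DeepSeek V3", "context": 64_000, "in": 0.27, "out": 1.10},
]

def cheapest(models, prompt_toks, completion_toks, requests, min_context=0):
    """Filter by minimum context, then rank by monthly bill."""
    candidates = [m for m in models if m["context"] >= min_context]

    def bill(m):
        return (prompt_toks * m["in"] +
                completion_toks * m["out"]) / 1e6 * requests

    return min(candidates, key=bill)

# 4K-token prompts, 1K-token completions, 100K requests/month,
# requiring at least a 100K context window (this excludes DeepSeek V3):
best = cheapest(MODELS, 4_000, 1_000, 100_000, min_context=100_000)
print(best["name"])  # GPT-4o mini
```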
How prices stay current
The data file is reviewed once a month against each provider's pricing page. A reminder GitHub Action opens a tracking issue on the first of every month listing models whose lastVerified date is older than 30 days. Spot something stale or wrong? The Suggest pricing update button opens a pre-filled GitHub issue.
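The staleness check itself is a date comparison over the data file. A sketch, assuming a hypothetical schema where each entry carries an ISO-format `lastVerified` date (the real file's layout may differ):

```python
from datetime import date, timedelta

# Hypothetical entries; the real data file's shape is an assumption.
entries = [
    {"model": "GPT-4o", "lastVerified": "2025-01-05"},
    {"model": "Claude Sonnet 4.6", "lastVerified": "2024-11-20"},
]

def stale(entries, today, max_age_days=30):
    """Models whose lastVerified date is older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [e["model"] for e in entries
            if date.fromisoformat(e["lastVerified"]) < cutoff]

print(stale(entries, today=date(2025, 1, 10)))  # ['Claude Sonnet 4.6']
```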
Frequently asked
- Are these real prices? Yes, taken from the official pricing pages of Anthropic, OpenAI, Google, DeepSeek, Mistral, xAI, and Together. Each row carries its own verification date.
- What about regional pricing? Numbers reflect the standard global API tier. Region-specific deals (Azure OpenAI enterprise, Vertex AI committed use) are not modelled here.
- Does it count input and output separately? Yes. Output tokens are usually 4-5x more expensive than input tokens, and the calculator reflects that.
- What is prompt caching? A discount on input tokens that have already been processed in a previous request. Useful for system prompts or long context that repeats. The cache hit ratio slider lets you model how much of your input would be cached.
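The cache hit ratio splits the input bill into a fresh share at the normal rate and a cached share at the discounted rate. A sketch with illustrative rates (the $3.00/M fresh vs $0.30/M cached figures below are an example discount, not any one provider's exact pricing):

```python
def input_cost(prompt_tokens, requests, in_price, cache_price, hit_ratio):
    """USD/month for input tokens when a fraction is served from cache."""
    cached = prompt_tokens * hit_ratio
    fresh = prompt_tokens - cached
    per_request = (fresh * in_price + cached * cache_price) / 1_000_000
    return per_request * requests

# 8,000-token prompt, 10,000 requests/month, $3.00/M fresh input,
# $0.30/M cached input: compare no caching vs an 80% hit ratio.
full = input_cost(8_000, 10_000, 3.00, 0.30, hit_ratio=0.0)    # ≈ $240
with_cache = input_cost(8_000, 10_000, 3.00, 0.30, hit_ratio=0.8)  # ≈ $67
```

With a large, mostly-static system prompt the cached share can be high, which is why the slider moves the rankings noticeably for models that support caching.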
Free forever, no ads, no tracking. Support the project