LLM Cost Calculator
Estimate monthly cost across Claude, GPT, Gemini, DeepSeek, Mistral, and Llama models from your token volume and request count.
Monthly cost below assumes a workload of 50M prompt tokens and 25M completion tokens per month. Badges: 👁️ = vision, ⚡ = prompt caching.

| Provider | Model | Context | Input ($/M) | Output ($/M) | Monthly cost | Features |
|---|---|---|---|---|---|---|
| Google | Gemini 2.5 Flash-Lite | 1M | $0.10 | $0.40 | $15.00 | 👁️⚡ |
| OpenAI | GPT-4o mini | 128K | $0.15 | $0.60 | $22.50 | 👁️⚡ |
| DeepSeek | DeepSeek V3 | 64K | $0.27 | $1.10 | $41.00 | ⚡ |
| Meta (via Together) | Llama 3.3 70B Instruct Turbo | 131K | $0.88 | $0.88 | $66.00 | |
| Google | Gemini 2.5 Flash | 1M | $0.30 | $2.50 | $77.50 | 👁️⚡ |
| DeepSeek | DeepSeek R1 | 64K | $0.55 | $2.19 | $82.25 | ⚡ |
| OpenAI | o3-mini | 200K | $1.10 | $4.40 | $165.00 | ⚡ |
| Anthropic | Claude Haiku 4.5 | 200K | $1.00 | $5.00 | $175.00 | 👁️⚡ |
| Mistral | Mistral Large 2 | 128K | $2.00 | $6.00 | $250.00 | |
| Meta (via Together) | Llama 3.1 405B Instruct Turbo | 131K | $3.50 | $3.50 | $262.50 | |
| Google | Gemini 2.5 Pro | 2M | $1.25 | $10.00 | $312.50 | 👁️⚡ |
| OpenAI | GPT-4o | 128K | $2.50 | $10.00 | $375.00 | 👁️⚡ |
| Anthropic | Claude Sonnet 4.6 | 200K | $3.00 | $15.00 | $525.00 | 👁️⚡ |
| xAI | Grok 3 | 131K | $3.00 | $15.00 | $525.00 | 👁️⚡ |
| OpenAI | o1 | 200K | $15.00 | $60.00 | $2,250.00 | 👁️⚡ |
| Anthropic | Claude Opus 4.7 | 200K | $15.00 | $75.00 | $2,625.00 | 👁️⚡ |
What is the LLM Cost Calculator?
The LLM Cost Calculator estimates how much you would pay each month to run a workload through every major large language model API: Claude (Opus, Sonnet, Haiku), GPT-4o, o1, o3, Gemini 2.5 Pro/Flash, DeepSeek V3 and R1, Mistral, Grok, and Llama via Together. You enter your prompt and completion token sizes plus your monthly request volume, and the tool ranks all models by cost so you can pick the most efficient option for your use case.
Pricing is verified manually against each provider's official pricing page. Every row shows the date it was last checked, so you always know how current the data is.
Why a calculator and not a comparison table?
Pricing pages list per-million token rates, but the cost that actually matters is the monthly bill given your real workload. A model with cheap input but expensive output is fine for chatbots and terrible for agents. The calculator does the math so you can compare apples to apples.
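The math the calculator does is simple per-request arithmetic. A minimal sketch (the rates below are GPT-4o mini's from the table; token counts and request volume are made-up workload numbers):

```python
# Monthly bill for one model. Prices are in USD per million tokens.
def monthly_cost(prompt_tokens, completion_tokens, requests,
                 in_price, out_price):
    """Cost in USD for `requests` calls per month."""
    per_request = (prompt_tokens * in_price +
                   completion_tokens * out_price) / 1_000_000
    return per_request * requests

# 2,000-token prompt, 500-token completion, 25,000 requests/month
# at GPT-4o mini rates ($0.15 in / $0.60 out per million tokens):
cost = monthly_cost(2_000, 500, 25_000, in_price=0.15, out_price=0.60)
print(f"${cost:.2f}")  # $15.00
```

Note how the output rate dominates even here: the 500 completion tokens cost as much as the 2,000 prompt tokens, which is why input-heavy and output-heavy workloads rank models so differently.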
Key Features
- Token-driven inputs: prompt size, completion size, monthly request count, and an optional cache hit ratio.
- Prompt caching: when a model supports it (Claude, GPT-4o, Gemini), the cache rate is applied to the cached share of your input tokens.
- Workload presets: light chatbot, heavy chatbot, RAG / document Q&A, coding agent, hobby project. One click and the inputs reflect a realistic scenario.
- Filters: provider chips, vision-only, prompt-cache-only, minimum context window.
- Sortable table: click any column header to sort by provider, name, context, input price, output price, or monthly cost.
- Cheapest match callout: the lowest-cost model for your inputs is always highlighted.
How to use it
- Pick a preset that matches your use case, or enter custom token counts and monthly request volume.
- If you plan to use prompt caching (Anthropic, OpenAI, and Google all support some form of it), drag the cache hit ratio slider to the share you expect to be served from cache.
- Optionally filter to providers you actually want to use, or exclude models that lack vision or whose context window is too small for your needs.
- Read the monthly cost column. The cheapest match is highlighted at the bottom.
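Conceptually, the filter chips and the cheapest-match callout reduce to a filter-then-rank over the pricing table. A sketch with a hypothetical in-memory version of the data (model records and field names here are illustrative, not the tool's actual schema):

```python
# Toy slice of the pricing table: context window in tokens,
# prices in USD per million tokens.
MODELS = [
    {"name": "GPT-4o mini", "context": 128_000, "in": 0.15, "out": 0.60},
    {"name": "Gemini 2.5 Flash", "context": 1_000_000, "in": 0.30, "out": 2.50},
    {"name": "DeepSeek V3", "context": 64_000, "in": 0.27, "out": 1.10},
]

def cheapest(models, prompt_toks, completion_toks, requests, min_context=0):
    """Filter by minimum context, then rank by monthly bill."""
    candidates = [m for m in models if m["context"] >= min_context]

    def bill(m):
        return (prompt_toks * m["in"] +
                completion_toks * m["out"]) / 1e6 * requests

    return min(candidates, key=bill)

# 4K-token prompts, 1K-token completions, 100K requests/month,
# requiring at least a 100K context window (this excludes DeepSeek V3):
best = cheapest(MODELS, 4_000, 1_000, 100_000, min_context=100_000)
print(best["name"])  # GPT-4o mini
```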
How prices stay current
The data file is reviewed once a month against each provider's pricing page. A reminder GitHub Action opens a tracking issue on the first of every month listing models whose lastVerified date is older than 30 days. Spot something stale or wrong? The Suggest pricing update button opens a pre-filled GitHub issue.
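The staleness check itself is a date comparison over the data file. A sketch, assuming a hypothetical schema where each entry carries an ISO-format `lastVerified` date (the real file's layout may differ):

```python
from datetime import date, timedelta

# Hypothetical entries; the real data file's shape is an assumption.
entries = [
    {"model": "GPT-4o", "lastVerified": "2025-01-05"},
    {"model": "Claude Sonnet 4.6", "lastVerified": "2024-11-20"},
]

def stale(entries, today, max_age_days=30):
    """Models whose lastVerified date is older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [e["model"] for e in entries
            if date.fromisoformat(e["lastVerified"]) < cutoff]

print(stale(entries, today=date(2025, 1, 10)))  # ['Claude Sonnet 4.6']
```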
Frequently asked
- Are these real prices? Yes, taken from the official pricing pages of Anthropic, OpenAI, Google, DeepSeek, Mistral, xAI, and Together. Each row carries its own verification date.
- What about regional pricing? Numbers reflect the standard global API tier. Region-specific deals (Azure OpenAI enterprise, Vertex AI committed use) are not modelled here.
- Does it count input and output separately? Yes. Output tokens are usually 4-5x more expensive than input tokens, and the calculator reflects that.
- What is prompt caching? A discount on input tokens that have already been processed in a previous request. Useful for system prompts or long context that repeats. The cache hit ratio slider lets you model how much of your input would be cached.
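The cache hit ratio splits the input bill into a fresh share at the normal rate and a cached share at the discounted rate. A sketch with illustrative rates (the $3.00/M fresh vs $0.30/M cached figures below are an example discount, not any one provider's exact pricing):

```python
def input_cost(prompt_tokens, requests, in_price, cache_price, hit_ratio):
    """USD/month for input tokens when a fraction is served from cache."""
    cached = prompt_tokens * hit_ratio
    fresh = prompt_tokens - cached
    per_request = (fresh * in_price + cached * cache_price) / 1_000_000
    return per_request * requests

# 8,000-token prompt, 10,000 requests/month, $3.00/M fresh input,
# $0.30/M cached input: compare no caching vs an 80% hit ratio.
full = input_cost(8_000, 10_000, 3.00, 0.30, hit_ratio=0.0)    # ≈ $240
with_cache = input_cost(8_000, 10_000, 3.00, 0.30, hit_ratio=0.8)  # ≈ $67
```

With a large, mostly-static system prompt the cached share can be high, which is why the slider moves the rankings noticeably for models that support caching.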
Free forever, no ads, no tracking. Support the project