Anthropic has shipped Claude Fable 5 — a new tier above Opus, and the most capable Claude model to date. The model id is claude-fable-5, and it is live on OmniaKey now at 70% off the official rate, on the same key and balance as every other model.

What's new in Fable 5

Fable 5 is not an Opus point release. It is a new top tier with its own pricing, sitting above Opus 4.8 the way Opus sits above Sonnet:

	Claude Fable 5	Claude Opus 4.8
Model id	`claude-fable-5`	`claude-opus-4-8`
Context window	1M tokens	1M tokens
Max output	128K tokens	128K tokens
Thinking	Adaptive only — explicit `disabled` rejected; omit the field to skip thinking	Adaptive, optional — explicit `disabled` accepted
Official price (per 1M tokens, in / out)	$10 / $50	$5 / $25

The request surface is the same as Opus 4.8 and 4.7: adaptive thinking replaces fixed thinking budgets, and the classic sampling knobs are gone entirely (more on that below). If your code already runs on Opus 4.8, switching is a one-string change — with one exception: an explicit thinking: {"type": "disabled"} is rejected on Fable 5 (details in the migration notes below).

For benchmark numbers, Anthropic's Fable 5 system card is the primary source. This post sticks to what changes in practice: specs, pricing, and how to run it.

API pricing: official vs OmniaKey

Fable 5 launches at double the Opus rate — $10 input / $50 output per million tokens. Heavy agent sessions burn output tokens fast, so the rate matters more than it seems. On OmniaKey, every Anthropic model is billed at 30% of the official price — the same 70% discount across the catalog:

Per 1M tokens	Input	Output	Cache hit
Anthropic official	$10	$50	$1
OmniaKey	$3	$15	$0.30

That is per-token billing with no monthly plan — top up, spend, and the dashboard shows exactly which calls cost what. Prompt caching passes through, so long agent sessions hit the $0.30 cache rate on repeated context.

Fable 5 or Opus 4.8?

At twice the price, Fable 5 is not the new default — it is the new ceiling.

Stay on Opus 4.8 for day-to-day coding. It's still exceptional at long-horizon agentic work, and in most sessions you won't feel the difference.
Reach for Fable 5 when you're genuinely stuck — the hardest refactors, deep multi-step reasoning, work where a failed run costs more than the tokens.

Since both run on the same endpoint and key, the practical pattern is: default to Opus 4.8, escalate to /model claude-fable-5 for the tasks that earn it, drop back after.

Try it in Claude Code

If Claude Code already points at OmniaKey, you only need to switch models inside the session:

text

/model claude-fable-5

If you're starting from scratch, it's two environment variables:

bash

export ANTHROPIC_BASE_URL="https://api.omniakey.com"
export ANTHROPIC_AUTH_TOKEN="your-omniakey-api-key"
claude

Use the bare host — no /v1 suffix. Claude Code appends /v1/messages itself. The full walkthrough, including key creation, is in the Claude Code setup guide.

Cursor, Cline, and aider drive Fable 5 through OmniaKey's OpenAI-compatible endpoint instead — same claude-fable-5 id, no protocol gymnastics:

OpenAI-compatible

https://api.omniakey.com/v1

Anthropic-native

https://api.omniakey.com

Gemini-native

https://api.omniakey.com/v1beta

Whichever surface you use, the model id you request is the model that runs. OmniaKey never silently swaps a Fable 5 call to something cheaper.

Migrating from older Claude models: three 400s to know

Fable 5 keeps the Opus 4.8 request surface. Coming from older Claude models, though, three request shapes that used to work now return 400 — through any gateway, OmniaKey included, because these are model-level rules:

Sampling parameters are gone. temperature, top_p, and top_k all return 400. Delete them; steer with the prompt instead.
Fixed thinking budgets are gone. thinking: {"type": "enabled", "budget_tokens": N} returns 400. Use thinking: {"type": "adaptive"} and let the model decide how much to think.
You cannot explicitly disable thinking. Unique to Fable 5: thinking: {"type": "disabled"} returns 400 (Opus 4.8 still accepts it). To run without thinking, omit the thinking field entirely.

Prefilling the final assistant turn also remains unsupported, as on every model since the 4.6 family — use structured outputs instead. Few-shot assistant messages earlier in the conversation are still fine.

Get an OmniaKey API key See model pricing