Claude Fable 5: What's New
Anthropic's new top-tier model explained — specs, API pricing at 70% off official rates, Fable 5 vs Opus 4.8, and a two-variable setup for Claude Code.
Anthropic has shipped Claude Fable 5 — a new tier above Opus, and the most capable Claude model to date. The model id is claude-fable-5, and it is live on OmniaKey now at 70% off the official rate, on the same key and balance as every other model.
What's new in Fable 5
Fable 5 is not an Opus point release. It is a new top tier with its own pricing, sitting above Opus 4.8 the way Opus sits above Sonnet:
| Claude Fable 5 | Claude Opus 4.8 | |
|---|---|---|
| Model id | claude-fable-5 | claude-opus-4-8 |
| Context window | 1M tokens | 1M tokens |
| Max output | 128K tokens | 128K tokens |
| Thinking | Adaptive only — explicit disabled rejected; omit the field to skip thinking | Adaptive, optional — explicit disabled accepted |
| Official price (per 1M tokens, in / out) | $10 / $50 | $5 / $25 |
The request surface is the same as Opus 4.8 and 4.7: adaptive thinking replaces fixed thinking budgets, and the classic sampling knobs are gone entirely (more on that below). If your code already runs on Opus 4.8, switching is a one-string change — with one exception: an explicit thinking: {"type": "disabled"} is rejected on Fable 5 (details in the migration notes below).
For benchmark numbers, Anthropic's Fable 5 system card is the primary source. This post sticks to what changes in practice: specs, pricing, and how to run it.
API pricing: official vs OmniaKey
Fable 5 launches at double the Opus rate — $10 input / $50 output per million tokens. Heavy agent sessions burn output tokens fast, so the rate matters more than it seems. On OmniaKey, every Anthropic model is billed at 30% of the official price — the same 70% discount across the catalog:
| Per 1M tokens | Input | Output | Cache hit |
|---|---|---|---|
| Anthropic official | $10 | $50 | $1 |
| OmniaKey | $3 | $15 | $0.30 |
That is per-token billing with no monthly plan — top up, spend, and the dashboard shows exactly which calls cost what. Prompt caching passes through, so long agent sessions hit the $0.30 cache rate on repeated context.
Fable 5 or Opus 4.8?
At twice the price, Fable 5 is not the new default — it is the new ceiling.
- Stay on Opus 4.8 for day-to-day coding. It's still exceptional at long-horizon agentic work, and in most sessions you won't feel the difference.
- Reach for Fable 5 when you're genuinely stuck — the hardest refactors, deep multi-step reasoning, work where a failed run costs more than the tokens.
Since both run on the same endpoint and key, the practical pattern is: default to Opus 4.8, escalate to /model claude-fable-5 for the tasks that earn it, drop back after.
Try it in Claude Code
If Claude Code already points at OmniaKey, you only need to switch models inside the session:
/model claude-fable-5
If you're starting from scratch, it's two environment variables:
export ANTHROPIC_BASE_URL="https://api.omniakey.com"
export ANTHROPIC_AUTH_TOKEN="your-omniakey-api-key"
claude
Use the bare host — no /v1 suffix. Claude Code appends /v1/messages itself. The full walkthrough, including key creation, is in the Claude Code setup guide.
Cursor, Cline, and aider drive Fable 5 through OmniaKey's OpenAI-compatible endpoint instead — same claude-fable-5 id, no protocol gymnastics:
Whichever surface you use, the model id you request is the model that runs. OmniaKey never silently swaps a Fable 5 call to something cheaper.
Migrating from older Claude models: three 400s to know
Fable 5 keeps the Opus 4.8 request surface. Coming from older Claude models, though, three request shapes that used to work now return 400 — through any gateway, OmniaKey included, because these are model-level rules:
- Sampling parameters are gone.
temperature,top_p, andtop_kall return 400. Delete them; steer with the prompt instead. - Fixed thinking budgets are gone.
thinking: {"type": "enabled", "budget_tokens": N}returns 400. Usethinking: {"type": "adaptive"}and let the model decide how much to think. - You cannot explicitly disable thinking. Unique to Fable 5:
thinking: {"type": "disabled"}returns 400 (Opus 4.8 still accepts it). To run without thinking, omit thethinkingfield entirely.
Prefilling the final assistant turn also remains unsupported, as on every model since the 4.6 family — use structured outputs instead. Few-shot assistant messages earlier in the conversation are still fine.