Opus 4.6 vs Codex 5.3: The Week Both Titans Dropped
The Simultaneous Drop
On February 5th, 2026, Anthropic and OpenAI released their flagship models within hours of each other. Claude Opus 4.6 and GPT-5.3-Codex. No warning shots. No slow rollout. Just two monsters hitting the arena at the same time.
I've been building with both nonstop since launch. Here's what I've found.
February 2026 Is Absurd
Before I get into the head-to-head, zoom out for a second and look at what's shipping this month:
• **Anthropic** — Opus 4.6 (live) + Sonnet 5 (expected mid-Feb)
• **OpenAI** — GPT-5.3-Codex (live) + GPT-5.3 (expected Feb 12)
• **Google DeepMind** — Gemini 3 Pro GA (expected Feb 10)
• **ByteDance** — Seedance 2.0, Seedream 5.0
• **Alibaba** — Qwen 3.5 (expected mid-Feb)
• **Zhipu AI** — GLM 5 (expected Feb 15)
• **DeepSeek** — v4 (expected Feb 17)
• **xAI** — Grok 4.20 (expected late Feb)
• **Meta** — Avocado (expected Feb/H1)
This is the Heian era of models. Everyone is shipping their best work at the same time. It's a carnival for builders.
Opus 4.6: The Intent Machine
The headline number is [FrontierMath](https://x.com/synthwavedd/status/2019815588798034015): Opus 4.6 scores **40% on Tiers 1-3**, doubling from Opus 4.5's 20%. Anthropic went from having the worst-benched frontier model to near state-of-the-art in one generation. That's not incremental. That's a leap.
But the benchmarks undersell what matters most in practice: **intent inference**.
As [Mckay Wrigley put it](https://x.com/mckaywrigley/status/2019954399259594999): *"Anthropic undersold Opus 4.6. Its ability to infer intent feels like the biggest upgrade. Genuine world-class coworker. Hard to not call these things AGI."*
This tracks with my experience. Opus 4.6 doesn't just follow instructions — it anticipates what you're trying to do. When you're exploring a codebase, narrowing down what matters, or working through a half-formed idea, Opus gets it. It cuts through context and surfaces what's relevant in a way that feels like pair-programming with someone who's been on your team for months.
One wild example: someone fed Opus 4.6 a giant block of Lean 4 code generated by another model, and [Opus optimized it down to two lines](https://x.com/AcerFur/status/2019841214825021538). Opus 4.5 couldn't do this at all. The reasoning depth is genuinely different.
Fast Mode: Speed Is a Feature
Two days after launch, Anthropic dropped **Opus 4.6 Fast mode** — 2.5x faster, available in Claude Code, Cursor, Figma Make, GitHub Copilot, and more. Their own teams had been building with it internally.
The speed is transformative. As [Mihir Patel noted](https://x.com/mvpatel2000/status/2020215073202118911): *"Instead of parallelizing across 3-4 instances of Claude Code, I now just use 1 session that runs as fast as I can think. The ability to maintain focus and flow state is a huge productivity lift."*
I used fast mode to ship multiple Claude skills in a single session. It felt like a different product. [Benji Taylor's take](https://x.com/benjitaylor/status/2020232551890358318) summed it up: *"Opus 4.6 fast mode just ruined my weekend in the best way."*
But it's expensive. The pricing — $30 input / $150 output per million tokens, 50% off during the research preview — raised eyebrows. Some people [burned through the free $50 credit in 10 minutes](https://x.com/SIGKITTEN/status/2020229855112040750). This is the trade-off: Opus-level intelligence at sprint speed costs real money. For urgent, high-stakes projects it's worth every penny. For casual exploration, you learn to be strategic.
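To make the pricing concrete, here's a small sketch of the cost math at the rates quoted above ($30/M input, $150/M output, 50% off during the research preview). The function name and the example token counts are my own illustration, not an official calculator.

```python
def fast_mode_cost(input_tokens: int, output_tokens: int,
                   preview_discount: bool = True) -> float:
    """Dollar cost of one call at the quoted fast-mode rates:
    $30 per million input tokens, $150 per million output tokens,
    halved while the 50%-off research preview lasts."""
    cost = input_tokens / 1e6 * 30 + output_tokens / 1e6 * 150
    return cost * 0.5 if preview_discount else cost

# A single large agentic call: 200k tokens in, 50k tokens out.
print(fast_mode_cost(200_000, 50_000))         # prints 6.75 (discounted)
print(fast_mode_cost(200_000, 50_000, False))  # prints 13.5 (full price)
```

At full price, a handful of context-heavy calls like this clears $50 fast, which is consistent with people exhausting the free credit in minutes.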
Codex 5.3: The Execution Engine
If Opus 4.6 is the model that *understands* what you want, Codex 5.3 is the model that *executes* without flinching.
The numbers people are posting are wild. Derrick Choi: *"Codex ran uninterrupted for 25 hours using GPT-5.3-Codex and built a design tool."* Banteg, who does serious decompilation work: *"It was going for 4 hours... mapped out a lot of previously missing stuff, including creating functions previously missing from the decompile worth around 10k LOC."* If GPT-5.2 was helpful for decompilation, 5.3 is a bulldozer.
Sebastián Herrera captured the developer consensus: *"There's no debate anymore, at least in my personal workflow. I build much more with Codex 5.3 using fewer tokens than with Opus 4.6."*
The key insight comes from Prashant Mital at OpenAI himself:
> *"GPT-5.3-Codex doesn't improvise wildly. It executes cleanly. And you can trust it."*
But he also gave the honest caveat:
> *"When you're still feeling your way through an idea — half-formed vision, messy repo, vibes > architecture — it can feel unhelpfully literal. It needs existing structure."*
That's the core trade-off. Codex is disciplined. It runs sanity checks, actually calls your linter and formatter, and follows instructions in your CLAUDE.md that previous models ignored. But if you don't know what you want yet, that discipline becomes rigidity.
The security capabilities are genuinely unsettling — binary exploitation, firewall evasion, HTTPS oracle cracking, all passing now. One generation went from failing half of offensive security scenarios to passing nearly all of them.
How I Actually Use Both
I use Opus 4.6 and Codex 5.3 in parallel. Not as competitors — as complementary tools with different strengths.
**Opus 4.6 is for thinking.** Exploring a new codebase. Reasoning through architecture decisions. When I have a half-baked idea and need a collaborator who can infer where I'm going. Opus excels at cutting through context and surfacing what matters. It's also better for visual work — ASCII diagrams, UI reasoning, design previews.
**Codex 5.3 is for building.** When I know what needs to happen and I need it executed cleanly across a large codebase. Marathon sessions. Mechanical refactors. Anything that requires sustained discipline over hours. The token efficiency is real — I get more done with fewer tokens.
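The split above is basically a routing heuristic, and it's simple enough to write down. This is a sketch of my own mental model, not either vendor's API — the `Task` fields and `pick_model` function are hypothetical names I made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Task:
    description: str
    spec_is_clear: bool   # do I already know exactly what should happen?
    long_running: bool    # marathon refactor vs. quick exploration?

def pick_model(task: Task) -> str:
    # Clear spec -> Codex's disciplined execution wins, especially
    # for long sessions. Half-formed idea -> Opus's intent inference.
    if task.spec_is_clear:
        return "gpt-5.3-codex"
    return "claude-opus-4.6"

print(pick_model(Task("rename module across repo", True, True)))
print(pick_model(Task("figure out new architecture", False, False)))
```

The point isn't the code — it's that the decision boundary is "do I know what I want yet," not "which model is better."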
[Kitze nailed the vibe](https://x.com/thekitze/status/2019863943720890486): *"I'm using both Opus 4.6 and Codex 5.3 in parallel and I fkn love them both and feel like I took some limitless pill or some shit lmao. IT'S FEBRUARY 6TH."*
That's exactly how it feels.
The Rate Limit Reality
Here's the part nobody wants to talk about: **the rate limits are brutal.**
On Claude's $20 Pro plan, you get maybe 4-5 real Opus prompts before you're cut off. It's almost unusable for serious agentic work. Even on Max, the [5-hour rolling limit](https://x.com/banteg/status/2020319298372178155) means you're structuring your entire day around usage windows. I've been living life in 5-hour blocks.
This is actually a major reason I lean toward Codex for heavy building sessions — OpenAI has been more generous with rate limits. As one developer put it: *"OpenAI has been accruing a lot of good will with generous rate limits, building Codex CLI in public, usage with free plan."*
Meanwhile, fast mode Opus at full price is [$80 for two calls](https://x.com/johnennis/status/2020266125729362009). There was even a conspiracy theory floating around that Anthropic [trained Sonnet 5, it came out better than Opus, so they renamed it Opus 4.6 and then charged extra to run it at its original speed](https://x.com/altryne/status/2020228361797460029). I don't buy it, but the fact that this theory has legs tells you something about the pricing sentiment.
The [compute reality](https://x.com/WarrenPies/status/2019768559635734711) is that GPU availability is falling across the board. B200 availability is making new lows. Inference costs are [about to spike 2-3x](https://www.youtube.com/watch?v=pSgy2P2q790). We don't have anywhere near the compute we need. Token budgets may become the new salary negotiation — as [Ethan Mollick suggested](https://x.com/emollick/status/2019621077970993265): *"If you are considering taking a job offer, you may want to ask what your token budget will be."*
The Anthropic Factor
One thing that struck me this week was [Steve Yegge's piece on the Anthropic Hive Mind](https://steve-yegge.medium.com/the-anthropic-hive-mind-d01f768f3d7b): *"Something is happening over at Anthropic. They are a spaceship that is beginning to take off."*
Run some back-of-envelope math and the odds of getting hired at Anthropic as an industry professional come out comparable to a high school or college player making the NFL. Everyone I've encountered from Anthropic is the best of the best of the best, to a degree that exceeds even peak-era Google.
Anthropic is unusually impenetrable as a company. Employees there know they just need to keep their heads down and they'll be billionaires, so they have plenty of incentive to do exactly that. The result is a company that ships at an absurd pace with minimal leaks.
Meanwhile, [their ML researchers are pulling $1.2M TC](https://x.com/nisargptel/status/2020049139330097579). The talent war is real and it's reflected in the product.
The Bottom Line
We're in a moment where two genuinely frontier models are available simultaneously, each with a distinct personality:
| | **Opus 4.6** | **Codex 5.3** |
|---|---|---|
| **Best at** | Intent inference, exploration, reasoning | Disciplined execution, marathon sessions |
| **Weakness** | Rate limits, pricing | Needs existing structure, can feel literal |
| **Speed** | Fast mode available (expensive) | Fast by default |
| **Vibe** | Brilliant collaborator | Reliable senior engineer |
| **Use when** | You're figuring it out | You know what you want |
The real unlock isn't choosing one. It's using both. The models are complementary in a way that feels designed, even though it's just two companies independently pushing toward different corners of the capability frontier.
It's February 2026 and the models are already this good. Sonnet 5, GPT-5.3, Gemini 3 Pro, Qwen 3.5, GLM 5, DeepSeek v4, and Grok 4.20 are all expected before the month is out.
The Heian era of models is here. Build accordingly.