Produced by W2D1 Media. Work with us →
Day One
A model can pass every test you write and still produce something that looks completely awful.
Peter Gostev

Peter Gostev is Head of AI Capabilities at Arena (LMArena), the community-based platform, born out of research at UC Berkeley, where millions of real people vote in blind tests to rank AI models. Before Arena, Peter was Head of AI at Moonpig and built a large following sharing hands-on explorations of what the latest models can actually do. He joins Georgie Healy from London for a genuinely nerdy insider look at how models are judged and where the frontier is heading.

In this episode, Peter explains the difference between static benchmarks and human judgment, and why a model can pass every test you write and still produce something that looks completely awful. He breaks down the current state of the leaderboards, including why Anthropic's models are dominating and how that tracks with real-world adoption. He also gives a sharp comparison of the top Western models, arguing that Anthropic's non-reasoning models are exceptional while OpenAI's strength lies in deep reasoning. Georgie and Peter get into why people aren't using Chinese models more despite their quality, the economics behind AI pricing and how enterprise usage is priced very differently from consumer subscriptions, why release cadence matters as much as capability, and what the wave of data centre investment means for the models arriving next. Along the way, there's a fond detour on Opus 3 as the model you could talk to for hours, and on why better models can sometimes feel worse.

Tune in for a clear-eyed, hype-free guide to how AI models are really evaluated, straight from someone who watches the charts move in real time.

More from In The Blink Of AI with Georgie Healy
