• GPT 5.5 Arrives, DeepSeek V4 Drops, and the Compute War Intensifies
    Apr 24 2026

    GPT 5.5 full analysis, plus DeepSeek V4 paper highlights, comparisons with Mythos, a vibe-coded game w/ GPT Image 2, and 50 data-points you wouldn’t get from just reading the headlines.


    Chapters:

    01:11 - GPT 5.5 Comparison

    06:04 - Mythos Marketing

    11:50 - Recursive Self-Improvement?

    14:11 - DeepSeek V4

    18:03 - VibeCode Experiment Extravaganza

    21:44 - The Scarce Compute Era



    https://80000hours.org/aiexplained



    OpenAI Benchmarks:
    https://openai.com/index/introducing-gpt-5-5/


    5.5 System Card: https://deploymentsafety.openai.com/gpt-5-5/gpt-5-5.pdf


    Direct Comparison: https://pbs.twimg.com/media/HGnNm5GWEAAJ1Ob?format=jpg&name=4096x4096


    DeepSeek Paper: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro


    SWE Bench Pro - benchmark of choice? https://x.com/ChowdhuryNeil/status/2047416077622395025


    AA Omniscience: https://artificialanalysis.ai/evaluations/omniscience

    Vending Bench: https://x.com/andonlabs/status/2047377260412649967


    Opus 4.7 System Card: https://cdn.sanity.io/files/4zrzovbb/website/037f06850df7fbe871e206dad004c3db5fd50340.pdf


    Sam Altman Drunk Phase: https://x.com/sama/with_replies


    Noam Brown: https://x.com/polynoamial/status/2047387675762802998


    DeepSeek Compute Crunch: https://www.bloomberg.com/news/articles/2026-04-24/deepseek-unveils-newest-flagship-a-year-after-ai-breakthrough?srnd=phx-ai


    Spreadsheet Bench: https://x.com/nicochristie/status/2047476237464211721


    Pattern Recognition: https://arcprize.org/leaderboard


    Leader Interviews:

    Core Memory: https://www.youtube.com/watch?v=NCKQL0op30E

    Knowledge Podcast: https://www.youtube.com/watch?v=6JoUcQ1qmAc
    Big Tech Round 1: https://www.youtube.com/watch?v=J6vYvk7R190&t=1116s

    Big Tech Round 2: https://www.youtube.com/watch?v=YnoQ8RJbALw&t=8s


    Claude Code Limitations: https://x.com/TheAmolAvasare/status/2046724659039932830


    ChatGPT 5.4 for Clinicians: https://openai.com/index/making-chatgpt-better-for-clinicians/


    Image Arena: https://x.com/arena/status/2046670703311884548


    VibeCode Bench: https://www.vals.ai/benchmarks/vibe-code


    5.5-made Game +Seedance 2.0: https://rosemere-quest.pages.dev/


    25 mins
  • Claude Opus 4.7 - A New Frontier, in Performance … and Drama
    Apr 17 2026

    Claude Opus 4.7 just dropped, but behind every headline lies a deeper story: from a bonanza of benchmarks, to seeing the fruits of one of the biggest mega-projects in US history, to sneaky Mythos disclaimers, to Anthropic admitting compute constraints and forcing lower capability of Opus 4.7. Where the new model falls behind Gemini but ahead of GPT 5.4, plus why some users are furious at Anthropic. Ending with a 9-year animus that still affects AI today…

    https://assemblyai.com/aiexplained



    Check out my fast-growing (!) app, free to use, and code INSIDER15 for paid tiers: https://lmcouncil.ai

    AI Insiders ($9!): https://www.patreon.com/AIExplained

    Chapters:
    00:00 - Introduction
    00:58 - Benchmarks
    05:21 - Market Share + Compute Problems
    08:12 - Mythos Exclusives
    12:56 - User Frustration + Claude Code Updates
    14:03 - Brockman Amodei Rivalry
    17:40 - OpenAI vs Anthropic Approach to Code

    Claude 4.7 Opus Release Notes: https://www.anthropic.com/news/claude-opus-4-7
    vs Mythos: https://pbs.twimg.com/media/HGCGugrXUAAKcHp?format=jpg&name=medium

    232-page System Card: https://cdn.sanity.io/files/4zrzovbb/website/037f06850df7fbe871e206dad004c3db5fd50340.pdf

    ARC-AGI 2: https://x.com/arcprize/status/2044834615417053305/photo/1

    ParseBench: https://x.com/jerryjliu0/status/2044902620746363016/photo/1

    GDPVal: https://artificialanalysis.ai/evaluations/gdpval-aa

    Vidoc Security Replication: https://blog.vidocsecurity.com/blog/we-reproduced-anthropics-mythos-findings-with-public-models

    Boris Cherny Settings: https://x.com/Hesamation/status/2043016923961577516/photo/2

    User Frustration: https://x.com/RileyRalmuto/status/2044836116189069660

    VibeCode Bench: https://x.com/ValsAI/status/2044791415524471099/photo/1

    Verge Memo: https://www.theverge.com/ai-artificial-intelligence/911118/openai-memo-cro-ai-competition-anthropic

    5.4 Cyber: https://openai.com/index/scaling-trusted-access-for-cyber-defense/

    Data Centers in Absolute $: https://x.com/finmoorhouse/status/2044933442236776794/photo/1

    …in % of GDP: https://pbs.twimg.com/media/HGEN8FGWQAAN7Np?format=jpg&name=4096x4096

    WSJ Exclusive: https://www.wsj.com/tech/ai/the-decadelong-feud-shaping-the-future-of-ai-7075acde

    Brockman Interview: https://www.youtube.com/watch?v=J6vYvk7R190

    $1T Valuation: https://x.com/StefanFSchubert/status/2045039686997967082

    Emotions: https://www.patreon.com/c/aiexplained/posts

    https://lmcouncil.ai/benchmarks


    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

    20 mins
  • Claude Mythos: Highlights from 244-page Release
    Apr 8 2026

    The model, the mythos, the legend. We have a new best AI model, but not all of us. How good is it, and what do its new offensive capabilities mean? Why does its 244-page report card remind me of Her, and why did the creator of Claude Code call it ‘terrifying’? 30+ highlights sourced by reading the paper in full, old-school, no AI summary.

    https://80000hours.org/aiexplained


    Check out my fast-growing (!) app, free to use, and code INSIDER15 for paid tiers: https://lmcouncil.ai

    AI Insiders ($9!): https://www.patreon.com/AIExplained

    Chapters:
    00:00 - Introduction
    00:56 - Internal Release + Availability
    02:37 - General Capabilities
    05:12 - Self-improvement?
    06:15 - ‘Terrifying’ Landscape
    11:07 - Safety Decision
    13:22 - Coding
    14:49 - Alignment, Awareness
    19:52 - GUI for Agents/Claws + Hallucinations
    21:34 - …Emotions?
    25:29 - Her connection

    244-page System Card: https://www-cdn.anthropic.com/8b8380204f74670be75e81c820ca8dda846ab289.pdf

    Project Glasswing: https://www.anthropic.com/glasswing
    Zero-Day Details: https://red.anthropic.com/2026/mythos-preview/

    Mythos ‘terrifying’: https://x.com/bcherny/status/2041605852382351666

    New Yorker Altman/Amodei: https://archive.fo/20260406100412/https://www.newyorker.com/magazine/2026/04/13/sam-altman-may-control-our-future-can-he-be-trusted

    Alignment Risk Update: https://www-cdn.anthropic.com/79c2d46d997783b9d2fb3241de43218158e5f25c.pdf

    In a Park: https://x.com/sleepinyourhat/status/2041584808514744742

    “Uhm” - https://x.com/thsottiaux/status/2041749947385815109


    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

    Podcast: https://aiexplainedopodcast.buzzsprout.com/

    28 mins
  • OpenAI Spud, a Claude Model set to ‘stir governments’, Beast Mode ARC-AGI-3
    Mar 26 2026

    First look at exclusive reports about OpenAI's new Spud model, and the model Anthropic think will stir governments to urgency, all in the context of the newly-launched ARC-AGI-3. What does the extreme difficulty of that benchmark, and its quirky scoring metrics, mean for AI in 2026?

    https://assemblyai.com/aiexplained


    Check out my fast-growing (!) app, free to use, and code INSIDER15 for paid tiers: https://lmcouncil.ai

    AI Insiders ($9!): https://www.patreon.com/AIExplained


    Chapters:
    00:00 - Introduction
    00:55 - OpenAI Side Quests
    01:58 - Claude New Model Coming + Universal Equity?
    03:13 - ARC-AGI 3
    05:00 - Intentional or Unintentional Gaming?
    07:11 - But is it AGI Harbinger? No Harness
    09:41 - Not the First
    12:32 - Automated Researcher
    15:00 - Claw Caveat

    Spud: https://www.theinformation.com/articles/openai-ceo-shifts-responsibilities-preps-spud-ai-model?utm_campaign=Editorial&utm_content=Article&utm_medium=organic_social&utm_source=bluesky%2Cfacebook%2Clinkedin%2Cthreads%2Ctwitter&rc=sy0ihq

    FT: OpenAI Special Model: https://www.ft.com/content/de9bf0af-b241-424f-8229-5870b1c0d93d?syn-25a6b1a6=1

    Jensen Huang: https://www.forbes.com/sites/antoniopequenoiv/2026/03/23/nvidias-jensen-huang-says-he-thinks-weve-achieved-agi/

    Axios Article: https://archive.fo/20260326100140/https://www.axios.com/2026/03/26/anthropic-pentagon-ai-deal#selection-827.0-829.257

    https://arcprize.org/arc-agi/3

    ARC AGI 3 Paper: https://arcprize.org/media/ARC_AGI_3_Technical_Report.pdf

    NetHack Leaderboard: https://balrogai.com/
    Paper: https://ai.meta.com/research/publications/the-nethack-learning-environment/
    https://x.com/_rockt/status/2036864121585438995

    Claw Shells: https://x.com/DrJimFan/status/2036494601750716711

    OpenAI Automated Researcher: https://www.technologyreview.com/2026/03/20/1134438/openai-is-throwing-everything-into-building-a-fully-automated-researcher/

    Patreon Post: https://www.patreon.com/c/aiexplained/posts

    Eng Jobs: https://x.com/lennysan/status/2036535460726767793

    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

    Podcast: https://aiexplainedopodcast.buzzsprout.com/

    16 mins
  • What the New ChatGPT 5.4 Means for the World
    Mar 6 2026

    Just 48 hours after releasing GPT 5.3 Instant, OpenAI have released GPT 5.4 Thinking, so either there is an imminent singularity or perhaps we are being distracted from other news. This video will give 9 crucial bits of context, not just on the GPT 5.4 drop but on the background to the meltdown between the Pentagon and Anthropic. What does this say about the state of AI progress, your job, and what is next?


    Check out my fast-growing (!) app, free to use, and code INSIDER15 for 15% off paid tiers: https://lmcouncil.ai

    AI Insiders ($9!): https://www.patreon.com/AIExplained

    Chapters:
    00:00 - Introduction
    01:06 - GPT 5.4 Breakdown
    05:06 - Closing the Loop
    06:35 - Spiky Performance
    10:31 - Advice
    11:32 - Less Encouraging Developments - Fired Like Dogs
    17:45 - But Used in Iran


    GPT 5.4: https://openai.com/index/introducing-gpt-5-4/

    Hallucinations: https://artificialanalysis.ai/evaluations/omniscience
    Investment Banking Bench: https://x.com/bradlightcap/status/2029684672343728452
    Move 37: https://x.com/nasqret/status/2029628846518010099
    System Card: https://deploymentsafety.openai.com/gpt-5-4-thinking/gpt-5-4-thinking.pdf

    Prediction Market Scandal: https://www.wired.com/story/openai-fires-employee-insider-trading-polymarket-kalshi/


    GPT 5.3 Instant: https://openai.com/index/gpt-5-3-instant/

    GDPVal: https://openai.com/index/gdpval/

    Claude in Iran: https://www.washingtonpost.com/technology/2026/03/04/anthropic-ai-iran-campaign

    ‘Like Dogs’: https://x.com/AndrewCurran_/status/2029605783311470679

    Altman leak: https://www.cnbc.com/2026/03/03/sam-altman-tells-openai-staff-operational-decisions-up-to-government.html

    Original 2024 Switch: https://archive.fo/20240116172526/https://www.bloomberg.com/news/articles/2024-01-16/openai-working-with-us-military-on-cybersecurity-tools-for-veterans#selection-6173.83-6173.226

    Amodei Original Memo: https://www.theinformation.com/articles/read-anthropic-ceos-memo-attacking-openais-mendacious-pentagon-announcement?rc=sy0ihq
    Anthropic Apology: https://www.anthropic.com/news/where-stand-department-war
    OpenAI Employee Reaction: https://x.com/tszzl/status/2029334980481212820

    DoD Supplier Risk: https://www.cnbc.com/amp/2026/03/05/anthropic-pentagon-ai-claude-iran.html
    Atlantic Exclusive: https://archive.fo/20260301152646/https://www.theatlantic.com/technology/2026/03/inside-anthropics-killer-robot-dispute-with-the-pentagon/686200/#selection-941.61-941.212
    No Negotiation: https://x.com/USWREMichael/status/2029754965778907493

    $20B Doubling: https://archive.ph/20260304111124/https://www.bloomberg.com/news/articles/2026-03-03/anthropic-nears-20-billion-revenue-run-rate-amid-pentagon-feud

    March 2022 Interview: https://www.youtube.com/watch?v=uAA6PZkek4A

    https://lmcouncil.ai/



    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

    22 mins
  • Deadline Day for Autonomous AI Weapons & Mass Surveillance
    Feb 27 2026

    Will Anthropic be forced to make a version of Claude for war? And does a new paper expose the risks of Claude agents, in both OpenClaw and the field of war? Plus, 5 more twists in the story of the Pentagon versus Anthropic + some AI lab employees, and a petition that could change everything, or nothing...


    Check out my fast-growing (!) app, free to use, and code INSIDER15 for paid tiers: https://lmcouncil.ai

    AI Insiders ($9!): https://www.patreon.com/AIExplained

    Chapters:
    00:00 - Introduction
    00:44 - Deadline Day + Petition
    02:42 - Twist 1: Existing Deal
    03:26 - Twist 2: Existing Policy
    04:21 - Twist 3: Twin Threats
    05:54 - Twist 4: Interesting Objections
    11:32 - Twist 5: Anthropic’s Dropped Policy


    Dario Statement: https://www.anthropic.com/news/statement-department-of-war

    Google/OpenAI Petition: https://notdivided.org/

    Axios on Amodei Rejection: https://www.axios.com/2026/02/26/anthropic-rejects-pentagon-ai-terms

    FT on US Threat: https://www.ft.com/content/11d27612-d6c5-4cf7-94dd-f65603549b7f

    Politico on Latest: https://archive.ph/20260227013117/https://www.politico.com/news/2026/02/26/incoherent-hegseths-anthropic-ultimatum-confounds-ai-policymakers-00800135

    The Verge on Current Deal: https://www.theverge.com/ai-artificial-intelligence/883456/anthropic-pentagon-department-of-defense-negotiations

    Anthropic RSP change: https://www.anthropic.com/news/responsible-scaling-policy-v3

    Time Magazine on RSP: https://time.com/7380854/exclusive-anthropic-drops-flagship-safety-pledge/

    Agent of Chaos Paper: https://x.com/NatalieShapira/status/2026062499599319526

    AI Agent Reliability Paper: https://arxiv.org/pdf/2602.16666

    My Patreon Video: https://www.patreon.com/posts/real-mystery-ai-151647211

    Patreon Documentary: https://www.patreon.com/posts/our-new-age-of-133960279



    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

    Podcast: https://aiexplainedopodcast.buzzsprout.com/

    14 mins
  • Gemini 3.1 Pro and the Downfall of Benchmarks: Welcome to the Vibe Era of AI
    Feb 20 2026

    Do we have a new best AI model, or do we have the downfall of benchmarks in general, as a way of capturing machine intelligence? Full breakdown of Gemini 3.1 Pro, guest-starring the new Sonnet 4.6, plus analysis from 7 papers/posts that will give you much-needed context. Oh, and a new record on Simple Bench!

    https://epoch.ai/ai-explained-datacenters


    Check out my fast-growing (!) app, free to use, and code INSIDER15 for Pro: https://lmcouncil.ai

    AI Insiders ($9!): https://www.patreon.com/AIExplained


    Chapters:
    00:00 - Introduction
    00:30 - Post-training Dominance
    04:00 - ARC-AGI 2 Caveat
    05:54 - Simple Bench Record
    08:22 - Hallucination Caveat
    10:05 - Model Card
    11:12 - Exponential Coming
    12:20 - Amodei on Generalizing
    15:10 - One True Benchmark?
    17:02 - Other Metrics…

    Gemini 3.1 Model Card: https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-3-1-Pro-Model-Card.pdf

    Release: https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-1-pro/

    Where are Agents deployed?: https://www.anthropic.com/research/measuring-agent-autonomy

    Newsletter Post: https://signaltonoise.beehiiv.com/p/4-ai-numbers-that-surprised-me-this-week

    Hallucination AA: https://artificialanalysis.ai/evaluations/omniscience

    Melanie Mitchell: https://x.com/MelMitchell1/status/2022738363548340526
    ARC-AGI-2: https://x.com/arcprize/status/2024522812728496470/photo/1

    Chollet on Agentic Coding and ML: https://x.com/fchollet/status/2024519439140737442

    METR Caveat: https://metr.org/notes/2026-01-22-time-horizon-limitations/

    Talaas Fast: https://chatjimmy.ai/

    Amodei Interview Continual learning: https://www.dwarkesh.com/p/dario-amodei-2?open=false#%C2%A7002942-is-continual-learning-necessary-how-will-it-be-solved

    Metaculus FutureEval: https://www.metaculus.com/futureeval/

    Next Vid to Watch: https://www.patreon.com/posts/what-you-need-to-150647292



    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

    Podcast: https://aiexplainedopodcast.buzzsprout.com/

    19 mins
  • The Two Best AI Models/Enemies Just Got Released Simultaneously
    Feb 6 2026

    The two models that you will hear discussed for at least the next two months - Claude Opus 4.6 and GPT 5.3 Codex - just got released within 26 mins of each other. The full breakdown of around 250 pages of reports, with just the most interesting moments, from the battle of which is best, Claude personhood, the surprising misbehaviour of Opus 4.6, and much more.

    https://assemblyai.com/aiexplained

    Check out my fast-growing (!) app, free to use, and code INSIDER15 for Pro: https://lmcouncil.ai

    AI Insiders ($9): https://www.patreon.com/AIExplained

    Chapters:
    00:00 - Introduction
    00:54 - Self-improvement?
    02:44 - Knowledge Work
    05:30 - Overly agentic behaviour
    09:12 - Who Shouldn’t Use Claude Opus
    11:39 - Step-change?
    15:09 - Claude’s ‘Personhood’

    Hassabis Roadmap: https://www.patreon.com/posts/hassabis-roadmap-149750869

    Release of Opus 4.6: https://www.anthropic.com/news/claude-opus-4-6
    212 Page System Card: https://www-cdn.anthropic.com/0dd865075ad3132672ee0ab40b05a53f14cf5288.pdf
    Claude Code Tip: https://x.com/bcherny/status/2019475897691124107


    GPT Codex 5.3: https://openai.com/index/introducing-gpt-5-3-codex/

    System Card: https://openai.com/index/gpt-5-3-codex-system-card/

    Browse Comp: https://arxiv.org/pdf/2504.12516v1
    Finance Agent: https://www.vals.ai/benchmarks/finance_agent
    Terminal Bench 2: https://arxiv.org/pdf/2601.11868
    Vending Bench: https://andonlabs.com/blog/opus-4-6-vending-bench

    My X post: https://x.com/AIExplainedYT/status/2016851303436095647

    Anthropic Apology: https://x.com/ch402/status/2014066134194995256/photo/1

    Altman rebuttal: https://x.com/sama/status/2019139174339928189
    https://x.com/sama/status/2019140276246442089

    4% of GitHub: https://x.com/dylan522p/status/2019490550911766763



    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

    Podcast: https://aiexplainedopodcast.buzzsprout.com/

    20 mins