Episodes

  • Confidently Wrong - The Hallucination Numbers Nobody Likes to Repeat
    Jan 13 2026

    Confident answers are easy. Correct answers are harder.

    This episode takes a hard look at LLM “hallucinations” through the numbers that most people avoid repeating. A researcher from the Epistemic Reliability Lab explains why error rates can spike when a chatbot is pushed to answer instead of admit uncertainty, how benchmarks like SimpleQA and HalluLens measure that trade-off, and why some systems can look “helpful” while quietly getting things wrong.

    Along the way: recent real-world incidents where AI outputs created reputational and operational fallout, why “just make it smarter” isn’t a complete fix, and what it actually takes to reduce confident errors in production systems without breaking the user experience.

    This episode is based on the articles “Hallucination Rates in 2025 - Accuracy, Refusal, and Liability” (https://seikouri.com/hallucination-rates-in-2025-accuracy-refusal-and-liability) and “The Lie Rate - Hallucinations Aren’t a Bug. They’re a Personality Trait.” (https://chatbotsbehavingbadly.com/the-lie-rate-hallucinations-aren-t-a-bug-they-re-a-personali) by Markus Brinsa.



    This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit markusbrinsa.substack.com
    14 mins
  • The Day Everyone Got Smarter and Nobody Did
    Jan 6 2026

    This episode digs into the newest workplace illusion: AI-powered expertise that looks brilliant on the surface and quietly hollow underneath. Generative tools are polishing emails, reports, and “strategic” decks so well that workers feel more capable while their underlying skills slowly erode. At the same time, managers are convinced that AI is a productivity miracle—often based on research they barely understand and strategy memos quietly ghostwritten by the very systems they are trying to evaluate.

    Through an entertaining, critical conversation, the episode explores how this illusion of expertise develops, why “human in the loop” is often just a comforting fiction, and how organizations accumulate cognitive debt when they optimize for AI usage instead of real capability. It also outlines what a saner approach could look like: using AI as a sparring partner rather than a substitute for thinking, protecting spaces where humans still have to do the hard work themselves, and measuring outcomes that actually matter instead of counting how many times someone clicked the chatbot.

    The episode is based on the article “The Day Everyone Got Smarter, and Nobody Did” by Markus Brinsa.

    https://chatbotsbehavingbadly.com/the-day-everyone-got-smarter-and-nobody-did



    This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit markusbrinsa.substack.com
    18 mins
  • Chatbots Crossed The Line
    Dec 9 2025

    This episode of Chatbots Behaving Badly looks past the lawsuits and into the machinery of harm. Together with clinical psychologist Dr. Victoria Hartman, we explain why conversational AI so often “feels” therapeutic while failing basic mental-health safeguards. We break down sycophancy (optimization for agreement), empathy theater (human-like cues without a duty of care), and parasocial attachment (bonding with a system that cannot repair or escalate).

    We cover the statistical and product realities that make crisis detection hard—low base rates, steerable personas, evolving jailbreaks—and outline what a care-first design would require: hard stops at early risk signals, human handoffs, bounded intimacy for minors, external red-teaming with veto power, and incentives that prioritize safety over engagement. Practical takeaways for clinicians, parents, and heavy users close the show: name the limits, set fences, and remember that tools can sound caring—but people provide care.

    The episode is based on the article “Chatbots Crossed the Line” by Markus Brinsa.

    https://chatbotsbehavingbadly.com/chatbots-crossed-the-line



    This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit markusbrinsa.substack.com
    11 mins
  • AI Can't Be Smarter, We Built It!
    Dec 2 2025

    We take on one of the loudest, laziest myths in the AI debate: “AI can’t be more intelligent than humans. After all, humans coded it.” Instead of inviting another expert to politely dismantle it, we do something more fun — and more honest. We bring on the guy who actually says this out loud.

    We walk through what intelligence really means for humans and machines, why “we built it” is not a magical ceiling on capability, and how chess engines, Go systems, protein-folding models, and code-generating AIs already outthink us in specific domains. Meanwhile, our guest keeps jumping in with every classic objection: “It’s just brute force,” “It doesn’t really understand,” “It’s still just a tool,” and the evergreen “Common sense says I’m right.”

    What starts as a stubborn bar argument turns into a serious reality check. If AI can already be “smarter” than us at key tasks, then the real risk is not hurt feelings. It’s what happens when we wire those systems into critical decisions while still telling ourselves comforting stories about human supremacy. This episode is about retiring a bad argument so we can finally talk about the real problem: living in a world where we’re no longer the only serious cognitive power in the room.

    This episode is based on the article “The Pub Argument: ‘It Can’t Be Smarter, We Built It’” by Markus Brinsa.

    https://chatbotsbehavingbadly.com/the-pub-argument-it-can-t-be-smarter-we-built-it



    This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit markusbrinsa.substack.com
    17 mins
  • The Toothbrush Thinks It's Smarter Than You!
    Nov 25 2025

    In this Season Three kickoff of Chatbots Behaving Badly, I finally turn the mic on one of my oldest toxic relationships: my “AI-powered” electric toothbrush. On paper, the Oral-B iO Series 10 promises 3D teeth tracking and real-time guidance that knows exactly which tooth you’re brushing. In reality, it insists my upper molars are living somewhere near my lower front teeth.

    We bring in biomedical engineer Dr. Erica Pahk to unpack what’s really happening inside that glossy handle: inertial sensors, lab-trained machine-learning models, and a whole lot of probabilistic guessing that falls apart in real bathrooms at 7 a.m. We explore why symmetry, human quirks, and real-time constraints make the map so unreliable, how a simple calibration mode could let the brush learn from each user, and why AI labels on consumer products are running ahead of what the hardware can actually do.

    This episode is based on the articles “The Toothbrush Thinks It’s Smarter Than You!” (https://chatbotsbehavingbadly.com/the-toothbrush-thinks-it-s-smarter-than-you) and “‘With AI’ is the new ‘Gluten-Free’” (https://chatbotsbehavingbadly.com/with-ai-is-the-new-gluten-free) by Markus Brinsa.



    This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit markusbrinsa.substack.com
    19 mins
  • Can a Chatbot Make You Feel Better About Your Mayor?
    Nov 18 2025

    Programming note: satire ahead.

    I don’t use LinkedIn for politics, and I’m not starting now. But a listener sent me this (yes, joking): “Maybe you could do one that says how chatbots can make you feel better about a communist socialist mayor haha.” I read it and thought: that’s actually an interesting design prompt. Not persuasion. Not a manifesto. A what-if.

    So the new Chatbots Behaving Badly episode is a satire about coping, not campaigning. What if a chatbot existed whose only job was to talk you down from doom-scrolling after an election? Not to change your vote. Not to recruit your uncle. Just to turn “AAAAH” into “okay, breathe,” and remind you that institutions exist, budgets are real, and your city is more than a timeline.

    If you’re here for tribal food fights, this won’t feed you. If you’re curious about how we use AI to regulate emotions in public life—without turning platforms into battlegrounds—this one’s for you.

    No yard signs. No endorsements. Just a playful stress test of an idea: Could a bot lower the temperature long enough for humans to be useful?

    Episode: “Can a Chatbot Make You Feel Better About Your Mayor?” (satire).

    Listen if you want a laugh and a lower heart rate. Skip if you’d rather keep your adrenaline. Either way, let’s keep this space for work, ideas, and the occasional well-aimed joke.

    #satire #chatbots #designprompt #civicsnotvibes #ChatbotsBehavingBadly #NYC



    This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit markusbrinsa.substack.com
    7 mins
  • Therapy Without a Pulse
    Nov 11 2025

    “Therapy Without a Pulse” examines the gap between friendly AI and real care. We trace how therapy-branded chatbots reinforce stigma and mishandle gray-area risk, why sycophancy rewards agreeable nonsense over clinical judgment, and how new rules (like Illinois’ prohibition on AI therapy) are redrawing the map. Then we pivot to a constructive blueprint: LLMs as training simulators and workflow helpers, not autonomous therapists; explicit abstention and fast human handoffs; journaling and psychoeducation that move people toward licensed care, never replace it. The bottom line: keep the humanity in the loop—because tone can be automated, responsibility can’t.

    Based on the article “Therapy Without a Pulse” by Markus Brinsa. https://chatbotsbehavingbadly.com/therapy-without-a-pulse

    Stanford Report: New study warns of risks in AI mental health tools (June 11, 2025). https://news.stanford.edu/stories/2025/06/ai-mental-health-care-tools-dangers-risks



    This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit markusbrinsa.substack.com
    5 mins
  • 'With AI' is the new 'Gluten-Free'
    Nov 4 2025

    Markus explores how “With AI” became the world’s favorite marketing sticker — the digital equivalent of “gluten-free” on bottled water. With his trademark mix of humor and insight, he reveals how marketers transformed artificial intelligence from a technology into a virtue signal, a stabilizer for shaky product stories, and a magic key for unlocking budgets.

    From boardroom buzzwords to brochure poetry, Markus dissects the way “sex sells” evolved into “smart sells,” why every PowerPoint now glows with AI promises, and how two letters can make ordinary software sound like it graduated from MIT. But beneath the glitter, he finds a simple truth: the brands that win aren’t the ones that shout “AI” the loudest — they’re the ones that make it specific, honest, and actually useful.

    Funny, sharp, and dangerously relatable, “With AI Is the New Gluten-Free” is a reality check on hype culture, buyer psychology, and why the next big thing in marketing might just be sincerity.

    This episode is based on the article “‘With AI’ is the new ‘Gluten-Free’” by Markus Brinsa.

    https://chatbotsbehavingbadly.com/with-ai-is-the-new-gluten-free



    This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit markusbrinsa.substack.com
    7 mins