AI Engineering Podcast

Written by: Tobias Macey

About this listen

This show is your guidebook to building scalable and maintainable AI systems. You will learn how to architect AI applications, how to apply AI to your work, and what considerations are involved in building or customizing new models. Everything you need to know to deliver real impact and value with machine learning and artificial intelligence. © 2024 Boundless Notions, LLC.
Episodes
  • Taming Voice Complexity with Dynamic Ensembles at Modulate
    Feb 8 2026
    Summary
    In this episode of the AI Engineering Podcast, Carter Huffman, co-founder and CTO of Modulate, discusses the engineering behind low-latency, high-accuracy voice AI. He explains why voice is a uniquely challenging modality, due to its rich non-textual signals like tone, emotion, and context, and why simple speech-to-text-to-speech pipelines can't capture the necessary nuance. Carter introduces Modulate's Ensemble Listening Model (ELM) architecture, which uses dynamic routing and cost-based optimization to achieve scalability and precision in varied audio environments. He covers topics such as reliability under distributed-systems constraints, watchdogging with periodic model checks, structured long-horizon memory for conversations, and the trade-offs that make ensemble approaches compelling for repeated tasks at scale. Carter also shares insights on how ELMs generalize beyond voice, draws parallels to database query planners and mixture-of-experts, and discusses strategies for observability and evaluation in complex processing pipelines.
    Announcements
    - Hello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systems
    - Unlock the full potential of your AI workloads with a seamless and composable data infrastructure. Bruin is an open source framework that streamlines integration from the command line, allowing you to focus on what matters most: building intelligent systems. Write Python code for your business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. With native support for ML/AI workloads, Bruin empowers data teams to deliver faster, more reliable, and scalable AI solutions. Harness Bruin's connectors for hundreds of platforms, including popular machine learning frameworks like TensorFlow and PyTorch. Build end-to-end AI workflows that integrate seamlessly with your existing tech stack. Join the ranks of forward-thinking organizations that are revolutionizing their data engineering with Bruin. Get started today at aiengineeringpodcast.com/bruin, and for dbt Cloud customers, enjoy a $1,000 credit to migrate to Bruin Cloud.
    - Your host is Tobias Macey and today I'm interviewing Carter Huffman about his work building an ensemble approach to low-latency voice AI
    Interview
    - Introduction
    - How did you get involved in machine learning?
    - Can you describe the "Ensemble Listening" approach and the story behind why Modulate moved away from monolithic architectures?
    - When designing a real-time voice system, how do you handle the routing logic between specialized models without blowing your latency budget?
    - What does the "gatekeeper" or routing layer actually look like in code? (A minimal illustrative sketch follows these notes.)
    - You've mentioned "evals that don't lie." How do you build a validation pipeline for noisy, adversarial voice data that catches regressions a simple word error rate (WER) might miss?
    - In an ensemble of models, a failure in one specialized node might not crash the system, but it can degrade output quality. How do you monitor for these "silent failures" in real time without introducing massive overhead?
    - For many teams, the default is to call an API for a frontier model. At what point in the scaling or latency curve does it become technically (or economically) necessary to swap a general LLM for a suite of specialized, smaller models?
    - How do you track the real-world costs associated with the technical and human overhead of this more complex system?
    - What are the most interesting, innovative, or unexpected ways that you have seen orchestrated ensembles used in live conversation environments?
    - What are the most interesting, unexpected, or challenging lessons that you have learned while managing the lifecycle of multiple specialized models simultaneously?
    - When is an ensemble approach the wrong choice? (e.g., at what level of complexity or throughput is the overhead of orchestration more trouble than it's worth?)
    - What do you have planned for the future of Ensemble Listening Models? Are we looking at self-optimizing routers, or perhaps moving these ensembles closer to the edge?
    Contact Info
    - LinkedIn
    Parting Question
    - From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?
    Closing Announcements
    - Thank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.
    - Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
    - If you've learned something or tried out a project from the show then tell us about it! Email hosts@aiengineeringpodcast.com with your story.
    - To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
    Links
    - Modulate
    - NASA Jet Propulsion Laboratory
    - OpenAI Whisper
    - Multi-Armed ...
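    The "gatekeeper" question above invites a concrete picture. As a minimal sketch only (this is not Modulate's implementation; the specialist names, latency costs, and gating heuristics are all invented for illustration), a cost-based router over specialized audio models might look like this:

    # Hypothetical cost-based router over specialized audio models.
    # Names, costs, and gates are invented; this is not Modulate's ELM code.
    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class Specialist:
        name: str
        cost_ms: float                   # expected added latency if this model runs
        applies: Callable[[dict], bool]  # cheap gate deciding whether to run it

    def route(features: dict, specialists: list[Specialist], budget_ms: float) -> list[Specialist]:
        """Select applicable specialists, cheapest first, within a latency budget."""
        chosen, spent = [], 0.0
        for s in sorted(specialists, key=lambda sp: sp.cost_ms):
            if s.applies(features) and spent + s.cost_ms <= budget_ms:
                chosen.append(s)
                spent += s.cost_ms
        return chosen

    specialists = [
        Specialist("transcriber", 80.0, lambda f: True),
        Specialist("emotion", 40.0, lambda f: f.get("speech_detected", False)),
        Specialist("noise_profile", 15.0, lambda f: f.get("snr_db", 99) < 10),
    ]

    plan = route({"speech_detected": True, "snr_db": 6.0}, specialists, budget_ms=120.0)
    print([s.name for s in plan])  # ['noise_profile', 'emotion']: the transcriber exceeds the budget

    The parallel Carter draws to database query planners shows up directly here: cheap predicates decide whether expensive operators run at all, and the plan is re-derived per input rather than fixed up front.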
    59 mins
  • GPU Clouds, Aggregators, and the New Economics of AI Compute
    Jan 27 2026
    Summary
    In this episode I sit down with Hugo Shi, co-founder and CTO of Saturn Cloud, to map the strategic realities of sourcing and operating GPUs across clouds. Hugo breaks down today's provider landscape (hyperscalers, full-service GPU clouds, bare-metal/concierge providers, and emerging GPU aggregators) and how to choose among them based on security posture, managed services, and cost. We explore the practical layers of capability (compute, orchestration with Kubernetes/Slurm, storage, networking, and managed services), the trade-offs of portability on "Kubernetes-native" stacks, and the persistent challenge of data gravity (a back-of-the-envelope sketch of that trade-off follows these notes). We also discuss current supply dynamics, the growing availability of on-demand capacity as newer chips roll out, and how AMD's ecosystem is maturing into real competition for NVIDIA. Hugo shares patterns for separating training and inference across providers, why traditional ML is far from dead, and how usage varies wildly across domains like biotech. We close with predictions on consolidation, full-stack experiences from GPU clouds, financial-style GPU marketplaces, and much-needed advances in reliability for long-running GPU jobs.
    Announcements
    - Hello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systems
    - Unlock the full potential of your AI workloads with a seamless and composable data infrastructure. Bruin is an open source framework that streamlines integration from the command line, allowing you to focus on what matters most: building intelligent systems. Write Python code for your business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. With native support for ML/AI workloads, Bruin empowers data teams to deliver faster, more reliable, and scalable AI solutions. Harness Bruin's connectors for hundreds of platforms, including popular machine learning frameworks like TensorFlow and PyTorch. Build end-to-end AI workflows that integrate seamlessly with your existing tech stack. Join the ranks of forward-thinking organizations that are revolutionizing their data engineering with Bruin. Get started today at aiengineeringpodcast.com/bruin, and for dbt Cloud customers, enjoy a $1,000 credit to migrate to Bruin Cloud.
    - Your host is Tobias Macey and today I'm interviewing Hugo Shi about the strategic realities of sourcing GPUs in the cloud for your training and inference workloads
    Interview
    - Introduction
    - How did you get involved in machine learning?
    - Can you start by giving a summary of your understanding of the current market for "cloud" GPUs?
    - How would you characterize the customer base for the "neocloud" providers?
    - How is access to the GPU compute typically mediated?
    - The predominant cloud providers (AWS, GCP, Azure) have gained market share by offering numerous differentiated services and ease-of-use features. What are the types of services that you might expect from a GPU provider?
    - The "cloud-native" ecosystem was developed with the promise of enabling workload portability, but the realities are often more complicated. What are some of the difficulties that teams encounter when trying to adapt their workloads to these different cloud providers?
    - What are the toolchains/frameworks/architectures that you are seeing as most effective at adapting to these different compute environments?
    - One of the major themes in the 2010s that worked against multi-cloud strategies was the idea of "data gravity". What are the strategies that teams are using to mitigate that tax on their workloads?
    - That tax weighs more heavily on training workloads than on inference compute. How are you seeing teams think about the balance of cost savings vs. operational complexity for those different workloads?
    - What are the most interesting, innovative, or unexpected ways that you have seen teams capitalize on GPU capacity across these new providers?
    - What are the most interesting, unexpected, or challenging lessons that you have learned while working on enabling teams to execute workloads on these neoclouds?
    - When is a "neocloud" or "GPU cloud" provider the wrong choice?
    - What are your predictions for the future evolution of GPU-as-a-service as hardware availability improves and model architectures become more efficient?
    Contact Info
    - LinkedIn
    Parting Question
    - From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?
    Closing Announcements
    - Thank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.
    - Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
    - If you've learned something or tried out a project from the show then tell us about it! Email hosts@aiengineeringpodcast.com with ...
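    The "data gravity" tax discussed above is easy to put rough numbers on. As a back-of-the-envelope sketch (every price here is an invented placeholder, not a quote from any provider), the trade between cheaper GPU rates and a one-time egress bill can be checked like this:

    # Does a cheaper GPU cloud beat the egress cost of moving the dataset?
    # All prices are hypothetical placeholders, not real provider rates.

    def migration_net_savings(dataset_tb: float, egress_per_gb: float,
                              gpu_hours: float, rate_a: float, rate_b: float) -> float:
        """Net USD saved by running on provider B instead of A,
        after paying once to move the dataset out of provider A."""
        egress_cost = dataset_tb * 1024 * egress_per_gb
        compute_savings = gpu_hours * (rate_a - rate_b)
        return compute_savings - egress_cost

    # 50 TB dataset, $0.09/GB egress, 4,000 GPU-hours at $4.00/hr vs. $2.20/hr
    net = migration_net_savings(dataset_tb=50, egress_per_gb=0.09,
                                gpu_hours=4000, rate_a=4.00, rate_b=2.20)
    print(f"net savings: ${net:,.2f}")  # $2,592.00 here; a smaller job would lose money

    The same arithmetic explains the pattern Hugo describes of splitting workloads across providers: inference only needs the model weights moved (gigabytes), while training drags the whole dataset along (terabytes), so the egress term dominates very differently in the two cases.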
    46 mins
  • The Future of Dev Experience: Spotify's Playbook for Organization-Scale AI
    Jan 20 2026
    Summary
    In this episode of the AI Engineering Podcast, Niklas Gustavsson, Chief Architect at Spotify, talks about scaling AI across engineering and product. He explores how Spotify's highly distributed architecture was built to support rapid adoption of coding agents like Copilot, Cursor, and Claude Code, enabled by standardization and Backstage. The conversation covers the tension between bottom-up experimentation and platform standardization, and how Spotify is moving toward monorepos and fleet management. Niklas discusses the emergence of "fleet-wide agents" that can execute complex code changes with robust testing and LLM-as-judge loops to ensure quality (a sketch of that pattern follows these notes). He also touches on the shift in engineering workflows as code generation accelerates, the growing use of agents beyond coding, and the lessons learned in sandboxing, agent skills/rules, and shared evaluation frameworks. Niklas highlights Spotify's decade-long experience with ML product work and shares his vision for deeper end-to-end integration of agentic capabilities across the full product lifecycle, and for making collaborative "team-level memory" for agents a reality.
    Announcements
    - Hello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systems
    - Unlock the full potential of your AI workloads with a seamless and composable data infrastructure. Bruin is an open source framework that streamlines integration from the command line, allowing you to focus on what matters most: building intelligent systems. Write Python code for your business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. With native support for ML/AI workloads, Bruin empowers data teams to deliver faster, more reliable, and scalable AI solutions. Harness Bruin's connectors for hundreds of platforms, including popular machine learning frameworks like TensorFlow and PyTorch. Build end-to-end AI workflows that integrate seamlessly with your existing tech stack. Join the ranks of forward-thinking organizations that are revolutionizing their data engineering with Bruin. Get started today at aiengineeringpodcast.com/bruin, and for dbt Cloud customers, enjoy a $1,000 credit to migrate to Bruin Cloud.
    - Your host is Tobias Macey and today I'm interviewing Niklas Gustavsson about how Spotify is scaling AI usage in engineering and product work
    Interview
    - Introduction
    - How did you get involved in machine learning?
    - Can you start by giving an overview of your engineering practices independent of AI?
    - What was your process for introducing AI into the developer experience? (e.g. pioneers doing early work (bottom-up) vs. top-down)
    - There are countless agentic coding tools on the market now. How do you balance organizational standardization vs. exploration?
    - Beyond the toolchain, what are your methods for sharing best practices and upskilling engineers on the use of agentic toolchains for software/product engineering?
    - Spotify has been operationalizing ML/AI features since before the introduction of LLMs and transformer models. How has that history helped inform your adoption of generative AI in your overall engineering organization?
    - As you use these generative and agentic AI utilities in your day-to-day, how have those lessons learned fed back into your AI-powered product features?
    - What are some of the platform capabilities/developer experience investments that you have made to improve the overall effectiveness of agentic coding in your engineering organization?
    - What are some examples of guardrails/speedbumps that you have introduced to avoid injecting unreliable or untested work into production?
    - As the (time/money/cognitive) cost of writing code drops, the burden of reviewing that code grows. What are some of the ways that you are working to scale that side of the equation?
    - What are some of the ways that agentic coding/CLI utilities have bled into other areas of engineering/operations/product development beyond just writing code?
    - What are the most interesting, innovative, or unexpected ways that you have seen your team applying AI/agentic engineering practices?
    - What are the most interesting, unexpected, or challenging lessons that you have learned while working on operationalizing and scaling agentic engineering patterns in your teams?
    - When is agentic code generation the wrong choice?
    - What do you have planned for the future of AI and agentic coding patterns and practices in your organization?
    Contact Info
    - LinkedIn
    Parting Question
    - From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?
    Closing Announcements
    - Thank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.
    - Visit the site to subscribe to the show, sign up for the mailing list,...
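    The "fleet-wide agents" described above combine a hard gate (tests) with a soft gate (an LLM judging the change). As a minimal sketch of that loop (generate_patch, run_tests, and judge are hypothetical stubs standing in for a coding agent, a CI runner, and a judge model; this is not Spotify's implementation):

    # Sketch of an agent change-loop gated by tests plus an LLM-as-judge review.
    # generate_patch / run_tests / judge are invented stubs, not a real API.

    MAX_ATTEMPTS = 3
    JUDGE_THRESHOLD = 0.8

    def generate_patch(task: str, feedback: str) -> str:
        return f"diff for {task!r}"  # stand-in for a coding-agent call

    def run_tests(patch: str) -> tuple[bool, str]:
        return True, "all green"     # stand-in for a CI run against the patch

    def judge(task: str, patch: str) -> tuple[float, str]:
        return 0.9, "change is small and matches the task"  # stand-in judge model

    def propose_change(task: str) -> str | None:
        """Retry generate -> test -> judge; emit a patch only when both gates pass."""
        feedback = ""
        for _ in range(MAX_ATTEMPTS):
            patch = generate_patch(task, feedback)
            passed, log = run_tests(patch)        # hard gate: the change must pass CI
            if not passed:
                feedback = f"tests failed: {log}"
                continue
            score, critique = judge(task, patch)  # soft gate: judge scores the diff 0..1
            if score >= JUDGE_THRESHOLD:
                return patch                      # hand off to human review
            feedback = f"judge critique: {critique}"
        return None  # escalate to a human rather than ship a low-confidence change

    print(propose_change("migrate off a deprecated internal API"))

    The key design point is that the judge never replaces the test suite; it only filters changes that already pass it, which matches the episode's emphasis on guardrails before anything reaches production.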
    56 mins