
Domesticating AI


Written by: SoyPete Tech


About this listen

Domesticating AI is a bi-weekly podcast about practical AI for developers. We help developers run useful AI systems reliably, on the hardware and budgets they actually have.

- Miriah Peterson: Software engineer, Go educator, and community builder focused on *production-first* AI. Runs SoyPete Tech (streams + writing + open-source).
- Matt Sharp: AI Engineer/Strategist, co-author of *LLMs in Production*, MLOps practitioner. Writes **The Data Pioneer**.
- Chris Brousseau: NLP practitioner, co-author of *LLMs in Production*, VP of AI at VEOX. You can find him as IMJONEZZ.
Episodes
  • Hardware-First Home AI: Chips, Memory, Backends, and What to Buy
    Feb 27 2026

    Episode 3 is a hardware-first guide to running AI at home. We break down what CPUs vs GPUs vs NPUs vs TPUs actually do in the inference pipeline, why memory capacity isn’t the same as performance (model loading, KV cache, and MoE), why backends/runtimes are real constraints (CUDA vs ROCm vs Metal/MLX vs CPU), and how to scale from one box to multi-GPU and multi-machine setups.
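    One theme here, that memory capacity and raw performance are different axes, is easiest to see in the KV cache: it grows linearly with context length no matter how fast the chip is. A back-of-envelope sketch (the model shape below is an illustrative 7B-class dense config, not a figure from the episode):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Rough KV-cache size: 2 tensors (K and V) per layer, cached for every token."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * context_len

# Illustrative 7B-class dense model: 32 layers, 32 KV heads, head dim 128, fp16
for ctx in (4096, 8192, 32768):
    gib = kv_cache_bytes(32, 32, 128, ctx) / 2**30
    print(f"context {ctx:>6}: ~{gib:.0f} GiB of KV cache")
```

    At 32k context this illustrative model wants ~16 GiB for KV cache alone, on top of the weights, which is why "does the context I want even fit?" often matters more than the spec-sheet FLOPS.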


    Keep your AI on a leash.


    Links mentioned:

    - GPU Glossary (Modal): https://modal.com/gpu-glossary

    - CUDA → ROCm headline: https://wccftech.com/the-claude-code-has-managed-to-port-nvidia-cuda-backend-to-rocm-in-just-30-minutes/

    - Unsloth PR: https://github.com/unslothai/unsloth/pull/3856

    33 mins
  • From “Inference Box” to Dev Rig: What NVIDIA DGX Spark Actually Is | Ep 2
    Feb 13 2026
    Everyone keeps calling NVIDIA DGX Spark an “inference box”… but in practice it behaves more like a dev rig. In Ep 2 of Domesticating AI, we break down what Spark is actually good for (AI development + fine-tuning) vs what it isn’t (a magical drop-in inference server). We also dig into why unified memory changes the local-AI experience, the “gateway stack” (Ollama + Open WebUI), when you outgrow turnkey UIs, and how homelab economics + networking decisions shape what you should run at home.

    In this episode:

    - Training vs inference (and why “inference server” gets misused)
    - Unified memory: what it changes for model loading + workflows
    - Ollama + Open WebUI as the fastest on-ramp for local AI
    - Fine-tuning workflows (QLoRA/Unsloth-style) and where Spark shines
    - Homelab reality: Docker “recipes,” troubleshooting, and collaboration
    - Safer remote access: Tailscale
    - Cloud vs home economics (when cloud is cheaper… and when it explodes)

    Links & Resources:

    NVIDIA / DGX Spark
    - DGX Spark: https://www.nvidia.com/en-us/products/workstations/dgx-spark/
    - Build hub / recipes: https://build.nvidia.com/spark
    - NIM on Spark playbook: https://build.nvidia.com/spark/nim-llm

    Local AI runners + UIs
    - Ollama: https://ollama.com/
    - Open WebUI (GitHub): https://github.com/open-webui/open-webui
    - Open WebUI docs: https://docs.openwebui.com/
    - llama.cpp: https://github.com/ggml-org/llama.cpp
    - LM Studio: https://lmstudio.ai/
    - vLLM: https://github.com/vllm-project/vllm
    - Jan: https://jan.ai/

    Fine-tuning + workflows
    - Unsloth: https://github.com/unslothai/unsloth

    Image generation tools (mentioned)
    - ComfyUI: https://github.com/Comfy-Org/ComfyUI
    - AUTOMATIC1111 SD WebUI: https://github.com/AUTOMATIC1111/stable-diffusion-webui

    Networking / remote access
    - Tailscale: https://tailscale.com/

    Cloud GPU alternatives (mentioned)
    - Runpod pricing: https://www.runpod.io/pricing
    - Modal pricing: https://modal.com/pricing

    Hosts:

    Miriah Peterson: Software engineer, Go educator, and community builder focused on production-first AI—treating LLM systems like real software with real users. She runs SoyPete Tech (streams + writing + open-source projects) and stays active in the Utah dev community through meetups and events, with a practical focus on shipping local and cloud AI systems.
    - SoyPete Tech (YouTube): https://www.youtube.com/@SoyPete_Tech
    - SoyPete Tech (Substack): https://soypetetech.substack.com/
    - LinkedIn: https://www.linkedin.com/in/miriah-peterson-35649b5b/

    Matt Sharp: AI Engineer and Strategist for a tech consulting firm and co-author of LLMs in Production. He’s a recovering data scientist and MLOps expert with 10+ years of experience operationalizing ML systems in production. Matt also teaches a graduate-level MLOps-in-production course at Utah State University as an adjunct professor. You can find him on Substack (The Data Pioneer), LinkedIn, and on his other podcast, the Learning Curve.
    - The Data Pioneer (Substack): https://thedatapioneer.substack.com/

    Chris Brousseau: A linguist by training and an NLP practitioner by trade, with a career spanning linguistically informed NLP, modern LLM systems, and MLOps practices. He’s co-author of LLMs in Production and is currently VP of AI at VEOX. You can find him as IMJONEZZ (two Z’s) on YouTube, GitHub, and LinkedIn.
    - YouTube (IMJONEZZ): https://www.youtube.com/channel/UCPtkaw_x97yP4WevW7axk0g
    - LinkedIn: https://www.linkedin.com/in/chris-brousseau/en

    📘 LLMs in Production (Matt Sharp & Chris Brousseau): https://www.manning.com/books/llms-in-production
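    The cloud-vs-home economics point reduces to a break-even calculation: a fixed hardware cost amortized against a per-hour cloud rate, minus what you pay for power at home. A minimal sketch with made-up numbers (all dollar figures are assumptions for illustration; real rates are on the Runpod and Modal pricing pages linked above):

```python
def breakeven_hours(hardware_cost, power_watts, power_price_kwh, cloud_rate_hr):
    """Hours of GPU use at which owning beats renting.

    Solves: hardware_cost + hours * home_hourly == hours * cloud_rate_hr,
    where home_hourly is the electricity cost of running the box.
    """
    home_hourly = (power_watts / 1000) * power_price_kwh
    if cloud_rate_hr <= home_hourly:
        return float("inf")  # renting is cheaper per hour forever
    return hardware_cost / (cloud_rate_hr - home_hourly)

# Hypothetical: a $4,000 box drawing 300 W at $0.15/kWh vs a $1.50/hr cloud GPU
print(round(breakeven_hours(4000, 300, 0.15, 1.50)))  # → 2749 hours
```

    On these made-up numbers the box pays for itself after roughly 2,700 GPU-hours; the "explodes" case is the opposite direction, when an always-on cloud instance quietly bills 730 hours a month whether you use it or not.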
    43 mins
  • Your First AI at Home
    Jan 30 2026
    Domesticating AI — S01E01: Your First AI at Home
    Hosts: Miriah Peterson, Matt Sharp, Chris Brousseau

    This episode is your practical on-ramp to running AI at home: why inference engines matter, what to install first, and how to make “local AI” feel stable instead of fragile. The hosts start with a hardware + market reality check (tinygrad’s tinybox-style “AI server appliance” idea and the ongoing memory/RAM crunch), then break down what an inference engine actually does, how popular runtimes compare (llama.cpp, vLLM, Ollama, TGI), and a sane starter workflow for getting from “downloaded a model” to “usable local AI.”

    Key takeaways:

    - Inference engines are the “runtime”: model loading, tokenization, KV cache/context handling, and the serving layer.
    - Pick your engine based on your goal: tinkering (llama.cpp) vs serving throughput (vLLM/TGI) vs it-just-works packaging (Ollama).
    - You don’t need a brand-new rig to start, but RAM/VRAM constraints will shape everything.
    - Use leaderboards as a hint, then validate with your own small eval prompts that match your workload.
    - If you’re exposing anything beyond your LAN: reverse proxy + TLS + don’t casually open ports.

    Chapters:

    - 0:00 Intro + host chaos + what the show is
    - 1:08 News: tinygrad / “AI server appliance” thinking (tinybox vibes)
    - 2:44 News: RAM prices + the memory crunch for builders
    - 8:26 Main: building your first AI at home (why now)
    - 8:49 What is an inference engine?
    - 12:30 Engines compared: llama.cpp vs vLLM vs Ollama vs TGI
    - 15:42 Do you need to buy a new computer? (CPU vs GPU realities)
    - 25:32 Models for home: fit-to-hardware, quantization, context
    - 34:37 Leaderboards vs evals: picking models you can trust
    - 44:00 Community + meetups + where to follow
    - 45:22 Outro — “Keep your AI on a leash”

    News / context:
    - Tom’s Hardware: TinyBox production + multi-GPU appliance concept
    - Reuters: AI-driven memory shortage / supply-chain crunch
    - IDC: 2026 device impacts from the memory shortage

    Inference engines:
    - llama.cpp (GGML org) (GitHub)
    - vLLM OpenAI-compatible server (docs.vllm.ai)
    - Ollama docs (quickstart) (Ollama Documentation)
    - Hugging Face Text Generation Inference (TGI) (GitHub)

    Hosts:
    - Miriah Peterson: Software engineer, Go educator, and community builder focused on production-first AI. Runs SoyPete Tech (streams + writing + open-source).
    - Matt Sharp: AI Engineer/Strategist, co-author of LLMs in Production, MLOps practitioner. Writes The Data Pioneer. (thedatapioneer.substack.com)
    - Chris Brousseau: NLP practitioner, co-author of LLMs in Production, VP of AI at VEOX. You can find him as IMJONEZZ. (veox.ai)

    Links:
    - SoyPete Tech (YouTube): (youtube.com)
    - SoyPete Tech (Substack): (soypetetech.substack.com)
    - The Data Pioneer (Substack): (thedatapioneer.substack.com)
    - Chris on YouTube (IMJONEZZ): (youtube.com)
    - LLMs in Production (book): (Manning Publications)
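    The "fit-to-hardware, quantization" discussion comes down to simple arithmetic: weight memory is roughly parameter count times bits per weight, before KV cache and runtime overhead. A quick sketch (the 7B figure is an illustrative model size, not a specific model from the episode):

```python
def weight_gb(n_params_billion, bits_per_weight):
    """Approximate weight-only memory in GB: params × bits / 8, ignoring KV cache and overhead."""
    return n_params_billion * 1e9 * bits_per_weight / 8 / 1e9

for bits in (16, 8, 4):
    print(f"7B model @ {bits:>2}-bit ≈ {weight_gb(7, bits):.1f} GB of weights")
```

    This is why quantization dominates the "do I need a new computer?" question: the same illustrative 7B model needs ~14 GB at fp16 but ~3.5 GB at 4-bit, moving it from "dedicated GPU" territory into range of an ordinary laptop.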
    42 mins