
Your First AI at Home

Domesticating AI S01E01: Your First AI at Home
Hosts: Miriah Peterson, Matt Sharp, Chris Brousseau

This episode is your practical on-ramp to running AI at home: why inference engines matter, what to install first, and how to make "local AI" feel stable instead of fragile. The hosts start with a hardware and market reality check (tinygrad's tinybox-style "AI server appliance" idea and the ongoing memory/RAM crunch), then break down what an inference engine actually does, how popular runtimes compare (llama.cpp, vLLM, Ollama, TGI), and a sane starter workflow for getting from "downloaded a model" to "usable local AI."

Key takeaways
- Inference engines are the "runtime": model loading, tokenization, KV cache/context handling, and the serving layer.
- Pick your engine based on your goal: tinkering (llama.cpp) vs. serving throughput (vLLM/TGI) vs. it-just-works packaging (Ollama).
- You don't need a brand-new rig to start, but RAM/VRAM constraints will shape everything.
- Use leaderboards as a hint, then validate with your own small eval prompts that match your workload.
- If you're exposing anything beyond your LAN: reverse proxy + TLS, and don't casually open ports.

Timestamps
0:00 Intro + host chaos + what the show is
1:08 News: tinygrad / "AI server appliance" thinking (tinybox vibes)
2:44 News: RAM prices + the memory crunch for builders
8:26 Main: building your first AI at home (why now)
8:49 What is an inference engine?
12:30 Engines compared: llama.cpp vs. vLLM vs. Ollama vs. TGI
15:42 Do you need to buy a new computer? (CPU vs. GPU realities)
25:32 Models for home: fit-to-hardware, quantization, context
34:37 Leaderboards vs. evals: picking models you can trust
44:00 Community + meetups + where to follow
45:22 Outro: "Keep your AI on a leash"

News / context
- Tom's Hardware: TinyBox production + multi-GPU appliance concept (Tom's Hardware)
- Reuters: AI-driven memory shortage / supply-chain crunch (Reuters)
- IDC: 2026 device impacts from the memory shortage (IDC)

Inference engines
- llama.cpp (GGML org) (GitHub)
- vLLM OpenAI-compatible server (docs.vllm.ai)
- Ollama docs (quickstart) (Ollama Documentation)
- Hugging Face Text Generation Inference (TGI) (GitHub)

Hosts
- Miriah Peterson: software engineer, Go educator, and community builder focused on production-first AI. Runs SoyPete Tech (streams + writing + open source).
- Matt Sharp: AI engineer/strategist, co-author of LLMs in Production, MLOps practitioner. Writes The Data Pioneer. (thedatapioneer.substack.com)
- Chris Brousseau: NLP practitioner, co-author of LLMs in Production, VP of AI at VEOX. You can find him as IMJONEZZ. (veox.ai)

Links
- SoyPete Tech (YouTube): (youtube.com)
- SoyPete Tech (Substack): (soypetetech.substack.com)
- Matt's Substack (The Data Pioneer): (thedatapioneer.substack.com)
- Chris on YouTube (IMJONEZZ): (youtube.com)
- LLMs in Production (book): (Manning Publications)
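To make the "downloaded a model" to "usable local AI" step concrete: both vLLM and Ollama can serve an OpenAI-compatible HTTP API, so a few lines of stdlib Python are enough to talk to a model once an engine is running. This is a minimal sketch, not from the episode itself; the base URL, port, and model name below are placeholder assumptions (vLLM defaults to port 8000, Ollama to 11434), and it assumes a server is already listening.

```python
import json
import urllib.request

# Assumptions: an OpenAI-compatible server (e.g. vLLM or Ollama) is already
# running locally. BASE_URL and MODEL are placeholders -- point them at
# whatever your engine actually serves.
BASE_URL = "http://localhost:8000/v1"
MODEL = "your-model-name"

def build_chat_request(prompt: str, model: str = MODEL) -> dict:
    """Build the JSON body for a /v1/chat/completions call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,  # keep local answers fairly deterministic
    }

def chat(prompt: str) -> str:
    """POST the prompt to the local engine and return the reply text."""
    body = json.dumps(build_chat_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]
```

Usage is just `chat("What is an inference engine?")` with the server up. Because the client only speaks the OpenAI-style API, you can swap engines (llama.cpp's server, vLLM, Ollama, TGI) without changing this code.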