👋 Hey AI heads 🎙️
Join us for the very first Tech Beats Live 🔴, hosted by Kosseila, aka @CloudDude from @CloudThrill.
🎯 This chill & laid-back livestream will unpack LLM quantization 🔥:
- ✅ WHY it matters
- ✅ HOW it works
- ✅ Enterprise (vLLM) vs Consumer (@Ollama) trade-offs
- ✅ WHERE it's going next.
We'll be joined by two incredible guest stars to talk Enterprise vs Consumer Quantz 🗣️:
🔷 Eldar Kurtić – bringing the enterprise perspective with vLLM.
🔷 Colin Kealty – aka Bartowski, creator of the top-downloaded GGUF quantized LLMs on Hugging Face.
🫵🏼 Come learn and have some fun 🎉.
𝐂𝐡𝐚𝐩𝐭𝐞𝐫𝐬:
(00:00) Host Introduction
(04:07) Eldar Intro
(07:33) Bartowski Intro
(13:04) What's Quantization?
(16:19) Why LLM Quantization Matters
(20:39) Training vs Inference – "The New Deal"
(27:46) Biggest Misconception About Quantization
(33:22) Enterprise Quantization in Production (vLLM)
(48:48) Consumer LLMs & Quantization (Ollama, llama.cpp, GGUF) – "LLMs for the People"
(01:06:45) BitNet 1-Bit Quantization from Microsoft
(01:28:14) How Long It Takes to Quantize a Model (Llama-3 70B) – GGUF or llm-compressor
(01:34:23) What Is imatrix & Why Do People Confuse It with IQ Quantization?
(01:39:36) What's LoRA & LoRA-Q?
(01:42:36) What Is Sparsity?
(01:47:42) What Is Distillation?
(01:52:34) Extreme Quantization (Unsloth) of Big Models (DeepSeek) at 2 Bits: 70% Size Cut
(01:57:27) Will Future Models (Llama-5) Be Trained on FP4 Tensor Cores?
(02:02:15) The Future of LLMs on Edge Devices (Google AI Edge)
(02:08:00) How to Evaluate the Quality of a Quantized Model
(02:26:09) Hugging Faceโs Role in the World of LLM/Quantization
(02:36:41) LocalLLaMA Subreddit Down (Moderator Goes Bananas)
(02:40:11) Guestsโ Hope for the Future of LLMs & AI in General
🔗 Check out the quantization blog: https://bit.ly/LLMQuant
#AI #LLM #Quantization #TechBeatsLive #LocalLlama #vLLM #Ollama