FlashOptim: Optimizers for Memory Efficient Training

About this listen

In this episode, hosts Hal Turing and Dr. Ada Shannon explore the paper "FlashOptim: Optimizers for Memory Efficient Training" by researchers from Databricks AI Research. The discussion centers on techniques that significantly reduce memory usage during neural network training without sacrificing model quality. The hosts unpack three key methods, Optimizer State Quantization, Float Splitting Techniques, and Companded Optimizer State Quantization, and their potential to lower memory requirements from 175 GiB to 113 GiB for a large model like Llama-3.1-8B (two short, illustrative sketches follow the source list below). Listeners interested in AI research will find the episode compelling for what it says about democratizing AI: making advanced models trainable for those with limited hardware resources.

Sources:

1. FlashOptim: Optimizers for Memory Efficient Training (the paper under discussion) — Databricks AI Research. https://arxiv.org/pdf/2602.23349
2. Mixed Precision Training — Paulius Micikevicius et al., 2018. https://scholar.google.com/scholar?q=Mixed+Precision+Training
3. 8-bit Optimizer States for Memory-Efficient Training — Tim Dettmers et al., 2022. https://scholar.google.com/scholar?q=8-bit+Optimizer+States+for+Memory-Efficient+Training
4. Parameter-Efficient Transfer Learning for NLP — Xiaoqi Li and Percy Liang, 2021. https://scholar.google.com/scholar?q=Parameter-Efficient+Transfer+Learning+for+NLP
5. Q-adam-mini: Memory-efficient 8-bit quantized optimizer for large language model training — citation approximate, 2023. https://scholar.google.com/scholar?q=Q-adam-mini:+Memory-efficient+8-bit+quantized+optimizer+for+large+language+model+training
6. Memory efficient optimizers with 4-bit states — citation approximate, 2023. https://scholar.google.com/scholar?q=Memory+efficient+optimizers+with+4-bit+states
7. ECO: Quantized Training without Full-Precision Master Weights — citation approximate, 2023. https://scholar.google.com/scholar?q=ECO:+Quantized+Training+without+Full-Precision+Master+Weights
8. AI Post Transformers: FlashOptim: Optimizers for Memory Efficient Training — Hal Turing & Dr. Ada Shannon, 2026. https://podcast.do-not-panic.com/episodes/2026-03-02_urls_1.mp3
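To make the headline numbers concrete, here is a back-of-the-envelope memory accounting sketch in Python. The parameter count and the bytes-per-parameter layout (bf16 weights, fp32 master weights, gradients, and Adam moments) are common mixed-precision conventions assumed for illustration, not the paper's exact breakdown; any remaining gap to the quoted 175 GiB and 113 GiB figures would come from activations and framework overhead.

    # Back-of-the-envelope memory accounting for mixed-precision Adam
    # training. All byte counts are assumed conventions, NOT the paper's
    # exact breakdown.
    N_PARAMS = 8.0e9   # Llama-3.1-8B, roughly 8 billion parameters
    GIB = 2**30

    def gib(bytes_per_param):
        """GiB needed to store one value of this width per parameter."""
        return N_PARAMS * bytes_per_param / GIB

    baseline = {
        "bf16 weights":        gib(2),
        "fp32 master weights": gib(4),
        "fp32 gradients":      gib(4),
        "fp32 Adam moment m":  gib(4),
        "fp32 Adam moment v":  gib(4),
    }

    # Hypothetical effect of quantizing both Adam moments to 8 bits,
    # showing the direction of the 175 GiB -> 113 GiB claim.
    quantized = dict(baseline)
    del quantized["fp32 Adam moment m"], quantized["fp32 Adam moment v"]
    quantized["int8 Adam moment m"] = gib(1)
    quantized["int8 Adam moment v"] = gib(1)

    for name, parts in (("fp32 optimizer states", baseline),
                        ("8-bit optimizer states", quantized)):
        print(f"{name}: {sum(parts.values()):.0f} GiB")

Under these assumptions, shrinking both Adam moments from 4 bytes to 1 byte per parameter saves about 45 GiB on an 8B-parameter model, the same order of magnitude as the 62 GiB reduction the hosts describe.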
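The blurb names Companded Optimizer State Quantization without detail, so here is a minimal NumPy sketch of the general idea, assuming "companding" means a nonlinear transform applied before uniform blockwise absmax quantization. The signed-power transform, the block size, and the helper names (compand, expand, quantize, dequantize) are illustrative assumptions, not the paper's actual scheme or API.

    import numpy as np

    BLOCK = 256  # assumed block size; state size must be a multiple of it

    def compand(x, alpha=0.5):
        # Signed power transform: spreads small magnitudes, which dominate
        # optimizer states, across more of the 8-bit code range.
        return np.sign(x) * np.abs(x) ** alpha

    def expand(y, alpha=0.5):
        # Inverse of compand.
        return np.sign(y) * np.abs(y) ** (1.0 / alpha)

    def quantize(state):
        c = compand(state).reshape(-1, BLOCK)
        scale = np.abs(c).max(axis=1, keepdims=True) + 1e-12
        q = np.round(c / scale * 127).astype(np.int8)  # 1 byte per value
        return q, scale                                # plus one fp32 scale per block

    def dequantize(q, scale):
        c = q.astype(np.float32) / 127 * scale
        return expand(c).reshape(-1)

    # Round-trip an Adam second-moment-like tensor (non-negative, long-tailed).
    v = np.random.lognormal(mean=-12, sigma=2, size=4096).astype(np.float32)
    q, s = quantize(v)
    v_hat = dequantize(q, s)
    print("max relative error:", float(np.max(np.abs(v - v_hat) / (v + 1e-12))))

Companding before quantizing is what keeps the many tiny second-moment values from collapsing to zero under a plain uniform 8-bit grid; the blockwise scales then handle the tensor's overall dynamic range.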