FlashOptim: Optimizers for Memory Efficient Training

About this listen

In this episode, hosts Hal Turing and Dr. Ada Shannon explore the paper "FlashOptim: Optimizers for Memory Efficient Training" by researchers from Databricks AI Research. The discussion centers on techniques that significantly reduce memory usage during neural network training without sacrificing model quality. The hosts unpack three key methods, Optimizer State Quantization, Float Splitting Techniques, and Companded Optimizer State Quantization, and their potential to lower memory requirements from 175 GiB to 113 GiB for a large model like Llama-3.1-8B (two short, illustrative sketches follow the source list below). Listeners interested in AI research will find the episode compelling for what it says about democratizing AI: making advanced models trainable for those with limited hardware resources.

Sources:

1. FlashOptim: Optimizers for Memory Efficient Training (the paper under discussion) — Databricks AI Research. https://arxiv.org/pdf/2602.23349
2. Mixed Precision Training — Paulius Micikevicius et al., 2018. https://scholar.google.com/scholar?q=Mixed+Precision+Training
3. 8-bit Optimizer States for Memory-Efficient Training — Tim Dettmers et al., 2022. https://scholar.google.com/scholar?q=8-bit+Optimizer+States+for+Memory-Efficient+Training
4. Parameter-Efficient Transfer Learning for NLP — Xiaoqi Li and Percy Liang, 2021. https://scholar.google.com/scholar?q=Parameter-Efficient+Transfer+Learning+for+NLP
5. Q-adam-mini: Memory-efficient 8-bit quantized optimizer for large language model training — citation approximate, 2023. https://scholar.google.com/scholar?q=Q-adam-mini:+Memory-efficient+8-bit+quantized+optimizer+for+large+language+model+training
6. Memory efficient optimizers with 4-bit states — citation approximate, 2023. https://scholar.google.com/scholar?q=Memory+efficient+optimizers+with+4-bit+states
7. ECO: Quantized Training without Full-Precision Master Weights — citation approximate, 2023. https://scholar.google.com/scholar?q=ECO:+Quantized+Training+without+Full-Precision+Master+Weights
8. AI Post Transformers: FlashOptim: Optimizers for Memory Efficient Training — Hal Turing & Dr. Ada Shannon, 2026. https://podcast.do-not-panic.com/episodes/2026-03-02_urls_1.mp3
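To make the headline numbers concrete, here is a back-of-the-envelope memory accounting sketch in Python. The parameter count and the bytes-per-parameter layout (bf16 weights, fp32 master weights, gradients, and Adam moments) are common mixed-precision conventions assumed for illustration, not the paper's exact breakdown; any remaining gap to the quoted 175 GiB and 113 GiB figures would come from activations and framework overhead.

    # Back-of-the-envelope memory accounting for mixed-precision Adam
    # training. All byte counts are assumed conventions, NOT the paper's
    # exact breakdown.
    N_PARAMS = 8.0e9   # Llama-3.1-8B, roughly 8 billion parameters
    GIB = 2**30

    def gib(bytes_per_param):
        """GiB needed to store one value of this width per parameter."""
        return N_PARAMS * bytes_per_param / GIB

    baseline = {
        "bf16 weights":        gib(2),
        "fp32 master weights": gib(4),
        "fp32 gradients":      gib(4),
        "fp32 Adam moment m":  gib(4),
        "fp32 Adam moment v":  gib(4),
    }

    # Hypothetical effect of quantizing both Adam moments to 8 bits,
    # showing the direction of the 175 GiB -> 113 GiB claim.
    quantized = dict(baseline)
    del quantized["fp32 Adam moment m"], quantized["fp32 Adam moment v"]
    quantized["int8 Adam moment m"] = gib(1)
    quantized["int8 Adam moment v"] = gib(1)

    for name, parts in (("fp32 optimizer states", baseline),
                        ("8-bit optimizer states", quantized)):
        print(f"{name}: {sum(parts.values()):.0f} GiB")

Under these assumptions, shrinking both Adam moments from 4 bytes to 1 byte per parameter saves about 45 GiB on an 8B-parameter model, the same order of magnitude as the 62 GiB reduction the hosts describe.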
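The blurb names Companded Optimizer State Quantization without detail, so here is a minimal NumPy sketch of the general idea, assuming "companding" means a nonlinear transform applied before uniform blockwise absmax quantization. The signed-power transform, the block size, and the helper names (compand, expand, quantize, dequantize) are illustrative assumptions, not the paper's actual scheme or API.

    import numpy as np

    BLOCK = 256  # assumed block size; state size must be a multiple of it

    def compand(x, alpha=0.5):
        # Signed power transform: spreads small magnitudes, which dominate
        # optimizer states, across more of the 8-bit code range.
        return np.sign(x) * np.abs(x) ** alpha

    def expand(y, alpha=0.5):
        # Inverse of compand.
        return np.sign(y) * np.abs(y) ** (1.0 / alpha)

    def quantize(state):
        c = compand(state).reshape(-1, BLOCK)
        scale = np.abs(c).max(axis=1, keepdims=True) + 1e-12
        q = np.round(c / scale * 127).astype(np.int8)  # 1 byte per value
        return q, scale                                # plus one fp32 scale per block

    def dequantize(q, scale):
        c = q.astype(np.float32) / 127 * scale
        return expand(c).reshape(-1)

    # Round-trip an Adam second-moment-like tensor (non-negative, long-tailed).
    v = np.random.lognormal(mean=-12, sigma=2, size=4096).astype(np.float32)
    q, s = quantize(v)
    v_hat = dequantize(q, s)
    print("max relative error:", float(np.max(np.abs(v - v_hat) / (v + 1e-12))))

Companding before quantizing is what keeps the many tiny second-moment values from collapsing to zero under a plain uniform 8-bit grid; the blockwise scales then handle the tensor's overall dynamic range.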