ZeRO Memory Optimizations: Toward Training Trillion Parameter Models

About this listen

The paper introduces ZeRO (Zero Redundancy Optimizer), a novel approach to optimizing memory usage when training massive language models. Its two components, ZeRO-DP and ZeRO-R, eliminate memory redundancies across data-parallel devices and reduce residual memory consumption, enabling efficient training of models with up to 170 billion parameters. The technique achieves superlinear scalability, is straightforward to adopt, and has the potential to democratize large-model training in AI research.

Read full paper: https://arxiv.org/abs/1910.02054

Tags: Systems and Performance, Deep Learning, Natural Language Processing
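For a rough sense of how ZeRO-DP cuts per-device memory, the sketch below applies the paper's mixed-precision Adam accounting (2 bytes each per parameter for fp16 weights and gradients, plus K = 12 bytes of optimizer state) to the three partitioning stages. The model size and GPU count are illustrative values, and the helper function name is our own, not part of any library.

```python
# Back-of-the-envelope per-GPU memory estimate for baseline data parallelism and
# the three ZeRO-DP stages, following the mixed-precision Adam analysis in the paper:
# 2 bytes/param for fp16 weights, 2 bytes/param for fp16 gradients, K = 12 bytes/param
# of optimizer state (fp32 weights, momentum, variance).

def zero_dp_memory_gb(psi, n_d, k=12):
    """Return per-GPU model-state memory (GB, 1e9 bytes) for baseline DP and ZeRO stages 1-3."""
    gb = 1e9
    baseline = (2 + 2 + k) * psi / gb                      # everything fully replicated
    stage1 = (2 + 2) * psi / gb + k * psi / (n_d * gb)     # P_os: partition optimizer states
    stage2 = 2 * psi / gb + (2 + k) * psi / (n_d * gb)     # P_os+g: also partition gradients
    stage3 = (2 + 2 + k) * psi / (n_d * gb)                # P_os+g+p: also partition parameters
    return baseline, stage1, stage2, stage3


if __name__ == "__main__":
    psi = 7.5e9   # 7.5B-parameter model (illustrative)
    n_d = 64      # 64 data-parallel GPUs (illustrative)
    names = ("baseline DP", "ZeRO stage 1", "ZeRO stage 2", "ZeRO stage 3")
    for name, mem in zip(names, zero_dp_memory_gb(psi, n_d)):
        print(f"{name:>13}: {mem:7.1f} GB per GPU")
```

With these example values, model-state memory drops from about 120 GB per GPU under plain data parallelism to roughly 31 GB, 17 GB, and under 2 GB for stages 1 through 3, consistent with the memory analysis in the paper.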