Zero Bubble Pipeline Parallelism

About this listen

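The core idea is to split the backward pass into two separate computations. One computes the gradient with respect to the parameters (weights). The other computes the gradient with respect to the stage's input, i.e. the output of the previous stage, which that stage needs before it can continue its own backward pass. Because no other stage needs the parameter gradients before the optimizer step, that work can be scheduled into slots where a device would otherwise sit idle. Each device then keeps working instead of waiting in a pipeline "bubble".

Read full paper: https://arxiv.org/abs/2401.10241

Tags: Systems and Performance, Deep Learning, Machine Learning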
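The scheduling argument can be made concrete with a toy simulation. This is a hypothetical illustration, not the schedules from the paper. It assumes that every forward (F), input-gradient (B) and weight-gradient (W) step costs one time unit, and it treats the unsplit backward pass as a single two-unit step. It also ignores activation-memory limits. Each stage greedily picks a ready task with priority B, then F, then W. The function `simulate` and its parameters are invented for this example.

```python
from typing import Dict, List, Tuple

Task = Tuple[str, int, int]  # (kind, stage, microbatch); kind is "F", "B" or "W"


def simulate(num_stages: int, num_micro: int, split_backward: bool) -> Tuple[int, int]:
    """Toy greedy unit-time pipeline simulation; returns (makespan, idle_slots)."""
    # Assumed costs: F, B and W each take 1 unit. Without the split, "B" stands
    # for the fused B+W step and takes 2 units (there are no separate W tasks).
    kinds = ["B", "F", "W"] if split_backward else ["B", "F"]  # priority order
    duration = {"F": 1, "B": 1 if split_backward else 2, "W": 1}

    # Per-stage task lists, ordered by priority kind, then microbatch index.
    pending: List[List[Task]] = [
        [(k, s, i) for k in kinds for i in range(num_micro)] for s in range(num_stages)
    ]
    finish: Dict[Task, int] = {}  # task -> time at which it completes
    busy_until = [0] * num_stages

    def deps(task: Task) -> List[Task]:
        kind, s, i = task
        if kind == "F":  # needs the activation from the previous stage
            return [("F", s - 1, i)] if s > 0 else []
        if kind == "B":  # needs its own forward and the gradient from the next stage
            nxt = [("B", s + 1, i)] if s < num_stages - 1 else []
            return [("F", s, i)] + nxt
        return [("B", s, i)]  # W only needs this stage's own B for that microbatch

    def ready(task: Task, t: int) -> bool:
        return all(d in finish and finish[d] <= t for d in deps(task))

    t = 0
    while any(pending):
        for s in range(num_stages):
            if busy_until[s] > t:
                continue  # stage is still running a task
            for task in pending[s]:
                if ready(task, t):
                    finish[task] = t + duration[task[0]]
                    busy_until[s] = finish[task]
                    pending[s].remove(task)
                    break  # safe: we stop iterating right after removing
        t += 1

    makespan = max(finish.values())
    work = num_stages * num_micro * 3  # 3 units of compute per microbatch per stage
    return makespan, num_stages * makespan - work


if __name__ == "__main__":
    for split in (False, True):
        print("split" if split else "fused", simulate(4, 8, split))
```

For two stages and two microbatches, the fused version takes 9 time units and leaves 6 idle stage-slots. The split version finishes in 7 units with 2 idle slots, because the deferred W steps fill gaps that would otherwise be bubbles.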