Module 2: Attention Is All You Need (The Concept)
About this listen
Shay breaks down the 2017 paper "Attention Is All You Need" and introduces the transformer: a non-recurrent architecture that uses self-attention to process entire sequences in parallel.
The episode explains positional encoding, how self-attention creates context-aware token representations, the three key advantages over RNNs (parallelization, a global receptive field, and precise signal mixing), and the quadratic computational trade-off, then teases a follow-up episode that will dive into the math behind attention.
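For listeners who want to see the ideas in code, here is a minimal sketch of sinusoidal positional encoding plus single-head scaled dot-product self-attention, the two mechanisms the description names. The dimensions, variable names, and random projection matrices are illustrative assumptions, not anything taken from the episode.

```python
# Toy illustration of the episode's two core ideas: sinusoidal positional
# encoding and single-head scaled dot-product self-attention.
# All sizes, names, and random weights below are illustrative assumptions.
import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal encodings that make token order visible to a non-recurrent model."""
    pos = np.arange(seq_len)[:, None]                       # (seq_len, 1)
    i = np.arange(d_model)[None, :]                         # (1, d_model)
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    enc = np.zeros((seq_len, d_model))
    enc[:, 0::2] = np.sin(angles[:, 0::2])
    enc[:, 1::2] = np.cos(angles[:, 1::2])
    return enc

def self_attention(x: np.ndarray, wq: np.ndarray, wk: np.ndarray, wv: np.ndarray) -> np.ndarray:
    """Every token attends to every other token in one parallel step."""
    q, k, v = x @ wq, x @ wk, x @ wv
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)                         # (seq_len, seq_len): quadratic in sequence length
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)          # softmax over the whole sequence (global receptive field)
    return weights @ v                                      # context-aware representation for each token

rng = np.random.default_rng(0)
seq_len, d_model = 6, 16
tokens = rng.normal(size=(seq_len, d_model))                # stand-in token embeddings
x = tokens + positional_encoding(seq_len, d_model)          # inject order information
wq, wk, wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = self_attention(x, wq, wk, wv)
print(out.shape)  # (6, 16): one updated, context-aware vector per token
```

Note how the whole sequence is processed in one matrix product rather than step by step, which is the parallelization advantage over RNNs, while the (seq_len, seq_len) score matrix is the quadratic cost the episode mentions.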