Module 2: The Encoder (BERT) vs. The Decoder (GPT)
About this listen
Shay breaks down the encoder-vs-decoder split in transformers: encoders (BERT) read the full text with bidirectional attention to understand its meaning, while decoders (GPT) generate text one token at a time using causal attention.
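To make the contrast concrete, here is a minimal NumPy sketch (my illustration, not code from the episode) of the two attention masks: the bidirectional mask an encoder like BERT uses, versus the causal mask a decoder like GPT uses, where each token can only attend to itself and earlier positions.

```python
import numpy as np

def attention_mask(seq_len: int, causal: bool) -> np.ndarray:
    """Return a (seq_len, seq_len) mask: 1 = may attend, 0 = blocked."""
    full = np.ones((seq_len, seq_len), dtype=int)
    # The causal mask keeps only the lower triangle: token i sees tokens 0..i.
    return np.tril(full) if causal else full

print("Encoder (bidirectional):\n", attention_mask(4, causal=False))
print("Decoder (causal):\n", attention_mask(4, causal=True))
```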
She ties each architecture to its training objective (masked-word prediction vs. next-token prediction), explains why decoder-only models dominate today (they can both interpret prompts and generate efficiently with KV caching), and previews the next episode on the MLP layer, where most of a model's learned knowledge lives.
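Along the same lines, here is a toy sketch (again my illustration, not the episode's code) of why KV caching makes decoder-only generation efficient: keys and values for earlier tokens are computed once and reused, so each new token only adds a single key/value pair instead of reprocessing the whole sequence.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 8
W_q, W_k, W_v = (rng.standard_normal((d_model, d_model)) for _ in range(3))

k_cache, v_cache = [], []  # grows by one entry per generated token

def decode_step(x_new: np.ndarray) -> np.ndarray:
    """Attend the newest token's query over all cached keys and values."""
    k_cache.append(x_new @ W_k)  # only the new token's K and V are computed
    v_cache.append(x_new @ W_v)
    q = x_new @ W_q
    K, V = np.stack(k_cache), np.stack(v_cache)
    scores = K @ q / np.sqrt(d_model)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V  # attention output for the newest token only

# Pretend these random vectors are embeddings of successively generated tokens.
for _ in range(5):
    out = decode_step(rng.standard_normal(d_model))
print("output for latest token:", out.round(3))
```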