
Module 2: The Encoder (BERT) vs. The Decoder (GPT)

About this listen

Shay breaks down the encoder vs. decoder split in transformers: encoders (BERT) read the full input with bidirectional attention to understand meaning, while decoders (GPT) generate text one token at a time using causal attention.
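
As a rough sketch of that difference (my own illustration, not code from the episode), the two attention patterns can be written as boolean masks: the encoder's mask lets every position attend to every other, while the decoder's causal mask hides everything to the right of the current token.

```python
# Minimal sketch: bidirectional (encoder) vs. causal (decoder) attention masks.
import numpy as np

seq_len = 5

# Encoder (BERT-style): every position may attend to every other position.
encoder_mask = np.ones((seq_len, seq_len), dtype=bool)

# Decoder (GPT-style): position i may attend only to positions <= i.
decoder_mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))

print(decoder_mask.astype(int))
# [[1 0 0 0 0]
#  [1 1 0 0 0]
#  [1 1 1 0 0]
#  [1 1 1 1 0]
#  [1 1 1 1 1]]
```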

She ties each architecture to its training objective (masked-word prediction vs. next-token prediction), explains why decoder-only models dominate today (they can both interpret prompts and generate efficiently thanks to KV caching), and previews the next episode on the MLP layer, where most of a model's learned knowledge lives.
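
A compact sketch of those ideas (again my own illustration, with placeholder strings standing in for real tensors): the two training objectives shown as input/target pairs, and the KV-caching idea of reusing earlier keys and values during generation.

```python
# Minimal sketch: the two training objectives, plus the KV-caching idea.
tokens = ["the", "cat", "sat", "on", "the", "mat"]

# Masked-word prediction (encoder/BERT-style): hide tokens, predict them from
# context on both sides; only the masked positions are scored.
masked_input   = ["the", "[MASK]", "sat", "on", "the", "[MASK]"]
masked_targets = {1: "cat", 5: "mat"}

# Next-token prediction (decoder/GPT-style): inputs and targets are the same
# sequence shifted by one position.
lm_inputs  = tokens[:-1]   # ["the", "cat", "sat", "on", "the"]
lm_targets = tokens[1:]    # ["cat", "sat", "on", "the", "mat"]

# KV caching at generation time: keys/values for earlier tokens are computed
# once and reused, so each new token only pays for its own attention step.
kv_cache = []
for step, token in enumerate(tokens):
    key, value = f"K({token})", f"V({token})"   # stand-ins for real tensors
    kv_cache.append((key, value))
    context = kv_cache[: step + 1]  # attend over the cache, not the re-encoded prefix
```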
