Module 2: The MLP Layer - Where Transformers Store Knowledge

About this listen

Shay explains where a transformer actually stores knowledge: not in attention, but in the MLP (feed-forward) layer. The episode frames the transformer block as a two-step loop: attention moves information between tokens, then the MLP transforms each token’s representation independently to inject learned knowledge.
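As a rough illustration of that two-step loop, here is a minimal PyTorch sketch of a transformer block. The dimensions, expansion factor, and GELU activation are illustrative assumptions, not details taken from the episode.

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Illustrative pre-norm transformer block: attention mixes tokens,
    then the MLP transforms each token position independently."""

    def __init__(self, d_model: int = 512, n_heads: int = 8, mlp_ratio: int = 4):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        # The MLP (feed-forward) layer, applied to every token separately;
        # per the episode, this is where learned knowledge is stored.
        self.mlp = nn.Sequential(
            nn.Linear(d_model, mlp_ratio * d_model),  # expand
            nn.GELU(),
            nn.Linear(mlp_ratio * d_model, d_model),  # project back
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Step 1: attention moves information *between* tokens.
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        # Step 2: the MLP transforms each token's representation
        # independently (no mixing across the sequence dimension).
        x = x + self.mlp(self.ln2(x))
        return x


if __name__ == "__main__":
    block = TransformerBlock()
    tokens = torch.randn(2, 16, 512)  # (batch, sequence, d_model)
    out = block(tokens)
    print(out.shape)  # torch.Size([2, 16, 512])
```

Note that the MLP sees one token vector at a time, so any facts it encodes must live in its weights rather than in interactions across the sequence.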
