Episodes

  • Module 3: The Lifecycle of an LLM: Pre-Training
    Jan 25 2026

    This episode explores the foundational stage of creating an LLM: the pre-training phase. We break down the "Trillion Token Diet," explaining how models move from random weights to sophisticated world models through the simple objective of next-token prediction. You will learn about the Chinchilla scaling laws, the mathematical relationship between model size and data volume, and why the industry shifted from building "bigger brains" to "better-fed" ones. By the end, you will understand the transition from raw statistical probability to parametric memory.

    10 mins
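
    A minimal sketch of the two ideas named in this summary, under toy assumptions: next-token prediction scored with cross-entropy over a tiny made-up vocabulary, and the commonly cited Chinchilla rule of thumb of roughly 20 training tokens per parameter. None of the numbers come from the episode.

    ```python
    import math

    # Toy next-token prediction: the model assigns probabilities to each
    # candidate next token and is penalized by cross-entropy on the true one.
    def next_token_loss(predicted_probs, actual_next_token):
        return -math.log(predicted_probs[actual_next_token])

    # Hypothetical distribution after the prompt "the cat sat on the".
    predicted = {"the": 0.05, "cat": 0.10, "sat": 0.05, "on": 0.05, "mat": 0.75}
    print(next_token_loss(predicted, "mat"))   # ~0.29; shrinks as the model improves

    # Chinchilla rule of thumb: compute-optimal training uses ~20 tokens per parameter.
    def chinchilla_tokens(n_params):
        return 20 * n_params

    print(f"{chinchilla_tokens(70e9):.1e} tokens for a 70B-parameter model")  # ~1.4e+12
    ```
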
  • Module 2: The MLP Layer - Where Transformers Store Knowledge
    Jan 6 2026

    Shay explains where a transformer actually stores knowledge: not in attention, but in the MLP (feed-forward) layer. The episode frames the transformer block as a two-step loop: attention moves information between tokens, then the MLP transforms each token’s representation independently to inject learned knowledge.

    8 mins
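
    A minimal NumPy sketch of the two-step loop described above, with made-up sizes and random weights: attention mixes information between token positions, then a position-wise MLP (two linear layers with a nonlinearity) transforms each token independently.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    seq_len, d_model, d_ff = 4, 8, 32            # tiny illustrative sizes

    x = rng.normal(size=(seq_len, d_model))      # one token vector per position

    def self_attention(x):
        """Simplified attention (no learned projections): tokens attend to each other."""
        scores = x @ x.T / np.sqrt(x.shape[-1])
        weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
        return weights @ x                       # information moves BETWEEN tokens

    W1 = rng.normal(size=(d_model, d_ff))
    W2 = rng.normal(size=(d_ff, d_model))

    def mlp(tokens):
        """Position-wise feed-forward layer, applied to each token independently."""
        return np.maximum(tokens @ W1, 0) @ W2   # expand, nonlinearity, project back

    h = x + self_attention(x)                    # step 1: attention (with residual)
    out = h + mlp(h)                             # step 2: MLP (with residual)
    print(out.shape)                             # (4, 8): same shape, enriched tokens
    ```
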
  • Module 2: The Encoder (BERT) vs. The Decoder (GPT)
    Jan 5 2026

    Shay breaks down the encoder vs decoder split in transformers: encoders (BERT) read the full text with bidirectional attention to understand meaning, while decoders (GPT) generate text one token at a time using causal attention.

    She ties the architecture to training (masked-word prediction vs next-token prediction), explains why decoder-only models dominate today (they can both interpret prompts and generate efficiently with KV caching), and previews the next episode on the MLP layer, where most learned knowledge lives.

    8 mins
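
    A small sketch of the masking difference described above, with an illustrative sequence length: an encoder (BERT-style) lets every token attend to every position, while a decoder (GPT-style) applies a causal mask so each token only sees positions at or before itself.

    ```python
    import numpy as np

    seq_len = 5

    bidirectional_mask = np.ones((seq_len, seq_len), dtype=bool)    # encoder: full view
    causal_mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))  # decoder: no peeking ahead

    print(causal_mask.astype(int))
    # [[1 0 0 0 0]
    #  [1 1 0 0 0]
    #  [1 1 1 0 0]
    #  [1 1 1 1 0]
    #  [1 1 1 1 1]]
    ```
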
  • Module 2: Multi-Head Attention & Positional Encodings
    Jan 5 2026

    Shay explains multi-head attention and positional encodings: how transformers run multiple parallel attention 'heads' that specialize, why we concatenate their outputs, and how positional encodings reintroduce word order into parallel processing.

    The episode uses clear analogies (lawyer, engineer, accountant), highlights GPU efficiency, and previews the next episode on encoder vs decoder architectures.

    9 mins
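
    A brief sketch of the two ideas described above, with made-up sizes: the sinusoidal positional encodings from the original Transformer paper reintroduce word order additively, and the multi-head split divides the model dimension into independent slices whose outputs are concatenated back to full width. Only the shape bookkeeping is shown, not the per-head attention itself.

    ```python
    import numpy as np

    d_model, seq_len, n_heads = 16, 6, 4

    def sinusoidal_positions(seq_len, d_model):
        # PE(pos, 2i) = sin(pos / 10000^(2i/d_model)), PE(pos, 2i+1) = cos(same angle)
        pos = np.arange(seq_len)[:, None]
        i = np.arange(d_model // 2)[None, :]
        angles = pos / (10000 ** (2 * i / d_model))
        pe = np.zeros((seq_len, d_model))
        pe[:, 0::2] = np.sin(angles)
        pe[:, 1::2] = np.cos(angles)
        return pe

    x = np.random.default_rng(0).normal(size=(seq_len, d_model))
    x = x + sinusoidal_positions(seq_len, d_model)   # word order injected additively

    # Multi-head bookkeeping: the model dimension is split into n_heads slices, each
    # head runs its own attention on its slice, and the outputs are concatenated.
    heads = x.reshape(seq_len, n_heads, d_model // n_heads).transpose(1, 0, 2)
    merged = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    print(np.allclose(x, merged))   # True: split then concatenate is lossless
    ```
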
  • Module 2: Inside the Transformer - The Math That Makes Attention Work
    Jan 3 2026

    In this episode, Shay walks through the transformer's attention mechanism in plain terms: how token embeddings are projected into queries, keys, and values; how dot products measure similarity; why scaling and softmax produce stable weights; and how weighted sums create context-enriched token vectors.

    The episode previews multi-head attention (multiple perspectives in parallel) and ends with a short encouragement to take a small step toward your goals.

    12 mins
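
    A minimal NumPy sketch of the steps described above: project token embeddings into queries, keys, and values; take dot products; scale by the square root of the key dimension; apply softmax; and form the weighted sum. All sizes and weights are toy values, not anything from the episode or a real model.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    seq_len, d_model, d_k = 4, 8, 8

    x = rng.normal(size=(seq_len, d_model))           # token embeddings

    W_q = rng.normal(size=(d_model, d_k))
    W_k = rng.normal(size=(d_model, d_k))
    W_v = rng.normal(size=(d_model, d_k))

    Q, K, V = x @ W_q, x @ W_k, x @ W_v               # learned projections

    scores = Q @ K.T / np.sqrt(d_k)                   # dot-product similarity, scaled
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax: each row sums to 1

    context = weights @ V                             # context-enriched token vectors
    print(weights.sum(axis=-1))                       # [1. 1. 1. 1.]
    print(context.shape)                              # (4, 8)
    ```
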
  • Module 2: Attention Is All You Need (The Concept)
    Jan 3 2026

    Shay breaks down the 2017 paper "Attention Is All You Need" and introduces the transformer: a non-recurrent architecture that uses self-attention to process entire sequences in parallel.

    The episode explains positional encoding, how self-attention creates context-aware token representations, the three key advantages over RNNs (parallelization, a global receptive field, and precise signal mixing), and the quadratic computational trade-off. It closes by teasing a follow-up episode that will dive into the math behind attention.

    12 mins
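
    A quick arithmetic illustration of the quadratic trade-off mentioned above: the attention score matrix has seq_len × seq_len entries, so doubling the context length roughly quadruples that cost. The lengths below are arbitrary examples.

    ```python
    # Pairwise attention scores grow quadratically with sequence length.
    for seq_len in (1_000, 2_000, 4_000):
        print(seq_len, seq_len * seq_len)   # 1e6, 4e6, 16e6 scores
    ```
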
  • Module 2: The Transformer Architecture: History - The Bottleneck That Broke Language Models
    Jan 3 2026

    Shay breaks down why recurrent neural networks (RNNs) struggled with long-range dependencies in language: fixed-size hidden states and the vanishing gradient problem caused models to forget early context in long texts.

    The episode explains how LSTMs added gates (forget, input, output) to manage memory and improve short-range performance, but they remained serial, creating a training and scaling bottleneck that prevented the use of massive parallel compute.

    It frames this as the fundamental bottleneck in NLP, sets up the next episode on attention, and ends with a brief reflection on persistence and steady effort.

    7 mins
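
    A tiny sketch of the serial bottleneck described above, with toy sizes and random weights: each hidden state depends on the previous one, so the loop over time steps cannot be parallelized the way attention can, and the whole context must be squeezed into one fixed-size vector.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    d_hidden, d_input, seq_len = 8, 4, 16

    W_h = rng.normal(size=(d_hidden, d_hidden)) * 0.1
    W_x = rng.normal(size=(d_input, d_hidden)) * 0.1
    inputs = rng.normal(size=(seq_len, d_input))

    h = np.zeros(d_hidden)                  # fixed-size memory for the whole sequence
    for x_t in inputs:                      # strictly one step after another
        h = np.tanh(h @ W_h + x_t @ W_x)    # h_t depends on h_{t-1}

    print(h.shape)                          # (8,): all earlier context lives in this vector
    ```
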
  • Module 1: Tokens - How Models Really Read
    Dec 13 2025

    This episode dives into the hidden layer where language stops being words and becomes numbers. We explore what tokens actually are, how tokenization breaks text into meaningful fragments, and why this design choice quietly shapes a model’s strengths, limits, and quirks. Once you understand tokens, you start seeing why language models sometimes feel brilliant and sometimes strangely blind.

    12 mins
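
    A toy illustration of the idea described above: a greedy longest-match tokenizer over a tiny invented vocabulary that turns text into integer IDs. Real tokenizers (for example, BPE) learn their vocabularies from data; this sketch only shows the words-to-fragments-to-numbers step.

    ```python
    # Invented vocabulary mapping text fragments to token IDs.
    vocab = {"un": 0, "believ": 1, "able": 2, "the": 3, "cat": 4, " ": 5}

    def tokenize(text):
        tokens = []
        while text:
            # Take the longest vocabulary entry that prefixes the remaining text.
            match = max((t for t in vocab if text.startswith(t)), key=len, default=None)
            if match is None:
                raise ValueError(f"no token covers: {text!r}")
            tokens.append(vocab[match])
            text = text[len(match):]
        return tokens

    print(tokenize("the cat"))        # [3, 5, 4]
    print(tokenize("unbelievable"))   # [0, 1, 2]: one word, three fragments
    ```
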