AI: post transformers

Written by: mcgrof

About this listen

The transformer architecture revolutionized the world of neural networks. It was a springboard for what we know today as modern artificial intelligence. This podcast focuses on reviews of modern, state-of-the-art research papers, starting from the transformer and onward.
Episodes
  • Scaling laws: long context length and in context learning
    Jan 17 2026

    Recent advances in Long Context Language Models (LCLMs) demonstrate that In-Context Learning (ICL) follows predictable power-law scaling relationships: performance improves monotonically with context length up to 10 million tokens and is governed by model depth, width, and training data volume. Gemini 1.5 exhibits near-perfect recall and continued log-loss improvement at extreme scales, while theoretical frameworks reveal that ICL functions mechanistically as implicit gradient descent, effectively performing low-rank weight updates to the model's MLP layers during inference. As context capacity expands, the need for sophisticated example selection strategies diminishes; simple random selection, combined with data augmentation to fill the context window, often yields optimal results, marking a shift from selection optimization to capacity utilization.
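
    A minimal numerical sketch of the low-rank-update claim from source 4 below: random vectors stand in for the attention outputs with and without in-context examples, and the rank-1 matrix `delta` is constructed so that frozen weights applied to the with-context activation match updated weights applied to the query-only activation. The names and dimensions are illustrative assumptions, not the paper's code.

    ```python
    # Illustrative sketch only: random vectors stand in for attention outputs.
    import numpy as np

    rng = np.random.default_rng(0)
    d_model, d_ff = 16, 64

    W = rng.normal(size=(d_ff, d_model))    # first MLP weight matrix
    a_query = rng.normal(size=d_model)      # attention output for the query alone
    a_context = rng.normal(size=d_model)    # attention output with ICL examples prepended

    # Rank-1 update that folds the context's contribution into the weights.
    delta = np.outer(W @ (a_context - a_query), a_query) / (a_query @ a_query)

    with_context = W @ a_context             # frozen weights, examples in the prompt
    updated_weights = (W + delta) @ a_query  # updated weights, no examples in the prompt
    print(np.allclose(with_context, updated_weights))  # True
    print(np.linalg.matrix_rank(delta))                # 1
    ```

    The identity holds exactly by construction; the cited work's claim is that transformer blocks realize such low-rank updates implicitly during inference.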


    Sources:


    1. **Gemini Team, Google** (2024)

    *Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context*

    https://arxiv.org/pdf/2403.05530


    2. **Jinheon Baek, Sun Jae Lee, Prakhar Gupta, Geunseob (GS) Oh, Siddharth Dalmia, Prateek Kolhar** (2024)

    *Revisiting In-Context Learning with Long Context Language Models*

    https://arxiv.org/pdf/2412.16926


    3. **Jiaheng Liu, Dawei Zhu, Zhiqi Bai, Yancheng He, et al.** (2025)

    *A Comprehensive Survey on Long Context Language Modeling*

    https://arxiv.org/pdf/2503.17407


    4. **Benoit Dherin, Michael Munn, Hanna Mazzawi, Michael Wunder, Javier Gonzalvo** (2025)

    *Learning without training: The implicit dynamics of in-context learning*

    https://arxiv.org/pdf/2507.16003


    5. **Sushant Mehta, Ishan Gupta** (2025)

    *Scaling Laws and In-Context Learning: A Unified Theoretical Framework*

    https://arxiv.org/pdf/2511.06232

    13 mins
  • DeepSeek Engram: Scaling Large Language Models via Conditional Memory Lookup
    Jan 14 2026

    On January 12, 2026, DeepSeek released its paper on **Engram**, a novel AI architecture that incorporates **conditional memory** to optimize how large language models handle information. By utilizing a **lookup mechanism for static patterns**, this technology separates an AI's logical reasoning from its factual knowledge base. This structural shift allows massive models to run on **cheaper hardware** by offloading memory requirements to standard host RAM without sacrificing speed. Research indicates that this approach effectively **increases model depth**, freeing the system's core processing power for more complex reasoning and long-context tasks. Ultimately, the **Engram** module enables superior performance across coding, math, and general logic compared to traditional architectures. This innovation suggests a future where AI is significantly **more efficient and accessible** through the strategic decoupling of memory and computation.
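
    A toy sketch to make the decoupling concrete: static patterns are retrieved from a large table by an O(1) lookup rather than computed, so the table can sit in host RAM while the model's compute goes to reasoning. The hashing scheme, table layout, gating, and every name below are assumptions for illustration, not the Engram design or API described in the paper.

    ```python
    # Toy illustration of a conditional memory lookup; not the Engram architecture.
    import numpy as np

    rng = np.random.default_rng(0)
    d_model, table_size = 32, 2**16

    # Static-pattern memory: a large embedding table that can live in host RAM,
    # since retrieval is an index lookup rather than a matmul on the accelerator.
    memory_table = rng.normal(scale=0.02, size=(table_size, d_model))

    def ngram_slot(token_ids, n=2):
        """Hash the trailing n-gram of the token sequence into a table slot."""
        return hash(tuple(token_ids[-n:])) % table_size

    def conditional_memory(hidden, token_ids, gate_w):
        """Add retrieved static-pattern memory to the hidden state, gated on content."""
        mem = memory_table[ngram_slot(token_ids)]
        gate = 1.0 / (1.0 + np.exp(-(hidden @ gate_w)))  # condition in [0, 1]
        return hidden + gate * mem

    hidden = rng.normal(size=d_model)
    gate_w = rng.normal(size=d_model)
    print(conditional_memory(hidden, [101, 2057, 318], gate_w).shape)  # (32,)
    ```

    The point of the sketch is the separation: the table holds static factual patterns cheaply in host memory, while the transformer's compute budget is reserved for reasoning over the gated result.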


    Source:

    https://github.com/deepseek-ai/Engram/blob/main/Engram_paper.pdf

    14 mins
  • PageANN: Scalable Disk ANNS with Page-Aligned Graphs
    Dec 7 2025

    The research paper presents PageANN, a novel framework engineered to overcome the severe latency and scalability limitations facing existing **disk-based Approximate Nearest Neighbor Search (ANNS)** methods used in vector databases. Current systems suffer from inefficient search paths and a crucial misalignment between logical graph node size and the **physical I/O granularity of Solid-State Drives (SSDs)**. PageANN introduces a core innovation: a **page-node graph structure** that directly maps logical graph nodes to physical SSD pages, significantly shortening I/O traversal paths and maximizing data utility during retrieval. This is supported by a co-designed **disk data layout** that embeds compressed neighbor vectors within each page and a dynamic **memory management strategy** utilizing lightweight indexing for fast query routing. According to experimental results, PageANN consistently **outperforms state-of-the-art techniques**, achieving substantial gains in throughput and latency across diverse datasets and memory constraints while maintaining comparable recall accuracy.
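
    A small sketch of what page alignment means in practice: one logical graph node is packed, together with its neighbor IDs and compressed neighbor codes, into a single 4 KiB buffer so that one SSD read serves one graph hop. The field layout, sizes, and compression width are assumptions for illustration, not PageANN's actual on-disk format.

    ```python
    # Illustrative page layout; PageANN's real on-disk format differs.
    import struct

    PAGE_SIZE = 4096   # common SSD page / I/O granularity
    DIM = 128          # full-precision vector dimension (float32), assumed
    PQ_BYTES = 16      # bytes per compressed (quantized) neighbor code, assumed

    def pack_page(node_id, vector, neighbor_ids, neighbor_codes):
        """Serialize one logical graph node into exactly one physical page."""
        buf = struct.pack("<I", node_id)
        buf += struct.pack(f"<{DIM}f", *vector)          # the node's own vector
        buf += struct.pack("<I", len(neighbor_ids))
        for nid, code in zip(neighbor_ids, neighbor_codes):
            buf += struct.pack("<I", nid) + bytes(code)  # neighbor ID + compressed vector
        assert len(buf) <= PAGE_SIZE, "node must fit in one page"
        return buf.ljust(PAGE_SIZE, b"\0")               # pad to the page boundary

    # Rough capacity check: neighbors that fit alongside the node's own vector.
    overhead = 4 + DIM * 4 + 4
    per_neighbor = 4 + PQ_BYTES
    print((PAGE_SIZE - overhead) // per_neighbor)        # 178
    ```

    Because the compressed neighbor vectors travel in the same page, a search can rank the current node's neighbors without issuing further reads, which is where the shorter I/O traversal paths described above come from.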


    Source:

    https://arxiv.org/pdf/2509.25487

    14 mins