Scaling laws: long context length and in-context learning
About this listen
Recent advancements in Long Context Language Models (LCLMs) indicate that In-Context Learning (ICL) follows predictable power-law scaling relationships: performance improves monotonically with context length up to around 10 million tokens, governed by model depth, width, and training data volume. Empirically, Gemini 1.5 exhibits near-perfect recall and continued log-loss improvement at these extreme context lengths; theoretically, recent frameworks argue that ICL operates mechanistically as implicit gradient descent, effectively applying low-rank weight updates to the model's MLP layers during inference. As context capacity expands, sophisticated example selection strategies matter less: simple random selection, combined with data augmentation to fill the context window, often matches or beats curated selection, marking a shift from selection optimization to capacity utilization.
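As a minimal illustration of the scaling-law framing, the sketch below fits a saturating power law of the assumed form L(n) = L_inf + c·n^(-alpha) to synthetic loss measurements taken at increasing context lengths. The functional form, the data, and the parameter values are illustrative assumptions, not the exact parameterization used in the papers cited below.

```python
# Minimal sketch: fitting a saturating power law to ICL evaluation loss
# measured at increasing context lengths. Data are synthetic and the form
# L(n) = L_inf + c * n**(-alpha) is an assumed illustration.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, L_inf, c, alpha):
    """Saturating power law: loss approaches L_inf as context length n grows."""
    return L_inf + c * n ** (-alpha)

# Hypothetical (context_length, eval_loss) measurements.
context_lengths = np.array([1e3, 4e3, 16e3, 64e3, 256e3, 1e6, 4e6, 1e7])
eval_loss = power_law(context_lengths, 1.8, 12.0, 0.35)
eval_loss += np.random.default_rng(0).normal(0.0, 0.01, eval_loss.shape)  # noise

params, _ = curve_fit(power_law, context_lengths, eval_loss, p0=(1.0, 10.0, 0.5))
L_inf, c, alpha = params
print(f"fitted: L_inf={L_inf:.2f}, c={c:.2f}, alpha={alpha:.2f}")
```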
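The "ICL as an implicit low-rank weight update" claim can be checked with a few lines of linear algebra. In the spirit of Dherin et al. (2025): if attention over the context shifts the query token's hidden state by a context-dependent vector delta, then feeding x + delta through an MLP weight matrix W is equivalent to feeding x alone through W + ΔW, where ΔW is rank-1. The sketch below verifies that identity numerically; the names and dimensions are illustrative, not taken from the paper.

```python
# Minimal sketch of ICL as an implicit rank-1 weight update: the MLP sees
# W @ (x + delta), which equals (W + delta_W) @ x for the rank-1 matrix
# delta_W = outer(W @ delta, x) / ||x||^2. Dimensions are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 16, 32

W = rng.normal(size=(d_out, d_in))   # frozen MLP weight matrix
x = rng.normal(size=d_in)            # query-token hidden state without context
delta = rng.normal(size=d_in)        # shift contributed by the context via attention

# Rank-1 "implicit update" induced by the context.
delta_W = np.outer(W @ delta, x) / np.dot(x, x)

with_context = W @ (x + delta)        # frozen weights, context folded into the input
updated_weights = (W + delta_W) @ x   # updated weights, no context in the input

print(np.allclose(with_context, updated_weights))  # True
print(np.linalg.matrix_rank(delta_W))              # 1
```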
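Finally, a rough sketch of the "fill the window" regime described by Baek et al. (2024): pick demonstrations uniformly at random and keep adding (optionally augmented) copies until an approximate token budget is reached. The whitespace word count and the `augment` callback are stand-in assumptions, not the paper's implementation.

```python
# Minimal sketch: many-shot prompt construction by random selection plus
# augmentation until a token budget is filled. Token counting is a crude
# whitespace proxy; `augment` is a hypothetical callback.
import random

def build_many_shot_prompt(pool, query, token_budget=100_000, augment=None, seed=0):
    rng = random.Random(seed)
    demos, used = [], 0
    candidates = list(pool)
    while candidates and used < token_budget:
        demo = candidates.pop(rng.randrange(len(candidates)))
        cost = len(demo.split())                 # crude proxy for token count
        if used + cost > token_budget:
            break
        demos.append(demo)
        used += cost
        if not candidates and augment is not None:
            candidates = [augment(d) for d in pool]  # refill with augmented copies
    return "\n\n".join(demos + [query])
```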
Sources:
1. **Gemini Team, Google** (2024)
*Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context*
https://arxiv.org/pdf/2403.05530
2. **Jinheon Baek, Sun Jae Lee, Prakhar Gupta, Geunseob (GS) Oh, Siddharth Dalmia, Prateek Kolhar** (2024)
*Revisiting In-Context Learning with Long Context Language Models*
https://arxiv.org/pdf/2412.16926
3. **Jiaheng Liu, Dawei Zhu, Zhiqi Bai, Yancheng He, et al.** (2025)
*A Comprehensive Survey on Long Context Language Modeling*
https://arxiv.org/pdf/2503.17407
4. **Benoit Dherin, Michael Munn, Hanna Mazzawi, Michael Wunder, Javier Gonzalvo** (2025)
*Learning without training: The implicit dynamics of in-context learning*
https://arxiv.org/pdf/2507.16003
5. **Sushant Mehta, Ishan Gupta** (2025)
*Scaling Laws and In-Context Learning: A Unified Theoretical Framework*
https://arxiv.org/pdf/2511.06232