• The Annotated Transformer
    Dec 15 2025

    The Transformer has been on a lot of people’s minds over the last five years. This post presents an annotated version of the paper in the form of a line-by-line implementation. It reorders and deletes some sections from the original paper and adds comments throughout. This document itself is a working notebook, and should be a completely usable implementation. Code is available here.
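    As a taste of the line-by-line style, here is a minimal sketch of scaled dot-product attention, the core operation the post implements step by step; the function and variable names below are illustrative, not taken from the notebook itself.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal sketch of the attention operation from 'Attention Is All You Need':
    softmax(Q K^T / sqrt(d_k)) V. Shapes and names are illustrative."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)     # (..., seq_q, seq_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V                                 # (..., seq_q, d_v)

# Toy usage: 4 query positions, 6 key/value positions, model width 8.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=s) for s in [(4, 8), (6, 8), (6, 8)])
print(scaled_dot_product_attention(Q, K, V).shape)     # (4, 8)
```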

    15 mins
  • Breaking the Sorting Barrier for Shortest Paths
    Aug 17 2025

    This document presents a deterministic algorithm for the single-source shortest path (SSSP) problem on directed graphs with non-negative edge weights, achieving a time complexity of O(m log^(2/3) n). This groundbreaking result surpasses the long-standing O(m + n log n) barrier of Dijkstra's algorithm, demonstrating that Dijkstra's is not optimal for SSSP on sparse graphs when the vertex ordering by distance is not strictly required. The approach ingeniously merges concepts from Dijkstra's and Bellman-Ford algorithms using a recursive partitioning technique to manage the "frontier" of uncertain distances more efficiently, avoiding the sorting bottleneck inherent in traditional methods. It introduces a "FindPivots" procedure and a specialized data structure to limit the size of the set of vertices that need active consideration, thereby reducing computational overhead and improving performance.
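    A rough sketch of one ingredient of this approach: relaxing edges outward from a frontier for a bounded number of Bellman-Ford-style rounds, so that only vertices whose shortest paths need many more hops remain uncertain. This is an illustrative simplification in the spirit of the bounded relaxation inside FindPivots, not the paper's full procedure; the graph, names, and parameters are made up.

```python
import math
from collections import defaultdict

def bounded_relax(graph, dist, frontier, k):
    """Run k rounds of Bellman-Ford-style relaxation starting from `frontier` and
    return the vertices whose tentative distances were set or improved. Bounding the
    number of rounds is what keeps the set of vertices needing Dijkstra-like handling
    small; this sketch omits the recursion and the specialized data structure."""
    touched = set(frontier)
    active = set(frontier)
    for _ in range(k):
        nxt = set()
        for u in active:
            for v, w in graph.get(u, []):
                if dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
                    nxt.add(v)
                    touched.add(v)
        active = nxt
    return touched

# Toy usage on a small weighted digraph (adjacency list: u -> [(v, weight), ...]).
graph = {0: [(1, 2.0), (2, 5.0)], 1: [(2, 1.0), (3, 4.0)], 2: [(3, 1.0)]}
dist = defaultdict(lambda: math.inf, {0: 0.0})
print(sorted(bounded_relax(graph, dist, {0}, k=3)))   # vertices updated within 3 rounds
print(dict(dist))                                     # tentative distances so far
```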

    30 mins
  • AlphaEvolve overview.
    Jun 19 2025

    • We first introduce a new class of alphas with intriguing strengths: like formulaic alphas, these alphas can model scalar features and thus are simple to mine into a weakly correlated set, but, like machine learning alphas, they are high-dimensional data-driven models utilizing long-term features. We then propose a novel alpha mining framework, AlphaEvolve, to generate the new alphas. To the best of our knowledge, we are the first to solve the stock prediction problem based on AutoML and the first to tackle the problem of mining weakly correlated alphas.
    • We enable AlphaEvolve to selectively inject relational domain knowledge without any strong structural assumption in an alpha.
    • We propose an optimization technique to accelerate alpha mining by pruning redundant alphas (see the sketch below this list).
    • We conduct an extensive experimental study on AlphaEvolve using the stock price data of NASDAQ. The results show that AlphaEvolve generates alphas with weakly correlated high returns.
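    A minimal sketch of what pruning toward a weakly correlated set could look like: keep a candidate alpha only if its signal is weakly correlated with every alpha already kept. This is an illustration of the concept, not AlphaEvolve's actual pruning procedure; the threshold, greedy order, and names are assumptions.

```python
import numpy as np

def prune_to_weakly_correlated(alpha_signals, max_abs_corr=0.3):
    """Greedy illustration: keep an alpha only if its absolute correlation with every
    already-kept alpha stays below `max_abs_corr`. `alpha_signals` maps alpha name ->
    1-D array of per-period signal values. Threshold and ordering are illustrative."""
    kept = {}
    for name, signal in alpha_signals.items():
        if all(abs(np.corrcoef(signal, other)[0, 1]) < max_abs_corr
               for other in kept.values()):
            kept[name] = signal
    return list(kept)

# Toy usage with random signals standing in for evaluated alpha values.
rng = np.random.default_rng(1)
base = rng.normal(size=500)
signals = {
    "alpha_momentum": base,
    "alpha_momentum_copy": base + 0.05 * rng.normal(size=500),   # nearly redundant
    "alpha_reversal": rng.normal(size=500),
}
print(prune_to_weakly_correlated(signals))   # the near-duplicate should be dropped
```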


    quambase.com

    23 mins
  • Beating the Ramsey Limit: The Future of Quantum Sensing
    May 21 2025

    Timing is everything in quantum measurement. In this video, we explore a protocol—featured in Nature Communications—that uses Bloch vector dynamics to determine the optimal time to read out a qubit, enhancing the signal-to-noise ratio (SNR) and outperforming Ramsey sequences under all conditions.

    Starting from a state in the xz-plane, the protocol tracks the coherence transfer from the z-axis to the x-axis under continuous drive. The key moment is the breakdown point—

    Even in dephasing dominated scenarios, this technique still beats Ramsey — making it unconditionally superior.

    Based on the research article published in Nature Communications
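    As a rough illustration of the idea (not the equations or protocol from the paper), one can integrate a toy Bloch-vector model in which a continuous drive transfers coherence from the z-axis to the x-axis while dephasing damps it, then pick the readout time that maximizes a simple SNR proxy such as the x-component. The dynamics, rates, and names below are assumptions made for the sketch.

```python
import numpy as np

def bloch_xz_trajectory(omega, gamma, t):
    """Toy model (an assumption, not the paper's dynamics): a drive about the y-axis at
    Rabi frequency `omega` rotates the Bloch vector from z toward x, while dephasing at
    rate `gamma` damps the x-component. Returns x(t), z(t) via simple Euler integration."""
    dt = t[1] - t[0]
    x, z = 0.0, 1.0                       # start along +z, i.e. in the xz-plane
    xs, zs = [], []
    for _ in t:
        xs.append(x); zs.append(z)
        dx = omega * z - gamma * x
        dz = -omega * x
        x, z = x + dx * dt, z + dz * dt
    return np.array(xs), np.array(zs)

# Pick the readout time that maximizes the transferred x-component (a crude SNR proxy).
t = np.linspace(0.0, 5.0, 5000)
x, _ = bloch_xz_trajectory(omega=2.0, gamma=0.5, t=t)
print(f"toy optimal readout time ~ {t[np.argmax(x)]:.2f} (arbitrary units)")
```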

    11 mins
  • Quantum Algorithms for Learning Periodic Functions
    Apr 5 2025

    This quantum algorithm leverages unique quantum properties, primarily related to quantum period finding, to efficiently learn periodic functions over a broad range of non-uniform distributions. The algorithm achieves an exponential quantum advantage over classical gradient-based algorithms, which are standard in machine learning, for learning these functions with Fourier-sparse input distributions such as Gaussian, generalized Gaussian, and logistic distributions.

    Here's how the quantum algorithm leverages unique quantum properties:

    • Quantum Statistical Queries (QSQs) for Accessing Function Information: The algorithm operates in the QSQ model, which provides access to the target function $g_{w^\star}$ through queries that return approximations of expectation values involving a quantum example state $|g_{w^\star}\rangle$. This quantum access model is crucial for implementing quantum algorithms for learning.

    • Quantum Fourier Transform (QFT) for Period Finding: A key step in the algorithm is to perform period finding to learn the unknown vector $w^\star$ that defines the linear component within the periodic function $g_{w^\star}(x) = g(x^\top w^\star)$. The algorithm encodes the QFT into QSQs to estimate the frequencies present in the function, which are directly related to the inverse of the periods. This ability to efficiently analyze the frequency components is a hallmark of quantum algorithms like Shor's algorithm and its generalizations.

    • Handling Non-Integer and Real Periods with Hallgren's Algorithm: Unlike standard period finding algorithms that typically require integer periods, the periods $1/|w_j^\star|$ are not necessarily integers. The algorithm adapts Hallgren's algorithm for finding the period of pseudoperiodic functions, which can handle potentially irrational periods. This is a significant advantage over classical methods that might struggle with non-commensurate frequencies. The algorithm also generalizes Hallgren's approach to work with non-uniform distributions.

    • Pseudoperiodicity for Discretization of Real-Valued Functions: Since the target functions are real-valued, they need to be discretized to be represented in a quantum state. The algorithm carefully chooses a discretization that satisfies pseudoperiodicity, a weaker condition than strict periodicity, which ensures that the discretized function still retains information about the period of the original continuous function. This addresses a challenge where naive discretization could eliminate crucial information about the period.

    • New Period Finding Algorithm for Non-Uniform Distributions: Hallgren's algorithm is originally designed for uniform superpositions. The presented work develops a new period finding algorithm that is specifically tailored to work with sufficiently flat non-uniform input distributions, including Gaussians, generalized Gaussians, and logistic distributions. This is crucial because many real-world datasets follow non-uniform distributions, and achieving quantum advantage in such settings is a key open question in quantum learning theory. The "sufficiently flat" condition allows the algorithm to generalize beyond the idealized uniform distribution case.

    • Quantum Advantage over Gradient-Based Classical Algorithms: The classical hardness results show that any gradient-based classical algorithm requires an exponential number of iterations (gradient samples) in the dimension of the problem and the norm of $w^\star$ to learn these periodic neurons, especially when the input data distribution has a sufficiently sparse Fourier transform. The quantum algorithm, by leveraging the QFT for efficient frequency estimation, achieves the same task with a polynomial number of QSQs and gradient descent iterations, thus demonstrating an exponential quantum advantage. The classical difficulty stems from the objective function being sparse in Fourier space, leading to barren plateaus that hinder gradient-based optimization.
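    The role the QFT plays in the period-finding step above has a rough classical analogue: sample a one-dimensional periodic neuron, take a discrete Fourier transform, and read the defining weight off the dominant frequency. The sketch below shows only that classical analogue with a cosine activation and made-up parameters; it is not the quantum algorithm, which extracts the frequency information from QSQs, handles irrational periods via Hallgren-style period finding, and works under non-uniform input distributions.

```python
import numpy as np

# Classical analogue (illustration only): recover w* from the dominant frequency of a
# 1-D periodic neuron g_{w*}(x) = cos(2 * pi * w_star * x), whose period is 1/|w*|.
w_star = 3.7                                     # "unknown" weight defining the period
n, L = 4096, 16.0                                # number of samples and sampling window
x = np.linspace(0.0, L, n, endpoint=False)
g = np.cos(2 * np.pi * w_star * x)

spectrum = np.abs(np.fft.rfft(g))
freqs = np.fft.rfftfreq(n, d=L / n)              # frequencies in cycles per unit of x
w_hat = freqs[np.argmax(spectrum[1:]) + 1]       # peak bin, skipping the DC component
print(f"estimated |w*| ~ {w_hat:.3f} (true value {w_star})")
```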

    22 mins
  • InSTA: Scaling Web Navigation Agent Training to the Internet
    Mar 28 2025

    The Key Innovations Behind InSTA
    🔥 Scaling from 200 to 150,000+ Websites

    Traditional web navigation agents train on only ~200 handpicked websites. InSTA expands this dramatically:

    • 1M candidate websites, filtered down to 150K safe sites, yielding massive-scale training data
    • Fully AI-generated task dataset – Eliminates reliance on costly human labeling
    • Live interaction with real websites – Ensuring authentic agent behavior modeling

    🔹 Task Proposer: LLMs generate realistic tasks per website, avoiding irrelevant or unsafe domains.
    🔹 LLM Web Agents: Autonomous browsing agents complete tasks using the Playwright API.
    🔹 LLM Judges: AI evaluators achieve 93.1% accuracy in task success detection.

    📊 Empowering the Next Generation of AI Agents

    InSTA enables unprecedented generalization and autonomy in AI web agents:

    🏆 AI-Driven Automation & Safety: Real-World Applications of InSTA
    ✅ Zero-Shot Navigation – LLaMA-3.1-70B solves 16.7% of tasks on never-before-seen websites.
    ✅ E-Commerce & Enterprise Automation – Agents trained with InSTA can autonomously extract, summarize, and interact with web data.
    ✅ AI-Powered Web Exploration – Enhancing search, research, and personalized automation tools.
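    The browsing step described above rides on the real Playwright API; the sketch below is only a hedged illustration of a navigate-observe-act loop, with a placeholder `choose_next_action` standing in for the LLM web agent and no claim of matching InSTA's actual pipeline code.

```python
from playwright.sync_api import sync_playwright

def choose_next_action(task, page_text):
    """Placeholder for the LLM call that maps (task, page text) -> next action.
    In InSTA this role is played by LLM web agents; here we simply stop."""
    return {"type": "stop", "reason": "placeholder policy"}

def run_episode(url, task, max_steps=5):
    """Hedged sketch of a browse-observe-act loop against a live website."""
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        action = {"type": "stop", "reason": "no steps taken"}
        for _ in range(max_steps):
            observation = page.inner_text("body")[:2000]   # truncate for the LLM context
            action = choose_next_action(task, observation)
            if action["type"] == "stop":
                break
            # A fuller agent would dispatch clicks/typing here, e.g. page.click(selector).
        browser.close()
        return action

print(run_episode("https://example.com", task="find the contact page"))
```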

    13 mins
  • Dynamic Tanh (DyT): The Future of Normalization-Free Transformers
    Mar 27 2025

    DyT’s Impact on Computational Efficiency

    🔹 Faster Inference & Training – Benchmarks on LLaMA 7B show significant reductions in computation time.
    🔹 Reduced Memory Footprint – Eliminating normalization layers improves efficiency in memory-constrained environments.
    🔹 Superior Scaling for Large Models – DyT enables more efficient pretraining of billion-scale models.

    🔹 Revolutionizing Transformer Design – DyT proves that explicit normalization layers are not essential, paving the way for lighter architectures.
    🔹 Next-Gen AI Hardware Optimization – Lower compute requirements make DyT ideal for low-power AI chips and edge computing.
    🔹 Beyond Transformers: Expanding DyT to Other Architectures – Future research may apply DyT-inspired scaling mechanisms to CNNs and RNNs.
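    For context on what removing normalization means in practice: DyT replaces each normalization layer with an element-wise tanh carrying a learnable scale. A minimal PyTorch-style sketch written from the paper's description follows; the parameter names and the initial value of alpha are illustrative.

```python
import torch
import torch.nn as nn

class DyT(nn.Module):
    """Dynamic Tanh: y = gamma * tanh(alpha * x) + beta, used as a drop-in replacement
    for LayerNorm in Transformer blocks. alpha's initial value here is illustrative."""
    def __init__(self, dim, alpha_init=0.5):
        super().__init__()
        self.alpha = nn.Parameter(torch.full((1,), alpha_init))   # learnable scalar
        self.gamma = nn.Parameter(torch.ones(dim))                 # per-channel scale
        self.beta = nn.Parameter(torch.zeros(dim))                 # per-channel shift
    def forward(self, x):
        return self.gamma * torch.tanh(self.alpha * x) + self.beta

# Toy usage: same call pattern as nn.LayerNorm(dim) over the last dimension.
x = torch.randn(2, 16, 64)      # (batch, tokens, hidden)
print(DyT(64)(x).shape)         # torch.Size([2, 16, 64])
```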

    At Quambase, we believe DyT represents a fundamental breakthrough in deep learning optimization. By eliminating normalization overhead, it enables faster, more scalable AI models, driving the next era of efficient deep learning architectures.

    Future Implications: A Paradigm Shift in AI Optimization
    Conclusion: Towards More Efficient Deep Learning

    19 mins
  • LLM Post-Training: Fine-Tuning and Alignment Techniques
    Mar 22 2025

    This document provides a comprehensive survey of post-training techniques for large language models (LLMs), which build upon the foundation laid by pretraining. The authors categorize these methods into fine-tuning, reinforcement learning, and test-time scaling, exploring how each refines LLMs for improved reasoning, accuracy, and alignment with human values. The survey analyzes various algorithms and strategies within these categories, such as different reinforcement learning approaches like PPO and DPO, and scaling techniques like chain-of-thought prompting and beam search. Furthermore, it discusses relevant benchmarks for evaluating the effectiveness of these post-training methods and highlights emerging research directions in the field.
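    As one concrete example from the reinforcement-learning category, DPO optimizes the policy directly from preference pairs without a separate reward model; a minimal sketch of the standard DPO loss (the usual published formulation, not code from the survey) is below.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard Direct Preference Optimization loss:
    -log sigmoid( beta * [(log pi(y_w|x) - log pi_ref(y_w|x))
                          - (log pi(y_l|x) - log pi_ref(y_l|x))] )
    where y_w is the preferred and y_l the dispreferred response. Inputs are
    per-example summed log-probabilities under the policy and the frozen reference."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

# Toy usage with made-up log-probabilities for a batch of 4 preference pairs.
lp = lambda: torch.randn(4)
print(dpo_loss(lp(), lp(), lp(), lp()).item())
```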

    26 mins