• The Annotated Transformer
    Dec 15 2025

    The Transformer has been on a lot of people’s minds over the last five years. This post presents an annotated version of the paper in the form of a line-by-line implementation. It reorders and deletes some sections from the original paper and adds comments throughout. This document itself is a working notebook, and should be a completely usable implementation. Code is available here.
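    As a taste of the line-by-line style, here is a minimal sketch of scaled dot-product attention, the core operation the post implements step by step; the function and variable names below are illustrative, not taken from the notebook itself.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal sketch of the attention operation from 'Attention Is All You Need':
    softmax(Q K^T / sqrt(d_k)) V. Shapes and names are illustrative."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)     # (..., seq_q, seq_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V                                 # (..., seq_q, d_v)

# Toy usage: 4 query positions, 6 key/value positions, model width 8.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=s) for s in [(4, 8), (6, 8), (6, 8)])
print(scaled_dot_product_attention(Q, K, V).shape)     # (4, 8)
```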

    15 mins
  • Breaking the Sorting Barrier for Shortest Paths
    Aug 17 2025

    This document presents a deterministic algorithm for the single-source shortest path (SSSP) problem on directed graphs with non-negative edge weights, achieving a time complexity of O(m log^(2/3) n). This groundbreaking result surpasses the long-standing O(m + n log n) barrier of Dijkstra's algorithm, demonstrating that Dijkstra's is not optimal for SSSP on sparse graphs when the vertex ordering by distance is not strictly required. The approach ingeniously merges concepts from Dijkstra's and Bellman-Ford algorithms using a recursive partitioning technique to manage the "frontier" of uncertain distances more efficiently, avoiding the sorting bottleneck inherent in traditional methods. It introduces a "FindPivots" procedure and a specialized data structure to limit the size of the set of vertices that need active consideration, thereby reducing computational overhead and improving performance.
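    A rough sketch of one ingredient of this approach: relaxing edges outward from a frontier for a bounded number of Bellman-Ford-style rounds, so that only vertices whose shortest paths need many more hops remain uncertain. This is an illustrative simplification in the spirit of the bounded relaxation inside FindPivots, not the paper's full procedure; the graph, names, and parameters are made up.

```python
import math
from collections import defaultdict

def bounded_relax(graph, dist, frontier, k):
    """Run k rounds of Bellman-Ford-style relaxation starting from `frontier` and
    return the vertices whose tentative distances were set or improved. Bounding the
    number of rounds is what keeps the set of vertices needing Dijkstra-like handling
    small; this sketch omits the recursion and the specialized data structure."""
    touched = set(frontier)
    active = set(frontier)
    for _ in range(k):
        nxt = set()
        for u in active:
            for v, w in graph.get(u, []):
                if dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
                    nxt.add(v)
                    touched.add(v)
        active = nxt
    return touched

# Toy usage on a small weighted digraph (adjacency list: u -> [(v, weight), ...]).
graph = {0: [(1, 2.0), (2, 5.0)], 1: [(2, 1.0), (3, 4.0)], 2: [(3, 1.0)]}
dist = defaultdict(lambda: math.inf, {0: 0.0})
print(sorted(bounded_relax(graph, dist, {0}, k=3)))   # vertices updated within 3 rounds
print(dict(dist))                                     # tentative distances so far
```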

    30 mins
  • AlphaEvolve overview.
    Jun 19 2025

    • We first introduce a new class of alphas with intriguing strengths: like formulaic alphas, these alphas can model scalar features and thus are simple to mine into a weakly correlated set, but, like machine learning alphas, they are high-dimensional data-driven models utilizing long-term features. We then propose a novel alpha mining framework, AlphaEvolve, to generate the new alphas. To the best of our knowledge, we are the first to solve the stock prediction problem based on AutoML and the first to tackle the problem of mining weakly correlated alphas.
    • We enable AlphaEvolve to selectively inject relational domain knowledge without any strong structural assumption in an alpha.
    • We propose an optimization technique to accelerate alpha mining by pruning redundant alphas (see the sketch below this list).
    • We conduct an extensive experimental study on AlphaEvolve using the stock price data of NASDAQ. The results show that AlphaEvolve generates alphas with weakly correlated high returns.
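    A minimal sketch of what pruning toward a weakly correlated set could look like: keep a candidate alpha only if its signal is weakly correlated with every alpha already kept. This is an illustration of the concept, not AlphaEvolve's actual pruning procedure; the threshold, greedy order, and names are assumptions.

```python
import numpy as np

def prune_to_weakly_correlated(alpha_signals, max_abs_corr=0.3):
    """Greedy illustration: keep an alpha only if its absolute correlation with every
    already-kept alpha stays below `max_abs_corr`. `alpha_signals` maps alpha name ->
    1-D array of per-period signal values. Threshold and ordering are illustrative."""
    kept = {}
    for name, signal in alpha_signals.items():
        if all(abs(np.corrcoef(signal, other)[0, 1]) < max_abs_corr
               for other in kept.values()):
            kept[name] = signal
    return list(kept)

# Toy usage with random signals standing in for evaluated alpha values.
rng = np.random.default_rng(1)
base = rng.normal(size=500)
signals = {
    "alpha_momentum": base,
    "alpha_momentum_copy": base + 0.05 * rng.normal(size=500),   # nearly redundant
    "alpha_reversal": rng.normal(size=500),
}
print(prune_to_weakly_correlated(signals))   # the near-duplicate should be dropped
```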


    quambase.com

    23 mins
  • Beating the Ramsey Limit: The Future of Quantum Sensing
    May 21 2025

    Timing is everything in quantum measurement. In this video, we explore a protocol—featured in Nature Communications—that uses Bloch vector dynamics to determine the optimal time to read out a qubit, enhancing the signal-to-noise ratio (SNR) and outperforming Ramsey sequences under all conditions.

    Starting from a state in the xz-plane, the protocol tracks the coherence transfer from the z-axis to the x-axis under continuous drive. The key moment is the breakdown point—

    Even in dephasing dominated scenarios, this technique still beats Ramsey — making it unconditionally superior.

    Based on the research article published in Nature Communications
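    As a rough illustration of the idea (not the equations or protocol from the paper), one can integrate a toy Bloch-vector model in which a continuous drive transfers coherence from the z-axis to the x-axis while dephasing damps it, then pick the readout time that maximizes a simple SNR proxy such as the x-component. The dynamics, rates, and names below are assumptions made for the sketch.

```python
import numpy as np

def bloch_xz_trajectory(omega, gamma, t):
    """Toy model (an assumption, not the paper's dynamics): a drive about the y-axis at
    Rabi frequency `omega` rotates the Bloch vector from z toward x, while dephasing at
    rate `gamma` damps the x-component. Returns x(t), z(t) via simple Euler integration."""
    dt = t[1] - t[0]
    x, z = 0.0, 1.0                       # start along +z, i.e. in the xz-plane
    xs, zs = [], []
    for _ in t:
        xs.append(x); zs.append(z)
        dx = omega * z - gamma * x
        dz = -omega * x
        x, z = x + dx * dt, z + dz * dt
    return np.array(xs), np.array(zs)

# Pick the readout time that maximizes the transferred x-component (a crude SNR proxy).
t = np.linspace(0.0, 5.0, 5000)
x, _ = bloch_xz_trajectory(omega=2.0, gamma=0.5, t=t)
print(f"toy optimal readout time ~ {t[np.argmax(x)]:.2f} (arbitrary units)")
```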

    11 mins
  • Quantum Algorithms for Learning Periodic Functions
    Apr 5 2025

    This quantum algorithm leverages unique quantum properties, primarily related to quantum period finding, to efficiently learn periodic functions over a broad range of non-uniform distributions. The algorithm achieves an exponential quantum advantage over classical gradient-based algorithms, which are standard in machine learning, for learning these functions with Fourier-sparse input distributions such as Gaussian, generalized Gaussian, and logistic distributions.

    Here's how the quantum algorithm leverages unique quantum properties:

    • Quantum Statistical Queries (QSQs) for Accessing Function Information: The algorithm operates in the QSQ model, which provides access to the target function $g_{w^\star}$ through queries that return approximations of expectation values involving a quantum example state $|g_{w^\star}\rangle$. This quantum access model is crucial for implementing quantum algorithms for learning.

    • Quantum Fourier Transform (QFT) for Period Finding: A key step in the algorithm is to perform period finding to learn the unknown vector $w^\star$ that defines the linear component within the periodic function $g_{w^\star}(x) = g(x^\top w^\star)$. The algorithm encodes the QFT into QSQs to estimate the frequencies present in the function, which are directly related to the inverse of the periods. This ability to efficiently analyze the frequency components is a hallmark of quantum algorithms like Shor's algorithm and its generalizations.

    • Handling Non-Integer and Real Periods with Hallgren's Algorithm: Unlike standard period finding algorithms that typically require integer periods, the periods $1/|w_j^\star|$ are not necessarily integers. The algorithm adapts Hallgren's algorithm for finding the period of pseudoperiodic functions, which can handle potentially irrational periods. This is a significant advantage over classical methods that might struggle with non-commensurate frequencies. The algorithm also generalizes Hallgren's approach to work with non-uniform distributions.

    • Pseudoperiodicity for Discretization of Real-Valued Functions: Since the target functions are real-valued, they need to be discretized to be represented in a quantum state. The algorithm carefully chooses a discretization that satisfies pseudoperiodicity, a weaker condition than strict periodicity, which ensures that the discretized function still retains information about the period of the original continuous function. This addresses a challenge where naive discretization could eliminate crucial information about the period.

    • New Period Finding Algorithm for Non-Uniform Distributions: Hallgren's algorithm is originally designed for uniform superpositions. The presented work develops a new period finding algorithm that is specifically tailored to work with sufficiently flat non-uniform input distributions, including Gaussians, generalized Gaussians, and logistic distributions. This is crucial because many real-world datasets follow non-uniform distributions, and achieving quantum advantage in such settings is a key open question in quantum learning theory. The "sufficiently flat" condition allows the algorithm to generalize beyond the idealized uniform distribution case.

    • Quantum Advantage over Gradient-Based Classical Algorithms: The classical hardness results show that any gradient-based classical algorithm requires an exponential number of iterations (gradient samples) in the dimension of the problem and the norm of $w^\star$ to learn these periodic neurons, especially when the input data distribution has a sufficiently sparse Fourier transform. The quantum algorithm, by leveraging the QFT for efficient frequency estimation, achieves the same task with a polynomial number of QSQs and gradient descent iterations, thus demonstrating an exponential quantum advantage. The classical difficulty stems from the objective function being sparse in Fourier space, leading to barren plateaus that hinder gradient-based optimization.
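    The role the QFT plays in the period-finding step above has a rough classical analogue: sample a one-dimensional periodic neuron, take a discrete Fourier transform, and read the defining weight off the dominant frequency. The sketch below shows only that classical analogue with a cosine activation and made-up parameters; it is not the quantum algorithm, which extracts the frequency information from QSQs, handles irrational periods via Hallgren-style period finding, and works under non-uniform input distributions.

```python
import numpy as np

# Classical analogue (illustration only): recover w* from the dominant frequency of a
# 1-D periodic neuron g_{w*}(x) = cos(2 * pi * w_star * x), whose period is 1/|w*|.
w_star = 3.7                                     # "unknown" weight defining the period
n, L = 4096, 16.0                                # number of samples and sampling window
x = np.linspace(0.0, L, n, endpoint=False)
g = np.cos(2 * np.pi * w_star * x)

spectrum = np.abs(np.fft.rfft(g))
freqs = np.fft.rfftfreq(n, d=L / n)              # frequencies in cycles per unit of x
w_hat = freqs[np.argmax(spectrum[1:]) + 1]       # peak bin, skipping the DC component
print(f"estimated |w*| ~ {w_hat:.3f} (true value {w_star})")
```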

    22 mins
  • InSTA: Scaling Web Navigation Agent Training to the Internet
    Mar 28 2025

    The Key Innovations Behind InSTA
    🔥 Scaling from 200 to 150,000+ Websites

    Traditional web navigation agents train on only ~200 handpicked websites. InSTA expands this dramatically:

    • 1M candidate websites, filtered down to 150K safe sites, yielding massive-scale training data
    • Fully AI-generated task dataset – Eliminates reliance on costly human labeling
    • Live interaction with real websites – Ensuring authentic agent behavior modeling

    🔹 Task Proposer: LLMs generate realistic tasks per website, avoiding irrelevant or unsafe domains.
    🔹 LLM Web Agents: Autonomous browsing agents complete tasks using the Playwright API.
    🔹 LLM Judges: AI evaluators achieve 93.1% accuracy in task success detection.

    📊 Empowering the Next Generation of AI Agents

    InSTA enables unprecedented generalization and autonomy in AI web agents:

    🏆 AI-Driven Automation & Safety: Real-World Applications of InSTA
    ✅ Zero-Shot Navigation – LLaMA-3.1-70B solves 16.7% of tasks on never-before-seen websites.
    ✅ E-Commerce & Enterprise Automation – Agents trained with InSTA can autonomously extract, summarize, and interact with web data.
    ✅ AI-Powered Web Exploration – Enhancing search, research, and personalized automation tools.
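    The browsing step described above rides on the real Playwright API; the sketch below is only a hedged illustration of a navigate-observe-act loop, with a placeholder `choose_next_action` standing in for the LLM web agent and no claim of matching InSTA's actual pipeline code.

```python
from playwright.sync_api import sync_playwright

def choose_next_action(task, page_text):
    """Placeholder for the LLM call that maps (task, page text) -> next action.
    In InSTA this role is played by LLM web agents; here we simply stop."""
    return {"type": "stop", "reason": "placeholder policy"}

def run_episode(url, task, max_steps=5):
    """Hedged sketch of a browse-observe-act loop against a live website."""
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        action = {"type": "stop", "reason": "no steps taken"}
        for _ in range(max_steps):
            observation = page.inner_text("body")[:2000]   # truncate for the LLM context
            action = choose_next_action(task, observation)
            if action["type"] == "stop":
                break
            # A fuller agent would dispatch clicks/typing here, e.g. page.click(selector).
        browser.close()
        return action

print(run_episode("https://example.com", task="find the contact page"))
```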

    13 mins
  • Dynamic Tanh (DyT): The Future of Normalization-Free Transformers
    Mar 27 2025

    DyT’s Impact on Computational Efficiency

    🔹 Faster Inference & Training – Benchmarks on LLaMA 7B show significant reductions in computation time.
    🔹 Reduced Memory Footprint – Eliminating normalization layers improves efficiency in memory-constrained environments.
    🔹 Superior Scaling for Large Models – DyT enables more efficient pretraining of billion-scale models.

    🔹 Revolutionizing Transformer Design – DyT proves that explicit normalization layers are not essential, paving the way for lighter architectures.
    🔹 Next-Gen AI Hardware Optimization – Lower compute requirements make DyT ideal for low-power AI chips and edge computing.
    🔹 Beyond Transformers: Expanding DyT to Other Architectures – Future research may apply DyT-inspired scaling mechanisms to CNNs and RNNs.
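    For context on what removing normalization means in practice: DyT replaces each normalization layer with an element-wise tanh carrying a learnable scale. A minimal PyTorch-style sketch written from the paper's description follows; the parameter names and the initial value of alpha are illustrative.

```python
import torch
import torch.nn as nn

class DyT(nn.Module):
    """Dynamic Tanh: y = gamma * tanh(alpha * x) + beta, used as a drop-in replacement
    for LayerNorm in Transformer blocks. alpha's initial value here is illustrative."""
    def __init__(self, dim, alpha_init=0.5):
        super().__init__()
        self.alpha = nn.Parameter(torch.full((1,), alpha_init))   # learnable scalar
        self.gamma = nn.Parameter(torch.ones(dim))                 # per-channel scale
        self.beta = nn.Parameter(torch.zeros(dim))                 # per-channel shift
    def forward(self, x):
        return self.gamma * torch.tanh(self.alpha * x) + self.beta

# Toy usage: same call pattern as nn.LayerNorm(dim) over the last dimension.
x = torch.randn(2, 16, 64)      # (batch, tokens, hidden)
print(DyT(64)(x).shape)         # torch.Size([2, 16, 64])
```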

    At Quambase, we believe DyT represents a fundamental breakthrough in deep learning optimization. By eliminating normalization overhead, it enables faster, more scalable AI models, driving the next era of efficient deep learning architectures.

    Future Implications: A Paradigm Shift in AI Optimization
    Conclusion: Towards More Efficient Deep Learning

    19 mins
  • LLM Post-Training: Fine-Tuning and Alignment Techniques
    Mar 22 2025

    This document provides a comprehensive survey of post-training techniques for large language models (LLMs), which build upon the foundation laid by pretraining. The authors categorize these methods into fine-tuning, reinforcement learning, and test-time scaling, exploring how each refines LLMs for improved reasoning, accuracy, and alignment with human values. The survey analyzes various algorithms and strategies within these categories, such as different reinforcement learning approaches like PPO and DPO, and scaling techniques like chain-of-thought prompting and beam search. Furthermore, it discusses relevant benchmarks for evaluating the effectiveness of these post-training methods and highlights emerging research directions in the field.
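    As one concrete example from the reinforcement-learning category, DPO optimizes the policy directly from preference pairs without a separate reward model; a minimal sketch of the standard DPO loss (the usual published formulation, not code from the survey) is below.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard Direct Preference Optimization loss:
    -log sigmoid( beta * [(log pi(y_w|x) - log pi_ref(y_w|x))
                          - (log pi(y_l|x) - log pi_ref(y_l|x))] )
    where y_w is the preferred and y_l the dispreferred response. Inputs are
    per-example summed log-probabilities under the policy and the frozen reference."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

# Toy usage with made-up log-probabilities for a batch of 4 preference pairs.
lp = lambda: torch.randn(4)
print(dpo_loss(lp(), lp(), lp(), lp()).item())
```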

    26 mins