Episodes

  • Forecasting Downstream Performance of LLMs With Proxy Metrics
    May 24 2026
    ## Episode Summary
    In this episode, we cover:
    - **Forecasting Downstream Performance of LLMs With Proxy Metrics** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.18607)
    - **DeltaBox: Scaling Stateful AI Agents with Millisecond-Level Sandbox Checkpoint/Rollback** (arXiv) - [Read more](http://arxiv.org/abs/2605.22781v1)
    - **Lean Refactor: Multi-Objective Controllable Proof Optimization via Agentic Strategy Search** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.20244)
    - **AutoRubric-T2I: Robust Rule-Based Reward Model for Text-to-Image Alignment** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.17602)
    - **Forecasting Scientific Progress with Artificial Intelligence** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.22681)
    ---
    *Sponsored by LimitLess AI*
    Less than 1 minute
  • Live Music Diffusion Models: Efficient Fine-Tuning and Post-Training of Interactive Diffusion Music Generators
    May 23 2026
    ## Episode Summary
    In this episode, we cover:
    - **Live Music Diffusion Models: Efficient Fine-Tuning and Post-Training of Interactive Diffusion Music Generators** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.22717)
    - **DeltaBox: Scaling Stateful AI Agents with Millisecond-Level Sandbox Checkpoint/Rollback** (arXiv) - [Read more](http://arxiv.org/abs/2605.22781v1)
    - **AutoRubric-T2I: Robust Rule-Based Reward Model for Text-to-Image Alignment** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.17602)
    - **"I didn't Make the Micro Decisions": Measuring, Inducing, and Exposing Goal-Level AI Contributions in Collaboration** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.21363)
    - **Forecasting Downstream Performance of LLMs With Proxy Metrics** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.18607)
    ---
    *Sponsored by LimitLess AI*
    Less than 1 minute
  • Efficient Agentic Reasoning Through Self-Regulated Simulative Planning
    May 23 2026
    ## Episode Summary
    In this episode, we cover:
    - **Efficient Agentic Reasoning Through Self-Regulated Simulative Planning** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.22138)
    - **AwareVLN: Reasoning with Self-awareness for Vision-Language Navigation** (arXiv) - [Read more](http://arxiv.org/abs/2605.22816v1)
    - **Rule2DRC: Benchmarking LLM Agents for DRC Script Synthesis with Execution-Guided Test Generation** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.15669)
    - **Cambrian-P: Pose-Grounded Video Understanding** (arXiv) - [Read more](http://arxiv.org/abs/2605.22819v1)
    - **SEGA: Spectral-Energy Guided Attention for Resolution Extrapolation in Diffusion Transformers** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.22668)
    ---
    *Sponsored by LimitLess AI*
    Less than 1 minute
  • Enhancing Train-Free Infinite-Frame Generation for Consistent Long Videos
    May 22 2026
    ## Episode Summary
    In this episode, we cover:
    - **Enhancing Train-Free Infinite-Frame Generation for Consistent Long Videos** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.18233)
    - **Mega-ASR: Towards In-the-wild^2 Speech Recognition via Scaling up Real-world Acoustic Simulation** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.19833)
    - **CutVerse: A Compositional GUI Agents Benchmark for Media Post-Production Editing** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.19484)
    - **Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.14747)
    - **A Survey of Large Audio Language Models: Generalization, Trustworthiness, and Outlook** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.20266)
    ---
    *Sponsored by LimitLess AI*
    Less than 1 minute
  • Mid-Training with Self-Generated Data Improves Reinforcement Learning in Language Models
    May 21 2026
    ## Episode Summary
    In this episode, we cover:
    - **Mid-Training with Self-Generated Data Improves Reinforcement Learning in Language Models** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.08472)
    - **TIDE: Efficient and Lossless MoE Diffusion LLM Inference with I/O-aware Expert Offload** (arXiv) - [Read more](http://arxiv.org/abs/2605.20179v1)
    - **ClinSeekAgent: Automating Multimodal Evidence Seeking for Agentic Clinical Reasoning** (arXiv) - [Read more](http://arxiv.org/abs/2605.20176v1)
    - **CaMo: Camera Motion Grounded Evaluation and Training for Vision-Language Models** (arXiv) - [Read more](http://arxiv.org/abs/2605.20165v1)
    - **A Methodology for Selecting and Composing Runtime Architecture Patterns for Production LLM Agents** (arXiv) - [Read more](http://arxiv.org/abs/2605.20173v1)
    ---
    *Sponsored by LimitLess AI*
    Less than 1 minute
  • Auditing Multimodal LLM Raters: Central Tendency Bias in Clinical Ordinal Scoring
    May 20 2026
    ## Episode Summary
    In this episode, we cover:
    - **Auditing Multimodal LLM Raters: Central Tendency Bias in Clinical Ordinal Scoring** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.16386)
    - **Evaluating Cognitive Age Alignment in Interactive AI Agents** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.17894)
    - **DexHoldem: Playing Texas Hold'em with Dexterous Embodied System** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.18727)
    - **SCICONVBENCH: Benchmarking LLMs on Multi-Turn Clarification for Task Formulation in Computational Science** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.18630)
    - **AstraFlow: Dataflow-Oriented Reinforcement Learning for Agentic LLMs** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.15565)
    ---
    *Sponsored by LimitLess AI*
    Less than 1 minute
  • Physics-R1: An Audited Olympiad Corpus and Recipe for Visual Physics Reasoning
    May 18 2026
    ## Episode Summary
    In this episode, we cover:
    - **Physics-R1: An Audited Olympiad Corpus and Recipe for Visual Physics Reasoning** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.14040)
    - **A Generative AI Framework for Intelligent Utility Billing CO2 Analytics and Sustainable Resource Optimisation** (arXiv) - [Read more](http://arxiv.org/abs/2605.16250v1)
    - **Known By Their Actions: Fingerprinting LLM Browser Agents via UI Traces** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.14786)
    - **Stress-Testing the Reasoning Competence of LLMs With Proofs Under Minimal Formalism** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.12524)
    - **Steered LLM Activations are Non-Surjective** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.09839)
    ---
    *Sponsored by LimitLess AI*
    Less than 1 minute
  • Long Context Pre-Training with Lighthouse Attention
    May 17 2026
    ## Episode Summary
    In this episode, we cover:
    - **Long Context Pre-Training with Lighthouse Attention** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.06554)
    - **Boosting Reinforcement Learning with Verifiable Rewards via Randomly Selected Few-Shot Guidance** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.15012)
    - **PreScam: A Benchmark for Predicting Scam Progression from Early Conversations** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.12243)
    - **WildTableBench: Benchmarking Multimodal Foundation Models on Table Understanding In the Wild** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.01018)
    - **Boosting Omni-Modal Language Models: Staged Post-Training with Visually Debiased Evaluation** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.12034)
    ---
    *Sponsored by LimitLess AI*
    Less than 1 minute