Episodes

Forecasting Downstream Performance of LLMs With Proxy Metrics

May 24 2026

## Episode Summary

In this episode, we cover:

- **Forecasting Downstream Performance of LLMs With Proxy Metrics** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.18607)
- **DeltaBox: Scaling Stateful AI Agents with Millisecond-Level Sandbox Checkpoint/Rollback** (arXiv) - [Read more](http://arxiv.org/abs/2605.22781v1)
- **Lean Refactor: Multi-Objective Controllable Proof Optimization via Agentic Strategy Search** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.20244)
- **AutoRubric-T2I: Robust Rule-Based Reward Model for Text-to-Image Alignment** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.17602)
- **Forecasting Scientific Progress with Artificial Intelligence** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.22681)

---

*Sponsored by LimitLess AI*

Less than 1 minute

Live Music Diffusion Models: Efficient Fine-Tuning and Post-Training of Interactive Diffusion Music Generators

May 23 2026

## Episode Summary

In this episode, we cover:

- **Live Music Diffusion Models: Efficient Fine-Tuning and Post-Training of Interactive Diffusion Music Generators** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.22717)
- **DeltaBox: Scaling Stateful AI Agents with Millisecond-Level Sandbox Checkpoint/Rollback** (arXiv) - [Read more](http://arxiv.org/abs/2605.22781v1)
- **AutoRubric-T2I: Robust Rule-Based Reward Model for Text-to-Image Alignment** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.17602)
- **"I didn't Make the Micro Decisions": Measuring, Inducing, and Exposing Goal-Level AI Contributions in Collaboration** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.21363)
- **Forecasting Downstream Performance of LLMs With Proxy Metrics** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.18607)

---

*Sponsored by LimitLess AI*

Less than 1 minute

Efficient Agentic Reasoning Through Self-Regulated Simulative Planning

May 23 2026

## Episode Summary

In this episode, we cover:

- **Efficient Agentic Reasoning Through Self-Regulated Simulative Planning** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.22138)
- **AwareVLN: Reasoning with Self-awareness for Vision-Language Navigation** (arXiv) - [Read more](http://arxiv.org/abs/2605.22816v1)
- **Rule2DRC: Benchmarking LLM Agents for DRC Script Synthesis with Execution-Guided Test Generation** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.15669)
- **Cambrian-P: Pose-Grounded Video Understanding** (arXiv) - [Read more](http://arxiv.org/abs/2605.22819v1)
- **SEGA: Spectral-Energy Guided Attention for Resolution Extrapolation in Diffusion Transformers** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.22668)

---

*Sponsored by LimitLess AI*

Less than 1 minute

Enhancing Train-Free Infinite-Frame Generation for Consistent Long Videos

May 22 2026

## Episode Summary

In this episode, we cover:

- **Enhancing Train-Free Infinite-Frame Generation for Consistent Long Videos** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.18233)
- **Mega-ASR: Towards In-the-wild^2 Speech Recognition via Scaling up Real-world Acoustic Simulation** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.19833)
- **CutVerse: A Compositional GUI Agents Benchmark for Media Post-Production Editing** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.19484)
- **Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.14747)
- **A Survey of Large Audio Language Models: Generalization, Trustworthiness, and Outlook** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.20266)

---

*Sponsored by LimitLess AI*

Less than 1 minute

Mid-Training with Self-Generated Data Improves Reinforcement Learning in Language Models

May 21 2026

## Episode Summary

In this episode, we cover:

- **Mid-Training with Self-Generated Data Improves Reinforcement Learning in Language Models** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.08472)
- **TIDE: Efficient and Lossless MoE Diffusion LLM Inference with I/O-aware Expert Offload** (arXiv) - [Read more](http://arxiv.org/abs/2605.20179v1)
- **ClinSeekAgent: Automating Multimodal Evidence Seeking for Agentic Clinical Reasoning** (arXiv) - [Read more](http://arxiv.org/abs/2605.20176v1)
- **CaMo: Camera Motion Grounded Evaluation and Training for Vision-Language Models** (arXiv) - [Read more](http://arxiv.org/abs/2605.20165v1)
- **A Methodology for Selecting and Composing Runtime Architecture Patterns for Production LLM Agents** (arXiv) - [Read more](http://arxiv.org/abs/2605.20173v1)

---

*Sponsored by LimitLess AI*

Less than 1 minute

Auditing Multimodal LLM Raters: Central Tendency Bias in Clinical Ordinal Scoring

May 20 2026

## Episode Summary

In this episode, we cover:

- **Auditing Multimodal LLM Raters: Central Tendency Bias in Clinical Ordinal Scoring** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.16386)
- **Evaluating Cognitive Age Alignment in Interactive AI Agents** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.17894)
- **DexHoldem: Playing Texas Hold'em with Dexterous Embodied System** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.18727)
- **SCICONVBENCH: Benchmarking LLMs on Multi-Turn Clarification for Task Formulation in Computational Science** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.18630)
- **AstraFlow: Dataflow-Oriented Reinforcement Learning for Agentic LLMs** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.15565)

---

*Sponsored by LimitLess AI*

Less than 1 minute

Physics-R1: An Audited Olympiad Corpus and Recipe for Visual Physics Reasoning

May 18 2026

## Episode Summary

In this episode, we cover:

- **Physics-R1: An Audited Olympiad Corpus and Recipe for Visual Physics Reasoning** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.14040)
- **A Generative AI Framework for Intelligent Utility Billing CO2 Analytics and Sustainable Resource Optimisation** (arXiv) - [Read more](http://arxiv.org/abs/2605.16250v1)
- **Known By Their Actions: Fingerprinting LLM Browser Agents via UI Traces** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.14786)
- **Stress-Testing the Reasoning Competence of LLMs With Proofs Under Minimal Formalism** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.12524)
- **Steered LLM Activations are Non-Surjective** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.09839)

---

*Sponsored by LimitLess AI*

Less than 1 minute

Long Context Pre-Training with Lighthouse Attention

May 17 2026

## Episode Summary

In this episode, we cover:

- **Long Context Pre-Training with Lighthouse Attention** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.06554)
- **Boosting Reinforcement Learning with Verifiable Rewards via Randomly Selected Few-Shot Guidance** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.15012)
- **PreScam: A Benchmark for Predicting Scam Progression from Early Conversations** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.12243)
- **WildTableBench: Benchmarking Multimodal Foundation Models on Table Understanding In the Wild** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.01018)
- **Boosting Omni-Modal Language Models: Staged Post-Training with Visually Debiased Evaluation** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.12034)

---

*Sponsored by LimitLess AI*

Less than 1 minute

