FAST26: Bidaw: Enhancing Key-Value Caching for Interactive LLM Serving via Bidirectional Computation–Storage Awareness
About this listen
This February 2026 research paper introduces Bidaw, a novel system designed to optimize interactive Large Language Model (LLM) serving. The authors address inefficiencies in existing **key–value (KV) caching** methods, where the compute engine and the storage layer operate without knowledge of each other, leading to high latency and redundant data processing. **Bidaw** implements **bidirectional awareness**: the compute engine schedules requests based on measured storage speeds, while the storage system uses predicted model output lengths to anticipate future data needs. Additionally, the system employs **storage-efficient tensor caching** to reduce memory footprints without sacrificing accuracy. Experimental results demonstrate that this approach significantly lowers response times and increases throughput compared with current state-of-the-art solutions. Ultimately, **Bidaw** bridges the gap between theoretical caching limits and practical local deployments for multi-round human-AI conversations.
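To make the bidirectional-awareness idea concrete, here is a minimal sketch of the two feedback loops the summary describes. This is an illustrative assumption, not Bidaw's actual implementation or API: the names `StorageTier`, `Session`, `predicted_output_len`, `schedule`, and `prefetch_plan`, as well as the heuristic that sessions with shorter predicted replies will be needed again sooner, are all invented for exposition.

```python
from dataclasses import dataclass

# Hypothetical sketch of bidirectional computation-storage awareness.
# All names and heuristics are illustrative assumptions, not Bidaw's design.

@dataclass
class StorageTier:
    name: str
    bandwidth_gbps: float  # measured read bandwidth, reported to the scheduler

@dataclass
class Session:
    session_id: int
    kv_bytes: int              # size of this conversation's cached KV tensors
    tier: StorageTier          # tier currently holding those tensors
    predicted_output_len: int  # engine's estimate of the next reply, in tokens

def load_seconds(s: Session) -> float:
    """Compute-side awareness: estimated time to fetch KV from storage."""
    return s.kv_bytes / (s.tier.bandwidth_gbps * 1e9 / 8)

def schedule(sessions: list[Session]) -> list[Session]:
    """Serve sessions whose KV caches load fastest first, so a slow
    storage tier does not stall the whole batch."""
    return sorted(sessions, key=load_seconds)

def prefetch_plan(sessions: list[Session], budget_bytes: int) -> list[Session]:
    """Storage-side awareness: use predicted output lengths to guess which
    sessions will return soonest (shorter reply -> earlier next turn,
    an assumed heuristic) and stage their KV tensors in a fast tier."""
    plan, used = [], 0
    for s in sorted(sessions, key=lambda s: s.predicted_output_len):
        if used + s.kv_bytes <= budget_bytes:
            plan.append(s)
            used += s.kv_bytes
    return plan

if __name__ == "__main__":
    ssd = StorageTier("ssd", bandwidth_gbps=28.0)
    hdd = StorageTier("hdd", bandwidth_gbps=1.6)
    chats = [
        Session(1, kv_bytes=2_000_000_000, tier=hdd, predicted_output_len=400),
        Session(2, kv_bytes=500_000_000, tier=ssd, predicted_output_len=60),
    ]
    print([s.session_id for s in schedule(chats)])                    # [2, 1]
    print([s.session_id for s in prefetch_plan(chats, 4_000_000_000)])
```

The point of the sketch is the direction of the information flow: storage bandwidth measurements flow up into the compute engine's scheduling decision, while the engine's output-length predictions flow down into the storage system's prefetch decision, which is what distinguishes bidirectional awareness from a one-way cache hierarchy.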
Source:
February 24–26, 2026
Bidaw: Enhancing Key-Value Caching for Interactive LLM Serving via Bidirectional Computation–Storage Awareness
Tsinghua University; China University of Geosciences, Beijing; China Telecom Omni-channel Operation Center
Shipeng Hu, Guangyan Zhang, Yuqi Zhou, Yaya Wei, Ziyan Zhong, Jike Chen
https://www.usenix.org/conference/fast26/presentation/hu-shipeng