FAST26: Bidaw: Enhancing Key-Value Caching for Interactive LLM Serving via Bidirectional Computation–Storage Awareness
About this listen
This February 2026 research paper introduces Bidaw, a novel system designed to optimize interactive Large Language Model (LLM) serving. The authors address inefficiencies in existing **key–value (KV) caching** methods, where the compute engine and the storage layer operate without knowledge of each other, leading to high latency and redundant data processing. **Bidaw** implements **bidirectional awareness**: the compute engine schedules requests based on measured storage speeds, while the storage system uses predicted model output lengths to anticipate future data needs. Additionally, the system employs **storage-efficient tensor caching** to reduce memory footprints without sacrificing accuracy. Experimental results demonstrate that this approach significantly lowers response times and increases throughput compared with current state-of-the-art solutions. Ultimately, **Bidaw** bridges the gap between theoretical caching limits and practical local deployments for multi-round human-AI conversations.
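To make the bidirectional-awareness idea concrete, here is a minimal sketch of the two feedback loops the summary describes. This is an illustrative assumption, not Bidaw's actual implementation or API: the names `StorageTier`, `Session`, `predicted_output_len`, `schedule`, and `prefetch_plan`, as well as the heuristic that sessions with shorter predicted replies will be needed again sooner, are all invented for exposition.

```python
from dataclasses import dataclass

# Hypothetical sketch of bidirectional computation-storage awareness.
# All names and heuristics are illustrative assumptions, not Bidaw's design.

@dataclass
class StorageTier:
    name: str
    bandwidth_gbps: float  # measured read bandwidth, reported to the scheduler

@dataclass
class Session:
    session_id: int
    kv_bytes: int              # size of this conversation's cached KV tensors
    tier: StorageTier          # tier currently holding those tensors
    predicted_output_len: int  # engine's estimate of the next reply, in tokens

def load_seconds(s: Session) -> float:
    """Compute-side awareness: estimated time to fetch KV from storage."""
    return s.kv_bytes / (s.tier.bandwidth_gbps * 1e9 / 8)

def schedule(sessions: list[Session]) -> list[Session]:
    """Serve sessions whose KV caches load fastest first, so a slow
    storage tier does not stall the whole batch."""
    return sorted(sessions, key=load_seconds)

def prefetch_plan(sessions: list[Session], budget_bytes: int) -> list[Session]:
    """Storage-side awareness: use predicted output lengths to guess which
    sessions will return soonest (shorter reply -> earlier next turn,
    an assumed heuristic) and stage their KV tensors in a fast tier."""
    plan, used = [], 0
    for s in sorted(sessions, key=lambda s: s.predicted_output_len):
        if used + s.kv_bytes <= budget_bytes:
            plan.append(s)
            used += s.kv_bytes
    return plan

if __name__ == "__main__":
    ssd = StorageTier("ssd", bandwidth_gbps=28.0)
    hdd = StorageTier("hdd", bandwidth_gbps=1.6)
    chats = [
        Session(1, kv_bytes=2_000_000_000, tier=hdd, predicted_output_len=400),
        Session(2, kv_bytes=500_000_000, tier=ssd, predicted_output_len=60),
    ]
    print([s.session_id for s in schedule(chats)])                    # [2, 1]
    print([s.session_id for s in prefetch_plan(chats, 4_000_000_000)])
```

The point of the sketch is the direction of the information flow: storage bandwidth measurements flow up into the compute engine's scheduling decision, while the engine's output-length predictions flow down into the storage system's prefetch decision, which is what distinguishes bidirectional awareness from a one-way cache hierarchy.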
Source:
February 24–26, 2026
Bidaw: Enhancing Key-Value Caching for Interactive LLM Serving via Bidirectional Computation–Storage Awareness
Tsinghua University; China University of Geosciences, Beijing; China Telecom Omni-channel Operation Center
Shipeng Hu, Guangyan Zhang, Yuqi Zhou, Yaya Wei, Ziyan Zhong, Jike Chen
https://www.usenix.org/conference/fast26/presentation/hu-shipeng