FAST26: Bidaw: Enhancing Key-Value Caching for Interactive LLM Serving via Bidirectional Computation–Storage Awareness



About this listen

This February 2026 research paper introduces Bidaw, a novel system designed to optimize the performance of interactive Large Language Model (LLM) serving. The authors address inefficiencies in existing **key–value (KV) caching** methods, where the separation of compute engines and storage layers leads to high latency and redundant data processing. **Bidaw** implements **bidirectional awareness**: the compute engine schedules requests based on storage speeds, while the storage system uses predicted model output lengths to anticipate future data needs. Additionally, the system employs **storage-efficient tensor caching** to reduce the memory footprint without sacrificing accuracy. Experimental results demonstrate that this approach significantly lowers response times and increases throughput compared to current state-of-the-art solutions. Ultimately, **Bidaw** bridges the gap between theoretical caching limits and practical local deployments for multi-round human–AI conversations.
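To make the idea of bidirectional awareness concrete, here is a minimal, purely illustrative Python sketch. It is not the paper's implementation; the names (`Request`, `load_latency`, `prefetch_budget`) and the bytes-per-token constant are assumptions invented for this example. It shows the two directions: the compute side ordering requests by estimated KV-cache load time from storage, and the storage side sizing a prefetch/reservation from the predicted output length.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical request record (names assumed for illustration)."""
    rid: str
    kv_bytes: int              # size of cached KV tensors to fetch
    tier_bw: float             # bandwidth (bytes/s) of the tier holding them
    predicted_out_tokens: int  # model-predicted output length


def load_latency(req: Request) -> float:
    """Compute-side awareness: estimate time to fetch this request's KV cache."""
    return req.kv_bytes / req.tier_bw


def schedule(requests: list[Request]) -> list[Request]:
    """Run requests with the fastest KV loads first, so compute is not
    stalled waiting on slow storage tiers."""
    return sorted(requests, key=load_latency)


def prefetch_budget(req: Request, bytes_per_token: int = 4096) -> int:
    """Storage-side awareness: reserve space for the KV entries the request
    will generate, sized by the predicted output length."""
    return req.predicted_out_tokens * bytes_per_token


reqs = [
    Request("a", kv_bytes=1 << 30, tier_bw=2e9, predicted_out_tokens=128),   # e.g. SSD tier
    Request("b", kv_bytes=1 << 28, tier_bw=20e9, predicted_out_tokens=64),   # e.g. DRAM tier
]
order = [r.rid for r in schedule(reqs)]   # "b" loads fastest, runs first
budget = prefetch_budget(reqs[1])
```

The point of the sketch is only the information flow: storage characteristics inform compute scheduling, and model-side predictions inform storage provisioning, replacing the one-way interfaces the paper critiques.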


Source:


February 24–26, 2026

Bidaw: Enhancing Key-Value Caching for Interactive LLM Serving via Bidirectional Computation–Storage Awareness

Tsinghua University; China University of Geosciences, Beijing; China Telecom Omni-channel Operation Center

Shipeng Hu, Guangyan Zhang, Yuqi Zhou, Yaya Wei, Ziyan Zhong, Jike Chen

https://www.usenix.org/conference/fast26/presentation/hu-shipeng
