Tag

#Together AI

1 article

Together AI Open-Sources OSCAR: An Attention-Aware 2-Bit KV Cache Quantization System for Long-Context LLM Serving

Together AI open-sources OSCAR, an attention-aware 2-bit KV cache quantization system that significantly reduces memory usage and improves decoding speed for long-context LLMs.

May 2549