Tag

#FlashQLA

1 article

Qwen Team Releases FlashQLA: a High-Performance Linear Attention Kernel Library That Achieves Up to 3× Speedup on NVIDIA Hopper GPUs

The Qwen team has released FlashQLA, a high-performance linear attention kernel library that achieves up to 3x speedup on NVIDIA Hopper GPUs, enhancing both pretraining and edge-side inference.

Apr 2963