The Qwen team has released FlashQLA, a high-performance linear attention kernel library that achieves up to a 3x speedup on NVIDIA Hopper GPUs, improving both pretraining and edge-side inference.