Xiaomi's MiMo team, with TileRT, has achieved more than 1,000 tokens per second on a 1-trillion-parameter model using a single 8-GPU commodity node, a significant step up in LLM inference performance.