Tag

#mixture-of-experts

5 articles

Tencent Releases Hy3: An Open 295B Mixture-of-Experts (MoE) Model with 21B Active Parameters and 256K Context

Learn how to access and use Tencent's Hy3 Mixture-of-Experts model through OpenRouter, including understanding MoE architecture, using long-context capabilities, and experimenting with reasoning tasks.

Jul 6

Tencent releases Hy3 open-source model that allegedly matches models up to five times its active size

Learn how to work with mixture-of-experts (MoE) language models like Tencent's Hy3 using Hugging Face's Transformers library. This beginner-friendly tutorial teaches you to load, tokenize, and generate text with MoE models.

Jul 6

Meet ‘North Mini Code’: Cohere’s 30B Open-Weight Mixture-of-Experts Model With 3B Active Parameters for Agentic Coding

Learn how Cohere's North Mini Code uses a mixture-of-experts architecture to enable efficient, large-scale coding assistance with 30B total parameters, of which 3B are active.

Jun 10

Cohere Releases Command A+: A 218B Sparse MoE Model for Agentic Workflows That Runs on as Few as Two H100 GPUs

Learn to deploy and use Cohere's Command A+, a 218B-parameter model for agentic workflows, optimized to run on as few as two H100 GPUs with W4A4 quantization.

May 21

Researchers train AI model that hits near-full performance with just 12.5 percent of its experts

Researchers at the Allen Institute for AI and UC Berkeley have developed EMO, a mixture-of-experts model that maintains near-full performance using only 12.5% of its experts, making it more practical for memory-constrained settings.

May 15