Tag

#transformer

4 articles

Luma Labs Launches Uni-1: The Autoregressive Transformer Model that Reasons through Intentions Before Generating Images

Explains how Luma Labs' Uni-1 model introduces a reasoning phase before image generation, addressing the 'intent gap' that affects current diffusion models.

Mar 23

Math needs thinking time, everyday knowledge needs memory, and a new Transformer architecture aims to deliver both

Explains how a new Transformer architecture pairs memory for everyday knowledge with flexible thinking time for math, aiming to solve problems more efficiently than traditional models.

Mar 21

Meet Mamba-3: A New State Space Model Frontier with 2x Smaller States and Enhanced MIMO Decoding Hardware Efficiency

Learn to implement and use State Space Models with the Mamba architecture, focusing on Mamba-3's 2x smaller states and enhanced hardware efficiency.

Mar 18

Moonshot AI Releases Attention Residuals to Replace Fixed Residual Mixing with Depth-Wise Attention for Better Scaling in Transformers

This article explains how a new AI technique called Attention Residuals changes the way information flows in Transformer models, potentially making them more efficient and easier to train.

Mar 15