Tag

#language models

26 articles

New review paper argues code is how AI agents think and act, not just what they produce

A new review paper argues that the true power of AI agents lies in the code that surrounds language models, not just in the models themselves. Companies like DeepSeek are already building this idea into their development strategies.

May 29

NVIDIA Introduces X-Token: Projection-Guided Cross-Tokenizer KD That Outperforms GOLD by +3.82 Average Points on Llama-3.2-1B

This article explains NVIDIA's X-Token, a novel knowledge distillation technique that improves the performance of smaller language models by addressing token misalignment issues in previous methods like GOLD. It details how projection-guided cross-tokenizer alignment enhances model compression and deployment efficiency.

May 29
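
For context on the X-Token entry above, the sketch below shows ordinary logit-level knowledge distillation, which compares teacher and student distributions position by position. It is a minimal, library-free sketch that assumes both models share one tokenizer and vocabulary; it does not implement X-Token's projection-guided alignment or GOLD, whose purpose is to handle teachers and students whose tokenizers differ. The function names and example logits are hypothetical.

```python
import math

def softmax(logits: list[float], temperature: float) -> list[float]:
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(teacher_logits: list[list[float]], student_logits: list[list[float]],
            temperature: float = 2.0) -> float:
    """Mean per-position KL(teacher || student) on temperature-softened distributions.

    Assumes one row of logits per token position over the SAME vocabulary,
    i.e. a shared tokenizer. This breaks when the two tokenizers split the
    text differently, which is the problem cross-tokenizer methods address.
    """
    if len(teacher_logits) != len(student_logits):
        raise ValueError("sequences are not token-aligned")
    total = 0.0
    for t_row, s_row in zip(teacher_logits, student_logits):
        if len(t_row) != len(s_row):
            raise ValueError("vocabularies differ")
        p = softmax(t_row, temperature)
        q = softmax(s_row, temperature)
        total += sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    # Conventional T**2 scaling keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * total / len(teacher_logits)

# Hypothetical logits: 2 positions over a 3-token vocabulary.
teacher = [[2.0, 0.5, -1.0], [0.0, 1.5, 0.2]]
uniform_student = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(kd_loss(teacher, teacher))          # identical distributions
print(kd_loss(teacher, uniform_student))  # student that ignores the teacher
```

The loss is zero when the student matches the teacher and positive otherwise; the explicit alignment checks make visible the assumption that cross-tokenizer distillation has to remove.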

Stochastic Gradient Descent's (SGD's) Frequency Bias and How Adam Fixes It

This article explains how Stochastic Gradient Descent (SGD) creates a frequency bias in language models, where common words are learned better than rare ones. It shows how the Adam optimizer reduces this bias by giving rare tokens relatively larger updates.

May 18
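
The frequency-bias argument in the SGD/Adam entry above can be illustrated with a toy experiment. The following is a minimal sketch, not the article's experiment: it assumes a single scalar embedding weight per token and a hypothetical constant gradient whenever the token appears, and compares how far plain SGD and dense Adam move a token seen every step versus one seen once every 100 steps. The `train` function and its parameters are illustrative names.

```python
import math

def train(optimizer: str, period: int, steps: int = 10_000, lr: float = 1e-3) -> float:
    """Distance one embedding weight travels when its token appears every `period` steps.

    Hypothetical toy loss: the gradient is -1.0 whenever the token is in the batch
    and exactly 0.0 otherwise, as for an embedding row that is not looked up.
    """
    w, m, v = 0.0, 0.0, 0.0  # weight, Adam first moment, Adam second moment
    beta1, beta2, eps = 0.9, 0.999, 1e-8
    for t in range(steps):
        g = -1.0 if t % period == 0 else 0.0
        if optimizer == "sgd":
            # Plain SGD: the weight only moves on the steps where the token occurs.
            w -= lr * g
        elif optimizer == "adam":
            # Dense Adam: both moments decay every step, even when g == 0.
            m = beta1 * m + (1 - beta1) * g
            v = beta2 * v + (1 - beta2) * g * g
            m_hat = m / (1 - beta1 ** (t + 1))
            v_hat = v / (1 - beta2 ** (t + 1))
            # A rare token has a small v_hat, so its steps are scaled up.
            w -= lr * m_hat / (math.sqrt(v_hat) + eps)
        else:
            raise ValueError(f"unknown optimizer: {optimizer}")
    return w

for opt in ("sgd", "adam"):
    gap = train(opt, period=1) / train(opt, period=100)
    print(f"{opt}: the every-step token moved {gap:.1f}x farther than the 1-in-100 token")
```

Under these assumptions the SGD gap equals the frequency ratio (100x), while Adam's gap comes out near its square root, because each Adam step is divided by roughly the root-mean-square of that weight's recent gradients.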

Databricks brings GPT-5.5 to enterprise agent workflows

Databricks integrates GPT-5.5 into enterprise agent workflows following the model's state-of-the-art performance on the OfficeQA Pro benchmark.

May 15

GM just laid off hundreds of IT workers to hire those with stronger AI skills

Learn to build an AI-native workflow system that combines data engineering, prompt engineering, and language model integration, all skills in high demand in today's job market.

May 11

Meta and Stanford Researchers Propose Fast Byte Latent Transformer That Reduces Inference Memory Bandwidth by Over 50% Without Tokenization

Meta and Stanford researchers introduce the Fast Byte Latent Transformer, reducing inference memory bandwidth by over 50% without subword tokenization.

May 11

ChatGPT Has 'Goblin' Mania in the US. In China It Will 'Catch You Steadily'

This explainer examines how ChatGPT's Chinese deployment exhibits systematic linguistic tics that differ from those of its English version, revealing important insights about multilingual LLM behavior and training data effects.

May 7

Same prompt, different morals: how frontier AI models diverge on ethical dilemmas

Leading AI models show starkly different responses to identical ethical dilemmas, raising concerns about the lack of universal moral frameworks in artificial intelligence.

May 2

MIT study explains why scaling language models works so reliably

This explainer describes how superposition helps large AI models work better by storing and connecting information in overlapping ways, making them more powerful and creative.

May 2

A Coding Guide to LLM Post-Training with TRL: From Supervised Fine-Tuning to DPO and GRPO Reasoning

Learn how to improve large language models using post-training techniques like Supervised Fine-Tuning, Reward Modeling, DPO, and GRPO with the TRL library.

May 1
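
The DPO and GRPO stages mentioned in the TRL entry above rest on two small pieces of arithmetic: DPO's sigmoid loss on a log-probability margin, and GRPO's group-relative advantage. The sketch below reproduces them in plain Python as an illustration only; it is not TRL's implementation or API, and the function names, the default `beta = 0.1`, the `eps` term, and the example log-probabilities and rewards are assumptions chosen for demonstration.

```python
import math
import statistics

def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Sigmoid DPO loss for one preference pair.

    Inputs are summed log-probabilities of the chosen and rejected completions
    under the policy being trained and under the frozen reference model.
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    # -log(sigmoid(beta * margin)) == softplus(-beta * margin)
    return softplus(-beta * margin)

def grpo_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Group-relative advantages: each completion's reward vs. its group's mean and spread."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; eps guards all-equal rewards
    return [(r - mean) / (std + eps) for r in rewards]

# Policy identical to the reference: margin 0, loss = log 2.
print(dpo_loss(-12.0, -15.0, -12.0, -15.0))
# Policy favors the chosen answer more than the reference does: loss < log 2.
print(dpo_loss(-10.0, -16.0, -12.0, -15.0))
# Four sampled answers to one prompt, rewarded 1.0 if correct else 0.0.
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))
```

A zero margin gives a loss of log 2, and the loss falls as the policy prefers the chosen completion more strongly than the reference does. The GRPO advantages are zero-mean within each group, so correct answers are pushed up and incorrect ones down relative to their siblings, without a separate value model.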

OpenAI says old prompts are holding GPT-5.5 back and developers need a fresh baseline

OpenAI advises developers to abandon outdated prompting methods for GPT-5.5 and start fresh with minimal, role-based prompts to unlock the model's full potential.

Apr 26

I put GPT-5.5 through a 10-round test: It scored 93/100, losing points only for exuberance

This explainer examines the tension between AI capability and control, using OpenAI's GPT-5.5 performance as a case study to understand alignment challenges in large language models.

Apr 24