Tag
26 articles
A new review paper argues that the true power of AI agents lies in the code that surrounds language models, not just in the models themselves. Companies like DeepSeek are already incorporating this idea into their development strategies.
This article explains NVIDIA's X-Token, a novel knowledge distillation technique that improves the performance of smaller language models by addressing token misalignment issues in previous methods like GOLD. It details how projection-guided cross-tokenizer alignment enhances model compression and deployment efficiency.
This article explains how Stochastic Gradient Descent (SGD) creates a frequency bias in language models, where common words are learned better than rare ones. It shows how the Adam optimizer mitigates this bias by giving more weight to rare tokens.
Databricks integrates GPT-5.5 into enterprise agent workflows following the model's state-of-the-art performance on the OfficeQA Pro benchmark.
Learn to build an AI-native workflow system that combines data engineering, prompt engineering, and language model integration, all skills in high demand in today's job market.
Meta and Stanford researchers introduce the Fast Byte Latent Transformer, reducing inference memory bandwidth by over 50% without subword tokenization.
This explainer examines how ChatGPT's Chinese deployment exhibits systematic linguistic tics that differ from its English version, revealing important insights about multilingual LLM behavior and training data effects.
Leading AI models show starkly different responses to identical ethical dilemmas, raising concerns about the lack of universal moral frameworks in artificial intelligence.
This explainer describes how superposition helps large AI models work better by storing and connecting information in overlapping ways, making them more powerful and creative.
Learn how to improve large language models using post-training techniques like Supervised Fine-Tuning, Reward Modeling, DPO, and GRPO with the TRL library.
OpenAI advises developers to abandon outdated prompting methods for GPT-5.5 and start fresh with minimal, role-based prompts to unlock the model's full potential.
This explainer examines the tension between AI capability and control, using OpenAI's GPT-5.5 performance as a case study to understand alignment challenges in large language models.