Tag

#evaluation

3 articles

Arena, the AI leaderboard everyone uses, just became a 100 million dollar business

Arena, the AI leaderboard platform that started as a UC Berkeley research project, has reached $100 million in annualized revenue in just eight months.

Jun 2935

Build a Complete Langfuse Observability and Evaluation Pipeline for Tracing, Prompt Management, Scoring, and Experiments

This article explains how to build a complete Langfuse observability and evaluation pipeline for LLM development, covering tracing, prompt management, scoring, and experimentation.

May 2464

AI agent benchmarks obsess over coding while ignoring 92% of the US labor market, study finds

This article explains how current AI agent benchmarks focus narrowly on coding tasks, ignoring 92% of the US labor market, and why this limits the real-world applicability of AI systems.

Mar 7