Learn to build an AI interpretability tool that analyzes how language models make decisions by examining attention patterns and gradients, following principles discussed by Anthropic's Chris Olah.