Artificial intelligence models are increasingly capable of generating accurate responses to complex queries, but a new study reveals a concerning flaw: even when they get the answer right, they often point to incorrect sources. Researchers at Peking University have identified this issue as "attribution hallucination," a phenomenon where AI systems cite text passages that don't actually support their conclusions.
Identifying the Problem
The problem becomes especially critical in regulated fields such as law and medicine, where the accuracy of cited sources is paramount. The researchers introduced a new benchmark called CiteVQA to systematically evaluate AI models for this specific flaw. Using this tool, they found that even top-tier models like GPT and Gemini frequently produce correct answers but back them up with inaccurate citations.
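To make the distinction concrete, here is a minimal sketch of how answer correctness and citation correctness can be scored separately. It is not CiteVQA's actual metric, which the article does not describe. The names `Prediction`, `Reference`, `score`, and `attribution_hallucination_rate` are hypothetical, and the exact-match answer check and "all cited passages must be supporting passages" rule are simplifying assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Prediction:
    """A model output: the answer plus the passage IDs it cites (hypothetical schema)."""
    answer: str
    cited_ids: set[str]


@dataclass
class Reference:
    """Gold data: the correct answer and the passage IDs that actually support it."""
    answer: str
    supporting_ids: set[str]


def score(pred: Prediction, ref: Reference) -> tuple[bool, bool]:
    """Return (answer_ok, citation_ok) as two independent judgments."""
    # Assumption: case-insensitive exact match stands in for answer grading.
    answer_ok = pred.answer.strip().lower() == ref.answer.strip().lower()
    # Assumption: a citation is correct only if at least one passage is cited
    # and every cited passage is among the supporting passages.
    citation_ok = bool(pred.cited_ids) and pred.cited_ids <= ref.supporting_ids
    return answer_ok, citation_ok


def attribution_hallucination_rate(pairs: list[tuple[Prediction, Reference]]) -> float:
    """Share of correctly answered items whose citations are wrong."""
    scored = [score(pred, ref) for pred, ref in pairs]
    correct = [citation_ok for answer_ok, citation_ok in scored if answer_ok]
    if not correct:
        return 0.0
    return sum(1 for citation_ok in correct if not citation_ok) / len(correct)


pairs = [
    (Prediction("Paris", {"p1"}), Reference("Paris", {"p1", "p2"})),  # right answer, right source
    (Prediction("Paris", {"p9"}), Reference("Paris", {"p1", "p2"})),  # right answer, wrong source
    (Prediction("Lyon", {"p9"}), Reference("Paris", {"p1", "p2"})),   # wrong answer, ignored
]
rate = attribution_hallucination_rate(pairs)
```

A plain accuracy test would report two of the three toy answers as correct and stop there. Scoring citations as a second, independent signal is what exposes the middle case, where the answer is right but the cited passage does not support it.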
Implications for AI Reliability
This discovery underscores a growing concern in the AI community: while large language models excel at generating plausible content, their ability to accurately attribute information remains unreliable. "This is a major risk for high-stakes applications," said one of the study's lead researchers. The flaw could lead to misinformation in critical domains, undermining trust in AI systems. The CiteVQA benchmark now offers a standardized way to test for this issue, potentially helping developers improve the reliability of AI outputs.
Looking Forward
As AI becomes more embedded in professional and academic environments, ensuring the integrity of cited sources is essential. The findings from Peking University highlight the need for evaluation tools and methods that go beyond simple accuracy tests and check whether each cited source actually supports the answer. Addressing attribution hallucination is not just a technical challenge; it is a step toward building more trustworthy AI systems.



