Introduction
A troubling trend has emerged in AI research: fake references, also called hallucinated citations, are slipping through peer review at top AI conferences. These citations are generated by large language models (LLMs) such as GPT, Gemini, and Claude, and they can mislead researchers and readers. A new open-source tool called CiteAudit promises to help identify them. In this beginner-friendly tutorial, you'll learn how to set up CiteAudit and use it to check scientific papers for hallucinated references.
Prerequisites
Before diving into CiteAudit, make sure you have the following:
- A computer with internet access
- Basic familiarity with using a terminal or command line
- Python 3.7 or higher installed on your system
- Git installed for cloning the repository
Why these prerequisites? Python is needed to run the tool's code, and Git helps you get the latest version of CiteAudit from its repository. A terminal is necessary to execute commands.
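If you are unsure whether your machine meets these requirements, a short script can check for you. The following is a minimal sketch that is not part of CiteAudit: it compares the running interpreter against the 3.7 minimum listed above and looks for a git executable on your PATH. The function name check_prerequisites is made up for this example.

```python
import shutil
import sys

MIN_PYTHON = (3, 7)  # minimum version listed in the prerequisites


def check_prerequisites():
    """Return a dict mapping each prerequisite to True (met) or False."""
    return {
        # Compare only (major, minor) against the required minimum.
        "python": sys.version_info[:2] >= MIN_PYTHON,
        # shutil.which returns None when no executable is found on PATH.
        "git": shutil.which("git") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'OK' if ok else 'MISSING'}")
```

Save it under any name you like and run it with python; every line should report OK before you continue.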
Step-by-Step Instructions
1. Clone the CiteAudit Repository
The first step is to get the CiteAudit tool onto your computer. Open your terminal and run the following command:
git clone https://github.com/citeaudit/citeaudit.git
This command downloads all the files from the CiteAudit GitHub repository to your local machine.
2. Navigate to the CiteAudit Directory
After cloning, you need to move into the CiteAudit directory:
cd citeaudit
This command changes your current working directory to the CiteAudit folder, where you'll find all the necessary files to run the tool.
3. Install Required Python Packages
CiteAudit depends on several Python libraries. Install them using pip, Python's package installer:
pip install -r requirements.txt
This command reads the requirements.txt file and installs all the necessary packages, such as requests, beautifulsoup4, and pydantic.
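To confirm the installation worked, you can ask Python whether the packages are importable. The sketch below assumes the three packages named above are the ones you care about; the real requirements.txt may list others, so adjust the list to match. Note that the beautifulsoup4 package is imported under the name bs4. The helper missing_modules is hypothetical and not part of CiteAudit.

```python
from importlib.util import find_spec

# Import names for the packages mentioned above (an assumption; check
# requirements.txt for the full list). beautifulsoup4 is imported as "bs4".
REQUIRED_MODULES = ["requests", "bs4", "pydantic"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the top-level module names that Python cannot locate."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules are importable.")
```

If anything is reported missing, rerun the pip command from inside the citeaudit directory.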
4. Prepare a Sample Scientific Paper
To test CiteAudit, you'll need a scientific paper with potential hallucinated citations. For this tutorial, create a simple text file named sample_paper.txt with the following content:
AI Research in 2025
Artificial intelligence has made significant strides in recent years. According to a study by Smith (2023), machine learning models can now process over 10,000 images per second. Another researcher, Johnson (2022), claims that neural networks are capable of understanding complex language patterns. However, a paper by Brown (2025) states that these models are not yet reliable for medical diagnosis.
References:
1. Smith, J. (2023). "Machine Learning Advances." Journal of AI, 12(3), 45-60.
2. Johnson, A. (2022). "Neural Networks and Language." AI Review, 9(2), 22-35.
3. Brown, T. (2025). "Medical Diagnosis with AI." Future Medicine, 7(4), 88-95.
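The references in this sample are placeholders written for the tutorial, so do not treat any of them as a citation to a real publication. The third reference (Brown, 2025) is the one the tutorial treats as fake, on the grounds that it carries a future publication year. If 2025 is no longer in the future when you read this, change that year to a later one so the example still works. Keep in mind that a future date is only one warning sign: a citation with a plausible year can still be hallucinated.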
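Before a reference can be checked, it has to be split into fields such as author, year, and title. The sketch below shows one way to do that for the exact reference format used in the sample paper. It is an illustration only: the Reference type, the regular expression, and parse_references are assumptions made for this tutorial and say nothing about how CiteAudit parses references internally. Real reference lists vary far more than this pattern allows.

```python
import re
from typing import List, NamedTuple


class Reference(NamedTuple):
    author: str
    year: int
    title: str
    venue: str


# Matches lines shaped like:
#   1. Smith, J. (2023). "Machine Learning Advances." Journal of AI, 12(3), 45-60.
REF_PATTERN = re.compile(
    r'^\s*\d+\.\s*'               # list number, e.g. "1."
    r'(?P<author>[^()]+?)\s*'     # author text up to the opening parenthesis
    r'\((?P<year>\d{4})\)\.\s*'   # four-digit year in parentheses
    r'"(?P<title>[^"]+)"\s*'      # quoted title
    r'(?P<venue>[^,]+),'          # venue, up to the first comma
)

SAMPLE_REFERENCES = '''\
1. Smith, J. (2023). "Machine Learning Advances." Journal of AI, 12(3), 45-60.
2. Johnson, A. (2022). "Neural Networks and Language." AI Review, 9(2), 22-35.
3. Brown, T. (2025). "Medical Diagnosis with AI." Future Medicine, 7(4), 88-95.
'''


def parse_references(text: str) -> List[Reference]:
    """Parse every line that matches REF_PATTERN; skip lines that do not."""
    parsed = []
    for line in text.splitlines():
        match = REF_PATTERN.match(line)
        if match:
            parsed.append(Reference(
                author=match["author"].strip(),
                year=int(match["year"]),
                title=match["title"].rstrip("."),  # drop the period inside the quotes
                venue=match["venue"].strip(),
            ))
    return parsed


if __name__ == "__main__":
    for ref in parse_references(SAMPLE_REFERENCES):
        print(ref)
```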
5. Run CiteAudit on Your Paper
With your paper ready, you can now run CiteAudit to analyze it. Use the following command:
python citeaudit.py sample_paper.txt
This command tells CiteAudit to analyze the sample_paper.txt file. The tool will scan the references and cross-check them against known databases to detect any fake citations.
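To make the idea of cross-checking concrete, here is a minimal sketch of how one reference could be looked up in one database. It uses the public Crossref REST API and its query.bibliographic parameter, then compares the returned titles with the cited title using Python's difflib. This is an assumption about how such a check might work, not a description of CiteAudit's implementation. The function names are hypothetical, and any cutoff you apply to the similarity score is your own choice.

```python
import json
import urllib.parse
import urllib.request
from difflib import SequenceMatcher

CROSSREF_WORKS = "https://api.crossref.org/works"


def build_query_url(citation, rows=3):
    """Build a Crossref search URL for a free-text citation string."""
    params = urllib.parse.urlencode(
        {"query.bibliographic": citation, "rows": rows}
    )
    return f"{CROSSREF_WORKS}?{params}"


def fetch_titles(citation, timeout=10):
    """Network call: return the titles Crossref lists for the citation."""
    with urllib.request.urlopen(build_query_url(citation), timeout=timeout) as resp:
        payload = json.load(resp)
    titles = []
    for item in payload["message"]["items"]:
        titles.extend(item.get("title", []))  # "title" is a list of strings
    return titles


def best_similarity(title, candidates):
    """Highest case-insensitive similarity ratio (0.0 to 1.0) to any candidate."""
    return max(
        (SequenceMatcher(None, title.lower(), c.lower()).ratio() for c in candidates),
        default=0.0,  # no candidates means no evidence the title exists
    )
```

A typical use would be best_similarity(cited_title, fetch_titles(full_citation)). A low score means only that this one database returned no close match, which makes the reference worth checking by hand. It does not prove the reference is fake.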
6. Review the Output
After running the tool, CiteAudit will output a report. It will list any references that it suspects are hallucinated. For example, the output might look like this:
Detected hallucinated references:
- Brown, T. (2025). "Medical Diagnosis with AI." Future Medicine, 7(4), 88-95.
This reference appears to be from a future year and is likely hallucinated.
Why this matters: Detecting hallucinated references is crucial for maintaining the integrity of scientific research. If fake citations are not caught, they can mislead future researchers and distort the academic record.
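The sample report above explains its flag by pointing to the publication year. A check of that kind is simple to write, as the sketch below shows. It is a hypothetical illustration rather than CiteAudit's own logic: it looks for a four-digit year in parentheses and flags any reference dated later than the current year. The today parameter exists only so the result can be reproduced on a fixed date.

```python
import re
from datetime import date

# A four-digit year in parentheses, e.g. "(2025)". Issue numbers such as
# "12(3)" are not matched because they are not four digits long.
YEAR_IN_PARENS = re.compile(r"\((\d{4})\)")


def future_dated(references, today=None):
    """Return the references whose year is later than the current year."""
    current_year = (today or date.today()).year
    flagged = []
    for ref in references:
        match = YEAR_IN_PARENS.search(ref)
        if match and int(match.group(1)) > current_year:
            flagged.append(ref)
    return flagged


if __name__ == "__main__":
    refs = [
        'Smith, J. (2023). "Machine Learning Advances." Journal of AI, 12(3), 45-60.',
        'Brown, T. (2025). "Medical Diagnosis with AI." Future Medicine, 7(4), 88-95.',
    ]
    # Pin the date so the example behaves the same whenever it is run.
    for ref in future_dated(refs, today=date(2024, 1, 1)):
        print("Future-dated:", ref)
```

Passing this check says nothing about whether a reference is real, so treat it as a cheap first filter and not as verification.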
7. (Optional) Customize CiteAudit for Your Needs
If you want to tweak CiteAudit's behavior, you can edit the configuration file. Look for a file named config.yaml and modify settings such as the databases used for verification or the confidence threshold for flagging references.
For example, you might want to add more databases to cross-check against:
database_sources:
- semantic_scholar
- crossref
- pubmed
- google_scholar
This step allows you to tailor the tool to your specific research needs.
Summary
In this tutorial, you've learned how to use the open-source tool CiteAudit to detect hallucinated references in scientific papers. You cloned the repository, installed its dependencies, prepared a sample paper, ran the analysis, and reviewed the report. By identifying fake citations, CiteAudit helps the research community trust the references in academic work. As AI continues to influence scientific publishing, tools like CiteAudit will become increasingly important for maintaining research integrity.
Next steps: Try using CiteAudit on real research papers from AI conferences or journals. The more you practice, the better you'll become at spotting potential issues in citations.
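Once you have several papers saved as text files, you may want to run the tool on all of them in one go. The sketch below wraps the same command used in step 5 (python citeaudit.py followed by a file name) in a loop. It assumes you run it from inside the citeaudit directory and that the tool prints its report to standard output; both are assumptions, and the function names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def find_papers(folder, pattern="*.txt"):
    """Return the matching files in `folder`, sorted by path."""
    return sorted(Path(folder).glob(pattern))


def build_command(paper, script="citeaudit.py"):
    """Build the step 5 command, using the current Python interpreter."""
    return [sys.executable, script, str(paper)]


def audit_folder(folder):
    """Run the tool on each paper and return {file name: captured output}."""
    reports = {}
    for paper in find_papers(folder):
        result = subprocess.run(
            build_command(paper), capture_output=True, text=True
        )
        reports[paper.name] = result.stdout
    return reports
```

Calling audit_folder("papers") would return one report per text file in a folder named papers, which you can then print or save.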



