⚠️ Fake Science? AI's Shocking Citation Scam 🤯

May 21, 2026 | AI


🧠Quick Intel


  • 146,900 AI-generated fake citations were identified across arXiv, bioRxiv, SSRN, and PubMed Central.
  • Researchers analyzed 111 million references from 2.5 million scientific papers to detect these fabricated citations.
  • Large language models (LLMs) like Gemini and ChatGPT are producing “hallucinations”—plausible-sounding but incorrect information.
  • The prevalence of non-existent references increased sharply following widespread LLM adoption.
  • arXiv will ban authors submitting work with hallucinated citations or unverified AI content.
  • Usha Haley warns that the issue represents a serious threat to the foundation of peer review and cumulative knowledge within academia.
  • Steinn Sigurdsson, arXiv’s scientific director, stated that AI-generated content is “either actively wrong or just noise,” diluting the scientific corpus.
📝Summary


    Researchers at Cornell and UCLA identified 146,900 AI-generated fake citations across four major research databases – arXiv, bioRxiv, SSRN, and PubMed Central. Analyzing 111 million references from 2.5 million papers, the study revealed a sharp increase in non-existent citations following the widespread adoption of large language models like Gemini and ChatGPT. These “hallucinations,” as they’re known, were distributed across numerous papers. Usha Haley, a Wichita State University professor, highlighted the serious implications for the foundation of scientific knowledge. arXiv has responded by announcing a ban on authors submitting work with unverified AI-generated citations, acknowledging the risk of “diluted” scientific content.

    💡Insights



    THE GROWING THREAT OF AI-GENERATED CITATIONS
    The burgeoning use of large language models (LLMs) like Gemini and ChatGPT presents a significant and unsettling challenge to the integrity of scientific research. A recent study conducted by researchers from Cornell and UCLA revealed the alarming presence of 146,900 AI-generated fake citations across four major research databases – arXiv, bioRxiv, SSRN, and PubMed Central. This discovery underscores a critical limitation of LLMs: their tendency to produce seemingly plausible but fundamentally inaccurate information, a phenomenon known as “hallucination.” The potential for researchers to unknowingly incorporate fabricated references into their work represents a serious erosion of trust within the scientific community and poses a considerable risk to the validity of research findings.

    METHODOLOGY AND FINDINGS OF THE RESEARCH
    The research team analyzed 111 million references extracted from 2.5 million scientific papers, looking for citations whose titles could not be matched to any existing publication. While some mismatches were attributable to simple spelling errors, the study uncovered a substantial number of “hallucinated” citations – entirely fabricated references. The researchers also compared the rate of non-existent references before and after the widespread adoption of chatbots and found a sharp increase following the rise of LLMs. This pattern suggests that unmatched references are not solely a product of chatbots: they existed before, but easy access to LLMs has made the problem markedly worse. Furthermore, the fabricated citations were spread across numerous papers rather than concentrated in a few, suggesting a broad reliance on AI-generated references without adequate verification.
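
    The article does not describe how the study actually matched titles, so the following is only a minimal sketch of the general idea: compare a cited title against a list of known titles, and separate exact matches, near misses that look like spelling errors, and titles that match nothing. The function names, the sample titles, and the 0.9 similarity threshold are all illustrative assumptions, not details from the study.

```python
import re
from difflib import SequenceMatcher


def normalize_title(title: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", title.lower())
    return " ".join(cleaned.split())


def classify_reference(cited_title, known_titles, typo_threshold=0.9):
    """Classify a cited title against a list of known publication titles.

    Returns "matched" for an exact match after normalization,
    "likely_typo" for a close but inexact match, and "unmatched"
    when no known title is similar enough. The threshold is an
    assumed value chosen for illustration only.
    """
    cited = normalize_title(cited_title)
    best_score = 0.0
    for known in known_titles:
        candidate = normalize_title(known)
        if candidate == cited:
            return "matched"
        score = SequenceMatcher(None, cited, candidate).ratio()
        best_score = max(best_score, score)
    if best_score >= typo_threshold:
        return "likely_typo"
    return "unmatched"


# Hypothetical index of real titles; a real pipeline would query a
# bibliographic database rather than an in-memory list.
known = [
    "Attention Is All You Need",
    "Deep Residual Learning for Image Recognition",
]

print(classify_reference("Attention is all you need.", known))  # matched
print(classify_reference("Atention Is All You Need", known))    # likely_typo
print(classify_reference(
    "Quantum Gravity Effects in Sourdough Fermentation", known))  # unmatched
```

    An "unmatched" result in a sketch like this is only a candidate for a fabricated citation. The known-title list may simply be incomplete, so a flagged reference would still need checking against a fuller index or by hand.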

    IMPLICATIONS AND RESPONSES
    The proliferation of AI-generated citations represents a serious threat to the scholarly record and the foundations of scientific knowledge. Usha Haley, a professor of management at Wichita State University, describes the situation as a “serious warning,” highlighting how these fabricated citations undermine trust in the peer-review process and cumulative knowledge. The problem is particularly concerning for early-career scholars who are increasingly reliant on AI tools. Recognizing the severity of the issue, the scientific repository arXiv has announced that it will ban authors who submit work containing hallucinated citations or showing signs of unchecked AI content. arXiv’s scientific director, Steinn Sigurdsson, emphasizes the need to address the “noise” generated by AI, arguing that it hinders the ability to identify genuine scientific advancements. The organization’s efforts represent a crucial step towards safeguarding the integrity of the scientific record in the age of AI.