Fabricated Citations in Medical Research: A Reader-Friendly Summary and Critical Review

Source article: Topaz, M., Roguin, N., Gupta, P., Zhang, Z., & Peltonen, L.-M. (2026). Fabricated citations: An audit across 2·5 million biomedical papers. The Lancet, 407(10444), 1779–1780.

What the Study Is About

When scientists publish a paper, they cite earlier studies to back up what they say. Each citation is essentially a promise: “If you check this source, you will find a real study that supports my claim.” When that promise is broken — when the cited study does not actually exist — readers, peer reviewers, and the doctors or policy makers who rely on the paper have no way to check the evidence.

Topaz and colleagues (2026) call these non-existent citations “fabricated references.” They explain that fabricated citations can appear for three reasons. The first is paper mills, which are commercial operations that produce fake research papers and sell authorship slots. The second is deliberate dishonesty by authors. The third, and increasingly common, reason is the careless use of artificial intelligence writing tools such as ChatGPT or Claude. These tools sometimes invent citations that sound completely real — with believable titles, real author names, and reasonable publication dates — but point to studies that were never written. Earlier research had already shown that between 30% and 69% of citations produced by AI tools in medical writing are fake.

To find out how widespread this problem has become, the research team built an automated system that checked the references of 2,471,758 medical papers published in PubMed Central’s open-access collection between January 2023 and February 2026. This collection contained more than 125 million individual references. About 77% of those references (roughly 97 million) had a unique identification number called a PMID, which made them traceable; the remaining 23% — mainly websites, books, and informal documents — could not be checked and were left out.

For each traceable reference, the system pulled the official record from two large databases (PubMed and Crossref) and compared the title, authors, and journal name to what the paper actually claimed. When something did not match, the reference was flagged. To avoid mistakenly labelling a real but slightly miscited reference as fabricated, the team used several filters. One of these filters was an AI tool itself — Claude 3.5 Haiku — which was asked to tell the difference between a true fake and a simple formatting error (for example, a shortened title). References that survived all the filters were then searched against four major databases: PubMed (about 37 million records), Crossref (over 160 million digital identifiers), OpenAlex (over 250 million scholarly works), and Google Scholar. If a reference could not be found in any of these four sources, it was classified as fabricated. The team hand-checked 500 flagged cases using three independent reviewers and found that about 91% of the system's flags were correct.
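The paper does not release its pipeline code, but the metadata-comparison step it describes can be sketched in a few lines of Python. The field names, the similarity measure, and the 0.6 threshold below are all assumptions for illustration, not the team's actual implementation:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Crude string similarity in [0, 1]; a stand-in for whatever matcher the team used."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def flag_reference(claimed: dict, official: dict, threshold: float = 0.6) -> bool:
    """Flag a reference when the citing paper's claimed metadata disagrees with the
    official database record on title, first author, or journal (hypothetical field
    names). In the study, flagged references then went through further filters."""
    return any(
        similarity(claimed.get(field, ""), official.get(field, "")) < threshold
        for field in ("title", "first_author", "journal")
    )

# A reference whose claimed metadata matches the database record is not flagged...
record = {"title": "Quotation accuracy in medical journal articles",
          "first_author": "Jergas", "journal": "PeerJ"}
print(flag_reference(record, record))   # False
# ...while one pointing at a record that says something entirely different is.
claimed = {"title": "Novel CRISPR-based gut microbiome diagnostics",
           "first_author": "Doe", "journal": "Journal of Invented Results"}
print(flag_reference(claimed, record))  # True
```

A real implementation would also have to tolerate abbreviated journal names and truncated titles, which is exactly the kind of ambiguity the study's extra filtering layers (including the Claude 3.5 Haiku check) were meant to handle.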

The results were striking. Out of 97 million checked references, the team found 4,046 fabricated ones spread across 2,810 papers. More importantly, the rate was rising rapidly. In 2023, only about one paper in every 2,828 contained a fabricated reference. By 2025, this had climbed to one in 458, and in the first weeks of 2026, one in every 277 papers contained at least one made-up citation. Put differently, the rate of fabrication grew roughly ten-fold — from about 3.5 per 10,000 papers in 2023 to about 36 per 10,000 papers in early 2026.
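The per-10,000 figures follow directly from the "one paper in N" rates quoted above; this is pure arithmetic, using no data beyond those three numbers:

```python
# Convert the reported "one paper in N" fabrication rates to rates per 10,000 papers.
one_in_n = {"2023": 2828, "2025": 458, "early 2026": 277}

per_10k = {period: 10_000 / n for period, n in one_in_n.items()}
for period, rate in per_10k.items():
    print(f"{period}: about {rate:.1f} affected papers per 10,000")
# 2023: about 3.5 ... 2025: about 21.8 ... early 2026: about 36.1

# The overall growth factor across the window:
print(f"increase: about {one_in_n['2023'] / one_in_n['early 2026']:.1f}-fold")  # 10.2
```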

The article also highlights some unusually serious cases. One paper from 2025 on a urological surgery technique contained 18 fabricated references out of the 30 the team could check — meaning 60% of its citations pointed to studies that did not exist. Each fake reference was tailored to the paper’s specific topic and attributed to real surgeons, which made the deception hard to spot. The team also found signs of organised fraud: in one surgical journal in 2025, the same two authors appeared on 11 different papers, and these papers shared 15 fabricated references covering very different topics, from CRISPR diagnostics to gut microbiome research. Most affected papers (91%) contained only one or two fabricated citations, but 246 papers contained three or more. Review articles — papers that summarise existing research — had a fabrication rate 57% higher than other types of articles.

The authors note that the sharp increase that began in mid-2024 lines up with the timing one would expect after AI writing tools became widely available in late 2022 and 2023, given that papers typically take three to seven months from submission to publication. They acknowledge, however, that paper mills and changes in how journals are indexed could also have contributed.

Is the Method Reliable? An Honest Assessment

The study has clear strengths. The sample size is enormous and the verification approach uses four independent databases instead of relying on just one, which makes a “not found” verdict more trustworthy. The filtering process is layered carefully to avoid wrongly accusing authors. The 91% accuracy rate, confirmed by three independent human reviewers, is reasonable for an automated system of this scale. Presenting the data quarter by quarter also makes the trend easy to follow and difficult to dismiss.

That said, several limitations are important — and the authors deserve credit for openly admitting some of them. The system measures how often it is right when it flags a fake (precision), but not how many fakes it misses (recall). This means the reported numbers should be read as a floor, not a ceiling: the real fabrication rate is probably higher than what this study reports. The 23% of references that were excluded because they lacked an identifier could change the picture in either direction — fakes might cluster in those informal sources, or fraudsters might deliberately use traceable identifiers to look legitimate. The PubMed Central open-access collection also leaves out subscription-only journals and most non-English regional publications, so the findings cannot be generalised to the entire medical literature without caution.
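The precision-versus-recall point can be made concrete with a toy calculation. The 455 figure below is illustrative, chosen only to match the reported ~91% precision; the study does not report it directly:

```python
# Precision: of the references the system flagged as fabricated, how many really were?
hand_checked_flags = 500   # the study's manual validation sample
confirmed_fakes = 455      # illustrative: ~91% of the checked flags held up
precision = confirmed_fakes / hand_checked_flags
print(f"precision ~ {precision:.2f}")  # 0.91

# Recall would require a number the study cannot observe: fakes the system never flagged.
# recall = confirmed_fakes / (confirmed_fakes + missed_fakes)   # missed_fakes unknown
# With recall unmeasured, the 4,046 total is best read as a lower bound.
```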

Playing Devil’s Advocate

A more sceptical reading would raise several further concerns.

The most uncomfortable point is the circularity of using one AI tool to detect the failures of other AI tools. Claude 3.5 Haiku was used to filter out false positives, and the authors also acknowledge that they used Claude in writing the paper itself. Whether the same model can reliably catch the hallucinations of other AI systems — ChatGPT, Gemini, Llama, and so on — is not directly tested. The model was applied without any fine-tuning, which means its blind spots may have shaped the results in ways no one can fully see.

The link between AI adoption and the rise in fake citations is suggestive but not proven. The authors describe a striking time match between when AI tools became popular and when fabrication rates began to climb. But correlation is not causation. The same period saw growth in paper mill operations in several countries, changes in how PubMed indexes journals, and a major expansion in open-access mega-journals, all of which could push the numbers up independently of AI use. The study does not statistically separate these explanations, which means readers should be cautious about treating “AI tools caused this” as a settled conclusion.

The headline figure — a roughly ten-fold increase — is rhetorically powerful but worth putting in context. Even at the highest rate observed, only about 0.36% of papers (one in 277) contained a fabricated citation. For comparison, an earlier systematic review by Jergas and Baethge (2015) found that roughly one in four citations in medical journal articles contained errors of some kind. Reference inaccuracy is a long-standing, pre-AI problem; fabrication is a new and disturbing layer on top of it, but it is not yet the dominant form of citation failure.

The 2026 data point also deserves a note of caution. It covers only seven weeks, not a full quarter, yet appears on the chart alongside complete quarters. Although the authors mark it with an open symbol, readers scanning the figure may still treat it as a finished data point. Submission patterns vary across the year — January often sees a wave of submissions and review articles — so a short window can overstate or understate the underlying trend.

A small number of extreme cases also do a lot of heavy lifting in the paper’s narrative. The 60%-fabricated urology paper and the eleven coordinated papers in one surgical journal are vivid examples, but they represent a tiny fraction of the affected papers. The study does not report how much these outlier cases pull the overall rate upward, so we cannot tell whether the broader literature has a real, evenly spread problem or whether a few coordinated fraud rings are dominating the statistics.

Finally, the rule “if no database contains the reference, it must be fake” is reasonable but not airtight. Indexing takes time, especially for very recent papers, and several legitimate publication types — small regional journals, non-English sources, conference proceedings, and journals that have since closed — are under-represented in the four databases used. Some of the 4,046 references labelled as fabricated may simply be real but invisible.

None of these objections overturns the study’s core message. The trend is real, and the practical recommendations (automatic reference checking before peer review, integrity metadata in indexing, retrospective screening of existing papers, a formal category for fabricated references in research integrity databases) are sensible regardless of how much of the rise is attributable to AI specifically. But the implied story that “AI tools are flooding medicine with fake citations” is stronger than the evidence presented can fully support — and that distinction matters when the goal is to design policy responses that actually fix the right problem.

References (APA 7)

Jergas, H., & Baethge, C. (2015). Quotation accuracy in medical journal articles—A systematic review and meta-analysis. PeerJ, 3, e1364. https://doi.org/10.7717/peerj.1364

Topaz, M., Roguin, N., Gupta, P., Zhang, Z., & Peltonen, L.-M. (2026). Fabricated citations: An audit across 2·5 million biomedical papers. The Lancet, 407(10444), 1779–1780. https://doi.org/10.1016/S0140-6736(26)00603-3
