Artificial intelligence (AI) is reshaping healthcare by enabling more accurate and individualized clinical predictions. Yet, amid the excitement over technological advancement, one fundamental issue remains surprisingly neglected: sample size. In their landmark Viewpoint published in The Lancet Digital Health, Riley et al. (2025) shed light on how inadequate sample sizes compromise the reliability, fairness, and clinical utility of AI-based prediction models.
Key Message: Bigger Isn’t Always Better—But Bigger Is Often Necessary
Contrary to the prevailing assumption that AI can “learn” efficiently even with modest data, the authors argue that insufficient sample sizes lead to:
- Unrepresentative Data: Small datasets fail to capture the diversity of the target population. As a result, prediction models become biased, especially for under-represented groups, which threatens health equity.
- Unstable Predictor Effects: Small samples lead to inconsistent selection of predictors and volatile coefficient estimates, making model explanations unreliable and impeding trust.
- Prediction Instability: Using thousands of simulations, the authors show that small sample sizes result in wildly fluctuating risk predictions—for the same individual.
- Weaker Discrimination and Miscalibration: Performance measures such as the c-statistic (discrimination) and the calibration slope deteriorate markedly as datasets shrink. When clinical decisions depend on AI outputs, that degradation can translate into real harm.
- Lower Clinical Utility: Poorly performing models are not just theoretical problems—they can lead to wrong treatments or missed diagnoses in real patients.
- False Confidence in Validation: The paper presents multiple examples where small test datasets led to exaggerated or misleading claims about model performance, such as unrealistically perfect c-statistics.
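The prediction-instability point can be illustrated with a toy simulation (this is not the authors' own code, and the "true" risk model below is an assumption chosen for the example): repeatedly draw a development dataset of a given size from one fixed logistic risk model, refit the model, and record the predicted risk for the same individual each time. With small n the prediction swings widely; with large n it stabilizes.

```python
import math
import random
import statistics

def inv_logit(z):
    """Numerically safe inverse logit."""
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, iters=12):
    """Fit a univariable logistic regression (intercept + slope)
    by Newton-Raphson; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = inv_logit(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p            # gradient terms
            g1 += (y - p) * x
            h00 += w               # Fisher information terms
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if abs(det) < 1e-12:       # guard against (quasi-)separation
            break
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

TRUE_B0, TRUE_B1 = -1.0, 1.0   # assumed "true" risk model for this toy example

def predicted_risk(n, rng):
    """Draw one development dataset of size n, fit a model, and return
    the predicted risk for one fixed individual with x = 1."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [1 if rng.random() < inv_logit(TRUE_B0 + TRUE_B1 * x) else 0 for x in xs]
    b0, b1 = fit_logistic(xs, ys)
    return inv_logit(b0 + b1 * 1.0)

rng = random.Random(1)
spread = {}
for n in (50, 2000):
    risks = [predicted_risk(n, rng) for _ in range(200)]
    spread[n] = statistics.stdev(risks)
    print(f"n={n:4d}: predicted risk for the same individual ranges "
          f"{min(risks):.2f}-{max(risks):.2f} (sd {spread[n]:.3f})")
```

The same individual's predicted risk varies far more across small development datasets than across large ones, which is the instability the authors demonstrate at much larger scale.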
Solutions and Tools
To overcome these issues, the authors recommend integrating statistical sample size calculations into the core of AI model design. They highlight several tools—including pmsampsize and pmvalsampsize for R and Stata—that help researchers determine appropriate sample sizes for model development and validation.
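The kind of calculation these tools perform can be sketched in a few lines. The function below is a simplified illustration, not the packages themselves: it implements two of the Riley et al. (2020) criteria for a binary outcome (expected shrinkage of predictor effects no worse than a target, and a precise estimate of the overall outcome proportion), with hypothetical planning inputs. The real packages cover additional criteria and other outcome types.

```python
import math

def min_sample_size_binary(p, r2_cs, prevalence, shrinkage=0.9, margin=0.05):
    """Simplified sketch of two sample-size criteria for developing a
    binary-outcome prediction model.

    p          -- number of candidate predictor parameters
    r2_cs      -- anticipated Cox-Snell R-squared of the model
    prevalence -- anticipated outcome proportion
    shrinkage  -- target expected shrinkage factor (>= 0.9 recommended)
    margin     -- target margin of error for the overall risk estimate
    """
    # Criterion 1: keep expected shrinkage of predictor effects at or
    # above the target (i.e., overfitting of at most 1 - shrinkage)
    n1 = p / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage))
    # Criterion 2: estimate the overall outcome proportion to within
    # +/- margin with 95% confidence
    n2 = (1.96 / margin) ** 2 * prevalence * (1 - prevalence)
    n = math.ceil(max(n1, n2))
    epp = n * prevalence / p   # resulting events per candidate parameter
    return n, epp

# Hypothetical planning inputs: 20 candidate parameters, anticipated
# Cox-Snell R-squared of 0.1, outcome prevalence of 0.3
n, epp = min_sample_size_binary(p=20, r2_cs=0.1, prevalence=0.3)
print(n, round(epp, 1))
```

Note that the binding criterion here is shrinkage, not a fixed events-per-variable rule of thumb; the required n grows with the number of candidate parameters and shrinks as the anticipated model R-squared rises.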
A Call to Action
The authors stress that attention to sample size is a matter of both scientific and ethical rigor. In high-stakes healthcare contexts, unreliable AI predictions can do more harm than good. Their advice is clear: do not default to a convenience dataset simply because it is readily available; verify that it is large enough before developing or validating a model.
Final Thought
This article is a wake-up call for clinicians, data scientists, regulators, and funders. As AI continues to integrate into healthcare, methodological discipline—including robust sample size planning—must take center stage. Otherwise, the promise of AI may be undercut by its weakest link: inadequate data.
Reference:
Riley, R. D., Ensor, J., Snell, K. I. E., Archer, L., Whittle, R., Dhiman, P., … & Collins, G. S. (2025). Importance of sample size on the quality and utility of AI-based prediction models for healthcare. The Lancet Digital Health. https://doi.org/10.1016/j.landig.2025.01.013

