Artificial intelligence (AI) is reshaping healthcare by enabling more accurate and individualized clinical predictions. Yet, amid the excitement over technological advancement, one fundamental issue remains surprisingly neglected: sample size. In their landmark Viewpoint published in The Lancet Digital Health, Riley et al. (2025) shed light on how inadequate sample sizes compromise the reliability, fairness, and clinical utility of AI-based prediction models.
Key Message: Bigger Isn’t Always Better—But Bigger Is Often Necessary
Contrary to the prevailing assumption that AI can “learn” efficiently even with modest data, the authors argue that insufficient sample sizes lead to:
- Unrepresentative Data: Small datasets fail to capture the diversity of the target population. As a result, prediction models become biased, especially for under-represented groups, which threatens health equity.
- Unstable Predictor Effects: Small samples lead to inconsistent selection of predictors and volatile coefficient estimates, making model explanations unreliable and impeding trust.
- Prediction Instability: Using thousands of simulations, the authors show that small sample sizes result in wildly fluctuating risk predictions—for the same individual.
- Weaker Discrimination and Miscalibration: Performance measures such as the c-statistic (discrimination) and the calibration slope deteriorate markedly as datasets shrink. When clinical decisions depend on AI outputs, that degradation can translate into real harm.
- Lower Clinical Utility: Poorly performing models are not just theoretical problems—they can lead to wrong treatments or missed diagnoses in real patients.
- False Confidence in Validation: The paper presents multiple examples where small test datasets led to exaggerated or misleading claims about model performance, such as unrealistically perfect c-statistics.
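The prediction-instability point can be illustrated with a toy simulation (this is not the authors' own code, and the "true" risk model below is an assumption chosen for the example): repeatedly draw a development dataset of a given size from one fixed logistic risk model, refit the model, and record the predicted risk for the same individual each time. With small n the prediction swings widely; with large n it stabilizes.

```python
import math
import random
import statistics

def inv_logit(z):
    """Numerically safe inverse logit."""
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, iters=12):
    """Fit a univariable logistic regression (intercept + slope)
    by Newton-Raphson; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = inv_logit(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p            # gradient terms
            g1 += (y - p) * x
            h00 += w               # Fisher information terms
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if abs(det) < 1e-12:       # guard against (quasi-)separation
            break
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

TRUE_B0, TRUE_B1 = -1.0, 1.0   # assumed "true" risk model for this toy example

def predicted_risk(n, rng):
    """Draw one development dataset of size n, fit a model, and return
    the predicted risk for one fixed individual with x = 1."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [1 if rng.random() < inv_logit(TRUE_B0 + TRUE_B1 * x) else 0 for x in xs]
    b0, b1 = fit_logistic(xs, ys)
    return inv_logit(b0 + b1 * 1.0)

rng = random.Random(1)
spread = {}
for n in (50, 2000):
    risks = [predicted_risk(n, rng) for _ in range(200)]
    spread[n] = statistics.stdev(risks)
    print(f"n={n:4d}: predicted risk for the same individual ranges "
          f"{min(risks):.2f}-{max(risks):.2f} (sd {spread[n]:.3f})")
```

The same individual's predicted risk varies far more across small development datasets than across large ones, which is the instability the authors demonstrate at much larger scale.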
Solutions and Tools
To overcome these issues, the authors recommend integrating statistical sample size calculations into the core of AI model design. They highlight several tools—including pmsampsize and pmvalsampsize for R and Stata—that help researchers determine appropriate sample sizes for model development and validation.
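The kind of calculation these tools perform can be sketched in a few lines. The function below is a simplified illustration, not the packages themselves: it implements two of the Riley et al. (2020) criteria for a binary outcome (expected shrinkage of predictor effects no worse than a target, and a precise estimate of the overall outcome proportion), with hypothetical planning inputs. The real packages cover additional criteria and other outcome types.

```python
import math

def min_sample_size_binary(p, r2_cs, prevalence, shrinkage=0.9, margin=0.05):
    """Simplified sketch of two sample-size criteria for developing a
    binary-outcome prediction model.

    p          -- number of candidate predictor parameters
    r2_cs      -- anticipated Cox-Snell R-squared of the model
    prevalence -- anticipated outcome proportion
    shrinkage  -- target expected shrinkage factor (>= 0.9 recommended)
    margin     -- target margin of error for the overall risk estimate
    """
    # Criterion 1: keep expected shrinkage of predictor effects at or
    # above the target (i.e., overfitting of at most 1 - shrinkage)
    n1 = p / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage))
    # Criterion 2: estimate the overall outcome proportion to within
    # +/- margin with 95% confidence
    n2 = (1.96 / margin) ** 2 * prevalence * (1 - prevalence)
    n = math.ceil(max(n1, n2))
    epp = n * prevalence / p   # resulting events per candidate parameter
    return n, epp

# Hypothetical planning inputs: 20 candidate parameters, anticipated
# Cox-Snell R-squared of 0.1, outcome prevalence of 0.3
n, epp = min_sample_size_binary(p=20, r2_cs=0.1, prevalence=0.3)
print(n, round(epp, 1))
```

Note that the binding criterion here is shrinkage, not a fixed events-per-variable rule of thumb; the required n grows with the number of candidate parameters and shrinks as the anticipated model R-squared rises.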
A Call to Action
The authors stress that attention to sample size is a matter of both scientific and ethical rigor. In high-stakes healthcare contexts, unreliable AI predictions can do more harm than good. Their advice is clear: do not default to a convenience dataset simply because it is readily available; verify that it is large enough before developing or validating a model.
Final Thought
This article is a wake-up call for clinicians, data scientists, regulators, and funders. As AI continues to integrate into healthcare, methodological discipline—including robust sample size planning—must take center stage. Otherwise, the promise of AI may be undercut by its weakest link: inadequate data.
Reference:
Riley, R. D., Ensor, J., Snell, K. I. E., Archer, L., Whittle, R., Dhiman, P., … & Collins, G. S. (2025). Importance of sample size on the quality and utility of AI-based prediction models for healthcare. The Lancet Digital Health. https://doi.org/10.1016/j.landig.2025.01.013

