Beyond the Algorithm: Human Factors in Explainable AI and the Future of Clinical Trust

In the rapidly evolving landscape of artificial intelligence (AI) in medicine, the integration of Explainable AI (XAI) is often championed as a remedy for the black-box problem, a perceived barrier to clinician trust. However, in their recent article published in npj Digital Medicine, Nicolson et al. (2025) deliver a meticulously crafted challenge to this assumption by interrogating whether XAI truly enhances trust, reliance, and diagnostic performance among clinicians in a concrete use case: gestational age estimation from fetal ultrasound.

The study employs a three-stage reader protocol with ten clinicians (nine sonographers and one obstetric registrar), progressing from unaided estimation, to AI predictions, and finally to predictions accompanied by prototype-based visual explanations. Their central research question probes whether the addition of explanations improves not only performance, but also trust and reliance—measured both quantitatively (e.g., Mean Absolute Error [MAE], Weight of Advice [WoA]) and qualitatively (self-reported trust metrics).
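To make these quantitative measures concrete, the sketch below (Python, with illustrative values rather than data from the study) shows how the two metrics are conventionally computed: MAE is the average absolute deviation from the ground-truth gestational age, and Weight of Advice captures how far a clinician's final estimate moves from their unaided estimate toward the AI's suggestion (0 = advice ignored, 1 = advice fully adopted).

    import numpy as np

    def mean_absolute_error(true_ga, estimated_ga):
        # Average absolute deviation (in days) from the ground-truth gestational age
        true_ga = np.asarray(true_ga, dtype=float)
        estimated_ga = np.asarray(estimated_ga, dtype=float)
        return float(np.mean(np.abs(true_ga - estimated_ga)))

    def weight_of_advice(initial, final, advice):
        # Fraction of the distance from the clinician's initial estimate to the AI's
        # advice that the final estimate covers; undefined when the advice equals
        # the initial estimate.
        if advice == initial:
            return float("nan")
        return (final - initial) / (advice - initial)

    # Illustrative numbers only (not values from the paper)
    print(mean_absolute_error([210, 245], [195, 260]))            # 15.0 days
    print(weight_of_advice(initial=230, final=222, advice=220))   # 0.8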

At first glance, the results appear promising: the introduction of AI predictions significantly reduced clinicians’ MAE from 23.5 to 15.7 days (p = 0.008), and explanations led to a further (though statistically non-significant) improvement to 14.3 days. However, beneath this aggregated improvement lies a critical nuance: responses to the explanations were highly heterogeneous. Some participants improved markedly, while others performed worse than when aided by predictions alone.

This inconsistency strikes at the heart of XAI’s promises. Contrary to the widespread belief that explanations universally bolster trust, this study finds no significant increase in self-reported trust following exposure to explanations. In fact, a modest downward trend was observed. Moreover, reliance—as measured by changes in agreement with model estimates and WoA—was not significantly affected by the addition of explanations.

To address this ambiguity, the authors introduce the concept of “appropriate reliance”—a behaviorally anchored metric that distinguishes between justified and unjustified deference to AI predictions. This nuanced categorization (appropriate, under-reliance, and over-reliance) reveals that while appropriate reliance was more frequent than misuse, the addition of explanations did not improve this balance.
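The paper's exact operationalization is not reproduced here, but the distinction can be illustrated with a simple rule: reliance is appropriate when a clinician defers to an AI estimate that is in fact closer to the truth than their own, or keeps their own estimate when it is the better one. The toy Python sketch below (with hypothetical inputs and a rule that may differ in detail from the authors' scheme) captures that logic.

    def classify_reliance(unaided_est, final_est, ai_est, true_ga):
        # Toy categorization of reliance; all estimates are in days of gestational age.
        # "Following" the AI means the final estimate ends up closer to the AI estimate
        # than to the clinician's own unaided estimate. Illustrative only; not
        # necessarily the operationalization used by Nicolson et al.
        ai_better = abs(ai_est - true_ga) < abs(unaided_est - true_ga)
        followed_ai = abs(final_est - ai_est) < abs(final_est - unaided_est)

        if ai_better and followed_ai:
            return "appropriate reliance"        # deferring to a better AI estimate
        if ai_better and not followed_ai:
            return "under-reliance"              # discounting a better AI estimate
        if not ai_better and followed_ai:
            return "over-reliance"               # adopting a worse AI estimate
        return "appropriate self-reliance"       # keeping one's own better estimate

    print(classify_reliance(unaided_est=240, final_est=228, ai_est=225, true_ga=226))
    # -> "appropriate reliance" (hypothetical values)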

A striking finding is the alignment between perceived and actual utility of the explanations. Participants who reported the explanations as helpful tended to exhibit improved accuracy (i.e., reduced MAE), while those who found them confusing or irrelevant tended to perform worse. This correlation (r = –0.74, p = 0.014) underscores the importance of subjective usability in evaluating the efficacy of XAI systems, challenging the notion that user perception is an unreliable metric, as previously suggested in the literature (Nagendran et al., 2023).
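The statistic itself is a standard Pearson correlation between per-participant helpfulness ratings and the change in MAE after explanations were introduced; a minimal sketch with made-up ratings (not the study's data) would look like this:

    from scipy.stats import pearsonr

    # Hypothetical per-participant values, not data from the study:
    # self-reported helpfulness of the explanations (1-5 Likert) and the change in
    # MAE in days after explanations were added (negative = accuracy improved).
    helpfulness = [5, 4, 4, 3, 3, 2, 2, 1, 1, 1]
    mae_change  = [-4.1, -3.0, -2.2, -0.8, -1.5, 0.3, 1.1, 1.9, 2.4, 3.0]

    r, p = pearsonr(helpfulness, mae_change)
    print(f"r = {r:.2f}, p = {p:.4f}")  # strong negative correlation for these toy values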

The authors identify several potential reasons for this outcome. One is cognitive mismatch: the prototype-based visual explanations (heatmaps and similarity-based prototypes) were unfamiliar to clinicians accustomed to reasoning via anatomical landmarks and measurements. This introduced cognitive load that, rather than assisting decision-making, may have hindered it—a concern echoed in prior critiques of XAI design (Ehrmann et al., 2022; Asgari et al., 2024).

Another key insight relates to the regulatory implications. Current frameworks such as the EU’s Artificial Intelligence Act assume that adding interpretability mechanisms like XAI inherently supports human oversight. Yet, this study shows that such explanations may degrade performance in certain users, suggesting a need for more robust evaluation standards that include human–AI interaction outcomes rather than technical metrics alone.

The authors recommend moving away from generic XAI solutions toward explanation designs that better align with clinical reasoning processes. Rather than defaulting to saliency maps or prototype visualizations, explanation formats should consider task-specific reasoning styles, decision complexity, and user expertise. Training and interface design are also identified as critical mediators in promoting appropriate reliance.

While limited by sample size and task specificity, the study makes a compelling case for embedding human variability at the center of XAI evaluation. It cautions against relying on explanation presence as a proxy for safety, performance, or trust, and instead calls for a deeper engagement with how different clinicians actually interpret and integrate model outputs.

In essence, this work is a pivotal reminder that AI in healthcare is not simply a matter of algorithms—but of people, cognition, and design. Explanations do not exist in a vacuum. Their impact depends on how they interface with human reasoning under conditions of uncertainty, time pressure, and domain-specific heuristics. As AI systems become more deeply embedded in clinical workflows, designing for this human factor will be the critical differentiator between safe augmentation and misplaced reliance.

Reference: Nicolson, A., Bradburn, E., Gal, Y., Papageorghiou, A. T., & Noble, J. A. (2025). The human factor in explainable artificial intelligence: Clinician variability in trust, reliance, and performance. npj Digital Medicine, 8(658). https://doi.org/10.1038/s41746-025-02023-0
