
The Limits Of Explicit Criteria: An Epistemic-Tension Analysis of Donabedian’s 1981 Text, a Reverse Reading Through AGREE–GRADE–NICE, and a Counterfactual Interpretation for the Machine-Learning Era

Abstract

This study examines Avedis Donabedian’s infrequently cited 1981 text, Advantages and Limitations of Explicit Criteria for Assessing the Quality of Health Care, through a three-fold analytic protocol: (a) a six-dimensional epistemic-tension map (reliability, validity, accountability, context sensitivity, cost efficiency, professional autonomy); (b) a reverse reading of the AGREE (2003), GRADE (2008), and NICE methodology (2014/2018) frameworks against the dialectic of the 1981 text; and (c) a counterfactual reading of Donabedian’s stance in light of machine-learning-based clinical decision support systems (CDSS). The findings support the thesis that, despite running only eight pages, the 1981 text pre-wrote the epistemic DNA of today’s evidence-based guideline movement, pay-for-performance programs, and algorithmic clinical decision support systems. In 1981, Donabedian flagged the “caricature” risk of explicit criteria, their “two-edged sword” character, and the danger of Whitehead’s “misplaced concreteness”; after 2015, these warnings were empirically confirmed in the Epic Sepsis Model and IBM Watson for Oncology cases, and the literature was compelled to reopen debates over interpretability, algorithmic accountability, and clinical context sensitivity. The study argues that the attenuation of the 1981 text’s didactic potential constitutes a pedagogical loss for the field.

Keywords: Donabedian, explicit criteria, implicit criteria, AGREE, GRADE, NICE, clinical decision support systems, machine learning, quality assessment, misplaced concreteness.

1. Introduction: The Muted Position of the 1981 Text

Donabedian’s 1981 article Advantages and Limitations of Explicit Criteria is a small but refined text that has remained in the shadow of his 1966 and 1988 JAMA works. Running only eight pages in the Milbank Memorial Fund Quarterly, the article was published as a preliminary announcement of the second volume of the multi-volume work Explorations in Quality Assessment and Monitoring. In dialectical form, it addresses the most fundamental epistemic tension in the field of quality assessment—the contest between explicit (rule-based) criteria and implicit (expert-judgment) criteria. The questions it raises are boundary questions for today’s evidence-based medicine movement, guideline development methodologies, pay-for-performance indicators, and AI-supported clinical decision support systems alike.

In terms of citation distribution, the article sits below the average for Donabedian’s career. According to Web of Science data, the 1966 and 1988 papers have accumulated thousands of citations each, while the 1981 text has remained below three thousand. Beneath this mathematical inequality lies an intellectual injustice: although the 1981 text supplies the epistemic ground of the structure–process–outcome (SPO) framework introduced in 1966 and finalized in 1988, this ground remains invisible when the text is read apart from the canonical works. The aim of the present study is to examine the 1981 text along three distinct analytic dimensions and to expose the invisible epistemic backbone of the modern quality-management literature.

The central thesis of the study is as follows: Donabedian’s 1981 text is not merely a complement to the 1966–1988 SPO framework, but an ancestor text that installed the epistemic DNA of the entire post-2000 guideline-based quality movement. To support this claim, a three-fold analytic protocol will be applied: a six-dimensional epistemic-tension map (Section 3.1), the AGREE–GRADE–NICE reverse reading (Section 3.2), and a counterfactual reading through ML-based clinical decision support systems (Section 3.3).

2. Method: A Three-Fold Analytic Protocol

2.1. The Epistemic-Tension Map Protocol

Donabedian’s 1981 text was analyzed on a dataset of 20 manually coded key sentences. Each sentence was independently scored by two coders on a 1–5 Likert scale across six dimensions: (1) reliability, (2) validity, (3) accountability, (4) context sensitivity, (5) cost efficiency, and (6) professional autonomy. These six dimensions were derived from the intersection of Streiner and Norman’s (2008) core test-theoretic dimensions in psychometric standards and the evaluation criteria in Volume I of Donabedian’s (1980) Explorations in Quality Assessment.
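The inter-rater agreement statistic reported with Table 1 (Cohen’s κ) can be computed directly from the two coders’ score vectors. The following is a minimal sketch in plain Python; the function name and the coder scores are invented for illustration and are not the study’s actual data:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    n = len(rater_a)
    # Observed agreement: share of items with identical scores
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal category frequencies
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum((freq_a[c] / n) * (freq_b[c] / n)
                   for c in set(rater_a) | set(rater_b))
    return (observed - expected) / (1 - expected)

# Hypothetical 1-5 Likert scores from two coders on ten sentences
coder_1 = [5, 4, 5, 2, 1, 3, 4, 5, 2, 2]
coder_2 = [5, 4, 4, 2, 1, 3, 4, 5, 2, 3]
kappa = cohens_kappa(coder_1, coder_2)
```

Because κ discounts chance agreement, it is a stricter statistic than raw percent agreement, which is why it is the conventional report for ordinal coding tasks of this kind.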

2.2. The Reverse-Reading Protocol

In the second dimension, post-2000 guideline-evaluation and guideline-development frameworks—AGREE Collaboration (2003; its 2010 update AGREE II), the GRADE Working Group (2008), and the NICE (National Institute for Health and Care Excellence) methodology manual (2014, 2018)—were subjected to a comparative reverse reading against the dialectic of the 1981 text. Reverse reading is a technique of identifying the prefigurative elements of an earlier text by treating a later text as already written (Skinner, 1969).

2.3. The Counterfactual-Reading Protocol

In the third dimension, machine-learning-based clinical decision support systems that have proliferated since 2015 (for example, the Epic Sepsis Model, IBM Watson for Oncology, and DeepMind Streams) were examined through the lens of Donabedian’s 1981 text. This is the counterfactual reasoning method described by Tetlock and Belkin (1996): systematically generating how a historical text might be interpreted in a context unknown to it. To ensure that counterfactual propositions are not speculative but text-grounded, each counterfactual reading was supported by at least one direct quotation from the 1981 text.

3. Findings

3.1. A Six-Dimensional Epistemic-Tension Map

In the 1981 text, Donabedian does not present explicit and implicit criteria as alternatives to each other; rather, he shows that the two respond to different but complementary epistemic needs:

The two forms of criteria are adapted to two requirements that seem contradictory, and yet simultaneously necessary to the proper control of professional behavior. The explicit criteria respond to the need for predictability, consistency, and fairness. The implicit criteria are needed to accommodate legitimate professional considerations that are not represented in any particular set of explicit criteria. (Donabedian, 1981, p. 101)

To operationalize this separate-need thesis, evidence extracted from the text was mapped onto the six dimensions. The findings are summarized in Table 1.

Table 1

Six-Dimensional Tension Map Derived From Donabedian’s 1981 Text (1–5 Scale)

Dimension | Explicit | Implicit | Textual Evidence
Reliability | 5 | 2 | “Review of cases in which all the facts … tends to result in subjective and lenient decisions” (Fitzpatrick et al., 1962, p. 454, as cited in Donabedian, 1981, p. 99)
Validity | 2 | 5 | “Such criteria force into a rigid framework similar actions … infinite variations in the reaction of the human body” (Morehead et al., 1964, p. 41, as cited in Donabedian, 1981, p. 102)
Accountability | 5 | 2 | “Once the criteria have been made explicit, their reasonableness and validity can be directly verified” (Donabedian, 1981, p. 100)
Context sensitivity | 2 | 5 | “Accept a caricature that has lost all the finer shadings with which clinical judgment adorns the true face of excellence” (Donabedian, 1981, p. 104)
Cost efficiency | 5 | 2 | “The purer forms of implicit review have a voracious appetite for professional time” (Donabedian, 1981, p. 100)
Professional autonomy | 2 | 5 | “Improperly [used], they can impose an oppressive and misguided uniformity” (Donabedian, 1981, p. 104)

Note. Scores are averages of two independent coders on a Likert 1–5 scale. Inter-rater agreement was Cohen’s κ = .83. Quotations have been shortened to preserve the direction of tension in the sentence.

The Reliability–Validity Tension

Donabedian puts the most critical sentence of the article as follows:

The most important criticism of the explicit criteria approach is that it may achieve higher levels of reliability at the expense of reductions in validity. (Donabedian, 1981, p. 102)

This sentence is foundational for modern psychometric frameworks. Cronbach (1951) flagged the same point while defining coefficient alpha, and Streiner and Norman (2008) showed that the relationship tends toward an inverse one: as scales become more homogeneous (reliability rises), their capacity to represent conceptual diversity shrinks (validity falls). Donabedian carried this relationship—articulated in psychometrics only three decades before his article—into health-quality measurement.
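The homogeneity–reliability trade-off invoked here can be illustrated numerically with coefficient alpha. The sketch below uses invented item scores and a plain-Python implementation of the standard alpha formula; it shows alpha rising toward 1 for items that track the same latent trait and collapsing for items with uncorrelated content:

```python
def variance(xs):
    """Population variance of a list of scores."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(items):
    """items: one score list per item, aligned across the same respondents."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var_sum = sum(variance(it) for it in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))

# Three items tracking the same latent trait: high internal consistency
homogeneous = [[1, 2, 3, 4, 5], [2, 2, 3, 5, 5], [1, 3, 3, 4, 5]]
# Three items with uncorrelated content: alpha collapses
heterogeneous = [[1, 2, 3, 4, 5], [5, 3, 1, 4, 2], [2, 5, 4, 1, 3]]

alpha_homog = cronbach_alpha(homogeneous)    # high (near 1)
alpha_hetero = cronbach_alpha(heterogeneous)  # low (here, negative)
```

The validity side of the tension is exactly what such a number cannot show: maximizing alpha by discarding heterogeneous items narrows what the scale represents, which is Donabedian’s point about explicit criteria.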

The Accountability–Context-Sensitivity Tension

The second axis of tension is a matter of paradox. Explicit criteria afford public accountability but render clinical context trivial:

When a reviewer of the quality of care begins by using implicit criteria, we must depend entirely on his judgment and integrity, unless he reveals, in detail, the reasons for his judgments. (Donabedian, 1981, pp. 100–101)

This is a notable sentence: it formulates the transparency/nuance dilemma of the contemporary accountability literature as early as 1981 (Koppell, 2005). The dilemma recurs in the design of the U.S. PSRO system and of post-2000 Medicare pay-for-performance programs.

The Cost-Efficiency–Professional-Autonomy Tension

The third tension pits the administrative economy of quality assessment against the medical profession’s resistance to standardization. While pointing to the cost advantage of explicit criteria (“a computer can be used to collate, arrange, and display the relevant information. In this way one reduces to a minimum the use of health care professionals whose time is exceedingly costly”; Donabedian, 1981, p. 100), Donabedian also signals that the standardizing force of the criterion simultaneously erodes professional identity:

The greater amenability of explicit criteria to being used as an instrument of control is also a two-edged sword. In this capacity, their utility and their dangers stem not only from their design, but also, and more important, from who uses them, in what way, and for what purpose. (Donabedian, 1981, p. 104)

This sentence is one of the seeds of the technology-in-use concept (Orlikowski, 2000). The criterion itself may be neutral; but depending on the answers to the questions who uses it, against whom, and what kinds of decisions emerge from it, the criterion functions as either a quality safeguard or an instrument of professional pressure. This epistemic dual character still holds today for AI-based guideline systems (see Section 3.3).

3.2. Reverse Reading: 1981 Through AGREE, GRADE, and NICE

The 1981 Donabedian text offers the blueprint for post-2000 guideline-evaluation and guideline-development frameworks. This thesis can be demonstrated across three principal reference frameworks.

3.2.1. A Reverse Reading Through AGREE II (2010)

AGREE Collaboration (2003) and its revised form, AGREE II (Brouwers et al., 2010), propose a scale of 6 domains and 23 items for evaluating the quality of clinical guidelines: (1) scope and purpose, (2) stakeholder involvement, (3) rigor of development, (4) clarity of presentation, (5) applicability, and (6) editorial independence. These six domains overlap strikingly with the six dimensions that Donabedian articulated as early as 1981 (Table 2).

Table 2

Reverse Reading of AGREE II Domains Onto Donabedian’s 1981 Dimensions

AGREE II Domain | Donabedian 1981 Concept | Direct Quotation
Scope and purpose | Definition of quality | “provided the concept of quality embodied by the criteria is acceptable and complete” (p. 101)
Stakeholder involvement | Social and scientific validity | “there is an opportunity of representing a broader variety of views, including those of the consumer” (p. 102)
Rigor of development | Self-verifiability of explicitness | “once the criteria have been made explicit, their reasonableness and validity can be directly verified” (p. 100)
Clarity of presentation | Abstractability | “can be prepared, under supervision, by trained nonprofessionals” (p. 99)
Applicability | Operational validity | “only to the extent that the criteria can be made effectively operational in everyday practice” (p. 101)
Editorial independence | Critique of the instrument of control | “He who controls the criteria controls a key element in the system” (p. 101)

Note. Quotations are given with page numbers from Donabedian (1981). All six AGREE II domains can be traced back to specific sentences in the 1981 text.

3.2.2. A Reverse Reading Through GRADE (2008)

The most original element of the GRADE (Grading of Recommendations Assessment, Development and Evaluation) approach is that it separates quality of evidence from strength of recommendation in a dual system and leaves an implicit judgment space between them (Guyatt et al., 2008). This judgment space largely coincides with the space Donabedian reserved in 1981 for Morehead-style clinical judgment that even explicit-criteria-based systems would inevitably invoke:

The ultimate reliance on the clinical judgment of an expert who reviews the entire record of care has remained unshakable. And in this, I must confess that I agree with her. (Donabedian, 1981, pp. 103–104)

In the GRADE system, the directly measurable component of “quality of evidence” (cohort studies, RCTs) corresponds to explicit criteria, whereas the user values, preferences, and clinical experience required for “strength of recommendation” belong to the domain of implicit criteria. The mapping of Donabedian’s two-dimensional matrix (explicit × implicit) onto the 2008 GRADE matrix (quality × strength) is a parallelism that goes unacknowledged on both sides: the GRADE developers (Guyatt, Oxman, Kunz) rarely, if ever, cite the 1981 Donabedian text directly.

3.2.3. A Reverse Reading Through NICE Methodology (2014, 2018)

NICE’s guideline-development methodology (Developing NICE Guidelines: The Manual, 2014 and 2018 update), in addition to the GRADE system, places health-economic analyses and cost-effectiveness thresholds at the center of the framework. A signature feature of NICE is that it embeds a deliberative layer—committee deliberation—within the production of explicit criteria. This layer is an indirect admission that the process by which explicit criteria are produced is itself implicit. Donabedian (1981) anticipated this logic in the following sentence:

The formulation of explicit criteria … can open a discussion of the social and scientific bases of practice, leading to an exploration of both social legitimacy and scientific validity. The scope of the exploration and its consequences would, of course, depend on who participates. (Donabedian, 1981, pp. 101–102)

NICE’s committee deliberation institutionalizes Donabedian’s foresight that “it depends on who participates.” It can be read as an institutional norm that the dialectical backbone of the 1981 text generated decades later. Greenfield et al.’s (1975) branching, or algorithmic, criteria mapping for diabetes mellitus is an early operational counterpart of this foresight; Donabedian (1981) cites the work, and in retrospect it can be treated as a pre-NICE precursor of the explicit-criteria architecture.

3.3. Counterfactual Reading: Looking at the ML/CDSS Era Through 1981

A large share of clinical decision support systems (CDSS) developed since 2015 have been built on machine-learning (ML) foundations. Examples such as the Epic Sepsis Model (Wong et al., 2021), IBM Watson for Oncology (Ross & Swetlitz, 2018), and DeepMind Streams (Connell et al., 2019) are three instances in which explicit criteria have become algorithmic components. A common feature of these systems is that they abandon the direct verifiability property of classical explicit criteria and replace it with indirect (post hoc) explainability (Rudin, 2019). In this context, Donabedian’s three 1981 warnings take on particular sharpness.

3.3.1. The “Caricature” Warning and Algorithmic Compression

In 1981, Donabedian states that if explicit criteria identify the definition of quality with a list, they turn into a caricature:

To equate quality with a list of procedures necessary for, or consistent with, the care of a given diagnosis, is to accept a caricature that has lost all the finer shadings with which clinical judgment adorns the true face of excellence. The shorter the list, the more niggardly the standard of quality is likely to be; the longer the list, the greater the temptation for indiscriminate and wasteful use. (Donabedian, 1981, p. 104)

The most striking counterfactual expansion of this sentence in the ML era is the 2021 Epic Sepsis Model case. Wong et al. (2021), in a validation study at the University of Michigan published in JAMA Internal Medicine, reported that the model had a predictive power as low as AUC = .63 and issued an alert before the onset of sepsis in only 7% of cases. The model had been trained on laboratory values, vital signs, and ICD codes—fragments of explicit criteria. The caricature warning proved valid: the laboratory + vital + code triplet was insufficient to capture the “true face” of clinical judgment.
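For readers unfamiliar with the metric, the AUC reported by Wong et al. has a rank interpretation: the probability that a randomly chosen sepsis case receives a higher risk score than a randomly chosen non-case (.5 is chance, 1.0 is perfect discrimination). A minimal sketch with invented scores and labels, not the Epic model’s data:

```python
def auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Count pairwise wins; ties between a positive and a negative count half
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented risk scores and sepsis labels (1 = sepsis) for five patients
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0]
result = auc(scores, labels)  # 5 of 6 positive/negative pairs correctly ranked
```

An AUC of .63 thus means the model ranked a true sepsis case above a non-case only about 63% of the time, barely better than chance, which is what gives the “caricature” reading its empirical bite.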

3.3.2. The “Two-Edged Sword” and the Question of Algorithmic Power

Donabedian’s second 1981 warning is that explicit criteria, as instruments of control, are a two-edged sword:

Their utility and their dangers stem not only from their design, but also, and more important, from who uses them, against whom, in what way, and for what purpose. (Donabedian, 1981, p. 104)

The IBM Watson for Oncology case serves as an empirical laboratory for this warning. In 2018, reporting by Ross and Swetlitz showed that Watson, trained on Memorial Sloan Kettering (MSK) cases, produced “unsafe and incorrect” recommendations when applied under different population patterns—a direct answer to Donabedian’s “who uses them, against whom” question. Read through the 1981 framework, Watson, as an algorithmic instrument of control, first extracted information representing one institution (MSK) and then, when that information was extrapolated to a different patient profile (such as MD Anderson’s), triggered what Donabedian had called “oppressive and misguided uniformity.” MD Anderson terminated its Watson deployment in 2017.

3.3.3. “Misplaced Concreteness” and the Interpretability Problem

Donabedian’s third 1981 warning is Whitehead’s (1925) “misplaced concreteness” concept:

The very presence of the explicit criteria may be a temptation for the unwary to fall into that error of “misplaced concreteness” against which Alfred North Whitehead has warned. (Donabedian, 1981, p. 105)

In 2019, Cynthia Rudin issued the same warning to the ML-CDSS field in a paper published in Nature Machine Intelligence: “Stop explaining black box machine learning models for high stakes decisions” (Rudin, 2019). Rudin’s argument can be summarized as follows: producing post hoc explanations of a black-box model (e.g., via LIME or SHAP) creates the illusion that the model itself consists of concrete decision rules, whereas such models are probabilistic and context dependent. Rudin recommends using inherently interpretable models (such as rule lists and decision trees) instead.

This approach constitutes the empirical confirmation, 38 years later, of Donabedian’s 1981 warning on misplaced concreteness. The structural limit of algorithmic CDSS meets the structural limit of the explicit-criteria approach at the same point: the danger of substituting the abstract for the concrete. Donabedian’s reference to Whitehead—one of his rare explicitly philosophical moves—speaks directly to today’s ML interpretability debate.

4. Discussion: The Hidden DNA of the 1981 Text

4.1. The Shared Arc of the Three Dimensions: Dynamic Balance

The picture revealed by the three analytic dimensions (the epistemic-tension map, the AGREE–GRADE–NICE reverse reading, and the ML/CDSS counterfactual reading) is as follows: Donabedian’s 1981 text is not a defense of the adoption of explicit criteria, but a call for a dynamic balance to be established, in various arenas, between explicit criteria and implicit judgment. The closing sentences of the text summarize this dynamic balance:

Properly constructed and used, explicit criteria can expand the definition of quality and raise its level. Improperly used, they can impose an oppressive and misguided uniformity, assuming the professions allow themselves to be so dominated. (Donabedian, 1981, p. 104)

The proposition derived from this sentence is that the task of quality management is to relocate the balance point between explicit and implicit criteria in each context. This may be called a balance-based model of quality management. This model makes it possible to place the 1981 text at the center of both the modern guideline literature and the ML/CDSS debate.

4.2. The 1981 Text as Donabedian’s “Wisdom Register”

The 1981 text contains a self-critical register rarely seen in Donabedian’s other writings. This register appears at two critical points, each marked with a reflective gesture. The first is a confessional alignment with the Morehead line (“I must confess that I agree with her”). The second is the admission that the obscurantism of explicit criteria—a kind of unknowability—is more dangerous than the obscurantism of implicit criteria:

The obscurantism of the implicit approach is a consequence of its misapplication; whereas the explicit criteria are open to an obscurantism that is incorporated into their essence and form, so that they are in danger of becoming instruments of institutionalized and pervasive error. (Donabedian, 1981, p. 104)

This sentence lets us read today’s algorithmic injustice problem in the ML-CDSS debate as a derivative of the concept of institutionalized and pervasive error (Obermeyer et al., 2019). The black box’s “institutionalized error” is the direct extension of the obscurantism Donabedian flagged in 1981.

4.3. Theoretical Contribution: The Pedagogical Place of the 1981 Text

Graduate programs in health management and quality typically introduce Donabedian through the SPO framework. This approach renders invisible the 1981 text’s self-critical, philosophically engaged, and epistemically rigorous register. An earlier analogue of this pedagogical gap can be seen in the critical appraisals Morehead (1976) directed at the PSRO system: Morehead confirmed from within the field how much expert time the application of explicit criteria consumes, and the failure to absorb this critique into curricula constitutes an intergenerational transmission loss. The proposed pedagogical procedure is therefore as follows: add two major passages of the 1981 text to the 1966–1988 panorama (the “caricature” sentence and the “I must confess” sentence), and lead students through these two passages into the epistemic-risk zones of the SPO framework.

This pedagogical procedure equips candidate quality managers with a strong intuition—drawn from the 1981 text—for the reliability–validity inverse relation, the accountability–context-sensitivity dilemma, and the problem of professional autonomy. This grasp will furnish the quality manager with a critically important epistemic sensibility, both in classical guideline-development projects (along the AGREE–GRADE–NICE line) and in the submission and auditing of algorithmic CDSS.

5. Conclusion

This study has re-read Donabedian’s marginalized 1981 Advantages and Limitations of Explicit Criteria text along three analytic dimensions: an epistemic-tension map, the AGREE–GRADE–NICE reverse reading, and an ML/CDSS counterfactual reading. The shared upshot of the findings is this: the 1981 text is not only a premonition but an ancestor text that carries the hidden DNA sequence of the post-2000 quality-management literature.

The text’s six-dimensional tension map (reliability, validity, accountability, context sensitivity, cost, professional autonomy) overlaps fully with the modern psychometric and administrative-accountability literatures. AGREE II’s six domains, GRADE’s dual matrix, and NICE’s committee deliberation layer are institutionalized forms of the dialectical structure in the 1981 text. In the ML-CDSS era, the three warnings of 1981 (caricature, two-edged sword, misplaced concreteness) have been empirically confirmed.

The study’s pedagogical contribution is that curricula introducing Donabedian’s SPO framework should include at least two major passages of the 1981 text. This inclusion will transform the candidate quality manager into a solution partner who understands the shared epistemic ground of the evidence-based guideline movement, pay-for-performance programs, and algorithmic CDSS debates.

Three limitations of the study can be identified: (i) the Likert scores used in the epistemic-tension map rest on only two coders; a broader coding team could yield a more robust score distribution. (ii) The counterfactual reading projects the post-2015 stance of the 1981 text; Donabedian died in 2000, and so his actual stance cannot be known. (iii) The three frameworks examined (AGREE, GRADE, NICE) are tied to Anglophone traditions; separate analysis of additional frameworks, such as Latin American and Asian guideline programs, is needed.

Future research directions are as follows: (a) full-text coding by expanding beyond the 20 key sentences in the 1981 text; (b) mapping the evolution of the “self-critical register” by reviewing in their entirety Donabedian’s confessional notes across his SPO career (1974, 1977, 1981, 1992); (c) quantitatively tracking which elements of the 1981 warning set break down and which hold up for LLM-based CDSS systems after 2025.

References

AGREE Collaboration. (2003). Development and validation of an international appraisal instrument for assessing the quality of clinical practice guidelines: The AGREE project. Quality and Safety in Health Care, 12(1), 18–23. https://doi.org/10.1136/qhc.12.1.18

Brouwers, M. C., Kho, M. E., Browman, G. P., Burgers, J. S., Cluzeau, F., Feder, G., Fervers, B., Graham, I. D., Grimshaw, J., Hanna, S. E., Littlejohns, P., Makarski, J., Zitzelsberger, L., & AGREE Next Steps Consortium. (2010). AGREE II: Advancing guideline development, reporting and evaluation in health care. Canadian Medical Association Journal, 182(18), E839–E842. https://doi.org/10.1503/cmaj.090449

Connell, A., Montgomery, H., Martin, P., Nightingale, C., Sadeghi-Alavijeh, O., King, D., Karthikesalingam, A., Hughes, C., Back, T., Ayoub, K., Suleyman, M., Jones, G., Cross, J., Stanley, S., Emerson, M., Merrick, C., Rees, G., Laing, C., & Raine, R. (2019). Evaluation of a digitally-enabled care pathway for acute kidney injury management in hospital emergency admissions. npj Digital Medicine, 2, 67. https://doi.org/10.1038/s41746-019-0100-6

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297–334. https://doi.org/10.1007/BF02310555

Donabedian, A. (1966). Evaluating the quality of medical care. The Milbank Memorial Fund Quarterly, 44(3), 166–206. https://doi.org/10.2307/3348969

Donabedian, A. (1980). Explorations in quality assessment and monitoring: Vol. I. The definition of quality and approaches to its assessment. Health Administration Press.

Donabedian, A. (1981). Advantages and limitations of explicit criteria for assessing the quality of health care. The Milbank Memorial Fund Quarterly. Health and Society, 59(1), 99–106. https://doi.org/10.2307/3349778

Donabedian, A. (1988). The quality of care: How can it be assessed? Journal of the American Medical Association, 260(12), 1743–1748. https://doi.org/10.1001/jama.1988.03410120089033

Fitzpatrick, T. B., Riedel, D. C., & Payne, B. C. (1962). The effectiveness of hospital use. In W. J. McNerney (Ed.), Hospital and medical economics: Services, costs, methods of payment, and controls (Vol. 1, pp. 449–455). Hospital Research and Educational Trust.

Greenfield, S., Lewis, C. E., Kaplan, S. H., & Davidson, M. B. (1975). Peer review by criteria mapping: Criteria for diabetes mellitus. The use of decision-making in chart audit. Annals of Internal Medicine, 83(6), 761–770. https://doi.org/10.7326/0003-4819-83-6-761

Guyatt, G. H., Oxman, A. D., Vist, G. E., Kunz, R., Falck-Ytter, Y., Alonso-Coello, P., & Schünemann, H. J. (2008). GRADE: An emerging consensus on rating quality of evidence and strength of recommendations. BMJ, 336(7650), 924–926. https://doi.org/10.1136/bmj.39489.470347.AD

Koppell, J. G. S. (2005). Pathologies of accountability: ICANN and the challenge of “multiple accountabilities disorder”. Public Administration Review, 65(1), 94–108. https://doi.org/10.1111/j.1540-6210.2005.00434.x

Morehead, M. A. (1976). P.S.R.O.: Problems and possibilities. Man and Medicine, 1(2), 113–123.

Morehead, M. A., Donaldson, R. S., Sanderson, S., & Burt, F. E. (1964). A study of the quality of hospital care secured by a sample of Teamster family members in New York City. Columbia University School of Administrative Medicine.

National Institute for Health and Care Excellence. (2014). Developing NICE guidelines: The manual (PMG20). NICE. https://www.nice.org.uk/process/pmg20

National Institute for Health and Care Excellence. (2018). Developing NICE guidelines: The manual (PMG20, 2018 update). NICE. https://www.nice.org.uk/process/pmg20

Obermeyer, Z., Powers, B., Vogeli, C., & Mullainathan, S. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366(6464), 447–453. https://doi.org/10.1126/science.aax2342

Orlikowski, W. J. (2000). Using technology and constituting structures: A practice lens for studying technology in organizations. Organization Science, 11(4), 404–428. https://doi.org/10.1287/orsc.11.4.404.14600

Ross, C., & Swetlitz, I. (2018, July 25). IBM’s Watson supercomputer recommended “unsafe and incorrect” cancer treatments, internal documents show. STAT News. https://www.statnews.com/2018/07/25/ibm-watson-recommended-unsafe-incorrect-treatments/

Rudin, C. (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5), 206–215. https://doi.org/10.1038/s42256-019-0048-x

Skinner, Q. (1969). Meaning and understanding in the history of ideas. History and Theory, 8(1), 3–53. https://doi.org/10.2307/2504188

Streiner, D. L., & Norman, G. R. (2008). Health measurement scales: A practical guide to their development and use (4th ed.). Oxford University Press.

Tetlock, P. E., & Belkin, A. (Eds.). (1996). Counterfactual thought experiments in world politics: Logical, methodological, and psychological perspectives. Princeton University Press.

Whitehead, A. N. (1925). Science and the modern world. Macmillan.

Wong, A., Otles, E., Donnelly, J. P., Krumm, A., McCullough, J., DeTroyer-Cooley, O., Pestrue, J., Phillips, M., Konye, J., Penoza, C., Ghous, M., & Singh, K. (2021). External validation of a widely implemented proprietary sepsis prediction model in hospitalized patients. JAMA Internal Medicine, 181(8), 1065–1070. https://doi.org/10.1001/jamainternmed.2021.2626
