By Dr. Mehmet Nurullah Kurutkan, Department of Health Management, Faculty of Business, Düzce University, Turkey
The Unexamined Foundation of Hospital Accreditation
Hospital accreditation has become the dominant institutional mechanism for quality assurance across health systems worldwide. From the Joint Commission in the United States to the Australian Council on Healthcare Standards, from ISQua’s meta-standards to England’s Care Quality Commission, accreditation frameworks shape how hospitals organize care, train staff, and measure performance. Yet a fundamental question has received surprisingly little systematic attention: what is the evidence base of the standards themselves?
This is not the same as asking whether accreditation “works.” A substantial body of literature has examined accreditation’s impact on clinical outcomes, with mixed and often inconclusive results. Greenfield and Braithwaite (2008) found high heterogeneity and limited effectiveness evidence across 66 empirical studies. Brubakk et al. (2015) identified only three systematic reviews on accreditation’s impact on patient outcomes. Alhawajreh et al. (2023), screening 17,830 studies, could include only 21 — a ratio that speaks to the field’s methodological immaturity. Ibrahim et al. (2022), in a landmark BMJ study, mapped the evidence level of Joint Commission R3 reports across 20 standards and 76 components, finding that 72% relied on Level 4–5 evidence.
But all of these studies ask “does accreditation improve outcomes?” rather than “what does each standard rest upon?” The distinction matters. A recent study from Turkey — the first to systematically map every single item of a national accreditation framework — offers a comprehensive answer and, in the process, generates conceptual tools with implications far beyond the Turkish context. This article synthesizes its core findings and discusses what they mean for accreditation systems globally.
Mapping an Entire National Standard: Scope and Method
Turkey’s Health Quality Standards (Sağlıkta Kalite Standartları, SKS) Version 6, developed by the Ministry of Health, is a mandatory accreditation framework for all hospitals. It comprises 5 dimensions, 46 chapters, 523 standards, and 1,599 assessment criteria — making it one of the most comprehensive national accreditation systems in operation.
The study applied a three-component hybrid design. First, systematic evidence mapping classified every item using the Oxford Centre for Evidence-Based Medicine (OCEBM 2011) hierarchy. Second, comparative evidence-gap mapping assessed each item against eight international reference families (WHO, JCI, NICE, AHA/ERC, ISO 15189, ASHRAE 170, AAMI, and ISQua), generating a seven-type gap typology. Third, framework synthesis — using Donabedian’s structure-process-outcome model as the initial scaffold — produced three overarching analytical themes from 312 codes consolidated into 29 descriptive themes.
The coding workflow itself represents a methodological innovation: a researcher-directed AI-assisted coding process where a large language model performed preliminary evidence searches and classifications, which the lead researcher then validated item by item against primary sources. A 15% random cross-check and full audit trail ensured trustworthiness under Lincoln and Guba’s (1985) framework.
Where the Evidence Is — and Where It Isn’t
The evidence distribution across the five dimensions reveals a striking gradient. Dimension 3 (Healthcare Services), representing the clinical core, accounts for 63% of all items and is the only dimension containing Level 1 evidence (systematic reviews of randomized controlled trials). Sixteen Level 1 evidence sources were identified, clustering in surgical safety (WHO Surgical Safety Checklist), intensive care (CLABSI bundle, Surviving Sepsis Campaign), emergency medicine (AHA/ERC CPR guidelines), and perinatal care (kangaroo care, active management of postpartum hemorrhage, continuous support during childbirth).
Dimension 1 (Institutional Services) — covering governance, quality management, human resources, and organizational infrastructure — relies almost entirely on Level 5 evidence. This is not a deficiency but an ontological feature: organizational interventions (committee structuring, governance mechanisms, reporting protocols) cannot be tested through randomized controlled trials. The OCEBM hierarchy was designed for clinical interventions; applying it to organizational domains creates a category mismatch that the study addresses through a novel D5 sub-layer taxonomy.
This taxonomy disaggregates Level 5 into six qualitatively distinct sub-layers: D5a (supranational normative authority — WHO/ISQua frameworks), D5b (national legislation), D5c (facility-specific operational standards), D5d (technical/engineering standards), D5e (consensus statements), and D5f* (non-evidentiary axioms). The asterisk on D5f* signals that these items fall outside the evidence pyramid entirely. Items like the ALARA principle in radiation safety or HEPA filtration standards in infection control rest on physics, not clinical trials — asking for RCT evidence is a category mistake. Similarly, patient rights declarations and informed consent requirements are ethical axioms, not testable hypotheses.
Dimension 4 (Support Services) proved the most epistemologically distinctive: laboratory standards align with ISO 15189, sterilization with AAMI ST79, and facility management with ASHRAE 170. Here, the evidence base is engineering science, not clinical medicine — a finding that challenges the implicit assumption that accreditation standards should be evaluated exclusively through clinical evidence hierarchies.
The Seven-Type Gap Typology: How National Standards Relate to International Norms
Rather than a binary “aligned/not aligned” classification, the study developed a seven-type comparative gap typology that captures the nuanced relationship between national and international standards:
Type b (full alignment) dominated at approximately 58% of the 368 matrix cells (46 chapters × 8 international reference families), confirming that Turkey’s standards are broadly concordant with international norms. Types a/a′ (numerical threshold differences) accounted for 12%, often explicable through local epidemiological data or resource constraints. Type c (structural expansion, where the national standard adds layers beyond the international reference) comprised 8% — for example, Turkey’s dual-layer surgical checklist implementation that adds a second verification step beyond the WHO original. Type e (missing items — present internationally but absent from SKS) at 9% flagged specific gaps, particularly in laboratory measurement uncertainty requirements (ISO 15189 clause 7.2.2), hospital pharmacy compounding standards (USP 797/800), and clinical decision support system standardization.
Most intriguing was Type d′ (regulatory integration): cases where Turkey did not merely adopt international evidence but integrated it with domestic legislation in ways that sometimes exceeded international precedents. Four examples stood out — a dual-layer surgical checklist, an anti-stigma standard translating Thornicroft et al.’s (2016) Lancet evidence into an auditable accreditation item, anti-ligature design specifications for psychiatric units operationalizing seven safety measures, and a triple legal safeguard for involuntary psychiatric admission integrating Turkish law with international human rights frameworks.
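The classification above can be sketched computationally. The following is a minimal, illustrative model of the 46 × 8 comparative matrix, assuming a mapping from (chapter, reference family) cells to gap-type labels; the cell entries shown are hypothetical stand-ins, not data from the study.

```python
from collections import Counter

# Seven-type gap typology labels (a/a' numerical thresholds, b full alignment,
# c structural expansion, d/d' regulatory integration, e missing items).
GAP_TYPES = {"a", "a'", "b", "c", "d", "d'", "e"}

# Hypothetical sample cells; the real matrix covers all 368 cells
# (46 chapters x 8 international reference families).
matrix = {
    ("Surgical Safety", "WHO"): "c",    # dual-layer checklist expands WHO original
    ("Surgical Safety", "JCI"): "b",    # full alignment
    ("Laboratory", "ISO 15189"): "e",   # missing item (measurement uncertainty)
    ("Psychiatry", "WHO"): "d'",        # regulatory integration
}

def gap_distribution(cells: dict) -> dict:
    """Return the share of each gap type across the classified cells."""
    counts = Counter(cells.values())
    total = len(cells)
    return {t: counts.get(t, 0) / total for t in sorted(GAP_TYPES)}

shares = gap_distribution(matrix)
print(shares["b"])  # share of fully aligned cells in this sample: 0.25
```

Applied to the full 368-cell matrix, the same tally yields the distribution reported in the study (Type b ≈ 58%, and so on).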
Three Analytical Themes with Global Implications
The Comprehensiveness Paradox
As an accreditation system expands its scope, implementation burden increases, evidence density per item decreases, and audit depth diminishes — yet narrowing scope risks creating critical safety gaps. This tension crystallizes along five axes: three drawn from existing literature (scope–depth, compliance–improvement, universality–contextuality) and two emerging from the data (measurement–improvement, resource–expectation).
Through the lens of DiMaggio and Powell’s (1983) institutional isomorphism, this paradox is not accidental but structural. Coercive isomorphism (regulatory mandates) drives standard expansion; normative isomorphism (evidence-based medicine expectations) demands depth. These two pressures work in opposite directions. Every health crisis triggers a “more standards” reflex; professional communities lobby for their domains to be included. The result is an accretion-only dynamic — standards are added but never systematically removed — creating what system dynamics would call a “Limits to Growth” archetype.
Of the 46 chapters analyzed, 15 exhibited a “Reverse Direction” pattern (scope expansion weakening the evidence base), 3 showed “Mixed” patterns, 2 demonstrated “Model Integrity” (where scope and depth were preserved together), and 26 were coded as paradox-neutral. The paradox-neutral chapters were not randomly distributed: all 10 institutional chapters (inherently structural, not clinical), 3 rights-based chapters (ethical axioms, not amenable to evidence-axis analysis), and 13 clinical chapters with insufficient item density for pattern assignment fell into this category.
The Irreversibility Gradient
Across the 24 clinical chapters, the proportion of items anchored to Level 1 evidence or D5f* axioms (termed “core density”) ranges from 17% to 57%. This range is not random: organ transplantation (57%), dialysis (53%), and neonatal intensive care (31%) — areas where clinical errors are irreversible — show the highest core density. Physical medicine and rehabilitation (17%) — an iterative, adjustable process — shows the lowest.
This finding echoes Reason’s (2000) Swiss Cheese model: defense layers are thickest where hazards are most severe. It also connects to Rasmussen’s (1997) drift-to-danger principle: chapters with low core density have thinner safety margins and higher vulnerability to erosion. The practical implication is clear: in high-irreversibility areas, standards cannot be pruned without directly increasing safety risk. In low-irreversibility areas, structural items (committee requirements, reporting formats) offer more room for contextual adaptation.
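The core-density metric described above reduces to a simple proportion: the share of a chapter’s items anchored to Level 1 evidence or D5f* axioms. A minimal sketch, with hypothetical item labels rather than actual SKS v6 data:

```python
# Evidence levels counted as "core": Level 1 (systematic reviews of RCTs)
# and D5f* (non-evidentiary axioms such as physics-based or ethical items).
CORE_LEVELS = {"L1", "D5f*"}

def core_density(item_levels: list[str]) -> float:
    """Proportion of a chapter's items whose level is L1 or D5f*."""
    if not item_levels:
        return 0.0
    core = sum(1 for level in item_levels if level in CORE_LEVELS)
    return core / len(item_levels)

# A hypothetical dialysis-like chapter: 8 of 15 items core-anchored.
chapter = ["L1"] * 5 + ["D5f*"] * 3 + ["L5"] * 7
print(f"{core_density(chapter):.0%}")  # 53%
```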
The Smart Selectivity Principle
Not every Level 1 evidence source should automatically become an accreditation standard item. The Cochrane Library contains over 8,000 systematic reviews; a quality standard is not a clinical guideline. The study proposes a four-criterion decision matrix for determining whether a clinical evidence source warrants direct standard inclusion: (1) universal applicability (valid across all hospital types), (2) effect magnitude (clinically and statistically significant impact on mortality or morbidity), (3) implementation simplicity (achievable without advanced technology or subspecialty expertise), and (4) timelessness (evidence validity maintained for at least 10 years).
Each criterion is scored on a 1–5 Likert scale. Items scoring ≥16/20 are candidates for direct inclusion or process standard status; those scoring <16 are better served through “guideline reference” — the standard mandates having a protocol, but defers clinical specifics to the relevant guideline. The “timelessness” criterion is particularly novel: the WHO Surgical Safety Checklist (score: 5) has only grown stronger since 2009, while specific sepsis scoring systems (qSOFA → nSOFA → SOFA revision cycles) score low on timelessness and risk becoming outdated between standard revision cycles.
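The four-criterion matrix lends itself to a direct implementation. The sketch below encodes the rubric as described (1–5 Likert per criterion, threshold 16/20); the two worked scorings are illustrative readings of the examples in the text, not scores published by the study.

```python
from dataclasses import dataclass

@dataclass
class SelectivityScore:
    """Smart Selectivity rubric: four criteria, each scored 1-5."""
    universal_applicability: int    # valid across all hospital types
    effect_magnitude: int           # impact on mortality/morbidity
    implementation_simplicity: int  # no advanced tech or subspecialty needed
    timelessness: int               # evidence validity holds >= 10 years

    def total(self) -> int:
        return (self.universal_applicability + self.effect_magnitude
                + self.implementation_simplicity + self.timelessness)

    def decision(self) -> str:
        # >= 16/20: candidate for direct standard inclusion;
        # below: mandate a protocol, defer specifics to the guideline.
        return "direct standard item" if self.total() >= 16 else "guideline reference"

# Illustrative scoring of the WHO Surgical Safety Checklist.
checklist = SelectivityScore(5, 5, 5, 5)
print(checklist.total(), checklist.decision())  # 20 direct standard item

# A fast-revising sepsis scoring system scores low on timelessness.
sepsis = SelectivityScore(4, 4, 4, 2)
print(sepsis.total(), sepsis.decision())  # 14 guideline reference
```

The design choice worth noting is the asymmetric outcome: a low score does not exclude the evidence, it merely routes it to guideline reference rather than a frozen standard item.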
The Ontological Distinction: Quality Standard vs. Clinical Guideline
Perhaps the study’s most consequential theoretical contribution is its systematic formulation of the ontological difference between quality standards and clinical guidelines. A clinical guideline asks “what should be done?” and answers with intervention-specific, evidence-graded recommendations updated every 1–5 years. A quality standard asks “how can it be done safely?” and answers with structural safeguards — committees, checklists, protocols, monitoring systems — typically revised on 5–10 year cycles.
This distinction explains why criticizing organizational standards (Dimensions 1, 2, 5) for relying on Level 5 evidence fundamentally misunderstands what accreditation does. In clinical domains (Dimension 3), however, Level 5 evidence for items that could be supported at higher levels represents a genuine improvement opportunity. This nuance — that the same evidence level signals different things in different domains — demands what the authors call “dimension-sensitive analysis.”

A Five-Point Reform Agenda
The findings converge on a reform agenda applicable not only to Turkey but to accreditation systems globally:
Tiered standard architecture. Rather than treating all items as equally mandatory, standards should be stratified into three tiers: core safety (mandatory), managerial infrastructure (expected), and developmental goals (encouraged). Audit energy should concentrate on the first tier.
Evidence-based pruning. Items at evidence level D5c with no international equivalent should be downgraded from mandatory to guidance status. Items supported by Level 1–3 evidence and aligned with international core standards should remain mandatory. The political feasibility of pruning requires framing it as “status downgrade” rather than “removal” — shifting the legitimacy basis from political preference to scientific criteria.
Outcome-oriented transformation. A shift from predominantly process standards (“the infection control committee shall meet monthly”) toward outcome indicators (“hospital-acquired infection rates shall remain below threshold X”).
Continuous digital monitoring. Replacing periodic on-site audits with real-time indicator extraction and risk-based inspection, following the Care Quality Commission’s (2022) model.
Contextual flexibility. Moving from a one-size-fits-all standard set toward modular standards differentiated by hospital type and institutional maturity level (ISQua, 2021).
Cross-Dimensional Integration: The Hidden Architecture
Whole-system analysis revealed a feature invisible at the chapter level: four vertical integration chains cutting across dimensions — medication safety, patient safety, worker safety, and infection control. Each chain links institutional governance (Dimension 1) through rights-based frameworks (Dimension 2) to clinical implementation (Dimension 3) and indicator monitoring (Dimension 5). Of the 46 chapters, 38 contain at least one direct cross-reference to another chapter, with an average of 3.2 cross-links per chapter.
From a complex adaptive systems perspective (Plsek & Greenhalgh, 2001), this network exhibits emergence: the whole-system safety architecture is greater than the sum of individual chapter standards. This also means that modifying any single chapter can trigger domino effects across the network — a critical consideration for standard revision processes.
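The cross-reference network can be modeled as a simple adjacency mapping from each chapter to the chapters it cites. The chapter names and links below are hypothetical; only the two reported statistics (chapters with at least one link, mean links per chapter) mirror the study’s measures.

```python
# Hypothetical adjacency mapping: chapter -> chapters it cross-references.
cross_refs: dict[str, set[str]] = {
    "Medication Safety": {"Pharmacy", "Indicator Management", "Patient Rights"},
    "Infection Control": {"Sterilization", "Indicator Management"},
    "Pharmacy": {"Medication Safety"},
    "Rehabilitation": set(),  # a chapter with no cross-references
}

def network_stats(refs: dict[str, set[str]]) -> tuple[int, float]:
    """Chapters with at least one cross-link, and mean links per chapter."""
    linked = sum(1 for targets in refs.values() if targets)
    mean = sum(len(targets) for targets in refs.values()) / len(refs)
    return linked, mean

linked, mean = network_stats(cross_refs)
print(linked, mean)  # 3 1.5
```

On such a representation, the domino-effect concern becomes checkable: deleting a chapter node shows at a glance which other chapters’ references would dangle.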
Implications for Global Accreditation Policy
While the empirical data come from Turkey, the conceptual frameworks developed — the Comprehensiveness Paradox, Irreversibility Gradient, Smart Selectivity Principle, D5 taxonomy, and seven-type gap typology — are designed to be transferable. The Comprehensiveness Paradox likely affects every accreditation system operating under institutional isomorphic pressures. The Irreversibility Gradient provides a principled basis for risk-stratified standard design regardless of national context. The Smart Selectivity matrix offers a reproducible tool for any standard-setting body deciding which clinical evidence to incorporate directly versus reference indirectly.
Future research priorities include: cross-national application of the gap typology to JCI, Accreditation Canada, and ACHS systems; empirical validation of the Irreversibility Gradient against clinical outcome data; inter-rater reliability studies among accreditation surveyors; Delphi panel validation of the Smart Selectivity rubric; and multi-country comparative studies to determine whether the Comprehensiveness Paradox is a universal feature of accreditation or context-specific.
The broader message is clear: accreditation standards deserve the same epistemic scrutiny we apply to the clinical interventions they regulate. Moving beyond “does accreditation work?” to “what does accreditation rest upon?” opens a more productive research agenda — one that can actually inform the design of better, leaner, more evidence-conscious quality systems.
References
Alhawajreh, M. J., Paterson, A., & Jackson, W. J. (2023). Impact of hospital accreditation on quality improvement in healthcare: A systematic review. PLoS ONE, 18(12), e0294180.
Braithwaite, J., Greenfield, D., Westbrook, J., Pawsey, M., Westbrook, M., Gibberd, R., … & Lancaster, J. (2010). Health service accreditation as a predictor of clinical and organisational performance: A blinded, random, stratified study. BMJ Quality & Safety, 19(1), 14–21.
Brubakk, K., Vist, G. E., Bukholm, G., Barach, P., & Tjomsland, O. (2015). A systematic review of hospital accreditation: The challenges of measuring complex intervention effects. BMC Health Services Research, 15(1), 280.
Care Quality Commission. (2022). Our strategy from 2021. London: CQC.
DiMaggio, P. J., & Powell, W. W. (1983). The iron cage revisited: Institutional isomorphism and collective rationality in organizational fields. American Sociological Review, 48(2), 147–160.
Dixon-Woods, M. (2011). Using framework-based synthesis for conducting reviews of qualitative studies. BMC Medicine, 9, 39.
Donabedian, A. (1966). Evaluating the quality of medical care. Milbank Memorial Fund Quarterly, 44(3), 166–206.
Greenfield, D., & Braithwaite, J. (2008). Health sector accreditation research: A systematic review. International Journal for Quality in Health Care, 20(3), 172–183.
Hinchcliff, R., Greenfield, D., Moldovan, M., Westbrook, J. I., Pawsey, M., Mumford, V., & Braithwaite, J. (2012). Narrative synthesis of health service accreditation literature. BMJ Quality & Safety, 21(12), 979–991.
Ibrahim, S. A., Reynolds, K. A., Poon, E., & Alam, M. (2022). The evidence base for US Joint Commission hospital accreditation standards: Cross sectional study. BMJ, 377.
ISQua. (2021). Clarifying the Concept of External Evaluation (White Paper). Dublin: ISQua.
Lincoln, Y. S., & Guba, E. G. (1985). Naturalistic inquiry. Sage.
OCEBM Levels of Evidence Working Group. (2011). The Oxford 2011 Levels of Evidence. Oxford Centre for Evidence-Based Medicine.
Plsek, P. E., & Greenhalgh, T. (2001). The challenge of complexity in health care. BMJ, 323(7313), 625–628.
Rasmussen, J. (1997). Risk management in a dynamic society. Safety Science, 27(2–3), 183–213.
Reason, J. (2000). Human error: Models and management. BMJ, 320(7237), 768–770.
Thornicroft, G., Mehta, N., Clement, S., Evans-Lacko, S., Doherty, M., Rose, D., … & Henderson, C. (2016). Evidence for effective interventions to reduce mental-health-related stigma and discrimination. The Lancet, 387(10023), 1123–1132.
Dr. Mehmet Nurullah Kurutkan is Associate Professor of Health Management at Düzce University, Turkey. His research focuses on healthcare quality, accreditation systems, and evidence-based health policy.
This article is based on: Kurutkan, M. N. (2026). Evidence base of Turkey’s Health Quality Standards Hospital Set (SKS v6): A multi-approach systematic mapping of 1,599 items. [Manuscript under review].
