The article, “Evaluating the Relevance, Generalization, and Applicability of Research: Issues in External Validation and Translation Methodology,” by Lawrence W. Green and Russell E. Glasgow (2006), provides a comprehensive analysis and an actionable framework for addressing the critical gap between health research and its practical application.
The Problem: Neglect of External Validity
The authors argue that recent developments in evidence-based medicine and public health have highlighted an “embarrassing” disconnect between scientific findings and practice. While there has been a strong emphasis on internal validity (the ability to confidently determine cause and effect, often achieved through rigorous experimental controls such as randomized controlled trials, or RCTs), there has been a significant relative neglect of external validity. Internal validity provides the “strength of evidence,” but external validity assesses the “weight of evidence”: how relevant, generalizable, and applicable that evidence is to diverse real-world situations, populations, and settings. This imbalance makes much “strength of evidence” research less useful for everyday practice.
Internal vs. External Validity: The Imbalance
- Internal Validity: Historically, health professions have deeply internalized criteria for judging internal validity, such as Bradford Hill’s criteria and Koch’s postulates, which focus on proving causation. Campbell and Stanley’s (1963) influential work also emphasized “threats to internal validity,” while “threats to external validity” were “seldom referenced”. Reporting standards like CONSORT criteria primarily focus on internal validity. The justification for this focus was that without internal validity, external validity would be irrelevant.
- External Validity: The article asserts that this almost exclusive focus on internal validity has come at the cost of external validity, limiting the applicability of research to the varied circumstances of medical and public health practice. Regulatory agencies often base decisions more on the “weight of evidence” because human behavior and social change studies rarely establish causation unequivocally. The authors conclude that significant energy and resources must be redirected toward developing and applying criteria and measures for external validity. They advocate for “practice-based evidence” that explicitly addresses external validity and local realities, even if it means some trade-off in experimental control compared to academically based research.
Addressing the Gap: Solutions and Frameworks
The article proposes a multi-faceted approach to bridge the research-to-practice gap:
- Questions and Guides: Tools for practitioners, program planners, and policymakers to assess the applicability and generalizability of evidence to different situations and populations.
- Criteria for Reviewers: Standards to evaluate external validity and potential for generalization.
- Procedures for Adaptation: Methods for practitioners and planners to adapt evidence-based interventions and integrate them with local population/setting characteristics, relevant theory, and experience.
To support these, the authors discuss key theoretical and practical frameworks:
- Generalizability Theory (Cronbach et al., 1972): This theory identifies various “facets” across which program effects can be evaluated, summarized as “utoS”:
- Units (u): Individual patients, moderator variables, subpopulations.
- Treatments (t): Variations in treatment delivery or modality.
- Occasions (o): Patterns of maintenance or relapse over time.
- Settings (S): Clinics, worksites, schools where programs are evaluated.
The theory also introduced the concepts of robustness (consistency of effects across domains) and replication as crucial to the strength of evidence.
- Practical Clinical Trials (PCTs) (Tunis et al., 2003): These trials prioritize outcomes important to decision-makers (e.g., cost-effectiveness, quality of life) and use representative (or heterogeneous) samples of patients and settings, evaluating new treatments against realistic alternatives. PCTs aim to make research more relevant to real-world needs and circumstances.
- RE-AIM Framework (Glasgow et al.): The RE-AIM framework (Reach, Effectiveness, Adoption, Implementation, Maintenance) is specifically designed to aid the planning, conduct, evaluation, and reporting of studies with the goal of translating research into practice. Table 2 in the source provides definitions and evaluation questions for each dimension:
- Reach (individual level): The participation rate among the intended audience and the representativeness of those participants. Questions include: What percentage of the target population participated? Did it reach those most in need? Were participants representative of your practice setting?
- Effectiveness (individual level): The impact on key outcomes, quality of life, consistency of effects across subgroups, and adverse impacts. Questions include: Did the program achieve its targeted outcomes? Did it produce unintended adverse consequences? How did it affect quality of life? What were the costs?
- Adoption (setting and organizational levels): The participation rate and representativeness of settings (e.g., clinics, schools) in the evaluation. Questions include: Did low-resource organizations serving high-risk populations use it? Was it consistent with the organization’s mission, values, and priorities?
- Implementation (setting and organizational levels): The level and consistency of delivery of program components across staff. Questions include: How many staff delivered the program? Were different components delivered as intended?
- Maintenance (individual and setting levels): Long-term effectiveness for individuals and sustainability/adaptation of the program at the organizational level. Questions include: Did the program produce lasting effects at the individual level? Did organizations sustain the program over time? How did the program evolve?
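At their core, RE-AIM’s Reach and Adoption dimensions are participation rates computed at two levels: individuals and settings. A minimal sketch of that arithmetic, using invented example counts (the numbers are illustrative, not from the article):

```python
def participation_rate(participants: int, eligible: int) -> float:
    """Fraction of the eligible pool that actually took part."""
    if eligible <= 0:
        raise ValueError("eligible pool must be positive")
    return participants / eligible

# Reach (individual level): hypothetically, 340 of 1,200 eligible patients enrolled.
reach = participation_rate(340, 1200)
# Adoption (setting level): hypothetically, 6 of 20 invited clinics ran the program.
adoption = participation_rate(6, 20)
print(f"Reach: {reach:.1%}, Adoption: {adoption:.1%}")
```

The same representativeness question then follows each rate: are the 340 patients (or 6 clinics) similar to those who declined?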
Program Adaptation and Evolution
The article highlights the tension between “fidelity” (adhering strictly to original intervention protocols) and “reinvention or customization” (adapting programs to fit local needs, resources, and clientele). While fidelity is important to ensure an intervention resembles the original evidence-based protocol, some degree of adaptation is almost always necessary and often desirable, especially in community settings.
The proposed solution lies in documenting:
- A limited set of key components or principles of an evidence-based program.
- The range of permissible adaptations that still retain the essential elements of the original intervention.
- Justifications for theory-driven and experience-driven deviations from recommendations, especially when related to local moderating variables and history.
External Validity Quality Rating Criteria
Green and Glasgow propose a preliminary set of 16 quality rating criteria for external validity, to be used in addition to existing guidelines like CONSORT. These are categorized under four main headings:
- Reach and Representativeness:
- Participation: Analyses of participation rates among potential settings, delivery staff, and patients/consumers.
- Target Audience: Explicit statement of the intended target audience for adoption (settings) and application (individuals).
- Representativeness—Settings: Comparisons of study settings to the intended target audience of program settings, or to those that declined participation.
- Representativeness—Individuals: Analyses of similarities and differences between participants and non-participants or the intended target audience (e.g., age, gender, education, income, race/ethnicity, medical conditions, health literacy, problem status).
- Program or Policy Implementation and Adaptation:
- Consistent Implementation: Data on the level and quality of implementation of different program components.
- Staff Expertise: Data on the training or experience required to deliver the program, or the quality of implementation by different staff types.
- Program Adaptation: Information on how different settings modified or adapted the program to fit local needs.
- Mechanisms: Data on the processes or mediating variables through which the program achieved its effects.
- Outcomes for Decision Making:
- Significance: Outcomes reported in a way comparable to clinical guidelines or public health goals.
- Adverse Consequences: Reporting of quality-of-life or potential negative outcomes.
- Moderators: Analyses of moderator effects, including different subgroups of participants and staff, to assess robustness versus specificity of effects.
- Sensitivity: Sensitivity analyses to assess dose-response effects, threshold levels, or points of diminishing returns.
- Costs: Data on program costs using standard economic or accounting methods.
- Maintenance and Institutionalization:
- Long-term Effects: Data reported on effects at least 12 months post-treatment.
- Institutionalization: Data on the sustainability, reinvention, or evolution of program implementation at least 12 months after formal evaluation.
- Attrition: Data on attrition by condition and analyses of the representativeness of those who drop out.
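Because the 16 criteria fall into four fixed categories, a reviewer can treat them as a simple checklist and score how many a given study reports per category. A sketch of that idea (the category and item names follow the article; the one-point-per-reported-criterion scoring scheme is an illustrative assumption, not the authors’ proposal):

```python
# Green & Glasgow's 16 external-validity criteria, grouped by category.
CRITERIA = {
    "Reach and representativeness": [
        "participation", "target audience",
        "representativeness of settings", "representativeness of individuals"],
    "Implementation and adaptation": [
        "consistent implementation", "staff expertise",
        "program adaptation", "mechanisms"],
    "Outcomes for decision making": [
        "significance", "adverse consequences", "moderators",
        "sensitivity", "costs"],
    "Maintenance and institutionalization": [
        "long-term effects", "institutionalization", "attrition"],
}

def score_report(reported: set) -> dict:
    """Count how many criteria in each category a study reports."""
    return {cat: f"{sum(item in reported for item in items)}/{len(items)}"
            for cat, items in CRITERIA.items()}

# Hypothetical study reporting only four of the sixteen criteria.
reported = {"participation", "significance", "costs", "attrition"}
for category, score in score_report(reported).items():
    print(f"{category}: {score}")
```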
Filling Evidence Gaps: The Role of Theory, Experience, and Local Wisdom
Recognizing that experimental evidence will never cover all combinations of settings, populations, and circumstances, the authors emphasize the need to fill these gaps by blending science with the art of practice. This involves systematically combining:
- Theory: To generalize from existing evidence to local circumstances.
- Experience: Drawing on the tacit knowledge of other practitioners and planners.
- Local Data and Wisdom: Incorporating the intuitive understanding and familiarity of local stakeholders.
This systematic blending is crucial for creating locally appropriate programs and policies.
Alignment of Priority Determinants with Program Components
The article highlights two levels of alignment, drawing on the RE-AIM and PRECEDE-PROCEED models:
- Ecological Alignment: At the institutional or organizational level, aligning program components with the policy, regulatory, or organizational changes needed from groups, organizations, or communities (e.g., changing non-smoking policies, food choices in vending machines). These interventions often have greater external validity and can influence large numbers of people without direct persuasion.
- Individual Alignment: At the individual, behavioral, or family level, aligning specific predisposing, enabling, or reinforcing factors with program components for which evidence of effectiveness has been derived from more internally valid but less externally valid research.
Intervention Matching, Mapping, Pooling, and Patching
To systematically align evidence with local circumstances and populations, the authors propose a four-step process:
- Matching: Aligning intervention types with the ecological levels where they can have effects (e.g., community, schools, worksites, healthcare institutions, families, individuals). This is based on models like MATCH (Multilevel Approach to Community Health).
- Mapping: Using theories to link specific interventions from prior research to specific mediating (causal, intermediate) variables, and accounting for moderating variables (characteristics of setting, population, circumstances) that influence intervention-outcome relationships. Intervention mapping provides structured steps to use theory for filling evidence gaps.
- Pooling: Reviewing and collecting tacit knowledge and “best experience” from prior attempts to address the health problem, even if not formally published, often through professional networks.
- Patching: Integrating existing community-preferred interventions and indigenous wisdom to fill gaps in evidence-based “best practices” that may not fully address local needs or circumstances. The PATCH (Planned Approach to Community Health) model offers a way to incorporate indigenous wisdom and existing local programs.
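The matching step above can be pictured as a lookup from ecological level to candidate intervention types. A minimal sketch under that assumption; the specific pairings below are hypothetical examples loosely echoing the article’s illustrations (vending-machine choices, non-smoking policies), not a table from MATCH or the article:

```python
# Hypothetical MATCH-style lookup: which intervention types act at which
# ecological level. The pairings are illustrative, not from the article.
ECOLOGICAL_MATCHES = {
    "community": ["mass-media campaign", "non-smoking policy"],
    "worksite": ["vending-machine food choices", "wellness program"],
    "healthcare institution": ["provider reminder system"],
    "individual/family": ["one-on-one counseling", "self-help materials"],
}

def match_interventions(level: str) -> list:
    """Return candidate intervention types for a given ecological level."""
    return ECOLOGICAL_MATCHES.get(level, [])

print(match_interventions("worksite"))
```

Mapping, pooling, and patching then refine these candidates with theory, tacit experience, and local wisdom.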
Conclusion
The article strongly recommends the continued development and formalization of procedures for practitioners and planners to review and fill evidence gaps, alongside refining criteria for judging the generalizability and external validity of studies. This systematic blending of “top-down” research evidence with “bottom-up” local wisdom, theory, and experience is crucial for increasing the credibility and effective application of evidence-based practices, ultimately leading to better programs and practice.
Reference: Green, L. W., & Glasgow, R. E. (2006). Evaluating the relevance, generalization, and applicability of research: Issues in external validation and translation methodology. Evaluation & the Health Professions, 29(1), 126–153.
