Swiss Cheese Model for AI Safety

This paper, “Swiss Cheese Model for AI Safety: A Taxonomy and Reference Architecture for Multi-Layered Guardrails of Foundation Model Based Agents”, by Md Shamsujjoha, Qinghua Lu, Dehai Zhao, and Liming Zhu (Data61, CSIRO, Australia), addresses growing concerns about AI safety in Foundation Model (FM)-based agents.

The paper highlights that while FM-based agents are revolutionizing application development thanks to their versatility and ability to adapt to a wide range of tasks, their rapidly growing capabilities and autonomy introduce significant challenges for AI safety. These challenges include the potential for generating harmful or offensive content, producing dangerous or unintended outcomes, and spreading disinformation and misinformation. Existing guardrail approaches are often insufficient: they primarily focus on functional correctness and are typically single-layered, applied narrowly to specific agent artifacts, and therefore cannot effectively manage the inherently autonomous and non-deterministic nature of FM-based agents. If a single guardrail fails, the associated risk can pass through unchecked and affect the final result.

To address these critical issues, the authors propose a robust solution: multi-layered guardrails. The core contributions of this paper are:

  • A Comprehensive Taxonomy of Runtime Guardrails: Based on a systematic literature review (SLR), the paper presents a comprehensive taxonomy to categorize runtime guardrails from a software architecture perspective. This taxonomy comprises two primary categories:
    • Quality Attributes: These are essential for designing runtime guardrails, ensuring they meet critical performance, security, and reliability goals. Key attributes include:
      • Accuracy (mitigating hallucinations, misinformation, disinformation).
      • Efficiency (preventing resource-intensive tasks, endless loops).
      • Privacy (handling sensitive data, preventing leakage).
      • Security (protecting from malicious activities, data breaches, adversarial attacks).
      • Safety (preventing harmful or misleading outputs).
      • Fairness (addressing bias and discrimination).
      • Compliance (adhering to legal and regulatory standards, copyright protection).
      • Generalizability (functioning effectively across diverse scenarios without prior configurations).
      • Customizability (providing tailored protection to meet specific requirements).
      • Adaptability (adjusting and remaining effective under varying conditions).
      • Traceability (tracking and recording origins, processes, and decision paths).
      • Portability (being easily adapted and applied across different FM-based agents).
      • Interoperability (working seamlessly across differing agents and technologies).
      • Interpretability (clarity and transparency of guardrail operations).
    • Design Options: These represent practical approaches for implementing guardrails. They include:
      • Actions: Such as Block, Filter, Flag, Modify, Validate, Parallel calls, Retry, Fall back, Human intervention, Defer, Isolate, Redundancy, and Evaluate.
      • Targets: Guardrails can be applied to various elements, including Pipelines (Prompts, Intermediate Results, Final Results) and Artifacts (Goals, Context, Memory, Reasoning, Plans, Workflows, Tools, Knowledge Bases, Other Agents, FMs, Execution Time).
      • Rules: Uniform, priority-enabled, context-dependent, and negotiable (hard/soft).
      • Applicability Scope: Industry, organizational, team, and user levels.
      • Modality: Single-modal (text, image, audio) or multimodal.
      • Underlying Models: Rule-based, hybrid, and machine learning models (narrow models and FMs).
  • A Novel Reference Architecture for Multi-Layered Guardrails: Inspired by the Swiss Cheese Model, the paper proposes a reference architecture for designing multi-layered runtime guardrails for FM-based agents. In this model, each “cheese slice” represents a protective layer within the agent system. These layers are designed to protect specific quality attributes (e.g., privacy, security), specific pipeline stages (e.g., prompts, intermediate results, final results), and agent artifacts (e.g., goals, plans, tools). The key insight is that while each layer may have its own weaknesses (i.e., “holes”), these holes are positioned differently across layers. Therefore, the combined layers create a robust defense against failures, ensuring that if one layer fails, another can catch and mitigate the issue. The architecture also incorporates an AgentOps infrastructure for continuous monitoring and logging, feeding data back to activate relevant guardrails.
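To make the Swiss Cheese idea concrete, the layered pipeline described above can be sketched in code. The sketch below is illustrative, not from the paper: the layer names, the redaction/keyword heuristics, and the small set of actions (a subset of the taxonomy's Block, Modify, Flag, etc.) are all hypothetical. The key behavior it demonstrates is that each "cheese slice" inspects the content in turn, a Modify from one slice feeds into the next, and any slice can still Block what an earlier slice missed.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class Action(Enum):
    """A small subset of the guardrail actions in the paper's taxonomy."""
    PASS = "pass"
    BLOCK = "block"
    MODIFY = "modify"


@dataclass
class GuardrailResult:
    action: Action
    content: str
    reason: Optional[str] = None


# Each "cheese slice" is a function from content to a GuardrailResult.
GuardrailLayer = Callable[[str], GuardrailResult]


def privacy_layer(content: str) -> GuardrailResult:
    """Hypothetical privacy slice: redacts a naive email pattern."""
    redacted = re.sub(r"\S+@\S+", "[REDACTED]", content)
    if redacted != content:
        return GuardrailResult(Action.MODIFY, redacted, "email redacted")
    return GuardrailResult(Action.PASS, content)


def safety_layer(content: str) -> GuardrailResult:
    """Hypothetical safety slice: blocks content on a simple keyword match."""
    banned = {"dangerous_instruction"}
    if any(word in content for word in banned):
        return GuardrailResult(Action.BLOCK, "", "unsafe content")
    return GuardrailResult(Action.PASS, content)


def run_layers(content: str, layers: List[GuardrailLayer]) -> GuardrailResult:
    """Pass content through each slice in turn.

    A BLOCK from any layer stops the pipeline immediately; a MODIFY feeds
    the modified content into the next layer, so a later slice can catch
    a risk that slipped through a "hole" in an earlier one.
    """
    result = GuardrailResult(Action.PASS, content)
    for layer in layers:
        result = layer(result.content)
        if result.action == Action.BLOCK:
            return result
    return result


# Example: the privacy slice redacts the email; the safety slice passes it on.
out = run_layers("contact me at a@b.com", [privacy_layer, safety_layer])
```

In a real deployment each slice would target a different pipeline stage or artifact (prompts, plans, tool calls) and report into an AgentOps-style monitoring layer; the point here is only the composition pattern, in which independent layers with different weaknesses jointly reduce the chance of an end-to-end failure.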

This proposed taxonomy and reference architecture aim to provide concrete and robust guidance for researchers and practitioners to build AI-safety-by-design from a software architecture perspective. The methodology employed in this study involved a systematic literature review to identify relevant research and synthesize findings. The authors also discuss potential threats to validity, such as search and selection bias, and the generalizability of guardrails, suggesting continual re-evaluation and refinement.

The paper concludes by setting the stage for future work, which includes developing guardrail services for a scientific agent platform that will implement the proposed reference architecture and integrate the various design options outlined in the taxonomy.

Reference: Shamsujjoha, M., Lu, Q., Zhao, D., & Zhu, L. (2025, March). Swiss Cheese Model for AI Safety: A Taxonomy and Reference Architecture for Multi-Layered Guardrails of Foundation Model Based Agents. In 2025 IEEE 22nd International Conference on Software Architecture (ICSA) (pp. 37-48). IEEE.

Podcast Link

https://notebooklm.google.com/notebook/685bacd2-76e5-48cb-bf94-8d8feb4c3ef3/audio
