Reversibility Policy Primitive

Policy Intelligence Series

Reversibility is becoming a core policy primitive for agentic AI, because governance must classify undoability before actions execute.

Published on

Jun 26, 2026


Governance primitives

Reversibility is the policy primitive agentic AI is missing

Three regulatory frameworks now name it explicitly. The standards community has acknowledged the gap. No major governance platform has shipped it as a typed property.

May 2026

In October 2025, a developer named Mike Wolak filed GitHub issue #10077 against Claude Code. He had authorized the agent to rebuild a project and was not bypassing its permission system. The agent issued a recursive delete that began at the root of his filesystem, wiping every user-owned file in his home directory and stopping only where system permissions held. The exact command was never even logged. Every policy check had passed: the agent had the right identity, the right permissions, and a recognizable purpose. What no policy had evaluated was whether that action could be undone.

That is the gap. Not an edge case, but the structural gap in how agent policy works today.

Agentic AI risk is not proportional to how often an agent acts, but to how many of its actions cannot be undone. Three regulatory and standards frameworks have now named reversibility explicitly, two academic centers treat it as a precondition for working controls, and the vendor ecosystem is converging on the vocabulary from different directions: recovery (Rubrik), pre-execution policy (AWS Bedrock AgentCore, Microsoft AGT), identity (WorkOS, Oasis). What is missing across all of them is reversibility as a typed, structured property of every tool an agent can call. Until that primitive exists, “human in the loop” is a posture, not an enforceable architecture.

The fourth dimension

Most policy engines evaluate three things when an agent attempts a tool call: who the agent is, what action it is trying to take, and what data is involved. Those three dimensions cover a large proportion of agentic risk. They do not cover the dimension that determines whether a mistake becomes an incident.

Two tool calls can be identical across identity, scope, and data sensitivity and produce outcomes with entirely different blast radii, depending on one variable: whether the action is recoverable.

Action reversibility reference

ACTION	REVERSIBILITY	WHY IT MATTERS
Send email to customer	Irreversible	Customer has seen it; retraction is worse than the error
Draft email in outbox	Fully reversible	Human can delete the draft before send
`DELETE FROM orders WHERE customer_id = 42`	Reversible within recovery window	Backup must exist and be restorable inside the recovery window
Wire $50,000 via SWIFT	Irreversible	Recall is legal/operational and rarely completes
Submit FDA 510(k) filing	Procedurally reversible	Withdrawal creates regulatory record regardless of outcome
Update record in internal CRM	Reversible cheaply	Version history exists; rollback is one click

A million read-only operations produce a log. One irreversible write produces an incident. The frequency of agent actions is noise. The recoverability of agent actions is signal. A logistics operator dispatching a route, a hospital system updating a clinical record, a media organization publishing to wire, a financial institution clearing a payment: each crosses a different reversibility threshold, and each demands a different policy response.

Reversibility is the fourth axis. Today’s policy engines ask who the agent is, what it is doing, and what data it touches. The question they do not ask is whether the action can be undone.
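A minimal sketch of what the fourth axis looks like as data, assuming hypothetical names throughout: the `Reversibility` enum, the `ToolCall` type, and the example scope strings are illustrations, not any vendor's schema. The tier names follow the practitioner vocabulary used in this article.

```python
from dataclasses import dataclass
from enum import IntEnum


class Reversibility(IntEnum):
    """Hypothetical tiers, ordered from most to least recoverable."""
    READ_ONLY = 0
    REVERSIBLE_CHEAPLY = 1
    REVERSIBLE_WITH_COST = 2
    IRREVERSIBLE = 3


@dataclass(frozen=True)
class ToolCall:
    agent_id: str                 # who the agent is
    scope: str                    # what it is permitted to do
    data_sensitivity: str         # what data is involved
    reversibility: Reversibility  # the fourth axis


def three_axis_view(call: ToolCall) -> tuple[str, str, str]:
    """Everything a three-dimension policy engine gets to see."""
    return (call.agent_id, call.scope, call.data_sensitivity)


# Two rows from the reference table: a draft in the outbox and a sent email.
draft = ToolCall("support-agent", "email:write", "customer-pii",
                 Reversibility.REVERSIBLE_CHEAPLY)
send = ToolCall("support-agent", "email:write", "customer-pii",
                Reversibility.IRREVERSIBLE)

same_on_three_axes = three_axis_view(draft) == three_axis_view(send)
differ_on_fourth = draft.reversibility != send.reversibility
```

Under these assumptions the two calls are indistinguishable to an engine that only compares identity, scope, and data sensitivity. Only the fourth field separates them.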

What regulation now requires

The regulatory and standards communities have arrived at this conclusion from independent directions. Three frameworks name reversibility with statutory or normative precision.

EU AI Act. The EU AI Act anchors reversibility in three distinct places in the statutory text.

Article 14(4)(d) addresses human oversight of high-risk AI systems. It requires that natural persons assigned to oversight be able to “disregard, override or reverse the output” of a high-risk AI system in any particular situation. This is a capability requirement: the human must be architecturally positioned to reverse the system’s output.

Article 3(49) defines a serious incident as including a serious and irreversible disruption of the management or operation of critical infrastructure. That definition is the legal threshold triggering mandatory reporting under Article 73.

Article 60(4)(k) conditions real-world testing of high-risk AI systems on the system’s predictions, recommendations, or decisions being capable of being effectively reversed and disregarded. Irreversibility is not a risk factor in this provision. It is a blocking condition for testing in real-world conditions.

The September 2025 draft guidance from the European Commission operationalizes irreversibility directly in the reporting clock. A serious incident triggers a 15-day notification window. Awareness that a death may have been caused compresses that to 10 days. A serious and irreversible disruption of critical infrastructure triggers a 2-day obligation. Irreversibility cuts the response window from 15 days to 2.
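The reporting clock can be written down as a lookup. This is a sketch under stated assumptions: the day counts are the ones quoted above, the `IncidentKind` names are hypothetical, and counting in calendar days from the date of awareness is a simplification rather than a reading of the guidance.

```python
from datetime import date, timedelta
from enum import Enum


class IncidentKind(Enum):
    """Hypothetical labels for the three cases described in the text."""
    SERIOUS_INCIDENT = "serious_incident"
    DEATH = "death"
    IRREVERSIBLE_INFRASTRUCTURE_DISRUPTION = "irreversible_infrastructure_disruption"


# Day counts as quoted in the article; calendar days are an assumption.
REPORTING_WINDOW_DAYS = {
    IncidentKind.SERIOUS_INCIDENT: 15,
    IncidentKind.DEATH: 10,
    IncidentKind.IRREVERSIBLE_INFRASTRUCTURE_DISRUPTION: 2,
}


def reporting_deadline(aware_on: date, kind: IncidentKind) -> date:
    """Return the latest notification date for an incident of this kind."""
    return aware_on + timedelta(days=REPORTING_WINDOW_DAYS[kind])
```

The practical point is that the classification has to be known when the incident is detected. An organization that cannot say whether a disruption was irreversible cannot tell which of the three clocks is running.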

Singapore IMDA. On 22 January 2026, at the World Economic Forum, Singapore’s Minister Josephine Teo launched the IMDA Model AI Governance Framework for Agentic AI. The IMDA press release describes it as the first agentic AI governance framework of its kind in the world.

The framework names reversibility as one of the explicit factors organizations must evaluate before deploying an agent, alongside scope of actions and level of autonomy. Irreversible actions require human approval gates before execution. Bird & Bird’s implementation commentary notes the framework is particularly relevant for financial institutions and healthcare providers. Mayer Brown provides market-entry implementation guidance for enterprises operating in Asian markets. AsiaTechLens frames the operator stakes: the framework already functions as a procurement and audit baseline, requiring operators to bound autonomy, prove oversight, and design rollback before scaling.

NIST and CSA Labs. The Agentic Profile widely cited in governance discussions is a proposed profile published by CSA Labs, not an official NIST document. CSA Labs argues that the NIST AI RMF lacks a risk area corresponding to the harm potential of autonomous actions, and that adequate agentic governance requires an action-consequence risk area covering scope, reversibility, and interconnectedness. NIST AI RMF implementation guidance does include the direction to “factor in reversibility and remediation difficulty” as part of risk treatment, though that anchor lives at the implementation-guidance level, not in the core framework.

Reversibility has moved from a useful concept to a statutory threshold in less than eighteen months. The standards community is publicly asking for the primitive that operationalizes it.

The classification problem

Practitioners have started classifying tool calls by reversibility: read-only, reversible cheaply, reversible with cost, irreversible. The classification lives in best-practice guidance and blog posts, not in policy engines.

WorkOS’s guidance on securing agentic applications states the position directly: “Classify tools and actions by reversibility, blast radius, and sensitivity of data touched.” The pattern repeats across the practitioner security literature.

Ron Fybish, in his CISO guide to agentic AI security, names the lever plainly: “Reversibility is the single strongest lever in agent security.” His guidance recommends listing every tool in each agent’s scope and classifying it as reversible, reversible-with-effort, or irreversible, then applying proportionally stronger controls as the classification approaches irreversibility.
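The inventory-and-classify exercise can be sketched in a few lines. The three tier names come from the guidance above; the tool names, the control names, and the choice to treat unclassified tools as irreversible are assumptions made for illustration, not part of Fybish's guide.

```python
from enum import IntEnum


class Tier(IntEnum):
    REVERSIBLE = 0
    REVERSIBLE_WITH_EFFORT = 1
    IRREVERSIBLE = 2


# Hypothetical control sets that grow stricter as the tier approaches irreversible.
CONTROLS = {
    Tier.REVERSIBLE: ("audit_log",),
    Tier.REVERSIBLE_WITH_EFFORT: ("audit_log", "rate_limit"),
    Tier.IRREVERSIBLE: ("audit_log", "rate_limit", "human_approval"),
}

# Hypothetical tool inventory for one agent.
TOOL_TIERS = {
    "crm.update_record": Tier.REVERSIBLE,
    "orders.delete_rows": Tier.REVERSIBLE_WITH_EFFORT,
    "payments.wire": Tier.IRREVERSIBLE,
}


def controls_for(tool: str) -> tuple[str, ...]:
    """Look up the controls for a tool; unclassified tools get the strictest set."""
    tier = TOOL_TIERS.get(tool, Tier.IRREVERSIBLE)
    return CONTROLS[tier]
```

Defaulting unknown tools to the strictest tier is one design choice among several. It keeps a newly added tool from silently inheriting the weakest controls.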

The problem is not the vocabulary. The problem is that the classification lives in a spreadsheet or a blog post, not in a typed schema property that a policy engine can read, validate, and act on.

A policy engine cannot evaluate a Confluence page or a classification document at request time. It needs a schema property declared at tool registration, readable at sub-millisecond speed, and versionable alongside the tool itself. That is the structural gap between knowing a tool is dangerous and being able to enforce that knowledge at the moment an agent attempts to use it.
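One way to picture a property that is declared at registration, validated, and versioned with the tool is the sketch below. `ToolSpec`, `ToolRegistry`, and every field name are hypothetical. The lookup is an in-memory dictionary read, which is cheap, but no latency figure is claimed here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Reversibility(Enum):
    READ_ONLY = "read_only"
    REVERSIBLE_CHEAPLY = "reversible_cheaply"
    REVERSIBLE_WITH_COST = "reversible_with_cost"
    IRREVERSIBLE = "irreversible"


@dataclass(frozen=True)
class ToolSpec:
    name: str
    version: str
    reversibility: Reversibility
    recovery_window_hours: Optional[int] = None

    def __post_init__(self) -> None:
        # Reject free-text labels: the property must be the typed enum.
        if not isinstance(self.reversibility, Reversibility):
            raise TypeError("reversibility must be a Reversibility member")
        if self.recovery_window_hours is not None and self.recovery_window_hours <= 0:
            raise ValueError("recovery_window_hours must be positive")


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: dict[tuple[str, str], ToolSpec] = {}

    def register(self, spec: ToolSpec) -> None:
        key = (spec.name, spec.version)
        if key in self._tools:
            # Changing a classification requires publishing a new tool version.
            raise ValueError(f"{spec.name}@{spec.version} is already registered")
        self._tools[key] = spec

    def reversibility_of(self, name: str, version: str) -> Reversibility:
        """Constant-time lookup a policy engine could call at request time."""
        return self._tools[(name, version)].reversibility
```

A spreadsheet entry reading "irreversible" would be rejected by this constructor as a plain string. That rejection is the difference between a classification a document describes and one a policy engine can read.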

What the vendor ecosystem is actually shipping

Five distinct positions have emerged across the governance and infrastructure landscape. None of them is reversibility as a typed policy primitive.

Rubrik Agent Rewind, launched in August 2025, takes a post-hoc recovery position. It isolates an agent’s specific changes, including file edits, configuration updates, database mutations, and code modifications, and reverses them selectively rather than restoring the full system. The product addresses a genuine need. Post-hoc recovery and pre-execution enforcement are, however, distinct capabilities. By the time Rubrik acts, the SWIFT wire has already cleared.

AWS Bedrock AgentCore Policy, launched at re:Invent in December 2025 and built on the Cedar policy language, operates pre-execution. It enforces at the tool level with default-deny semantics. AWS draws a clear distinction: its content filtering layer governs what agents say; AgentCore Policy governs what agents do. The Cedar schema does not include a reversibility tier as a typed property. Identity, action type, and resource are the evaluation dimensions. Recoverability is not.

Microsoft’s Agent Governance Toolkit, open-sourced in April 2026, ships tamper-evident Merkle-chained audit logs and sub-millisecond pre-execution policy enforcement. The audit architecture is serious engineering work. Reversibility is not a typed schema property in the toolkit’s policy language.

Another position is occupied by identity and access management platforms. Oasis applies a four-decision gate to agent access: allow, warn, step-up, or deny. Zenity, named by Gartner as a leading governance platform, focuses on AI security posture management and compliance mapping. Salesforce’s Agentforce enforcement layer enforces risk-tier-aware blocking within its own ecosystem. None of the three treats reversibility as a typed property customers can declare, extend, or compose in their own policy.

WorkOS operates at the identity and access layer. Its published guidance on action classification is the clearest articulation of the typed-reversibility argument available in the practitioner literature. WorkOS does not ship a runtime policy engine. The commentary identifies a gap that WorkOS cannot address from its architectural position.

LangGraph’s interrupt() primitive is the de facto reversibility-aware human-in-the-loop mechanism in open-source agent frameworks. Practitioner guidance recommends applying it specifically to irreversible, high-blast-radius actions. It is a framework primitive, not a governance substrate with typed schemas, audit trails, or composable policy primitives.

Anthropic’s Claude Code auto mode uses a classifier that screens for destructive actions before execution. The classifier is reversibility-aware in its operational effect. The reversibility property is implicit in the classifier’s training, not a typed schema property that external policy rules can read, extend, or compose over.

The vocabulary has spread broadly across the ecosystem. No major governance platform has shipped the typed schema property.

The fragmentation problem

The absence of reversibility as a typed primitive creates a failure surface that extends well beyond individual tool calls. This is the sharpest argument for why the primitive must be declared at the tool level, not described in documentation.

The 2026 arXiv paper The Controllability Trap, which proposes a governance framework for military AI agents, formalizes six failure modes. The F4 mode, Commitment Irreversibility, generalizes cleanly beyond that setting. The F4 finding: “individually minor, individually authorised tool calls can cumulatively cross irreversibility thresholds.” If a policy gate only sees one call at a time, an agent can fragment an irreversible action across many individually permissible steps and never trigger a rule.

A policy engine that evaluates each call in isolation cannot detect this pattern. The fragmentation is not a bug in agent behavior. It is a structural limitation of session-blind policy evaluation.
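The fragmentation pattern is easy to reproduce in a toy model. The sketch below assumes a hypothetical per-call limit and a hypothetical session-level ceiling on irreversible exposure. The dollar figures, tool name, and class names are invented for illustration and are not drawn from the paper.

```python
from dataclasses import dataclass

PER_CALL_LIMIT = 10_000              # hypothetical single-call ceiling
SESSION_IRREVERSIBLE_LIMIT = 25_000  # hypothetical cumulative ceiling


@dataclass(frozen=True)
class Call:
    tool: str
    amount: int
    irreversible: bool


def per_call_gate(call: Call) -> bool:
    """Session-blind check: sees one call at a time."""
    return call.amount <= PER_CALL_LIMIT


class SessionGate:
    """Session-aware check: tracks cumulative irreversible exposure."""

    def __init__(self) -> None:
        self.exposure = 0

    def check(self, call: Call) -> bool:
        if not per_call_gate(call):
            return False
        if call.irreversible:
            if self.exposure + call.amount > SESSION_IRREVERSIBLE_LIMIT:
                return False
            self.exposure += call.amount
        return True


# Six individually permissible wires of $9,000 each, $54,000 in total.
calls = [Call("payments.wire", 9_000, True) for _ in range(6)]

blind = [per_call_gate(c) for c in calls]
gate = SessionGate()
aware = [gate.check(c) for c in calls]
```

In this model `blind` allows all six wires, while `aware` allows the first two and refuses the rest once the next wire would push cumulative exposure past the ceiling. The session gate can only do that because each call carries an `irreversible` flag it can read.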

The UC Berkeley CLTC’s Agentic AI Risk-Management Standards Profile identifies, among its risk-management levers, three that bear directly on reversibility: human control and accountability, system-level risk assessment, and continuous monitoring and post-deployment oversight. Eran Kahana’s analysis of the Profile for Stanford CodeX shows that all three share one structural dependency: each assumes the ability to distinguish recoverable from unrecoverable actions at the point of execution. In practice, that means knowing which actions within a session are recoverable and what the cumulative recoverability exposure is at any point. If that property is not tracked across a session, the controls the Profile prescribes cannot function as specified.

The executive framing emerged separately. The Yale Chief Executive Leadership Institute published a CEO deployment matrix in May 2026 with proximity-to-customer on one axis and reversibility on the other, amplified in Fortune. A business school building its deployment framework around reversibility runs into the same gap the arXiv paper formalizes: neither works without knowing which actions are recoverable.

Stanford CodeX’s AILCCP framework makes the dependency explicit. Kahana’s Rate and Scope Limiter control throttles how often an agent acts, how much it can spend, and how far its blast radius extends, holding compounding actions below the threshold at which a kill switch becomes the only remaining option. A limiter that weights actions by blast radius cannot distinguish a reversible write from an irreversible one without a typed reversibility property to read. The control assumes a primitive the ecosystem has not shipped.
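A rough sketch of a limiter that weights actions by both blast radius and reversibility follows. It is not Kahana's specification: the weights, the budget, the tier labels, and the `ScopeLimiter` name are all assumptions. It shows why the limiter has nothing to compute when a tool has not declared its tier.

```python
from typing import Optional

# Hypothetical integer weights per declared reversibility tier.
WEIGHTS = {
    "read_only": 0,
    "reversible_cheaply": 1,
    "reversible_with_cost": 5,
    "irreversible": 10,
}


class UndeclaredReversibility(Exception):
    """Raised when a tool call arrives without a recognized tier."""


class ScopeLimiter:
    def __init__(self, budget: int) -> None:
        self.budget = budget
        self.spent = 0

    def admit(self, blast_radius: int, reversibility: Optional[str]) -> bool:
        # Fail closed: without a declared tier there is no weight to apply.
        if reversibility not in WEIGHTS:
            raise UndeclaredReversibility("tool has no typed reversibility property")
        cost = blast_radius * WEIGHTS[reversibility]
        if self.spent + cost > self.budget:
            return False
        self.spent += cost
        return True
```

With a budget of 100, a single irreversible action of blast radius 10 exhausts the budget, while read-only calls never consume any of it. Whether a missing tier should raise an error or be treated as irreversible is a design choice this sketch does not settle.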

“It needs controls that survive contact with the systems they are meant to govern.”

Eran Kahana, Stanford CodeX, Kill Switches Don’t Work If the Agent Writes the Policy (March 2026)

Solving the fragmentation problem requires sessionized reversibility scoring. That requires reversibility as a typed, structured property at the tool level. A session-level accumulator cannot aggregate what individual actions have not declared.

What this means in practice

The case is clear. Three statutory or normative frameworks name reversibility explicitly. Two academic research centers treat it as a precondition for working controls. The vendor ecosystem has converged on the vocabulary from five distinct positions without producing the primitive that ties them together. The gap is not conceptual. It is structural.

Three practical asks

For CISOs: ask your vendors whether reversibility is a typed property of every tool registration or a category in their technical documentation. The difference is the distance between a policy that can enforce a constraint and a document that describes one. The EU’s 2-day incident reporting clock is triggered by irreversibility. Your vendor’s answer determines whether you are architecturally ready to meet it.
For builders: classify your tool catalog before your compliance or procurement team asks for it. The IMDA framework and EU AI Act Article 60(4)(k) both assume the classification exists before deployment, not during an audit.
For analysts: stop treating agent governance as a single category. Pre-execution enforcement, post-hoc recovery, and identity controls are three distinct capabilities. Reversibility is the dimension that connects them, and the current ecosystem provides each without a shared primitive that makes them interoperable.

OpenBox AI is the runtime governance platform that treats reversibility as a first-class typed property in its policy substrate.

Sources

Original-publisher sources, accessed 12 June 2026.

anthropics/claude-code. Issue #10077: Claude Code executed a recursive delete affecting the entire home directory. GitHub, 21 October 2025. https://github.com/anthropics/claude-code/issues/10077

European Parliament and Council. Regulation (EU) 2024/1689 (AI Act), Articles 3(49), 14(4), 60(4) and 73. https://artificialintelligenceact.eu/article/14/

European Commission. Draft guidance and reporting template for serious AI incidents under Article 73, 26 September 2025. https://digital-strategy.ec.europa.eu/en/consultations/ai-act-commission-issues-draft-guidance-and-reporting-template-serious-ai-incidents-and-seeks

Infocomm Media Development Authority. Model AI Governance Framework for Agentic AI, announced by Minister Josephine Teo, 22 January 2026. https://www.imda.gov.sg/resources/press-releases-factsheets-and-speeches/press-releases/2026/new-model-ai-governance-framework-for-agentic-ai

Bird & Bird. Singapore Introduces New Model AI Governance Framework for Agentic AI, January 2026. https://www.twobirds.com/en/insights/2026/singapore/singapore-introduces-new-model-ai-governance-framework-for-agentic-ai

Mayer Brown. Singapore’s Agentic AI Framework: Practical Guidance for Market Entry, April 2026. https://www.mayerbrown.com/en/insights/publications/2026/04/singapores-agentic-ai-framework-practical-guidance-for-market-entry

AsiaTechLens. Agentic AI Can Act: Singapore’s New Guidelines for Agents. https://www.asiatechlens.com/p/agentic-ai-can-act-singapore-new-guidelines-agents-china

CSA Labs. Agentic AI NIST AI RMF Profile, version 1. https://labs.cloudsecurityalliance.org/agentic/agentic-nist-ai-rmf-profile-v1/

National Institute of Standards and Technology. AI Risk Management Framework. https://www.nist.gov/itl/ai-risk-management-framework

Subramanyam Sahoo. The Controllability Trap: A Governance Framework for Military AI Agents. arXiv:2603.03515, March 2026. https://arxiv.org/abs/2603.03515

UC Berkeley Center for Long-Term Cybersecurity (Madkour, Newman, Raman, Jackson, Murphy and Yuan). Agentic AI Risk-Management Standards Profile, February 2026. https://cltc.berkeley.edu/publication/agentic-ai-risk-profile/

Eran Kahana. Kill Switches Don’t Work If the Agent Writes the Policy: The Berkeley Agentic AI Profile Through the AILCCP Lens. CodeX, Stanford Law School, 7 March 2026. https://law.stanford.edu/2026/03/07/kill-switches-dont-work-if-the-agent-writes-the-policy-the-berkeley-agentic-ai-profile-through-the-ailccp-lens/

Jeffrey Sonnenfeld and colleagues, Yale Chief Executive Leadership Institute. Your Trusted Advocate or Your Rebellious Frankenstein. Fortune, 7 May 2026. https://fortune.com/2026/05/07/agentic-ai-customer-proximity-framework-ceos-yale-celi-sonnenfeld/

Ron Fybish. The CISO’s Guide to Agentic AI Security. CYBER WOW, April 2026. https://cyberwow.com/p/the-cisos-guide-to-agentic-ai-security

WorkOS. Best Practices for AI Agent Access Control. https://workos.com/blog/ai-agent-access-control-best-practices

Rubrik. Rubrik Unveils Agent Rewind for When AI Agents Go Awry, 12 August 2025. https://www.rubrik.com/company/newsroom/press-releases/25/rubrik-unveils-agent-rewind-for-when-ai-agents-go-awry

Amazon Web Services. Policy in Amazon Bedrock AgentCore, introduced as a preview at re:Invent, 2 December 2025 (generally available March 2026). https://aws.amazon.com/bedrock/agentcore/faqs/

Microsoft. Agent Governance Toolkit, open-sourced 2 April 2026 (MIT license). https://github.com/microsoft/agent-governance-toolkit

Oasis Security. Introducing Oasis Agentic Access Management. https://www.oasis.security/blog/introducing-oasis-agentic-access-management

Zenity. Platform. https://zenity.io/platform

Salesforce. Agentforce. https://www.salesforce.com/agentforce/

LangChain. Making It Easier to Build Human-in-the-Loop Agents with interrupt. https://www.langchain.com/blog/making-it-easier-to-build-human-in-the-loop-agents-with-interrupt

Anthropic. Claude Code. https://www.anthropic.com/product/claude-code