Human-in-the-Loop Isn’t a Feature.

Regulatory Analysis

Human-in-the-loop is no longer a workflow preference but a regulatory requirement. This analysis explains why EU AI Act Article 14, DORA, and FCA accountability rules require oversight controls inside the authorization pipeline, and why most application-layer approval mechanisms fail to satisfy compliance obligations.

Published on Jun 19, 2026


EU AI Act Article 14, DORA Articles 5–6, and FCA SM&CR converge on a single structural demand: human-in-the-loop (HITL) controls must be embedded into AI agent governance architectures at the authorization layer, not bolted onto application workflows after deployment. This analysis maps what each regulatory framework requires of human oversight in AI agent deployments, and why application-layer approval flows fail to satisfy any of them.

OpenBox | Compliance & Governance | 2026

Human-in-the-loop compliance is a legal obligation in 2026, not a product decision. EU AI Act Article 14, DORA, and FCA SM&CR each mandate structural human oversight of AI agents at the authorization layer. Most enterprises have implemented something that does not qualify, and the gap is now consequential.

01 The AI Agent Compliance Gap Opening in 2026

Three regulatory frameworks are converging on the same structural demand for human oversight in AI agent deployments. The EU AI Act (Regulation (EU) 2024/1689) originally set its Annex III high-risk AI system obligations, including the Article 14 human oversight requirements, to apply from 2 August 2026. A provisional Digital Omnibus on AI agreement, reached on 6 May 2026 and pending formal adoption and publication in the Official Journal, defers those obligations to 2 December 2027. DORA has applied to EU financial entities since 17 January 2025. The FCA has made explicit, through its principles-based framework and the Senior Managers and Certification Regime (SM&CR), that accountability for AI-driven decisions cannot be transferred to the system making them.

The problem is that most current HITL implementations are not built to satisfy any of these. They exist at the application layer: approval notifications sent to email or Slack, confirmation screens in custom UIs, workflow checkpoints with no structured record of who approved what, when, or why. Each framework, examined precisely, requires something architecturally different from what most enterprises currently have.

REGULATORY HITL REQUIREMENTS AT A GLANCE
Framework	Reference	Core HITL Obligation	Penalty Exposure
EU AI Act	Article 14 (Reg. (EU) 2024/1689)	High-risk AI systems must allow natural persons to understand, monitor, and override system operations during use.	Up to €15M or 3% of global annual turnover, whichever is higher (Article 99(4))
DORA	Articles 5–6	Management body holds direct accountability for ICT risk decisions; AI-executed decisions do not dissolve that chain.	Supervisory measures; reputational and operational exposure
FCA / SM&CR	SM&CR	Named senior managers retain personal accountability for functions they oversee, regardless of whether an AI agent executes the action.	Personal regulatory liability for named senior managers
NIST AI RMF	GOVERN function (AI RMF 1.0)	Roles and responsibilities for human oversight of AI systems should be defined, with accountability structures established and documented across the deployment lifecycle.	Voluntary; no direct penalty exposure

02 Human Oversight Requirements: What EU AI Act, DORA, FCA, and NIST AI RMF Require

EU AI Act Article 14: Human Oversight as a Runtime Architecture Requirement

Article 14 of the EU AI Act (Regulation (EU) 2024/1689) establishes human oversight obligations for high-risk AI systems. The requirement covers three distinct capabilities that must be operational during use: the ability of the persons assigned to oversight to understand the system's capacities and limitations and correctly interpret its outputs; the ability to monitor its operation and detect and address anomalies, dysfunctions, and unexpected performance; and the ability to override or reverse its outputs, or to interrupt and halt the system through a stop procedure, when necessary.

This is not a documentation requirement. It is a runtime architecture requirement for AI agent governance: the human oversight function must be capable and active during deployment, not merely described in a policy document. High-risk use cases in financial services, including creditworthiness assessments of natural persons and risk assessment and pricing for life and health insurance, fall within the Act's Annex III scope. For enterprises deploying agents that touch these decisions, Annex III obligations including Article 14 were originally set to apply from 2 August 2026 and are now deferred to 2 December 2027 under the provisional Digital Omnibus on AI agreement (6 May 2026, pending formal adoption). Compliance preparation should proceed against that revised date; the architectural requirements imposed by Article 14 do not diminish with the extended timeline. Penalty exposure under Article 99(4) for non-compliance with high-risk AI system obligations reaches €15 million or 3% of global annual turnover, whichever is higher.

Article 73 of the EU AI Act adds a second demand on the same governance system. When a serious incident involving a high-risk AI system occurs, providers must report it to the market surveillance authorities of the Member States where it occurred within strict windows: no later than 15 days after becoming aware of a serious incident in general; 10 days where the incident may have caused a death; and 2 days for a widespread infringement or a serious disruption of critical infrastructure. The European Commission's draft guidance on Article 73, published on 26 September 2025, clarifies these thresholds and the information a compliant report must contain. A HITL architecture that generates no structured governance record cannot support these timelines. The evidence a report and any follow-up investigation will draw on includes timestamps, decision context, approval records, and execution traces, and it cannot be reliably reconstructed from email threads or Slack logs after the fact.
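The reporting windows are simple arithmetic once the moment of awareness is recorded precisely, which is exactly what an unstructured approval trail fails to do. The sketch below computes the outer reporting limit per incident category. The enum names and the treatment of each window as N × 24 hours from a UTC awareness timestamp are assumptions for illustration; legal day-counting rules may differ, and this is not legal advice.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum


class IncidentCategory(Enum):
    """Article 73 reporting windows in days (hypothetical encoding of the three tiers)."""
    GENERAL = 15                # any other serious incident
    DEATH = 10                  # incident that may have caused a death
    WIDESPREAD_OR_CRITICAL = 2  # widespread infringement / critical-infrastructure disruption


def reporting_deadline(aware_at: datetime, category: IncidentCategory) -> datetime:
    """Outer limit for the report, counted from when the provider became aware.

    Assumption: each window is N x 24 hours from the awareness timestamp.
    The Act also expects reporting as soon as a causal link (or its reasonable
    likelihood) is established, so this is an upper bound, not a target.
    """
    if aware_at.tzinfo is None:
        raise ValueError("aware_at must be timezone-aware; record incidents in UTC")
    return aware_at + timedelta(days=category.value)


def hours_remaining(aware_at: datetime, category: IncidentCategory, now: datetime) -> float:
    """Hours left before the outer limit (negative once it has passed)."""
    return (reporting_deadline(aware_at, category) - now).total_seconds() / 3600
```

For awareness at 2026-05-19 14:00 UTC, the DEATH category yields 2026-05-29 14:00 UTC. The naive-datetime check is deliberate: a deadline computed from an ambiguous local timestamp is itself weak evidence.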

DORA: Why AI Agent Deployments Cannot Break the Human Accountability Chain

The Digital Operational Resilience Act, applicable to EU financial entities since January 2025, approaches the same problem from an ICT governance angle. DORA’s risk management framework requirements place the management body of a financial entity as directly responsible for the management of ICT risk. That responsibility is not conditional on whether a human or an AI agent executed the relevant action.

For AI agents deployed in financial operations, the implication is direct: if an agent takes an action that affects operational resilience, and there is no evidence that the appropriate human oversight function was engaged and capable of intervening, the institution's ICT governance posture has a demonstrable gap. Accountability chains in AI agent deployments are only as strong as their weakest documented link.

FCA SM&CR: Why AI Agents Cannot Inherit Personal Accountability from Senior Managers

The FCA’s principles-based framework does not prescribe HITL in the way Article 14 does. What it does is more consequential for enterprise AI governance: it preserves personal accountability at the senior manager level for every function they oversee, regardless of how that function is executed. Under SM&CR, a named senior manager responsible for a credit decisioning function remains personally accountable for AI-driven decisions made within that function. The FCA expects firms to demonstrate that human oversight was available, applied, and evidenced. An AI agent executing actions without a structured human oversight record creates direct exposure for the individuals whose names are on the accountability map.

NIST AI RMF: The GOVERN Function and Human Oversight Checkpoints

The NIST AI Risk Management Framework (AI RMF 1.0) is a voluntary US framework that is increasingly adopted, in the US and internationally, as an enterprise AI governance baseline. Its GOVERN function establishes the policies, accountability structures, and oversight mechanisms that should operate across the full AI lifecycle. Within GOVERN, the RMF calls for policies and procedures that define and differentiate roles and responsibilities for human-AI configurations and oversight of AI systems (GOVERN 3.2), so that designated stakeholders hold clear security, compliance, and decision authority for each deployment.

For enterprises operating under EU AI Act, DORA, or FCA obligations, NIST AI RMF provides a structurally consistent complementary baseline. The GOVERN function's human oversight guidance aligns with Article 14's runtime demands and DORA's accountability chain requirements. Adopting it supports the documentation and evidence practices that all three binding frameworks require, and it maps cleanly onto the authorization pipeline architecture described in Section 05.

Governance cannot be added after deployment and made to look like it was there from the start. The authorization pipeline is where compliance is built or broken.

03 Why Application-Layer HITL Fails EU AI Act, DORA, and FCA Scrutiny

The instinctive engineering response to a HITL requirement is to add an approval screen. A step in the workflow sends a notification, waits for a response, and then continues. This satisfies the surface appearance of human involvement. It satisfies none of the frameworks above.

No structured audit record

An email approval or Slack confirmation produces no governance event. There is no UTC timestamp tied to the specific agent action under review, no record of what information the approver saw, no link to the execution trace. When a regulator asks for evidence of human oversight on a specific decision, there is nothing to produce.

No connection to agent behavior

Application-layer HITL operates outside the agent’s decision logic. It cannot be triggered by the agent’s behavioural compliance history, its risk classification, or the specific nature of the action being requested. It is a fixed checkpoint rather than a risk-sensitive control.

No enforcement authority

An application-layer approval screen can be bypassed. If the notification system fails, if no response arrives within the expected window, if a runtime error occurs, the agent may continue without a recorded approval. The HITL mechanism has no authoritative relationship to the agent’s execution path.
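The difference between an advisory notification and an authoritative gate comes down to what happens when no answer arrives. A minimal sketch, assuming approver responses are delivered to an in-process queue (a hypothetical stand-in for whatever channel carries them); it illustrates fail-closed behaviour and is not OpenBox's implementation.

```python
import queue
from typing import Any, Callable


def resolve_approval(responses: "queue.Queue[Any]", timeout_s: float) -> str:
    """Fail-closed resolution of a REQUIRE_APPROVAL gate.

    Returns "ALLOW" only if an explicit True arrives within timeout_s.
    A timeout, a denial, or any malformed response resolves to "BLOCK".
    """
    try:
        response = responses.get(timeout=timeout_s)
    except queue.Empty:
        # Notifier down, approver absent, or message lost: silence is never consent
        return "BLOCK"
    return "ALLOW" if response is True else "BLOCK"


def run_gated(action: Callable[[], Any], responses: "queue.Queue[Any]", timeout_s: float = 30.0) -> Any:
    """Execute the agent action only after an explicit approval; otherwise refuse."""
    verdict = resolve_approval(responses, timeout_s)
    if verdict != "ALLOW":
        raise PermissionError(f"agent action not executed: {verdict}")
    return action()
```

The design choice is that every failure path converges on refusal. An application-layer flow typically inverts this, continuing by default unless someone actively says no.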

No coherent audit chain

Even when an approval occurs, it lives in a system separate from the agent’s execution record. Correlating the two requires manual reconstruction. In a compliance examination, reconstructed evidence carries far less weight than contemporaneous structured records.

04 The Architectural Problem: HITL Must Sit Inside the Authorization Pipeline

The core issue is architectural. HITL implemented as a feature on top of an existing agent is positioned downstream from the authorization decision. The agent has already determined what it wants to do. The approval gate is interposed after that determination, by an external system, with no structural authority over the agent’s execution path.

What EU AI Act Article 14, DORA, and FCA frameworks require is different: an oversight function that is capable of understanding, monitoring, and interrupting the AI agent during operation, with evidence that this function was operative. That requires the HITL mechanism to sit inside the system that makes authorization decisions, not outside it. It needs to be triggered by the same logic that determines what an agent is allowed to do, and it needs to produce audit evidence that is part of the same immutable record as every other governance decision made about that agent.

This is a governance layer problem. Not a product feature problem.

05 Human-in-the-Loop in the AI Agent Authorization Pipeline

OpenBox implements human-in-the-loop compliance as one of four governance verdicts that its authorization pipeline can issue for any agent action. When OpenBox evaluates an agent session, the pipeline runs through guardrails, OPA policies, and behavioural rules to produce one of four outcomes:

OPENBOX AUTHORIZATION PIPELINE

GUARDRAILS (input & output validation) → OPA POLICIES (policy evaluation) → BEHAVIOURAL RULES (stateful pattern detection) → GOVERNANCE VERDICT (per-action authorization decision)

→ ALLOW | BLOCK | HALT | REQUIRE_APPROVAL

REQUIRE_APPROVAL is not an external notification. It is a governance verdict, produced by the same authorization pipeline that produces every other decision for that agent. Policies and behavioural rules, which use stateful multi-step pattern detection to flag sequences of operations across time rather than individual events in isolation, trigger REQUIRE_APPROVAL gates based on the logic configured for that agent’s risk profile. Guardrails perform input and output validation; approval gate triggering is handled by policies and behavioural rules in the pipeline that follows them.
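The ordering described here, with validation first, then policy, then stateful behavioural detection, can be sketched as a short-circuiting chain. Everything below is hypothetical: the action types, thresholds, and rule are toy stand-ins, and the policy stage is plain Python rather than an OPA/Rego policy. The point is that REQUIRE_APPROVAL comes out of the same function that issues ALLOW and BLOCK.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Optional, Sequence

# The four verdicts named in the article; HALT is not produced by these toy stages.
ALLOW, BLOCK, HALT, REQUIRE_APPROVAL = "ALLOW", "BLOCK", "HALT", "REQUIRE_APPROVAL"


@dataclass
class Action:
    kind: str            # hypothetical action types, e.g. "read", "transfer"
    amount: float = 0.0
    payload: str = ""


def guardrail(action: Action) -> Optional[str]:
    """Input-validation stand-in: block payloads containing a forbidden pattern."""
    return BLOCK if "DROP TABLE" in action.payload.upper() else None


def high_value_policy(action: Action, limit: float = 5_000.0) -> Optional[str]:
    """Stateless policy stand-in (real OPA policies are written in Rego)."""
    if action.kind == "transfer" and action.amount >= limit:
        return REQUIRE_APPROVAL
    return None


@dataclass
class CumulativeTransferRule:
    """Stateful behavioural rule: flags a sequence of transfers, not one event."""
    window: int = 5
    cumulative_limit: float = 10_000.0
    recent: Deque[float] = field(default_factory=deque)

    def evaluate(self, action: Action) -> Optional[str]:
        if action.kind != "transfer":
            return None
        self.recent.append(action.amount)
        if len(self.recent) > self.window:
            self.recent.popleft()
        return REQUIRE_APPROVAL if sum(self.recent) >= self.cumulative_limit else None


def authorize(action: Action, rule: CumulativeTransferRule) -> str:
    """Run the stages in order; the first stage that returns a verdict decides."""
    stages: Sequence[Callable[[], Optional[str]]] = (
        lambda: guardrail(action),
        lambda: high_value_policy(action),
        lambda: rule.evaluate(action),
    )
    for stage in stages:
        verdict = stage()
        if verdict is not None:
            return verdict
    return ALLOW
```

Three transfers of 4,000 each pass the per-action policy individually, but the third is held for approval because the rolling total reaches 12,000. Because the chain short-circuits, an action stopped earlier never reaches the behavioural rule and is not counted; a real pipeline might choose differently.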

Every agent in OpenBox begins with the Assess phase, where a Risk Profile Score is produced from 14 parameters across three weighted categories: Base Security, AI-Specific, and Impact. This score is one of three components that compose the Trust Score, OpenBox’s composite 0–100 trustworthiness metric, alongside the Behavioural Score and Alignment Score. The Risk Profile Score is static unless formally re-assessed; it is the primary static input to the Trust Score, which in turn determines the agent’s Trust Tier and shapes the policy environment that governs it. Tier 3 agents handle PII, financial data, and critical actions. Tier 4 agents operate in system administration and destructive action contexts. While Tier 3 and Tier 4 contexts represent the most acute regulatory exposure for financial services deployments, the authorization pipeline and the REQUIRE_APPROVAL verdict are available across all four Trust Tiers, from Tier 1 read-only and public data agents through to Tier 4. Policy configuration determines when approval gates fire at each tier level.
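"Policy configuration determines when approval gates fire at each tier level" can be pictured as a per-tier lookup consulted on every action. The sketch assumes a hypothetical configuration: the thresholds, action kinds, and the rule that a low Trust Score forces approval for writes are invented for illustration. OpenBox's actual scoring formula and tier boundaries are not described in this article.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class TierPolicy:
    # Hypothetical per-tier approval configuration (values are illustrative only)
    approval_amount: float          # actions at or above this amount need a human
    always_approve: FrozenSet[str]  # action kinds that always need a human
    min_trust_score: int            # below this Trust Score, every write needs a human


TIER_POLICIES: Dict[int, TierPolicy] = {
    1: TierPolicy(float("inf"), frozenset(), 0),                        # read-only / public data
    2: TierPolicy(50_000.0, frozenset(), 40),
    3: TierPolicy(5_000.0, frozenset({"export_pii"}), 60),              # PII, financial data
    4: TierPolicy(1_000.0, frozenset({"delete", "grant_admin"}), 80),   # destructive / sysadmin
}


def requires_approval(tier: int, trust_score: int, kind: str, amount: float = 0.0) -> bool:
    """Decide whether this action should receive a REQUIRE_APPROVAL verdict."""
    if not 0 <= trust_score <= 100:
        raise ValueError("trust_score must be in 0..100")
    policy = TIER_POLICIES[tier]
    if kind in policy.always_approve:
        return True
    if kind != "read" and trust_score < policy.min_trust_score:
        return True
    return amount >= policy.approval_amount
```

The same gate exists at every tier; only the configured conditions differ, which matches the statement that REQUIRE_APPROVAL is available from Tier 1 through Tier 4.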

HITL governance in OpenBox operates across the full Trust Lifecycle: Assess, Authorize, Monitor, Verify, and Adapt. The Assess phase establishes the risk profile that informs policy calibration. The Authorize phase configures the policies and behavioural rules that trigger REQUIRE_APPROVAL gates. The Monitor phase provides real-time AI agent observability; Dashboard Alerts is the operational mechanism for configuring alerts and responding to anomalies during operation, addressing Article 14's requirement that designated persons be able to detect and address anomalies while the system is in use. The Verify phase provides post-hoc audit through Session Replay, enabling full reconstruction of any agent session, including the context surrounding every REQUIRE_APPROVAL event. The Adapt phase closes the loop: policies, including those that govern approval thresholds, can be updated based on observed agent behaviour, so HITL sensitivity is not frozen at deployment. It evolves as the governance system learns.

The authorization pipeline applies consistently across all supported integrations. Agents built on Temporal, LangChain, LangGraph, Mastra, Deep Agents, and other documented frameworks are wrapped with the same governance layer. The HITL mechanism is framework-agnostic; a compliance team adopting OpenBox does not face a different governance architecture depending on which orchestration stack the engineering team chose.
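One way to keep governance identical across orchestration stacks is to intercept at the tool-call boundary, since frameworks generally end up invoking ordinary callables for tools and activities. The decorator below is a hypothetical illustration of that idea, not OpenBox's SDK and not any framework's API; `toy_authorize` and `transfer` are invented examples, and keyword-only calls are a simplification.

```python
import functools
from typing import Any, Callable, Dict, List

# One shared record for every wrapped tool, whatever framework calls it
AUDIT_LOG: List[Dict[str, Any]] = []


def governed(authorize: Callable[[str, Dict[str, Any]], str]) -> Callable:
    """Decorator: route every call of a tool through the same authorize() callback."""
    def decorator(tool: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(tool)
        def wrapper(**kwargs: Any) -> Any:
            verdict = authorize(tool.__name__, kwargs)
            AUDIT_LOG.append({"tool": tool.__name__, "args": dict(kwargs), "verdict": verdict})
            if verdict != "ALLOW":
                raise PermissionError(f"{tool.__name__}: {verdict}")
            return tool(**kwargs)
        return wrapper
    return decorator


# Hypothetical authorizer and tool, for illustration only
def toy_authorize(tool_name: str, args: Dict[str, Any]) -> str:
    return "REQUIRE_APPROVAL" if args.get("amount", 0) >= 5_000 else "ALLOW"


@governed(toy_authorize)
def transfer(amount: float) -> str:
    return f"sent {amount}"
```

`transfer(amount=100)` returns "sent 100", while `transfer(amount=9000)` raises PermissionError. Both calls land in the same audit log with their verdicts, regardless of which framework invoked the tool.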

What the Governance Audit Record Contains

When a REQUIRE_APPROVAL verdict is issued, OpenBox records a governance event in the immutable audit trail. Records cannot be modified after creation. The record contains:

GOVERNANCE EVENT | APPROVAL RECORD
TIMESTAMP	2026-05-19T14:23:07Z (UTC)
AGENT	credit-decisioning-agent-04
EVENT_TYPE	governance_evaluation
VERDICT	REQUIRE_APPROVAL
REASON	behavioral_rule: high-value action threshold triggered
WORKFLOW_ID	wf_7fae2c91b3d4
APPROVED_BY	sarah.chen@institution.com
APPROVED_AT	2026-05-19T14:26:41Z
EXPIRATION	2026-05-19T16:26:41Z
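Because the record is structured, questions a reviewer asks can be answered mechanically: how long the approval took, and whether an execution fell inside the approved window. A sketch, assuming EXPIRATION marks the end of the period in which the approval may be acted on; the field names mirror the record above, while the class and helper are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


def parse_utc(ts: str) -> datetime:
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11, so normalize it
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


@dataclass(frozen=True)
class ApprovalRecord:
    timestamp: datetime    # when the REQUIRE_APPROVAL verdict was issued
    agent: str
    verdict: str
    reason: str
    workflow_id: str
    approved_by: str
    approved_at: datetime
    expiration: datetime

    def approval_latency(self) -> timedelta:
        """Time between the gate firing and the human decision."""
        return self.approved_at - self.timestamp

    def is_valid_at(self, when: datetime) -> bool:
        """Assumed semantics: the approval covers executions in [approved_at, expiration)."""
        return self.approved_at <= when < self.expiration


record = ApprovalRecord(
    timestamp=parse_utc("2026-05-19T14:23:07Z"),
    agent="credit-decisioning-agent-04",
    verdict="REQUIRE_APPROVAL",
    reason="behavioral_rule: high-value action threshold triggered",
    workflow_id="wf_7fae2c91b3d4",
    approved_by="sarah.chen@institution.com",
    approved_at=parse_utc("2026-05-19T14:26:41Z"),
    expiration=parse_utc("2026-05-19T16:26:41Z"),
)
```

For the example record, the approval latency is 3 minutes 34 seconds and the approval window is two hours.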

Each session’s governance events are cryptographically signed, producing a tamper-evident proof certificate documented fully in Administration › Attestation & Cryptographic Proof. The approval record is not a log entry in a separate system. It is a cryptographically attested governance event, part of the same chain that records every ALLOW, BLOCK, and HALT decision for that agent across its entire operational history. Session Replay, housed in the Verify phase of the Trust Lifecycle, enables post-hoc reconstruction of the full execution context around any governance decision, including REQUIRE_APPROVAL events.
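Tamper evidence generally rests on two ideas: chain each event to the one before it, then sign the result so any edit breaks verification. The sketch below uses a SHA-256 hash chain and an HMAC-SHA256 signature from the Python standard library. HMAC is a shared-key stand-in chosen for brevity; a production attestation scheme would more likely use asymmetric signatures so verifiers never hold the signing key, and OpenBox's actual scheme is not described here.

```python
import hashlib
import hmac
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # fixed starting value for the chain


def _digest(prev_hash: str, event: Dict[str, Any]) -> str:
    # Canonical JSON so the same event always hashes the same way
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode()).hexdigest()


def seal_session(events: List[Dict[str, Any]], key: bytes) -> Dict[str, Any]:
    """Hash-chain the session's events, then sign the final link."""
    chain, prev = [], GENESIS
    for event in events:
        prev = _digest(prev, event)
        chain.append(prev)
    signature = hmac.new(key, prev.encode(), hashlib.sha256).hexdigest()
    return {"events": events, "chain": chain, "signature": signature}


def verify_session(sealed: Dict[str, Any], key: bytes) -> bool:
    """Recompute every link and the signature; any modification or deletion fails."""
    if len(sealed["events"]) != len(sealed["chain"]):
        return False
    prev = GENESIS
    for event, recorded in zip(sealed["events"], sealed["chain"]):
        prev = _digest(prev, event)
        if not hmac.compare_digest(prev, recorded):
            return False
    expected = hmac.new(key, prev.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sealed["signature"])
```

Changing a single field in any event, such as APPROVED_BY, alters every later link and the signed final hash, so verification fails. An email screenshot offers no equivalent check.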

06 AI Compliance Readiness: Five Questions to Test Your HITL Architecture

For financial services teams operating under the EU AI Act, DORA, or FCA expectations, the questions around human-in-the-loop compliance are now concrete. Each maps directly to documented OpenBox capabilities.

Can you produce a complete record of every human approval requested, who approved or denied it, and when?

If your HITL is implemented in an application-layer workflow, you likely cannot. The record is fragmented across notification systems and email threads that were never designed as governance evidence.

→ Administration › Compliance & Audit and the Organization Audit Log provide a structured, searchable, exportable governance record. Governance events, including every REQUIRE_APPROVAL outcome, are recorded with full context and cannot be modified after creation.

Is your HITL gate connected to the agent’s risk posture?

If approval requirements are fixed checkpoints rather than policy-driven verdicts, they are not responsive to the actual risk introduced by that agent’s actions.

→ Dashboard Trust Overview provides an aggregate view of Trust Scores across all deployed agents. Policy configuration in the Authorize phase ties approval thresholds directly to each agent’s Trust Tier and behavioural compliance history.

Is your HITL evidence cryptographically attestable?

A screenshot of an approval email does not withstand regulatory scrutiny. A cryptographically signed, tamper-evident governance event does.

→ Administration › Attestation & Cryptographic Proof produces a tamper-evident proof certificate per session, cryptographically signing the complete set of governance events including approval records.

Is your HITL bypass-proof at the authorization layer?

If the agent can proceed without an approval when the notification system fails or times out, the oversight function is not structurally reliable. Reliability requires HITL to be an authoritative governance verdict, not an advisory notification.

→ OpenBox: Because REQUIRE_APPROVAL is a governance verdict produced by the full authorization pipeline, it carries structural authority over the agent’s execution path. It is not an optional notification layer that can be bypassed by a system timeout.

Can your compliance team replay any agent session, including the approval event and its context?

Post-hoc audit capability is the difference between compliance that can be demonstrated and compliance that can only be asserted.

→ Session Replay (Trust Lifecycle › Verify) enables full reconstruction of any agent session and every governance decision made during it, including the complete approval record and the behavioural context that triggered the REQUIRE_APPROVAL verdict.
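Replay depends on one property that application-layer approvals lack: approval events and execution events live in the same store under the same session key. A hypothetical sketch of the two queries a reviewer would run, the full timeline and the lead-up to the first approval gate. The event kinds and fields are illustrative assumptions, not OpenBox's schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class GovernanceEvent:
    session_id: str
    at: datetime
    kind: str     # hypothetical kinds: "tool_call", "governance_evaluation", "approval"
    detail: str


def replay(events: List[GovernanceEvent], session_id: str) -> List[GovernanceEvent]:
    """One session's events in time order, whatever order they were stored in."""
    return sorted((e for e in events if e.session_id == session_id), key=lambda e: e.at)


def approval_context(events: List[GovernanceEvent], session_id: str, n: int = 3) -> List[GovernanceEvent]:
    """The first REQUIRE_APPROVAL evaluation plus the n events that led up to it."""
    timeline = replay(events, session_id)
    for i, event in enumerate(timeline):
        if event.kind == "governance_evaluation" and "REQUIRE_APPROVAL" in event.detail:
            return timeline[max(0, i - n): i + 1]
    return []
```

If approvals were stored in a separate messaging system, `approval_context` would have nothing to join against. That join is the manual reconstruction step regulators discount.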

The gap between how HITL is typically implemented and what EU AI Act Article 14, DORA, and FCA frameworks require is not a configuration gap. It is an architectural one. Approval mechanisms that sit outside the authorization pipeline cannot satisfy the oversight, intervention, and evidencing requirements these frameworks impose, each in its own language but toward the same structural end: human oversight of AI agents must be real, operative, and demonstrable.

Governance cannot be added after deployment and made to look like it was there from the start. The authorization pipeline is where human-in-the-loop compliance is built or broken. HITL belongs in that pipeline, triggered by the same risk logic that governs every other decision the agent makes, producing the same quality of immutable, cryptographically attested evidence as every other governance event across the full Trust Lifecycle.

That is not a feature. It is the system.

See how OpenBox implements REQUIRE_APPROVAL across your agent stack.