Secure Systems Engineering

Autonomous Defense Without Autonomous Risk: A Blueprint for Policy-Governed Security Agents

2026-04-28T00:00:00+00:00

Everyone in security is talking about using AI to defend systems. Fewer people are talking about what happens when the AI gets it wrong.

I’ve spent a lot of time thinking about this. Not the detection side. Tooling there has improved dramatically. The harder problem is what comes after detection: safe, verified, auditable remediation at scale. That’s the gap I want to explore here.

The real problem isn’t detection

Security teams today are not short on signals. Cloud posture drift, exposed secrets, risky IAM policies, container findings, Kubernetes misconfigurations: all of it gets surfaced. The trouble is that surfacing a finding and safely fixing it are completely different problems.

The question that never gets answered fast enough is:

“What is risky, why does it matter, what should be fixed first, and what can be safely remediated without creating new risk?”

That gap between finding and fix is currently filled by humans doing repetitive triage work. An autonomous defensive system should close that gap, but only where it can do so safely. That last part is what most designs get wrong.

The architecture I’d build

The key design principle: each component does one thing. The agent plans. The policy engine decides. The broker executes. Nothing short-circuits this chain.

1. Signal ingestion and normalization

The system needs to pull from a lot of sources:

Cloud posture tools (Security Hub, Security Command Center, Defender for Cloud)
SAST/DAST/SCA scanners
Kubernetes admission logs and runtime alerts
IAM analyzer findings
Asset inventory and ownership metadata
CI/CD pipeline metadata
Vulnerability databases (NVD, OSV, GitHub Advisory)

Before anything else happens, everything gets normalized to a common schema:

{
  "asset": "payments-api-prod",
  "finding": "public_s3_bucket",
  "severity": "high",
  "owner": "payments-platform",
  "environment": "prod",
  "data_classification": "restricted",
  "internet_exposed": true,
  "exploitability": "medium"
}

This feels like boring plumbing work, and it is. But it’s also where most real implementations break down. If downstream components have to handle provider-specific formats, everything becomes fragile and hard to test.
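As a rough sketch of what that normalization layer could look like, the snippet below maps one provider-shaped payload onto the common schema above. The input field names (`resource_name`, `check_id`, `tags`, `public`) and the `normalize_finding` helper are hypothetical; they are not taken from any real scanner's output format.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Finding:
    """Common schema every downstream component consumes."""
    asset: str
    finding: str
    severity: str
    owner: str
    environment: str
    data_classification: str
    internet_exposed: bool
    exploitability: str


def normalize_finding(raw: dict) -> Finding:
    """Map a hypothetical provider payload onto the common schema.

    Missing ownership or classification becomes an explicit placeholder,
    so gaps stay visible instead of being silently dropped.
    """
    tags = raw.get("tags", {})
    return Finding(
        asset=raw["resource_name"],
        finding=raw["check_id"].lower(),
        severity=raw.get("severity", "unknown").lower(),
        owner=tags.get("owner", "unowned"),
        environment=tags.get("env", "unknown"),
        data_classification=tags.get("data_classification", "unclassified"),
        internet_exposed=bool(raw.get("public", False)),
        exploitability=raw.get("exploitability", "unknown").lower(),
    )


raw_event = {
    "resource_name": "payments-api-prod",
    "check_id": "PUBLIC_S3_BUCKET",
    "severity": "HIGH",
    "public": True,
    "exploitability": "medium",
    "tags": {"owner": "payments-platform", "env": "prod",
             "data_classification": "restricted"},
}
normalized = normalize_finding(raw_event)
```

Each provider gets its own small adapter like this one, and everything downstream is tested against `Finding` only.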

2. Context is everything: the security knowledge graph

Here’s something I feel strongly about: findings should never be evaluated in isolation.

A public S3 bucket is not the same as a public S3 bucket containing restricted payment data reachable from an internet-facing production service. Same finding type, completely different risk. A flat scanner output will treat them identically. A knowledge graph won’t.

With a graph that links each asset to its owner, data classification, and network exposure, the risk engine can reason like this:

“This S3 bucket is public. It’s connected to a production payment service, contains restricted data, and is reachable from an internet-facing workload. The blast radius of exploitation is the entire payments dataset. Treat this as critical.”

Without the graph, you get a severity label. With it, you get context-aware prioritization. That’s the difference between a tool and a system that actually helps.
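To make that concrete, here is a minimal sketch of graph-based escalation using plain dictionaries. The node names, attributes, and the single escalation rule are all assumptions for illustration; a real graph would carry many more edge types (identity, network path, ownership).

```python
from collections import deque

# Hypothetical graph: node -> attributes, plus directed "can reach" edges.
NODES = {
    "bucket/payments-exports": {"public": True, "data_classification": "restricted"},
    "bucket/wiki-assets": {"public": True, "data_classification": "public"},
    "svc/payments-api": {"environment": "prod", "internet_facing": True},
    "svc/internal-wiki": {"environment": "dev", "internet_facing": False},
}
EDGES = {
    "svc/payments-api": ["bucket/payments-exports"],
    "svc/internal-wiki": ["bucket/wiki-assets"],
}


def reachable_from(start: str) -> set:
    """Breadth-first search over directed edges starting at `start`."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in EDGES.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def contextual_severity(asset: str, base_severity: str) -> str:
    """Escalate to critical when a public, restricted asset is reachable
    from an internet-facing production service."""
    attrs = NODES[asset]
    exposed_by_prod = any(
        NODES[src].get("internet_facing")
        and NODES[src].get("environment") == "prod"
        and asset in reachable_from(src)
        for src in EDGES
    )
    sensitive = attrs.get("public") and attrs.get("data_classification") == "restricted"
    return "critical" if sensitive and exposed_by_prod else base_severity
```

Both buckets carry the same finding type and the same scanner severity; only the one attached to the production payment service is escalated.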

3. Risk scoring that’s actually explainable

Raw scanner severity is a starting point, not a risk score. A high-severity finding in a dev account with no data and no production traffic is not the same priority as a medium-severity finding in a production service handling customer PII.

I like this model because every score is explainable:

Risk = 0.25 × Severity + 0.20 × Exposure + 0.20 × Asset Criticality + 0.15 × Data Sensitivity + 0.10 × Exploitability + 0.10 × Blast Radius

def calculate_risk(finding):
    """Weighted sum of six factors; the weights add up to 1.0.

    Every input is assumed to be pre-scaled to the same range.
    """
    score = 0.0

    score += finding.severity_score    * 0.25
    score += finding.exposure_score    * 0.20
    score += finding.asset_criticality * 0.20
    score += finding.data_sensitivity  * 0.15
    score += finding.exploitability    * 0.10
    score += finding.blast_radius      * 0.10

    return round(score, 2)

The weights should be calibrated to your environment. The right calibration for a payments platform is different from an internal tooling team. What matters is that you can always show which inputs drove a score and why. That’s what makes it trustable.
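One way to make "show which inputs drove a score" literal is to return the per-factor contributions alongside the total. This is a sketch that reuses the weights above; the `explain_risk` name, the dict-based input, and the 0 to 10 scale in the example are my assumptions.

```python
WEIGHTS = {
    "severity_score": 0.25,
    "exposure_score": 0.20,
    "asset_criticality": 0.20,
    "data_sensitivity": 0.15,
    "exploitability": 0.10,
    "blast_radius": 0.10,
}


def explain_risk(factors: dict) -> dict:
    """Return the total score plus each factor's weighted contribution."""
    contributions = {
        name: round(factors[name] * weight, 2)
        for name, weight in WEIGHTS.items()
    }
    total = round(sum(factors[name] * w for name, w in WEIGHTS.items()), 2)
    top_driver = max(contributions, key=contributions.get)
    return {"score": total, "contributions": contributions, "top_driver": top_driver}


# Example inputs on an assumed 0-10 scale.
result = explain_risk({
    "severity_score": 6,
    "exposure_score": 10,
    "asset_criticality": 9,
    "data_sensitivity": 10,
    "exploitability": 5,
    "blast_radius": 9,
})
```

Here a medium-severity finding still scores high, and the breakdown shows that exposure, not scanner severity, is the largest contributor.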

4. The agent plans. It doesn’t act.

This is probably the most important design decision in the whole system.

The agent’s job is to reason about a finding and produce a remediation plan. Nothing more. It doesn’t decide whether it can execute. It doesn’t hold credentials. It just says: “here’s what I think should happen.”

{
  "finding": "public_s3_bucket",
  "recommended_action": "block_public_s3_access",
  "risk": "high",
  "confidence": "high",
  "requires_human_approval": false,
  "rollback_plan": "restore previous bucket policy",
  "verification": "confirm public access block is enabled"
}

What happens next is entirely determined by the policy engine. That separation is the core safety property of the system.

5. The policy engine is the safety boundary

This is the component I care about most. Every remediation plan flows through here before anything touches infrastructure.

The policy engine is the safety boundary between AI reasoning and production action.

The engine evaluates the proposed action against explicit, version-controlled rules and returns one of three decisions: allow, require approval, or deny.

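package autonomous_defense

default allow = false
default requires_approval = false
default deny = false

# Automated allow: only when nothing demands approval or denial
allow {
  input.action == "block_public_s3_access"
  input.environment != "prod"
  input.confidence == "high"
  input.rollback_available == true
  not requires_approval
  not deny
}

allow {
  input.action == "rotate_low_privilege_secret"
  input.secret_scope == "single_service"
  input.owner_notified == true
  input.rollback_available == true
  not requires_approval
  not deny
}

# Human in the loop: anything in prod, and any deletion
requires_approval {
  input.environment == "prod"
  not deny
}

requires_approval {
  input.action == "delete_resource"
  not deny
}

# Hard gate: no path to allow or approval
deny {
  input.blast_radius == "high"
}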

A few things I’ve encoded deliberately here:

Environment matters. Non-prod actions can be automated. Prod requires approval or is denied outright.
Rollback is a precondition, not an afterthought. No rollback plan, no action.
Blast radius is a hard gate. High blast radius is always denied, no exceptions.
Policy is code. It lives in version control, gets reviewed like any other change, and the version is logged with every decision.

Your security posture should live in executable rules, not in a document that nobody reads.
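Whatever the engine returns, the caller should still reduce it to exactly one decision and fail closed. A minimal sketch of that reduction, assuming the engine hands back a result document with the three rule names used above (the `Decision` type and `resolve_decision` helper are hypothetical):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    outcome: str          # "allow", "require_approval", or "deny"
    policy_version: str   # logged with every decision


def resolve_decision(rules: dict, policy_version: str) -> Decision:
    """Collapse rule results into one outcome: deny beats approval beats allow.

    Missing keys count as False, so an empty or partial result is a deny.
    """
    if rules.get("deny", False):
        outcome = "deny"
    elif rules.get("requires_approval", False):
        outcome = "require_approval"
    elif rules.get("allow", False):
        outcome = "allow"
    else:
        outcome = "deny"
    return Decision(outcome, policy_version)
```

Keeping the precedence in one small function means the broker never has to interpret raw rule results itself.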

6. Credentials should be narrow and short-lived

Most autonomous system designs fail here. The agent gets an admin service account “for convenience” and the policy layer becomes the only line of defense. That’s not a system, it’s a ticking clock.

The execution broker should only ever hold credentials for the specific action it’s executing, and those credentials should disappear when the action completes.

❌  Agent has admin access to cloud accounts.
    Agent decides what to do.
    Agent executes.
    Log: "agent did something."

✓   Agent produces a plan.
    Policy engine approves specific action.
    Broker receives narrowly-scoped credential for that action.
    Action is executed, logged, and verified.
    Log: full context, agent, tenant, action, resource, policy version, outcome.

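The bad design relies on a single control: the agent’s own judgment. The good design has four: a policy decision, a narrowly scoped credential, post-action verification, and a full audit record.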
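The per-action credential idea can be sketched with a context manager. This is an in-memory stand-in, not a real token service: the `ISSUED` table, `scoped_credential`, and `is_valid` are hypothetical names, and a production broker would request the credential from the cloud provider's token service instead.

```python
import secrets
import time
from contextlib import contextmanager

ISSUED = {}  # token -> (action, resource, expires_at); stands in for a token service


@contextmanager
def scoped_credential(action: str, resource: str, ttl_seconds: int = 60):
    """Issue a credential for one action on one resource, then revoke it."""
    token = secrets.token_hex(16)
    ISSUED[token] = (action, resource, time.monotonic() + ttl_seconds)
    try:
        yield token
    finally:
        ISSUED.pop(token, None)  # revoked even if the action raises


def is_valid(token: str, action: str, resource: str) -> bool:
    """A credential only works for the exact action and resource it was issued for."""
    entry = ISSUED.get(token)
    if entry is None:
        return False
    granted_action, granted_resource, expires_at = entry
    same_grant = (granted_action, granted_resource) == (action, resource)
    return same_grant and time.monotonic() < expires_at


with scoped_credential("block_public_s3_access", "bucket/demo") as cred:
    ok_inside = is_valid(cred, "block_public_s3_access", "bucket/demo")
    wrong_action = is_valid(cred, "delete_resource", "bucket/demo")
ok_after = is_valid(cred, "block_public_s3_access", "bucket/demo")
```

The credential is useless for any other action while it exists, and useless for everything once the block exits.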

7. Not everything should be automated

I want to be direct about this because it’s easy to get wrong.

Action	Automation level
Add missing security label	Fully automated
Open ticket with fix recommendation	Fully automated
Block public access on non-prod bucket	Automated with verification
Rotate low-risk secret	Automated with owner notification
Change production IAM policy	Human approval required
Delete production resource	Denied by default
Modify network path or firewall in prod	Human approval required

The system should be autonomous where blast radius is low, and human-governed where blast radius is high. A human approval gate is not a design failure. It’s the right answer when the cost of getting it wrong is high.

8. Every action needs a closed loop

Executing a fix and assuming it worked is not verification. The system needs to confirm the action had the intended effect and roll back if it didn’t.

sequenceDiagram
    participant Agent
    participant Policy
    participant Broker
    participant Cloud
    participant Verifier
    participant Audit
    Agent->>Policy: Request remediation
    Policy->>Broker: Approved action
    Broker->>Cloud: Apply fix
    Cloud->>Verifier: New state
    Verifier->>Audit: Record result
    Verifier-->>Broker: Success or rollback required

Some examples of what verification actually looks like in practice:

Public access block: confirm the S3 block-public-access flag is enabled
IAM permission removal: confirm the permission is absent from the current policy
Package upgrade: confirm the deployed version falls outside the vulnerable range
Kubernetes runtime class: confirm the workload spec references an approved runtime class
Network egress policy: confirm the policy rejects requests to the instance metadata IP

If verification fails, rollback is automatic and the finding re-enters the queue.
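The loop itself is small. Below is a sketch in which the four callables are supplied by the caller; the `remediate` function and the in-memory `bucket` used to exercise it are illustrative assumptions, not part of any real cloud SDK.

```python
from typing import Callable


def remediate(apply: Callable[[], None],
              verify: Callable[[], bool],
              rollback: Callable[[], None],
              audit: Callable[[dict], None]) -> str:
    """Apply a fix, verify the resulting state, and roll back on failure."""
    try:
        apply()
        verified = verify()
    except Exception as exc:  # a failed apply or verify counts as unverified
        audit({"step": "apply_or_verify", "error": repr(exc)})
        verified = False
    if verified:
        audit({"step": "verify", "result": "success"})
        return "success"
    rollback()
    audit({"step": "rollback", "result": "rolled_back"})
    return "rolled_back"  # the caller re-queues the finding


bucket = {"public_access_block": False}
events = []
outcome = remediate(
    apply=lambda: bucket.update(public_access_block=True),
    verify=lambda: bucket["public_access_block"] is True,
    rollback=lambda: bucket.update(public_access_block=False),
    audit=events.append,
)
```

The important property is that `verify` reads the resulting state independently rather than trusting the return value of `apply`.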

Don’t forget to threat-model the system itself

Any system that can make automated changes to infrastructure is itself a high-value target. A few things worth thinking through:

Prompt injection via finding content. An attacker controls a resource name or tag that gets embedded in the agent’s context and manipulates it into proposing the wrong action. Treat all external data as untrusted.

Policy bypass via crafted input. A finding or plan constructed to satisfy policy conditions it shouldn’t. Policy evaluation should use data from authoritative sources, not data passed in by the agent.

Credential theft from the broker. Credentials should be requested per-action and invalidated immediately after use. No persistent credential store.

Audit log tampering. If the agent can modify its own audit log, you’ve lost the trail. Write to an append-only, externally controlled sink.
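One technique that complements an append-only sink is hash chaining, where each entry commits to the one before it. This is my own illustration rather than something the design above prescribes, and it gives tamper evidence only: it detects edits and mid-chain deletions, but not truncation of the newest entries, so the externally controlled sink is still needed.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    payload = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode()).hexdigest()


def append_entry(log: list, event: dict) -> dict:
    """Append an event whose hash covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)}
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Recompute every hash; an edited or mid-chain-deleted entry breaks it."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```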

The point

The opportunity isn’t in building an AI that can fix infrastructure. That’s the easy part. The hard part is building a system that governs the AI’s ability to act, with explicit policy, least-privilege execution, verification, rollback, and a complete audit trail.

An AI that can fix production only when policy allows, only with minimum required permissions, only with a verified rollback path, and only while writing every decision to an immutable log. That’s a system security teams can actually trust.

I’ve built a working demo of this: autonomous-defense-policy-agent. It runs against simulated findings and shows exactly how each component behaves across allow, approval, and deny paths.

IAM Design for Multi-Tenant AI Platforms

2026-04-21T00:00:00+00:00

The IAM model that works well for SaaS applications starts to break down when tenants are autonomous agents. An agent doesn’t just read data on behalf of a user — it spawns sub-tasks, calls external APIs, writes back to tenant state, and may run for minutes or hours unattended. The blast radius of a confused deputy or an over-permissioned credential is much larger than in a request/response world.

This post covers the design decisions I’ve found load-bearing when building IAM for platforms where tenants run code — and increasingly, where that code is AI-driven.

The core problem: ambient authority

In a typical multi-tenant SaaS, each API request carries a credential that scopes the request to a tenant. The credential is short-lived, bound to the HTTP session, and discarded when the request ends.

An agent inverts this model. It acquires credentials early (at task start), uses them across many operations over a long time window, and may delegate further to tools, sub-agents, or external services. At each step, the authority it holds tends to be ambient — silently present, not explicitly re-evaluated.

This creates two failure modes:

Confused deputy: The agent is tricked (via prompt injection, a malicious tool response, or a compromised dependency) into using its credentials on behalf of a different tenant.
Privilege accumulation: The agent picks up permissions it doesn’t need for the current step and holds them for the full task lifetime, widening the window for misuse.

The design goal is to make authority explicit, minimal, and re-evaluated at scope boundaries.

Tenant identity vs. agent identity

The first structural decision is whether an agent runs as a tenant identity or on behalf of a tenant identity.

Running as the tenant means the agent is issued a credential derived directly from the tenant’s identity (e.g., a service account in the tenant’s project, or a JWT with the tenant’s sub). Any action the agent takes is audited as the tenant. The upside is simplicity. The downside is that the platform has no first-party view into what the agent is doing — it looks identical to the tenant using the API directly.

Running on behalf of the tenant means the agent is issued a platform credential that records the delegation explicitly: the tenant is the party being acted for, and the agent is named separately as the actor. The agent never holds the tenant’s own credential. The platform can then enforce policies based on both dimensions: what the agent is (platform service, version, trust level) and what tenant it’s acting for.

# Token structure for "on behalf of" model
{
  "iss": "platform.internal",
  "sub": "tenant/acme-corp",              # subject: the tenant being acted for
  "act": { "sub": "agent/worker-7f3a" },  # RFC 8693 actor claim: the agent doing the acting
  "scope": "read:documents write:tasks",
  "jti": "unique-token-id",
  "iat": 1745865600,
  "exp": 1745869200                       # 1h max
}

RFC 8693 (Token Exchange) defines the act claim for exactly this pattern: sub names the party being acted for, and act names the party doing the acting. Downstream services can check both the acting agent and the tenant it acts for without having to trust the agent to self-report which tenant it’s working for.

Scoping credentials to the task, not the agent

A long-running agent should not hold a credential scoped to everything it might need. Credentials should be scoped to the current task and renewed (or exchanged) when the task context changes.

A practical pattern: issue a task token at task start that encodes the task ID, tenant, and the minimal permission set for that task type. The token lifetime should match the expected task duration, not a generic TTL. If the task completes in 3 minutes, a 60-minute token is 57 minutes of unnecessary exposure.

# Scoped task token
{
  "sub": "tenant/acme-corp",
  "act": { "sub": "agent/worker-7f3a" },
  "task_id": "task-9b2c",
  "task_type": "document_summarization",
  "scope": "read:documents",           # not write:documents
  "exp": 1745866800                    # 20 min, not 1 hr
}

When the agent needs a different permission (say, to write a result back), it exchanges the task token for a new one scoped to that operation. The exchange is logged. The original task token is not expanded — it stays narrow.

This forces the platform to have an explicit model of which task types need which permissions, which is good architectural hygiene regardless of security requirements.
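A sketch of that exchange step, assuming the platform keeps a table of which scopes each task type may ever hold. The `TASK_SCOPES` table and `exchange_token` function are hypothetical, tokens are shown as plain dicts with signing left out, and the identity claims are omitted from the example token for brevity.

```python
# Hypothetical mapping from task type to the scopes it may ever hold.
TASK_SCOPES = {
    "document_summarization": {"read:documents", "write:tasks"},
}


def exchange_token(task_token: dict, requested_scope: str,
                   ttl_seconds: int, now: int) -> dict:
    """Mint a new single-scope token for one operation.

    The original token is left untouched. The new token keeps every other
    claim and never outlives the token it was exchanged from.
    """
    if now >= task_token["exp"]:
        raise PermissionError("task token expired")
    allowed = TASK_SCOPES.get(task_token["task_type"], set())
    if requested_scope not in allowed:
        raise PermissionError(f"{requested_scope} not permitted for this task type")
    inherited = {k: v for k, v in task_token.items() if k not in ("scope", "exp")}
    return {**inherited, "scope": requested_scope,
            "exp": min(now + ttl_seconds, task_token["exp"])}


task_token = {
    "task_id": "task-9b2c",
    "task_type": "document_summarization",
    "scope": "read:documents",
    "exp": 1745866800,
}
write_token = exchange_token(task_token, "write:tasks", ttl_seconds=120, now=1745865600)
```

The write token here lives for two minutes and carries only `write:tasks`; asking for `write:documents` fails because the task type never includes it.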

Authorization at the resource layer

Token scopes tell you what class of operations the agent can perform. They don’t tell you which specific resources a tenant is allowed to touch. That’s the resource layer, and it should be enforced by the authorization service, not trusted from the token.

The pattern I lean toward:

flowchart LR
    Agent -->|task token| Gateway
    Gateway -->|authz check| Authz
    Authz -->|tenant policy| PolicyStore
    Authz -->|resource metadata| ResourceDB
    Gateway -->|if allowed| Resource

The gateway receives the task token, extracts the tenant identity and requested action, and calls the authorization service with the full context: who is acting (agent + tenant), what they want to do (action + resource ID), and what the current task is. The authorization service evaluates the tenant’s policy against the resource metadata — does this tenant own this resource? Does their plan allow this action?

Importantly, the gateway does not trust the agent to self-report the tenant. It reads the tenant claim from the platform-issued token. An agent cannot request resources for a different tenant by claiming a different identity.
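In code, the check might look like the sketch below. It assumes the gateway has already verified the token signature and extracted the principal; `RESOURCE_OWNERS` stands in for the resource metadata store, and the `action:documents` scope naming simply follows the token examples earlier in this post.

```python
# Hypothetical resource metadata store: resource ID -> owning tenant.
RESOURCE_OWNERS = {
    "doc-1234": "tenant/acme-corp",
    "doc-5678": "tenant/globex",
}


def authorize(principal: dict, scopes: set, action: str, resource_id: str) -> dict:
    """Decide using only platform-verified identity and server-side metadata.

    `principal` is what the gateway extracted from the verified token,
    never a tenant named in the request body.
    """
    owner = RESOURCE_OWNERS.get(resource_id)
    if f"{action}:documents" not in scopes:
        reason = "scope_missing"
    elif owner is None:
        reason = "unknown_resource"
    elif owner != principal["tenant"]:
        reason = "tenant_mismatch"
    else:
        reason = "ok"
    return {"decision": "allow" if reason == "ok" else "deny", "reason": reason}


principal = {"agent": "agent/worker-7f3a", "tenant": "tenant/acme-corp"}
own_doc = authorize(principal, {"read:documents"}, "read", "doc-1234")
other_doc = authorize(principal, {"read:documents"}, "read", "doc-5678")
```

A valid scope is not enough on its own: the second call is denied because the resource belongs to a different tenant.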

Isolation at the execution layer

IAM handles credential issuance and authorization checks. It doesn’t enforce compute isolation. A sufficiently privileged agent running in a shared execution environment can still exfiltrate credentials from the process environment or reach adjacent tenant workloads through the network.

The enforcement layers that compose well with the IAM model above:

Layer	Mechanism	What it prevents
Network	Per-tenant egress rules, no east-west by default	Lateral movement between tenant workloads
Credential storage	In-memory only, no disk serialization	Credential persistence across task boundaries
Metadata service	Block IMDS from agent processes	Cloud credential theft via SSRF
Syscall filtering	Seccomp profiles	Process escape from container

The IMDS point deserves emphasis. On EC2 instances, and on the nodes behind managed Kubernetes services such as EKS, GKE, and AKS, the instance metadata service is reachable by any process on the VM unless explicitly blocked. An agent that can make outbound HTTP requests can request the node’s cloud credentials and use them entirely outside the platform’s authorization model. Block IMDS at the network policy layer for any workload running agent code.
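Network policy is the real control, but the same rule is easy to express as a check an egress proxy could apply as a second layer. This is a sketch under the assumption that destinations have already been resolved to IP addresses; the `egress_allowed` helper is hypothetical.

```python
import ipaddress

# Well-known metadata endpoints: the IPv4 link-local address used by the
# major clouds, and the IPv6 endpoint AWS exposes on IPv6-enabled instances.
METADATA_ADDRESSES = {
    ipaddress.ip_address("169.254.169.254"),
    ipaddress.ip_address("fd00:ec2::254"),
}


def egress_allowed(destination: str) -> bool:
    """Return False for metadata-service and other link-local destinations."""
    ip = ipaddress.ip_address(destination)
    return not (ip.is_link_local or ip in METADATA_ADDRESSES)
```

Checking the resolved address rather than the hostname matters, because a hostname the agent controls can be pointed at the metadata IP.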

Audit: what to log

The authorization check is the right place to generate the audit record, not the agent. The agent cannot audit itself accurately — it may be compromised, and even if it isn’t, it doesn’t have visibility into the full authorization context.

A useful audit event:

{
  "event": "authz.decision",
  "decision": "allow",
  "principal": {
    "agent": "agent/worker-7f3a",
    "tenant": "tenant/acme-corp"
  },
  "action": "read",
  "resource": {
    "type": "document",
    "id": "doc-1234",
    "tenant": "tenant/acme-corp"
  },
  "task_id": "task-9b2c",
  "policy_version": "2026-04-01",
  "timestamp": "2026-04-28T12:00:00Z"
}

The policy_version field matters more than it looks. When you investigate an incident, you need to know which policy was in effect at decision time — not what the current policy says.

What breaks first in practice

The three things I’ve seen fail most often in real deployments:

1. Token lifetimes set by convention, not by task duration. Teams default to “1 hour” because that’s what the docs say. Long-running agent tasks end up with expired credentials mid-execution, and the usual reaction is to lengthen the token, which undermines the whole model. Renew or exchange the token at task boundaries instead.

2. Scopes inflated at onboarding. The first version of the task token spec gets extended each time a new task type is added, until it’s effectively *. Add new task types by adding new token types.

3. Delegation chains that aren’t logged end-to-end. Agent A calls tool B which calls service C. The authorization log at C shows the platform credential, not the originating agent or tenant. When something goes wrong at C, there’s no way to trace back to the task. Propagate the task ID (and the full delegation chain) in a request header and log it at every layer.
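For the third failure mode, propagation can be as simple as each hop copying the task ID and appending itself to a chain header. The header names below are hypothetical, and in a real deployment the chain would need to be signed or set by a trusted sidecar so a compromised hop cannot rewrite it.

```python
TASK_HEADER = "X-Task-Id"             # hypothetical header names
CHAIN_HEADER = "X-Delegation-Chain"


def outbound_headers(inbound: dict, caller: str) -> dict:
    """Copy the task ID and append this hop to the delegation chain."""
    if TASK_HEADER not in inbound:
        raise ValueError("refusing to call downstream without a task ID")
    chain = [hop for hop in inbound.get(CHAIN_HEADER, "").split(",") if hop]
    chain.append(caller)
    return {TASK_HEADER: inbound[TASK_HEADER], CHAIN_HEADER: ",".join(chain)}


# Agent A calls tool B, which calls service C.
from_agent = outbound_headers({TASK_HEADER: "task-9b2c"}, "agent/worker-7f3a")
from_tool = outbound_headers(from_agent, "tool/b")
```

Service C can now log both the task ID and the full path the request took to reach it.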

These patterns aren’t novel — they’re applications of least-privilege and defense-in-depth to a context where the “user” is a process that can act autonomously for an extended period. The novelty is that most existing IAM tooling is optimized for human users and short-lived API calls. Adapting it for agents requires being explicit about decisions the tooling used to make implicitly.

Policy as Code: Enforcing Security Guardrails with OPA and Rego

2026-04-01T00:00:00+00:00

Security enforcement logic tends to spread. Access control checks live in application code. Admission rules live in shell scripts. IAM conditions live in JSON buried in Terraform. Kubernetes RBAC lives in YAML files that no one reviews. Over time the policy surface becomes impossible to audit, test, or reason about consistently.

Policy as Code addresses this by treating security rules as first-class, version-controlled, testable code — evaluated by a dedicated policy engine rather than scattered across every system that needs to make an authorization decision.

Open Policy Agent (OPA) is the most widely adopted engine for this pattern. This post covers how to design with it effectively: the data model, the evaluation model, integration patterns, and the failure modes that matter in production.

What OPA actually does

OPA is a general-purpose policy engine. You give it:

A policy — written in Rego, OPA’s declarative language
Input — a JSON document describing the request being evaluated
Data — context OPA can query during evaluation (asset inventory, user roles, etc.)

OPA evaluates the policy against the input and data and returns a decision. That decision can be a boolean, a structured object, a list — whatever your policy defines.

Input (JSON)  +  Policy (Rego)  +  Data (JSON)  →  Decision (JSON)

The engine has no built-in concept of users, resources, or actions. Those are defined by your policy. This makes OPA applicable across a wide range of enforcement points: Kubernetes admission, API gateways, CI/CD pipelines, infrastructure provisioning, and autonomous agent systems.
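From a calling service, that whole model is one HTTP request to OPA's Data API. Here is a sketch using only the standard library; it assumes an OPA server is reachable at the given base URL (8181 is OPA's default port), and the helper names are my own.

```python
import json
import urllib.request


def build_query(base_url: str, rule_path: str, input_doc: dict) -> urllib.request.Request:
    """Build a POST to OPA's Data API: /v1/data/<package path>/<rule>."""
    body = json.dumps({"input": input_doc}).encode()
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/data/{rule_path.strip('/')}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_result(response_body: bytes, default=False):
    """OPA omits "result" when the rule is undefined; fall back to `default`."""
    return json.loads(response_body).get("result", default)


def query_opa(base_url: str, rule_path: str, input_doc: dict):
    """Send the query to a running OPA server and return the decision."""
    request = build_query(base_url, rule_path, input_doc)
    with urllib.request.urlopen(request, timeout=5) as resp:
        return parse_result(resp.read())
```

A call such as `query_opa("http://localhost:8181", "authz/allow", {"action": "read"})` would return the value of the `allow` rule in package `authz`. Treating a missing result as `False` keeps the caller deny-by-default even if the rule path is wrong.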

The Rego data model

Rego is a declarative language inspired by Datalog. The key mental model shift from imperative languages: you define what is true, not what to do.

A policy that allows read access to non-production resources:

package authz

default allow = false

allow {
    input.action == "read"
    input.resource.environment != "prod"
    valid_principal
}

valid_principal {
    input.principal.role == "engineer"
}

valid_principal {
    input.principal.role == "analyst"
}

This policy has no if/else, no loops, no mutation. Each rule defines a condition under which a value is true. If any rule body evaluates to true, the rule head is true. Multiple rules with the same head are implicitly OR’d.

The evaluation model: conceptually, OPA evaluates every rule and returns the result. Rule order never affects the outcome and there is no hidden state. Given the same input and data, a policy that sticks to deterministic built-ins always returns the same decision.

Structuring policies for real systems

A flat policy file works for toy examples. Production systems need structure.

Separate packages by enforcement domain. Each enforcement point gets its own package:

policy/
├── authz/
│   └── api_gateway.rego        # API request authorization
├── admission/
│   └── kubernetes.rego         # Kubernetes admission control
├── cloud/
│   ├── iam.rego                # IAM policy evaluation
│   └── storage.rego            # Storage access decisions
└── autonomous/
    └── remediation.rego        # Agent action authorization

Separate policy from data. Policies should not hardcode lists of approved values. They should query external data:

package authz.api_gateway

import data.approved_services
import data.user_roles

default allow = false

allow {
    service := approved_services[input.service_id]
    role    := user_roles[input.principal_id]
    role.permissions[_] == input.action
}

The approved_services and user_roles documents are loaded into OPA separately — from a database, a config map, or a bundle. The policy expresses how to evaluate; the data expresses what is true at a given point in time.

Layer allow and deny explicitly. OPA has no built-in precedence between allow and deny rules. You define the logic:

package cloud.storage

default decision = "deny"

decision = "allow" {
    allow
    not deny
}

allow {
    input.action == "read"
    input.resource.classification != "restricted"
}

deny {
    input.resource.environment == "prod"
    input.principal.type == "service_account"
    not input.principal.approved_for_prod
}

Explicit layering makes the precedence visible in the policy, not hidden in evaluation order.

Kubernetes admission control

OPA integrates with Kubernetes as a validating admission webhook. Every create or update request the webhook is registered for passes through OPA before it is persisted to the cluster.

A policy that blocks containers running as root:

package kubernetes.admission

deny[msg] {
    input.request.kind.kind == "Pod"
    container := input.request.object.spec.containers[_]

    # runAsNonRoot may be set per container or once at pod level
    not container.securityContext.runAsNonRoot
    not input.request.object.spec.securityContext.runAsNonRoot

    msg := sprintf("container '%v' must set runAsNonRoot: true", [container.name])
}

A policy that requires all deployments to have resource limits:

deny[msg] {
    input.request.kind.kind == "Deployment"
    container := input.request.object.spec.template.spec.containers[_]
    not container.resources.limits

    msg := sprintf("container '%v' must define resource limits", [container.name])
}

The deny[msg] pattern collects all violations as a set of messages rather than short-circuiting on the first. This means a single rejected request returns all the reasons it was rejected — useful for developer feedback.
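On the webhook side, the collected messages map directly onto the response Kubernetes expects. The sketch below builds an `admission.k8s.io/v1` AdmissionReview response from a list of deny messages; the `admission_response` helper is hypothetical and only shows the shape of the reply.

```python
def admission_response(request_uid: str, violations: list) -> dict:
    """Build an AdmissionReview response from the policy's deny messages."""
    response = {"uid": request_uid, "allowed": not violations}
    if violations:
        # Return every reason at once so the developer can fix them together.
        response["status"] = {
            "code": 403,
            "message": "; ".join(sorted(violations)),
        }
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }


rejected = admission_response("705ab4f5", [
    "container 'web' must set runAsNonRoot: true",
    "container 'web' must define resource limits",
])
```

The `uid` must echo the one from the incoming request, which is why it is passed in rather than generated.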

CI/CD pipeline enforcement

Admission control catches problems at deploy time. Policy checks in CI catch them earlier, when the cost of a fix is lower.

The pattern: run OPA against infrastructure-as-code output (Terraform plan, Helm chart, Kubernetes manifests) in CI before the deployment reaches the cluster.

# evaluate a Terraform plan against a policy bundle
terraform show -json plan.tfplan | opa eval \
  --data policy/ \
  --input /dev/stdin \
  "data.terraform.deny"

A policy that flags S3 public access block resources that leave public ACLs enabled:

package terraform

deny[msg] {
    resource := input.resource_changes[_]
    resource.type == "aws_s3_bucket_public_access_block"
    resource.change.after.block_public_acls == false

    msg := sprintf("resource '%v' must block public ACLs", [resource.address])
}

The same policy language, the same engine, the same evaluation model — applied at a different point in the delivery pipeline.
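To turn that evaluation into a pipeline gate, something has to read the result and fail the job. A sketch of that step, assuming the query was run with `opa eval --format json` so the output is a JSON document with `result` and `expressions` entries (the `collect_violations` and `gate` helpers are my own names):

```python
import json
import sys


def collect_violations(opa_eval_output: str) -> list:
    """Extract deny messages from `opa eval --format json` output."""
    doc = json.loads(opa_eval_output)
    messages = []
    for result in doc.get("result", []):
        for expr in result.get("expressions", []):
            value = expr.get("value")
            if isinstance(value, list):  # a Rego set is serialized as an array
                messages.extend(str(m) for m in value)
    return sorted(messages)


def gate(opa_eval_output: str) -> int:
    """Return a process exit code: 0 when clean, 1 when any policy denies."""
    violations = collect_violations(opa_eval_output)
    for message in violations:
        print(f"policy violation: {message}", file=sys.stderr)
    return 1 if violations else 0
```

In a CI script this would end with `sys.exit(gate(sys.stdin.read()))`, with the `opa eval` output piped in.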

Policy for autonomous agent systems

The pattern that matters most for agentic systems: the policy engine as the decision boundary between AI reasoning and infrastructure action.

An agent produces a remediation plan. The plan is evaluated against an OPA policy before any action is taken. The agent cannot bypass this — it has no credentials, no direct access to infrastructure. It can only submit plans and receive decisions.

package autonomous.remediation

default allow       = false
default requires_approval = false
default deny        = false

# Hard denies — no path to allow
deny {
    input.blast_radius == "high"
}

deny {
    input.action_type == "delete_resource"
}

# Production requires a human in the loop
requires_approval {
    input.environment == "prod"
    not deny
}

# Automated allow: non-prod, high confidence, rollback available
allow {
    input.environment != "prod"
    input.confidence  == "high"
    input.rollback_available == true
    not deny
}

The policy version is logged with every decision. When investigating an incident, you can reconstruct exactly what policy was in effect at the time of each action — which rule matched, which data was evaluated. This is the auditability property that makes autonomous action trustworthy.

Testing policies

Untested policies are configuration. The Rego unit test framework makes policies verifiable:

package autonomous.remediation_test

# The rules under test live in another package, so import it explicitly
import data.autonomous.remediation

test_deny_high_blast_radius {
    remediation.deny with input as {
        "action_type":        "block_public_s3_access",
        "blast_radius":       "high",
        "environment":        "dev",
        "confidence":         "high",
        "rollback_available": true
    }
}

test_allow_non_prod_s3_block {
    remediation.allow with input as {
        "action_type":        "block_public_s3_access",
        "blast_radius":       "low",
        "environment":        "dev",
        "confidence":         "high",
        "rollback_available": true
    }
}

test_requires_approval_in_prod {
    remediation.requires_approval with input as {
        "action_type":        "block_public_s3_access",
        "blast_radius":       "low",
        "environment":        "prod",
        "confidence":         "high",
        "rollback_available": true
    }
}

Run with:

opa test policy/ -v

Policy changes go through the same review process as application code — diff, review, test, merge. This is the core value proposition: security rules become auditable, reviewable, and testable like any other code.

What breaks in practice

Policy and data drift. The policy assumes a data schema. The data changes. Nothing enforces the contract between them. Mitigation: schema validation on data bundles; tests that cover the boundary conditions that depend on specific data shapes.

Overly broad defaults. default allow = false is correct but teams sometimes flip it for convenience and never flip it back. Make the default explicit in every package, not just the root.

Missing the enforcement point. OPA decides; something else must enforce. If a service ignores the OPA decision, the policy is theater. The integration must be verified, ideally with an integration test that confirms a denied request actually fails.

Policy explosion without structure. A single flat policy file becomes unmanageable past a few hundred lines. Package structure and data separation are not premature optimization; they are prerequisites for a policy layer that a team can maintain over time.
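The enforcement-point failure is the easiest of these to pin down in code. Below is a sketch of a fail-closed wrapper: the handler only runs on an explicit allow, and an unreachable or misbehaving policy engine counts as a deny. The `enforce` function, the `Denied` exception, and the injected `decide` callable are illustrative assumptions.

```python
from typing import Callable


class Denied(Exception):
    """Raised when the policy decision is anything other than an explicit allow."""


def enforce(decide: Callable[[dict], object], request: dict,
            handler: Callable[[dict], str]) -> str:
    """Run `handler` only on an explicit allow; fail closed on errors."""
    try:
        decision = decide(request)
    except Exception as exc:  # engine unreachable, timeout, malformed reply
        raise Denied("policy engine unavailable") from exc
    if decision is not True:
        raise Denied("request denied by policy")
    return handler(request)
```

Because `decide` is injected, the integration test mentioned above can pass a stub that returns `False` and assert that the handler never runs.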

The value of a policy engine is not that it makes authorization easier to write. It is that it makes authorization possible to audit, test, and evolve independently of the systems it governs. When a policy change needs to go through review, when every decision is logged with the policy version that produced it, and when tests break before a bad rule reaches production — that is when policy as code pays for itself.