<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://woxff.github.io/security-blog/feed.xml" rel="self" type="application/atom+xml" /><link href="https://woxff.github.io/security-blog/" rel="alternate" type="text/html" /><updated>2026-05-01T16:56:59+00:00</updated><id>https://woxff.github.io/security-blog/feed.xml</id><title type="html">Secure Systems Engineering</title><subtitle>Writing on secure systems design, cloud-native security, and autonomous defense.</subtitle><author><name>qwoxff</name></author><entry><title type="html">Autonomous Defense Without Autonomous Risk: A Blueprint for Policy-Governed Security Agents</title><link href="https://woxff.github.io/security-blog/2026/04/28/autonomous-defense-without-autonomous-risk/" rel="alternate" type="text/html" title="Autonomous Defense Without Autonomous Risk: A Blueprint for Policy-Governed Security Agents" /><published>2026-04-28T00:00:00+00:00</published><updated>2026-04-28T00:00:00+00:00</updated><id>https://woxff.github.io/security-blog/2026/04/28/autonomous-defense-without-autonomous-risk</id><content type="html" xml:base="https://woxff.github.io/security-blog/2026/04/28/autonomous-defense-without-autonomous-risk/"><![CDATA[<p>Everyone in security is talking about using AI to defend systems. Fewer people are talking about what happens when the AI gets it wrong.</p>

<p>I’ve spent a lot of time thinking about this. Not the detection side. Tooling there has improved dramatically. The harder problem is what comes after detection: safe, verified, auditable remediation at scale. That’s the gap I want to explore here.</p>

<h2 id="the-real-problem-isnt-detection">The real problem isn’t detection</h2>

<p>Security teams today are not short on signals. Cloud posture drift, exposed secrets, risky IAM policies, container findings, Kubernetes misconfigurations: it all gets surfaced. But surfacing a finding and safely fixing it are completely different problems.</p>

<p>The question that never gets answered fast enough is:</p>

<blockquote>
  <p>“What is risky, why does it matter, what should be fixed first, and what can be safely remediated without creating new risk?”</p>
</blockquote>

<p>That gap between finding and fix is currently filled by humans doing repetitive triage work. An autonomous defensive system should close that gap, but only where it can do so safely. That last part is what most designs get wrong.</p>

<h2 id="the-architecture-id-build">The architecture I’d build</h2>

<p><img src="/security-blog/assets/images/autonomous-defense-architecture.png" alt="Autonomous Defense System Architecture" style="max-width: 100%; height: auto; border-radius: 6px; margin: 1.5rem 0;" /></p>

<p>The key design principle: each component does one thing. The agent plans. The policy engine decides. The broker executes. Nothing short-circuits this chain.</p>

<hr />

<h2 id="1-signal-ingestion-and-normalization">1. Signal ingestion and normalization</h2>

<p>The system needs to pull from a lot of sources:</p>

<ul>
  <li>Cloud posture tools (Security Hub, Security Command Center, Defender for Cloud)</li>
  <li>SAST/DAST/SCA scanners</li>
  <li>Kubernetes admission logs and runtime alerts</li>
  <li>IAM analyzer findings</li>
  <li>Asset inventory and ownership metadata</li>
  <li>CI/CD pipeline metadata</li>
  <li>Vulnerability databases (NVD, OSV, GitHub Advisory)</li>
</ul>

<p>Before anything else happens, everything gets normalized to a common schema:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"asset"</span><span class="p">:</span><span class="w"> </span><span class="s2">"payments-api-prod"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"finding"</span><span class="p">:</span><span class="w"> </span><span class="s2">"public_s3_bucket"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"severity"</span><span class="p">:</span><span class="w"> </span><span class="s2">"high"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"owner"</span><span class="p">:</span><span class="w"> </span><span class="s2">"payments-platform"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"environment"</span><span class="p">:</span><span class="w"> </span><span class="s2">"prod"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"data_classification"</span><span class="p">:</span><span class="w"> </span><span class="s2">"restricted"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"internet_exposed"</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span><span class="w">
  </span><span class="nl">"exploitability"</span><span class="p">:</span><span class="w"> </span><span class="s2">"medium"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>This feels like boring plumbing work, and it is. But it’s also where most real implementations break down. If downstream components have to handle provider-specific formats, everything becomes fragile and hard to test.</p>
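
<p>As a minimal sketch of that normalization step, the adapter below maps one provider payload into the common schema. The raw payload shape, the <code class="language-plaintext highlighter-rouge">SEVERITY_MAP</code> table, and the inventory lookup are all hypothetical; a real adapter would follow each provider’s documented format.</p>

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Finding:
    """The common schema every downstream component consumes."""
    asset: str
    finding: str
    severity: str
    owner: str
    environment: str
    data_classification: str
    internet_exposed: bool
    exploitability: str


# Hypothetical mapping from one provider's labels to the common vocabulary.
SEVERITY_MAP = {"CRITICAL": "critical", "HIGH": "high", "MEDIUM": "medium", "LOW": "low"}


def normalize_example_provider(raw: dict, inventory: dict) -> Finding:
    """Convert one (hypothetical) provider payload into the common schema."""
    asset = raw["resource"]["name"]
    meta = inventory.get(asset, {})  # ownership metadata comes from the asset inventory
    return Finding(
        asset=asset,
        finding=raw["check_id"],
        severity=SEVERITY_MAP.get(raw["severity_label"].upper(), "unknown"),
        owner=meta.get("owner", "unowned"),
        environment=meta.get("environment", "unknown"),
        data_classification=meta.get("data_classification", "unknown"),
        internet_exposed=bool(raw.get("public", False)),
        exploitability=raw.get("exploitability", "unknown"),
    )


raw_event = {
    "resource": {"name": "payments-api-prod"},
    "check_id": "public_s3_bucket",
    "severity_label": "High",
    "public": True,
    "exploitability": "medium",
}
inventory = {
    "payments-api-prod": {
        "owner": "payments-platform",
        "environment": "prod",
        "data_classification": "restricted",
    }
}
normalized = normalize_example_provider(raw_event, inventory)
print(asdict(normalized))
```

<p>Keeping one adapter per source means provider quirks stay at the edge, and everything downstream can be tested against a single shape.</p>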

<hr />

<h2 id="2-context-is-everything-the-security-knowledge-graph">2. Context is everything: the security knowledge graph</h2>

<p>Here’s something I feel strongly about: findings should never be evaluated in isolation.</p>

<p>A public S3 bucket is not the same as a public S3 bucket containing restricted payment data reachable from an internet-facing production service. Same finding type, completely different risk. A flat scanner output will treat them identically. A knowledge graph won’t.</p>

<p><img src="/security-blog/assets/images/security-knowledge-graph.png" alt="Security Knowledge Graph" style="max-width: 100%; height: auto; border-radius: 6px; margin: 1.5rem 0;" /></p>

<p>With this graph, the risk engine can reason like this:</p>

<blockquote>
  <p>“This S3 bucket is public. It’s connected to a production payment service, contains restricted data, and is reachable from an internet-facing workload. The blast radius of exploitation is the entire payments dataset. Treat this as critical.”</p>
</blockquote>

<p>Without the graph, you get a severity label. With it, you get context-aware prioritization. That’s the difference between a tool and a system that actually helps.</p>
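
<p>To make the idea concrete, here is a small sketch that stores the graph as a plain adjacency map and upgrades a finding only when the surrounding context warrants it. The node names, edges, and the escalation rule are illustrative assumptions, not the contents of the diagram above.</p>

```python
from collections import deque

# Hypothetical edges: "source can reach target".
EDGES = {
    "internet": ["payments-api-prod"],
    "payments-api-prod": ["payments-bucket"],
    "internal-batch-job": ["scratch-bucket"],
}
# Hypothetical node attributes.
NODES = {
    "payments-bucket": {"public": True, "data_classification": "restricted", "environment": "prod"},
    "scratch-bucket": {"public": True, "data_classification": "none", "environment": "dev"},
}


def reachable_from(start: str) -> set:
    """Breadth-first search over EDGES, returning every node reachable from start."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in EDGES.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    seen.discard(start)
    return seen


def contextual_severity(bucket: str, scanner_severity: str) -> str:
    """Escalate to critical only when every contextual risk factor lines up."""
    attrs = NODES[bucket]
    behind_internet_facing_service = bucket in reachable_from("internet")
    restricted = attrs["data_classification"] == "restricted"
    if attrs["public"] and behind_internet_facing_service and restricted and attrs["environment"] == "prod":
        return "critical"
    return scanner_severity  # no extra context, keep the scanner's label


payments = contextual_severity("payments-bucket", "high")
scratch = contextual_severity("scratch-bucket", "high")
print(payments, scratch)
```

<p>Both buckets carry the same finding type and the same scanner label; only the graph context separates them.</p>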

<hr />

<h2 id="3-risk-scoring-thats-actually-explainable">3. Risk scoring that’s actually explainable</h2>

<p>Raw scanner severity is a starting point, not a risk score. A high-severity finding in a dev account with no data and no production traffic is not the same priority as a medium-severity finding in a production service handling customer PII.</p>

<p>I like this model because every score is explainable:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Risk = 0.25 × Severity + 0.20 × Exposure + 0.20 × Asset Criticality + 0.15 × Data Sensitivity + 0.10 × Exploitability + 0.10 × Blast Radius
</code></pre></div></div>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">calculate_risk</span><span class="p">(</span><span class="n">finding</span><span class="p">):</span>
    <span class="n">score</span> <span class="o">=</span> <span class="mi">0</span>

    <span class="n">score</span> <span class="o">+=</span> <span class="n">finding</span><span class="p">.</span><span class="n">severity_score</span>    <span class="o">*</span> <span class="mf">0.25</span>
    <span class="n">score</span> <span class="o">+=</span> <span class="n">finding</span><span class="p">.</span><span class="n">exposure_score</span>    <span class="o">*</span> <span class="mf">0.20</span>
    <span class="n">score</span> <span class="o">+=</span> <span class="n">finding</span><span class="p">.</span><span class="n">asset_criticality</span> <span class="o">*</span> <span class="mf">0.20</span>
    <span class="n">score</span> <span class="o">+=</span> <span class="n">finding</span><span class="p">.</span><span class="n">data_sensitivity</span>  <span class="o">*</span> <span class="mf">0.15</span>
    <span class="n">score</span> <span class="o">+=</span> <span class="n">finding</span><span class="p">.</span><span class="n">exploitability</span>    <span class="o">*</span> <span class="mf">0.10</span>
    <span class="n">score</span> <span class="o">+=</span> <span class="n">finding</span><span class="p">.</span><span class="n">blast_radius</span>      <span class="o">*</span> <span class="mf">0.10</span>

    <span class="k">return</span> <span class="nb">round</span><span class="p">(</span><span class="n">score</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span>
</code></pre></div></div>

<p>The weights should be calibrated to your environment. The right calibration for a payments platform is different from the right one for an internal tooling team. What matters is that you can always show which inputs drove a score and why. That’s what makes it trustworthy.</p>
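
<p>One way to show which inputs drove a score is to return the per-factor contributions alongside the total. The sketch below reuses the weights from the function above and assumes each input is on a 0 to 10 scale; the two sample findings are made up for illustration.</p>

```python
WEIGHTS = {
    "severity_score": 0.25,
    "exposure_score": 0.20,
    "asset_criticality": 0.20,
    "data_sensitivity": 0.15,
    "exploitability": 0.10,
    "blast_radius": 0.10,
}


def explain_risk(inputs: dict) -> dict:
    """Return the weighted score plus each factor's contribution, largest first."""
    contributions = {name: inputs[name] * weight for name, weight in WEIGHTS.items()}
    ranked = sorted(contributions.items(), key=lambda item: item[1], reverse=True)
    return {
        "score": round(sum(contributions.values()), 2),
        "drivers": [(name, round(value, 2)) for name, value in ranked],
    }


# High scanner severity, but a dev account with no data and no production traffic.
dev_finding = {
    "severity_score": 8, "exposure_score": 2, "asset_criticality": 1,
    "data_sensitivity": 0, "exploitability": 5, "blast_radius": 1,
}
# Medium scanner severity, but a production service handling customer PII.
prod_finding = {
    "severity_score": 5, "exposure_score": 9, "asset_criticality": 9,
    "data_sensitivity": 9, "exploitability": 5, "blast_radius": 8,
}
dev_result = explain_risk(dev_finding)
prod_result = explain_risk(prod_finding)
print(dev_result["score"], prod_result["score"])
```

<p>With these sample inputs the medium-severity production finding outranks the high-severity dev finding, and the <code class="language-plaintext highlighter-rouge">drivers</code> list says why.</p>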

<hr />

<h2 id="4-the-agent-plans-it-doesnt-act">4. The agent plans. It doesn’t act.</h2>

<p>This is probably the most important design decision in the whole system.</p>

<p>The agent’s job is to reason about a finding and produce a remediation plan. Nothing more. It doesn’t decide whether it can execute. It doesn’t hold credentials. It just says: “here’s what I think should happen.”</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"finding"</span><span class="p">:</span><span class="w"> </span><span class="s2">"public_s3_bucket"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"recommended_action"</span><span class="p">:</span><span class="w"> </span><span class="s2">"block_public_s3_access"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"risk"</span><span class="p">:</span><span class="w"> </span><span class="s2">"high"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"confidence"</span><span class="p">:</span><span class="w"> </span><span class="s2">"high"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"requires_human_approval"</span><span class="p">:</span><span class="w"> </span><span class="kc">false</span><span class="p">,</span><span class="w">
  </span><span class="nl">"rollback_plan"</span><span class="p">:</span><span class="w"> </span><span class="s2">"restore previous bucket policy"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"verification"</span><span class="p">:</span><span class="w"> </span><span class="s2">"confirm public access block is enabled"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>What happens next is entirely determined by the policy engine. That separation is the core safety property of the system.</p>

<hr />

<h2 id="5-the-policy-engine-is-the-safety-boundary">5. The policy engine is the safety boundary</h2>

<p>This is the component I care about most. Every remediation plan flows through here before anything touches infrastructure.</p>

<blockquote>
  <p>The policy engine is the safety boundary between AI reasoning and production action.</p>
</blockquote>

<p>The engine evaluates the proposed action against explicit, version-controlled rules and returns one of three decisions: allow, require approval, or deny.</p>

<div class="language-rego highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="ow">package</span> <span class="n">autonomous_defense</span>

<span class="ow">default</span> <span class="n">allow</span> <span class="o">=</span> <span class="kc">false</span>

<span class="n">allow</span> <span class="p">{</span>
  <span class="n">input</span><span class="p">.</span><span class="n">action</span> <span class="o">==</span> <span class="s2">"block_public_s3_access"</span>
  <span class="n">input</span><span class="p">.</span><span class="n">environment</span> <span class="p">!</span><span class="o">=</span> <span class="s2">"prod"</span>
  <span class="n">input</span><span class="p">.</span><span class="n">confidence</span> <span class="o">==</span> <span class="s2">"high"</span>
  <span class="n">input</span><span class="p">.</span><span class="n">rollback_available</span> <span class="o">==</span> <span class="kc">true</span>
<span class="p">}</span>

<span class="n">allow</span> <span class="p">{</span>
  <span class="n">input</span><span class="p">.</span><span class="n">action</span> <span class="o">==</span> <span class="s2">"rotate_low_privilege_secret"</span>
  <span class="n">input</span><span class="p">.</span><span class="n">secret_scope</span> <span class="o">==</span> <span class="s2">"single_service"</span>
  <span class="n">input</span><span class="p">.</span><span class="n">owner_notified</span> <span class="o">==</span> <span class="kc">true</span>
  <span class="n">input</span><span class="p">.</span><span class="n">rollback_available</span> <span class="o">==</span> <span class="kc">true</span>
<span class="p">}</span>

<span class="n">requires_approval</span> <span class="p">{</span>
  <span class="n">input</span><span class="p">.</span><span class="n">environment</span> <span class="o">==</span> <span class="s2">"prod"</span>
<span class="p">}</span>

<span class="n">requires_approval</span> <span class="p">{</span>
  <span class="n">input</span><span class="p">.</span><span class="n">action</span> <span class="o">==</span> <span class="s2">"delete_resource"</span>
<span class="p">}</span>

<span class="n">deny</span> <span class="p">{</span>
  <span class="n">input</span><span class="p">.</span><span class="n">blast_radius</span> <span class="o">==</span> <span class="s2">"high"</span>
<span class="p">}</span>
</code></pre></div></div>

<p>A few things I’ve encoded deliberately here:</p>

<ul>
  <li><strong>Environment matters.</strong> Non-prod actions can be automated. Prod requires approval or is denied outright.</li>
  <li><strong>Rollback is a precondition, not an afterthought.</strong> No rollback plan, no action.</li>
  <li><strong>Blast radius is a hard gate.</strong> High blast radius is always denied, no exceptions.</li>
  <li><strong>Policy is code.</strong> It lives in version control, gets reviewed like any other change, and the version is logged with every decision.</li>
</ul>

<p>Your security posture should live in executable rules, not in a document that nobody reads.</p>
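
<p>The Rego above defines <code class="language-plaintext highlighter-rouge">allow</code>, <code class="language-plaintext highlighter-rouge">requires_approval</code>, and <code class="language-plaintext highlighter-rouge">deny</code> as independent rules, so something still has to collapse them into a single decision. The Python sketch below mirrors the same conditions with one possible precedence: deny first, then approval, then allow. That ordering, and treating a missing rollback as a denial for every action, are my assumptions based on the bullets above.</p>

```python
def evaluate(plan: dict) -> str:
    """Return "deny", "require_approval", or "allow" for a proposed action."""
    # Hard gate: a high blast radius is denied regardless of anything else.
    if plan.get("blast_radius") == "high":
        return "deny"
    # Rollback is a precondition: no rollback path, no action.
    if plan.get("rollback_available") is not True:
        return "deny"
    # Production changes and deletions always go to a human.
    if plan.get("environment") == "prod" or plan.get("action") == "delete_resource":
        return "require_approval"
    if plan.get("action") == "block_public_s3_access" and plan.get("confidence") == "high":
        return "allow"
    if (
        plan.get("action") == "rotate_low_privilege_secret"
        and plan.get("secret_scope") == "single_service"
        and plan.get("owner_notified") is True
    ):
        return "allow"
    # Anything unrecognized falls through, mirroring `default allow = false`.
    return "deny"


dev_plan = {
    "action": "block_public_s3_access",
    "environment": "dev",
    "confidence": "high",
    "rollback_available": True,
    "blast_radius": "low",
}
prod_plan = dict(dev_plan, environment="prod")
risky_plan = dict(dev_plan, blast_radius="high")
print(evaluate(dev_plan), evaluate(prod_plan), evaluate(risky_plan))
```

<p>Making the precedence explicit matters because a plan can satisfy more than one rule at once, for example a secret rotation in prod.</p>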

<hr />

<h2 id="6-credentials-should-be-narrow-and-short-lived">6. Credentials should be narrow and short-lived</h2>

<p>Most autonomous system designs fail here. The agent gets an admin service account “for convenience” and the policy layer becomes the only line of defense. That’s not a system, it’s a ticking clock.</p>

<p>The execution broker should only ever hold credentials for the specific action it’s executing, and those credentials should disappear when the action completes.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>❌  Agent has admin access to cloud accounts.
    Agent decides what to do.
    Agent executes.
    Log: "agent did something."

✓   Agent produces a plan.
    Policy engine approves specific action.
    Broker receives narrowly-scoped credential for that action.
    Action is executed, logged, and verified.
    Log: full context, agent, tenant, action, resource, policy version, outcome.
</code></pre></div></div>

<p>The bad design has one defense layer. The good design has four.</p>
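
<p>A rough sketch of the broker side: a credential is issued for exactly one action on one resource and revoked as soon as the action finishes, even if it fails. <code class="language-plaintext highlighter-rouge">CredentialIssuer</code> is a hypothetical in-memory stand-in for whatever short-lived credential service you actually use.</p>

```python
import secrets
import time
from contextlib import contextmanager


class CredentialIssuer:
    """Hypothetical in-memory stand-in for a real short-lived credential service."""

    def __init__(self):
        self._active = {}

    def issue(self, action, resource, ttl_seconds):
        token = secrets.token_hex(16)
        self._active[token] = {
            "action": action,
            "resource": resource,
            "expires_at": time.time() + ttl_seconds,
        }
        return token

    def revoke(self, token):
        self._active.pop(token, None)

    def permits(self, token, action, resource):
        grant = self._active.get(token)
        if grant is None or time.time() >= grant["expires_at"]:
            return False
        return grant["action"] == action and grant["resource"] == resource


@contextmanager
def scoped_credential(issuer, action, resource, ttl_seconds=60):
    """Issue a credential for one action and always revoke it on exit."""
    token = issuer.issue(action, resource, ttl_seconds)
    try:
        yield token
    finally:
        issuer.revoke(token)


issuer = CredentialIssuer()
with scoped_credential(issuer, "block_public_s3_access", "bucket/dev-logs") as token:
    allowed_inside = issuer.permits(token, "block_public_s3_access", "bucket/dev-logs")
    other_action = issuer.permits(token, "delete_resource", "bucket/dev-logs")
allowed_after = issuer.permits(token, "block_public_s3_access", "bucket/dev-logs")
print(allowed_inside, other_action, allowed_after)
```

<p>The credential is useless for any other action while it exists, and useless for everything once the block exits.</p>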

<hr />

<h2 id="7-not-everything-should-be-automated">7. Not everything should be automated</h2>

<p>I want to be direct about this because it’s easy to get wrong.</p>

<table>
  <thead>
    <tr>
      <th>Action</th>
      <th>Automation level</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Add missing security label</td>
      <td>Fully automated</td>
    </tr>
    <tr>
      <td>Open ticket with fix recommendation</td>
      <td>Fully automated</td>
    </tr>
    <tr>
      <td>Block public access on non-prod bucket</td>
      <td>Automated with verification</td>
    </tr>
    <tr>
      <td>Rotate low-risk secret</td>
      <td>Automated with owner notification</td>
    </tr>
    <tr>
      <td>Change production IAM policy</td>
      <td>Human approval required</td>
    </tr>
    <tr>
      <td>Delete production resource</td>
      <td>Denied by default</td>
    </tr>
    <tr>
      <td>Modify network path or firewall in prod</td>
      <td>Human approval required</td>
    </tr>
  </tbody>
</table>

<p>The system should be autonomous where blast radius is low, and human-governed where blast radius is high. A human approval gate is not a design failure. It’s the right answer when the cost of getting it wrong is high.</p>

<hr />

<h2 id="8-every-action-needs-a-closed-loop">8. Every action needs a closed loop</h2>

<p>Executing a fix and assuming it worked is not verification. The system needs to confirm the action had the intended effect and roll back if it didn’t.</p>

<div class="mermaid">
sequenceDiagram
    participant Agent
    participant Policy
    participant Broker
    participant Cloud
    participant Verifier
    participant Audit

    Agent-&gt;&gt;Policy: Request remediation
    Policy-&gt;&gt;Broker: Approved action
    Broker-&gt;&gt;Cloud: Apply fix
    Cloud-&gt;&gt;Verifier: New state
    Verifier-&gt;&gt;Audit: Record result
    Verifier--&gt;&gt;Broker: Success or rollback required
</div>

<p>Some examples of what verification actually looks like in practice:</p>

<ul>
  <li>Public access block: confirm the S3 block-public-access flag is enabled</li>
  <li>IAM permission removal: confirm the permission is absent from the current policy</li>
  <li>Package upgrade: confirm the deployed version falls outside the vulnerable range</li>
  <li>Kubernetes runtime class: confirm the workload spec references an approved runtime class</li>
  <li>Network egress policy: confirm the policy rejects requests to the instance metadata IP</li>
</ul>

<p>If verification fails, rollback is automatic and the finding re-enters the queue.</p>
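
<p>The loop itself is small. Here is a sketch against an in-memory state dictionary, assuming the fix and the verification check are passed in as plain functions; in a real broker these would be cloud API calls.</p>

```python
import copy


def remediate(state, apply_fix, verify, audit_log):
    """Apply a fix, verify the resulting state, and roll back if verification fails."""
    snapshot = copy.deepcopy(state)  # rollback point captured before any change
    apply_fix(state)
    if verify(state):
        audit_log.append({"outcome": "verified"})
        return "verified"
    state.clear()
    state.update(snapshot)  # automatic rollback to the captured snapshot
    audit_log.append({"outcome": "rolled_back"})
    return "requeue"


def enable_block(state):
    state["block_public_access"] = True


def broken_fix(state):
    # Simulates a fix that changes something but never sets the flag.
    state["policy"] = "half-applied"


def block_enabled(state):
    return state.get("block_public_access") is True


good = {"block_public_access": False, "policy": "original"}
bad = {"block_public_access": False, "policy": "original"}
audit_log = []
good_result = remediate(good, enable_block, block_enabled, audit_log)
bad_result = remediate(bad, broken_fix, block_enabled, audit_log)
print(good_result, bad_result, audit_log)
```

<p>The important detail is that verification reads the resulting state rather than trusting the return value of the fix.</p>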

<hr />

<h2 id="dont-forget-to-threat-model-the-system-itself">Don’t forget to threat-model the system itself</h2>

<p>Any system that can make automated changes to infrastructure is itself a high-value target. A few things worth thinking through:</p>

<p><strong>Prompt injection via finding content.</strong> An attacker controls a resource name or tag that gets embedded in the agent’s context, and uses it to steer the agent into proposing the wrong action. Treat all external data as untrusted.</p>

<p><strong>Policy bypass via crafted input.</strong> A finding or plan constructed to satisfy policy conditions it shouldn’t. Policy evaluation should use data from authoritative sources, not data passed in by the agent.</p>

<p><strong>Credential theft from the broker.</strong> Credentials should be requested per-action and invalidated immediately after use. No persistent credential store.</p>

<p><strong>Audit log tampering.</strong> If the agent can modify its own audit log, you’ve lost the trail. Write to an append-only, externally controlled sink.</p>
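
<p>For the policy-bypass case, one mitigation is to build the policy input yourself and let the agent contribute only the fields it legitimately owns. In this sketch the <code class="language-plaintext highlighter-rouge">INVENTORY</code> table and the field names are hypothetical; the point is that environment and blast radius come from a source the agent cannot write to.</p>

```python
# Hypothetical asset inventory, maintained outside the agent's control.
INVENTORY = {
    "bucket/payments-exports": {"environment": "prod", "blast_radius": "high"},
    "bucket/dev-logs": {"environment": "dev", "blast_radius": "low"},
}

# The only fields the policy layer accepts from the agent's plan.
# Confidence is still self-reported, so treat it as a weak signal.
AGENT_SUPPLIED_FIELDS = ("action", "resource", "confidence")


def build_policy_input(plan: dict) -> dict:
    """Combine agent-proposed fields with facts looked up from the inventory."""
    facts = INVENTORY.get(plan.get("resource"))
    if facts is None:
        raise KeyError("resource is not in the authoritative inventory")
    policy_input = {field: plan.get(field) for field in AGENT_SUPPLIED_FIELDS}
    policy_input.update(facts)  # authoritative values always win
    return policy_input


# A crafted plan that claims a production bucket is a low-risk dev resource.
crafted_plan = {
    "action": "block_public_s3_access",
    "resource": "bucket/payments-exports",
    "confidence": "high",
    "environment": "dev",
    "blast_radius": "low",
}
policy_input = build_policy_input(crafted_plan)
print(policy_input)
```

<p>The claimed <code class="language-plaintext highlighter-rouge">environment</code> and <code class="language-plaintext highlighter-rouge">blast_radius</code> are simply ignored, so the policy evaluates the real values.</p>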

<hr />

<h2 id="the-point">The point</h2>

<p>The opportunity isn’t in building an AI that can fix infrastructure. That’s the easy part. The hard part is building a system that governs the AI’s ability to act, with explicit policy, least-privilege execution, verification, rollback, and a complete audit trail.</p>

<p>An AI that can fix production only when policy allows, only with minimum required permissions, only with a verified rollback path, and only while writing every decision to an immutable log. That’s a system security teams can actually trust.</p>

<p>I’ve built a working demo of this: <a href="https://github.com/woxff/autonomous-defense-policy-agent">autonomous-defense-policy-agent</a>. It runs against simulated findings and shows exactly how each component behaves across allow, approval, and deny paths.</p>]]></content><author><name>qwoxff</name></author><category term="agentic-ai" /><category term="policy-engines" /><category term="cloud-security" /><category term="opa" /><category term="autonomous-systems" /><summary type="html"><![CDATA[Autonomous defense should not mean letting an AI agent freely fix production. It means a policy-governed system that detects risk, reasons about impact, proposes remediation, executes only low-risk actions, and escalates everything else with full auditability.]]></summary></entry><entry><title type="html">IAM Design for Multi-Tenant AI Platforms</title><link href="https://woxff.github.io/security-blog/2026/04/21/iam-for-multi-tenant-ai-platforms/" rel="alternate" type="text/html" title="IAM Design for Multi-Tenant AI Platforms" /><published>2026-04-21T00:00:00+00:00</published><updated>2026-04-21T00:00:00+00:00</updated><id>https://woxff.github.io/security-blog/2026/04/21/iam-for-multi-tenant-ai-platforms</id><content type="html" xml:base="https://woxff.github.io/security-blog/2026/04/21/iam-for-multi-tenant-ai-platforms/"><![CDATA[<p>The IAM model that works well for SaaS applications starts to break down when tenants are autonomous agents. An agent doesn’t just read data on behalf of a user — it spawns sub-tasks, calls external APIs, writes back to tenant state, and may run for minutes or hours unattended. The blast radius of a confused deputy or an over-permissioned credential is much larger than in a request/response world.</p>

<p>This post covers the design decisions I’ve found load-bearing when building IAM for platforms where tenants run code — and increasingly, where that code is AI-driven.</p>

<h2 id="the-core-problem-ambient-authority">The core problem: ambient authority</h2>

<p>In a typical multi-tenant SaaS, each API request carries a credential that scopes the request to a tenant. The credential is short-lived, bound to the HTTP session, and discarded when the request ends.</p>

<p>An agent inverts this model. It acquires credentials early (at task start), uses them across many operations over a long time window, and may delegate further to tools, sub-agents, or external services. At each step, the authority it holds tends to be <em>ambient</em> — silently present, not explicitly re-evaluated.</p>

<p>This creates two failure modes:</p>

<ol>
  <li><strong>Confused deputy</strong>: The agent is tricked (via prompt injection, a malicious tool response, or a compromised dependency) into using its credentials on behalf of a different tenant.</li>
  <li><strong>Privilege accumulation</strong>: The agent picks up permissions it doesn’t need for the current step and holds them for the full task lifetime, widening the window for misuse.</li>
</ol>

<p>The design goal is to make authority <em>explicit</em>, <em>minimal</em>, and <em>re-evaluated at scope boundaries</em>.</p>

<h2 id="tenant-identity-vs-agent-identity">Tenant identity vs. agent identity</h2>

<p>The first structural decision is whether an agent runs <em>as</em> a tenant identity or <em>on behalf of</em> a tenant identity.</p>

<p><strong>Running as the tenant</strong> means the agent is issued a credential derived directly from the tenant’s identity (e.g., a service account in the tenant’s project, or a JWT with the tenant’s <code class="language-plaintext highlighter-rouge">sub</code>). Any action the agent takes is audited as the tenant. The upside is simplicity. The downside is that the platform has no first-party view into what the agent is doing — it looks identical to the tenant using the API directly.</p>

<p><strong>Running on behalf of the tenant</strong> means the agent is issued a platform credential that names both parties: the tenant as the subject being acted for, and the agent as the delegated actor with its own identity. The platform can then enforce policies based on both dimensions: what the agent is (platform service, version, trust level) and what tenant it’s acting for.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Token structure for "on behalf of" model
{
  "iss": "platform.internal",
  "sub": "tenant/acme-corp",             # subject: the tenant being acted for
  "act": { "sub": "agent/worker-7f3a" }, # RFC 8693 actor claim: the agent doing the acting
  "scope": "read:documents write:tasks",
  "jti": "unique-token-id",
  "iat": 1745865600,
  "exp": 1745869200                     # 1h max
}
</code></pre></div></div>

<p><a href="https://www.rfc-editor.org/rfc/rfc8693">RFC 8693</a> (Token Exchange) defines the <code class="language-plaintext highlighter-rouge">act</code> claim for exactly this pattern: <code class="language-plaintext highlighter-rouge">sub</code> stays the party being acted for, and <code class="language-plaintext highlighter-rouge">act</code> identifies the party doing the acting. Downstream services can check both the acting identity and the original subject without having to trust the agent to self-report which tenant it’s working for.</p>

<h2 id="scoping-credentials-to-the-task-not-the-agent">Scoping credentials to the task, not the agent</h2>

<p>A long-running agent should not hold a credential scoped to everything it <em>might</em> need. Credentials should be scoped to the current task and renewed (or exchanged) when the task context changes.</p>

<p>A practical pattern: issue a <strong>task token</strong> at task start that encodes the task ID, tenant, and the minimal permission set for that task type. The token lifetime should match the expected task duration, not a generic TTL. If the task completes in 3 minutes, a 60-minute token is 57 minutes of unnecessary exposure.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Scoped task token
{
  "sub": "tenant/acme-corp",
  "act": { "sub": "agent/worker-7f3a" },
  "task_id": "task-9b2c",
  "task_type": "document_summarization",
  "scope": "read:documents",           # not write:documents
  "exp": 1745866800                    # 20 min, not 1 hr
}
</code></pre></div></div>

<p>When the agent needs a different permission (say, to write a result back), it exchanges the task token for a new one scoped to that operation. The exchange is logged. The original task token is not expanded — it stays narrow.</p>
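
<p>A sketch of that exchange, with tokens modeled as plain dictionaries. The <code class="language-plaintext highlighter-rouge">TASK_TYPE_SCOPES</code> table and the function name are hypothetical, and a real implementation would sign and verify tokens rather than pass dicts around. Identity claims are left out of the example for brevity; they would be copied over unchanged.</p>

```python
# Hypothetical table: every permission a task type may ever hold.
TASK_TYPE_SCOPES = {
    "document_summarization": {"read:documents", "write:tasks"},
}


def exchange_token(parent: dict, requested_scope: str, ttl_seconds: int, now: int, exchange_log: list) -> dict:
    """Issue a new narrow token for one operation; the parent is never modified."""
    if now >= parent["exp"]:
        raise PermissionError("parent task token has expired")
    allowed = TASK_TYPE_SCOPES.get(parent["task_type"], set())
    if requested_scope not in allowed:
        raise PermissionError("scope is not permitted for this task type")
    child = dict(parent)  # task and identity claims carried over unchanged
    child["scope"] = requested_scope
    child["iat"] = now
    child["exp"] = min(now + ttl_seconds, parent["exp"])  # never outlive the parent
    exchange_log.append({"task_id": parent["task_id"], "scope": requested_scope, "at": now})
    return child


task_token = {
    "task_id": "task-9b2c",
    "task_type": "document_summarization",
    "scope": "read:documents",
    "exp": 1745866800,
}
exchange_log = []
write_token = exchange_token(task_token, "write:tasks", 120, 1745865600, exchange_log)
print(write_token["scope"], task_token["scope"], exchange_log)
```

<p>Capping the child’s expiry at the parent’s keeps an exchange from becoming a way to extend the task’s lifetime.</p>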

<p>This forces the platform to have an explicit model of which task types need which permissions, which is good architectural hygiene regardless of security requirements.</p>

<h2 id="authorization-at-the-resource-layer">Authorization at the resource layer</h2>

<p>Token scopes tell you what <em>class</em> of operations the agent can perform. They don’t tell you which <em>specific resources</em> a tenant is allowed to touch. That’s the resource layer, and it should be enforced by the authorization service, not trusted from the token.</p>

<p>The pattern I lean toward:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>flowchart LR
    Agent --&gt;|task token| Gateway
    Gateway --&gt;|authz check| Authz
    Authz --&gt;|tenant policy| PolicyStore
    Authz --&gt;|resource metadata| ResourceDB
    Gateway --&gt;|if allowed| Resource
</code></pre></div></div>

<p>The gateway receives the task token, extracts the tenant identity and requested action, and calls the authorization service with the full context: who is acting (agent + tenant), what they want to do (action + resource ID), and what the current task is. The authorization service evaluates the tenant’s policy against the resource metadata — does this tenant own this resource? Does their plan allow this action?</p>

<p>Importantly, the gateway does not trust the agent to self-report the tenant. It reads the tenant claim from the platform-issued token. An agent cannot request resources for a different tenant by claiming a different identity.</p>
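
<p>Here is a compact sketch of the check the authorization service performs. <code class="language-plaintext highlighter-rouge">RESOURCE_DB</code> and <code class="language-plaintext highlighter-rouge">TENANT_POLICY</code> are hypothetical stand-ins for the resource metadata and policy store, and <code class="language-plaintext highlighter-rouge">token_tenant</code> is assumed to come from an already-verified platform token, never from the request body.</p>

```python
# Hypothetical stand-ins for the resource metadata store and the tenant policy store.
RESOURCE_DB = {
    "doc-1234": {"type": "document", "tenant": "tenant/acme-corp"},
    "doc-9999": {"type": "document", "tenant": "tenant/globex"},
}
TENANT_POLICY = {
    "tenant/acme-corp": {"read", "write"},
    "tenant/globex": {"read"},
}


def authorize(token_tenant: str, token_scopes: set, action: str, resource_id: str) -> dict:
    """Decide whether the tenant named in the verified token may act on the resource."""
    resource = RESOURCE_DB.get(resource_id)
    if resource is None:
        return {"decision": "deny", "reason": "unknown resource"}
    required_scope = f"{action}:{resource['type']}s"  # e.g. read:documents
    if required_scope not in token_scopes:
        return {"decision": "deny", "reason": "token scope does not cover this action"}
    if resource["tenant"] != token_tenant:
        return {"decision": "deny", "reason": "resource belongs to another tenant"}
    if action not in TENANT_POLICY.get(token_tenant, set()):
        return {"decision": "deny", "reason": "tenant policy does not allow this action"}
    return {"decision": "allow", "reason": "owner and policy match"}


own_doc = authorize("tenant/acme-corp", {"read:documents"}, "read", "doc-1234")
other_doc = authorize("tenant/acme-corp", {"read:documents"}, "read", "doc-9999")
no_scope = authorize("tenant/acme-corp", {"read:documents"}, "write", "doc-1234")
print(own_doc, other_doc, no_scope)
```

<p>The scope check and the ownership check are separate on purpose: a valid scope says nothing about whose document it is.</p>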

<h2 id="isolation-at-the-execution-layer">Isolation at the execution layer</h2>

<p>IAM handles credential issuance and authorization checks. It doesn’t enforce compute isolation. A sufficiently privileged agent running in a shared execution environment can still exfiltrate credentials from the process environment or reach adjacent tenant workloads through the network.</p>

<p>The enforcement layers that compose well with the IAM model above:</p>

<table>
  <thead>
    <tr>
      <th>Layer</th>
      <th>Mechanism</th>
      <th>What it prevents</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Network</td>
      <td>Per-tenant egress rules, no east-west by default</td>
      <td>Lateral movement between tenant workloads</td>
    </tr>
    <tr>
      <td>Credential storage</td>
      <td>In-memory only, no disk serialization</td>
      <td>Credential persistence across task boundaries</td>
    </tr>
    <tr>
      <td>Metadata service</td>
      <td>Block IMDS from agent processes</td>
      <td>Cloud credential theft via SSRF</td>
    </tr>
    <tr>
      <td>Syscall filtering</td>
      <td>Seccomp profiles</td>
      <td>Process escape from container</td>
    </tr>
  </tbody>
</table>

<p>The IMDS point deserves emphasis. On EC2 instances and on GKE or AKS nodes, the instance metadata service is reachable by any process on the VM unless explicitly blocked. An agent that can make outbound HTTP requests can request the node’s IAM credentials and use them entirely outside the platform’s authorization model. Block IMDS at the network policy layer for any workload running agent code.</p>

<h2 id="audit-what-to-log">Audit: what to log</h2>

<p>The authorization check is the right place to generate the audit record, not the agent. The agent cannot audit itself accurately — it may be compromised, and even if it isn’t, it doesn’t have visibility into the full authorization context.</p>

<p>A useful audit event:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"event"</span><span class="p">:</span><span class="w"> </span><span class="s2">"authz.decision"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"decision"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"principal"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">"agent"</span><span class="p">:</span><span class="w"> </span><span class="s2">"agent/worker-7f3a"</span><span class="p">,</span><span class="w">
    </span><span class="nl">"tenant"</span><span class="p">:</span><span class="w"> </span><span class="s2">"tenant/acme-corp"</span><span class="w">
  </span><span class="p">},</span><span class="w">
  </span><span class="nl">"action"</span><span class="p">:</span><span class="w"> </span><span class="s2">"read"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"resource"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"document"</span><span class="p">,</span><span class="w">
    </span><span class="nl">"id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"doc-1234"</span><span class="p">,</span><span class="w">
    </span><span class="nl">"tenant"</span><span class="p">:</span><span class="w"> </span><span class="s2">"tenant/acme-corp"</span><span class="w">
  </span><span class="p">},</span><span class="w">
  </span><span class="nl">"task_id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"task-9b2c"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"policy_version"</span><span class="p">:</span><span class="w"> </span><span class="s2">"2026-04-01"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"timestamp"</span><span class="p">:</span><span class="w"> </span><span class="s2">"2026-04-28T12:00:00Z"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">policy_version</code> field matters more than it looks. When you investigate an incident, you need to know which policy was in effect at decision time — not what the current policy says.</p>
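
<p>As a minimal sketch of that lookup, assuming policy sources are archived under the same version string that the decision log records (the <code class="language-plaintext highlighter-rouge">POLICY_ARCHIVE</code> mapping and the function name are hypothetical, not part of any specific product):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical archive: policy version mapped to the source that was deployed.
POLICY_ARCHIVE = {
    "2026-03-01": "package authz  # older rules",
    "2026-04-01": "package authz  # rules in effect for the event below",
}


def policy_at_decision_time(event, archive):
    """Return the policy source that produced this decision."""
    version = event["policy_version"]
    if version not in archive:
        raise LookupError("no archived policy for version " + version)
    return archive[version]


event = {
    "event": "authz.decision",
    "decision": "allow",
    "task_id": "task-9b2c",
    "policy_version": "2026-04-01",
}
print(policy_at_decision_time(event, POLICY_ARCHIVE))
</code></pre></div></div>

<p>The lookup never consults the currently deployed policy, which is the point: the investigation reads the rules as they stood when the decision was made.</p>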

<h2 id="what-breaks-first-in-practice">What breaks first in practice</h2>

<p>The three things I’ve seen fail most often in real deployments:</p>

<p><strong>1. Token lifetimes set by convention, not by task duration.</strong> Teams default to “1 hour” because that’s what the docs say. Long-running agent tasks then hit expired credentials mid-execution, and the usual fix is to extend the token lifetime, which undermines the short-lived-credential model. Size the lifetime to the expected task duration instead.</p>

<p><strong>2. Scopes inflated at onboarding.</strong> The first version of the task token spec gets extended each time a new task type is added, until it’s effectively <code class="language-plaintext highlighter-rouge">*</code>. Instead of widening the existing token, add a new token type for each new task type.</p>

<p><strong>3. Delegation chains that aren’t logged end-to-end.</strong> Agent A calls tool B which calls service C. The authorization log at C shows the platform credential, not the originating agent or tenant. When something goes wrong at C, there’s no way to trace back to the task. Propagate the task ID (and the full delegation chain) in a request header and log it at every layer.</p>
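
<p>The propagation described in the third point can be sketched as follows. The header names <code class="language-plaintext highlighter-rouge">X-Task-Id</code> and <code class="language-plaintext highlighter-rouge">X-Delegation-Chain</code> and both helper functions are assumptions made for illustration; any consistent convention would do:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import json

TASK_HEADER = "X-Task-Id"            # hypothetical header name
CHAIN_HEADER = "X-Delegation-Chain"  # hypothetical header name


def outbound_headers(inbound, caller):
    """Copy the task ID and extend the delegation chain with this caller."""
    chain = [hop for hop in inbound.get(CHAIN_HEADER, "").split(",") if hop]
    chain.append(caller)
    return {TASK_HEADER: inbound[TASK_HEADER], CHAIN_HEADER: ",".join(chain)}


def log_line(layer, headers):
    """Structured log entry that every layer emits before doing work."""
    return json.dumps({
        "layer": layer,
        "task_id": headers[TASK_HEADER],
        "delegation_chain": headers[CHAIN_HEADER].split(","),
    })


# Agent A calls tool B, which calls service C.
at_b = outbound_headers({TASK_HEADER: "task-9b2c"}, "agent/worker-7f3a")
at_c = outbound_headers(at_b, "tool/b")
print(log_line("service/c", at_c))
</code></pre></div></div>

<p>With this in place, the log at C names the task and every hop that led to it, even though the call itself is still made with the platform credential.</p>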

<hr />

<p>These patterns aren’t novel — they’re applications of least-privilege and defense-in-depth to a context where the “user” is a process that can act autonomously for an extended period. The novelty is that most existing IAM tooling is optimized for human users and short-lived API calls. Adapting it for agents requires being explicit about decisions the tooling used to make implicitly.</p>]]></content><author><name>qwoxff</name></author><category term="iam" /><category term="agentic-ai" /><category term="multi-tenancy" /><category term="cloud-security" /><summary type="html"><![CDATA[How to structure identity, delegation, and authorization when your tenants are autonomous agents running on shared infrastructure.]]></summary></entry><entry><title type="html">Policy as Code: Enforcing Security Guardrails with OPA and Rego</title><link href="https://woxff.github.io/security-blog/2026/04/01/policy-as-code-opa-rego-security-guardrails/" rel="alternate" type="text/html" title="Policy as Code: Enforcing Security Guardrails with OPA and Rego" /><published>2026-04-01T00:00:00+00:00</published><updated>2026-04-01T00:00:00+00:00</updated><id>https://woxff.github.io/security-blog/2026/04/01/policy-as-code-opa-rego-security-guardrails</id><content type="html" xml:base="https://woxff.github.io/security-blog/2026/04/01/policy-as-code-opa-rego-security-guardrails/"><![CDATA[<p>Security enforcement logic tends to spread. Access control checks live in application code. Admission rules live in shell scripts. IAM conditions live in JSON buried in Terraform. Kubernetes RBAC lives in YAML files that no one reviews. Over time the policy surface becomes impossible to audit, test, or reason about consistently.</p>

<p>Policy as Code addresses this by treating security rules as first-class, version-controlled, testable code — evaluated by a dedicated policy engine rather than scattered across every system that needs to make an authorization decision.</p>

<p><a href="https://www.openpolicyagent.org/">Open Policy Agent (OPA)</a> is the most widely adopted engine for this pattern. This post covers how to design with it effectively: the data model, the evaluation model, integration patterns, and the failure modes that matter in production.</p>

<h2 id="what-opa-actually-does">What OPA actually does</h2>

<p>OPA is a general-purpose policy engine. You give it:</p>

<ol>
  <li><strong>A policy</strong> — written in Rego, OPA’s declarative language</li>
  <li><strong>Input</strong> — a JSON document describing the request being evaluated</li>
  <li><strong>Data</strong> — context OPA can query during evaluation (asset inventory, user roles, etc.)</li>
</ol>

<p>OPA evaluates the policy against the input and data and returns a decision. That decision can be a boolean, a structured object, a list — whatever your policy defines.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Input (JSON)  +  Policy (Rego)  +  Data (JSON)  →  Decision (JSON)
</code></pre></div></div>
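
<p>When OPA runs as a server, that flow maps onto its REST Data API: the caller POSTs the input to <code class="language-plaintext highlighter-rouge">/v1/data/</code> followed by the rule path and reads the decision from the <code class="language-plaintext highlighter-rouge">result</code> field. A minimal sketch using only the Python standard library, assuming a local OPA on its default port and a hypothetical <code class="language-plaintext highlighter-rouge">authz/allow</code> rule path:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import json
import urllib.request

OPA_URL = "http://localhost:8181"  # assumption: a local OPA server on its default port


def build_query(rule_path, input_doc):
    """Build a POST to OPA's Data API: /v1/data/ plus the rule path."""
    body = json.dumps({"input": input_doc}).encode("utf-8")
    return urllib.request.Request(
        OPA_URL + "/v1/data/" + rule_path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_decision(response_body, default=False):
    """OPA omits "result" when the rule is undefined, so fall back to a default."""
    return json.loads(response_body).get("result", default)


request = build_query("authz/allow", {"action": "read"})
print(request.get_method(), request.full_url)
print(parse_decision('{"result": true}'), parse_decision("{}"))
# To send it for real: urllib.request.urlopen(request).read()
</code></pre></div></div>

<p>Treating a missing <code class="language-plaintext highlighter-rouge">result</code> as a denial keeps the caller fail-closed when a rule path is mistyped or the policy has not loaded yet.</p>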

<p>The engine has no built-in concept of users, resources, or actions. Those are defined by your policy. This makes OPA applicable across a wide range of enforcement points: Kubernetes admission, API gateways, CI/CD pipelines, infrastructure provisioning, and autonomous agent systems.</p>

<h2 id="the-rego-data-model">The Rego data model</h2>

<p>Rego is a declarative language inspired by Datalog. The key mental model shift from imperative languages: you define <em>what is true</em>, not <em>what to do</em>.</p>

<p>A policy that allows read access to non-production resources:</p>

<div class="language-rego highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="ow">package</span> <span class="n">authz</span>

<span class="ow">default</span> <span class="n">allow</span> <span class="o">=</span> <span class="kc">false</span>

<span class="n">allow</span> <span class="p">{</span>
    <span class="n">input</span><span class="p">.</span><span class="n">action</span> <span class="o">==</span> <span class="s2">"read"</span>
    <span class="n">input</span><span class="p">.</span><span class="n">resource</span><span class="p">.</span><span class="n">environment</span> <span class="p">!</span><span class="o">=</span> <span class="s2">"prod"</span>
    <span class="n">valid_principal</span>
<span class="p">}</span>

<span class="n">valid_principal</span> <span class="p">{</span>
    <span class="n">input</span><span class="p">.</span><span class="n">principal</span><span class="p">.</span><span class="n">role</span> <span class="o">==</span> <span class="s2">"engineer"</span>
<span class="p">}</span>

<span class="n">valid_principal</span> <span class="p">{</span>
    <span class="n">input</span><span class="p">.</span><span class="n">principal</span><span class="p">.</span><span class="n">role</span> <span class="o">==</span> <span class="s2">"analyst"</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This policy has no <code class="language-plaintext highlighter-rouge">if/else</code>, no loops, no mutation. Each rule defines a condition under which a value is true. If any rule body evaluates to true, the rule head is true. Multiple rules with the same head are implicitly OR’d.</p>

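<p>The evaluation model: conceptually, OPA evaluates every rule and returns the result. Rule order does not affect the outcome, and there is no hidden state. Given the same input and data, the same policy returns the same decision, provided it avoids non-deterministic built-ins such as <code class="language-plaintext highlighter-rouge">time.now_ns</code> or <code class="language-plaintext highlighter-rouge">http.send</code>.</p>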

<h2 id="structuring-policies-for-real-systems">Structuring policies for real systems</h2>

<p>A flat policy file works for toy examples. Production systems need structure.</p>

<p><strong>Separate packages by enforcement domain.</strong> Each enforcement point gets its own package:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>policy/
├── authz/
│   └── api_gateway.rego        # API request authorization
├── admission/
│   └── kubernetes.rego         # Kubernetes admission control
├── cloud/
│   ├── iam.rego                # IAM policy evaluation
│   └── storage.rego            # Storage access decisions
└── autonomous/
    └── remediation.rego        # Agent action authorization
</code></pre></div></div>

<p><strong>Separate policy from data.</strong> Policies should not hardcode lists of approved values. They should query external data:</p>

<div class="language-rego highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="ow">package</span> <span class="n">authz</span><span class="p">.</span><span class="n">api_gateway</span>

<span class="ow">import</span> <span class="n">data</span><span class="p">.</span><span class="n">approved_services</span>
<span class="ow">import</span> <span class="n">data</span><span class="p">.</span><span class="n">user_roles</span>

<span class="ow">default</span> <span class="n">allow</span> <span class="o">=</span> <span class="kc">false</span>

<span class="n">allow</span> <span class="p">{</span>
    <span class="n">service</span> <span class="o">:=</span> <span class="n">approved_services</span><span class="p">[</span><span class="n">input</span><span class="p">.</span><span class="n">service_id</span><span class="p">]</span>
    <span class="n">role</span>    <span class="o">:=</span> <span class="n">user_roles</span><span class="p">[</span><span class="n">input</span><span class="p">.</span><span class="n">principal_id</span><span class="p">]</span>
    <span class="n">role</span><span class="p">.</span><span class="n">permissions</span><span class="p">[</span><span class="n">_</span><span class="p">]</span> <span class="o">==</span> <span class="n">input</span><span class="p">.</span><span class="n">action</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">approved_services</code> and <code class="language-plaintext highlighter-rouge">user_roles</code> documents are loaded into OPA separately — from a database, a config map, or a bundle. The policy expresses <em>how</em> to evaluate; the data expresses <em>what</em> is true at a given point in time.</p>
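
<p>To make the expected data shape concrete, here is a rough Python rendering of the same rule over two hypothetical data documents. The IDs, role name, and permission lists are invented for illustration; the only thing taken from the policy is how it indexes them:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical data documents, keyed the way the policy indexes them.
approved_services = {"svc-billing": {"owner": "payments"}}
user_roles = {"user-42": {"name": "analyst", "permissions": ["read", "list"]}}


def allow(input_doc):
    """Python rendering of the allow rule: every lookup must succeed."""
    service = approved_services.get(input_doc.get("service_id"))
    role = user_roles.get(input_doc.get("principal_id"))
    if service is None or role is None:
        # A missing key leaves the Rego rule undefined, so the default applies.
        return False
    return input_doc.get("action") in role["permissions"]


print(allow({"service_id": "svc-billing", "principal_id": "user-42", "action": "read"}))
print(allow({"service_id": "svc-unknown", "principal_id": "user-42", "action": "read"}))
</code></pre></div></div>

<p>Approving a new service or changing a role then means updating the data documents; the rule itself stays untouched.</p>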

<p><strong>Layer allow and deny explicitly.</strong> OPA has no built-in precedence between allow and deny rules. You define the logic:</p>

<div class="language-rego highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="ow">package</span> <span class="n">cloud</span><span class="p">.</span><span class="n">storage</span>

<span class="ow">default</span> <span class="n">decision</span> <span class="o">=</span> <span class="s2">"deny"</span>

<span class="n">decision</span> <span class="o">=</span> <span class="s2">"allow"</span> <span class="p">{</span>
    <span class="n">allow</span>
    <span class="ow">not</span> <span class="n">deny</span>
<span class="p">}</span>

<span class="n">allow</span> <span class="p">{</span>
    <span class="n">input</span><span class="p">.</span><span class="n">action</span> <span class="o">==</span> <span class="s2">"read"</span>
    <span class="n">input</span><span class="p">.</span><span class="n">resource</span><span class="p">.</span><span class="n">classification</span> <span class="p">!</span><span class="o">=</span> <span class="s2">"restricted"</span>
<span class="p">}</span>

<span class="n">deny</span> <span class="p">{</span>
    <span class="n">input</span><span class="p">.</span><span class="n">resource</span><span class="p">.</span><span class="n">environment</span> <span class="o">==</span> <span class="s2">"prod"</span>
    <span class="n">input</span><span class="p">.</span><span class="n">principal</span><span class="p">.</span><span class="n">type</span> <span class="o">==</span> <span class="s2">"service_account"</span>
    <span class="ow">not</span> <span class="n">input</span><span class="p">.</span><span class="n">principal</span><span class="p">.</span><span class="n">approved_for_prod</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Explicit layering makes the precedence visible in the policy, not hidden in evaluation order.</p>

<h2 id="kubernetes-admission-control">Kubernetes admission control</h2>

<p>OPA integrates with Kubernetes through a validating admission webhook. Every create or update request that the webhook is configured to match passes through OPA before it is committed to the cluster.</p>

<p>A policy that blocks containers running as root:</p>

<div class="language-rego highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="ow">package</span> <span class="n">kubernetes</span><span class="p">.</span><span class="n">admission</span>

<span class="n">deny</span><span class="p">[</span><span class="n">msg</span><span class="p">]</span> <span class="p">{</span>
    <span class="n">input</span><span class="p">.</span><span class="n">request</span><span class="p">.</span><span class="n">kind</span><span class="p">.</span><span class="n">kind</span> <span class="o">==</span> <span class="s2">"Pod"</span>
    <span class="n">container</span> <span class="o">:=</span> <span class="n">input</span><span class="p">.</span><span class="n">request</span><span class="p">.</span><span class="n">object</span><span class="p">.</span><span class="n">spec</span><span class="p">.</span><span class="n">containers</span><span class="p">[</span><span class="n">_</span><span class="p">]</span>
    <span class="ow">not</span> <span class="n">container</span><span class="p">.</span><span class="n">securityContext</span><span class="p">.</span><span class="n">runAsNonRoot</span>

    <span class="n">msg</span> <span class="o">:=</span> <span class="n">sprintf</span><span class="p">(</span><span class="s2">"container '%v' must set runAsNonRoot: true"</span><span class="p">,</span> <span class="p">[</span><span class="n">container</span><span class="p">.</span><span class="n">name</span><span class="p">])</span>
<span class="p">}</span>
</code></pre></div></div>

<p>A policy that requires all deployments to have resource limits:</p>

<div class="language-rego highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">deny</span><span class="p">[</span><span class="n">msg</span><span class="p">]</span> <span class="p">{</span>
    <span class="n">input</span><span class="p">.</span><span class="n">request</span><span class="p">.</span><span class="n">kind</span><span class="p">.</span><span class="n">kind</span> <span class="o">==</span> <span class="s2">"Deployment"</span>
    <span class="n">container</span> <span class="o">:=</span> <span class="n">input</span><span class="p">.</span><span class="n">request</span><span class="p">.</span><span class="n">object</span><span class="p">.</span><span class="n">spec</span><span class="p">.</span><span class="n">template</span><span class="p">.</span><span class="n">spec</span><span class="p">.</span><span class="n">containers</span><span class="p">[</span><span class="n">_</span><span class="p">]</span>
    <span class="ow">not</span> <span class="n">container</span><span class="p">.</span><span class="n">resources</span><span class="p">.</span><span class="n">limits</span>

    <span class="n">msg</span> <span class="o">:=</span> <span class="n">sprintf</span><span class="p">(</span><span class="s2">"container '%v' must define resource limits"</span><span class="p">,</span> <span class="p">[</span><span class="n">container</span><span class="p">.</span><span class="n">name</span><span class="p">])</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">deny[msg]</code> pattern collects all violations as a set of messages rather than short-circuiting on the first. This means a single rejected request returns all the reasons it was rejected — useful for developer feedback.</p>
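
<p>On the webhook side, those messages have to be folded into the <code class="language-plaintext highlighter-rouge">AdmissionReview</code> response that the Kubernetes API server expects. A sketch of that step, assuming the deny set has already been fetched from OPA (the function name and the UID value are placeholders):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import json


def admission_response(uid, violations):
    """Turn the deny set into an admission.k8s.io/v1 AdmissionReview response."""
    response = {"uid": uid, "allowed": not violations}
    if violations:
        # Join every message so the developer sees all reasons in one rejection.
        response["status"] = {"message": "; ".join(sorted(violations))}
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }


denied = admission_response("hypothetical-uid-123", {
    "container 'app' must set runAsNonRoot: true",
    "container 'sidecar' must set runAsNonRoot: true",
})
print(json.dumps(denied, indent=2))
</code></pre></div></div>

<p>An empty deny set yields <code class="language-plaintext highlighter-rouge">allowed: true</code> with no status message, so the same function covers both outcomes.</p>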

<h2 id="cicd-pipeline-enforcement">CI/CD pipeline enforcement</h2>

<p>Admission control catches problems at deploy time. Policy checks in CI catch them earlier, when the cost of a fix is lower.</p>

<p>The pattern: run OPA against infrastructure-as-code output (Terraform plan, Helm chart, Kubernetes manifests) in CI before the deployment reaches the cluster.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># evaluate a Terraform plan against a policy bundle</span>
terraform show <span class="nt">-json</span> plan.tfplan | opa <span class="nb">eval</span> <span class="se">\</span>
  <span class="nt">--data</span> policy/ <span class="se">\</span>
  <span class="nt">--input</span> /dev/stdin <span class="se">\</span>
  <span class="s2">"data.terraform.deny"</span>
</code></pre></div></div>

<p>A policy that flags S3 public access block resources in a Terraform plan that leave public ACLs unblocked:</p>

<div class="language-rego highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="ow">package</span> <span class="n">terraform</span>

<span class="n">deny</span><span class="p">[</span><span class="n">msg</span><span class="p">]</span> <span class="p">{</span>
    <span class="n">resource</span> <span class="o">:=</span> <span class="n">input</span><span class="p">.</span><span class="n">resource_changes</span><span class="p">[</span><span class="n">_</span><span class="p">]</span>
    <span class="n">resource</span><span class="p">.</span><span class="n">type</span> <span class="o">==</span> <span class="s2">"aws_s3_bucket_public_access_block"</span>
    <span class="n">resource</span><span class="p">.</span><span class="n">change</span><span class="p">.</span><span class="n">after</span><span class="p">.</span><span class="n">block_public_acls</span> <span class="o">==</span> <span class="kc">false</span>

    <span class="n">msg</span> <span class="o">:=</span> <span class="n">sprintf</span><span class="p">(</span><span class="s2">"resource '%v' must block public ACLs"</span><span class="p">,</span> <span class="p">[</span><span class="n">resource</span><span class="p">.</span><span class="n">address</span><span class="p">])</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The same policy language, the same engine, the same evaluation model — applied at a different point in the delivery pipeline.</p>
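
<p>For the check to gate a pipeline, something has to turn the query result into an exit code. One possible wrapper, assuming the default JSON output of <code class="language-plaintext highlighter-rouge">opa eval</code> is piped in; the sample result and the resource address in it are canned, not real output:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import json
import sys


def violations(opa_output):
    """Collect every message from the JSON that opa eval prints by default."""
    messages = []
    for result in json.loads(opa_output).get("result", []):
        for expression in result["expressions"]:
            messages.extend(expression["value"])
    return messages


def gate(opa_output):
    """Return a process exit code: 0 when the deny set is empty, 1 otherwise."""
    found = violations(opa_output)
    for message in found:
        print("policy violation:", message)
    return 1 if found else 0


# Canned sample shaped like opa eval JSON output for data.terraform.deny.
sample = json.dumps({"result": [{"expressions": [{
    "text": "data.terraform.deny",
    "value": ["resource 'aws_s3_bucket_public_access_block.logs' must block public ACLs"],
}]}]})
exit_code = gate(sample)
print("exit code:", exit_code)
# In CI: sys.exit(gate(sys.stdin.read()))
</code></pre></div></div>

<p>Printing every violation before exiting mirrors the admission behaviour: one failed run reports all the reasons, not just the first.</p>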

<h2 id="policy-for-autonomous-agent-systems">Policy for autonomous agent systems</h2>

<p>The pattern that matters most for agentic systems: the policy engine as the decision boundary between AI reasoning and infrastructure action.</p>

<p>An agent produces a remediation plan. The plan is evaluated against an OPA policy before any action is taken. The agent cannot bypass this — it has no credentials, no direct access to infrastructure. It can only submit plans and receive decisions.</p>

<div class="language-rego highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="ow">package</span> <span class="n">autonomous</span><span class="p">.</span><span class="n">remediation</span>

<span class="ow">default</span> <span class="n">allow</span>       <span class="o">=</span> <span class="kc">false</span>
<span class="ow">default</span> <span class="n">requires_approval</span> <span class="o">=</span> <span class="kc">false</span>
<span class="ow">default</span> <span class="n">deny</span>        <span class="o">=</span> <span class="kc">false</span>

<span class="c1"># Hard denies — no path to allow</span>
<span class="n">deny</span> <span class="p">{</span>
    <span class="n">input</span><span class="p">.</span><span class="n">blast_radius</span> <span class="o">==</span> <span class="s2">"high"</span>
<span class="p">}</span>

<span class="n">deny</span> <span class="p">{</span>
    <span class="n">input</span><span class="p">.</span><span class="n">action_type</span> <span class="o">==</span> <span class="s2">"delete_resource"</span>
<span class="p">}</span>

<span class="c1"># Production requires a human in the loop</span>
<span class="n">requires_approval</span> <span class="p">{</span>
    <span class="n">input</span><span class="p">.</span><span class="n">environment</span> <span class="o">==</span> <span class="s2">"prod"</span>
    <span class="ow">not</span> <span class="n">deny</span>
<span class="p">}</span>

<span class="c1"># Automated allow: non-prod, high confidence, rollback available</span>
<span class="n">allow</span> <span class="p">{</span>
    <span class="n">input</span><span class="p">.</span><span class="n">environment</span> <span class="p">!</span><span class="o">=</span> <span class="s2">"prod"</span>
    <span class="n">input</span><span class="p">.</span><span class="n">confidence</span>  <span class="o">==</span> <span class="s2">"high"</span>
    <span class="n">input</span><span class="p">.</span><span class="n">rollback_available</span> <span class="o">==</span> <span class="kc">true</span>
    <span class="ow">not</span> <span class="n">deny</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The policy version is logged with every decision. When investigating an incident, you can reconstruct exactly what policy was in effect at the time of each action — which rule matched, which data was evaluated. This is the auditability property that makes autonomous action trustworthy.</p>
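
<p>The component that executes plans still has to combine the three outputs into a single outcome. A sketch of that mapping, assuming the executor queries the whole package and receives the three booleans in one object; the outcome names are hypothetical:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def next_step(result):
    """Map the three policy outputs to one outcome, checking deny first."""
    if result.get("deny", False):
        return "reject"
    if result.get("requires_approval", False):
        return "queue_for_human"
    if result.get("allow", False):
        return "execute"
    return "reject"  # nothing matched: fail closed


# Shapes below mimic what a query on the package could return.
print(next_step({"allow": True, "deny": False, "requires_approval": False}))
print(next_step({"allow": False, "deny": False, "requires_approval": True}))
print(next_step({"allow": False, "deny": True, "requires_approval": False}))
print(next_step({}))
</code></pre></div></div>

<p>Checking deny first and rejecting when nothing matches keeps the executor fail-closed even if the policy response is empty or partial.</p>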

<h2 id="testing-policies">Testing policies</h2>

<p>Untested policies are configuration. The Rego unit test framework makes policies verifiable:</p>

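<div class="language-rego highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="ow">package</span> <span class="n">autonomous</span><span class="p">.</span><span class="n">remediation_test</span>

<span class="c1"># The tests live in their own package, so the rules under test must be imported</span>
<span class="ow">import</span> <span class="n">data</span><span class="p">.</span><span class="n">autonomous</span><span class="p">.</span><span class="n">remediation</span><span class="p">.</span><span class="n">allow</span>
<span class="ow">import</span> <span class="n">data</span><span class="p">.</span><span class="n">autonomous</span><span class="p">.</span><span class="n">remediation</span><span class="p">.</span><span class="n">deny</span>
<span class="ow">import</span> <span class="n">data</span><span class="p">.</span><span class="n">autonomous</span><span class="p">.</span><span class="n">remediation</span><span class="p">.</span><span class="n">requires_approval</span>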

<span class="n">test_deny_high_blast_radius</span> <span class="p">{</span>
    <span class="n">deny</span> <span class="ow">with</span> <span class="n">input</span> <span class="ow">as</span> <span class="p">{</span>
        <span class="s2">"action_type"</span><span class="o">:</span>        <span class="s2">"block_public_s3_access"</span><span class="p">,</span>
        <span class="s2">"blast_radius"</span><span class="o">:</span>       <span class="s2">"high"</span><span class="p">,</span>
        <span class="s2">"environment"</span><span class="o">:</span>        <span class="s2">"dev"</span><span class="p">,</span>
        <span class="s2">"confidence"</span><span class="o">:</span>         <span class="s2">"high"</span><span class="p">,</span>
        <span class="s2">"rollback_available"</span><span class="o">:</span> <span class="kc">true</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="n">test_allow_non_prod_s3_block</span> <span class="p">{</span>
    <span class="n">allow</span> <span class="ow">with</span> <span class="n">input</span> <span class="ow">as</span> <span class="p">{</span>
        <span class="s2">"action_type"</span><span class="o">:</span>        <span class="s2">"block_public_s3_access"</span><span class="p">,</span>
        <span class="s2">"blast_radius"</span><span class="o">:</span>       <span class="s2">"low"</span><span class="p">,</span>
        <span class="s2">"environment"</span><span class="o">:</span>        <span class="s2">"dev"</span><span class="p">,</span>
        <span class="s2">"confidence"</span><span class="o">:</span>         <span class="s2">"high"</span><span class="p">,</span>
        <span class="s2">"rollback_available"</span><span class="o">:</span> <span class="kc">true</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="n">test_requires_approval_in_prod</span> <span class="p">{</span>
    <span class="n">requires_approval</span> <span class="ow">with</span> <span class="n">input</span> <span class="ow">as</span> <span class="p">{</span>
        <span class="s2">"action_type"</span><span class="o">:</span>        <span class="s2">"block_public_s3_access"</span><span class="p">,</span>
        <span class="s2">"blast_radius"</span><span class="o">:</span>       <span class="s2">"low"</span><span class="p">,</span>
        <span class="s2">"environment"</span><span class="o">:</span>        <span class="s2">"prod"</span><span class="p">,</span>
        <span class="s2">"confidence"</span><span class="o">:</span>         <span class="s2">"high"</span><span class="p">,</span>
        <span class="s2">"rollback_available"</span><span class="o">:</span> <span class="kc">true</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Run with:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>opa <span class="nb">test </span>policy/ <span class="nt">-v</span>
</code></pre></div></div>

<p>Policy changes go through the same review process as application code — diff, review, test, merge. This is the core value proposition: security rules become auditable, reviewable, and testable like any other code.</p>

<h2 id="what-breaks-in-practice">What breaks in practice</h2>

<p><strong>Policy and data drift.</strong> The policy assumes a data schema. The data changes. Nothing enforces the contract between them. Mitigation: schema validation on data bundles; tests that cover the boundary conditions that depend on specific data shapes.</p>

<p><strong>Overly broad defaults.</strong> <code class="language-plaintext highlighter-rouge">default allow = false</code> is correct, but teams sometimes flip it for convenience and never flip it back. Make the default explicit in every package, not just the root.</p>

<p><strong>Missing the enforcement point.</strong> OPA decides; something else must enforce. If a service ignores the OPA decision, the policy is theatre. The integration must be verified — ideally with an integration test that confirms a denied request actually fails.</p>

<p><strong>Policy explosion without structure.</strong> A single flat policy file becomes unmanageable past a few hundred lines. Package structure and data separation are not premature optimisation — they are prerequisites for a policy layer that a team can maintain over time.</p>
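
<p>For the policy and data drift case above, the schema check can be as small as a function that runs before a data document is published. A sketch for a <code class="language-plaintext highlighter-rouge">user_roles</code> document, assuming each entry carries a <code class="language-plaintext highlighter-rouge">permissions</code> list of strings; the function name and error messages are illustrative:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def validate_user_roles(doc):
    """Return a list of problems; an empty list means the shape is as expected."""
    problems = []
    if not isinstance(doc, dict):
        return ["user_roles must be an object keyed by principal ID"]
    for principal_id, role in doc.items():
        permissions = role.get("permissions") if isinstance(role, dict) else None
        if not isinstance(permissions, list):
            problems.append(principal_id + ": permissions must be a list")
        elif not all(isinstance(p, str) for p in permissions):
            problems.append(principal_id + ": every permission must be a string")
    return problems


good = {"user-42": {"permissions": ["read"]}}
drifted = {"user-42": {"permissions": "read"}}  # a string where the policy iterates a list
print(validate_user_roles(good))
print(validate_user_roles(drifted))
</code></pre></div></div>

<p>Failing the publish step on a non-empty result surfaces the drift before a policy silently evaluates against a shape it was never written for.</p>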

<hr />

<p>The value of a policy engine is not that it makes authorization easier to write. It is that it makes authorization possible to audit, test, and evolve independently of the systems it governs. When a policy change needs to go through review, when every decision is logged with the policy version that produced it, and when tests break before a bad rule reaches production — that is when policy as code pays for itself.</p>]]></content><author><name>qwoxff</name></author><category term="opa" /><category term="rego" /><category term="policy-as-code" /><category term="kubernetes" /><category term="cloud-security" /><summary type="html"><![CDATA[How to replace scattered security logic with a unified, testable policy layer using Open Policy Agent and Rego — applied to cloud infrastructure, Kubernetes, and autonomous agent systems.]]></summary></entry></feed>