IAM Design for Multi-Tenant AI Platforms

The IAM model that works well for SaaS applications starts to break down when tenant workloads are autonomous agents. An agent doesn’t just read data on behalf of a user: it spawns sub-tasks, calls external APIs, writes back to tenant state, and may run for minutes or hours unattended. The blast radius of a confused deputy or an over-permissioned credential is much larger than in a request/response world.

This post covers the design decisions I’ve found load-bearing when building IAM for platforms where tenants run code — and increasingly, where that code is AI-driven.

The core problem: ambient authority

In a typical multi-tenant SaaS, each API request carries a credential that scopes the request to a tenant. The credential is short-lived, tied to a single request or session, and discarded when that request or session ends.

An agent inverts this model. It acquires credentials early (at task start), uses them across many operations over a long time window, and may delegate further to tools, sub-agents, or external services. At each step, the authority it holds tends to be ambient — silently present, not explicitly re-evaluated.

This creates two failure modes:

Confused deputy: The agent is tricked (via prompt injection, a malicious tool response, or a compromised dependency) into using its credentials on behalf of a different tenant.
Privilege accumulation: The agent picks up permissions it doesn’t need for the current step and holds them for the full task lifetime, widening the window for misuse.

The design goal is to make authority explicit, minimal, and re-evaluated at scope boundaries.

Tenant identity vs. agent identity

The first structural decision is whether an agent runs as a tenant identity or on behalf of a tenant identity.

Running as the tenant means the agent is issued a credential derived directly from the tenant’s identity (e.g., a service account in the tenant’s project, or a JWT with the tenant’s sub). Any action the agent takes is audited as the tenant. The upside is simplicity. The downside is that the platform has no first-party view into what the agent is doing — it looks identical to the tenant using the API directly.

Running on behalf of the tenant means the agent is issued a platform credential that carries the tenant’s identity as a claim, not as the identity itself. The platform can then enforce policies based on both dimensions: what the agent is (platform service, version, trust level) and what tenant it’s acting for.

# Token structure for "on behalf of" model
{
  "iss": "platform.internal",
  "sub": "agent/worker-7f3a",          # agent identity
  "act": { "sub": "tenant/acme-corp" }, # tenant being acted for
  "scope": "read:documents write:tasks",
  "jti": "unique-token-id",
  "iat": 1745865600,
  "exp": 1745869200                     # 1h max
}

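RFC 8693 (Token Exchange) defines the act claim for delegation, with one caveat for the layout above: in the RFC, sub names the party being acted for and act names the acting party, so a strictly conforming token would put the tenant in sub and the agent in act.sub. Either way, downstream services can check both the acting identity and the original subject without having to trust the agent to self-report which tenant it’s working for.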
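Reading both identities out of the token can be sketched as follows. This is a minimal sketch, assuming the layout shown above (agent in sub, tenant in act.sub) and claims whose signature has already been verified by whatever JWT library the platform uses. The Principals type and principals_from_claims function are names I made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional
import time


@dataclass(frozen=True)
class Principals:
    agent: str   # platform identity doing the acting
    tenant: str  # tenant being acted for


def principals_from_claims(claims: dict, now: Optional[float] = None) -> Principals:
    """Extract agent and tenant from already-verified claims.

    Assumes the token layout used in this post; signature checking is
    expected to have happened before this function is called.
    """
    now = time.time() if now is None else now
    if claims.get("iss") != "platform.internal":
        raise ValueError("token not issued by the platform")
    if claims.get("exp", 0) <= now:
        raise ValueError("token expired")

    agent = str(claims.get("sub", ""))
    act = claims.get("act")
    tenant = str(act.get("sub", "")) if isinstance(act, dict) else ""

    # Reject tokens that carry only one of the two dimensions.
    if not agent.startswith("agent/") or not tenant.startswith("tenant/"):
        raise ValueError("token must carry both an agent and a tenant")
    return Principals(agent=agent, tenant=tenant)
```

Downstream services would call this once per request and make policy decisions on the returned pair, never on a tenant ID supplied in the request body.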

Scoping credentials to the task, not the agent

A long-running agent should not hold a credential scoped to everything it might need. Credentials should be scoped to the current task and renewed (or exchanged) when the task context changes.

A practical pattern: issue a task token at task start that encodes the task ID, tenant, and the minimal permission set for that task type. The token lifetime should match the expected task duration, not a generic TTL. If the task completes in 3 minutes, a 60-minute token is 57 minutes of unnecessary exposure.

# Scoped task token
{
  "sub": "agent/worker-7f3a",
  "act": { "sub": "tenant/acme-corp" },
  "task_id": "task-9b2c",
  "task_type": "document_summarization",
  "scope": "read:documents",           # not write:documents
  "exp": 1745866800                    # 20 min, not 1 hr
}
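One way to derive a token like this from the task type, rather than from a generic TTL, is sketched below. The TASK_TYPES registry, its field names, and the margin value are assumptions for illustration; the function returns unsigned claims, and signing is left to whatever token library the platform already uses.

```python
import time
import uuid

# Hypothetical registry: each task type declares its scope and how long
# it is expected to run, plus a small margin for slow runs.
TASK_TYPES = {
    "document_summarization": {
        "scope": "read:documents",
        "expected_seconds": 900,
        "margin_seconds": 300,
    },
}


def mint_task_claims(agent, tenant, task_type, now=None):
    """Build claims for a task token sized to the task, not to a default TTL."""
    spec = TASK_TYPES[task_type]  # unknown task types raise KeyError: no fallback scope
    now = int(time.time() if now is None else now)
    return {
        "iss": "platform.internal",
        "sub": agent,
        "act": {"sub": tenant},
        "task_id": "task-" + uuid.uuid4().hex[:8],
        "task_type": task_type,
        "scope": spec["scope"],
        "jti": uuid.uuid4().hex,
        "iat": now,
        "exp": now + spec["expected_seconds"] + spec["margin_seconds"],
    }
```

With the values above, a document_summarization token expires 20 minutes after issue, matching the example.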

When the agent needs a different permission (say, to write a result back), it exchanges the task token for a new one scoped to that operation. The exchange is logged. The original task token is not expanded — it stays narrow.
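The exchange step might look roughly like this. It is a sketch under stated assumptions: the EXCHANGEABLE policy table, the 60-second lifetime, and the in-memory exchange_log are all placeholders I invented, and the function works on unsigned claims.

```python
import time
import uuid

# Hypothetical policy: which single-operation scopes a task type may exchange for.
EXCHANGEABLE = {"document_summarization": {"write:tasks"}}
EXCHANGE_TTL_SECONDS = 60  # assumed lifetime of a one-operation token

exchange_log = []  # stand-in for a real audit sink


def exchange_for_operation(task_claims, requested_scope, now=None):
    """Issue a new, narrow token for one operation; the task token is untouched."""
    now = int(time.time() if now is None else now)
    if task_claims["exp"] <= now:
        raise PermissionError("task token expired")

    allowed = EXCHANGEABLE.get(task_claims["task_type"], set())
    if requested_scope not in allowed:
        raise PermissionError(
            f"{task_claims['task_type']} may not obtain {requested_scope}"
        )

    new_claims = {
        **task_claims,
        "scope": requested_scope,  # only the requested scope, never a union
        "jti": uuid.uuid4().hex,
        "iat": now,
        # Never outlive the task token it was exchanged from.
        "exp": min(task_claims["exp"], now + EXCHANGE_TTL_SECONDS),
    }
    exchange_log.append({
        "task_id": task_claims["task_id"],
        "from_scope": task_claims["scope"],
        "to_scope": requested_scope,
        "at": now,
    })
    return new_claims
```

The design choice worth noting is that the new token replaces the scope instead of adding to it, so no single credential ever holds both the read and the write permission.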

This forces the platform to have an explicit model of which task types need which permissions, which is good architectural hygiene regardless of security requirements.

Authorization at the resource layer

Token scopes tell you what class of operations the agent can perform. They don’t tell you which specific resources a tenant is allowed to touch. That’s the resource layer, and it should be enforced by the authorization service, not trusted from the token.

The pattern I lean toward:

flowchart LR
    Agent -->|task token| Gateway
    Gateway -->|authz check| Authz
    Authz -->|tenant policy| PolicyStore
    Authz -->|resource metadata| ResourceDB
    Gateway -->|if allowed| Resource

The gateway receives the task token, validates it, and reads the tenant identity from its claims; the requested action and resource ID come from the request itself. It then calls the authorization service with the full context: who is acting (agent + tenant), what they want to do (action + resource ID), and what the current task is. The authorization service evaluates the tenant’s policy against the resource metadata: does this tenant own this resource? Does their plan allow this action?

Importantly, the gateway does not trust the agent to self-report the tenant. It reads the tenant claim from the platform-issued token. An agent cannot request resources for a different tenant by claiming a different identity.
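The check described in this section can be sketched as a single function. Everything here is illustrative: RESOURCES and TENANT_POLICY are in-memory stand-ins for the resource database and policy store, and the "action:type" plus "s" scope naming is an assumption based on the scope strings used earlier in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    type: str
    id: str
    tenant: str  # owning tenant, from resource metadata


# Hypothetical stand-ins for ResourceDB and PolicyStore.
RESOURCES = {
    "doc-1234": Resource("document", "doc-1234", "tenant/acme-corp"),
    "doc-9999": Resource("document", "doc-9999", "tenant/globex"),
}
TENANT_POLICY = {
    "tenant/acme-corp": {"read", "write"},  # actions the tenant's plan allows
    "tenant/globex": {"read"},
}


def authorize(claims: dict, action: str, resource_id: str) -> bool:
    """Decide one request from verified token claims plus request parameters."""
    # The tenant comes from the platform-issued token, never from the request.
    tenant = claims["act"]["sub"]

    resource = RESOURCES.get(resource_id)
    if resource is None:
        return False

    # 1. Token scope: is this class of operation allowed at all?
    needed_scope = f"{action}:{resource.type}s"
    if needed_scope not in claims["scope"].split():
        return False

    # 2. Ownership: does the resource belong to the tenant in the token?
    if resource.tenant != tenant:
        return False

    # 3. Tenant policy: does the tenant's plan allow this action?
    return action in TENANT_POLICY.get(tenant, set())
```

A read of doc-9999 with an acme-corp token fails at step 2 regardless of what the agent puts in the request, which is the confused-deputy case this layer exists to stop.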

Isolation at the execution layer

IAM handles credential issuance and authorization checks. It doesn’t enforce compute isolation. A sufficiently privileged agent running in a shared execution environment can still exfiltrate credentials from the process environment or reach adjacent tenant workloads through the network.

The enforcement layers that compose well with the IAM model above:

Layer	Mechanism	What it prevents
Network	Per-tenant egress rules, no east-west by default	Lateral movement between tenant workloads
Credential storage	In-memory only, no disk serialization	Credential persistence across task boundaries
Metadata service	Block IMDS from agent processes	Cloud credential theft via SSRF
Syscall filtering	Seccomp profiles	Process escape from container

The IMDS point deserves emphasis. On EC2, GCE, and Azure VMs, including the nodes behind EKS, GKE, and AKS, the instance metadata service (169.254.169.254) is typically reachable by any process on the VM unless explicitly blocked. An agent that can make outbound HTTP requests can request the node’s IAM credentials and use them entirely outside the platform’s authorization model. Block IMDS at the network policy layer for any workload running agent code.
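If agent traffic also passes through an egress proxy, the same rule can be applied there as a second line of defense. The sketch below is not a substitute for the network-layer block; egress_allowed and EXTRA_BLOCKED are hypothetical names, and the check assumes it is given the resolved destination IP rather than a hostname.

```python
import ipaddress

# Metadata endpoints that are not link-local and so need an explicit entry.
# fd00:ec2::254 is the IPv6 IMDS address on AWS.
EXTRA_BLOCKED = {ipaddress.ip_address("fd00:ec2::254")}


def egress_allowed(destination_ip: str) -> bool:
    """Return False for metadata-service and other link-local destinations."""
    addr = ipaddress.ip_address(destination_ip)

    # Unwrap IPv4-mapped IPv6 (::ffff:a.b.c.d) so it cannot bypass the check.
    mapped = getattr(addr, "ipv4_mapped", None)
    if mapped is not None:
        addr = mapped

    # 169.254.0.0/16 and fe80::/10 cover 169.254.169.254 and similar endpoints.
    if addr.is_link_local:
        return False
    return addr not in EXTRA_BLOCKED
```

The check has to run on the address the connection will actually use, after DNS resolution, or a hostname that resolves to the metadata address slips past it.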

Audit: what to log

The authorization check is the right place to generate the audit record, not the agent. The agent cannot audit itself accurately — it may be compromised, and even if it isn’t, it doesn’t have visibility into the full authorization context.

A useful audit event:

{
  "event": "authz.decision",
  "decision": "allow",
  "principal": {
    "agent": "agent/worker-7f3a",
    "tenant": "tenant/acme-corp"
  },
  "action": "read",
  "resource": {
    "type": "document",
    "id": "doc-1234",
    "tenant": "tenant/acme-corp"
  },
  "task_id": "task-9b2c",
  "policy_version": "2026-04-01",
  "timestamp": "2026-04-28T12:00:00Z"
}

The policy_version field matters more than it looks. When you investigate an incident, you need to know which policy was in effect at decision time — not what the current policy says.
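A record like the one above can be assembled at the authorization check from inputs the service already has. This is a minimal sketch: audit_event is a made-up helper, and it assumes verified claims in the layout used throughout this post.

```python
from datetime import datetime, timezone


def audit_event(claims, action, resource, allowed, policy_version, now=None):
    """Build the audit record for one authorization decision.

    `resource` is a dict with type, id, and owning tenant, taken from
    resource metadata rather than from the agent's request.
    """
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "event": "authz.decision",
        "decision": "allow" if allowed else "deny",
        "principal": {
            "agent": claims["sub"],
            "tenant": claims["act"]["sub"],
        },
        "action": action,
        "resource": dict(resource),
        "task_id": claims.get("task_id"),
        # Version of the policy that was evaluated, captured at decision time.
        "policy_version": policy_version,
        "timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```

Denied decisions get the same record with "decision": "deny"; those are often the more useful half of the log during an investigation.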

What breaks first in practice

The three things I’ve seen fail most often in real deployments:

1. Token lifetimes set by convention, not by task duration. Teams default to “1 hour” because that’s what the docs say. Long-running agent tasks end up with expired credentials mid-execution, and the usual reaction is to just make the token longer, which undermines the whole model. Size the lifetime per task type and renew or exchange the token when a task runs long.

2. Scopes that inflate over time. The first version of the task token spec gets extended each time a new task type is onboarded, until the scope is effectively *. Add new task types by adding new token types, each with its own scope set, rather than widening the existing one.

3. Delegation chains that aren’t logged end-to-end. Agent A calls tool B which calls service C. The authorization log at C shows the platform credential, not the originating agent or tenant. When something goes wrong at C, there’s no way to trace back to the task. Propagate the task ID (and the full delegation chain) in a request header and log it at every layer.
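For the third item, propagation can be as simple as two headers that every hop forwards and extends. The header names X-Task-Id and X-Delegation-Chain are hypothetical, and the sketch assumes the headers are set and forwarded by platform-controlled layers; plain headers are forgeable, so a real deployment would want them signed or carried inside the token.

```python
TASK_HEADER = "X-Task-Id"            # hypothetical header name
CHAIN_HEADER = "X-Delegation-Chain"  # hypothetical header name


def parse_chain(headers: dict) -> list:
    """Return the list of hops recorded so far, oldest first."""
    raw = headers.get(CHAIN_HEADER, "")
    return [hop.strip() for hop in raw.split(",") if hop.strip()]


def outbound_headers(inbound: dict, self_id: str) -> dict:
    """Headers to attach to a downstream call: same task ID, chain plus this hop."""
    chain = parse_chain(inbound) + [self_id]
    return {
        TASK_HEADER: inbound[TASK_HEADER],  # KeyError if the task ID was dropped
        CHAIN_HEADER: ",".join(chain),
    }


def log_fields(inbound: dict) -> dict:
    """Fields every layer should add to its own log line."""
    return {
        "task_id": inbound.get(TASK_HEADER),
        "delegation_chain": parse_chain(inbound),
    }
```

With this in place, the log line at service C carries the task ID and the path agent A, tool B, which is enough to join it back to the authorization decision for the task.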

These patterns aren’t novel — they’re applications of least-privilege and defense-in-depth to a context where the “user” is a process that can act autonomously for an extended period. The novelty is that most existing IAM tooling is optimized for human users and short-lived API calls. Adapting it for agents requires being explicit about decisions the tooling used to make implicitly.