Defense

Tool Permission Matrix for AI Agents

Date: May 18, 2026
Status: Updated
License: CC BY 4.0
Read time: 7 min

Summary: A reusable Build / Protect template for deciding which agent tools are allowed, approval-required, or blocked.

Framework mapping

Mapped to public frameworks where useful for education and reuse. These mappings are not compliance claims, certifications, or assurance statements.

OWASP LLM06 Excessive Agency
OWASP LLM02 Sensitive Information Disclosure
NIST AI RMF Govern and Manage
MITRE ATLAS

Responsible-use note

AI Security Commons materials are created for education, defensive research, and responsible AI security learning. Attack examples are simplified and controlled. Do not use these techniques against systems without authorization. Review the Research Use Terms before applying any lab ideas.

Figure: Tool permission tiers. Step 1: Allowed. Step 2: Approval required. Step 3: Blocked. Step 4: Logged and replayable.

Most AI agent tools can be sorted into a simple tier before adding context-specific policy.

Problem this artifact explains

Agentic systems fail dangerously when every tool feels equally available to the model. AI War Games uses this matrix as the Protect artifact in the practice loop: learners attack tool use, builders define boundaries, and defenders replay whether the boundary held.

Permission tiers

Start with three tiers. A tool may be allowed by default, require explicit user or policy approval, or be blocked for the scenario. Every tier should still produce logs that Defender can replay.

Allowed: low-risk, read-only, scoped, logged, and reversible behavior.
Approval-required: actions with side effects, sensitive data, external communication, or durable state change.
Blocked: actions outside the mission goal, beyond user authority, or unsafe for an educational preview.
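
The three tiers above can be sketched as a small classifier. This is a minimal illustration, not part of the AI War Games product: the `ToolTraits` fields and the `classify` function are hypothetical names chosen here, and the ordering (blocked checks first, then approval, then allowed) is an assumption about how the tier definitions compose.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    ALLOWED = "allowed"
    APPROVAL_REQUIRED = "approval-required"
    BLOCKED = "blocked"


@dataclass(frozen=True)
class ToolTraits:
    """Hypothetical risk traits for one tool in one scenario."""
    in_mission_scope: bool = True
    within_user_authority: bool = True
    safe_for_preview: bool = True
    has_side_effects: bool = False
    touches_sensitive_data: bool = False
    communicates_externally: bool = False
    changes_durable_state: bool = False


def classify(traits: ToolTraits) -> Tier:
    """Map traits to a tier; the most restrictive matching rule wins."""
    # Blocked: outside the mission goal, beyond user authority, or unsafe.
    if not (traits.in_mission_scope
            and traits.within_user_authority
            and traits.safe_for_preview):
        return Tier.BLOCKED
    # Approval-required: side effects, sensitive data, external
    # communication, or durable state change.
    if (traits.has_side_effects
            or traits.touches_sensitive_data
            or traits.communicates_externally
            or traits.changes_durable_state):
        return Tier.APPROVAL_REQUIRED
    # Allowed: what remains is low-risk and read-only. Logging still
    # applies to every tier and is handled elsewhere.
    return Tier.ALLOWED
```

For example, `classify(ToolTraits(communicates_externally=True))` returns `Tier.APPROVAL_REQUIRED`, while a tool flagged `in_mission_scope=False` is blocked regardless of its other traits.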

Practical matrix template

Use this matrix before publishing a Builder scenario or when reviewing an Agentic Lab failure.

Read-only lookup: allowed when scoped, attributed, and rate-limited.
Draft-only email or ticket: allowed if it creates a draft for human review rather than sending.
Send email or message: approval-required with recipient, content, and reason displayed.
Modify database or permissions: approval-required or blocked unless the mission explicitly teaches that control.
Write memory: approval-required for high-impact claims; blocked for identity, authorization, or safety-critical claims.
External webhook or API: approval-required by default and blocked when the destination is untrusted or not part of the lab.
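
One way to make the matrix above reviewable is to encode it as a deterministic function. The sketch below is illustrative only: the tool identifiers, the context keys (`scoped`, `draft_only`, `claim_kind`, and so on), and the `decide` function are hypothetical. Where the matrix does not state an outcome, the code falls back to a more restrictive tier, and each such fallback is marked as an assumption in the comments.

```python
from typing import Any, Dict

ALLOWED = "allowed"
APPROVAL = "approval-required"
BLOCKED = "blocked"

# Claim kinds the matrix blocks outright for memory writes.
BLOCKED_CLAIMS = {"identity", "authorization", "safety_critical"}


def decide(tool: str, ctx: Dict[str, Any]) -> str:
    """Return the tier for one proposed tool call (illustrative only)."""
    if tool == "read_only_lookup":
        ok = ctx.get("scoped") and ctx.get("attributed") and ctx.get("rate_limited")
        # Assumption: a lookup missing any safeguard falls back to approval.
        return ALLOWED if ok else APPROVAL
    if tool == "draft_email_or_ticket":
        # Allowed only while the result stays a draft for human review.
        return ALLOWED if ctx.get("draft_only") else APPROVAL
    if tool == "send_email_or_message":
        return APPROVAL
    if tool == "modify_database_or_permissions":
        # One reading of the matrix: blocked unless the mission teaches it.
        return APPROVAL if ctx.get("mission_teaches_control") else BLOCKED
    if tool == "write_memory":
        if ctx.get("claim_kind") in BLOCKED_CLAIMS:
            return BLOCKED
        # Assumption: claims the matrix does not name also need approval.
        return APPROVAL
    if tool == "external_webhook_or_api":
        trusted = ctx.get("destination_trusted") and ctx.get("destination_in_lab")
        return APPROVAL if trusted else BLOCKED
    # Assumption: tools missing from the matrix fail closed.
    return BLOCKED
```

Keeping the decision in plain code rather than in the prompt means a reviewer can read every branch and test it without running the model.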

Where to practice this in AI War Games

Agentic Lab demonstrates what happens when a model can choose tools with too much freedom. Builder lets authenticated users encode a narrower permission model in a custom scenario. Defender shows the replay evidence when a permission boundary was missing or too broad.

What to defend

The matrix teaches builders to defend authority, not just wording. A stronger system can still be helpful while refusing side effects that are not authorized, not scoped, or not reviewable.

Move tool authority out of the prompt and into a gateway.
Validate parameters deterministically before execution.
Require fresh approval for irreversible or external actions.
Record tool proposals, denials, approvals, and outputs for replay.
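
The four practices above can be combined in a small gateway that sits between the model and its tools. This is a minimal sketch under stated assumptions: `ToolGateway`, its constructor arguments, and the log record shape are hypothetical and do not describe how AI War Games implements Defender replay. The approver callback is invoked on every approval-required call, so no earlier approval is reused.

```python
import time
from typing import Any, Callable, Dict, List, Optional

Params = Dict[str, Any]


class ToolGateway:
    """Hypothetical gateway: policy, validation, approval, and logging."""

    def __init__(
        self,
        policy: Dict[str, str],                        # tool name -> tier
        validators: Dict[str, Callable[[Params], bool]],
        approver: Callable[[str, Params], bool],       # asks a human or policy
    ) -> None:
        self.policy = policy
        self.validators = validators
        self.approver = approver
        self.log: List[Dict[str, Any]] = []            # replayable event record

    def _record(self, event: str, tool: str, params: Params, detail: str = "") -> None:
        self.log.append({"ts": time.time(), "event": event, "tool": tool,
                         "params": dict(params), "detail": detail})

    def call(self, tool: str, params: Params,
             handler: Callable[[Params], Any]) -> Optional[Any]:
        self._record("proposal", tool, params)
        # Authority lives here, not in the prompt. Unknown tools fail closed.
        tier = self.policy.get(tool, "blocked")
        if tier == "blocked":
            self._record("denial", tool, params, "blocked by policy")
            return None
        # Deterministic parameter check; a missing validator is a denial.
        validator = self.validators.get(tool)
        if validator is None or not validator(params):
            self._record("denial", tool, params, "parameter validation failed")
            return None
        if tier == "approval-required":
            # Fresh approval on every call; nothing is cached.
            if not self.approver(tool, params):
                self._record("denial", tool, params, "approval refused")
                return None
            self._record("approval", tool, params)
        output = handler(params)
        self._record("output", tool, params, repr(output))
        return output
```

A caller would register each simulated tool in `policy` and `validators`, then route every model-proposed call through `gateway.call(...)`. The resulting `log` list holds proposals, denials, approvals, and outputs in order, which is the kind of record a replay view needs.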

Builder checklist

Before publishing a custom mission, list each simulated tool, classify it into a tier, define the asset it protects, describe the expected attacker path, and specify the defense success criteria. That makes the lab teachable and gives Defender a concrete failure to review.
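
The checklist can also be captured as a record per tool so that gaps are caught before publishing. The sketch below is an assumption about how a builder might track this locally: `ToolChecklistEntry`, `missing_fields`, and `ready_to_publish` are hypothetical names, not part of any Builder API.

```python
from dataclasses import dataclass, fields
from typing import List

VALID_TIERS = {"allowed", "approval-required", "blocked"}


@dataclass
class ToolChecklistEntry:
    """One row of the builder checklist for a simulated tool."""
    tool: str
    tier: str
    protected_asset: str
    expected_attacker_path: str
    defense_success_criteria: str


def missing_fields(entry: ToolChecklistEntry) -> List[str]:
    """Names of checklist fields left empty or whitespace-only."""
    return [f.name for f in fields(entry)
            if not str(getattr(entry, f.name)).strip()]


def ready_to_publish(entries: List[ToolChecklistEntry]) -> bool:
    """True only if every tool is fully described and uses a known tier."""
    return bool(entries) and all(
        not missing_fields(e) and e.tier in VALID_TIERS for e in entries
    )
```

An entry with an empty `defense_success_criteria` would make `ready_to_publish` return `False`, which flags the mission as lacking a concrete outcome to replay.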