AI Security Commons
Public research notes, lab scenarios, attack-pattern mappings, defensive architecture guidance, and reusable mission templates for GenAI and agentic AI security.
AI War Games publishes practical security material to help developers, educators, researchers, and security teams understand how AI systems fail and how to defend them. The material is designed to be reusable in education, study groups, product security reviews, authorized testing, and hands-on labs.
Learn → Attack → Defend → Build / Protect → Research
Research is the documentation layer of the lab loop
Each Commons artifact is meant to explain behavior observed in learning tracks, missions, Agentic Lab, Builder validation, or Defender replay, rather than to stand apart as a generic blog post.
Practice controlled prompt injection, jailbreaks, data extraction, and tool misuse in safe browser labs.
Replay failures, inspect what happened, and connect mitigations to specific controls.
Create scenarios, define secrets and tool boundaries, validate attacks, and harden the design.
Turn observations into reusable notes, attack patterns, templates, and evaluation methods.
Featured research
Start with these reusable notes
Agentic AI Security Reference Architecture v0.2
A product-grounded architecture for Agentic Lab systems that use tools, memory, retrieval, approval gates, and replay logs.
Indirect Prompt Injection Through Retrieved Content
A concrete attack-pattern note for retrieved text that attempts to override policy, misuse tools, or redirect agent behavior.
Tool Permission Matrix for AI Agents
A reusable Build / Protect template for deciding which agent tools are allowed, approval-required, or blocked.
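As a rough illustration of the three outcomes named above (allowed, approval-required, blocked), a permission matrix can be sketched as a lookup table with a default-deny rule. This is not the published template: the tool names, the `Decision` enum, and the `may_run` helper are all hypothetical, chosen only to show the shape of the decision.

```python
from enum import Enum


class Decision(Enum):
    ALLOWED = "allowed"
    APPROVAL_REQUIRED = "approval-required"
    BLOCKED = "blocked"


# Hypothetical tools for illustration; a real matrix would list the
# tools the agent under review can actually call.
PERMISSION_MATRIX = {
    "search_docs": Decision.ALLOWED,
    "send_email": Decision.APPROVAL_REQUIRED,
    "delete_records": Decision.BLOCKED,
}


def decide(tool_name, matrix=PERMISSION_MATRIX):
    """Look up a tool; anything not listed is blocked (default deny)."""
    return matrix.get(tool_name, Decision.BLOCKED)


def may_run(tool_name, approved_by_human=False, matrix=PERMISSION_MATRIX):
    """Return True only if the tool call may proceed right now."""
    decision = decide(tool_name, matrix)
    if decision is Decision.ALLOWED:
        return True
    if decision is Decision.APPROVAL_REQUIRED:
        # Human approval unlocks approval-required tools only.
        return approved_by_human
    return False
```

The default-deny lookup is a design assumption of this sketch: a tool that nobody classified is treated as blocked, and human approval never overrides a blocked entry.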
Categories
Browse by research artifact type
Architecture
1 entry. Reference models for securing GenAI systems, tool-using agents, memory, and human approval flows.
Attack Patterns
1 entry. Reusable descriptions of common GenAI and agentic AI attack techniques, mapped to public frameworks where useful.
Defense Controls
1 entry. Practical controls, checklists, and decision aids for reducing AI security risk in authorized environments.
Lab Notes
1 entry. Controlled observations from synthetic lab scenarios, including setup, expected behavior, and limitations.
Mission Templates
1 entry. Reusable lab and mission outlines for educators, researchers, and security teams building safe practice scenarios.
Evaluation Methods
0 entries. Methods for scoring, comparing, and documenting AI security behavior without overclaiming assurance.
Mapped to common AI security frameworks
Framework mappings are educational aids, not compliance claims
Where useful, entries are mapped to public AI security frameworks to make them easier to reuse in education, reviews, authorized testing, and internal security programs. Current mappings reference OWASP GenAI / LLM Top 10, MITRE ATLAS, and the NIST AI RMF / GenAI Profile.
How this complements vulnerable-app and CTF labs
AI War Games is not designed as a single vulnerable AI application. It complements vulnerable-app and CTF-style projects by emphasizing reusable learning paths, defense review workflows, before/after comparison, study-group evidence, mission templates, and anonymized research observations.
Open Artifacts Roadmap
Reusable artifacts for research, education, and defensive practice
This roadmap outlines the public-good artifacts AI War Games is developing for safe labs and reusable research workflows. These are early artifacts and initial drafts; labels indicate what is usable now versus still planned.
Browse draft files in the docs/commons folder.
Mission Schema
In progress: initial draft schema for defining safe LLM and agentic AI security labs, including objective, target, OWASP mapping, hints, success criteria, and learning notes.
Evidence Record Schema
Initial draft: v0.1 privacy-safe format for recording lab outcomes, attempt categories, analyzer verdicts, and learning observations without exposing raw harmful prompts.
Defense Comparison Schema
Initial draft: v0.1 structure for documenting before/after defense results, vulnerabilities reduced, residual findings, and recommended controls.
Evaluation Rubric
In progress: initial draft criteria for assessing prompt injection, data leakage, tool misuse, policy bypass, and mitigation effectiveness.
Responsible Use Guide
Available: draft artifact with safety boundaries for authorized, sandboxed, defensive AI security practice.
Study Group Facilitator Guide
Initial draft: v0.1 reusable guidance for running AI security learning sessions with labs, discussion prompts, and aggregate learning summaries.
Sample Checkup Report Format
Initial draft: v0.1 report structure for findings, evidence, OWASP mappings, remediation notes, and limitations.
Links are published only for available or initial draft artifacts that currently exist in the repository.
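To make the Mission Schema entry more concrete, the fields it lists (objective, target, OWASP mapping, hints, success criteria, learning notes) could be modeled roughly as below. This is a sketch under assumptions, not the draft schema in the repository: the `Mission` class, its field names and types, the `validate` method, and the example values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Mission:
    # Field names mirror the items listed in the Mission Schema entry;
    # the types and which fields are required are assumptions.
    objective: str
    target: str
    success_criteria: List[str]
    owasp_mapping: List[str] = field(default_factory=list)
    hints: List[str] = field(default_factory=list)
    learning_notes: str = ""

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the mission is usable."""
        problems = []
        if not self.objective.strip():
            problems.append("objective is empty")
        if not self.target.strip():
            problems.append("target is empty")
        if not self.success_criteria:
            problems.append("at least one success criterion is required")
        return problems


# Hypothetical example mission for a sandboxed lab.
example = Mission(
    objective="Make the assistant reveal a synthetic secret",
    target="sandboxed support chatbot",
    success_criteria=["synthetic secret appears in the model output"],
    owasp_mapping=["LLM01"],  # illustrative label for a prompt injection mapping
    hints=["Look at how retrieved documents are inserted into the prompt"],
    learning_notes="Compare results before and after adding an output filter.",
)
```

Calling `example.validate()` returns an empty list, while a mission with a blank objective or no success criteria returns one message per missing item, which is enough to reject incomplete templates before they reach a lab.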
Open-use note
These materials are intended for education, defensive research, and responsible AI security learning. Content is published for reuse under open licenses where specified. Attack examples are simplified and controlled. Do not use these techniques against systems without authorization. Read the Responsible Use Guide and Research Use Terms for project boundaries.