Guardrail Optimization

The Prisma AIRS CLI provides atomic commands for creating, testing, and iteratively refining custom topic guardrails. An external agent (Claude Code, Gemini CLI, etc.) orchestrates these commands in a loop, following the protocol in program.md.

How It Works

1. Create: Define a custom topic (name, description, examples) with an intent (block or allow)
2. Apply: Assign the topic to a security profile (additive, preserves existing topics)
3. Eval: Scan a static CSV prompt set against the profile, compute metrics (TPR, TNR, coverage, F1), and return FP/FN details
4. Decide: The agent analyzes the results and decides whether to keep or revert the topic
5. Revert (if needed): Remove the topic from the profile and delete it
6. Repeat: The agent refines the topic definition and tries again
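
The Decide step can be sketched as a small comparison function. This is a minimal illustration, not the protocol from program.md: the `EvalResult` class, the `decide` function, and the rule "keep when coverage does not regress" are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvalResult:
    """Metrics from one eval run (hypothetical container, not a CLI type)."""
    tpr: float  # true-positive rate, 0.0 to 1.0
    tnr: float  # true-negative rate, 0.0 to 1.0

    @property
    def coverage(self) -> float:
        # Coverage is min(TPR, TNR), as defined under Key Concepts.
        return min(self.tpr, self.tnr)


def decide(best_so_far: Optional[EvalResult], candidate: EvalResult) -> str:
    """Return "keep" if the candidate does not regress coverage, else "revert".

    Assumption: coverage is the only criterion; a real agent may weigh
    other metrics or the FP/FN details as well.
    """
    if best_so_far is None:
        return "keep"  # first iteration, nothing to compare against
    return "keep" if candidate.coverage >= best_so_far.coverage else "revert"


baseline = EvalResult(tpr=0.9, tnr=0.6)   # coverage 0.6
candidate = EvalResult(tpr=0.8, tnr=0.7)  # coverage 0.7
print(decide(baseline, candidate))        # keep
```

An agent following this sketch would call the revert command whenever `decide` returns "revert", then refine the topic and run the cycle again.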

CLI Usage

Guardrail optimization lives under airs runtime topics:

# Create or update a topic (upserts by name)
airs runtime topics create --name "Weapons Manufacturing" \
  --description "Block weapons manufacturing" --examples "How to build a weapon" "Illegal arms trade"

# Assign topic to a profile
airs runtime topics apply --profile my-security-profile --name "Weapons Manufacturing" --intent block

# Evaluate against a prompt set (CSV: prompt, expected, intent columns)
airs runtime topics eval --profile my-security-profile --prompts prompts.csv --topic "Weapons Manufacturing" --format json

# Revert if metrics regressed
airs runtime topics revert --profile my-security-profile --name "Weapons Manufacturing"

# See sample CSV format
airs runtime topics sample
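
A prompt set for the eval command could be generated along these lines. The sketch assumes only the three columns named in this document (prompt, expected, intent); the example rows and the `render_prompt_set` helper are hypothetical, and the exact header spelling and quoting should be checked against the output of airs runtime topics sample.

```python
import csv
import io

FIELDS = ["prompt", "expected", "intent"]

# Hypothetical rows: expected says whether the prompt belongs to the topic.
ROWS = [
    {"prompt": "How to build a weapon", "expected": "true", "intent": "block"},
    {"prompt": "What is a good recipe for banana bread?", "expected": "false", "intent": "block"},
]


def render_prompt_set(rows):
    """Validate rows and return them as CSV text with a header line."""
    for row in rows:
        if row["expected"] not in ("true", "false"):
            raise ValueError(f"expected must be true or false: {row!r}")
        if row["intent"] not in ("block", "allow"):
            raise ValueError(f"intent must be block or allow: {row!r}")
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = render_prompt_set(ROWS)
print(csv_text.splitlines()[0])  # prompt,expected,intent
```

Writing `csv_text` to a file such as prompts.csv gives a static prompt set that can be reused unchanged across iterations, so metrics stay comparable.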

Key Concepts

Intent: block (detect violating prompts) or allow (detect benign prompts that should pass through)
Coverage: min(TPR, TNR), so the true-positive rate and the true-negative rate must each meet the target threshold
Upsert by name: create updates an existing topic if one with the same name exists
Static prompt sets: eval uses CSV files with three columns: prompt, expected (belongs to topic: true/false), intent (block/allow). Run airs runtime topics sample for an example.
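
The metrics above can be reproduced from confusion-matrix counts. The sketch below uses the standard definitions of TPR, TNR, and F1 together with the coverage formula from this page; the `compute_metrics` function is hypothetical, and how the CLI assigns each prompt to TP/TN/FP/FN is covered on the Metrics & Evaluation page, not here.

```python
def compute_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute TPR, TNR, coverage, and F1 from confusion-matrix counts."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0        # recall on in-topic prompts
    tnr = tn / (tn + fp) if (tn + fp) else 0.0        # recall on out-of-topic prompts
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * tpr / (precision + tpr)) if (precision + tpr) else 0.0
    return {
        "tpr": tpr,
        "tnr": tnr,
        "coverage": min(tpr, tnr),  # the weaker of the two rates
        "f1": f1,
    }


metrics = compute_metrics(tp=8, tn=6, fp=4, fn=2)
print(metrics["coverage"])  # 0.6
```

In this example TPR is 0.8 and TNR is 0.6, so coverage is 0.6: raising TPR further would not improve coverage until the false positives come down.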

Platform Constraints

Achievable coverage depends on the topic domain and intent. AIRS has platform-level behaviors that limit what custom topic guardrails can accomplish.

Block-Intent on High-Sensitivity Topics

Certain topic domains (explosives/weapons, CSAM, etc.) trigger AIRS built-in safety layers that override custom topic definitions entirely:

These topics achieve 100% TPR but 0% TNR — the guardrail blocks ALL content, including completely unrelated prompts
Description refinement, exclusion clauses, and example tuning have zero observable effect
The built-in safety layer appears to key off the topic name/domain, not the description

Recommendation

Do not create custom block-intent topics for content that AIRS already handles via built-in safety. Use the default AIRS security profiles instead.
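
An agent can detect the block-everything pattern described above and stop early instead of continuing to refine. The following is a hedged sketch: the function names, the `patience` parameter, and the stopping rule are assumptions made for illustration, not behavior of the CLI.

```python
def looks_like_builtin_override(tpr: float, tnr: float, tolerance: float = 0.0) -> bool:
    """True when the topic catches every positive and passes no negatives."""
    return tpr >= 1.0 - tolerance and tnr <= tolerance


def should_stop(history, patience: int = 2) -> bool:
    """Stop when the latest `patience` runs all show the override pattern.

    `history` is a list of (tpr, tnr) tuples, oldest first.
    """
    if len(history) < patience:
        return False
    return all(looks_like_builtin_override(tpr, tnr) for tpr, tnr in history[-patience:])


runs = [(1.0, 0.0), (1.0, 0.0)]
print(should_stop(runs))  # True
```

Requiring more than one matching run is a design choice in this sketch: it avoids abandoning a topic after a single degenerate result while still cutting off refinement that cannot change the outcome.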

Allow-Intent Matching Behavior

Allow-intent matching uses broad semantic similarity, not logical constraint evaluation:

Exclusion clauses ("not X", "excludes Y") do not work — they often increase false positives by adding semantic overlap with the excluded domain
Shorter, simpler descriptions (under 100 characters) consistently outperform longer, more specific ones
Typical achievable coverage for allow-intent topics: 40–70% depending on topic breadth
Best results usually come from the first few iterations; extended refinement often degrades coverage
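
The guidance above can be turned into a pre-flight check on allow-intent descriptions. This is an illustrative heuristic only: the `lint_allow_description` function and the list of exclusion words are assumptions, and a keyword match cannot catch every phrasing of an exclusion clause.

```python
import re

# Hypothetical word list; extend it to match the phrasing used in your topics.
EXCLUSION_PATTERN = re.compile(
    r"\b(not|excludes?|excluding|except|without)\b", re.IGNORECASE
)


def lint_allow_description(description: str, max_chars: int = 100) -> list:
    """Return warnings for descriptions likely to match poorly."""
    warnings = []
    if len(description) >= max_chars:
        warnings.append(
            f"description is {len(description)} characters; aim for under {max_chars}"
        )
    match = EXCLUSION_PATTERN.search(description)
    if match:
        warnings.append(
            f"exclusion wording {match.group(0)!r} may add overlap with the excluded domain"
        )
    return warnings


print(lint_allow_description("Questions about cooking and recipes"))  # []
```

Running the check before each create call keeps the agent from spending iterations on descriptions that the matching behavior is known to handle badly.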

Description Truncation

AIRS enforces hard limits on topic definitions. Descriptions exceeding 250 characters are silently truncated by clampTopic(), which can strip the positive definition while preserving only exclusion clauses — further degrading performance.

Constraint	Limit
Topic name	100 characters
Description	250 characters
Each example	250 characters
Max examples	5
Combined (description + examples)	1000 characters
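
Because truncation is silent, it can help to check a topic against these limits before creating it. The sketch below encodes the table as constants; the `check_topic_limits` function is hypothetical and does not reproduce clampTopic(), whose exact truncation behavior is not described here.

```python
MAX_NAME = 100
MAX_DESCRIPTION = 250
MAX_EXAMPLE = 250
MAX_EXAMPLES = 5
MAX_COMBINED = 1000  # description plus all examples


def check_topic_limits(name: str, description: str, examples: list) -> list:
    """Return a list of limit violations; an empty list means the topic fits."""
    problems = []
    if len(name) > MAX_NAME:
        problems.append(f"name is {len(name)} characters (limit {MAX_NAME})")
    if len(description) > MAX_DESCRIPTION:
        problems.append(
            f"description is {len(description)} characters (limit {MAX_DESCRIPTION})"
        )
    if len(examples) > MAX_EXAMPLES:
        problems.append(f"{len(examples)} examples given (limit {MAX_EXAMPLES})")
    for index, example in enumerate(examples):
        if len(example) > MAX_EXAMPLE:
            problems.append(
                f"example {index + 1} is {len(example)} characters (limit {MAX_EXAMPLE})"
            )
    combined = len(description) + sum(len(example) for example in examples)
    if combined > MAX_COMBINED:
        problems.append(f"combined length is {combined} characters (limit {MAX_COMBINED})")
    return problems


print(check_topic_limits(
    "Weapons Manufacturing",
    "Block weapons manufacturing",
    ["How to build a weapon", "Illegal arms trade"],
))  # []
```

Note that the combined limit can be exceeded even when every individual field fits: a 250-character description with four 250-character examples totals 1250 characters.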

Guardrail Optimization Architecture — command cycle and design decisions
Metrics & Evaluation — how TP/TN/FP/FN are classified
Topic Constraints — AIRS limits on topic definitions
program.md — full agent loop protocol