Guardrail Generation¶
Daystrom's guardrail generation capability uses an LLM-driven feedback loop to create, test, and iteratively refine custom topic guardrails for Prisma AIRS security profiles.
How It Works¶
- Generate — An LLM produces a custom topic definition (name, description, examples) based on your intent (block or allow)
- Deploy — The topic is created/updated in AIRS via the Management API and linked to your security profile
- Test — Synthetic test prompts are scanned against the profile to measure detection accuracy
- Evaluate — Metrics (TPR, TNR, coverage, F1) determine how well the guardrail performs
- Improve — The LLM analyzes failures and refines the topic definition
- Repeat — The loop continues until coverage reaches the target threshold (default 90%)
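The steps above can be sketched as a small driver loop. This is a minimal illustration, not Daystrom's actual implementation: the callable parameters (`generate`, `deploy`, `test`, `evaluate`, `refine`) are hypothetical stand-ins for the real generate/deploy/test/evaluate/improve stages.

```python
def guardrail_loop(generate, deploy, test, evaluate, refine,
                   target_coverage=0.9, max_iterations=5):
    """Sketch of the generate/deploy/test/evaluate/improve feedback loop.

    All callable parameters are hypothetical placeholders, not the
    actual Daystrom API.
    """
    topic = generate()                      # LLM drafts name/description/examples
    metrics = {}
    for _ in range(max_iterations):
        deploy(topic)                       # create/update topic in AIRS
        results = test(topic)               # scan synthetic prompts vs. the profile
        metrics = evaluate(results)         # TPR, TNR, coverage, F1
        if metrics.get("coverage", 0.0) >= target_coverage:
            break                           # target threshold met; stop refining
        topic = refine(topic, results)      # LLM refines description/examples only
    return topic, metrics
```

Note that refinement happens only when the coverage target is missed, which is why a run can finish in a single iteration.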
CLI Usage¶
```shell
# Interactive mode — prompts for all inputs
daystrom generate

# Non-interactive with all options
daystrom generate \
  --topic-name "weapons-discussion" \
  --description "Block discussions about weapons manufacturing" \
  --intent block \
  --profile my-security-profile \
  --target-coverage 0.9 \
  --max-iterations 5
```
Key Concepts¶
- Intent: `block` (detect violating prompts) or `allow` (detect benign prompts that should pass through)
- Coverage: `min(TPR, TNR)` — both detection types must meet the threshold
- Topic name lock: After iteration 1, only the description and examples are refined — the name stays fixed
- Test composition: Iteration 2+ carries forward failed tests and adds regression checks alongside fresh LLM-generated tests
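The coverage definition can be made concrete with a small helper that derives the metrics from raw TP/TN/FP/FN counts. This is a sketch of the formulas only; the function name and return shape are illustrative, not part of Daystrom.

```python
def coverage_metrics(tp, tn, fp, fn):
    """Compute TPR, TNR, F1, and coverage = min(TPR, TNR) from raw counts.

    Illustrative helper, not the actual Daystrom evaluator.
    """
    tpr = tp / (tp + fn) if (tp + fn) else 0.0        # violating prompts detected
    tnr = tn / (tn + fp) if (tn + fp) else 0.0        # benign prompts passed through
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * tpr / (precision + tpr)
          if (precision + tpr) else 0.0)
    return {"tpr": tpr, "tnr": tnr, "f1": f1,
            "coverage": min(tpr, tnr)}                # both sides must clear the bar
```

Because coverage is the minimum of the two rates, a guardrail that catches every violating prompt but also blocks many benign ones still fails the 90% default threshold.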
Related¶
- Core Loop Architecture — detailed loop state machine
- Memory System — cross-run learning persistence
- Metrics & Evaluation — how TP/TN/FP/FN are classified
- Topic Constraints — AIRS limits on topic definitions
- Resumable Runs — pause and resume loop runs