Release Notes

v1.7.2

Features

daystrom model-security command group: Full AI Model Security operations — security groups CRUD, rule browsing, rule instance configuration, scan operations (create/list/get), evaluations, violations, files, label management, and PyPI authentication.
SdkModelSecurityService: New service wrapping ModelSecurityClient with camelCase normalization for all 23 SDK methods.
5 subcommand groups: groups (list/get/create/update/delete), rules (list/get), rule-instances (list/get/update), scans (list/get/create/evaluations/violations/files), labels (add/set/delete/keys/values), plus pypi-auth.
SDK upgrade: @cdot65/prisma-airs-sdk v0.6.0 → v0.6.1 — fixed list filter options for groups, rules, and rule instances.
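
The camelCase normalization mentioned for SdkModelSecurityService could look roughly like the sketch below. It assumes the SDK returns snake_case keys; toCamel and normalizeKeys are hypothetical helper names, not the service's actual internals.

```typescript
// Hypothetical helpers illustrating recursive snake_case -> camelCase key
// normalization. Names and behavior are assumptions, not Daystrom's code.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

function toCamel(key: string): string {
  // "security_group_id" -> "securityGroupId"
  return key.replace(/_([a-z0-9])/g, (_match, ch: string) => ch.toUpperCase());
}

function normalizeKeys(value: Json): Json {
  if (Array.isArray(value)) {
    return value.map((item) => normalizeKeys(item));
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const key of Object.keys(value)) {
      out[toCamel(key)] = normalizeKeys(value[key]);
    }
    return out;
  }
  return value; // primitives pass through unchanged
}

// Example payload (made up for illustration).
const rawGroup: Json = {
  group_id: "g-1",
  rule_instances: [{ rule_name: "example", is_enabled: true }],
};
console.log(JSON.stringify(normalizeKeys(rawGroup)));
```

Normalizing once at the service boundary keeps callers from having to handle two key styles.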

Tests

390 tests across 25 spec files (up from 360)

v1.7.0

Features

Red team target CRUD: Full target lifecycle management — create, get, update, delete via CLI (daystrom redteam targets <subcommand>) and library API (SdkRedTeamService).
Target connection validation: --validate flag on targets create and targets update validates connectivity before saving (SDK v0.6.0 TargetOperationOptions).
Target probe: daystrom redteam targets probe --config conn.json tests a target connection without persisting.
Target profile management: targets profile <uuid> and targets update-profile <uuid> for target profiling configuration.
Prompt set full CRUD: get, update, archive/unarchive, version-info, CSV template download via daystrom redteam prompt-sets <subcommand>.
CSV prompt upload: daystrom redteam prompt-sets upload <uuid> file.csv for bulk prompt ingestion (SDK v0.6.0 uploadPromptsCsv()).
Individual prompt CRUD: list, get, add, update, delete prompts within sets via daystrom redteam prompts <subcommand>.
Property management: daystrom redteam properties {list,create,values,add-value} for custom attack property names and values.
SDK upgrade: @cdot65/prisma-airs-sdk v0.4.0 → v0.6.0 — fully typed target schemas (connection params, background, metadata, additional context), no breaking changes.

CLI Changes

Existing flat commands refactored to subcommand groups:

Before (v1.6.0)	After (v1.7.0)
`daystrom redteam targets`	`daystrom redteam targets list`
`daystrom redteam prompt-sets`	`daystrom redteam prompt-sets list`
—	`daystrom redteam targets {get,create,update,delete,probe,profile,update-profile}`
—	`daystrom redteam prompt-sets {get,create,update,archive,download,upload}`
—	`daystrom redteam prompts {list,get,add,update,delete}`
—	`daystrom redteam properties {list,create,values,add-value}`

Tests

360 tests across 24 spec files (up from 333)
100% coverage on redteam.ts and promptsets.ts

v1.6.0

Features

daystrom audit <profileName>: New command evaluates all topics in an AIRS security profile. Generates tests per topic, scans them, computes per-topic and composite metrics (TPR, TNR, coverage, accuracy, F1), and detects cross-topic conflicts.
Per-topic metrics: Each topic gets its own efficacy breakdown, enabling identification of weak guardrails within a profile.
Conflict detection: Identifies cross-topic interference — prompts that are false negatives for one topic and false positives for another.
Audit reports: --format json and --format html export audit results with per-topic metrics tables, conflict sections, and composite scores.
getProfileTopics(): New ManagementService method extracts enriched topic entries from profile policy structure.
TestCase.targetTopic: New optional field for audit topic attribution (backward-compatible with existing loop).
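
As a rough illustration of per-topic metrics, the sketch below groups results by targetTopic and applies the standard definitions of TPR, TNR, accuracy, and F1. The AuditResult and TopicMetrics shapes are hypothetical (only targetTopic is named in these notes), and coverage is omitted because its definition is not given here.

```typescript
// Hypothetical shapes: only `targetTopic` comes from the release notes.
interface AuditResult {
  targetTopic: string;
  expectedTriggered: boolean; // should the guardrail fire on this prompt?
  actualTriggered: boolean; // did it fire?
}

interface TopicMetrics {
  tpr: number;
  tnr: number;
  accuracy: number;
  f1: number;
}

// Return 0 instead of NaN when a denominator is zero
// (for example, a topic with no negative tests).
const ratio = (num: number, den: number): number => (den === 0 ? 0 : num / den);

function computeMetrics(results: AuditResult[]): TopicMetrics {
  let tp = 0, tn = 0, fp = 0, fn = 0;
  for (const r of results) {
    if (r.expectedTriggered && r.actualTriggered) tp++;
    else if (!r.expectedTriggered && !r.actualTriggered) tn++;
    else if (!r.expectedTriggered && r.actualTriggered) fp++;
    else fn++;
  }
  return {
    tpr: ratio(tp, tp + fn),
    tnr: ratio(tn, tn + fp),
    accuracy: ratio(tp + tn, results.length),
    f1: ratio(2 * tp, 2 * tp + fp + fn),
  };
}

function metricsByTopic(results: AuditResult[]): Map<string, TopicMetrics> {
  // Bucket results by topic, then score each bucket independently.
  const groups = new Map<string, AuditResult[]>();
  for (const r of results) {
    const bucket = groups.get(r.targetTopic) ?? [];
    bucket.push(r);
    groups.set(r.targetTopic, bucket);
  }
  const out = new Map<string, TopicMetrics>();
  groups.forEach((group, topic) => out.set(topic, computeMetrics(group)));
  return out;
}

// Composite metrics, in this sketch, are the same computation over all results.
const demoResults: AuditResult[] = [
  { targetTopic: "weapons", expectedTriggered: true, actualTriggered: true },
  { targetTopic: "weapons", expectedTriggered: true, actualTriggered: false },
  { targetTopic: "finance", expectedTriggered: false, actualTriggered: true },
];
metricsByTopic(demoResults).forEach((m, topic) => console.log(topic, m));
console.log("composite", computeMetrics(demoResults));
```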

Tests

333 tests across 24 spec files (up from 298)

v1.5.0

Features

Structured evaluation reports: daystrom report now supports --format json and --format html for machine-readable and shareable report export.
Per-test-case details: --tests flag includes individual test results (prompt, expected/actual outcome, pass/fail, category, source) in all output formats.
Run comparison: --diff <runId> compares two runs side-by-side with metric deltas (coverage, TPR, TNR, accuracy, F1).
Self-contained HTML reports: HTML output includes embedded CSS with run summary, iteration trends, metrics tables, test result tables, and diff sections. No external dependencies.
JSON export: Clean structured JSON to stdout for CI/CD pipelines and programmatic consumption.
New report module: buildReportJson() and buildReportHtml() exported as library functions for custom integrations.
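
The metric deltas shown by a run comparison amount to a field-by-field subtraction. The sketch below assumes each metric is stored as a 0-1 ratio; RunMetrics, diffMetrics, and formatDelta are hypothetical names and do not reflect the signatures of buildReportJson() or buildReportHtml().

```typescript
// Hypothetical shape: the five metrics named above, each as a 0-1 ratio.
interface RunMetrics {
  coverage: number;
  tpr: number;
  tnr: number;
  accuracy: number;
  f1: number;
}

// Positive values mean `other` improved on `base`.
function diffMetrics(base: RunMetrics, other: RunMetrics): RunMetrics {
  return {
    coverage: other.coverage - base.coverage,
    tpr: other.tpr - base.tpr,
    tnr: other.tnr - base.tnr,
    accuracy: other.accuracy - base.accuracy,
    f1: other.f1 - base.f1,
  };
}

// Render a ratio delta as signed percentage points.
function formatDelta(delta: number): string {
  const points = delta * 100;
  const sign = points > 0 ? "+" : "";
  return `${sign}${points.toFixed(1)} pp`;
}

const baseline: RunMetrics = { coverage: 0.5, tpr: 0.5, tnr: 0.75, accuracy: 0.5, f1: 0.5 };
const candidate: RunMetrics = { coverage: 0.75, tpr: 0.75, tnr: 0.5, accuracy: 0.75, f1: 0.75 };
const deltas = diffMetrics(baseline, candidate);
console.log(`TPR ${formatDelta(deltas.tpr)}, TNR ${formatDelta(deltas.tnr)}`);
```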

Tests

298 tests across 21 spec files (up from 272)

v1.4.0

Features

Carry forward failures: FP and FN test cases from each iteration are automatically carried into the next iteration's test suite. Failed tests are re-scanned to verify whether topic refinement fixed them.
Regression tier: TP and TN (correct) test cases from the previous iteration are re-scanned as regression tests. If a previously-correct test now fails after topic refinement, it's counted as a regression.
Weighted category generation: Per-category error rates from the previous iteration are passed to the LLM test generator, which produces proportionally more tests for high-error categories.
tests:composed event: New loop event reports test composition breakdown (generated, carried failures, regression tier, total) on iterations 2+.
regressionCount metric: EfficacyMetrics now includes the count of regression-tier tests that failed.
CategoryBreakdown type: New exported type for per-category FP/FN/error-rate breakdown.
computeCategoryBreakdown() helper: New exported function to compute per-category error rates from test results.
Test source tagging: TestCase.source field tracks how each test entered the suite ('generated', 'carried-fp', 'carried-fn', 'regression').
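
One way to picture the test composition described above is the sketch below. Only the source field and its four values come from these notes; the TestCase and TestResult shapes, composeNextSuite, and countRegressions are hypothetical and simplified.

```typescript
type TestSource = "generated" | "carried-fp" | "carried-fn" | "regression";
type Outcome = "TP" | "TN" | "FP" | "FN";

// Hypothetical, simplified shapes for illustration only.
interface TestCase {
  prompt: string;
  expectedTriggered: boolean;
  source: TestSource;
}

interface TestResult {
  test: TestCase;
  outcome: Outcome;
}

// Build the next iteration's suite: fresh tests, then carried failures,
// then previously correct tests re-run as the regression tier.
function composeNextSuite(previous: TestResult[], generated: TestCase[]): TestCase[] {
  const carried: TestCase[] = [];
  const regression: TestCase[] = [];
  for (const { test, outcome } of previous) {
    if (outcome === "FP") carried.push({ ...test, source: "carried-fp" });
    else if (outcome === "FN") carried.push({ ...test, source: "carried-fn" });
    else regression.push({ ...test, source: "regression" });
  }
  return [...generated, ...carried, ...regression];
}

// A regression is a regression-tier test that now fails.
function countRegressions(results: TestResult[]): number {
  return results.filter(
    (r) => r.test.source === "regression" && (r.outcome === "FP" || r.outcome === "FN"),
  ).length;
}

const previousRun: TestResult[] = [
  { test: { prompt: "p1", expectedTriggered: true, source: "generated" }, outcome: "FN" },
  { test: { prompt: "p2", expectedTriggered: false, source: "generated" }, outcome: "TN" },
];
const freshTests: TestCase[] = [{ prompt: "p3", expectedTriggered: true, source: "generated" }];
console.log(composeNextSuite(previousRun, freshTests).map((t) => `${t.prompt}:${t.source}`));
```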

Tests

272 tests across 19 spec files (up from 265)

v1.3.1

Documentation

End-to-end example workflow: New Examples section with a complete guardrail-to-red-team walkthrough — generate a topic guardrail, export prompts as a custom prompt set, launch a CUSTOM red team scan, monitor status, and review per-prompt results. All output captured from a real AIRS run.

v1.3.0

Bug Fixes

Fix custom scan payload: AIRS API expects custom_prompt_sets as an array of UUID strings, not objects. createScan() was wrapping each UUID in { uuid }, causing 422 validation errors on all CUSTOM scan requests.
Fix ASR display: AIRS API returns ASR/score/threatRate as percentages (0-100), not ratios (0-1). Renderer was multiplying by 100, showing e.g. 1250% instead of 12.5%.
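
Both fixes can be illustrated with a minimal sketch. The helper names buildCustomScanFields and formatAsr are hypothetical; only the custom_prompt_sets key and the 0-100 scale come from the notes above, and the real createScan() payload carries additional fields.

```typescript
// Hypothetical helper: shows only the custom_prompt_sets portion of the payload.
function buildCustomScanFields(promptSetUuids: string[]): { custom_prompt_sets: string[] } {
  // Correct: a flat array of UUID strings.
  // The earlier bug was equivalent to promptSetUuids.map((uuid) => ({ uuid })).
  return { custom_prompt_sets: [...promptSetUuids] };
}

// Hypothetical formatter: the API value is already a percentage (0-100),
// so it is printed as-is rather than multiplied by 100.
function formatAsr(asrPercent: number): string {
  return `${asrPercent.toFixed(1)}%`;
}

console.log(JSON.stringify(buildCustomScanFields(["uuid-a", "uuid-b"])));
console.log(formatAsr(12.5)); // "12.5%"
```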

Features

Custom attack list in reports: daystrom redteam report <jobId> --attacks now shows prompt-level results for CUSTOM scans — prompt text, goal, threat status, and per-prompt ASR.

Tests

258 tests across 19 spec files (up from 255)

Documentation

Updated red team CLI examples with real-world usage patterns
Added tip for finding prompt set UUIDs

v1.2.0

Features

daystrom redteam command group: Full AI Red Team scan operations — launch static/dynamic/custom scans, poll for completion, view reports with severity breakdowns and attack details, list targets and categories, abort running scans.
SdkRedTeamService: New service wrapping RedTeamClient for programmatic red team operations. Normalizes all SDK responses into clean TypeScript interfaces.
7 subcommands: scan, status, report, list, targets, categories, abort.

Tests

255 tests across 19 spec files (up from 230), 100% coverage on new code.

v1.1.2

Bug Fixes

Fix profile topic-list payload: AIRS rejects empty topic-list entries. The assignTopicToProfile method was sending two entries (one for the action, one empty for the opposite), causing a 400 error on profile update. It now sends a single entry containing only the active topic. Also removed the unnecessary revision field from topic entries.

v1.1.1

Bug Fixes

Fix allow-intent detection (P0): The v1.1.0 action === 'allow' heuristic was wrong — AIRS returns action: 'allow' for all prompts on allow topics. Detection now uses the category field ('benign' = topic matched, 'malicious' = no match), with fallback to triggered when category is absent.
Fix profile guardrail-level action: topic-guardrails entry in security profiles now always uses action: 'block' to enforce violations. Previously defaulted to 'allow', causing all topic guardrails to be unenforced.
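
The corrected allow-intent detection rule can be sketched as follows. The ScanResultLike shape and the allowTopicMatched name are hypothetical; the treatment of unrecognized category values (falling back to triggered) is also an assumption, since the notes only describe the case where category is absent.

```typescript
// Hypothetical, reduced view of a scan result for an allow-intent topic.
interface ScanResultLike {
  triggered: boolean;
  category?: string;
  action?: string; // always "allow" on allow topics, so it carries no signal
}

function allowTopicMatched(result: ScanResultLike): boolean {
  if (result.category === "benign") return true; // topic matched
  if (result.category === "malicious") return false; // no match
  // Category absent (or, by assumption, unrecognized): fall back to `triggered`.
  return result.triggered;
}

console.log(allowTopicMatched({ triggered: false, category: "benign", action: "allow" }));
console.log(allowTopicMatched({ triggered: false, category: "malicious", action: "allow" }));
console.log(allowTopicMatched({ triggered: true }));
```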

Features

--debug-scans flag: Dumps raw AIRS scan responses to a JSONL file (~/.daystrom/debug-scans-*.jsonl) for offline inspection. Available on both generate and resume commands.
Scanner extracts category: The category field from AIRS responses is now included in ScanResult.
--create-prompt-set flag: Auto-creates a custom prompt set in AI Runtime Security from the best iteration's test cases. Prompts include goals indicating expected guardrail behavior. Available on both generate and resume commands.
SdkPromptSetService: New service wrapping RedTeamClient.customAttacks for prompt set CRUD.
promptset:created event: New loop event emitted after prompt set creation with set ID, name, and prompt count.

Tests

230 tests across 18 spec files (up from 209)

v1.1.0

Bug Fixes

Fix allow-intent detection (P0): AIRS never sets triggered: true for allow-intent topics — the loop now derives detection from the action field (action === 'allow' = topic matched). This fixes 0% TPR on all allow guardrails.

Features

Intent-aware refinement: analyzeResults() and improveTopic() now receive the guardrail intent (block/allow), enabling the LLM to prioritize the correct error type during refinement: FN reduction for block guardrails, FP reduction for allow guardrails.
Intent-specific test generation: Test prompts now use different strategies and category taxonomies for block vs. allow guardrails, with asymmetric ratios (~15 positive / ~25 negative for allow).
Variable example count (2-5): The LLM now varies the example count between iterations to find the optimal configuration. The memory system tracks how example count correlates with efficacy.
Test accumulation: New --accumulate-tests flag carries test prompts forward across iterations, with case-insensitive deduplication, for regression detection.
Max accumulated tests cap: --max-accumulated-tests <n> limits growth of the accumulated test pool.
tests:accumulated event: New loop event reports new/total/dropped test counts when accumulation is active.
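
A minimal sketch of accumulation with case-insensitive deduplication and a cap is shown below. The accumulateTests function and its result shape are hypothetical, and dropping the oldest prompts when the cap is exceeded is an assumption; the notes do not say which tests are dropped.

```typescript
// Hypothetical result shape mirroring the new/total/dropped counts.
interface AccumulationResult {
  tests: string[];
  added: number;
  total: number;
  dropped: number;
}

function accumulateTests(
  existing: string[],
  incoming: string[],
  maxAccumulated?: number,
): AccumulationResult {
  // Compare prompts case-insensitively so "Hello" and "hello" count once.
  const seen = new Set(existing.map((p) => p.toLowerCase()));
  const merged = [...existing];
  let added = 0;
  for (const prompt of incoming) {
    const key = prompt.toLowerCase();
    if (seen.has(key)) continue;
    seen.add(key);
    merged.push(prompt);
    added++;
  }
  // Assumption: when the pool exceeds the cap, the oldest prompts are dropped.
  const dropped =
    maxAccumulated === undefined ? 0 : Math.max(0, merged.length - maxAccumulated);
  const tests = merged.slice(dropped);
  return { tests, added, total: tests.length, dropped };
}

const pool = accumulateTests(
  ["How do I reset my password?"],
  ["how do i reset my password?", "What is my balance?", "Close my account"],
  2,
);
console.log(pool);
```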

Tests

209 tests across 17 spec files (up from 192)

v1.0.8

Documentation

Sync remaining docs pages with v1.0.7 changes: fix event table in design-decisions.md, add format:check to contributing.md and local-setup.md

v1.0.7

Dependencies

Bump @cdot65/prisma-airs-sdk from ^0.2.0 to ^0.4.0 -- adds Model Security, Red Team domains, typed enums, JSDoc, shared retry logic (backward compatible)

CI

Add explicit format:check step to CI workflow to catch formatting violations in PRs

Documentation

Fix inaccurate LoopEvent documentation in CLAUDE.md and docs/architecture/core-loop.md
Remove unused env vars (CLOUD_ML_REGION, ANTHROPIC_VERTEX_PROJECT_ID, PANW_AI_SEC_API_TOKEN, PANW_AI_SEC_PROFILE_NAME) from .env.example and reference docs
Add JSDoc/TSDoc to all 36+ exported symbols across the public API

Code Quality

Extract shared byteLen() utility from constraints.ts, remove duplicate in service.ts
Remove deprecated MAX_EXAMPLES_COUNT constant
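
For readers unfamiliar with the utility, a byteLen() helper plausibly measures the UTF-8 encoded length of a string, which differs from String.length for non-ASCII text. The body below is an assumption; the notes name the function but do not show its implementation.

```typescript
// Assumed implementation: UTF-8 byte length via the standard TextEncoder API.
function byteLen(text: string): number {
  return new TextEncoder().encode(text).length;
}

console.log(byteLen("abc")); // 3 bytes, 3 UTF-16 code units
console.log(byteLen("\u00e9")); // 2 bytes, 1 UTF-16 code unit (e with acute accent)
console.log(byteLen("\u20ac")); // 3 bytes, 1 UTF-16 code unit (euro sign)
```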

v1.0.0

First stable release of Daystrom.

Highlights

Core iterative refinement loop with async generator architecture
6 LLM providers: Claude (API, Vertex, Bedrock) and Gemini (API, Vertex, Bedrock)
Cross-run learning memory with keyword categorization and budget-aware prompt injection
AIRS integration -- topic CRUD via Management API, batch scanning via Scan API
4 CLI commands: generate, resume, report, list
Automatic topic constraint clamping for AIRS limits
Comprehensive metrics: TPR, TNR, coverage, accuracy, F1
Resumable runs with full state persistence
192 tests across 17 spec files
Full documentation site at cdot65.github.io/daystrom
Docker support with multi-arch images (amd64 + arm64)

v0.1.0 -- Initial Release

The first public release of Daystrom: an automated CLI for generating, testing, and iteratively refining Palo Alto Prisma AIRS custom topic guardrails.

Highlights

Core iterative refinement loop with async generator architecture (runLoop() yields typed LoopEvent discriminated unions)
6 LLM providers supported: Claude (API, Vertex, Bedrock) and Gemini (API, Vertex, Bedrock)
Cross-run learning memory with keyword categorization and budget-aware prompt injection
AIRS integration -- topic CRUD via Management API, batch scanning via Scan API
4 CLI commands: generate, resume, report, list
Automatic topic constraint clamping for AIRS limits (100 char name, 250 char description, 250 char/example, 5 examples max, 1000 char combined)
Comprehensive metrics: TPR, TNR, coverage, accuracy, F1
Resumable runs with full state persistence to ~/.daystrom/runs/
165+ tests with ~98% statement coverage

Architecture Decisions

Decision	Rationale
AsyncGenerator loop	Enables streaming events to CLI renderer, pause/resume, and decoupled orchestration
Structured LLM output (Zod)	Guarantees type-safe topic definitions with automatic retry on parse failure
MSW for test mocking	Fully offline test suite, no AIRS credentials needed
File-based memory	Simple persistence, no database dependency, human-readable JSON
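
The AsyncGenerator decision can be illustrated with a toy loop. The event variants and the runLoop signature below are hypothetical stand-ins; the real LoopEvent union and runLoop() take different inputs and emit more event types. Accuracy values are scripted so the example stays deterministic.

```typescript
// Hypothetical event variants; the `type` field is the discriminant.
type LoopEvent =
  | { type: "iteration:start"; iteration: number }
  | { type: "iteration:complete"; iteration: number; accuracy: number }
  | { type: "loop:complete"; bestIteration: number };

// Toy loop: scripted accuracies stand in for generate/scan/evaluate steps.
async function* runLoop(
  scriptedAccuracy: number[],
  targetAccuracy: number,
): AsyncGenerator<LoopEvent> {
  let bestIteration = 0;
  let bestAccuracy = -1;
  for (let i = 0; i < scriptedAccuracy.length; i++) {
    const iteration = i + 1;
    yield { type: "iteration:start", iteration };
    const accuracy = scriptedAccuracy[i];
    if (accuracy > bestAccuracy) {
      bestAccuracy = accuracy;
      bestIteration = iteration;
    }
    yield { type: "iteration:complete", iteration, accuracy };
    if (accuracy >= targetAccuracy) break; // stop early once the target is met
  }
  yield { type: "loop:complete", bestIteration };
}

// A consumer (such as a CLI renderer) narrows on `type` to handle each event.
async function collectLines(): Promise<string[]> {
  const lines: string[] = [];
  for await (const event of runLoop([0.7, 0.85, 0.95, 0.99], 0.9)) {
    switch (event.type) {
      case "iteration:start":
        lines.push(`iteration ${event.iteration} started`);
        break;
      case "iteration:complete":
        lines.push(`iteration ${event.iteration} accuracy ${event.accuracy}`);
        break;
      case "loop:complete":
        lines.push(`best iteration: ${event.bestIteration}`);
        break;
    }
  }
  return lines;
}

collectLines().then((lines) => console.log(lines.join("\n")));
```

Because the loop only yields events, the consumer decides how to render or persist them, which is what keeps orchestration decoupled from the CLI.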