Release Notes¶
v1.7.2¶
Features¶
daystrom model-securitycommand group: Full AI Model Security operations — security groups CRUD, rule browsing, rule instance configuration, scan operations (create/list/get), evaluations, violations, files, label management, and PyPI authentication.SdkModelSecurityService: New service wrappingModelSecurityClientwith camelCase normalization for all 23 SDK methods.- 5 subcommand groups:
groups(list/get/create/update/delete),rules(list/get),rule-instances(list/get/update),scans(list/get/create/evaluations/violations/files),labels(add/set/delete/keys/values), pluspypi-auth. - SDK upgrade:
@cdot65/prisma-airs-sdkv0.6.0 → v0.6.1 — fixed list filter options for groups, rules, and rule instances.
Tests¶
- 390 tests across 25 spec files (up from 360)
v1.7.0¶
Features¶
- Red team target CRUD: Full target lifecycle management —
create,get,update,deletevia CLI (daystrom redteam targets <subcommand>) and library API (SdkRedTeamService). - Target connection validation:
--validateflag ontargets createandtargets updatevalidates connectivity before saving (SDK v0.6.0TargetOperationOptions). - Target probe:
daystrom redteam targets probe --config conn.jsontests a target connection without persisting. - Target profile management:
targets profile <uuid>andtargets update-profile <uuid>for target profiling configuration. - Prompt set full CRUD:
get,update,archive/unarchive,version-info, CSV template download viadaystrom redteam prompt-sets <subcommand>. - CSV prompt upload:
daystrom redteam prompt-sets upload <uuid> file.csvfor bulk prompt ingestion (SDK v0.6.0uploadPromptsCsv()). - Individual prompt CRUD:
list,get,add,update,deleteprompts within sets viadaystrom redteam prompts <subcommand>. - Property management:
daystrom redteam properties {list,create,values,add-value}for custom attack property names and values. - SDK upgrade:
@cdot65/prisma-airs-sdkv0.4.0 → v0.6.0 — fully typed target schemas (connection params, background, metadata, additional context), no breaking changes.
CLI Changes¶
Existing flat commands refactored to subcommand groups:
| Before (v1.6.0) | After (v1.7.0) |
|---|---|
daystrom redteam targets |
daystrom redteam targets list |
daystrom redteam prompt-sets |
daystrom redteam prompt-sets list |
| — | daystrom redteam targets {get,create,update,delete,probe,profile,update-profile} |
| — | daystrom redteam prompt-sets {get,create,update,archive,download,upload} |
| — | daystrom redteam prompts {list,get,add,update,delete} |
| — | daystrom redteam properties {list,create,values,add-value} |
Tests¶
- 360 tests across 24 spec files (up from 333)
- 100% coverage on
redteam.tsandpromptsets.ts
v1.6.0¶
Features¶
daystrom audit <profileName>: New command evaluates all topics in an AIRS security profile. Generates tests per topic, scans them, computes per-topic and composite metrics (TPR, TNR, coverage, accuracy, F1), and detects cross-topic conflicts.- Per-topic metrics: Each topic gets its own efficacy breakdown, enabling identification of weak guardrails within a profile.
- Conflict detection: Identifies cross-topic interference — prompts that are false negatives for one topic and false positives for another.
- Audit reports:
--format jsonand--format htmlexport audit results with per-topic metrics tables, conflict sections, and composite scores. getProfileTopics(): NewManagementServicemethod extracts enriched topic entries from profile policy structure.TestCase.targetTopic: New optional field for audit topic attribution (backward-compatible with existing loop).
Tests¶
- 333 tests across 24 spec files (up from 298)
v1.5.0¶
Features¶
- Structured evaluation reports:
daystrom reportnow supports--format jsonand--format htmlfor machine-readable and shareable report export. - Per-test-case details:
--testsflag includes individual test results (prompt, expected/actual outcome, pass/fail, category, source) in all output formats. - Run comparison:
--diff <runId>compares two runs side-by-side with metric deltas (coverage, TPR, TNR, accuracy, F1). - Self-contained HTML reports: HTML output includes embedded CSS with run summary, iteration trends, metrics tables, test result tables, and diff sections. No external dependencies.
- JSON export: Clean structured JSON to stdout for CI/CD pipelines and programmatic consumption.
- New report module:
buildReportJson()andbuildReportHtml()exported as library functions for custom integrations.
Tests¶
- 298 tests across 21 spec files (up from 272)
v1.4.0¶
Features¶
- Carry forward failures: FP and FN test cases from each iteration are automatically carried into the next iteration's test suite. Failed tests are re-scanned to verify whether topic refinement fixed them.
- Regression tier: TP and TN (correct) test cases from the previous iteration are re-scanned as regression tests. If a previously-correct test now fails after topic refinement, it's counted as a regression.
- Weighted category generation: Per-category error rates from the previous iteration are passed to the LLM test generator, which produces proportionally more tests for high-error categories.
tests:composedevent: New loop event reports test composition breakdown (generated, carried failures, regression tier, total) on iterations 2+.regressionCountmetric:EfficacyMetricsnow includes the count of regression-tier tests that failed.CategoryBreakdowntype: New exported type for per-category FP/FN/error-rate breakdown.computeCategoryBreakdown()helper: New exported function to compute per-category error rates from test results.- Test source tagging:
TestCase.sourcefield tracks how each test entered the suite ('generated','carried-fp','carried-fn','regression').
Tests¶
- 272 tests across 19 spec files (up from 265)
v1.3.1¶
Documentation¶
- End-to-end example workflow: New Examples section with a complete guardrail-to-red-team walkthrough — generate a topic guardrail, export prompts as a custom prompt set, launch a CUSTOM red team scan, monitor status, and review per-prompt results. All output captured from a real AIRS run.
v1.3.0¶
Bug Fixes¶
- Fix custom scan payload: AIRS API expects
custom_prompt_setsas an array of UUID strings, not objects.createScan()was wrapping each UUID in{ uuid }, causing 422 validation errors on all CUSTOM scan requests. - Fix ASR display: AIRS API returns ASR/score/threatRate as percentages (0-100), not ratios (0-1). Renderer was multiplying by 100, showing e.g. 1250% instead of 12.5%.
Features¶
- Custom attack list in reports:
daystrom redteam report <jobId> --attacksnow shows prompt-level results for CUSTOM scans — prompt text, goal, threat status, and per-prompt ASR.
Tests¶
- 258 tests across 19 spec files (up from 255)
Documentation¶
- Updated red team CLI examples with real-world usage patterns
- Added tip for finding prompt set UUIDs
v1.2.0¶
Features¶
daystrom redteamcommand group: Full AI Red Team scan operations — launch static/dynamic/custom scans, poll for completion, view reports with severity breakdowns and attack details, list targets and categories, abort running scans.SdkRedTeamService: New service wrappingRedTeamClientfor programmatic red team operations. Normalizes all SDK responses into clean TypeScript interfaces.- 7 subcommands:
scan,status,report,list,targets,categories,abort.
Tests¶
- 255 tests across 19 spec files (up from 230), 100% coverage on new code.
v1.1.2¶
Bug Fixes¶
- Fix profile topic-list payload: AIRS rejects empty
topic-listentries. TheassignTopicToProfilemethod was sending two entries (one for the action, one empty for the opposite), causing a 400 error on profile update. Now sends a single entry containing only the active topic. Also removed unnecessaryrevisionfield from topic entries.
v1.1.1¶
Bug Fixes¶
- Fix allow-intent detection (P0): The v1.1.0
action === 'allow'heuristic was wrong — AIRS returnsaction: 'allow'for all prompts on allow topics. Detection now uses thecategoryfield ('benign'= topic matched,'malicious'= no match), with fallback totriggeredwhencategoryis absent. - Fix profile guardrail-level action:
topic-guardrailsentry in security profiles now always usesaction: 'block'to enforce violations. Previously defaulted to'allow', causing all topic guardrails to be unenforced.
Features¶
--debug-scansflag: Dumps raw AIRS scan responses to a JSONL file (~/.daystrom/debug-scans-*.jsonl) for offline inspection. Available on bothgenerateandresumecommands.- Scanner extracts
category: Thecategoryfield from AIRS responses is now included inScanResult. --create-prompt-setflag: Auto-creates a custom prompt set in AI Runtime Security from the best iteration's test cases. Prompts include goals indicating expected guardrail behavior. Available on bothgenerateandresumecommands.SdkPromptSetService: New service wrappingRedTeamClient.customAttacksfor prompt set CRUD.promptset:createdevent: New loop event emitted after prompt set creation with set ID, name, and prompt count.
Tests¶
- 230 tests across 18 spec files (up from 209)
v1.1.0¶
Bug Fixes¶
- Fix allow-intent detection (P0): AIRS never sets
triggered: truefor allow-intent topics — the loop now derives detection from theactionfield (action === 'allow'= topic matched). This fixes 0% TPR on all allow guardrails.
Features¶
- Intent-aware refinement:
analyzeResults()andimproveTopic()now receive the guardrail intent (block/allow), enabling the LLM to prioritize the correct error type during refinement — FN reduction for block guardrails, FP reduction for allow guardrails - Intent-specific test generation: test prompts now use different strategies and category taxonomies for block vs allow guardrails, with asymmetric ratios (~15 positive / ~25 negative for allow)
- Variable example count (2-5): LLM now varies example count between iterations to find optimal configuration. Memory system tracks example count correlation with efficacy.
- Test accumulation: new
--accumulate-testsflag carries test prompts forward across iterations with case-insensitive deduplication for regression detection - Max accumulated tests cap:
--max-accumulated-tests <n>limits growth of accumulated test pool tests:accumulatedevent: new loop event reports new/total/dropped test counts when accumulation is active
Tests¶
- 209 tests across 17 spec files (up from 192)
v1.0.8¶
Documentation¶
- Sync remaining docs pages with v1.0.7 changes: fix event table in design-decisions.md, add
format:checkto contributing.md and local-setup.md
v1.0.7¶
Dependencies¶
- Bump
@cdot65/prisma-airs-sdkfrom^0.2.0to^0.4.0-- adds Model Security, Red Team domains, typed enums, JSDoc, shared retry logic (backward compatible)
CI¶
- Add explicit
format:checkstep to CI workflow to catch formatting violations in PRs
Documentation¶
- Fix inaccurate
LoopEventdocumentation inCLAUDE.mdanddocs/architecture/core-loop.md - Remove unused env vars (
CLOUD_ML_REGION,ANTHROPIC_VERTEX_PROJECT_ID,PANW_AI_SEC_API_TOKEN,PANW_AI_SEC_PROFILE_NAME) from.env.exampleand reference docs - Add JSDoc/TSDoc to all 36+ exported symbols across the public API
Code Quality¶
- Extract shared
byteLen()utility fromconstraints.ts, remove duplicate inservice.ts - Remove deprecated
MAX_EXAMPLES_COUNTconstant
v1.0.0¶
First stable release of Daystrom.
Highlights¶
- Core iterative refinement loop with async generator architecture
- 6 LLM providers: Claude (API, Vertex, Bedrock) and Gemini (API, Vertex, Bedrock)
- Cross-run learning memory with keyword categorization and budget-aware prompt injection
- AIRS integration -- topic CRUD via Management API, batch scanning via Scan API
- 4 CLI commands:
generate,resume,report,list - Automatic topic constraint clamping for AIRS limits
- Comprehensive metrics: TPR, TNR, coverage, accuracy, F1
- Resumable runs with full state persistence
- 192 tests across 17 spec files
- Full documentation site at cdot65.github.io/daystrom
- Docker support with multi-arch images (amd64 + arm64)
v0.1.0 -- Initial Release¶
The first public release of Daystrom: an automated CLI for generating, testing, and iteratively refining Palo Alto Prisma AIRS custom topic guardrails.
Highlights¶
- Core iterative refinement loop with async generator architecture (
runLoop()yields typedLoopEventdiscriminated unions) - 6 LLM providers supported: Claude (API, Vertex, Bedrock) and Gemini (API, Vertex, Bedrock)
- Cross-run learning memory with keyword categorization and budget-aware prompt injection
- AIRS integration -- topic CRUD via Management API, batch scanning via Scan API
- 4 CLI commands:
generate,resume,report,list - Automatic topic constraint clamping for AIRS limits (100 char name, 250 char description, 250 char/example, 5 examples max, 1000 char combined)
- Comprehensive metrics: TPR, TNR, coverage, accuracy, F1
- Resumable runs with full state persistence to
~/.daystrom/runs/ - 165+ tests with ~98% statement coverage
Architecture Decisions¶
| Decision | Rationale |
|---|---|
| AsyncGenerator loop | Enables streaming events to CLI renderer, pause/resume, and decoupled orchestration |
| Structured LLM output (Zod) | Guarantees type-safe topic definitions with automatic retry on parse failure |
| MSW for test mocking | Fully offline test suite, no AIRS credentials needed |
| File-based memory | Simple persistence, no database dependency, human-readable JSON |