Core Loop
The heart of Daystrom. The core loop (src/core/loop.ts) is an async generator that yields typed events as it works. The CLI renders those events, but the loop itself has no knowledge of how its output is displayed, so it can be tested in isolation and reused by consumers other than the CLI.
What Happens Each Iteration
Each iteration follows a fixed sequence:
```mermaid
flowchart TD
    Start([Iteration Start]) --> GenOrImprove{Iteration 1?}
    GenOrImprove -->|Yes| Generate[Generate Topic via LLM]
    GenOrImprove -->|No| Improve[Improve Topic via LLM]
    Generate --> Clamp[clampTopic - enforce AIRS limits]
    Improve --> Clamp
    Clamp --> Deploy[Deploy via Management API]
    Deploy --> Wait[Wait for Propagation - 10s default]
    Wait --> TestGen[Generate Test Cases via LLM]
    TestGen --> Scan[Batch Scan via Scanner API]
    Scan --> Metrics[Compute Metrics - TPR, TNR, Coverage, F1]
    Metrics --> Analyze[Analyze FP/FN Patterns via LLM]
    Analyze --> Check{Coverage >= Target?}
    Check -->|Yes| Complete([Loop Complete])
    Check -->|No| MaxCheck{Max Iterations?}
    MaxCheck -->|Yes| Complete
    MaxCheck -->|No| Start
```
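The sequence above maps naturally onto an async generator. The sketch below is a simplified, hypothetical model rather than the code in src/core/loop.ts: the LoopSteps interface, its method names, and the SketchEvent payloads are invented for illustration. Test generation, scanning, metrics, and analysis are collapsed into a single runTestsAndScore step that returns coverage, and only a few of the documented events are yielded.

```typescript
// Hypothetical, simplified model of the iteration sequence; not the real src/core/loop.ts.
interface Topic {
  name: string;
  description: string;
  examples: string[];
}

// Every external step is injected so the control flow runs without an LLM or AIRS.
interface LoopSteps {
  generateTopic(): Promise<Topic>;
  improveTopic(previous: Topic, coverage: number): Promise<Topic>;
  clampTopic(topic: Topic): Topic;
  deploy(topic: Topic): Promise<string>; // resolves to a topic ID
  waitForPropagation(): Promise<void>;
  runTestsAndScore(topic: Topic): Promise<number>; // test gen + scan + metrics -> coverage
}

type SketchEvent =
  | { type: "iteration:start"; iteration: number }
  | { type: "apply:complete"; topicId: string }
  | { type: "evaluate:complete"; coverage: number }
  | { type: "loop:complete"; bestIteration: number; bestCoverage: number };

async function* runLoopSketch(
  steps: LoopSteps,
  targetCoverage = 0.9,
  maxIterations = 20,
): AsyncGenerator<SketchEvent> {
  let topic: Topic | undefined;
  let coverage = 0;
  let best = { iteration: 0, coverage: -1 };

  for (let iteration = 1; iteration <= maxIterations; iteration++) {
    yield { type: "iteration:start", iteration };
    // Iteration 1 generates a topic; later iterations improve the previous one.
    const draft = topic === undefined
      ? await steps.generateTopic()
      : await steps.improveTopic(topic, coverage);
    topic = steps.clampTopic(draft);
    const topicId = await steps.deploy(topic);
    yield { type: "apply:complete", topicId };
    await steps.waitForPropagation();
    coverage = await steps.runTestsAndScore(topic);
    yield { type: "evaluate:complete", coverage };
    if (coverage > best.coverage) best = { iteration, coverage };
    if (coverage >= targetCoverage) break; // coverage target reached
  }
  // Terminal event: the generator returns immediately after yielding it.
  yield { type: "loop:complete", bestIteration: best.iteration, bestCoverage: best.coverage };
}
```

Because every external call is injected, a test can script coverage values and assert on the yielded event sequence without an LLM, AIRS, or the CLI, which is what keeps a generator-based loop testable on its own.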
Events
The generator yields events at each stage. Consumers (like the CLI renderer) iterate the generator and switch on the event type.
Yielded by runLoop()
| Event | Payload | When |
|---|---|---|
| iteration:start | Iteration number | Start of each iteration |
| generate:complete | CustomTopic | After the LLM generates or improves the topic |
| apply:complete | Topic ID | After the topic is deployed to AIRS (yielded, but intentionally not handled by the CLI) |
| tests:composed | Generated, carried-failure, regression-tier, and total counts | After the test suite is composed from generated tests, carried FP/FN cases, and the regression tier (iteration 2+) |
| tests:accumulated | New, total, and dropped counts | After accumulation merges new and prior tests (only when accumulateTests is enabled, iteration 2+) |
| test:progress | Completed count, total | As each test scan completes |
| evaluate:complete | EfficacyMetrics | After metrics are computed |
| analyze:complete | AnalysisReport | After FP/FN analysis |
| iteration:complete | IterationResult | Full iteration summary |
| memory:extracted | Learning count | After learnings are extracted post-loop (only if memory is enabled) |
| loop:complete | Best iteration, run state | Terminal: target reached or max iterations hit |
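A consumer such as the CLI renderer iterates the generator and switches on the event type. The sketch below models a few events from the table as a TypeScript discriminated union. The payload field names (iteration, topicId, completed, total, coverage, bestIteration) and the renderEvent/renderAll helpers are assumptions; Daystrom's real type union may name them differently. apply:complete deliberately has no case, mirroring the CLI.

```typescript
// Hypothetical subset of the event union; payload field names are assumptions.
type LoopEvent =
  | { type: "iteration:start"; iteration: number }
  | { type: "apply:complete"; topicId: string }
  | { type: "test:progress"; completed: number; total: number }
  | { type: "evaluate:complete"; coverage: number }
  | { type: "loop:complete"; bestIteration: number };

// Turns one event into a display line; returns null for events the renderer ignores.
function renderEvent(event: LoopEvent): string | null {
  switch (event.type) {
    case "iteration:start":
      return `Iteration ${event.iteration}`;
    case "test:progress":
      return `Scanned ${event.completed}/${event.total}`;
    case "evaluate:complete":
      return `Coverage ${(event.coverage * 100).toFixed(1)}%`;
    case "loop:complete":
      return `Done (best iteration ${event.bestIteration})`;
    default:
      // apply:complete is intentionally not rendered.
      return null;
  }
}

// Drains any async iterable of events into rendered lines.
async function renderAll(events: AsyncIterable<LoopEvent>): Promise<string[]> {
  const lines: string[] = [];
  for await (const event of events) {
    const line = renderEvent(event);
    if (line !== null) lines.push(line);
  }
  return lines;
}
```

Keeping rendering in a pure function like renderEvent means the display logic can be tested without running the loop at all.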
Defined but not yielded by runLoop()
| Event | Payload | Status |
|---|---|---|
| loop:paused | Current state | Reserved for future use; not currently yielded |
| memory:loaded | Learning count | Emitted by the CLI command (generate.ts) before the loop starts, not by the generator itself |
Terminal Events
loop:complete is the terminal event. After it is yielded, the generator returns and no further events are produced. loop:paused is defined in the type union for future use but is not currently yielded.
Topic Name Locking
The topic name is generated once during iteration 1 and locked for all subsequent iterations. Only the description and examples are refined in later iterations.
This prevents two problems:
- Identity thrashing -- changing the topic name on each iteration would create new AIRS entities instead of updating the existing one.
- Entity inconsistency -- downstream profile references depend on a stable topic identity.
Name Immutability
The loop enforces name locking internally. Even if the LLM returns a different name in its improvement output, the original name from iteration 1 is preserved.
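A minimal sketch of the locking rule, assuming topics are plain objects with the CustomTopic(name, description, examples) fields. lockTopicName is a hypothetical helper; the real loop may enforce the rule differently.

```typescript
// Hypothetical shape mirroring CustomTopic(name, description, examples).
interface CustomTopic {
  name: string;
  description: string;
  examples: string[];
}

// Keeps the refined description and examples but always restores the iteration-1 name.
// Returns a new object; the LLM output passed in is not mutated.
function lockTopicName(lockedName: string, improved: CustomTopic): CustomTopic {
  return { ...improved, name: lockedName };
}
```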
Stop Conditions
The loop terminates when either condition is met:
| Condition | Default | Description |
|---|---|---|
| Coverage target reached | 0.9 (90%) | coverage = min(TPR, TNR) meets or exceeds targetCoverage |
| Max iterations reached | 20 | Hard upper bound on refinement cycles |
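The two stop conditions combine into a single check. A minimal sketch, assuming coverage has already been computed for the iteration and that iterations are numbered from 1; shouldStop, StopOptions, and StopReason are hypothetical names, with defaults taken from the table.

```typescript
// Hypothetical options; defaults mirror the documented targetCoverage and max-iteration values.
interface StopOptions {
  targetCoverage?: number;
  maxIterations?: number;
}

type StopReason = "target-reached" | "max-iterations" | null;

// Returns why the loop should stop after `iteration` (1-based), or null to keep going.
// The coverage check runs first, so a final iteration that hits the target reports success.
function shouldStop(coverage: number, iteration: number, opts: StopOptions = {}): StopReason {
  const { targetCoverage = 0.9, maxIterations = 20 } = opts;
  if (coverage >= targetCoverage) return "target-reached";
  if (iteration >= maxIterations) return "max-iterations";
  return null;
}
```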
Coverage Definition
Coverage is defined as min(TPR, TNR), not a simple average. This ensures both true-positive and true-negative performance must reach the target -- the system cannot pass by excelling at one while failing the other.
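A worked example shows why the minimum matters. The sketch computes TPR, TNR, coverage, and F1 from confusion-matrix counts using the standard definitions (TPR = TP/(TP+FN), TNR = TN/(TN+FP), F1 = 2TP/(2TP+FP+FN)). The Counts interface and computeMetrics function are hypothetical and may not match the fields of EfficacyMetrics; returning 0 for an empty denominator is also an assumption.

```typescript
// Hypothetical confusion-matrix counts for one iteration's scan results.
interface Counts {
  tp: number;
  tn: number;
  fp: number;
  fn: number;
}

// Assumption: an empty denominator (e.g. no negative tests) yields 0 rather than NaN.
const ratio = (num: number, den: number): number => (den === 0 ? 0 : num / den);

function computeMetrics({ tp, tn, fp, fn }: Counts) {
  const tpr = ratio(tp, tp + fn); // share of on-topic prompts the topic matched
  const tnr = ratio(tn, tn + fp); // share of off-topic prompts correctly left unmatched
  const f1 = ratio(2 * tp, 2 * tp + fp + fn);
  const coverage = Math.min(tpr, tnr); // both rates must clear the target
  return { tpr, tnr, f1, coverage };
}
```

With 49 of 50 positives caught but 15 of 50 negatives wrongly matched, TPR is 0.98 and TNR is 0.70. A simple average (0.84) would look respectable, but coverage is 0.70, well below the 0.9 default target.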
Four LLM Calls Per Iteration
The loop uses up to four LLM calls, all via withStructuredOutput(ZodSchema). The first three run on every iteration; the fourth runs once after the loop finishes, and only when memory is enabled:
- Generate / Improve Topic -- produces a CustomTopic(name, description, examples); iteration 1 generates, later iterations improve
- Generate Test Cases -- produces positive and negative test prompts
- Analyze Results -- examines false positives and false negatives for patterns (intent-aware: prioritizes FN reduction for block, FP reduction for allow)
- (Post-loop) Extract Learnings -- distills iteration history into reusable memory entries
Test Composition
On iteration 2+, the test suite is automatically composed from three sources:
- Carried failures (always-on): FP and FN test cases from the previous iteration are re-tested to verify whether topic refinement resolved them. Tagged with source: 'carried-fp' or 'carried-fn'.
- Regression tier (always-on): TP and TN (correct) test cases from the previous iteration are re-scanned. If they now fail, that's a regression. Tagged with source: 'regression'.
- Fresh generated tests: new tests from the LLM, informed by per-category error rates from the previous iteration (weighted generation). Tagged with source: 'generated'.
All three pools are deduplicated case-insensitively by prompt text. When the same prompt appears in more than one pool, the copy from the higher-priority pool is kept, in the order carried failures > regression > generated.
The tests:composed event reports this breakdown on every iteration from 2 onward.
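The priority rule can be sketched as one pass over the pools in priority order, keeping the first occurrence of each lower-cased prompt. The TestCase shape (including the expectedMatch field) and the composeTests name are assumptions for illustration; only the source tags come from this page. The sketch lower-cases prompts but does not trim or otherwise normalize them.

```typescript
type TestSource = "carried-fp" | "carried-fn" | "regression" | "generated";

// Hypothetical test-case shape; only the source tags are documented.
interface TestCase {
  prompt: string;
  expectedMatch: boolean;
  source: TestSource;
}

// Merges pools in priority order: carried failures > regression > generated.
// A prompt already seen (case-insensitively) in a higher-priority pool is skipped.
function composeTests(
  carried: TestCase[],
  regression: TestCase[],
  generated: TestCase[],
): TestCase[] {
  const seen = new Set<string>();
  const composed: TestCase[] = [];
  for (const pool of [carried, regression, generated]) {
    for (const test of pool) {
      const key = test.prompt.toLowerCase();
      if (seen.has(key)) continue;
      seen.add(key);
      composed.push(test);
    }
  }
  return composed;
}
```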
Weighted Category Generation
On iteration 2+, computeCategoryBreakdown() computes per-category FP/FN error rates from the previous iteration's results. This breakdown is injected into the LLM test generation prompt, instructing it to generate proportionally more tests for weak categories.
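The sketch below is a hypothetical stand-in for computeCategoryBreakdown(), whose actual signature and output format are not documented on this page: ScanResult, its category field, categoryBreakdown, and formatBreakdown are all assumptions. It tallies FP/FN per category, orders categories weakest first, and renders plain text of the kind that could be injected into the test-generation prompt.

```typescript
// Hypothetical per-test result from the previous iteration.
interface ScanResult {
  category: string;
  expectedMatch: boolean; // true for positive (on-topic) tests
  actualMatch: boolean; // what the scanner reported
}

interface CategoryStats {
  category: string;
  total: number;
  fp: number;
  fn: number;
  errorRate: number;
}

// Tallies FP/FN per category and sorts the weakest categories first.
function categoryBreakdown(results: ScanResult[]): CategoryStats[] {
  const byCategory = new Map<string, CategoryStats>();
  for (const r of results) {
    const stats = byCategory.get(r.category) ??
      { category: r.category, total: 0, fp: 0, fn: 0, errorRate: 0 };
    stats.total += 1;
    if (!r.expectedMatch && r.actualMatch) stats.fp += 1;
    if (r.expectedMatch && !r.actualMatch) stats.fn += 1;
    byCategory.set(r.category, stats);
  }
  const all = Array.from(byCategory.values());
  for (const s of all) s.errorRate = (s.fp + s.fn) / s.total;
  return all.sort((a, b) => b.errorRate - a.errorRate);
}

// Renders the breakdown as plain text for a test-generation prompt.
function formatBreakdown(stats: CategoryStats[]): string {
  return stats
    .map((s) => `${s.category}: ${(s.errorRate * 100).toFixed(0)}% errors (${s.fp} FP, ${s.fn} FN of ${s.total})`)
    .join("\n");
}
```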
Regression Tracking
EfficacyMetrics.regressionCount counts regression-tier tests that failed (previously correct, now wrong after topic refinement). Regressions still count toward the normal FP/FN tallies; the separate counter shows how many of those failures were caused by topic changes, as opposed to failures on carried or newly generated tests.
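Because regressions are tallied twice, once in FP/FN and once in regressionCount, both numbers can come from the same pass over per-test results. A minimal sketch, assuming each result records its source tag alongside the expected and actual verdicts; the ComposedResult shape and tallyWithRegressions name are hypothetical.

```typescript
type Source = "carried-fp" | "carried-fn" | "regression" | "generated";

// Hypothetical per-test outcome after scanning a composed suite.
interface ComposedResult {
  source: Source;
  expectedMatch: boolean;
  actualMatch: boolean;
}

// Tallies FP/FN over every test and, separately, the regression-tier tests that now fail.
function tallyWithRegressions(results: ComposedResult[]) {
  let fp = 0;
  let fn = 0;
  let regressionCount = 0;
  for (const r of results) {
    const wrong = r.expectedMatch !== r.actualMatch;
    if (!r.expectedMatch && r.actualMatch) fp += 1;
    if (r.expectedMatch && !r.actualMatch) fn += 1;
    // Regression-tier tests were correct last iteration, so any failure now is a regression.
    if (r.source === "regression" && wrong) regressionCount += 1;
  }
  return { fp, fn, regressionCount };
}
```

In the accompanying check, a failing regression-tier test shows up both as one FN and as one regression, while a failing generated test adds only to FP.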
Test Accumulation (Legacy)
The accumulateTests flag enables additional full-pool accumulation on top of the composition logic. When enabled, tests from all prior iterations are also merged (not just the previous iteration's failures and regressions):
- Deduplication: case-insensitive by prompt text; new tests take priority over old
- Max cap: the optional maxAccumulatedTests limits the total count, keeping the newest tests first
- Event: tests:accumulated is yielded on iteration 2+ with new/total/dropped counts
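The accumulation rules can be sketched as follows. The accumulate function, its return shape, and the ordering assumption (both input lists ordered newest first) are hypothetical; only the dedup rule, the maxAccumulatedTests cap, and the new/total/dropped counts come from this page. Here "dropped" counts only tests removed by the cap; whether the real event also counts deduplicated tests is not stated.

```typescript
interface Test {
  prompt: string;
}

interface AccumulationResult<T> {
  tests: T[];
  newCount: number;
  totalCount: number;
  droppedCount: number;
}

// Puts new tests ahead of old ones, dedupes case-insensitively by prompt
// (first occurrence wins, so new tests beat old), then applies the optional cap.
function accumulate<T extends Test>(
  newTests: T[],
  oldTests: T[],
  maxAccumulatedTests?: number,
): AccumulationResult<T> {
  const seen = new Set<string>();
  const merged: T[] = [];
  for (const test of [...newTests, ...oldTests]) {
    const key = test.prompt.toLowerCase();
    if (!seen.has(key)) {
      seen.add(key);
      merged.push(test);
    }
  }
  // Newest tests sit at the front, so slicing from the start keeps the newest.
  const cap = maxAccumulatedTests ?? merged.length;
  const tests = merged.slice(0, cap);
  return {
    tests,
    newCount: newTests.length,
    totalCount: tests.length,
    droppedCount: merged.length - tests.length,
  };
}
```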