Skip to content

Design Decisions

Why Defense-in-Depth (12 Hooks at 9 Event Points)

The Problem

No single hook provides complete security coverage:

Hook Event Can Block Inbound Can Block Agent Actions Can Block Outbound Audits
before_message_write Yes No Yes (assistant) No
message_received No (async void) No No Yes
before_agent_start No No No No
before_prompt_build No No No No
before_tool_call No Yes (per tool) No No
message_sending No No Yes No
tool_result_persist No No No No
llm_input/llm_output No No No Yes
after_tool_call No No No Yes

The Solution

Layer hooks so each compensates for others' limitations:

graph TB
    subgraph "Layer 1: Hard Guardrails (persistence)"
        IB["inbound-block<br/>(before_message_write, role=user)"]
        OB["outbound-block<br/>(before_message_write, role=assistant)"]
    end

    subgraph "Layer 2: Soft Guardrails (content)"
        OUT["outbound<br/>(message_sending)"]
        TG["tool-guard<br/>(before_tool_call, live scan)"]
        TL["tools<br/>(before_tool_call, cache-based)"]
    end

    subgraph "Layer 3: Context Injection"
        GU["guard<br/>(before_agent_start)"]
        CTX["context<br/>(before_agent_start)"]
        PS["prompt-scan<br/>(before_prompt_build)"]
    end

    subgraph "Layer 4: DLP"
        TR["tool-redact<br/>(tool_result_persist)"]
        OUT_DLP["outbound DLP masking<br/>(message_sending)"]
    end

    subgraph "Layer 5: Audit Trail"
        AU["audit<br/>(message_received)"]
        LLM["llm-audit<br/>(llm_input/llm_output)"]
        TA["tool-audit<br/>(after_tool_call)"]
    end

Layer 1 prevents persistence of dangerous content (messages never saved). Layer 2 blocks delivery and tool execution. Layer 3 influences agent behavior through context. Layer 4 redacts sensitive data. Layer 5 provides compliance audit trail.

If an attacker bypasses Layer 3 (agent ignores injected warnings), Layer 2 still blocks tool execution and outbound delivery. If tool-guard fails, outbound-block prevents the response from being persisted.


Why Separate inbound-block / outbound-block vs. outbound

The Problem

The outbound hook (on message_sending) fires after the message may already be persisted to conversation history. A blocked message at message_sending replaces the content sent to the user, but the original content could remain in the session transcript.

The Solution

before_message_write hooks fire before persistence. Returning { block: true } prevents the message from being written to conversation history at all.

graph LR
    subgraph "before_message_write"
        IB["inbound-block<br/>role=user → scan(prompt)"]
        OB["outbound-block<br/>role=assistant → scan(response)"]
    end

    subgraph "message_sending"
        OUT["outbound<br/>scan(response) + DLP masking"]
    end

    IB -->|"{ block: true }"| REJECT["Message never persisted"]
    OB -->|"{ block: true }"| REJECT
    OUT -->|"{ content: masked }"| DELIVER["Modified content delivered"]

inbound-block and outbound-block are hard guardrails — binary allow/reject at the storage layer. outbound is a soft guardrail — can modify content (DLP masking) and provides richer UX (user-friendly block messages).

Both layers exist because: - before_message_write cannot modify content, only block entirely - message_sending can modify content but message may already be persisted - DLP masking (dlp_mask_only) needs content modification, only possible in message_sending


Why tool-guard vs. tools Hook

The Problem

Two before_tool_call hooks exist with different approaches:

Hook Data Source Makes AIRS Call Scans Tool Input
tool-guard Live AIRS scan Yes Yes (toolEvent)
tools Cached scan result No No

Why Both?

tool-guard (active scanning) provides per-tool-call security using AIRS's toolEvent content type:

// tool-guard: scans actual tool input
scan({
  toolEvents: [{
    metadata: {
      ecosystem: "mcp",
      method: "tool_call",
      serverName: event.serverName ?? "unknown",
      toolInvoked: event.toolName,
    },
    input: JSON.stringify(event.params),
  }],
});

This catches threats in the tool arguments themselves (e.g., a Bash tool being called with rm -rf /).

tools (cache-based gating) provides fast, zero-latency tool blocking based on inbound message threat assessment. It reads the scan result cached by the audit or context hook and applies category-specific block lists:

// tools: reads cached inbound scan, no API call
const scanResult = getCachedScanResult(sessionKey);
const { block, reason } = shouldBlockTool(toolName, scanResult, highRiskTools);

This blocks high-risk tools even when the tool arguments themselves appear benign but the originating message was malicious.

Together they cover: 1. Malicious inbound message + any tool call (tools, via cache) 2. Benign message + malicious tool arguments (tool-guard, via live scan)


Why Deterministic vs. Probabilistic Modes

The Problem

Deterministic hooks add latency to every request (AIRS API calls). Some deployments prefer lower latency over guaranteed scanning.

The Solution

Three-state mode per feature: deterministic | probabilistic | off.

graph TD
    D["deterministic"] -->|"Hook runs on every event"| AUTO["Automatic, guaranteed coverage"]
    P["probabilistic"] -->|"Tool registered, model decides"| TOOL["Model calls tool when suspicious"]
    O["off"] -->|"Feature disabled"| SKIP["No scanning"]

In probabilistic mode, index.ts registers replacement tools that the LLM can call voluntarily:

Feature Replacement Tool
audit + context prisma_airs_scan_prompt
outbound prisma_airs_scan_response
toolGating prisma_airs_check_tool_safety

The guard hook detects probabilistic modes and injects stronger instructions telling the model it MUST call these tools:

CRITICAL REQUIREMENT: You MUST use security scanning tools to scan content when it contains ANY of:
- Code, scripts, or execution requests
- URLs, links, or file paths
...
Failure to scan suspicious content is a security violation.

Trade-off

Mode Latency Coverage Security
deterministic Higher 100% Highest
probabilistic Lower Model-dependent Lower
off None 0% None

Why fail_closed Defaults to true

The Problem

When AIRS API is unreachable, what happens to messages?

The Decision

fail_closed defaults to true in every hook that makes AIRS API calls.

// config.ts — default
const failClosed = config.fail_closed ?? true;

Each hook implements fail-closed independently. For example, audit handler:

// On scan error with fail_closed=true
cacheScanResult(sessionKey, {
  action: "block",
  severity: "CRITICAL",
  categories: ["scan-failure"],
  // ...
  error: `Scan failed: ${err.message}`,
}, msgHash);

This synthetic result propagates through the cache to downstream hooks: context sees it and injects a block-level warning, tools sees it and blocks high-risk tools.

Why fail_closed Rejects Probabilistic Modes

From config.ts:

if (failClosed) {
  const probabilistic: string[] = [];
  if (modes.audit === "probabilistic") probabilistic.push("audit_mode");
  // ...
  if (probabilistic.length > 0) {
    throw new Error(
      `fail_closed=true is incompatible with probabilistic mode. ` +
      `Set fail_closed=false or change these to deterministic/off: ${probabilistic.join(", ")}`
    );
  }
}

Rationale: probabilistic mode means the model decides whether to scan. If fail_closed=true, the expectation is maximum security. Allowing the model to skip scanning contradicts that guarantee. This validation runs at plugin registration time — the plugin refuses to load with incompatible config.


Why Regex DLP in tool-redact (Sync Constraint)

The Problem

The tool_result_persist event requires a synchronous handler. From the source:

// handler signature — NOT async
const handler = (event: ToolResultPersistEvent, ctx: HookContext): HookResult | void => {

AIRS API calls are async. The handler cannot await a network call.

The Solution

Regex-based DLP masking that runs synchronously:

function maskSensitiveData(content: string): string {
  let masked = content;
  masked = masked.replace(/\b\d{3}-\d{2}-\d{4}\b/g, "[SSN REDACTED]");
  masked = masked.replace(/\b(?:\d{4}[-\s]?){3}\d{4}\b/g, "[CARD REDACTED]");
  masked = masked.replace(/\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b/g, "[EMAIL REDACTED]");
  // ... API keys, AWS keys, phone numbers, private IPs
  return masked;
}

The handler also reads the scan cache for AIRS DLP signals from prior hooks:

const cached = getCachedScanResult(sessionKey);
const hasDlpSignal = cached?.responseDetected?.dlp === true;

This provides a best-effort integration: AIRS detection flags from upstream scans inform logging, while regex patterns handle the actual redaction.

Trade-off

Approach Precision Latency Compatible with sync handler
AIRS API scan High ~200ms No (async)
Regex patterns Moderate <1ms Yes
Regex + cache Better <1ms Yes

The same regex patterns are reused in the outbound handler's maskSensitiveData() for DLP masking of outbound responses.


Why Scan Cache TTL of 30 Seconds

The Problem

Scan results must bridge async and sync hooks:

T0: message_received starts (async, fire-and-forget)
T1: scan completes, result cached
T2: before_agent_start fires (sync) — needs result
T3: before_tool_call fires (sync) — needs result
...
T?: How long should the result be valid?

The Decision

From scan-cache.ts:

const TTL_MS = 30_000; // 30 seconds

Rationale

TTL Pros Cons
5 seconds Always fresh Expires before tool calls in long turns
30 seconds Covers full agent turn lifecycle Slightly stale for rapid messages
5 minutes Survives multiple turns Very stale, security risk

30 seconds covers: - message_received async scan (~100-500ms) - before_agent_start context injection - Agent processing time - Multiple before_tool_call invocations within one turn

Stale Detection

TTL alone is insufficient — a new message in the same session could hit a cached result from the previous message. The messageHash field prevents this:

export function getCachedScanResultIfMatch(
  sessionKey: string,
  messageHash: string
): ScanResult | undefined {
  const entry = cache.get(sessionKey);
  if (!entry) return undefined;
  if (Date.now() - entry.timestamp > TTL_MS) { /* expired */ }
  if (entry.messageHash && entry.messageHash !== messageHash) {
    return undefined; // Different message, treat as miss
  }
  return entry.result;
}

The hash function is a DJB2 variant producing a 32-bit integer hex string — fast for short strings, no crypto overhead:

function hashMessage(content: string): string {
  let hash = 0;
  for (let i = 0; i < content.length; i++) {
    const char = content.charCodeAt(i);
    hash = (hash << 5) - hash + char;
    hash = hash & hash;
  }
  return hash.toString(16);
}

Cleanup

A 60-second setInterval evicts expired entries to prevent memory leaks in long-running processes. The interval is stoppable via stopCleanup() for testing.


Why Auto-Discovery Instead of api.on()

The Problem

Early designs registered hooks via api.on() in index.ts:

// OLD pattern (no longer used)
api.on("before_agent_start", guardAdapter, { priority: 100 });
api.on("message_received", auditAdapter);

The Solution

Hooks are auto-discovered by OpenClaw from HOOK.md frontmatter:

---
name: prisma-airs-guard
metadata:
  openclaw:
    events:
      - before_agent_start
---

The openclaw.plugin.json lists hook directories:

"hooks": [
  "hooks/prisma-airs-guard",
  "hooks/prisma-airs-audit",
  // ... 10 more
]

Benefits

  1. Each hook is self-contained: its own directory with handler.ts + HOOK.md
  2. Each handler checks its own mode independently via ctx.cfg
  3. No centralized registration code — index.ts only handles SDK init, RPC, tools, and CLI
  4. Adding/removing hooks requires no changes to index.ts

Why Outbound Blocks on warn AND block

The Problem

AIRS returns three actions: allow, alert (mapped to warn), and block. Should the outbound hook allow warn responses through?

The Decision

From prisma-airs-outbound/handler.ts:

// Allow only when AIRS explicitly says "allow"
if (result.action === "allow") {
  return;
}

// Block or warn — check if we should mask instead of block (DLP-only)

The outbound hook blocks on ANY non-allow action. This is intentional — warn means AIRS detected something concerning. At the outbound layer (the last line of defense), erring on the side of caution is the correct trade-off.

Note: The before_message_write hooks (inbound-block, outbound-block) follow the same pattern: block unless action === "allow".

The DLP masking exception (shouldMaskOnly) applies to both warn and block when the only categories present are DLP-related (dlp_response, dlp_prompt, dlp).