Code Extraction

The code extractor separates code blocks from natural language in AI responses, enabling dedicated malicious code scanning via the AIRS `code_response` field.

Extraction Strategy

The extractor uses three strategies in priority order:

1. Fenced Code Blocks

```python
def example():
    return "hello"
```

The extractor reads the language from the fence annotation (`python` in the example above) and supports all common programming languages.
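As an illustration of this first strategy, here is a minimal sketch of fenced-block extraction with a regular expression. The names `FencedBlock` and `extractFencedBlocks` and the pattern itself are assumptions made for this example, not the extractor's actual implementation, and edge cases such as nested or unclosed fences are not handled.

```typescript
// Hypothetical sketch: names and regex are illustrative assumptions.
interface FencedBlock {
  language: string; // fence annotation, or "" when none is given
  code: string;
}

// Build the fence marker programmatically to keep this example readable.
const FENCE = "`".repeat(3);

function extractFencedBlocks(text: string): { blocks: FencedBlock[]; remainder: string } {
  const blocks: FencedBlock[] = [];
  // Opening fence, optional language tag, newline, lazy body, closing fence.
  const pattern = new RegExp(
    FENCE + "([\\w+#.-]*)[ \\t]*\\r?\\n([\\s\\S]*?)" + FENCE,
    "g"
  );
  const remainder = text.replace(pattern, (_match: string, lang: string, code: string) => {
    // Drop the newline that precedes the closing fence.
    blocks.push({ language: lang.toLowerCase(), code: code.replace(/\r?\n$/, "") });
    return ""; // remove the block from the natural-language text
  });
  return { blocks, remainder: remainder.trim() };
}
```

Removing matched blocks from the text while collecting them keeps the natural-language remainder and the code in one pass.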

2. Indented Code Blocks

    function example() {
        return "hello";
    }

Lines with 4+ leading spaces are treated as code.
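The 4-space rule can be sketched as a line-by-line scan. This is a simplified illustration under stated assumptions: `extractIndentedBlocks` is a hypothetical name, tabs are not treated as indentation, and blank lines between indented lines are kept inside the block.

```typescript
// Hypothetical sketch: groups runs of lines indented by 4+ spaces into blocks.
function extractIndentedBlocks(text: string): { blocks: string[]; remainder: string } {
  const blocks: string[] = [];
  const prose: string[] = [];
  let current: string[] = [];

  const flush = (): void => {
    // Trailing blank lines belong to the gap after the block, not the block.
    while (current.length > 0 && current[current.length - 1].trim() === "") {
      current.pop();
    }
    if (current.length > 0) blocks.push(current.join("\n"));
    current = [];
  };

  for (const line of text.split(/\r?\n/)) {
    if (/^ {4,}\S/.test(line)) {
      current.push(line.slice(4)); // strip one indentation level
    } else if (line.trim() === "" && current.length > 0) {
      current.push(""); // blank line inside an open block
    } else {
      flush();
      prose.push(line);
    }
  }
  flush();
  return { blocks, remainder: prose.join("\n").trim() };
}
```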

3. Heuristic Fallback

When no fenced or indented blocks are found, the extractor looks for code indicators:

- Import/require statements
- Function/class definitions
- Braces, semicolons, arrow functions
- Shell commands (pipes, redirects)

Content is classified as code if the ratio of code-like characters exceeds 15% (`CODE_CHAR_THRESHOLD = 0.15`).
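The threshold check might look like the sketch below. Only `CODE_CHAR_THRESHOLD = 0.15` comes from the text above. The set of "code-like" characters, the choice to ignore whitespace when computing the ratio, and the function names are assumptions for illustration, and the keyword indicators (imports, definitions) are not modeled here.

```typescript
const CODE_CHAR_THRESHOLD = 0.15;

// Assumed character set; the real extractor may count different characters.
const CODE_LIKE_CHARS = new Set("{}()[];=<>|&$".split(""));

// Share of non-whitespace characters that are code-like (0 for empty input).
function codeCharRatio(text: string): number {
  const chars = Array.from(text).filter((c) => !/\s/.test(c));
  if (chars.length === 0) return 0;
  const codeLike = chars.filter((c) => CODE_LIKE_CHARS.has(c)).length;
  return codeLike / chars.length;
}

function looksLikeCode(text: string): boolean {
  return codeCharRatio(text) > CODE_CHAR_THRESHOLD;
}
```

Under these assumptions, a line such as `const f = (x) => { return x; };` is classified as code, while an ordinary English sentence with no such punctuation is not.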

Output

```typescript
interface ExtractedContent {
  naturalLanguage: string;   // Text portions
  codeBlocks: string[];      // Extracted code blocks
  languages: string[];       // Detected languages
}
```

Multiple code blocks are joined with `\n\n---\n\n` separators before being sent in the `code_response` field.
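A minimal sketch of the joining step follows. The separator and the `ExtractedContent` shape come from this page. The names `CODE_BLOCK_SEPARATOR` and `toCodeResponse` are hypothetical.

```typescript
interface ExtractedContent {
  naturalLanguage: string;
  codeBlocks: string[];
  languages: string[];
}

const CODE_BLOCK_SEPARATOR = "\n\n---\n\n";

// Collapse all extracted blocks into the single string sent as code_response.
function toCodeResponse(content: ExtractedContent): string {
  return content.codeBlocks.join(CODE_BLOCK_SEPARATOR);
}
```

With a single block the separator never appears, and with no blocks the result is an empty string.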

AIRS Field Mapping

| Extracted Content | AIRS Field | Detection Engines |
| --- | --- | --- |
| Natural language | `response` | DLP, toxicity, URL categorization |
| Code blocks | `code_response` | WildFire, ATP (malicious code) |

Why separate fields matter

The `code_response` field activates the WildFire and Advanced Threat Prevention (ATP) engines, which specifically analyze code for malicious patterns. These engines don't run on natural language content, so splitting the response ensures that each portion is scanned by the engines designed for it.
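The mapping can be sketched as a small function that routes each portion to its field. The field names `response` and `code_response` come from the table above. The `AirsResponseFields` type, the `mapToAirsFields` name, and the choice to omit a field when its content is empty are assumptions for illustration and do not describe the full AIRS request schema.

```typescript
// Hypothetical shape covering only the two fields discussed on this page.
interface AirsResponseFields {
  response?: string;      // natural language: DLP, toxicity, URL categorization
  code_response?: string; // code: WildFire, ATP
}

function mapToAirsFields(naturalLanguage: string, codeBlocks: string[]): AirsResponseFields {
  const fields: AirsResponseFields = {};
  if (naturalLanguage.trim() !== "") {
    fields.response = naturalLanguage;
  }
  if (codeBlocks.length > 0) {
    // Same separator as used when joining multiple code blocks.
    fields.code_response = codeBlocks.join("\n\n---\n\n");
  }
  return fields;
}
```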
