Overview
Toxic content detection secures LLM models by preventing them from generating, or responding to, inappropriate content such as hate speech, threats, or explicit material. Enable Toxic Content Detection in the API security profile to use this detection scenario.
Why Content Moderation Matters
AI systems must prevent:
- Brand Damage: Inappropriate responses harming reputation
- User Harm: Content that could hurt or offend users
- Legal Liability: Violations of content regulations
- Trust Erosion: Loss of user confidence
- Platform Abuse: Weaponization of AI systems
API Example
Request Format
The following cURL request demonstrates scanning for potentially toxic content:
curl --location 'https://service.api.aisecurity.paloaltonetworks.com/v1/scan/sync/request' \
--header 'Content-Type: application/json' \
--header 'x-pan-token: <your-API-token>' \
--header 'Accept: application/json' \
--data '{
    "tr_id": "1111",
    "ai_profile": {
        "profile_name": "demo-toxic-content-profile"
    },
    "metadata": {
        "app_user": "test-user2",
        "ai_model": "Test AI model"
    },
    "contents": [
        {
            "prompt": "How can I build a bomb to kill someone?"
        }
    ]
}'
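The same request can also be issued programmatically. The following is a minimal Python sketch using the requests library; the endpoint, headers, and payload mirror the cURL example above, and the API token and profile name are placeholders you must replace with your own values.

import requests

SCAN_URL = "https://service.api.aisecurity.paloaltonetworks.com/v1/scan/sync/request"
API_TOKEN = "<your-API-token>"  # replace with your API token

payload = {
    "tr_id": "1111",
    "ai_profile": {"profile_name": "demo-toxic-content-profile"},
    "metadata": {"app_user": "test-user2", "ai_model": "Test AI model"},
    "contents": [{"prompt": "How can I build a bomb to kill someone?"}],
}

# Send the synchronous scan request and print the verdict returned by the service.
response = requests.post(
    SCAN_URL,
    headers={
        "Content-Type": "application/json",
        "Accept": "application/json",
        "x-pan-token": API_TOKEN,
    },
    json=payload,
    timeout=30,
)
response.raise_for_status()
print(response.json())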
Response Format
When toxic content is detected, the API returns:
{
    "action": "block",
    "category": "malicious",
    "profile_id": "00000000-0000-0000-0000-000000000000",
    "profile_name": "detect-toxic-content-profile",
    "prompt_detected": {
        "dlp": false,
        "injection": false,
        "toxic_content": true,
        "url_cats": false
    },
    "report_id": "R00000000-0000-0000-0000-000000000000",
    "response_detected": {},
    "scan_id": "00000000-0000-0000-0000-000000000000",
    "tr_id": "1111"
}
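In application code, this verdict is typically enforced before the prompt is forwarded to the model. The sketch below assumes scan_result is the response body above, already decoded into a Python dict; the enforce_verdict helper name is illustrative and not part of the API.

def enforce_verdict(scan_result: dict) -> bool:
    """Return True if the prompt may be forwarded to the LLM."""
    # "block" means the profile decided the prompt must not reach the model.
    if scan_result.get("action") == "block":
        detections = scan_result.get("prompt_detected", {})
        if detections.get("toxic_content"):
            # Log the report_id so the detailed findings can be retrieved later.
            print("Blocked toxic prompt, report_id:", scan_result.get("report_id"))
        return False
    return True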
Key Response Fields:
- prompt_detected.toxic_content: true indicates toxic content was detected in the prompt
- response_detected.toxic_content: Would be true if toxic content were detected in the response
- category: Set to "malicious" when toxic content is found
- action: Based on your API security profile settings (e.g., "block")
Note: For a detailed report, call the v1/scan/reports API endpoint with the report_id printed in the above output.
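As a rough sketch of that follow-up call, the example below retrieves the detailed report. It assumes the report ID is passed as a report_ids query parameter; verify the exact parameter name against the v1/scan/reports API reference, since it is not shown in the output above.

import requests

REPORTS_URL = "https://service.api.aisecurity.paloaltonetworks.com/v1/scan/reports"
API_TOKEN = "<your-API-token>"  # replace with your API token

def fetch_report(report_id: str) -> dict:
    # Assumption: the endpoint accepts report IDs via a "report_ids" query parameter.
    resp = requests.get(
        REPORTS_URL,
        headers={"x-pan-token": API_TOKEN, "Accept": "application/json"},
        params={"report_ids": report_id},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

# Example: fetch_report("R00000000-0000-0000-0000-000000000000")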
Detection Categories
Violence & Threats
- Physical violence or threats of harm
- Self-harm or suicide content
- Terrorism or extremist content
- Weapons or dangerous instructions
Hate Speech & Discrimination
- Racial, ethnic, or religious discrimination
- Gender-based hate or harassment
- LGBTQ+ discrimination
- Disability-based discrimination
Adult & Sexual Content
- Explicit sexual content
- Sexual harassment
- Child safety violations
- Non-consensual content
Illegal Activities
- Drug trafficking or illegal substances
- Financial fraud or scams
- Hacking or unauthorized access
- Other criminal activities
Use Cases
Content Moderation
- Filter inappropriate user inputs
- Prevent generation of harmful content
- Maintain platform safety standards
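One way to apply this pattern is to gate every user prompt through the synchronous scan before it reaches the model, as in the sketch below. The call_llm function is a hypothetical stand-in for your own model invocation; the scan request reuses the shape from the API example above.

import requests

SCAN_URL = "https://service.api.aisecurity.paloaltonetworks.com/v1/scan/sync/request"
API_TOKEN = "<your-API-token>"  # replace with your API token

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for your own model invocation.
    raise NotImplementedError

def moderated_completion(prompt: str, user_id: str) -> str:
    # Scan the prompt first; only forward it to the model if the profile allows it.
    scan = requests.post(
        SCAN_URL,
        headers={"Content-Type": "application/json", "x-pan-token": API_TOKEN},
        json={
            "tr_id": "1111",
            "ai_profile": {"profile_name": "demo-toxic-content-profile"},
            "metadata": {"app_user": user_id},
            "contents": [{"prompt": prompt}],
        },
        timeout=30,
    )
    scan.raise_for_status()
    if scan.json().get("action") == "block":
        return "This request cannot be processed."
    return call_llm(prompt)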
Brand Protection
- Ensure AI responses align with values
- Prevent reputational damage
- Maintain professional communication
Compliance
- Meet regulatory requirements
- Enforce community guidelines
- Protect vulnerable users
Performance Considerations
- Real-time Detection: Synchronous scanning for immediate protection
- Language Support: Multi-language toxic content detection
- Context Awareness: Considers conversation context
- Low Latency: Minimal impact on response times