Add prompt-defense-audit to Red Team Handbook tools by ppcvote · Pull Request #816 · OWASP/www-project-top-10-for-large-language-model-applications

ppcvote · 2026-04-03T15:02:23Z

Summary

Adds prompt-defense-audit to the GenAI Red Team Handbook's tools directory and to the resources page.

What This Tool Does

Unlike offensive tools (garak, promptfoo) that probe running LLMs for vulnerabilities, prompt-defense-audit takes a defensive posture approach: it analyzes system prompt text to determine whether adequate defenses are in place before deployment.

Think of it as a configuration audit (is the firewall on?) rather than a penetration test (can I break in?).

OWASP LLM Top 10 Mapping

The tool's 12 defense checks map directly to the OWASP LLM Top 10:

OWASP Category	# of Defense Checks	What's Checked
LLM01 Prompt Injection	6	Role boundary, instruction boundary, multilang bypass, unicode attacks, indirect injection, input validation
LLM02 Insecure Output Handling	1	Output format restrictions (code, scripts, links)
LLM06 Sensitive Info Disclosure	1	System prompt and data leakage prevention
LLM08 Excessive Agency	—	Covered indirectly by role boundary + instruction boundary
LLM09 Overreliance	1	Harmful/weaponizable content prevention

Key Properties

Deterministic: Pure regex, same input = same output (no ML inference)
Zero cost: No API calls, no tokens consumed
< 5ms execution time — CI/CD pipeline friendly
Zero dependencies
Production-tested: Powers UltraProbe web scanner

Changes

Added initiatives/genai_red_team_handbook/tools/prompt-defense-audit/README.md — detailed tool documentation with usage examples and OWASP mapping
Added a Tools section to resources/index.md linking to the tool

Links

GitHub: https://github.com/ppcvote/prompt-defense-audit
npm: npm install prompt-defense-audit
Web scanner: https://ultralab.tw/probe

Adds prompt-defense-audit — a deterministic system prompt defense scanner that checks for defensive posture against 12 common LLM attack vectors using pure regex pattern matching. Maps directly to OWASP LLM Top 10: - LLM01 (Prompt Injection): 6 defense checks - LLM02 (Insecure Output Handling): 1 defense check - LLM06 (Sensitive Info Disclosure): 1 defense check - LLM09 (Overreliance): 1 defense check Zero AI cost, zero dependencies, < 5ms execution. Production-tested at ultralab.tw/probe. GitHub: https://github.com/ppcvote/prompt-defense-audit npm: prompt-defense-audit

ppcvote · 2026-04-29T15:55:43Z

Status update — added empirical backing since this PR opened:

Production corpus expanded: 257 → 1,646 system prompts (4 public datasets, deduped). Per-vector gap rates: https://github.com/ppcvote/prompt-defense-audit/blob/master/research/gap-20260405.json
Aggregate: mean defense score 36/100, 78.3% of corpus scores F (<45)
MCP server audit (new): applied the same methodology to the 7 official servers in modelcontextprotocol/servers — 6/7 scored F, 8 defense vectors at 100% gap rate. Detail: https://github.com/ppcvote/prompt-defense-audit/tree/master/research/mcp-per-server. Cross-referenced from modelcontextprotocol/servers#3537.

The MCP audit is particularly relevant for the Red Team Handbook context: it shows that even reference implementations of agent infrastructure ship without basic prompt-level defensive language. A static audit catches this in CI before deployment, complementary to runtime guardrails.

Happy to address review feedback when maintainers have a moment.

ppcvote · 2026-05-08T10:24:59Z

Update — prompt-defense-audit v1.4.0 released

The PR currently references the 12-vector model. v1.4.0 expands to 17 vectors, adding 5 agent-specific defenses derived from a structured analysis of six documented crypto AI agent prompt-injection incidents. Mapping to OWASP LLM Top 10 categories:

LLM01 (Prompt Injection): new encoding-injection (decoded/translated content as untrusted), new cross-agent-auth (authority not inherited across agents)
LLM07 (Insecure Plugin Design): new function-immutable (function semantics fixed mid-conversation), new transaction-guardrails (hard limits on transactions)
LLM08 (Excessive Agency): new memory-provenance (RAG memory may be poisoned), new cross-agent-auth

Reference incidents (with primary sources in CASE_STUDIES.md):

Lobstar Wilde, Feb 2026 — ~$250K, social engineering + decimal bug
Grok×Bankrbot Morse, May 2026 — ~$175K, encoding-based indirect injection + tool authorization
AIXBT, Mar 2025 — ~$106K, dashboard takeover (control-plane, not prompt injection — included for honest scoping)
Freysa, Nov 2024 — $47K, function semantic redefinition
ElizaOS, May 2025 — Princeton academic PoC, cross-platform memory poisoning
Bankrbot, Mar 2025 — ~$330K, social engineering precursor

Companion blog: https://ultralab.tw/en/blog/crypto-ai-agent-prompt-injection-static-analysis
Release: https://github.com/ppcvote/prompt-defense-audit/releases/tag/v1.4.0

If the PR moves forward, happy to update the description to reference the 17-vector model and the case studies. The Red Team Handbook framing fits the new vectors well — they're each grounded in real production failures rather than theoretical attack categories.

ppcvote requested review from GangGreenTemperTatum, rossja and virtualsteve-star as code owners April 3, 2026 15:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add prompt-defense-audit to Red Team Handbook tools#816

Add prompt-defense-audit to Red Team Handbook tools#816
ppcvote wants to merge 1 commit into
OWASP:mainfrom
ppcvote:feat/add-prompt-defense-audit-tool

ppcvote commented Apr 3, 2026

Uh oh!

ppcvote commented Apr 29, 2026

Uh oh!

ppcvote commented May 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

ppcvote commented Apr 3, 2026

Summary

What This Tool Does

OWASP LLM Top 10 Mapping

Key Properties

Changes

Links

Uh oh!

ppcvote commented Apr 29, 2026

Uh oh!

ppcvote commented May 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant