Skip to content

Add prompt-defense-audit to Red Team Handbook tools#816

Open
ppcvote wants to merge 1 commit into
OWASP:mainfrom
ppcvote:feat/add-prompt-defense-audit-tool
Open

Add prompt-defense-audit to Red Team Handbook tools#816
ppcvote wants to merge 1 commit into
OWASP:mainfrom
ppcvote:feat/add-prompt-defense-audit-tool

Conversation

@ppcvote
Copy link
Copy Markdown

@ppcvote ppcvote commented Apr 3, 2026

Summary

Adds prompt-defense-audit to the GenAI Red Team Handbook's tools directory and to the resources page.

What This Tool Does

Unlike offensive tools (garak, promptfoo) that probe running LLMs for vulnerabilities, prompt-defense-audit takes a defensive posture approach: it analyzes system prompt text to determine whether adequate defenses are in place before deployment.

Think of it as a configuration audit (is the firewall on?) rather than a penetration test (can I break in?).

OWASP LLM Top 10 Mapping

The tool's 12 defense checks map directly to the OWASP LLM Top 10:

OWASP Category # of Defense Checks What's Checked
LLM01 Prompt Injection 6 Role boundary, instruction boundary, multilang bypass, unicode attacks, indirect injection, input validation
LLM02 Insecure Output Handling 1 Output format restrictions (code, scripts, links)
LLM06 Sensitive Info Disclosure 1 System prompt and data leakage prevention
LLM08 Excessive Agency Covered indirectly by role boundary + instruction boundary
LLM09 Overreliance 1 Harmful/weaponizable content prevention

Key Properties

  • Deterministic: Pure regex, same input = same output (no ML inference)
  • Zero cost: No API calls, no tokens consumed
  • < 5ms execution time — CI/CD pipeline friendly
  • Zero dependencies
  • Production-tested: Powers UltraProbe web scanner

Changes

  1. Added initiatives/genai_red_team_handbook/tools/prompt-defense-audit/README.md — detailed tool documentation with usage examples and OWASP mapping
  2. Added a Tools section to resources/index.md linking to the tool

Links

Adds prompt-defense-audit — a deterministic system prompt defense
scanner that checks for defensive posture against 12 common LLM attack
vectors using pure regex pattern matching.

Maps directly to OWASP LLM Top 10:
- LLM01 (Prompt Injection): 6 defense checks
- LLM02 (Insecure Output Handling): 1 defense check
- LLM06 (Sensitive Info Disclosure): 1 defense check
- LLM09 (Overreliance): 1 defense check

Zero AI cost, zero dependencies, < 5ms execution.
Production-tested at ultralab.tw/probe.

GitHub: https://github.com/ppcvote/prompt-defense-audit
npm: prompt-defense-audit
@ppcvote
Copy link
Copy Markdown
Author

ppcvote commented Apr 29, 2026

Status update — added empirical backing since this PR opened:

The MCP audit is particularly relevant for the Red Team Handbook context: it shows that even reference implementations of agent infrastructure ship without basic prompt-level defensive language. A static audit catches this in CI before deployment, complementary to runtime guardrails.

Happy to address review feedback when maintainers have a moment.

@ppcvote
Copy link
Copy Markdown
Author

ppcvote commented May 8, 2026

Update — prompt-defense-audit v1.4.0 released

The PR currently references the 12-vector model. v1.4.0 expands to 17 vectors, adding 5 agent-specific defenses derived from a structured analysis of six documented crypto AI agent prompt-injection incidents. Mapping to OWASP LLM Top 10 categories:

  • LLM01 (Prompt Injection): new encoding-injection (decoded/translated content as untrusted), new cross-agent-auth (authority not inherited across agents)
  • LLM07 (Insecure Plugin Design): new function-immutable (function semantics fixed mid-conversation), new transaction-guardrails (hard limits on transactions)
  • LLM08 (Excessive Agency): new memory-provenance (RAG memory may be poisoned), new cross-agent-auth

Reference incidents (with primary sources in CASE_STUDIES.md):

  • Lobstar Wilde, Feb 2026 — ~$250K, social engineering + decimal bug
  • Grok×Bankrbot Morse, May 2026 — ~$175K, encoding-based indirect injection + tool authorization
  • AIXBT, Mar 2025 — ~$106K, dashboard takeover (control-plane, not prompt injection — included for honest scoping)
  • Freysa, Nov 2024 — $47K, function semantic redefinition
  • ElizaOS, May 2025 — Princeton academic PoC, cross-platform memory poisoning
  • Bankrbot, Mar 2025 — ~$330K, social engineering precursor

Companion blog: https://ultralab.tw/en/blog/crypto-ai-agent-prompt-injection-static-analysis
Release: https://github.com/ppcvote/prompt-defense-audit/releases/tag/v1.4.0

If the PR moves forward, happy to update the description to reference the 17-vector model and the case studies. The Red Team Handbook framing fits the new vectors well — they're each grounded in real production failures rather than theoretical attack categories.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant