Introduction to ZeroLeaks
ZeroLeaks is an autonomous AI security scanner that tests LLM systems for prompt injection vulnerabilities. It simulates real-world attacks to find security weaknesses before attackers do.Why ZeroLeaks?
Your system prompts contain proprietary instructions, business logic, and sensitive configurations. Attackers use prompt injection to extract this data. ZeroLeaks simulates real-world attacks to find vulnerabilities before they do.Installation
Get started with ZeroLeaks in your project using npm, yarn, pnpm, or bun
Quick start
Run your first security scan in under 5 minutes
API reference
Complete API documentation for runSecurityScan and createScanEngine
Attack techniques
Learn about the attack techniques and methods used by ZeroLeaks
Key features
Multi-agent architecture
Multi-agent architecture
ZeroLeaks uses six specialized agents:
- Strategist: Selects attack strategies based on defense profile
- Attacker: Generates attack prompts
- Evaluator: Analyzes responses for leaks
- Mutator: Creates variations of successful attacks
- Inspector: Performs defense fingerprinting (TombRaider pattern)
- Orchestrator: Coordinates multi-turn attack sequences
Tree of Attacks (TAP)
Tree of Attacks (TAP)
Systematic exploration of attack vectors with pruning. ZeroLeaks builds a tree of potential attacks, exploring promising branches while pruning unsuccessful paths to maximize efficiency.
Modern attack techniques
Modern attack techniques
Incorporates cutting-edge research including:
- Crescendo: Multi-turn trust escalation
- Many-Shot: Context priming with examples
- Chain-of-Thought Hijacking: Reasoning manipulation
- Policy Puppetry: YAML/JSON format exploitation
- Siren: Trust-building manipulation sequences
- Echo Chamber: Gradual escalation through agreement
Defense fingerprinting
Defense fingerprinting
Identifies specific defense systems in use (Prompt Shield, Llama Guard, etc.) and adapts attack strategies accordingly using the TombRaider dual-agent pattern.
Research-backed approaches
Research-backed approaches
Incorporates CVE-documented vulnerabilities and academic research, including:
- CVE-2025-32711 (EchoLeak)
- TAP (Tree of Attacks with Pruning)
- PAIR (Prompt Automatic Iterative Refinement)
- Best-of-N sampling
- TombRaider jailbreak pattern
- Skeleton Key guardrail bypass
Dual scan modes
Dual scan modes
- System prompt extraction: Tests if attackers can extract your system prompt
- Prompt injection testing: Tests if attackers can inject malicious instructions
Open source vs hosted
ZeroLeaks is available as both an open source package and a hosted service at zeroleaks.ai.| Feature | Open source | Hosted (zeroleaks.ai) |
|---|---|---|
| Price | Free | From $0/mo |
| Setup | Self-hosted, bring your own API keys | Zero configuration |
| Scans | Unlimited | Free tier: 3/mo, Startup: Unlimited |
| Reports | JSON output | Interactive dashboard + PDF exports |
| History | Manual tracking | Full scan history & trends |
| Support | Community | Priority support |
| Updates | Manual | Automatic |
| CI/CD Integration | — | Included |
Tech stack
| Component | Technology |
|---|---|
| Runtime | Bun |
| Language | TypeScript |
| LLM Provider | OpenRouter |
| AI SDK | Vercel AI SDK |
| Architecture | Multi-agent orchestration |
Attack categories
ZeroLeaks includes probes across 15+ attack categories:- Direct: Straightforward extraction requests
- Encoding: Base64, ROT13, Unicode bypasses
- Persona: DAN, Developer Mode, roleplay attacks
- Social: Authority, urgency, reciprocity exploits
- Technical: Format injection, context manipulation
- Crescendo: Multi-turn trust escalation
- Many-Shot: Context priming with examples
- CoT Hijack: Chain-of-thought manipulation
- Policy Puppetry: YAML/JSON format exploitation
- ASCII Art: Visual obfuscation techniques
- Injection: Prompt injection attacks
- Hybrid: Combined XSS/CSRF-style attacks
- Tool Exploit: MCP and tool-calling exploits
- Siren: Trust-building manipulation sequences
- Echo Chamber: Gradual escalation through agreement
Next steps
Install ZeroLeaks
Install the package and configure your API key
Run your first scan
Get started with a working example