The attacks we simulate and the controls we verify
This page details the specific vulnerabilities we test for and the safeguards we check. Use it to understand our scope or prepare your team.
- Prompt injection and jailbreaking.
- Data leakage and retrieval attacks.
- Output safety and policy bypass.
- Tool/function calling abuse.
Prompt injection testing
We attempt to make your AI do things it shouldn't.
Direct jailbreaking
Roleplay attacks, DAN-style prompts, and multi-turn manipulation to bypass system instructions.
System prompt extraction
Attempts to leak your system prompt, revealing business logic and security controls.
Indirect injection
Malicious instructions hidden in retrieved documents, user profiles, or external content.
Instruction hierarchy bypass
Trying to override developer instructions with user-level prompts.
Context manipulation
Exploiting long context windows to hide attacks or confuse the model about instruction source.
Encoding tricks
Base64, ROT13, and other encoding schemes to sneak harmful instructions past filters.
Data leakage testing
We try to extract information your AI shouldn't reveal.
Cross-tenant retrieval
Attempting to access other users' documents through crafted queries in multi-tenant RAG systems.
PII extraction
Probing for personal data that may have been embedded or retrieved unintentionally.
Training data extraction
Attempts to recover verbatim training examples or fine-tuning data from model responses.
Context window leakage
Extracting previous conversation content or system context from the current session.
Metadata exposure
Leaking file paths, database schemas, or internal identifiers through model outputs.
Credential disclosure
Checking if API keys, passwords, or secrets can be extracted from prompts or retrieved content.
Output safety testing
We verify your AI refuses harmful requests and doesn't generate dangerous content.
Harmful content generation
Testing for generation of illegal, violent, or explicitly harmful content.
Policy bypass
Attempts to generate content that violates your stated usage policies.
XSS and injection payloads
Checking if model outputs can contain executable code or injection attacks.
Malicious links
Testing if the AI can be tricked into generating phishing URLs or malware links.
Unintended tool execution
For agentic systems: testing if attackers can trigger dangerous function calls.
Social engineering outputs
Checking if the AI can be used to craft scam messages or impersonate others.
Safeguards we check
Beyond attack simulation, we verify you have controls in place:
- Input filtering: Guardrails that detect and block malicious prompts.
- Output filtering: Post-processing that catches harmful responses.
- Access controls: RAG retrieval respects user permissions.
- Logging: Prompts and responses are logged for incident response.
- Kill switches: Ability to disable AI features quickly.
What you get
- Vulnerability findings with severity ratings
- Reproduction steps for each finding
- Remediation guidance
- Control gap analysis
- Executive summary for customers
Want the full technical scope?
We can walk through exactly which tests apply to your system architecture.