Control framework

27 controls for AI security

This is our reference framework. We test for these controls during engagements and help you implement what's missing.

Framework scope
  • Security controls for LLM and RAG systems.
  • Data protection and access control.
  • Output safety and policy enforcement.
  • Operational controls (logging, monitoring, incident response).

Note: Bias testing and explainability are out of scope—we focus on security.

Reference

Full control list

Click any control to see what we test for and what evidence looks like.

Common Criteria - Security (CC)
CC-1.1 Access Control and MFA

Requirement: Enforce MFA for cloud consoles and code repos. Restrict access to model weights, prompt repositories, and fine-tuning data to approved ML engineers.

Evidence: Access review logs; IdP configuration; model repo permission screenshots.

CC-1.2 Encryption at Rest and In Transit

Requirement: Use AES-256 for data at rest and TLS 1.2+ for all traffic. Encrypt vector database volumes at rest and embedding payloads in transit.

Evidence: KMS policies; storage encryption settings; TLS test report.

CC-1.3 Input Sanitization and Injection Defense

Requirement: All user inputs pass through a guardrail layer that detects and blocks prompt injection patterns before model execution.

Evidence: Middleware code; prompt-injection pen test results; blocked request logs.

CC-1.4 Output Security

Requirement: Scan model outputs for malicious payloads or unsafe links before rendering to users.

Evidence: Output filtering configuration; logs of blocked outputs.

CC-1.5 Incident Response for AI Failures

Requirement: Incident response includes AI-specific scenarios and a kill switch to disable tool use or external API access.

Evidence: IR playbooks; tabletop exercise report simulating runaway model behavior.

CC-1.6 Logging and Anomaly Detection

Requirement: Log prompts, tool calls, and retrieval events with tamper-resistant storage. Alert on abnormal prompt patterns or elevated refusal rates.

Evidence: Logging architecture diagram; SIEM alert rules; sample alert tickets.
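
Tamper resistance can be demonstrated with a hash chain: each log record commits to its predecessor, so silent edits are detectable. This is a minimal sketch; the record schema is an assumption, not a prescribed format.

```python
import hashlib
import json

def append_log(chain: list[dict], event: dict) -> list[dict]:
    """Append a record whose hash covers both the event and the previous hash."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    payload = json.dumps(event, sort_keys=True)
    digest = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
    chain.append({"event": event, "prev": prev_hash, "hash": digest})
    return chain

def verify_chain(chain: list[dict]) -> bool:
    """Recompute every hash; any edited or reordered record breaks the chain."""
    prev = "0" * 64
    for record in chain:
        payload = json.dumps(record["event"], sort_keys=True)
        expected = hashlib.sha256((prev + payload).encode()).hexdigest()
        if record["prev"] != prev or record["hash"] != expected:
            return False
        prev = record["hash"]
    return True
```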

CC-1.7 Tooling Permissions and Least Privilege

Requirement: AI agents only receive scoped tool access, with explicit allowlists for external APIs and file access.

Evidence: Tool permission manifests; access approval records.

Confidentiality (C)
C-1.1 Data Classification and Embedding Labels

Requirement: Tag data as Public, Confidential, or Restricted. Embeddings inherit classification and are filtered by user role.

Evidence: Data flow diagrams; vector DB metadata schema; access policy rules.

C-1.2 Retrieval Access Control (RAG)

Requirement: Retrieval enforces user permissions before a chunk enters the context window.

Evidence: Retrieval policy tests; screenshots of denied queries.
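
The enforcement point can be a filter between the vector search and the prompt assembly step. This sketch assumes each chunk carries a `required_role` metadata field (the schema from C-1.1) and fails closed when metadata is missing.

```python
def filter_chunks(chunks: list[dict], user_roles: set[str]) -> list[dict]:
    """Drop any retrieved chunk the caller's roles do not cover.

    Fails closed: a chunk with no role metadata is never returned.
    The {'text', 'required_role'} schema is an illustrative assumption.
    """
    allowed = []
    for chunk in chunks:
        required = chunk.get("required_role")
        if required is not None and required in user_roles:
            allowed.append(chunk)
    return allowed
```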

C-1.3 Zero-Retention Agreements

Requirement: Contracts with LLM providers include explicit no-training or zero-retention terms.

Evidence: Vendor agreements; API settings showing retention disabled.

C-1.4 Context Flushing

Requirement: Enforce isolation between user sessions and cryptographically verify context window reset between sessions.

Evidence: Session management code review; penetration test attempting cross-session leakage.

C-1.5 PII Scrubbing Before Embedding

Requirement: Detect and mask PII before embedding to prevent irreversible exposure in vector stores.

Evidence: Pipeline configuration; scrubber logs and sample redaction reports.
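
A scrubbing stage typically runs rule-based detectors (and, in production, an NER model such as Presidio's) before text reaches the embedding model. The regexes below are illustrative, not exhaustive.

```python
import re

# Illustrative detectors only; real pipelines add NER-based detection.
PII_RULES = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def scrub_pii(text: str) -> str:
    """Replace detected PII with a type label before embedding."""
    for label, pattern in PII_RULES.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Masking before embedding matters because embeddings cannot be selectively redacted afterward; the exposure is effectively irreversible.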

C-1.6 Secrets and Credential Redaction

Requirement: Remove secrets or API keys from prompts and outputs using pattern detection and secret scanning.

Evidence: Secret scanning configuration; blocked secret leakage logs.
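
Pattern detection for credentials usually keys on known key shapes. The patterns below are a small illustrative subset; dedicated scanners such as gitleaks or trufflehog ship far larger rule sets.

```python
import re

# Illustrative credential shapes only.
SECRET_PATTERNS = [
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),           # AWS access key ID shape
    re.compile(r"\bsk-[A-Za-z0-9]{20,}\b"),        # generic "sk-" API key shape
    re.compile(r"(?i)bearer\s+[a-z0-9._-]{20,}"),  # bearer tokens
]

def redact_secrets(text: str) -> tuple[str, int]:
    """Return the redacted text and how many secrets were replaced."""
    hits = 0
    for pattern in SECRET_PATTERNS:
        text, n = pattern.subn("[REDACTED]", text)
        hits += n
    return text, hits
```

The hit count feeds the blocked-leakage logs named above as evidence.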

Availability (A)
A-1.1 Disaster Recovery and Model Rollback

Requirement: Maintain backups for model registries and prompt configs with tested rollback within minutes.

Evidence: Rollback test logs; model registry screenshots.

A-1.2 Token-Based Rate Limiting

Requirement: Apply token-level limits to prevent model DoS and runaway compute costs.

Evidence: API gateway configuration; rate limit logs.
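
Token-level limiting is usually a token-bucket variant where the budget is denominated in model tokens rather than requests. A minimal per-user sketch, with capacity and refill rate as placeholder values:

```python
import time

class TokenBudget:
    """Token-bucket budget denominated in model tokens; a minimal sketch."""

    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self, requested_tokens: int) -> bool:
        """Refill based on elapsed time, then admit or reject the request."""
        now = time.monotonic()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if requested_tokens <= self.tokens:
            self.tokens -= requested_tokens
            return True
        return False
```

In practice this lives at the API gateway and is keyed by user or API key, with the estimated completion tokens included in `requested_tokens`.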

A-1.3 Inference Latency Monitoring

Requirement: Track time-to-first-token and end-to-end latency with SLA thresholds.

Evidence: Monitoring dashboards; alert configurations.

A-1.4 Capacity Planning and Cost Guardrails

Requirement: Forecast GPU usage and enforce budget alerts to prevent overload or outage during spikes.

Evidence: Capacity plan; billing alerts; autoscaling policies.

A-1.5 Provider Dependency and Failover

Requirement: Document third-party model dependencies and establish a failover plan or graceful degradation path.

Evidence: Dependency map; failover runbook; outage simulation report.

Processing Integrity (PI)
PI-1.1 Evaluation and Regression Testing

Requirement: Run a golden set evaluation on every model or prompt change; maintain a defined quality threshold.

Evidence: CI/CD evaluation logs; versioned evaluation dataset.
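
The CI/CD gate can be a single function that fails the pipeline when the golden-set score dips below the agreed threshold. The 0.90 threshold and the scoring scale are placeholders for your own evaluation harness.

```python
def gate_release(scores: list[float], threshold: float = 0.90) -> bool:
    """Pass only if the mean golden-set score meets the quality threshold.

    Fails closed on an empty score list. Threshold is a placeholder value.
    """
    if not scores:
        return False
    return sum(scores) / len(scores) >= threshold
```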

PI-1.2 Grounding Enforcement

Requirement: For RAG, force the model to prioritize retrieved context with low temperature settings for factual tasks.

Evidence: System prompt configuration; grounding metrics.

PI-1.3 Uncertainty Refusal

Requirement: If retrieval confidence is low, the system refuses or escalates instead of fabricating an answer.

Evidence: Refusal logs; confidence threshold logic.
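
The threshold logic can sit between retrieval and generation: if no chunk clears a minimum retrieval score, the system returns a refusal instead of asking the model to improvise. The `(text, score)` schema and the 0.75 cutoff are illustrative assumptions.

```python
REFUSAL_MESSAGE = "I don't have enough grounded information to answer that."

def answer_or_refuse(chunks: list[tuple[str, float]], min_score: float = 0.75) -> dict:
    """Refuse (or escalate) when no retrieved chunk clears the confidence bar."""
    grounded = [text for text, score in chunks if score >= min_score]
    if not grounded:
        return {"refused": True, "message": REFUSAL_MESSAGE}
    return {"refused": False, "context": grounded}
```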

PI-1.4 Deterministic Handoffs

Requirement: Route calculations, dates, and compliance logic to deterministic code paths rather than the model.

Evidence: Architecture diagram; function-calling or tool-use code.
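
In a function-calling setup, the model only selects a tool name and arguments; the arithmetic and date logic run in ordinary code. The tool names below are hypothetical examples of the pattern.

```python
from datetime import date

# The model proposes a tool name + arguments; the computation is deterministic.
def days_until(target: str) -> int:
    """Date math in code, not in the model. Expects 'YYYY-MM-DD'."""
    y, m, d = (int(part) for part in target.split("-"))
    return (date(y, m, d) - date.today()).days

def apply_discount(price_cents: int, pct: int) -> int:
    """Integer money math in code, avoiding model-generated arithmetic."""
    return price_cents - (price_cents * pct) // 100

TOOLS = {"days_until": days_until, "apply_discount": apply_discount}

def dispatch(tool_name: str, **kwargs):
    """Route a model-selected tool call to its deterministic implementation."""
    return TOOLS[tool_name](**kwargs)
```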

PI-1.5 Post-Deployment Sampling

Requirement: Sample production outputs and measure hallucination rates with defined remediation thresholds.

Evidence: Sampling plan; review logs; remediation tickets.

Change Management (CM)
CM-1.1 Versioning of Prompts and Models

Requirement: Treat prompts and model configs as production code with version control and change approvals.

Evidence: Git history for prompt/config files; change request tickets.

CM-1.2 Human Review for Model Changes

Requirement: A domain expert reviews a sample of outputs before promotion to production.

Evidence: Sign-off records; review checklist.

CM-1.3 Staged Rollouts and Rollbacks

Requirement: Use canary or staged rollouts with defined rollback triggers for prompt or model updates.

Evidence: Release pipeline logs; rollback criteria documentation.

CM-1.4 Change Impact Assessments

Requirement: Document the expected impact of changes on safety, reliability, and customer-facing behavior.

Evidence: Change impact templates; review meeting notes.

Standards alignment

Where this fits

Our framework draws from established AI security standards:

  • OWASP LLM Top 10: Attack categories and mitigations.
  • MITRE ATLAS: Adversary tactics for ML systems.
  • NIST AI RMF: Risk management approach.
  • ISO 42001: AI management system requirements.

Not a checklist exercise

We don't just verify you have policies. We test whether your controls actually work by attacking your system.

A control that exists on paper but fails under attack gets flagged as a finding.

See how you measure up

We'll assess your system against this framework and show you exactly where you stand.

Get an assessment