Introduction

The DeepTeam Frameworks define structured methodologies for AI red teaming and risk assessment. Each framework maps to a recognized safety or security standard, helping you test your model's robustness against real-world adversarial behavior, dataset risks, and system vulnerabilities.

DeepTeam supports multiple frameworks — from dataset-based testing (BeaverTails, Aegis) to security and governance standards (MITRE ATLAS, NIST AI RMF, OWASP Top 10 for LLMs).
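
Every example on this page passes a model_callback that wraps the LLM under test. Below is a minimal sketch of what such a callback could look like; the async "prompt in, response out" signature and the OpenAI client are illustrative assumptions, not requirements stated on this page.

from openai import AsyncOpenAI

client = AsyncOpenAI()

# Illustrative callback: receives an attack prompt and returns the target model's reply.
# The exact signature expected by red_team is assumed here for demonstration purposes.
async def your_model_callback(input: str) -> str:
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": input}],
    )
    return response.choices[0].message.content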

Available Frameworks

Here is the list of frameworks available in DeepTeam:

OWASP Top 10 for LLMs

The OWASP Top 10 for LLMs framework identifies the most critical security risks in LLM applications. It reflects the 2025 OWASP edition, covering vulnerabilities in RAG systems, agents, and model integrations.

  • Tests for prompt injection, system prompt leakage, vector weaknesses, and more
  • Simulates real-world exploit attempts and harmful outputs
  • Ideal for application-level AI security assessments
from deepteam import red_team
from deepteam.frameworks import OWASPTop10

# Run the OWASP Top 10 for LLMs framework with 10 simulated attacks
owasp = OWASPTop10(num_attacks=10)

risk = red_team(
    model_callback=your_model_callback,
    framework=owasp
)

Learn more about OWASP Top 10 for LLMs

NIST AI RMF

The NIST AI Risk Management Framework (RMF) provides a structured approach to identifying, managing, and mitigating AI risks. It focuses on trustworthiness, robustness, and accountability — aligning LLM behavior with regulatory and ethical standards.

  • Tests safety, fairness, and robustness dimensions
  • Suitable for compliance-driven AI governance workflows
from deepteam import red_team
from deepteam.frameworks import NIST

# Run the NIST AI RMF framework with 10 simulated attacks
nist = NIST(num_attacks=10)

risk = red_team(
    model_callback=your_model_callback,
    framework=nist
)

Learn more about NIST AI RMF

MITRE ATLAS

The MITRE ATLAS framework integrates the MITRE ATLAS knowledge base, focusing on adversarial tactics and techniques used against AI systems.
It evaluates system resilience across attack phases like Reconnaissance, Resource Development, Initial Access, and Impact.

  • Tests adversarial behavior patterns from the ATLAS taxonomy
  • Detects vulnerabilities such as prompt injection, data poisoning, and exfiltration
  • Ideal for AI security simulation and penetration testing
from deepteam import red_team
from deepteam.frameworks import MITREATLAS
from somewhere import your_model_callback

# Run the MITRE ATLAS framework with 10 simulated attacks
atlas = MITREATLAS(num_attacks=10)

risk = red_team(
    model_callback=your_model_callback,
    framework=atlas
)

Learn more about MITRE ATLAS

BeaverTails

The BeaverTails framework integrates the PKU BeaverTails dataset — a large, human-labeled dataset of harmful and borderline prompts. It performs dataset-driven red teaming, surfacing model weaknesses across categories like abuse, misinformation, and privacy violations.

  • Uses real-world harmful prompts instead of synthetic generation
  • Validates content safety and refusal behavior
from deepteam import red_team
from deepteam.frameworks import BeaverTails

# Run the BeaverTails framework with 10 dataset-sourced attacks
beaver = BeaverTails(num_attacks=10)

risk = red_team(
    model_callback=your_model_callback,
    framework=beaver
)

Learn more about BeaverTails

Aegis

The Aegis framework integrates the NVIDIA Aegis AI Content Safety Dataset, which follows NVIDIA's content safety taxonomy across 13 harm categories. It provides a comprehensive safety evaluation using real human-labeled harmful content.

  • Tests for harmful user messages across multiple safety dimensions
  • Useful for evaluating model robustness under real-world safety challenges
from deepteam import red_team
from deepteam.frameworks import Aegis

# Run the Aegis framework with 10 dataset-sourced attacks
aegis = Aegis(num_attacks=10)

risk = red_team(
    model_callback=your_model_callback,
    framework=aegis
)

Learn more about Aegis

tip

You can customize existing frameworks by adding more attacks and vulnerabilities, specializing red teaming for your LLM's use case, as sketched below.
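
As a hypothetical sketch only: the vulnerabilities and attacks arguments passed alongside framework, and the import paths shown, are assumptions for illustration rather than usage confirmed on this page.

from deepteam import red_team
from deepteam.frameworks import OWASPTop10
from deepteam.vulnerabilities import Bias                  # assumed import path
from deepteam.attacks.single_turn import PromptInjection   # assumed import path

owasp = OWASPTop10(num_attacks=10)

# Hypothetical: supply extra vulnerabilities and attacks on top of the framework's defaults
risk = red_team(
    model_callback=your_model_callback,
    framework=owasp,
    vulnerabilities=[Bias()],
    attacks=[PromptInjection()],
)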