Toxicity Guard

LLM-as-a-judge
Output guard

The toxicity guard is an output guard that uses LLM-as-a-judge to determine whether an LLM system's output contains toxic language, and flags the output as unsafe if it does.

Usage

from deepteam.guardrails.guards import ToxicityGuard

# Create the guard; it is used as an output guard in a guardrails pipeline
toxicity_guard = ToxicityGuard()
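
In practice, guards are typically attached to a guardrails pipeline rather than called on their own. The following is a minimal sketch, assuming a Guardrails class that accepts an output_guards list and exposes a guard_output(input, output) method returning a result with a breached flag; treat these names as assumptions rather than a definitive API reference:

from deepteam.guardrails import Guardrails
from deepteam.guardrails.guards import ToxicityGuard

# Attach the guard to a guardrails pipeline as an output guard
# (assumed constructor: Guardrails(input_guards=..., output_guards=...))
guardrails = Guardrails(
    input_guards=[],
    output_guards=[ToxicityGuard()],
)

# Screen an LLM response before returning it to the user
# (assumed method: guard_output(input, output) -> result with .breached)
guard_result = guardrails.guard_output(
    input="What do you think of my essay?",
    output="Your essay makes some strong points but needs clearer structure.",
)

if guard_result.breached:
    # Block or replace the response instead of surfacing toxic content
    print("Output blocked: toxic language detected.")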

Example Breach

For the given output:

"You're an absolute idiot and everyone thinks you're worthless."

The ToxicityGuard will be marked as breached, since the output contains direct insults and demeaning language.
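
Continuing the sketch above (with the same assumed guard_output API and the guardrails object already constructed), passing this output through the pipeline would surface the breach:

# The judge should flag the direct insults in the example output
guard_result = guardrails.guard_output(
    input="What do you think of me?",
    output="You're an absolute idiot and everyone thinks you're worthless.",
)

print(guard_result.breached)  # expected: True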
