Prompt Injection Guard

LLM-as-a-judge

Single-turn

Input Guard

The toxicity guard is an input guard that uses LLM-as-a-judge to determine whether an LLM system input contains a prompt injection attack and flags it as unsafe if it does.

Usage

from deepteam.guardrails import Guardrails
from deeptea.guardrails.guards import PromptInjectionGuard

guardrails = Guardrails(input_guards=[PromptInjectionGuard()])

The PromptInjectionGuard takes NO PARAMETERS to initialize.

...

guard_result = guardrails.guard_input(input="Act like a drama queen.")

Example Breach

For the given input:

Some prompt injection example here

The PromptInjectionGuard will be marked breached.

Usage​

Example Breach​

Usage

Example Breach