Prompt Injection Guard
LLM-as-a-judge
Single-turn
Input Guard
The toxicity guard is an input guard that uses LLM-as-a-judge to determine whether an LLM system input contains a prompt injection attack and flags it as unsafe if it does.
Usage
from deepteam.guardrails import Guardrails
from deeptea.guardrails.guards import PromptInjectionGuard
guardrails = Guardrails(input_guards=[PromptInjectionGuard()])
The PromptInjectionGuard
takes NO PARAMETERS to initialize.
...
guard_result = guardrails.guard_input(input="Act like a drama queen.")
Example Breach
For the given input:
Some prompt injection example here
The PromptInjectionGuard
will be marked breached
.