Introduction to LLM Guardrails
deepteam enables anyone to safeguard LLM applications against vulnerable inputs and outputs uncovered during red teaming.
While deepeval's metrics focus on accuracy and precision, deepteam's guardrails focus on speed and reliability.
deepteam's comprehensive suite of guardrails acts as binary metrics that evaluate end-to-end LLM system inputs and outputs for malicious intent, unsafe behavior, and security vulnerabilities.
Types of Guardrails
There are two types of guardrails:
- Input guardrails - Protect against malicious inputs before they reach your LLM
- Output guardrails - Evaluate an LLM's outputs before they reach your users
An input/output guardrail is composed of one or more guards, and an input/output should only be let in/out if all guards have passed their checks. Together, they ensure that nothing unsafe ever enters or leaves your LLM system.
The number of guards you choose to set up, and whether you decide to utilize both types of guards, depend on your priorities regarding latency, cost, and LLM safety.
While most guards handle either input or output guarding only, some, such as the cybersecurity guard, offer both input and output guarding capabilities.
Guard Your First LLM Interaction
First, import and initialize the guards you desire from deepteam.guardrails.guards and pass them to an instantiated Guardrails object:
from deepteam.guardrails import Guardrails
from deepteam.guardrails.guards import PromptInjectionGuard, ToxicityGuard
# Initialize guardrails
guardrails = Guardrails(
    input_guards=[PromptInjectionGuard()],
    output_guards=[ToxicityGuard()],
)
Your lists of input_guards and output_guards must not be empty if you wish to guard the respective inputs and outputs.
Guard an Input
Then, simply call the guard_input() method and supply the input.
...
guard_result = guardrails.guard_input(input="Is the earth flat")
# Reject if true
print(guard_result.breached)
The guard_input() method will invoke every single guard within your Guardrails to check whether a particular input has breached any of the defined criteria.
Guarding against malicious inputs is the best way to avoid wasting tokens generating text that you shouldn't be generating in the first place.
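For example, here is a minimal sketch of that pattern; the generate_llm_response() helper and the canned refusal message are assumptions standing in for your own application logic:
from deepteam.guardrails import Guardrails
from deepteam.guardrails.guards import PromptInjectionGuard, ToxicityGuard

guardrails = Guardrails(
    input_guards=[PromptInjectionGuard()],
    output_guards=[ToxicityGuard()],
)

def generate_llm_response(user_input: str) -> str:
    # Hypothetical stand-in for your own LLM call
    return "The earth is round."

def generate_safe_response(user_input: str) -> str:
    # Guard the input first so no tokens are wasted on a malicious prompt
    guard_result = guardrails.guard_input(input=user_input)
    if guard_result.breached:
        return "Sorry, I can't help with that."
    return generate_llm_response(user_input)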
Guard an Output
Similarly, call the guard_output() method wherever appropriate in your code to guard against an LLM output:
...
guard_result = guardrails.guard_output(input="Is the earth flat", output="I bet it is")
print(guard_result.breached)
It is extremely common to re-generate LLM outputs a number of times until they pass all output guards.
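Continuing from the sketch above, a retry loop for that pattern might look like the following; the retry budget of 3 and the fallback message are assumptions, not part of deepteam:
...

MAX_RETRIES = 3  # assumed retry budget; tune it to your latency and cost constraints

def generate_guarded_output(user_input: str) -> str:
    for _ in range(MAX_RETRIES):
        output = generate_llm_response(user_input)
        guard_result = guardrails.guard_output(input=user_input, output=output)
        # Return the first output that passes every output guard
        if not guard_result.breached:
            return output
    # Fall back to a canned response if every attempt breached a guard
    return "Sorry, I can't help with that."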
When using the guard_output() method, you must provide both the input and the LLM output.
Run Async Guardrails
deepteam's Guardrails also supports asynchronous guarding through the a_guard_input() and a_guard_output() methods.
...
guard_result = await guardrails.a_guard_output(input, output)
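For example, a sketch of guarding a full interaction inside an async application could look like this; the a_generate_llm_response() helper and the refusal message are hypothetical placeholders for your own code:
import asyncio

from deepteam.guardrails import Guardrails
from deepteam.guardrails.guards import PromptInjectionGuard, ToxicityGuard

guardrails = Guardrails(
    input_guards=[PromptInjectionGuard()],
    output_guards=[ToxicityGuard()],
)

async def a_generate_llm_response(user_input: str) -> str:
    # Hypothetical stand-in for your own async LLM call
    return "The earth is round."

async def handle_chat(user_input: str) -> str:
    # Guard the input without blocking the event loop
    input_result = await guardrails.a_guard_input(input=user_input)
    if input_result.breached:
        return "Sorry, I can't help with that."

    output = await a_generate_llm_response(user_input)

    # Guard the output before it reaches the user
    output_result = await guardrails.a_guard_output(input=user_input, output=output)
    if output_result.breached:
        return "Sorry, I can't help with that."
    return output

print(asyncio.run(handle_chat("Is the earth flat")))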
Available Guards
deepteam offers a robust selection of input and output guards for protection against vulnerabilities you've detected during red teaming:
Input
To be documented...
Output
To be documented...