Skip to main content

Introduction to LLM Guardrails

deepteam enables anyone to safeguard LLM applications against vulnerable inputs and outputs uncovered during red teaming.

info

While deepeval's metrics focuses on accuracy and precision, deepteam's guardrails focuses on speed and reliability.

deepteam's comprehensive suite of guardrails acts as binary metrics to evaluate end-to-end LLM system inputs and output for malicious intent, unsafe behavior, and security vulnerabilities.

Types of Guardrails

There are two types of guardrails:

  • Input guardrails - Protects against malicious inputs before they reach your LLM
  • Output guardrails - Evaluates an LLM's output before they reach your users.

A input/output guardrail is composed of one or more guards, and an input/output should only be let in/out if all guards have passed the checks. Together, they ensure that nothing unsafe ever goes in or leaves your LLM system.

Datasets 1

The number of guards you choose to set up and whether you decide to utilize both types of guards depends on your priorities regarding latency, cost, and LLM safety.

tip

While most guards are only either for input or output guarding, some guards such as the cybersecurity guard offer both input and output guarding capabilities.

Guard Your First LLM Interaction

First import and initialize the guards you desire from deepteam.guardrails.guards and pass them to an instantiated Guardrails object:

from deepteam.guardrails import Guardrails
from deepteam.guardrails.guards import PromptInjectionGuard, ToxicityGuard

# Initialize guardrails
guardrails = Guardrails(
input_guards=[PromptInjectionGuard()]
output_guards=[ToxicityGuard()]
)
note

Your list of input_guards and output_guards must not be empty if you wish to guard the respecitve inputs and outputs.

Guard an Input

Then, simply call the guard_input() method and supply the input.

...

guard_result = guardrails.guard_input(input="Is the earth flat")
# Reject if true
print(guard_result.breached)

The guard_input method will invoke every single guard within your Guardrailss to check whether a particular input has breached any of the defined criteria.

tip

Guardrails used against malicious inputs is the best way to not waste tokens generating text that you shouldn't even be generating in the first place.

Guard an Output

Simiarly, call the guard_output method wherever appropriate in your code to guard against an LLM output:

...

guard_result = guardrails.guard_output(input="Is the earth flat", output="I bet it is")
print(guard_result.breached)

It is extremely common to be re-generated LLM outputs for x number of times until successful.

note

When using the guard_output() method, you must provide both the input and the LLM output.

Run Async Guardrails

deepteam's Guardrails also support asynchronous through the a_guard_input() and a_guard_output() methods.

...

guard_result = await guardrails.a_guard_output(input, output)

Available Guards

deepteam offers a robust selection of input and output guards for protection against vulnerabilities you've detected during red teaming:

Input

To be documented...

Output

To be documented...