Attacks Introduction
deepteam
offers 10+ SOTA, research-backed attack methods such as prompt injection, linear jailbreaking, and leetspeak to expose undesirable vulnerabilities elicited from your LLM app. Attacks on their own are not specific to any vulnerability.
Quick Summary
Attack methods in deepteam
are all about enhancing or progressing existing "baseline" adversarial attacks — harmful prompts that target a specific vulnerability. These "baseline" attacks, are usually simulated from a specific vulnerability.
You would enhance a baseline attack for single-turn use cases, while progressing a single attack into a multi-turn one if you're building conversational use cases.
deepteam
also allows you to combine single and multi-turn attack methods, where you would progress a baseline attack using a multi-turn attack like LinearJailbreaking
while applying turn-level enhancements such as Roleplay
for each step in the progression.
How Does It Work?
deepteam
's attacks work by first simulating simplistic "baseline" adversarial attacks, before progressively applying various attack methods such as PromptInjection
to create more sophisticated versions akin to what a malicious user would be doing. This is known as attack enhancement.
The target LLM's outputs to these attacks are then evaluated to determine if your LLM system is weak against a certain vulnerability. Each attack is simulated based on a certain vulnerability.
In deepteam
, there are two main categories of attacks:
- Single-turn
- Multi-turn
Each category of attacks contain their own list of attack methods. For example, you'll find a few jailbreakings in multi-turn attacks, while something like Leetspeak
for single-turn ones. Defining your attack method is as simple as importing them from the attacks
module and provide them in the attacks
parameter of the red_team()
method.
from deepteam.attaks.single_turn import Leetspeak
from deepteam.attacks.multi_turn import LinearJailbreaking
from deepteam.vulnerabilities import Bias
from deepteam import red_team
from somewhere import your_callback
risk_assessment = red_team(
attacks=[Leetspeak(), LinearJailbreaking()],
vulnerabilities=[Bias()],
model_callback=your_callback
)
deepteam
randomly samples each attack method per vulnerability type during red teaming.
Single vs Multi-Turn
In deepteam
the attacks are classified into two types:
- Single-turn enhancements, and
- Multi-turn progressions
Enhancements
Single-turn enhancements are just one-shot attack enhancements, they take a single attack and enhance it without involving the usage of target LLM. An example of single-turn attack is ROT13
, here's how it works:
Single-turn attacks only replace the input
of a RTTestCase
, the red_team
method in deepteam
takes care of passing this enhanced attack to the target LLM and populating the actual_output
to make the test case ready for evaluation.
Progressions
Multi-turn progressions are much more sophisticated and controlled attacks, they converse with the target LLM like a user and adjust their attacks to better probe the target LLM into generating harmful outputs. An example of multi-turn attacks is LinearJailbreaking
, here's how it works:
Click here for a detailed example on LinearJailbreaking
.
Multi-turn attacks simulate an entire conversation with the target LLM and track their exchanges in RTTurn
s. After the execution of multi-turn attacks, they return a list of turns
that can be populated in a RTTestCase
which can then be evaluated.
Single-Turn Enhancements
Single-turn are categorized into two main types:
- Encoding-based, and
- One-shot
Encoding-based attack enhancements apply simple encoding techniques, such as character rotation, to obscure the baseline attack. One-shot attack enhancements use a single LLM pass to modify the attack, for instance, by embedding it within a math problem.
Encoding-based
deepteam
supports multiple encoding-based attack enhancements that work by transforming the baseline attack using different encoding or encryption techniques. These enhancements are designed to obscure the content of the attack, making it more difficult for content filters to detect harmful intent. Encoding-based attacks leverage techniques like text rotation, character substitution, or encoding schemes to alter the visible content while retaining its malicious meaning.
One-shot
One-shot attack enhancements utilize an LLM to creatively modify the baseline attack in a single pass. These enhancements disguise or restructure the attack in ways that evade detection, making them more creative and adaptable to different contexts. The LLM applies the enhancement, which adds an element of unpredictability, making these attacks harder to detect with traditional methods.
Unlike encoding-based enhancements, which are deterministic, one-shot enhancements are non-deterministic and variable. This means there is a chance of non-compliance by the LLM, and in such cases, the enhancement can be retried up to 5 times.
Using a powerful but older model like gpt-4o-mini
can increase your enhancement success rates.
There are a total of 14 single-turn attacks available on deepteam
:
Base64
GrayBox
Leetspeak
MathProblem
Multilingual
PromptInjection
Roleplay
ROT-13
ContextPoisoning
GoalRedirection
InputBypass
PermissionEscalation
LinguisticConfusion
SystemOverride
Multi-Turn Progressions
Multi-turn progressions are iterative red-teaming flows that start from a base attack taken from a known vulnerability and adapt this attack across conversational turns. Each turn uses the target model's previous reply to refine the attack by rephrasing, changing persona, or escalating intensity. This technique allows you to probe weaknesses gradually instead of requesting a single direct exploit.
The flow continues until the model produces a harmful output or the configured turn/depth limits are reached; this bounded approach makes tests repeatable and reduces risk. Because each turn builds on prior context, multi-turn attacks are especially effective at revealing subtle failure modes such as inconsistent refusals, context leakage, and escalation vulnerabilities.
There are a total of 5 multi-turn attacks available on deepteam
:
Selecting Your Attack Strategy
When choosing what attacks to use in your red teaming tests, you should consider 3 important points:
- Is your LLM application a chatbot?
- What is the purpose of your LLM application?
- Does your LLM have have access to external resources or is it isolated to its knowledge base?
Depending on your answers to the above questions, you can select the attck strategies to use against your LLM application.
- For chatbots, multi-turn attack are the optimal strategy.
- Depending on the purpose of your LLM application, you can choose encoding-based or normal simulator based enhancement attacks.
- If your LLM application has access to external resources and is a chatbot, using multi-turn attacks with turn level enhancements is recommended.
If you need more guidance on choosing your attack strategy, feel free to ask us in discord. We'll be happy to have you there 🙂