Skip to main content

Attacks Introduction

deepteam offers 10+ SOTA, research-backed attack methods such as prompt injection, linear jailbreaking, and leetspeak to expose undesirable vulnerabilities elicited from your LLM app. Attacks on their own are not specific to any vulnerability.

Quick Summary

Attack methods in deepteam are all about enhancing or progressing existing "baseline" adversarial attacks — harmful prompts that target a specific vulnerability. These "baseline" attacks, are usually simulated from a specific vulnerability.

You would enhance a baseline attack for single-turn use cases, while progressing a single attack into a multi-turn one if you're building conversational use cases.

info

deepteam also allows you to combine single and multi-turn attack methods, where you would progress a baseline attack using a multi-turn attack like LinearJailbreaking while applying turn-level enhancements such as Roleplay for each step in the progression.

How Does It Work?

deepteam's attacks work by first simulating simplistic "baseline" adversarial attacks, before progressively applying various attack methods such as PromptInjection to create more sophisticated versions akin to what a malicious user would be doing. This is known as attack enhancement.

note

The target LLM's outputs to these attacks are then evaluated to determine if your LLM system is weak against a certain vulnerability. Each attack is simulated based on a certain vulnerability.

In deepteam, there are two main categories of attacks:

  • Single-turn
  • Multi-turn

Each category of attacks contain their own list of attack methods. For example, you'll find a few jailbreakings in multi-turn attacks, while something like Leetspeak for single-turn ones. Defining your attack method is as simple as importing them from the attacks module and provide them in the attacks parameter of the red_team() method.

from deepteam.attaks.single_turn import Leetspeak
from deepteam.attacks.multi_turn import LinearJailbreaking
from deepteam.vulnerabilities import Bias
from deepteam import red_team
from somewhere import your_callback

risk_assessment = red_team(
attacks=[Leetspeak(), LinearJailbreaking()],
vulnerabilities=[Bias()],
model_callback=your_callback
)
DID YOU KNOW?

deepteam randomly samples each attack method per vulnerability type during red teaming.

Single vs Multi-Turn

In deepteam the attacks are classified into two types:

  • Single-turn enhancements, and
  • Multi-turn progressions

Enhancements

Single-turn enhancements are just one-shot attack enhancements, they take a single attack and enhance it without involving the usage of target LLM. An example of single-turn attack is ROT13, here's how it works:

Single-turn attacks only replace the input of a RTTestCase, the red_team method in deepteam takes care of passing this enhanced attack to the target LLM and populating the actual_output to make the test case ready for evaluation.

Progressions

Multi-turn progressions are much more sophisticated and controlled attacks, they converse with the target LLM like a user and adjust their attacks to better probe the target LLM into generating harmful outputs. An example of multi-turn attacks is LinearJailbreaking, here's how it works:

info

Click here for a detailed example on LinearJailbreaking.

Multi-turn attacks simulate an entire conversation with the target LLM and track their exchanges in RTTurns. After the execution of multi-turn attacks, they return a list of turns that can be populated in a RTTestCase which can then be evaluated.

Single-Turn Enhancements

Single-turn are categorized into two main types:

  • Encoding-based, and
  • One-shot

Encoding-based attack enhancements apply simple encoding techniques, such as character rotation, to obscure the baseline attack. One-shot attack enhancements use a single LLM pass to modify the attack, for instance, by embedding it within a math problem.

Encoding-based

deepteam supports multiple encoding-based attack enhancements that work by transforming the baseline attack using different encoding or encryption techniques. These enhancements are designed to obscure the content of the attack, making it more difficult for content filters to detect harmful intent. Encoding-based attacks leverage techniques like text rotation, character substitution, or encoding schemes to alter the visible content while retaining its malicious meaning.

LangChain

One-shot

One-shot attack enhancements utilize an LLM to creatively modify the baseline attack in a single pass. These enhancements disguise or restructure the attack in ways that evade detection, making them more creative and adaptable to different contexts. The LLM applies the enhancement, which adds an element of unpredictability, making these attacks harder to detect with traditional methods.

LangChain

Unlike encoding-based enhancements, which are deterministic, one-shot enhancements are non-deterministic and variable. This means there is a chance of non-compliance by the LLM, and in such cases, the enhancement can be retried up to 5 times.

tip

Using a powerful but older model like gpt-4o-mini can increase your enhancement success rates.

There are a total of 14 single-turn attacks available on deepteam:

Multi-Turn Progressions

Multi-turn progressions are iterative red-teaming flows that start from a base attack taken from a known vulnerability and adapt this attack across conversational turns. Each turn uses the target model's previous reply to refine the attack by rephrasing, changing persona, or escalating intensity. This technique allows you to probe weaknesses gradually instead of requesting a single direct exploit.

LangChain

The flow continues until the model produces a harmful output or the configured turn/depth limits are reached; this bounded approach makes tests repeatable and reduces risk. Because each turn builds on prior context, multi-turn attacks are especially effective at revealing subtle failure modes such as inconsistent refusals, context leakage, and escalation vulnerabilities.

There are a total of 5 multi-turn attacks available on deepteam:

Selecting Your Attack Strategy

When choosing what attacks to use in your red teaming tests, you should consider 3 important points:

  1. Is your LLM application a chatbot?
  2. What is the purpose of your LLM application?
  3. Does your LLM have have access to external resources or is it isolated to its knowledge base?

Depending on your answers to the above questions, you can select the attck strategies to use against your LLM application.

  • For chatbots, multi-turn attack are the optimal strategy.
  • Depending on the purpose of your LLM application, you can choose encoding-based or normal simulator based enhancement attacks.
  • If your LLM application has access to external resources and is a chatbot, using multi-turn attacks with turn level enhancements is recommended.

If you need more guidance on choosing your attack strategy, feel free to ask us in discord. We'll be happy to have you there 🙂