Linear Jailbreaking
Linear Jailbreaking follows a systematic progression in which each new attack builds directly on the target LLM's previous response. The attacker iterates through increasingly persuasive attacks, gradually pushing the boundaries of the model's restrictions. At each turn, the enhanced attack is evaluated and adjusted based on the model's response, with the goal of steering the LLM towards generating harmful outputs while maintaining a logical conversational flow.
The process continues until the attacker LLM generates a non-compliant attack or the maximum number of turns is reached.
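Conceptually, the turn-by-turn loop can be sketched as follows. This is an illustrative outline only: the helper functions enhance_attack, is_non_compliant, and is_jailbroken are hypothetical stand-ins for the attacker LLM, the compliance check, and the success judge, and are not part of deepteam's public API.

# Hypothetical sketch of the linear jailbreaking loop; the helpers below
# are illustrative stand-ins, not deepteam APIs.

def enhance_attack(attack: str, last_response: str) -> str:
    # In practice an attacker LLM rewrites the attack so that it builds
    # on the target's previous response; here we just annotate it.
    return f"{attack} [refined after: {last_response[:40]}]"

def is_non_compliant(attack: str) -> bool:
    # In practice a judge LLM checks whether the attacker refused to
    # produce a usable attack; here we use a trivial keyword check.
    return "cannot help" in attack.lower()

def is_jailbroken(response: str) -> bool:
    # In practice a judge LLM decides whether the target's response is
    # harmful; here we conservatively assume it never is.
    return False

def linear_jailbreak(model_callback, baseline_attack: str, turns: int = 5) -> str:
    attack, response = baseline_attack, ""
    for _ in range(turns):
        attack = enhance_attack(attack, response)   # build on the last response
        if is_non_compliant(attack):
            break                                   # attacker refused to escalate
        response = model_callback(attack)
        if is_jailbroken(response):
            break                                   # harmful output obtained
    return response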
Usage
from deepteam.attacks.multi_turn import LinearJailbreaking
linear_jailbreaking = LinearJailbreaking()
There are TWO optional parameters when creating a LinearJailbreaking attack:
- [Optional] weight: an integer that determines this attack method's selection probability, proportional to the total weight sum of all attacks during red teaming. Defaulted to 1.
- [Optional] turns: an integer that specifies the number of turns to use in an attempt to jailbreak your LLM system. Defaulted to 5.
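For example, both parameters can be supplied at construction time (the values below are purely illustrative):

from deepteam.attacks.multi_turn import LinearJailbreaking

# Illustrative configuration: select this attack twice as often and
# allow up to 7 conversation turns per jailbreak attempt.
linear_jailbreaking = LinearJailbreaking(weight=2, turns=7)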
To use the LinearJailbreaking attack method, supply it to the red_team() method:
from deepteam import red_team
...
red_team(attacks=[linear_jailbreaking], model_callback=..., vulnerabilities=...)
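Putting it together, a minimal end-to-end call might look like the sketch below. It assumes the Bias vulnerability from deepteam.vulnerabilities and a placeholder callback; replace the callback body with a call to your actual LLM application.

from deepteam import red_team
from deepteam.vulnerabilities import Bias
from deepteam.attacks.multi_turn import LinearJailbreaking

async def model_callback(input: str) -> str:
    # Placeholder: call your LLM application here and return its response.
    return f"I'm sorry but I can't answer this: {input}"

bias = Bias(types=["race"])
linear_jailbreaking = LinearJailbreaking(turns=5)

red_team(
    model_callback=model_callback,
    vulnerabilities=[bias],
    attacks=[linear_jailbreaking],
)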
Example
to be documented...