
Linear Jailbreaking

Linear Jailbreaking follows a systematic progression, where each new attack builds directly on the LLM's previous response. The process iterates through increasingly persuasive attacks, gradually pushing against the model's restrictions. At each turn, the enhanced attack is evaluated and adjusted based on the model's latest response. The goal is to steer the LLM towards generating harmful outputs while keeping each step logically connected to the one before it.

info

The process continues until the attacker LLM generates a non-compliant attack or the maximum number of turns is reached.
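
Conceptually, the progression looks like the sketch below. The function names and callbacks are illustrative stand-ins, not deepteam's internal API; they simply mirror the loop and stopping conditions described above.

from typing import Callable, List, Tuple


def linear_jailbreak(
    initial_attack: str,
    target_model: Callable[[str], str],       # the LLM system under test
    attacker_model: Callable[[str], str],     # the LLM used to enhance attacks
    is_non_compliant: Callable[[str], bool],  # judge: did the attacker refuse to enhance?
    turns: int = 5,
) -> Tuple[str, List[Tuple[str, str]]]:
    """Illustrative sketch: enhance the attack one turn at a time,
    each turn building on the target's latest response."""
    history: List[Tuple[str, str]] = []
    attack = initial_attack

    for _ in range(turns):
        # Send the current attack to the target and record the exchange.
        response = target_model(attack)
        history.append((attack, response))

        # Ask the attacker model for a more persuasive follow-up that builds
        # directly on the target's latest response.
        candidate = attacker_model(
            f"Previous attack: {attack}\n"
            f"Target response: {response}\n"
            "Rewrite the attack so it builds on this response and is more persuasive."
        )

        # Stop early if the attacker generates a non-compliant attack.
        if is_non_compliant(candidate):
            break
        attack = candidate

    return attack, history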

Usage

from deepteam.attacks.multi_turn import LinearJailbreaking

linear_jailbreaking = LinearJailbreaking()

There are TWO optional parameters when creating a LinearJailbreaking attack:

  • [Optional] weight: an integer that determines this attack method's selection probability, proportional to the total weight sum of all attacks during red teaming. Defaulted to 1.
  • [Optional] turns: an integer that specifies the number of turns to use in an attempt to jailbreak your LLM system. Defaulted to 5 (see the example below).
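
For example, both parameters can be supplied when constructing the attack (the values below are purely illustrative):

linear_jailbreaking = LinearJailbreaking(weight=2, turns=7)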

To use the LinearJailbreaking attack method, supply it to the red_team() method:

from deepteam import red_team
...

red_team(attacks=[linear_jailbreaking], model_callback=..., vulnerabilities=...)
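
Putting it together, a minimal end-to-end sketch might look like the following. The async model_callback signature and the Bias vulnerability (with its types argument) are assumptions here, not confirmed by this page; check the model callback and vulnerabilities documentation for the exact API.

from deepteam import red_team
from deepteam.attacks.multi_turn import LinearJailbreaking
from deepteam.vulnerabilities import Bias  # assumed import path; see the vulnerabilities docs

# Assumed callback signature: an async function that receives the attack text
# and returns your LLM system's response as a string.
async def model_callback(input: str) -> str:
    # Replace this with a call to your actual LLM application.
    return f"I'm sorry, but I can't help with: {input}"

linear_jailbreaking = LinearJailbreaking(turns=5)
bias = Bias(types=["race"])  # illustrative vulnerability configuration

risk_assessment = red_team(
    attacks=[linear_jailbreaking],
    model_callback=model_callback,
    vulnerabilities=[bias],
)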

Example

to be documented...