"The Butterfly Effect of Altering Prompts: How Small Changes and Jailbreaks Affect Large Language Model Performance"

The podcast on this paper is generated with Google's Illuminate.

This paper reveals that LLMs are surprisingly sensitive to minor prompt variations.

📚 https://arxiv.org/abs/2401.03729

Original Problem 🤔:

LLMs are widely used for data labeling, but practitioners make many different prompt design choices, from output formats to jailbreaks. No systematic study had examined how these prompt variations affect model predictions and reliability.

-----

Solution in this Paper 🔧:

→ Tested 24 prompt variations across 3 categories:

- Output formats (JSON, CSV, XML, etc.)

- Minor perturbations (spaces, greetings, etc.)

- Jailbreaks (AIM, Dev Mode v2, etc.)

→ Evaluated on 11 classification tasks using ChatGPT and Llama 2 models

→ Analyzed prediction changes, accuracy impacts, and similarity between variations (see the sketches after this list)

→ Used multidimensional scaling to visualize relationships between prompt variations
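
To make the methodology concrete, here is a minimal sketch of how prompt variations might be applied to a labeling task and how prediction changes could be counted. The variation wordings and the get_prediction helper are illustrative placeholders, not the paper's actual prompts or code.

```python
from typing import Callable

# Hypothetical prompt variations, loosely mirroring the paper's three categories.
# The exact wordings used in the paper differ; these are illustrative only.
VARIATIONS = {
    "baseline":    lambda p: p,
    "json_format": lambda p: p + "\nRespond in JSON format.",
    "csv_format":  lambda p: p + "\nRespond in CSV format.",
    "extra_space": lambda p: " " + p,            # minor perturbation
    "greeting":    lambda p: "Hello! " + p,      # minor perturbation
    "thank_you":   lambda p: p + " Thank you.",  # minor perturbation
}

def count_prediction_changes(
    samples: list[str],
    base_prompt: str,
    get_prediction: Callable[[str, str], str],  # (prompt, sample) -> label; the model call goes here
) -> dict[str, int]:
    """Count how many predictions flip relative to the baseline prompt."""
    preds = {
        name: [get_prediction(variant(base_prompt), sample) for sample in samples]
        for name, variant in VARIATIONS.items()
    }
    baseline = preds["baseline"]
    return {
        name: sum(p != b for p, b in zip(labels, baseline))
        for name, labels in preds.items()
        if name != "baseline"
    }
```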
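
The multidimensional-scaling view can then be approximated by embedding each variation according to how often its predictions disagree with the others. This sketch assumes scikit-learn and uses the pairwise disagreement rate as the dissimilarity, a plausible choice rather than the paper's exact procedure.

```python
import numpy as np
from sklearn.manifold import MDS

def embed_variations(preds: dict[str, list[str]]) -> dict[str, np.ndarray]:
    """Place each prompt variation in 2D so that variations whose
    predictions disagree more often land farther apart."""
    names = list(preds)
    n = len(names)
    dist = np.zeros((n, n))  # pairwise disagreement rates
    for i in range(n):
        for j in range(n):
            a, b = preds[names[i]], preds[names[j]]
            dist[i, j] = np.mean([x != y for x, y in zip(a, b)])
    coords = MDS(n_components=2, dissimilarity="precomputed",
                 random_state=0).fit_transform(dist)
    return dict(zip(names, coords))
```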

-----

Key Insights from this Paper 💡:

→ Even tiny changes like adding a space can cause 500+ prediction changes out of 11,000 samples

→ Larger models (ChatGPT, Llama 2 70B) are more robust to variations than smaller ones

→ No single prompt variation consistently performs best across tasks

→ Jailbreaks cause massive disruptions - over 2500 prediction changes in ChatGPT

→ Output format specifications can significantly impact accuracy

-----

Results 📊:

→ 10% of predictions change just by specifying an output format

→ ChatGPT's JSON Checkbox feature caused more prediction changes than specifying JSON in the prompt

→ Jailbreaks led to 90% invalid responses in ChatGPT

→ Majority voting across variations improved accuracy by 1-2%
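
One way to read the majority-voting result: for each sample, keep the label most commonly assigned across the prompt variations. A minimal sketch; the tie-breaking rule and the exact set of variations aggregated are assumptions.

```python
from collections import Counter

def majority_vote(preds: dict[str, list[str]]) -> list[str]:
    """Combine predictions from all prompt variations by per-sample majority vote."""
    n_samples = len(next(iter(preds.values())))
    voted = []
    for i in range(n_samples):
        votes = Counter(labels[i] for labels in preds.values())
        voted.append(votes.most_common(1)[0][0])  # ties broken by first-seen order
    return voted
```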
