ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs
How much LLMs get confused by different ways of asking the same question.
Meet ProSA: The framework that catches LLMs playing favorites with prompts
Original Problem 🔍:
LLMs exhibit prompt sensitivity: rephrasing the same request can change their performance, affecting reliability and user satisfaction. Existing research largely overlooks instance-level variation and subjective evaluations.
Solution in this Paper 🛠️:
• ProSA framework for evaluating prompt sensitivity in LLMs
• Novel PromptSensiScore (PSS) metric for quantifying sensitivity (see the sketch after this list)
• Instance-level analysis across multiple tasks and datasets
• Utilizes decoding confidence to explain underlying mechanisms
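To make the PSS idea concrete, here is a minimal Python sketch of an instance-level sensitivity score. The pairwise-discrepancy formulation and the name `prompt_sensi_score` are illustrative assumptions, not the paper's exact definition: it scores each instance by how much the model's result differs across prompt variants, then averages.

```python
import itertools
from typing import Dict, List

def prompt_sensi_score(scores: Dict[str, List[float]]) -> float:
    """Hypothetical PSS sketch: mean absolute pairwise score gap across
    prompt variants, averaged over instances.

    scores maps a prompt-template id to a list of per-instance scores
    (e.g. 1.0 if the model answered instance i correctly, else 0.0).
    All lists must cover the same instances in the same order.
    """
    templates = list(scores.keys())
    n_instances = len(scores[templates[0]])
    total, count = 0.0, 0
    for i in range(n_instances):
        # Compare every pair of prompt variants on the same instance.
        for a, b in itertools.combinations(templates, 2):
            total += abs(scores[a][i] - scores[b][i])
            count += 1
    return total / count if count else 0.0

# Toy usage: two prompt phrasings evaluated over three instances.
pss = prompt_sensi_score({
    "variant_a": [1.0, 0.0, 1.0],
    "variant_b": [1.0, 1.0, 0.0],
})
print(f"PSS = {pss:.2f}")  # higher => more prompt-sensitive
```

A score of 0 means every prompt variant produced the same outcome on every instance; larger values mean the model's answers depend more on how the question is phrased.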
Key Insights from this Paper 💡:
• Prompt sensitivity varies across datasets and models
• Larger models generally show enhanced robustness
• Few-shot examples alleviate prompt sensitivity
• Subjective evaluations are especially susceptible to prompt sensitivity
• Higher model confidence correlates with increased prompt robustness
Results 📊:
• Llama3-70B-Instruct demonstrates the highest robustness
• The transition from zero-shot to one-shot prompting brings a significant improvement in robustness
• Larger LLMs benefit more from additional few-shot examples
• LLMs are robust when answering straightforward queries but sensitive on complex tasks
• Prompt sensitivity reflects the model's confidence level
🧠 The analysis revealed that prompt sensitivity is essentially a reflection of the model's confidence level:
• Higher confidence in outputs correlates with increased robustness against semantic variations in prompts
• When a model is robust to prompts for a given instance (low PSS), it exhibits the highest decoding confidence
• Conversely, when the model is sensitive to prompts, its decoding confidence decreases
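A minimal sketch of the kind of decoding-confidence measure this refers to, assuming confidence is the mean probability the model assigns to its own generated tokens (token log-probabilities as most inference APIs return them). The function `decoding_confidence` and this exact averaging are illustrative assumptions, not the paper's precise definition.

```python
import math
from typing import List

def decoding_confidence(token_logprobs: List[float]) -> float:
    """Hypothetical confidence sketch: average per-token probability of the
    model's own generated answer (log-probs converted to probs, then averaged).

    token_logprobs: log-probability of each generated token, typically
    returned by an inference API alongside the completion.
    """
    if not token_logprobs:
        return 0.0
    return sum(math.exp(lp) for lp in token_logprobs) / len(token_logprobs)

# Toy usage: a confident answer vs. a hesitant one.
confident = decoding_confidence([-0.05, -0.02, -0.10])  # avg prob ~0.95
hesitant  = decoding_confidence([-1.20, -0.90, -1.60])  # avg prob ~0.30
print(confident, hesitant)  # lower confidence -> expect higher prompt sensitivity
```

Under the paper's finding, instances scored this way with low confidence would be the ones where PSS (and thus sensitivity to prompt phrasing) tends to be high.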