
"Focus On This, Not That! Steering LLMs With Adaptive Feature Specification"

The podcast on this paper is generated with Google's Illuminate.

The paper proposes a way to make LLMs ignore biased features and focus on what matters.

Focus Instruction Tuning (FIT) lets you tell LLMs exactly which features to use or ignore when generating responses.

Control your LLM's attention by specifying what it should focus on or ignore.

📚 https://arxiv.org/abs/2410.22944

🤖 Original Problem:

LLMs often rely on spurious or biased features learned from training data, leading to undesired behavior in new contexts. Current instruction-tuning methods don't allow dynamic control over which features the model should focus on or ignore.

-----

🔧 Solution in this Paper:

→ Introduces Focus Instruction Tuning (FIT), which trains LLMs to condition their responses on user-specified features

→ Uses natural language instructions to tell models which features to focus on or to ignore

→ Implements a focus instruction set with commands like "focus(feature)" and "ignore(feature)"

→ Minimizes the negative log-likelihood of responses conditioned on both the task input and the focus instruction

→ Controls co-occurrence rates between spurious features and labels during training, so shortcut features carry no reliable signal (a minimal end-to-end sketch follows this list)
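
Putting the pieces together, here is a minimal sketch of one FIT training step, assuming a HuggingFace causal LM. The prompt templates, feature names, and the `sample_with_cooccurrence` helper are illustrative assumptions, not the paper's exact implementation:

```python
# Minimal FIT training-step sketch (illustrative, not the authors' code).
# "gpt2" stands in for the 8B-13B models used in the paper; templates,
# feature names, and helpers are hypothetical.
import random
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def focus_instruction(focus=None, ignore=None):
    """Render focus(feature) / ignore(feature) commands as natural language."""
    parts = []
    if focus:
        parts.append(f"Base your answer only on the {focus}.")
    if ignore:
        parts.append(f"Ignore the {ignore} entirely.")
    return " ".join(parts)

def sample_with_cooccurrence(pos_pool, neg_pool, keyword, p=0.9):
    """Draw a (text, label) pair in which the spurious keyword co-occurs
    with the positive label at rate p (and with the negative label at
    1 - p), putting the feature-label correlation under our control."""
    label = random.choice(["positive", "negative"])
    text = random.choice(pos_pool if label == "positive" else neg_pool)
    attach = random.random() < (p if label == "positive" else 1 - p)
    return (f"{keyword} {text}" if attach else text), label

def fit_step(task, text, label, focus=None, ignore=None):
    """One gradient step: NLL of the response conditioned on the task
    instruction AND the focus instruction; prompt tokens are masked out."""
    prompt = f"{task}\n{focus_instruction(focus, ignore)}\nInput: {text}\nAnswer:"
    response = f" {label}"
    # Boundary between prompt and response tokens is approximate at the
    # seam; that is fine for a sketch.
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    ids = tokenizer(prompt + response, return_tensors="pt").input_ids
    labels = ids.clone()
    labels[:, :prompt_len] = -100  # score only the response tokens
    loss = model(input_ids=ids, labels=labels).loss
    loss.backward()
    return loss.item()

# Example: teach the model to ignore a spurious keyword ("bahamas") that
# we deliberately correlated with the positive label.
pos = ["The movie was wonderful.", "A delightful, moving film."]
neg = ["The movie was dull.", "A tedious, lifeless film."]
text, label = sample_with_cooccurrence(pos, neg, keyword="bahamas", p=0.9)
loss = fit_step("Classify the sentiment of the input.",
                text, label, ignore="keyword 'bahamas'")
print(f"loss = {loss:.3f}")
```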

-----

💡 Key Insights:

→ Models can be dynamically steered at inference time by specifying which features to focus on (see the inference sketch after this list)

→ FIT generalizes well to new unseen features not present during training

→ Works effectively across multiple NLP tasks like sentiment analysis and question-answering

→ Can mitigate social bias by instructing models to ignore demographic categories
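
As a usage illustration, a FIT-trained model can be steered at inference time with the same kind of instruction it was trained on. The model name, prompt wording, and placeholder context below are assumptions for the sketch, not the paper's exact prompts:

```python
# Inference-time steering sketch (illustrative). "gpt2" is a stand-in for
# a FIT-trained checkpoint; the ignore-instruction mirrors the training
# template, here targeting demographic features as in the BBQ experiments.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = (
    "Answer the question using the context.\n"
    "Ignore the demographic attributes of the people mentioned; "
    "base your answer only on the stated evidence.\n"
    "Context: <context here>\nQuestion: <question here>\nAnswer:"
)
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=20,
        pad_token_id=tokenizer.eos_token_id,  # gpt2 has no pad token
    )
# Print only the newly generated answer tokens.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:]))
```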

-----

📊 Results:

→ Achieved high focus accuracy across all test conditions on the Spurious Sentiment (SS) dataset

→ Successfully generalized to complex features on the SMNLI dataset

→ Effectively mitigated bias on the BBQ dataset by instructing the model to ignore demographic features

→ Demonstrated robust performance across model sizes from 8B to 13B parameters
