
"Inference time LLM alignment in single and multidomain preference spectrum"

The podcast on this paper was generated with Google's Illuminate.

Alignment Vectors enable real-time LLM behavior control without retraining

📚 https://arxiv.org/abs/2410.19206

🎯 Original Problem:

Current LLM alignment methods require full retraining whenever preferences change and rely on reward models during inference, making them resource-intensive and inflexible.

-----

🔧 Solution in this Paper:

• Introduces Alignment Vectors (AVs): encoded representations of preference dimensions, computed by subtracting the base model's parameters from the aligned model's parameters (AV = θ_aligned − θ_base)

• Enables dynamic behavior adjustment at inference time through simple linear operations on the weights (θ_base + λ·AV), with no retraining; see the sketch after this list

• Tests three proficiency levels per domain: expert opinion (Exp), generic response (Gen), and avoidance (Avd)

• Focuses on three domains: Medical, Legal, Financial

• Creates 38k domain-specific queries using the PersonaHub dataset and the CreatePersona method
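
A minimal sketch of the two steps above, assuming two Hugging Face checkpoints that share an architecture; the checkpoint names and the λ value are placeholders, not from the paper:

```python
# Minimal sketch of alignment-vector extraction and lambda-scaled steering.
# Checkpoint names are hypothetical; both models must share one architecture.
import torch
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("org/base-model")
aligned = AutoModelForCausalLM.from_pretrained("org/domain-aligned-model")

# AV = theta_aligned - theta_base, one delta per floating-point tensor.
aligned_sd = aligned.state_dict()
av = {
    name: aligned_sd[name] - param
    for name, param in base.state_dict().items()
    if torch.is_floating_point(param)
}

def apply_av(model, av, lam):
    """Shift the model's weights to theta_base + lam * AV in place."""
    with torch.no_grad():
        sd = model.state_dict()
        for name, delta in av.items():
            sd[name].add_(lam * delta)  # state_dict tensors alias the live weights
    return model

steered = apply_av(base, av, lam=0.5)  # e.g. partial steering toward expert answers
```

Because steering is a single parameter-space addition, λ can be changed between requests without retraining or an inference-time reward model.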

-----

💡 Key Insights:

• AVs are transferable across different fine-tuning stages of the same model

• Reduces inference cost by 50% compared to prompt engineering

• Enables multidomain diverse preference alignment 12x faster than retraining

• The basic approach works only for LLMs that share the same architecture

• Multidomain alignment requires a grid search over the per-domain λ coefficients; see the sketch after this list
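
A sketch of what that multidomain grid search could look like, merging the model as θ_base + Σ_d λ_d·AV_d; the grid values and the `evaluate` scorer interface are assumptions, standing in for the paper's preference-accuracy metric:

```python
# Sketch of a grid search over per-domain lambdas for multidomain merging.
# `evaluate` is a hypothetical stand-in for a preference-accuracy metric.
import itertools
from typing import Callable, Dict, List, Sequence

import torch

StateDict = Dict[str, torch.Tensor]

def merge(base_sd: StateDict, avs: List[StateDict],
          lams: Sequence[float]) -> StateDict:
    """Return theta_base + sum over domains of lam_d * AV_d as a new state dict."""
    merged = {name: param.clone() for name, param in base_sd.items()}
    for av, lam in zip(avs, lams):
        for name, delta in av.items():
            merged[name] += lam * delta
    return merged

def grid_search(base_sd: StateDict, avs: List[StateDict],
                evaluate: Callable[[StateDict], float],
                grid: Sequence[float] = (0.0, 0.3, 0.5, 1.0)):
    """Score every combination of per-domain lambdas and keep the best one."""
    best_lams, best_score = None, float("-inf")
    for lams in itertools.product(grid, repeat=len(avs)):
        score = evaluate(merge(base_sd, avs, lams))
        if score > best_score:
            best_lams, best_score = lams, score
    return best_lams, best_score
```

Each candidate merge is a cheap linear operation over the weights, so the search never retrains the model; with a few grid values per domain the full sweep stays small (e.g. 4 values × 3 domains = 64 merges).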

-----

📊 Results:

• Achieves 93% safety preference accuracy at λ=1

• Medical domain: 95% expert accuracy at λ=0.5

• Financial domain: 85% expert accuracy at λ=0.3

• Legal domain: 100% expert accuracy at λ=0.3

• Human evaluation achieved a Cohen's kappa of 0.84
