Alignment Vectors enable real-time LLM behavior control without retraining
📚 https://arxiv.org/abs/2410.19206
🎯 Original Problem:
Current LLM alignment methods require full retraining whenever preferences change and depend on reward models at inference time, making them resource-intensive and inflexible.
-----
🔧 Solution in this Paper:
• Introduces Alignment Vectors (AV) - the difference between an aligned model's parameters and its base model's parameters, encoding a preference dimension as a direction in weight space
• Enables dynamic behavior adjustment at inference time through simple linear operations on the weights (θ_new = θ_base + λ·AV); see the sketch after this list
• Tests three proficiency levels: Expert opinion (Exp), Generic response (Gen), Avoidance (Avd)
• Focuses on three domains: Medical, Legal, Financial
• Created 38k domain-specific queries using the PersonaHub dataset and the CreatePersona method
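A minimal sketch of extracting an AV and steering with it, assuming two checkpoints of the same architecture saved as PyTorch state dicts; the file paths, helper names, and λ value here are illustrative, not the paper's code:

```python
import torch

def alignment_vector(base_state, aligned_state):
    """AV = aligned parameters minus base parameters, tensor by tensor."""
    return {name: aligned_state[name] - base_state[name]
            for name in base_state}

def steer(base_state, av, lam):
    """theta_new = theta_base + lambda * AV, applied before inference."""
    return {name: base_state[name] + lam * av[name]
            for name in base_state}

# Hypothetical checkpoint paths; both must share the same architecture.
base = torch.load("base_model.pt")
aligned = torch.load("aligned_model.pt")

av = alignment_vector(base, aligned)
steered = steer(base, av, lam=0.5)  # lambda scales alignment strength
# model.load_state_dict(steered); then run inference as usual.
```

Because steering is just weight arithmetic, λ can be changed between requests with no retraining and no reward model in the loop.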
-----
💡 Key Insights:
• AVs are transferable across different fine-tuning stages of the same model
• Reduces inference cost by 50% compared to prompt engineering
• Enables multidomain diverse preference alignment 12x faster than retraining
• The basic approach works only across LLMs that share the same architecture
• Multidomain alignment requires a grid search over the per-domain λ coefficients (see the sketch below)
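A hedged sketch of that grid search, combining one AV per domain as θ_base + Σ_d λ_d·AV_d; the `evaluate` scoring callback, λ grid, and per-domain AV dict are placeholders, not the paper's implementation:

```python
import itertools

def combine(base_state, domain_avs, coeffs):
    """theta_base + sum over domains of lambda_d * AV_d."""
    merged = {name: tensor.clone() for name, tensor in base_state.items()}
    for domain, lam in coeffs.items():
        for name in merged:
            merged[name] += lam * domain_avs[domain][name]
    return merged

def grid_search(base_state, domain_avs, evaluate, grid=(0.0, 0.3, 0.5, 1.0)):
    """Score every lambda combination and keep the best merged model."""
    best_score, best_coeffs = float("-inf"), None
    for combo in itertools.product(grid, repeat=len(domain_avs)):
        coeffs = dict(zip(domain_avs, combo))  # e.g. {"medical": 0.5, ...}
        score = evaluate(combine(base_state, domain_avs, coeffs))
        if score > best_score:
            best_score, best_coeffs = score, coeffs
    return best_coeffs, best_score
```

Even an exhaustive search like this only re-merges weights, which is why the paper reports multidomain alignment far faster than retraining.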
-----
📊 Results:
• Achieves 93% safety preference accuracy at λ=1
• Medical domain: 95% expert accuracy at λ=0.5
• Financial domain: 85% expert accuracy at λ=0.3
• Legal domain: 100% expert accuracy at λ=0.3
• Human evaluation achieved Cohen's kappa score of 0.84