AdaptiveDecoder lets LLMs switch between creative and precise modes automatically during generation
Neural networks can now learn when to be creative versus precise while generating text
LLMs currently use a fixed temperature setting for every task, limiting their ability to balance creative and factual responses. This paper introduces Adaptive Decoding, which dynamically adjusts the sampling temperature during generation, letting models optimize performance across diverse tasks, from math to creative writing.
-----
https://arxiv.org/abs/2411.09661
🤔 Original Problem:
A single fixed temperature serves all tasks poorly: low temperatures work better for factual tasks like math, while high temperatures are needed for creative writing. Manual per-task temperature tuning is time-consuming and inflexible.
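For context, temperature scaling simply divides the logits before the softmax. A minimal PyTorch sketch (not from the paper) of standard fixed-temperature sampling, showing why one value must compromise:

```python
import torch
import torch.nn.functional as F

def sample_with_temperature(logits: torch.Tensor, temperature: float) -> torch.Tensor:
    """Sample one token id from logits scaled by a fixed temperature.

    A low temperature (e.g. 0.1) sharpens the distribution toward the argmax,
    which suits math/factual tasks; a high one (e.g. 1.2) flattens it, which
    suits creative writing. A single fixed value must trade one off for the other.
    """
    probs = F.softmax(logits / max(temperature, 1e-6), dim=-1)
    return torch.multinomial(probs, num_samples=1)

# Example: the same fixed temperature applies to every token of every task.
logits = torch.randn(1, 32000)   # one decoding step over a 32k-token vocab
token = sample_with_temperature(logits, temperature=0.7)
```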
-----
🔧 Solution in this Paper:
→ The paper adds an AdaptiveDecoder neural layer that predicts optimal temperature values during generation (sketched in code after this list)
→ It can work at sequence-level (one temperature per response) or token-level (different temperatures for each token)
→ The decoder learns temperature selection through Latent Preference Optimization (LPO), which trains on pairs of preferred vs rejected outputs
→ LPO enables learning discrete choices like temperature values by optimizing for final response quality
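The paper doesn't ship code; the sketch below shows one plausible wiring for a token-level AdaptiveDecoder head: a small MLP over the model's last hidden state that outputs a distribution over a discrete set of candidate temperatures. The class name, MLP shape, and temperature bins are all illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveDecoderHead(nn.Module):
    """Illustrative token-level head: maps the LM's last hidden state to a
    distribution over a discrete set of candidate temperatures."""

    def __init__(self, hidden_size: int, temperatures=(0.1, 0.4, 0.7, 1.0, 1.3)):
        super().__init__()
        self.register_buffer("temps", torch.tensor(temperatures))
        self.proj = nn.Sequential(
            nn.Linear(hidden_size, hidden_size // 4),
            nn.ReLU(),
            nn.Linear(hidden_size // 4, len(temperatures)),
        )

    def forward(self, hidden_state: torch.Tensor):
        # hidden_state: (batch, hidden_size) at the current decoding step
        temp_logits = self.proj(hidden_state)                     # (batch, n_bins)
        idx = torch.multinomial(F.softmax(temp_logits, dim=-1), 1)
        return self.temps[idx.squeeze(-1)], temp_logits           # chosen temps, bin logits

# Inside a decoding loop (sketch): pick a temperature per token, then sample.
# temp, _ = head(h_t)                                   # (batch,)
# probs = F.softmax(token_logits / temp.unsqueeze(-1), dim=-1)
# next_token = torch.multinomial(probs, num_samples=1)
```

For training, LPO operates on preferred-vs-rejected response pairs; a DPO-style objective over the head's latent temperature choices is one plausible reading. This is a sketch of the idea, not the paper's exact loss:

```python
def lpo_loss(logp_temps_chosen: torch.Tensor,
             logp_temps_rejected: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO-style preference loss over latent temperature choices (sketch).

    Each input is the summed log-probability the head assigned to the
    temperature bins it picked while generating that response, shape (batch,).
    Minimizing this pushes the head toward the preferred response's choices.
    """
    return -F.logsigmoid(beta * (logp_temps_chosen - logp_temps_rejected)).mean()
```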
-----
💡 Key Insights:
→ Dynamic temperature selection outperforms fixed temperatures across diverse tasks
→ Token-level adaptation allows fine-grained control within a single response
→ The method generalizes to other decoding parameters beyond temperature (see the top-p sketch below)
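To illustrate that last point: the same head design could be pointed at a different discrete parameter, e.g. predicting a per-token nucleus (top-p) threshold instead of a temperature. Standard top-p sampling for reference; again a sketch, not from the paper:

```python
import torch
import torch.nn.functional as F

def sample_top_p(logits: torch.Tensor, top_p: float) -> torch.Tensor:
    """Nucleus (top-p) sampling; a learned head like the one sketched above
    could choose top_p per token instead of a temperature."""
    sorted_logits, sorted_idx = torch.sort(logits, descending=True, dim=-1)
    probs = F.softmax(sorted_logits, dim=-1)
    cum = torch.cumsum(probs, dim=-1)
    # Drop tokens outside the smallest set whose cumulative mass reaches top_p
    # (the top-ranked token is always kept, since its preceding mass is 0).
    sorted_logits[cum - probs > top_p] = float("-inf")
    next_sorted = torch.multinomial(F.softmax(sorted_logits, dim=-1), num_samples=1)
    return sorted_idx.gather(-1, next_sorted)
```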
-----
📊 Results:
→ Reduced 3-gram repetitions by 42% compared to greedy decoding
→ Outperformed all fixed temperature baselines on UltraFeedback, Creative Writing, and GSM8K tasks
→ Learned to select appropriate temperatures: low for math, high for creative tasks