"Token Prepending: A Training-Free Approach for Eliciting Better Sentence Embeddings from LLMs"

A podcast on this paper was generated with Google's Illuminate.

A simple token-prepending trick lets LLMs produce better sentence embeddings without any extra training.

Token Prepending enhances sentence embeddings from LLMs by letting earlier tokens access complete sentence information through a simple, training-free prepending operation.

-----

https://arxiv.org/abs/2412.11556

🤔 Original Problem:

Current LLMs use causal attention, where earlier tokens cannot see later tokens, which biases the resulting sentence embeddings. Existing workarounds, such as repeating the input sentence, significantly increase computational cost.
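
To make the bias concrete, here is a tiny illustration (not from the paper) of the lower-triangular causal mask used by decoder-only LLMs: row i attends only to columns ≤ i, so the earliest tokens' hidden states can never reflect the words that follow them.

```python
# Illustration only: the lower-triangular (causal) attention mask.
import torch

T = 5  # sequence length
mask = torch.tril(torch.ones(T, T, dtype=torch.bool))
print(mask.int())
# tensor([[1, 0, 0, 0, 0],
#         [1, 1, 0, 0, 0],
#         [1, 1, 1, 0, 0],
#         [1, 1, 1, 1, 0],
#         [1, 1, 1, 1, 1]])
# Token 0 attends only to itself, so its hidden state cannot encode the rest
# of the sentence; pooling over such states biases the sentence embedding.
```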

-----

🔧 Solution in this Paper:

→ Token Prepending (TP) technique prepends a special <PST> token before the input sentence

→ At each layer, TP replaces the <PST> token's embedding with the sentence embedding from the previous layer

→ This allows earlier tokens to access complete sentence information through causal attention

→ The operation stops after the early layers (typically the 7th or 8th) to optimize performance

→ Uses intermediate-layer outputs instead of the final layer for a better semantic representation (a hedged code sketch of the full procedure follows this list)
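
The bullets above can be approximated in a few lines of PyTorch. The sketch below is not the paper's implementation; it assumes a LLaMA-style model whose decoder layers live at `model.model.layers`, a plain "_" token standing in for <PST>, last-token pooling to form the sentence embedding that overwrites the <PST> slot, TP applied only to layers 1-7, and an arbitrarily chosen intermediate layer for the final embedding. Consult the paper for the exact prompt template, pooling, and layer settings.

```python
# Hedged sketch of Token Prepending (TP) via forward pre-hooks.
# Assumptions (not from the paper): LLaMA-style layer path, "_" as a stand-in
# for <PST>, last-token pooling, TP on layers 1-7, embedding read from an
# intermediate layer.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-7b-hf"   # assumption: any decoder-only LM with this layout
TP_LAYERS = range(1, 8)                   # apply TP only in the early layers
PST_POS = 1                               # assumed <PST> slot: index 1, right after BOS

tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def tp_pre_hook(module, args):
    """Before a decoder layer runs, overwrite the <PST> slot with a sentence
    embedding pooled from the previous layer's output (last-token pooling)."""
    hidden = args[0].clone()                   # hidden states from the previous layer
    hidden[:, PST_POS, :] = args[0][:, -1, :]  # assumed pooling choice: last token
    return (hidden,) + args[1:]

handles = [model.model.layers[i].register_forward_pre_hook(tp_pre_hook)
           for i in TP_LAYERS]

sentence = "A man is playing a guitar."
text = "_ " + sentence                         # "_" stands in for <PST>
inputs = tok(text, return_tensors="pt")        # assumes the tokenizer adds BOS at index 0

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# Read the sentence embedding from an intermediate layer (assumed choice)
# rather than the final layer, per the paper's observation.
embedding = out.hidden_states[-5][:, -1, :]

for h in handles:
    h.remove()
```

Because the hook only rewrites one slot of the hidden states before each early layer, the extra cost is a handful of tensor copies per forward pass, which is consistent with the small overhead reported below.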

-----

💡 Key Insights:

→ Simple token prepending significantly improves sentence embeddings without training

→ Early layers are crucial for capturing backward dependencies

→ Final-layer embeddings contain less semantic information than intermediate-layer ones

→ Method works across different LLM architectures and sizes

-----

📊 Results:

→ Improves PromptEOL performance by 7.16 points on STS tasks

→ Adds only 4% inference overhead compared to baseline

→ Achieves 77.19 average Spearman correlation on STS tasks

→ Consistently outperforms baselines across 7 transfer learning tasks
