LLMs can now directly process vector data from diverse domains by projecting it into their embedding space
The paper shows that, by aligning input data with an LLM's embedding space through lightweight projectors, LLMs can effectively process and learn from the projected vectors, a capability the authors term Vector-ICL.
Vector-ICL (in-context learning with continuous vectors) outperforms few-shot text ICL and domain-specific models across various tasks:
- Text classification: up to 98.16% accuracy (SST-2 dataset)
- Text summarization: 20.49 ROUGE-L score (XLSum dataset)
---------
📚 https://arxiv.org/abs/2410.05629
Original Problem 🔍:
LLMs excel at in-context learning (ICL) with textual data, but their capabilities with continuous vectors from diverse domains remain largely unexplored.
-----
Solution in this Paper 🧠:
• Vector-ICL: Technique enabling LLMs to perform ICL on continuous vector representations
• Embedding projection: Aligns input data with LLM's embedding space using lightweight projectors
• Two-step process: pretrain projectors with a language-modeling objective, then fine-tune them on task-specific data (see the sketch after this list)
• Applicable across various modalities: Text, numerical data, molecules, time series, graphs, and brain fMRI
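A minimal sketch of the projector-pretraining step, not the authors' released code. Assumptions: `llm` is a HuggingFace-style frozen causal LM, `v` is a batch of vectors from a pretrained domain encoder, and `text_ids` are token ids of text paired with each vector; the `Projector` and `lm_pretrain_step` names are hypothetical.

```python
import torch
import torch.nn as nn

class Projector(nn.Module):
    """Lightweight projector mapping encoder vectors into the LLM embedding space."""
    def __init__(self, d_enc: int, d_llm: int):
        super().__init__()
        self.proj = nn.Linear(d_enc, d_llm)

    def forward(self, v: torch.Tensor) -> torch.Tensor:
        return self.proj(v)

def lm_pretrain_step(llm, projector, v, text_ids):
    # Step 1 (pretraining): prepend the projected vector and ask the frozen LLM
    # (parameters with requires_grad=False) to predict the paired text;
    # the loss only updates the projector.
    tok_emb = llm.get_input_embeddings()(text_ids)         # (B, T, d_llm)
    vec_emb = projector(v).unsqueeze(1)                    # (B, 1, d_llm)
    inputs_embeds = torch.cat([vec_emb, tok_emb], dim=1)   # (B, T+1, d_llm)
    ignore = torch.full((text_ids.size(0), 1), -100,
                        dtype=torch.long, device=text_ids.device)
    labels = torch.cat([ignore, text_ids], dim=1)          # mask the vector slot
    return llm(inputs_embeds=inputs_embeds, labels=labels).loss
```

Step 2 (task fine-tuning) would reuse the same projector but swap in the target task's supervision; in both steps only the projector's parameters are updated.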
--------
Architecture 🏗️
Encoder embeddings and LLM token embeddings live in different vector spaces, so a lightweight projector bridges them:
-> Takes any input (text, numbers, brain fMRI, time series, graphs)
-> Input first goes through an encoder to get embeddings
-> These embeddings are transformed via "projector" to match LLM's embedding space dimension
-> The projected vectors are spliced into prompts as special placeholder tokens (box tokens □), as sketched below
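A minimal sketch of how a projected vector could be spliced into a prompt at the box-token position; the helper name and prompt are hypothetical, assuming a HuggingFace-style tokenizer and causal LM.

```python
import torch

def build_prompt_embeds(llm, tokenizer, projector, v, prompt="Sentiment of □ :"):
    # Split the prompt around the placeholder so the projected vector takes its slot.
    left, right = prompt.split("□")
    embed = llm.get_input_embeddings()
    left_ids = tokenizer(left, return_tensors="pt").input_ids
    right_ids = tokenizer(right, add_special_tokens=False,
                          return_tensors="pt").input_ids
    vec_emb = projector(v).reshape(1, 1, -1)               # (1, 1, d_llm)
    return torch.cat([embed(left_ids), vec_emb, embed(right_ids)], dim=1)

# Usage: score the next token conditioned on the projected vector.
# logits = llm(inputs_embeds=build_prompt_embeds(llm, tok, projector, v)).logits[:, -1]
```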
-----
Key Insights from this Paper 💡:
• LLMs can process and learn from continuous vector representations beyond discrete tokens
• Vector-ICL bridges non-textual domains with LLMs without extensive retraining
• Enhances LLMs' numerical reasoning and cross-modal capabilities
• Demonstrates LLMs' flexibility in adapting to different input representations
• Enables efficient use of LLMs in scientific and technical domains
-----
More on the Results 📊:
- Time-series and graph classification: Surpasses domain-specific models
- Brain fMRI decoding: Exceeds random baselines in text reconstruction
- Molecule captioning: demonstrates effectiveness on cross-modal tasks