
Vector-ICL: In-context Learning with Continuous Vector Representations

Generated this podcast with Google's Illuminate.

LLMs can now directly process vector data from diverse domains by projecting it into their embedding space.

By aligning input data with an LLM's embedding space through lightweight projectors, the paper shows that LLMs can effectively process and learn from these projected vectors, a technique the authors term Vector-ICL.

Vector-ICL (in-context learning over continuous vectors) outperforms standard few-shot ICL and domain-specific models across a range of tasks:

- Text classification: up to 98.16% accuracy (SST-2 dataset)

- Text summarization: 20.49 ROUGE-L score (XLSum dataset)

---------

📚 https://arxiv.org/abs/2410.05629

Original Problem 🔍:

LLMs excel at in-context learning (ICL) with textual data, but their capabilities with continuous vectors from diverse domains remain unexplored.

-----

Solution in this Paper 🧠:

• Vector-ICL: Technique enabling LLMs to perform ICL on continuous vector representations

• Embedding projection: Aligns input data with LLM's embedding space using lightweight projectors

• Two-step process: projectors are first pretrained with a language-modeling objective, then fine-tuned on specific tasks (sketched after this list)

• Applicable across various modalities: Text, numerical data, molecules, time series, graphs, and brain fMRI
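A minimal PyTorch sketch of the first stage, assuming a frozen GPT-2 as a stand-in LLM, a frozen domain encoder producing 768-dim vectors, and a single linear layer as the lightweight projector; these specific choices are illustrative, not the paper's exact setup:

```python
# Hedged sketch: train only a lightweight projector with a language-modeling
# objective while the LLM (and the domain encoder) stay frozen.
# "gpt2", the 768-dim encoder, and the single linear layer are assumptions.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

llm = AutoModelForCausalLM.from_pretrained("gpt2")   # stand-in frozen LLM
tok = AutoTokenizer.from_pretrained("gpt2")
llm.requires_grad_(False)                            # LLM weights are not updated

enc_dim = 768                                        # domain encoder output dim (assumed)
llm_dim = llm.config.hidden_size

# The "lightweight projector": one linear map into the LLM's embedding space.
projector = nn.Linear(enc_dim, llm_dim)
opt = torch.optim.AdamW(projector.parameters(), lr=1e-4)

def lm_pretrain_step(vec, target_text):
    """Stage 1: `vec` is a frozen encoder embedding of some input; the projected
    vector is prepended to the embeddings of `target_text`, and the projector is
    trained so the frozen LLM can predict that text."""
    ids = tok(target_text, return_tensors="pt").input_ids        # (1, T)
    tok_emb = llm.get_input_embeddings()(ids)                    # (1, T, llm_dim)
    proj = projector(vec).unsqueeze(0).unsqueeze(1)              # (1, 1, llm_dim)
    inputs = torch.cat([proj, tok_emb], dim=1)
    # No loss on the position occupied by the projected vector (-100 = ignore).
    labels = torch.cat([torch.full((1, 1), -100), ids], dim=1)
    loss = llm(inputs_embeds=inputs, labels=labels).loss
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

The second stage would reuse the same projector but swap the reconstruction text for task-specific targets, with the LLM still frozen.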

--------

Architecture 🏗️

Input embeddings and LLM token embeddings live in different vector spaces, so the pipeline is:

-> Takes any input (text, numbers, brain fMRI, time series, graphs)

-> Input first goes through an encoder to get embeddings

-> These embeddings are transformed via a lightweight "projector" to match the LLM's embedding dimension

-> The projected vectors then fill special slots (box tokens □) in the prompt (see the sketch below)
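A hedged sketch of how such a prompt could be assembled at inference time: each □ slot is filled with a projected vector, the surrounding text is embedded with the LLM's own token embeddings, and everything is concatenated into one `inputs_embeds` sequence. The GPT-2 model, the `" -> label\n"` template, and the random stand-in encoder vectors are assumptions for illustration:

```python
# Hedged sketch: assemble a Vector-ICL prompt where "box token" slots are
# filled by projected continuous vectors instead of discrete tokens.
# Requires a transformers version that accepts inputs_embeds in generate().
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

llm = AutoModelForCausalLM.from_pretrained("gpt2")
tok = AutoTokenizer.from_pretrained("gpt2")
embed = llm.get_input_embeddings()

enc_dim, llm_dim = 768, llm.config.hidden_size
projector = nn.Linear(enc_dim, llm_dim)   # assumed already trained (see earlier sketch)

def build_prompt_embeds(examples, query_vec):
    """`examples`: list of (encoder_vector, label_text) in-context pairs;
    `query_vec`: encoder vector to classify. Returns (1, L, llm_dim)."""
    pieces = []
    for vec, label in examples:
        pieces.append(projector(vec).view(1, 1, -1))               # □ slot
        ids = tok(f" -> {label}\n", return_tensors="pt").input_ids
        pieces.append(embed(ids))                                  # text segment
    pieces.append(projector(query_vec).view(1, 1, -1))             # query □ slot
    pieces.append(embed(tok(" ->", return_tensors="pt").input_ids))
    return torch.cat(pieces, dim=1)

# Example: two in-context demonstrations, then a query vector (random stand-ins).
demos = [(torch.randn(enc_dim), "positive"), (torch.randn(enc_dim), "negative")]
prompt = build_prompt_embeds(demos, torch.randn(enc_dim))
out = llm.generate(inputs_embeds=prompt, max_new_tokens=3,
                   pad_token_id=tok.eos_token_id)
print(tok.decode(out[0]))
```

Because the prompt is passed as `inputs_embeds` rather than token ids, the LLM needs no new vocabulary entries for the box tokens; only the small projector has to be trained.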

-----

Key Insights from this Paper 💡:

• LLMs can process and learn from continuous vector representations beyond discrete tokens

• Vector-ICL bridges non-textual domains with LLMs without extensive retraining

• Enhances LLMs' numerical reasoning and cross-modal capabilities

• Demonstrates LLMs' flexibility in adapting to different input representations

• Enables efficient use of LLMs in scientific and technical domains

-----

More on the Results 📊:

- Time-series and graph classification: Surpasses domain-specific models

- Brain fMRI decoding: Exceeds random baselines in text reconstruction

- Demonstrates effectiveness in cross-modal tasks like molecule captioning
