Combining graph convolutional networks (GCNs) with early exits significantly speeds up LLM-based recommendation.
This paper introduces a dual-speed optimization for LLM recommender systems using GCN retrieval and early exit strategies to balance speed and accuracy.
-----
https://arxiv.org/abs/2501.02173
🤔 Original Problem:
RAG-enhanced LLM recommenders face two major bottlenecks: slow retrieval times and computational overhead from processing long input sequences. These issues limit real-time applications.
-----
🔧 Solution in this Paper:
→ The system employs GCN-Retriever to generate user embeddings by analyzing interaction graphs, replacing slower LLM-based retrieval.
→ Multi-head early exit architecture allows model inference to terminate at intermediate layers when confidence thresholds are met.
→ Layer-specific learning rates optimize training: shallower layers receive higher rates because they capture generic features.
→ A probability-based exit criterion monitors prediction consistency across layers to determine optimal termination points.
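The last two points can be sketched as a check over the per-layer exit heads: stop as soon as a head is confident and its prediction agrees with the preceding layers. This is a minimal sketch, not the paper's implementation; the function name, threshold, tolerance, and patience values are illustrative assumptions.

```python
def early_exit_layer(layer_probs, conf_threshold=0.9,
                     consistency_tol=0.05, patience=2):
    """Return the index of the first layer whose exit head is both
    confident and consistent with the previous `patience` layers;
    fall back to the final layer if no intermediate head qualifies.
    `layer_probs`: per-layer predicted click probabilities
    (hypothetical values, one per exit head)."""
    for i in range(patience, len(layer_probs)):
        p = layer_probs[i]
        confident = max(p, 1.0 - p) >= conf_threshold
        window = layer_probs[i - patience:i]
        consistent = all(abs(p - q) <= consistency_tol for q in window)
        if confident and consistent:
            return i  # terminate inference here, skipping deeper layers
    return len(layer_probs) - 1  # no early exit: run the full model

# Heads stabilize near 0.96 by layer 4, so inference exits there:
exit_at = early_exit_layer([0.55, 0.80, 0.93, 0.95, 0.96, 0.96])
```

Skipping even one or two of the deepest transformer layers per request is where the throughput gain comes from.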
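The layer-specific learning rates can be illustrated with a depth-decayed schedule: shallow layers train fastest. The geometric decay below is a hypothetical choice for illustration, not the paper's actual schedule.

```python
def layerwise_lrs(num_layers, base_lr=1e-3, decay=0.8):
    """Assign each layer its own learning rate (hypothetical schedule:
    lr_k = base_lr * decay**k for layer k, so layer 0, which captures
    generic features, gets the highest rate)."""
    return [base_lr * decay ** k for k in range(num_layers)]
```

In a framework like PyTorch, a schedule like this would map onto per-layer optimizer parameter groups rather than a single global learning rate.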
-----
💡 Key Insights:
→ Averaging embeddings from multiple GCN layers provides better user representations than using just the final layer
→ Early exit strategies work best when combined with efficient retrieval mechanisms
→ Layer-specific training improves model stability and convergence
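The multi-layer averaging insight can be sketched in LightGCN style: propagate embeddings over the symmetrically normalized interaction graph and average every layer's output, rather than keeping only the final layer. This is an assumed simplification for illustration, not the paper's retriever code.

```python
import numpy as np

def gcn_user_embeddings(adj, x, num_layers=3):
    """Sketch of multi-layer GCN embedding averaging.
    `adj`: dense user-item adjacency over all nodes (assumed input);
    `x`: initial node embeddings. Propagates `num_layers` hops and
    averages all layer outputs, including the input layer."""
    deg = adj.sum(axis=1)
    deg = np.where(deg > 0, deg, 1.0)      # guard isolated nodes
    d = deg ** -0.5
    norm_adj = d[:, None] * adj * d[None, :]  # D^-1/2 A D^-1/2
    layers = [x]
    h = x
    for _ in range(num_layers):
        h = norm_adj @ h                    # one propagation hop
        layers.append(h)
    return np.mean(layers, axis=0)          # average across all layers
```

Averaging mixes local (shallow-layer) and multi-hop (deep-layer) signal into one representation, which is the intuition behind the first insight above.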
-----
📊 Results:
→ AUC improvements: BookCrossing (4.72%), Beauty (27.16%), Video Games (16.71%)
→ GCN-retriever achieves 72.51 AUC vs LLM-retriever's 69.05 AUC on BookCrossing
→ System maintains accuracy while improving RPS (requests per second) from 3.83 to 4.57 on the Video Games dataset