Train your small AI models with help from the big ones, privately
FedCoLLM is a framework that jointly improves a server-side LLM and client-side Small Language Models (SLMs) while preserving data privacy, combining parameter-efficient federated learning with mutual knowledge distillation.
-----
https://arxiv.org/abs/2411.11707
🤔 **Original Problem**:
→ Organizations can't directly share sensitive domain data with LLM providers for fine-tuning
→ Small companies lack resources to fine-tune large models locally
→ No existing solution for mutual knowledge transfer between server LLMs and client SLMs
-----
🛠️ **Solution in this Paper**:
→ FedCoLLM deploys lightweight LoRA adapters as bridges between clients and server
→ Runs mutual knowledge distillation between the server LLM and the aggregated client SLM over a public auxiliary dataset (see the sketch after this list)
→ Implements secure aggregation to protect privacy during knowledge transfer
→ Enables bidirectional knowledge flow while keeping raw data private
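
A minimal sketch of what such a mutual distillation step could look like, assuming both models expose next-token logits over a shared vocabulary. The toy stand-in models, temperature, and optimizer settings are illustrative assumptions, not the paper's exact objective or configuration:

```python
import torch
import torch.nn.functional as F
from torch import nn

def mutual_distillation_step(llm, slm, aux_tokens, opt_llm, opt_slm, temperature=2.0):
    """One bidirectional distillation step on a public auxiliary batch.

    Each model is nudged toward the other's detached output distribution,
    so knowledge flows both ways without exchanging any private client data.
    """
    llm_logits = llm(aux_tokens)   # [batch, vocab]
    slm_logits = slm(aux_tokens)

    # The aggregated client-side model learns from the server LLM's distribution ...
    loss_slm = F.kl_div(
        F.log_softmax(slm_logits / temperature, dim=-1),
        F.softmax(llm_logits.detach() / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    # ... and the server LLM learns from the aggregated client model.
    loss_llm = F.kl_div(
        F.log_softmax(llm_logits / temperature, dim=-1),
        F.softmax(slm_logits.detach() / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    opt_slm.zero_grad(); loss_slm.backward(); opt_slm.step()
    opt_llm.zero_grad(); loss_llm.backward(); opt_llm.step()
    return loss_llm.item(), loss_slm.item()

# Toy usage with stand-in models (a real run would use the LoRA-adapted LLM and SLM).
vocab, dim = 1000, 64
llm = nn.Sequential(nn.Embedding(vocab, dim), nn.Flatten(1), nn.Linear(dim * 8, vocab))
slm = nn.Sequential(nn.Embedding(vocab, dim), nn.Flatten(1), nn.Linear(dim * 8, vocab))
aux = torch.randint(0, vocab, (4, 8))      # a batch from the public auxiliary set
opt_l = torch.optim.AdamW(llm.parameters(), lr=1e-4)
opt_s = torch.optim.AdamW(slm.parameters(), lr=1e-4)
print(mutual_distillation_step(llm, slm, aux, opt_l, opt_s))
```

Because only logits on public auxiliary data are exchanged, neither side ever sees the other's raw training corpus.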
-----
💡 **Key Insights**:
→ Parameter-efficient adapters cut communication costs to just 0.23-0.29% of full-model fine-tuning (rough arithmetic after this list)
→ Mutual knowledge distillation enables effective knowledge transfer without raw data sharing
→ Federated framework with secure aggregation preserves client privacy
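
As a rough illustration of why adapter exchange is so cheap, the back-of-the-envelope arithmetic below counts LoRA parameters for a 7B-class model. The hidden size, layer count, wrapped projections, and rank are assumptions chosen for illustration, not the configuration reported in the paper:

```python
# Why LoRA adapters shrink the per-round payload (illustrative numbers only).
hidden = 4096          # transformer hidden size (assumed)
layers = 32            # number of transformer blocks (assumed)
proj_per_layer = 4     # attention projections wrapped with LoRA: q, k, v, o (assumed)
rank = 8               # LoRA rank (assumed)

full_params = 7e9                                            # full-model exchange
lora_params = layers * proj_per_layer * 2 * hidden * rank    # A (d x r) + B (r x d) per projection

print(f"LoRA payload: {lora_params / 1e6:.1f}M params "
      f"({lora_params / full_params:.2%} of a full-model exchange)")
# -> roughly 8.4M params, about 0.12% of 7B; wrapping more modules or using a
#    larger rank lands in the few-tenths-of-a-percent range the paper reports.
```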
-----
📊 **Results**:
→ Achieves a 41-66% improvement over zero-shot performance across different model combinations
→ Matches centralized fine-tuning performance while incurring only 0.23-0.29% of its communication cost
→ Shows consistent gains for both server LLMs and client SLMs across multiple NLP tasks