PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles
Local models now protect your privacy while still accessing powerful LLM capabilities
Chain small and large LLMs to get best performance while keeping data private
🔍 Original Problem:
Users share sensitive personal information with proprietary LLMs during inference, raising privacy concerns. Local open-source models avoid this exposure, but they perform noticeably worse than proprietary models.
🛠️ Solution in this Paper:
• PAPILLON: A multi-stage pipeline where local models act as privacy-conscious proxies
• Uses DSPy prompt optimization to find optimal prompts for privacy preservation
• Two key components:
  – Prompt Creator: Generates privacy-preserving prompts
  – Information Aggregator: Combines responses while protecting PII
• Created the PUPA benchmark with 901 real-world user-LLM interactions containing PII
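The two-stage flow above can be sketched as follows. This is a minimal illustration, not the paper's implementation: PAPILLON drives both stages with a local LLM and DSPy-optimized prompts, whereas this sketch uses simple placeholder substitution and a stubbed remote model.

```python
# Hypothetical sketch of PAPILLON's two-stage flow. Function names,
# placeholder format, and the stubbed remote response are illustrative.

def create_prompt(user_query: str, pii_terms: list[str]) -> str:
    """Stage 1 (Prompt Creator): produce a privacy-preserving proxy prompt
    by replacing PII with neutral placeholders before any remote call."""
    prompt = user_query
    for i, term in enumerate(pii_terms):
        prompt = prompt.replace(term, f"[ENTITY_{i}]")
    return prompt

def aggregate(remote_response: str, pii_terms: list[str]) -> str:
    """Stage 2 (Information Aggregator): locally reinsert the sensitive
    details into the remote answer, so PII never leaves the user's machine."""
    response = remote_response
    for i, term in enumerate(pii_terms):
        response = response.replace(f"[ENTITY_{i}]", term)
    return response

# Usage with a stubbed proprietary model (a real pipeline would call an API)
query = "Write a reference letter for Jane Doe, my intern at Acme Corp."
pii = ["Jane Doe", "Acme Corp"]
proxy_prompt = create_prompt(query, pii)  # PII-free prompt sent remotely
remote = "Dear committee, [ENTITY_0] was an excellent intern at [ENTITY_1]."
final = aggregate(remote, pii)  # PII restored locally
```

The key design point is that only `proxy_prompt` ever reaches the proprietary model; the mapping from placeholders back to PII stays on the local side.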
💡 Key Insights:
• Simple redaction significantly lowers LLM response quality
• Privacy-conscious delegation can balance privacy and performance
• Smaller local models can effectively leverage larger models while protecting privacy
• Prompt optimization improves both quality and privacy metrics
📊 Results:
• Maintains 85.5% of the response quality of proprietary models
• Restricts privacy leakage to only 7.5%
• Outperforms simple redaction approaches
• Shows consistent improvement across different model sizes
❓ How does PAPILLON work?
PAPILLON is a multi-stage pipeline with:
• A Prompt Creator that generates privacy-preserving prompts for the proprietary model
• An Information Aggregator that combines the local and remote responses
• DSPy prompt optimization to find the best prompts for both modules
The system aims to maintain high response quality while minimizing privacy leakage.
❓ How is privacy leakage measured?
The evaluation uses:
• A Quality Preservation metric comparing pipeline outputs to proprietary model responses
• A Privacy Preservation metric measuring the percentage of PII leaked to the proprietary model
• LLM judges, validated with crowd-sourcing, for robustness
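A rough sketch of the two metrics, under my reading of the definitions above: leakage counts how many PII units from the original query still appear in the prompt sent to the proprietary model, and quality is a ratio of judge scores. The exact scoring in the paper uses LLM judges; the numeric scores here are placeholders.

```python
def privacy_leakage(pii_terms: list[str], remote_prompt: str) -> float:
    """Fraction of PII units from the user's query that still appear in
    the prompt sent to the proprietary model (lower is better)."""
    if not pii_terms:
        return 0.0
    leaked = sum(term.lower() in remote_prompt.lower() for term in pii_terms)
    return leaked / len(pii_terms)

def quality_preservation(pipeline_score: float, baseline_score: float) -> float:
    """Pipeline quality relative to the proprietary-model baseline; the
    paper derives scores from LLM judges, taken here as given numbers."""
    return pipeline_score / baseline_score

# Example: one of two PII units survives redaction -> 50% leakage
leak = privacy_leakage(["Jane Doe", "Acme Corp"],
                       "Write a letter for [ENTITY_0] at Acme Corp.")
ratio = quality_preservation(4.0, 5.0)  # pipeline scored 4/5 vs. baseline 5/5
```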