A genetic algorithm meets LLM reasoning: zero-shot prompt recovery that reverse-engineers prompts from outputs.
Reverse Prompt Engineering (RPE) reconstructs original prompts from just 5 LLM outputs without accessing model internals.
Making LLMs work backwards: from answers to questions.
https://arxiv.org/abs/2411.06729
Original Problem 🤔:
Inferring the original prompt from LLM outputs is challenging, especially in black-box settings where only text outputs are available. Previous methods require extensive resources (64+ outputs) and often need access to internal model parameters.
-----
Solution in this Paper 🛠️:
→ Introduces Reverse Prompt Engineering (RPE), a zero-shot method using the target LLM's reasoning to reconstruct prompts from just 5 outputs
→ Employs a three-stage approach: One Answer One Shot (RPE1A1S) for basic inference, then Five Answers Inference (RPE5A5S) for improved accuracy from multiple responses
→ Completes the pipeline with RPE-GA, an iterative optimizer inspired by genetic algorithms that progressively refines candidate prompts over successive rounds
→ Uses ROUGE-1 scores and cosine similarity to evaluate and select the best candidate prompts
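The iterative refinement described above can be sketched as a small genetic-algorithm loop. This is a minimal illustration, not the paper's implementation: `propose` and `mutate` stand in for the LLM calls that generate and refine candidate prompts, and fitness is approximated here by a simplified ROUGE-1 F1 score against the observed outputs (the paper also uses embedding cosine similarity, omitted for brevity).

```python
def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified unigram-overlap ROUGE-1 F1 score."""
    c, r = candidate.lower().split(), reference.lower().split()
    if not c or not r:
        return 0.0
    overlap = sum(min(c.count(w), r.count(w)) for w in set(c))
    if overlap == 0:
        return 0.0
    prec, rec = overlap / len(c), overlap / len(r)
    return 2 * prec * rec / (prec + rec)

def rpe_ga(observed_outputs, propose, mutate, generations=3, pop_size=4):
    """Genetic-algorithm-style prompt recovery (sketch).

    propose(outputs) -> a candidate prompt   (an LLM call in the paper)
    mutate(prompt, outputs) -> refined prompt (an LLM call in the paper)
    Fitness: mean ROUGE-1 F1 of a candidate against the observed outputs,
    a stand-in for regenerating outputs from each candidate and comparing.
    """
    def fitness(p):
        return sum(rouge1_f1(p, o) for o in observed_outputs) / len(observed_outputs)

    # Initial population of candidate prompts inferred from the outputs.
    population = [propose(observed_outputs) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]                       # selection
        children = [mutate(p, observed_outputs) for p in survivors]   # mutation
        population = survivors + children
    return max(population, key=fitness)
```

With only 5 observed outputs, averaging the fitness over all of them (rather than scoring against a single response) mirrors the paper's insight that multiple outputs prevent overfitting to one response's wording.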
-----
Key Insights from this Paper 💡:
→ Black-box prompt recovery is possible with minimal resources (5 outputs vs 64 required by previous methods)
→ Using multiple outputs reduces overemphasis on specific response details
→ Genetic algorithm-based optimization significantly improves prompt recovery accuracy
→ Zero-shot approach eliminates need for training data or additional model training
-----
Results 📊:
→ Outperforms state-of-the-art by 5.2% in cosine similarity across different embedding models
→ Achieves 2.3% higher similarity with ada-002 embeddings
→ Shows 8.1% improvement with text-embedding-3-large
→ Trades a slightly lower ROUGE-1 score (-1.6%) for more natural-sounding recovered prompts