
"Reverse Prompt Engineering"

The podcast on this paper is generated with Google's Illuminate.

A genetic algorithm meets LLM reasoning and zero-shot prompt recovery to reverse-engineer prompts from outputs.

Reverse Prompt Engineering (RPE) reconstructs original prompts from just 5 LLM outputs without accessing model internals.

Making LLMs work backwards: from answers to questions.

https://arxiv.org/abs/2411.06729

Original Problem 🤔:

Inferring the original prompt from LLM outputs is challenging, especially in black-box settings where only text outputs are available. Previous methods require extensive resources (64+ outputs) and often need access to internal model parameters.

-----

Solution in this Paper 🛠️:

→ Introduces Reverse Prompt Engineering (RPE), a zero-shot method using the target LLM's reasoning to reconstruct prompts from just 5 outputs

→ Builds the method up in stages: One Answer One Shot (RPE1A1S) for basic inference from a single output, then Five Answers Inference (RPE5A5S) for better accuracy using multiple responses, with RPE-GA as the final refinement stage

→ Implements RPE-GA, an optimization loop inspired by genetic algorithms that progressively refines candidate prompts over several iterations

→ Uses ROUGE-1 scores and cosine similarity to evaluate and select the best candidate prompts
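The refinement loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `propose`, `mutate`, and `generate` are hypothetical callables standing in for LLM calls, the `iterations` and `keep` defaults are arbitrary, and `rouge1_f1` is a simplified whitespace-tokenized ROUGE-1 rather than a reference scorer.

```python
from collections import Counter
from typing import Callable, List


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1: unigram-overlap F1 on lowercased whitespace tokens."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped count of shared unigrams
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def fitness(prompt: str, answers: List[str], generate: Callable[[str], str]) -> float:
    """Average ROUGE-1 between what a candidate prompt produces and the observed answers."""
    produced = generate(prompt)  # hypothetical LLM call
    return sum(rouge1_f1(produced, a) for a in answers) / len(answers)


def recover_prompt(
    answers: List[str],
    propose: Callable[[List[str]], List[str]],  # hypothetical: answers -> initial candidates
    mutate: Callable[[str, List[str]], str],    # hypothetical: refine one candidate
    generate: Callable[[str], str],             # hypothetical: run the target LLM
    iterations: int = 3,                        # assumed value, not from the paper
    keep: int = 2,                              # assumed value, not from the paper
) -> str:
    """Keep the best-scoring candidates each round and add refined variants of them."""
    population = propose(answers)
    for _ in range(iterations):
        ranked = sorted(
            population, key=lambda p: fitness(p, answers, generate), reverse=True
        )
        survivors = ranked[:keep]
        children = [mutate(p, answers) for p in survivors]
        population = survivors + children
    return max(population, key=lambda p: fitness(p, answers, generate))
```

In practice each of the three callables would wrap a request to the target LLM, so the cost of a run grows with the population size times the number of iterations.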

-----

Key Insights from this Paper 💡:

→ Black-box prompt recovery is possible with minimal resources (5 outputs vs 64 required by previous methods)

→ Using multiple outputs reduces overemphasis on specific response details

→ Genetic algorithm-based optimization significantly improves prompt recovery accuracy

→ Zero-shot approach eliminates need for training data or additional model training
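To make the zero-shot, multi-output idea concrete, here is a small sketch of how several observed answers might be packed into one recovery request. The function name and the template wording are hypothetical and are not taken from the paper; the point is only that the model sees all answers at once and is asked for the one prompt that explains them, with no training examples involved.

```python
from typing import Sequence


def build_recovery_prompt(answers: Sequence[str]) -> str:
    """Pack the observed answers into one zero-shot request (template wording is illustrative)."""
    if not answers:
        raise ValueError("at least one answer is required")
    numbered = "\n".join(
        f"Answer {i}: {a}" for i, a in enumerate(answers, start=1)
    )
    return (
        "The following answers were all produced by the same hidden prompt.\n"
        f"{numbered}\n"
        "State the single prompt that most likely produced all of them."
    )
```

Passing one answer corresponds to the single-output setting, and passing five corresponds to the multi-output setting, where details specific to any one response carry less weight.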

-----

Results 📊:

→ Outperforms the state-of-the-art baseline by an average of 5.2% in cosine similarity across different embedding models

→ Achieves 2.3% higher similarity with ada-002 embeddings

→ Shows 8.1% improvement with text-embedding-3-large

→ Scores slightly lower on ROUGE-1 (-1.6%) while generating more natural prompts
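For readers unfamiliar with the metric behind these numbers, cosine similarity compares the embedding vector of the recovered prompt with that of the original prompt. The sketch below is a plain-Python illustration; the vectors passed in are assumed to come from an embedding model such as the ones named above, and no embedding API is called here.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)
```

Because the score depends only on direction in embedding space, two prompts with different wording but similar meaning can score high here while sharing few words, which is one way a method can gain in cosine similarity and still lose a little on ROUGE-1.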
