New zero-shot prompts improve LLMs' logical reasoning abilities.
This paper explores zero-shot LLM self-verification for reasoning tasks: the LLM checks its own reasoning steps for correctness without any training examples.
-----
Paper - https://arxiv.org/abs/2501.13122
Original Problem 😞:
→ Existing methods for verifying LLM-generated reasoning often require fine-tuning or few-shot examples.
Solution in this Paper 🤔:
→ This paper introduces a zero-shot COT prompt, COT STEP, that generates structured reasoning which can be automatically decomposed into individual steps (prompt sketches follow this list).
→ It also proposes two zero-shot verifier prompts (R-prompt and COTR-prompt) for LLMs to evaluate their own reasoning.
→ The R-prompt directly asks if a step is correct, while the COTR-prompt adds a chain-of-thought element to the verification process.
→ A scoring mechanism combines the generation probability of a reasoning step with the probability of the verifier's correctness prediction (see the scoring sketch below).
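
To make the three prompts concrete, here is a rough paraphrase of what each might look like. The exact wording in the paper may differ; treat these templates as illustrative only.

```python
# Paraphrased prompt templates (illustrative; the paper's exact wording may differ).

# COT STEP: elicits reasoning in explicitly delimited steps so the output
# can be split into individual steps automatically.
COT_STEP_PROMPT = (
    "Let's think step by step, writing each step on its own line "
    "prefixed with 'Step k:'."
)

# R-prompt: asks the verifier directly whether a given step is correct.
R_PROMPT = "Is the reasoning step above correct? Answer Yes or No."

# COTR-prompt: adds a chain-of-thought element to the verification.
COTR_PROMPT = (
    "Let's check the reasoning step above step by step, "
    "then answer Yes or No."
)
```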
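The summary above does not spell out how the two probabilities are combined, so here is a minimal sketch under the assumption that they are combined in log space; the function name and the length normalization are illustrative choices, not necessarily the paper's exact formula.

```python
import math

def step_score(step_token_logprobs: list[float], p_correct: float) -> float:
    """Combine a step's generation probability with the verifier's prediction.

    step_token_logprobs: token log-probabilities of the generated step.
    p_correct: probability mass the verifier puts on judging the step correct.

    Assumption for illustration: length-normalized generation log-prob
    plus the log-probability of the verifier's correctness prediction.
    """
    gen = sum(step_token_logprobs) / max(len(step_token_logprobs), 1)
    return gen + math.log(max(p_correct, 1e-12))  # guard against log(0)
```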
Key Insights from this Paper 💡:
→ COT STEP achieves comparable performance to existing zero-shot COT prompts while also enabling automatic step decomposition.
→ Zero-shot COT-based verification is reasonably effective, especially for mathematical reasoning.
→ Using verifier scores in step-level greedy search shows some benefit, but the advantage diminishes once self-consistency is applied (a search sketch follows this list).
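
A minimal sketch of how verifier-guided step-level greedy search could work: at each step, sample several candidate next steps, keep the one the verifier scores highest, and stop when a final answer is produced. `LLM` and `Verifier` are assumed interfaces for illustration, not the paper's code.

```python
from typing import Protocol

class LLM(Protocol):
    def sample_step(self, question: str, chain: list[str]) -> str: ...
    def is_final_answer(self, step: str) -> bool: ...

class Verifier(Protocol):
    def score(self, question: str, chain: list[str], step: str) -> float: ...

def greedy_step_search(llm: LLM, verifier: Verifier, question: str,
                       candidates_per_step: int = 5,
                       max_steps: int = 10) -> list[str]:
    """Step-level greedy search guided by verifier scores (hypothetical API)."""
    chain: list[str] = []
    for _ in range(max_steps):
        # Sample several candidate continuations for the next reasoning step.
        candidates = [llm.sample_step(question, chain)
                      for _ in range(candidates_per_step)]
        # Greedily keep the candidate the verifier scores highest.
        best = max(candidates, key=lambda s: verifier.score(question, chain, s))
        chain.append(best)
        if llm.is_final_answer(best):
            break
    return chain
```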
Results 💪:
→ COT STEP performs competitively with other zero-shot prompts across different LLMs.
→ Zero-shot COT-based verification shows promising results on mathematical reasoning tasks using SOLAR and Phi-3 LLMs as verifiers.
→ Step-wise greedy search guided by verification outperforms plain COT decoding when self-consistency is not used.