LLMs can not only reason but also improve their own training data, leading to better reasoning.
This paper improves LLM reasoning by having LLMs refine the reasoning approaches in their own training datasets, then standardizing every sample with a universal text template for training.
-----
https://arxiv.org/abs/2501.11110
Original Problem 🤔:
→ LLMs require high-quality datasets to learn effective reasoning.
→ Creating datasets for complex reasoning is challenging and resource-intensive.
→ Existing datasets often lack diverse and refined reasoning approaches.
-----
Solution in this Paper 💡:
→ The paper introduces the PPT method, centered on dataset enhancement.
→ It leverages LLMs to generate and refine reasoning approaches within datasets.
→ Specific prompts guide LLMs in enhancing datasets with structured instructions.
→ A universal text template is applied to standardize all training samples.
→ Training is conducted in three stages, with adjustments to epochs and sequence lengths.
→ DeepSpeed ZeRO Stage 3 and Flash-Attention are used to improve computational efficiency during training.
→ An annealing strategy at the end of training further enhances model accuracy on complex tasks.
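A minimal sketch of the universal-text-template step above: every training record (question, refined reasoning, answer) is rendered into one fixed format so the model sees a consistent structure across all training stages. The field names and template wording here are assumptions for illustration, not the paper's exact template.

```python
# Hypothetical universal template; the paper's actual format may differ.
UNIVERSAL_TEMPLATE = (
    "### Question:\n{question}\n\n"
    "### Reasoning:\n{reasoning}\n\n"
    "### Answer:\n{answer}"
)

def to_training_text(sample: dict) -> str:
    """Render one dataset record into the shared template string."""
    return UNIVERSAL_TEMPLATE.format(
        question=sample["question"].strip(),
        reasoning=sample["reasoning"].strip(),
        answer=sample["answer"].strip(),
    )

# Illustrative record, e.g. after LLM-driven refinement of the reasoning field.
sample = {
    "question": "What is 12 * 7?",
    "reasoning": "12 * 7 = (10 * 7) + (2 * 7) = 70 + 14 = 84.",
    "answer": "84",
}
print(to_training_text(sample))
```

Standardizing all samples this way means a single preprocessing function can feed every stage of the three-stage training run.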
-----
Key Insights from this Paper 🧠:
→ LLMs can be effectively utilized to improve the quality and diversity of their own training datasets.
→ Structured prompts are essential for guiding LLM-driven dataset enhancement processes.
→ Universal text templates streamline the processing and consistency of training data.
→ Distributed optimization and attention mechanisms are crucial for efficient LLM training.
-----
Results 📈:
→ A majority-vote strategy over 8 sampled answers on GSM8K surpasses GPT-4o's Pass@1 accuracy of 90.5%.