Code LLMs can now teach themselves through self-generated instruction data and test-based validation.
SelfCodeAlign, proposed in this paper, enables code models to improve without relying on bigger teacher models.
📚 https://arxiv.org/abs/2410.24198
🎯 Original Problem:
Instruction tuning for code LLMs typically relies on expensive human annotations or knowledge distillation from larger proprietary models, which may violate terms of service and limit generalizability.
-----
🔧 Solution in this Paper:
→ SelfCodeAlign: a pipeline that lets a code LLM align itself without human annotations or distillation from a stronger model
→ Extracts diverse coding concepts from high-quality seed functions in The Stack V1
→ Uses the base model to generate new coding tasks through in-context learning
→ Generates multiple responses per task, each paired with test cases for self-validation
→ Selects only examples whose responses pass their tests for instruction tuning (see the sketch below)
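A minimal sketch of how such a self-alignment loop could look. The prompts, the sampling count, the `generate()` placeholder, and the subprocess sandboxing are illustrative assumptions, not the paper's released implementation:

```python
import subprocess
import tempfile

def generate(prompt: str, n: int = 1) -> list[str]:
    """Placeholder: sample n completions from the base model itself."""
    raise NotImplementedError("wire this to your model's sampling API")

def passes_own_tests(candidate: str, timeout_s: int = 10) -> bool:
    """Execute the candidate (solution plus its tests) in a subprocess;
    a zero exit code counts as passing."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate)
        path = f.name
    try:
        result = subprocess.run(["python", path],
                                capture_output=True, timeout=timeout_s)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False

def self_align(seed_functions: list[str]) -> list[dict]:
    """Concept extraction -> task generation -> response and test
    generation -> execution-based filtering."""
    pairs = []
    for seed in seed_functions:
        concepts = generate(f"List the coding concepts used in:\n{seed}")[0]
        task = generate(f"Write a new coding task exercising: {concepts}")[0]
        # Sample several responses; keep the first that passes its own tests.
        for cand in generate(f"Solve, and include test cases:\n{task}", n=4):
            if passes_own_tests(cand):
                pairs.append({"instruction": task, "response": cand})
                break
    return pairs
```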
-----
💡 Key Insights:
→ Models can learn better from data in their own distribution than from a teacher model's
→ Explicit test-case generation and execution-based validation are crucial for self-alignment
→ Seed selection and concept extraction improve instruction quality (a seed-filter sketch follows this list)
→ Self-alignment can outperform distillation when the capability gap between the models is small
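On the seed-selection point, here is a hedged sketch of the kind of quality filter that could screen candidate seed functions. The specific criteria (a single parseable function, a docstring, a length cap) are illustrative assumptions, not the paper's exact filters:

```python
import ast

def is_quality_seed(source: str, max_lines: int = 40) -> bool:
    """Keep short, docstring-bearing, single-function snippets that
    parse cleanly. Thresholds here are assumptions for illustration."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    funcs = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    return (len(funcs) == 1
            and ast.get_docstring(funcs[0]) is not None
            and source.count("\n") < max_lines)
```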
-----
📊 Results:
→ With CodeQwen1.5-7B as the base model, reaches 67.1 pass@1 on HumanEval+, surpassing the much larger CodeLlama-70B-Instruct
→ Outperforms OctoPack-trained models across all benchmarks
→ Effective across model sizes from 3B to 33B
→ Matches or exceeds models trained on proprietary data
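For reference, pass@1 here presumably follows the standard unbiased pass@k estimator from the Codex paper (Chen et al., 2021); a minimal sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n = samples generated,
    c = samples that pass, k = evaluation budget."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 samples per problem, 4 pass -> pass@1 = 0.4
print(pass_at_k(n=10, c=4, k=1))
```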