PEFT methods train a small fraction of the parameters yet match full fine-tuning for unit test generation
https://arxiv.org/abs/2411.02462
🎯 Original Problem:
Training LLMs for specialized tasks like unit test generation typically requires full fine-tuning of all parameters, which is computationally expensive and impractical for most organizations.
-----
🔧 Solution in this Paper:
→ Evaluated three Parameter-Efficient Fine-Tuning (PEFT) methods: LoRA (Low-Rank Adaptation), (IA)³ (Infused Adapter by Inhibiting and Amplifying Inner Activations), and Prompt Tuning
→ Used Methods2Test dataset containing real-world Java unit tests and HumanEval-X for evaluation
→ Tested across multiple model families ranging from 350M to 16B parameters
→ Measured syntactic correctness, CodeBLEU similarity score, and code coverage metrics
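To make the LoRA idea concrete, here is a minimal NumPy sketch (not the paper's code): a frozen weight matrix W is adapted by a trainable low-rank product B @ A, so only the two small factors are trained. All sizes (d, k, and rank r) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 1024, 1024, 8                  # hypothetical layer size and LoRA rank

W = rng.standard_normal((d, k))          # frozen pretrained weight
A = rng.standard_normal((r, k)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                     # trainable up-projection, zero-initialized

def forward(x):
    # LoRA forward pass: frozen base path plus low-rank adaptation path
    return x @ W.T + x @ (B @ A).T

x = rng.standard_normal((2, k))
full, lora = W.size, A.size + B.size
print(f"trainable fraction of this layer: {lora / full:.4%}")
```

Because B starts at zero, the adapted model initially behaves exactly like the frozen one. Note the per-layer trainable fraction here (~1.6%) is larger than the whole-model 0.01% figure the paper reports, since LoRA is typically applied to only a subset of the model's matrices.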
-----
💡 Key Insights:
→ LoRA emerged as the most reliable method, often matching full fine-tuning performance
→ Prompt tuning showed high variability: excellent with larger models but inconsistent with smaller ones
→ (IA)³ showed minimal improvements but retained prior model knowledge well
→ Most methods achieved >80% syntactic correctness in generated tests
→ Resource requirements dropped significantly while performance was maintained
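(IA)³ works differently from LoRA: instead of low-rank weight updates, it learns per-dimension scaling vectors applied to activations (e.g. keys, values, FFN outputs) while the frozen weights stay untouched. A minimal sketch, with hypothetical sizes:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 512
W_v = rng.standard_normal((d, d))   # frozen value projection
l_v = np.ones(d)                    # trainable (IA)^3 scale, ones-initialized

def value_with_ia3(x):
    # Element-wise rescaling of the value activations
    return (x @ W_v) * l_v

x = rng.standard_normal((4, d))
out = value_with_ia3(x)
```

The ones-initialized scale means the adapted model starts out identical to the frozen one, which is consistent with the observation that (IA)³ retains prior model knowledge well; the trainable vector adds only d parameters per rescaled projection.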
-----
📊 Results:
→ LoRA achieved performance comparable to full fine-tuning while training only 0.01% of parameters
→ Prompt tuning improved CodeGen2-1B's syntactic correctness by 66.63% while training just 0.02% of its parameters
→ Generated tests maintained >80% syntactic correctness across most models
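Prompt tuning reaches these tiny parameter fractions by training only a handful of "soft prompt" vectors that are prepended to the token embeddings, with the whole model frozen. A minimal sketch with hypothetical vocabulary and embedding sizes:

```python
import numpy as np

rng = np.random.default_rng(1)
vocab, d_model, n_virtual = 5000, 256, 20          # hypothetical sizes

embed = rng.standard_normal((vocab, d_model))       # frozen embedding table
soft_prompt = rng.standard_normal((n_virtual, d_model)) * 0.02  # trainable

def embed_with_prompt(token_ids):
    # Prepend the trainable soft prompt to the embedded input sequence
    return np.vstack([soft_prompt, embed[token_ids]])

seq = embed_with_prompt(np.array([5, 17, 256]))     # 20 virtual + 3 real tokens
```

Only `soft_prompt` is updated during training, which is why the method is so sensitive to model scale: the frozen model must already be capable enough to be steered by a few virtual tokens, matching the paper's finding that prompt tuning is strong on larger models but inconsistent on smaller ones.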