
"Parameter-Efficient Fine-Tuning of Large Language Models for Unit Test Generation: An Empirical Study"

The podcast on this paper is generated with Google's Illuminate.

PEFT methods train only a small fraction of a model's parameters yet match full fine-tuning on unit test generation.

https://arxiv.org/abs/2411.02462

🎯 Original Problem:

Adapting LLMs to specialized tasks like unit test generation typically requires full fine-tuning of all parameters. This is so computationally and memory intensive that it is impractical for most organizations.

-----

🔧 Solution in this Paper:

→ Evaluated three Parameter-Efficient Fine-Tuning (PEFT) methods: LoRA (Low-Rank Adaptation), (IA)³ (Infused Adapter by Inhibiting and Amplifying Inner Activations), and Prompt Tuning

→ Used the Methods2Test dataset of real-world Java unit tests for training and HumanEval-X for evaluation

→ Tested across multiple model families ranging from 350M to 16B parameters

→ Measured syntactic correctness, CodeBLEU similarity score, and code coverage metrics
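To make the parameter savings concrete, here is a minimal NumPy sketch of the LoRA mechanism the paper evaluates: a frozen weight matrix is augmented with a trainable low-rank update. The dimensions, rank, and scaling factor below are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 4                              # layer width and LoRA rank (illustrative)
W = rng.standard_normal((d, d))           # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01    # trainable down-projection
B = np.zeros((d, r))                      # trainable up-projection, zero-initialized
alpha = 8                                 # LoRA scaling factor

def lora_forward(x):
    # frozen path plus low-rank trainable correction: x W^T + (alpha/r) x A^T B^T
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.standard_normal((1, d))
# because B starts at zero, the adapted layer initially matches the frozen one
assert np.allclose(lora_forward(x), x @ W.T)

# fraction of this layer's parameters that LoRA actually trains
print((A.size + B.size) / W.size)
```

Only A and B receive gradients; W stays fixed, which is why the trainable fraction shrinks toward the paper's 0.01% figure as model width grows while the rank stays small.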

-----

💡 Key Insights:

→ LoRA emerged as most reliable method, often matching full fine-tuning performance

→ Prompt tuning showed high variability: excellent with larger models but inconsistent with smaller ones

→ (IA)³ showed minimal improvements but retained prior model knowledge well

→ Most methods achieved >80% syntactic correctness in generated tests

→ Resource requirements reduced significantly while maintaining performance

-----

📊 Results:

→ LoRA achieved performance comparable to full fine-tuning while training only 0.01% of parameters

→ Prompt tuning improved CodeGen2-1B's syntactic correctness by 66.63% while training just 0.02% of its parameters

→ Generated tests maintained >80% syntactic correctness across most models
