CHIME, proposed in this paper, helps ChatGPT understand your bug reports better by speaking both human and computer languages
ChatGPT struggles with technical bug reports, reaching only 36.4% accuracy when analyzing content that mixes text and code. This paper introduces CHIME, a system that preprocesses technical reports with context-free grammar parsing and validates ChatGPT's responses with metamorphic testing, improving accuracy by 30.3%.
-----
https://arxiv.org/abs/2411.07360
🔍 Original Problem:
ChatGPT frequently produces incorrect or irrelevant answers when processing software bug reports containing both text and code snippets. The main challenges are understanding complex technical content like stack traces and integrating context from technical terms.
-----
🛠️ Solution in this Paper:
→ CHIME preprocesses technical reports by organizing mixed text and code content into structured metadata
→ It uses context-free grammar to efficiently parse stack traces in bug reports (see the stack-trace parsing sketch after this list)
→ The system validates ChatGPT responses through query transformation and metamorphic testing
→ CHIME extends the CoVe (Chain-of-Verification) technique by mutating questions to catch incorrect responses (see the query-mutation sketch after this list)
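
Below is a minimal, illustrative sketch of how context-free-grammar-based stack-trace parsing can work. The grammar, the regexes, and the Java-style trace format are assumptions made for this example; the paper's actual grammar and parser are not reproduced here.

```python
# Illustrative recursive-descent parser for Java-style stack traces.
# The grammar is an assumption for this sketch, not CHIME's actual grammar:
#
#   trace     -> section ("Caused by:" section)*
#   section   -> exception frame+
#   exception -> CLASS (":" MESSAGE)?
#   frame     -> "at" METHOD "(" LOCATION ")"

import re
from dataclasses import dataclass, field

EXC_RE   = re.compile(r"^\s*(?P<cls>[\w.$]+(?:Exception|Error))(?::\s*(?P<msg>.*))?")
FRAME_RE = re.compile(r"^\s*at\s+(?P<method>[\w.$<>]+)\((?P<location>[^)]*)\)")
CAUSE_RE = re.compile(r"^\s*Caused by:\s*(?P<rest>.+)")

@dataclass
class StackTrace:
    exception: str
    message: str
    frames: list = field(default_factory=list)   # (method, location) pairs
    causes: list = field(default_factory=list)   # nested StackTrace objects

def parse_section(lines, i):
    """Parse one 'exception frame+' production; return (StackTrace, next index)."""
    m = EXC_RE.match(lines[i])
    if not m:
        raise ValueError(f"expected an exception header, got: {lines[i]!r}")
    section = StackTrace(m.group("cls"), m.group("msg") or "")
    i += 1
    while i < len(lines) and (fm := FRAME_RE.match(lines[i])):
        section.frames.append((fm.group("method"), fm.group("location")))
        i += 1
    return section, i

def parse_trace(text):
    """Parse a full trace: a top-level section plus any 'Caused by:' sections."""
    lines = [line for line in text.splitlines() if line.strip()]
    root, i = parse_section(lines, 0)
    while i < len(lines):
        if lines[i].lstrip().startswith("..."):   # skip "... N more" elision lines
            i += 1
            continue
        cm = CAUSE_RE.match(lines[i])
        if not cm:
            break                                 # trailing non-trace text: stop
        lines[i] = cm.group("rest")               # reuse the section rule on the rest
        cause, i = parse_section(lines, i)
        root.causes.append(cause)
    return root

example = """\
java.lang.RuntimeException: request failed
    at com.example.api.Handler.handle(Handler.java:88)
    at com.example.server.Dispatcher.dispatch(Dispatcher.java:31)
Caused by: java.io.IOException: connection reset
    at com.example.net.SocketReader.read(SocketReader.java:210)
    ... 1 more
"""
trace = parse_trace(example)
print(trace.exception, "caused by", trace.causes[0].exception)
# java.lang.RuntimeException caused by java.io.IOException
```

Parsing the trace into a nested structure like this is what lets the preprocessor attach the exception chain to the report's metadata instead of passing the raw text blob to the LLM.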
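
And here is a minimal sketch of the query-mutation / metamorphic-testing idea: ask mutated variants of the same question and flag answers that disagree with the original. The `ask_llm` placeholder, the mutation operators, and the keyword-overlap consistency check are all invented for this sketch; CHIME's actual operators and checks may differ.

```python
def ask_llm(question: str) -> str:
    """Hypothetical placeholder for a ChatGPT API call; wire in your own client."""
    raise NotImplementedError

def mutate_question(question: str) -> list[str]:
    """Generate mutated variants whose answers should agree with the original."""
    return [
        f"Rephrase the following question, then answer it: {question}",  # paraphrase relation
        f"{question} Answer with only the essential facts.",             # compression relation
        f"List the supporting evidence first, then answer: {question}",  # CoVe-style verification
    ]

def consistent(a: str, b: str) -> bool:
    """Crude consistency check: keyword overlap above a fixed threshold."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(1, min(len(ta), len(tb))) > 0.5

def validated_answer(question: str):
    """Return the original answer plus any mutated queries whose answers disagree."""
    original = ask_llm(question)
    disagreements = []
    for variant in mutate_question(question):
        answer = ask_llm(variant)
        if not consistent(original, answer):
            disagreements.append((variant, answer))
    # A non-empty disagreement list signals a likely incorrect or unstable answer
    # that should be regenerated or surfaced to the user with a warning.
    return original, disagreements
```

The appeal of the metamorphic relations is that they need no ground-truth answer: consistency across mutated queries serves as the test oracle.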
-----
💡 Key Insights:
→ Developers primarily need AI support for issue analytics and trend detection
→ Stack trace parsing and context integration are critical for accurate bug report analysis
→ Structured metadata improves LLM search capabilities (an illustrative example follows this list)
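
As a concrete illustration of the structured-metadata point, a preprocessed bug report could carry fields like the ones below. The schema and field names are hypothetical, made up for this example rather than taken from the paper.

```python
# Hypothetical metadata record for a preprocessed bug report; field names
# are illustrative only, not CHIME's actual schema.
bug_report_metadata = {
    "report_id": "ISSUE-1234",
    "title": "NullPointerException when saving user profile",
    "description": "Saving a profile with an empty avatar crashes the app.",
    "code_snippets": [
        {"language": "java", "content": "profileService.save(user.getProfile());"},
    ],
    "stack_trace": {
        "exception": "java.lang.NullPointerException",
        "top_frame": "com.example.ProfileService.save(ProfileService.java:57)",
        "caused_by": [],
    },
    "labels": ["crash", "profile", "regression"],
}
# Feeding the LLM these separated fields, instead of one undifferentiated blob,
# makes it easier to retrieve and reason over the relevant part of the report.
```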
-----
📊 Results:
→ CHIME improved ChatGPT's accuracy by 30.3% on the benchmark dataset
→ Base ChatGPT showed only 36.4% accuracy on technical bug reports
→ Users found CHIME-enhanced responses more useful than those from standard ChatGPT