"Hierarchical Repository-Level Code Summarization for Business Applications Using Local LLMs"

The podcast below, covering this paper, was generated with Google's Illuminate.

This paper presents a hierarchical approach using local LLMs for summarizing large codebases, specifically tailored for business applications.

-----

https://arxiv.org/abs/2501.07857

Solution in this Paper 💡:

→ A two-step hierarchical approach. First, code is segmented into smaller units using abstract syntax trees (ASTs).

→ Local LLMs summarize these segments using custom prompts tailored to each segment type (functions, variables, etc.).

→ These segment summaries are aggregated into file-level summaries, incorporating domain and problem context.

→ File summaries are then combined to create package-level summaries.
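
The steps above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it uses Python's built-in `ast` module as a stand-in parser, the prompt templates in `PROMPTS` are made up, and `llm` is a hypothetical callable (prompt in, text out) that you would wire to whichever local model you run.

```python
import ast
from typing import Callable, Dict, List, Tuple

LLM = Callable[[str], str]  # hypothetical interface: prompt in, summary out

# Illustrative prompt templates, one per segment type (not the paper's prompts).
PROMPTS: Dict[str, str] = {
    "function": "Summarize what this function does:\n{code}",
    "class": "Summarize the responsibility of this class:\n{code}",
    "variable": "Explain the purpose of this variable:\n{code}",
}


def segment_source(source: str) -> List[Tuple[str, str, str]]:
    """Split a module into (kind, name, code) segments using its AST."""
    segments: List[Tuple[str, str, str]] = []
    for node in ast.parse(source).body:  # top-level statements only
        code = ast.get_source_segment(source, node) or ""
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            segments.append(("function", node.name, code))
        elif isinstance(node, ast.ClassDef):
            segments.append(("class", node.name, code))
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    segments.append(("variable", target.id, code))
    return segments


def summarize_file(source: str, llm: LLM, context: str) -> str:
    """Summarize each segment with its own prompt, then aggregate per file."""
    parts = [
        f"{kind} {name}: {llm(PROMPTS[kind].format(code=code))}"
        for kind, name, code in segment_source(source)
    ]
    prompt = (
        f"Domain and problem context: {context}\n"
        "Combine these segment summaries into a file summary:\n"
        + "\n".join(parts)
    )
    return llm(prompt)


def summarize_package(files: Dict[str, str], llm: LLM, context: str) -> str:
    """Combine file-level summaries into one package-level summary."""
    file_summaries = [
        f"{path}: {summarize_file(src, llm, context)}"
        for path, src in files.items()
    ]
    prompt = (
        f"Domain and problem context: {context}\n"
        "Combine these file summaries into a package summary:\n"
        + "\n".join(file_summaries)
    )
    return llm(prompt)
```

Because every segment found by the parser gets its own LLM call, no function or variable can be silently dropped before aggregation, which is the property the hierarchical approach relies on.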

-----

Results 😎:

→ Grounding the LLM improved domain relevance (DS) by over 7% in file-level summarization.

→ Direct file-level summarization with LLMs missed approximately 11% of functions and 24% of variables, while the proposed approach covers all segments.

→ Structured prompts with in-context learning improved function summarization accuracy (e.g., completeness by >13%, correctness and cohesiveness by 5%).
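
To get a feel for what "missed functions and variables" means, here is a rough sketch of a coverage check. It is an assumption-laden proxy, not the paper's evaluation method: it treats a segment as covered if its name appears as a word in the summary, and it again uses Python's `ast` module as a stand-in parser. The helper names `defined_names` and `missed_fraction` are hypothetical.

```python
import ast
import re
from typing import Dict, Set


def defined_names(source: str) -> Dict[str, Set[str]]:
    """Collect the function and variable names defined anywhere in a module."""
    names: Dict[str, Set[str]] = {"function": set(), "variable": set()}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            names["function"].add(node.name)
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    names["variable"].add(target.id)
    return names


def missed_fraction(source: str, summary: str) -> Dict[str, float]:
    """Fraction of defined names per kind that the summary never mentions.

    Name matching is a crude proxy for coverage (an assumption of this sketch).
    """
    mentioned = set(re.findall(r"[A-Za-z_]\w*", summary))
    result: Dict[str, float] = {}
    for kind, names in defined_names(source).items():
        missed = names - mentioned
        result[kind] = len(missed) / len(names) if names else 0.0
    return result
```

Running `missed_fraction` on a direct file-level summary and on a hierarchical one gives a quick, if approximate, way to compare how many segments each summary leaves out.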
