"Exploring Large Language Models for Semantic Analysis and Categorization of Android Malware"

The podcast on this paper was generated with Google's Illuminate.

Decode Android malware behavior through smart code summarization.

MalParse uses an LLM to perform semantic analysis and categorization of Android malware through hierarchical code summarization, reaching 77% classification accuracy without malware-specific pre-training.

https://arxiv.org/abs/2501.04848

Original Problem 🔍:

→ Android malware analysis traditionally requires intensive manual reverse engineering and deep expertise

→ Fast evolution of malicious code makes timely analysis challenging

→ Identifying critical malicious behaviors in complex codebases is time-consuming

-----

Solution in this Paper 🛠️:

→ MalParse employs a three-tier code summarization strategy using an LLM

→ Functions are summarized first, then aggregated into class summaries

→ Class summaries are consolidated into comprehensive package-level analysis

→ Uses strategic prompt engineering with vanilla, API-scoped, and malware-scoped prompts

→ Implements backward tracing to identify exact code causing malicious behavior
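
The function → class → package flow and the backward trace can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not MalParse's actual implementation: `llm` is any callable that maps a prompt string to a summary string, the three prompt strings are hypothetical stand-ins for the vanilla, API-scoped, and malware-scoped prompts, and `trace_back` uses a plain keyword match as a simplification of how a flagged behavior might be linked back to specific functions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical prompt wording; the paper's real prompts are not reproduced here.
PROMPT_SCOPES: Dict[str, str] = {
    "vanilla": "Summarize what this code does.",
    "api": "Summarize this code, focusing on the Android APIs it calls.",
    "malware": "Summarize this code, focusing on potentially malicious behavior.",
}

# Stand-in for an LLM call: takes a prompt, returns a summary.
Summarizer = Callable[[str], str]


@dataclass
class Summary:
    name: str
    level: str  # "function", "class", or "package"
    text: str
    children: List["Summary"] = field(default_factory=list)


def summarize_package(
    package_name: str,
    classes: Dict[str, Dict[str, str]],  # {class_name: {function_name: source}}
    llm: Summarizer,
    scope: str = "malware",
) -> Summary:
    """Summarize functions first, then classes, then the whole package."""
    instruction = PROMPT_SCOPES[scope]
    class_summaries: List[Summary] = []
    for class_name, functions in classes.items():
        # Tier 1: one summary per function, built from its source code.
        fn_summaries = [
            Summary(fn_name, "function", llm(f"{instruction}\n\n{code}"))
            for fn_name, code in functions.items()
        ]
        # Tier 2: the class summary is built from its function summaries.
        joined = "\n".join(f"{s.name}: {s.text}" for s in fn_summaries)
        class_text = llm(f"{instruction}\n\nFunction summaries:\n{joined}")
        class_summaries.append(
            Summary(class_name, "class", class_text, fn_summaries)
        )
    # Tier 3: the package summary is built from the class summaries.
    joined = "\n".join(f"{s.name}: {s.text}" for s in class_summaries)
    package_text = llm(f"{instruction}\n\nClass summaries:\n{joined}")
    return Summary(package_name, "package", package_text, class_summaries)


def trace_back(root: Summary, keyword: str) -> List[str]:
    """Return dotted paths of functions whose summary mentions `keyword`.

    Simplified assumption: a case-insensitive substring match stands in
    for whatever linking the real system performs.
    """
    needle = keyword.lower()
    return [
        f"{root.name}.{cls.name}.{fn.name}"
        for cls in root.children
        for fn in cls.children
        if needle in fn.text.lower()
    ]
```

Keeping each tier's summaries as children of the tier above is what makes the trace possible: a behavior reported at package level can be followed down to the class and function summaries that produced it.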

-----

Key Insights 💡:

→ LLMs can effectively analyze malware without malware-specific pre-training

→ Hierarchical summarization provides better context than single-pass analysis

→ Strategic prompt engineering significantly improves malware classification accuracy

-----

Results 📊:

→ Achieved 77% malware classification accuracy

→ Malware-scoped prompts outperformed vanilla (49.5%) and API-scoped (56%) approaches

→ Generated detailed chain-of-thought analysis linking package-level behaviors to function-level behaviors
