PKRD-CoT introduces a zero-shot chain-of-thought prompting framework that enables multimodal LLMs (MLLMs) to perform autonomous driving tasks without expensive training.
It mimics human driving cognition through perception, knowledge, reasoning, and decision-making capabilities.
-----
https://arxiv.org/abs/2412.02025
🚗 Original Problem:
Traditional autonomous driving relies heavily on data-driven approaches, requiring extensive training data and computational resources. This leads to issues like dataset bias, overfitting, and poor generalization.
-----
🔧 Solution in this Paper:
→ The PKRD-CoT framework integrates four fundamental driving capabilities (perception, knowledge, reasoning, and decision-making) into a unified prompting system.
→ The system processes driving scenarios through sequential steps: environment observation, target identification, memory storage in JSON format, and decision making.
→ A memory module maintains contextual continuity by storing environmental information in structured JSON format.
→ The framework enables MLLMs to handle autonomous driving tasks without driving-specific training, similar to how humans learn to drive through knowledge transfer.
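
To make the sequential steps and the JSON memory concrete, here is a minimal Python sketch. It is only an illustration of the idea, not the paper's actual prompts: the step wording, the `SceneMemory` fields, and the `build_prompt` helper are all hypothetical, and no MLLM is called.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field

# Hypothetical step wording: the four stages come from the summary above,
# but the exact prompt text is an assumption.
STEPS = [
    "Observe the environment: describe the road, weather, and traffic.",
    "Identify targets: list vehicles, pedestrians, and signals that matter.",
    "Store memory: summarize the observations as a JSON object.",
    "Decide: choose a driving action and justify it.",
]


@dataclass
class SceneMemory:
    """Illustrative memory record; the field names are not from the paper."""

    frame_id: int
    environment: str
    targets: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        # Structured JSON keeps earlier observations easy to re-inject.
        return json.dumps(asdict(self), sort_keys=True)


def build_prompt(memory: list[SceneMemory]) -> str:
    """Assemble a zero-shot step-by-step prompt, with prior memory as context."""
    lines = ["You are driving a car. Work through the steps below in order."]
    for i, step in enumerate(STEPS, start=1):
        lines.append(f"Step {i}. {step}")
    if memory:
        lines.append("Memory from previous frames (JSON):")
        lines.extend(m.to_json() for m in memory)
    return "\n".join(lines)


if __name__ == "__main__":
    past = [SceneMemory(0, "rainy urban road", ["pedestrian ahead"])]
    print(build_prompt(past))
```

The resulting string would be sent to an MLLM together with the current camera frame; the model's JSON answer from the "Store memory" step could then be parsed and appended to the memory list for the next frame.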
-----
💡 Key Insights:
→ Knowledge-driven approach outperforms traditional data-driven methods in autonomous driving
→ Zero-shot chain-of-thought prompting can effectively guide MLLMs in complex driving scenarios
→ Memory module in JSON format enhances contextual understanding and decision-making
-----
📊 Results:
→ PKRD-CoT achieved 94% decision-making accuracy, outperforming zero-shot (72%) and role-playing (88%) methods
→ GPT-4 with PKRD-CoT showed superior performance across all tasks, followed by Claude and LLaVA 1.6
→ CogVLM excelled specifically in target localization tasks