Vision transformers are making autonomous vehicles smarter by fusing multiple sensor inputs for more reliable object detection.
Camera-LiDAR Fusion Transformer (CLFT) combines visual and LiDAR data using vision transformers to improve traffic object segmentation across diverse weather conditions.
-----
https://arxiv.org/abs/2501.02858
🔍 Original Problem:
Autonomous vehicles struggle to detect objects accurately in challenging weather. Single-sensor methods fall short in rain, darkness, and complex urban environments.
-----
🛠️ Solution in this Paper:
→ CLFT uses a progressive assembly strategy to fuse camera and LiDAR data through vision transformers.
→ The model processes image patches and LiDAR point clouds in parallel through an encoder-decoder architecture.
→ Multi-Head Self-Attention mechanism weighs the importance of different input features dynamically.
→ A novel cross-fusion stage combines features from both sensors using RefineNet-based fusion.
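The core mechanism above can be illustrated with a minimal NumPy sketch: two parallel branches (camera patches and LiDAR point embeddings) pass through multi-head self-attention, then their features are merged. Note this is a schematic with random weights standing in for learned parameters and a naive element-wise sum standing in for CLFT's RefineNet-based cross-fusion; it is not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(tokens, num_heads, rng):
    # tokens: (n_tokens, d_model) patch/point embeddings
    n, d = tokens.shape
    d_head = d // num_heads
    heads = []
    for _ in range(num_heads):
        # random projections stand in for learned Q/K/V weights (illustrative only)
        Wq, Wk, Wv = (rng.standard_normal((d, d_head)) / np.sqrt(d) for _ in range(3))
        Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
        # attention scores dynamically weigh the importance of each input feature
        attn = softmax(Q @ K.T / np.sqrt(d_head))
        heads.append(attn @ V)
    return np.concatenate(heads, axis=-1)  # back to (n_tokens, d_model)

rng = np.random.default_rng(0)
cam_tokens = rng.standard_normal((16, 64))    # camera patch embeddings
lidar_tokens = rng.standard_normal((16, 64))  # LiDAR point-cloud embeddings

# parallel encoder branches, one per sensor
cam_feat = multi_head_self_attention(cam_tokens, 4, rng)
lidar_feat = multi_head_self_attention(lidar_tokens, 4, rng)

# naive cross-fusion: element-wise sum of per-branch features
# (CLFT's actual fusion stage is RefineNet-based and more elaborate)
fused = cam_feat + lidar_feat
print(fused.shape)  # → (16, 64)
```

The key point the sketch captures: each sensor branch attends over its own tokens before fusion, so the model can weigh features differently per modality before combining them.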
-----
💡 Key Insights:
→ LiDAR outperforms camera in rainy conditions with 74% IoU for cyclists vs 71% for camera alone
→ Combined sensor data performs best in rainy nights with 63% IoU for cyclists
→ CLFT-Hybrid configuration achieves optimal balance between accuracy and computational efficiency
-----
📊 Results:
→ 68% IoU for pedestrian detection in dry conditions
→ 63% IoU for cyclist detection in challenging rainy nights
→ Outperforms traditional FCN networks in complex scenes
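For readers unfamiliar with the metric: the IoU (Intersection over Union, or Jaccard index) figures above are computed per class as the overlap between predicted and ground-truth segmentation masks divided by their union. A minimal sketch:

```python
import numpy as np

def iou(pred, target):
    # pred, target: boolean segmentation masks for one class
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return inter / union if union else 0.0

pred = np.array([[1, 1, 0],
                 [0, 1, 0]], dtype=bool)
target = np.array([[1, 0, 0],
                   [0, 1, 1]], dtype=bool)
print(iou(pred, target))  # → 0.5 (2 overlapping pixels / 4 in the union)
```

A 68% IoU therefore means the predicted pedestrian pixels overlap the ground truth by 68% of their combined area.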