UniRepLKNet: Expanding ConvNet capabilities with large kernels for multi-modal perception.
Achieves universal modeling across modalities with a single ConvNet design.
https://arxiv.org/abs/2410.08049
Original Problem 🔍:
ConvNets have been challenged by Vision Transformers in image recognition tasks and lack universal modeling capabilities across modalities.
-----
Solution in this Paper 🛠️:
• UniRepLKNet: A universal large-kernel ConvNet architecture
• Uses large depth-wise kernels (13×13) in the middle and later stages
• Incorporates a Dilated Reparam Block: parallel dilated small-kernel branches assist the large kernel during training and are merged into it for inference
• Efficient implementation using block-wise implicit GEMM
• Modality-specific preprocessing for audio, point clouds, time-series, and video
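To make the re-parameterization idea concrete, here is a minimal single-channel sketch in plain Python. It assumes the usual equivalence that a k×k kernel with dilation r behaves like a dense kernel of size (k-1)·r+1 with zeros between the taps, so a dilated branch can be folded into a larger kernel by zero-padding and adding. The function names, the 7×7 and 3×3 kernel sizes, and the dilation of 2 are illustrative assumptions, not the paper's configuration, and normalization layers are left out.

```python
def dilated_to_dense(kernel, dilation):
    """Expand a k x k dilated kernel into its equivalent dense kernel (zeros between taps)."""
    k = len(kernel)
    size = (k - 1) * dilation + 1
    dense = [[0.0] * size for _ in range(size)]
    for i in range(k):
        for j in range(k):
            dense[i * dilation][j * dilation] = kernel[i][j]
    return dense


def pad_to(kernel, size):
    """Zero-pad a square, odd-sized kernel symmetrically to size x size."""
    k = len(kernel)
    off = (size - k) // 2
    out = [[0.0] * size for _ in range(size)]
    for i in range(k):
        for j in range(k):
            out[i + off][j + off] = kernel[i][j]
    return out


def merge_branches(large, small, dilation):
    """Fold a dilated small-kernel branch into the large kernel (hypothetical helper)."""
    size = len(large)
    sparse = pad_to(dilated_to_dense(small, dilation), size)
    return [[large[i][j] + sparse[i][j] for j in range(size)] for i in range(size)]


def conv2d_same(image, kernel, dilation=1):
    """Single-channel cross-correlation, stride 1, zero padding, same output size."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    half = (k - 1) * dilation // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    yy = y + i * dilation - half
                    xx = x + j * dilation - half
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[i][j] * image[yy][xx]
            out[y][x] = acc
    return out


# Illustrative integer-valued data so the comparison below is exact.
image = [[(3 * y + 5 * x) % 7 for x in range(9)] for y in range(9)]
large = [[(i * 7 + j) % 5 - 2 for j in range(7)] for i in range(7)]
small = [[1, -1, 2], [0, 3, -2], [1, 1, -1]]

# Training-time form: large-kernel branch plus a dilated small-kernel branch.
a = conv2d_same(image, large)
b = conv2d_same(image, small, dilation=2)
two_branch = [[a[y][x] + b[y][x] for x in range(9)] for y in range(9)]

# Inference-time form: one convolution with the merged kernel.
merged = conv2d_same(image, merge_branches(large, small, dilation=2))

print(two_branch == merged)
```

The two forms agree because convolution is linear in the kernel, so summing branch outputs is the same as convolving once with the summed kernels. A real multi-channel implementation would also need to fold any per-branch normalization into the weights before merging.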
-----
Key Insights from this Paper 💡:
• Large kernels improve performance without significant computational overhead
• Structural re-parameterization enhances large kernel effectiveness
• ConvNets can achieve universal perception across modalities
• Large-kernel ConvNets show higher shape bias than traditional ConvNets
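The overhead claim can be sanity-checked with back-of-the-envelope arithmetic. The sketch below counts weights and multiply-accumulates for a depth-wise convolution versus a dense one, assuming stride 1, same padding, and no bias. The channel count (256) and feature-map size (14×14) are illustrative assumptions, not figures from the paper.

```python
def depthwise_conv_cost(channels, kernel, height, width):
    """Weights and multiply-accumulates for a depth-wise kernel x kernel convolution."""
    params = channels * kernel * kernel   # one kernel x kernel filter per channel
    macs = params * height * width        # every filter is applied at each output position
    return params, macs


def dense_conv_cost(c_in, c_out, kernel, height, width):
    """Weights and multiply-accumulates for a standard (dense) convolution."""
    params = c_in * c_out * kernel * kernel
    macs = params * height * width
    return params, macs


# Illustrative setting (assumed): 256 channels on a 14x14 feature map.
dw_params, dw_macs = depthwise_conv_cost(256, 13, 14, 14)
dense_params, dense_macs = dense_conv_cost(256, 256, 3, 14, 14)

print(dw_params)      # 43264
print(dense_params)   # 589824
print(dense_macs / dw_macs)
```

Under these assumptions a depth-wise 13×13 layer carries far fewer weights than a dense 3×3 layer at the same width, because it does not mix channels. These are theoretical counts only; measured speed also depends on how well the depth-wise large-kernel operator is implemented.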