Scaling Up Your Kernels: Large Kernel Design in ConvNets towards Universal Representations

This podcast was generated with Google's Illuminate.

UniRepLKNet: Expanding ConvNet capabilities with large kernels for multi-modal perception.

It achieves universal modeling across modalities with a single architecture.

https://arxiv.org/abs/2410.08049

Original Problem 🔍:

ConvNets have been challenged by Vision Transformers in image recognition tasks and lack universal modeling capabilities across modalities.

-----

Solution in this Paper 🛠️:

• UniRepLKNet: A universal large-kernel ConvNet architecture

• Uses depth-wise large kernels (13x13) in middle and later stages

• Incorporates a Dilated Reparam Block that pairs the large kernel with parallel dilated small-kernel branches (see the sketch after this list)

• Efficient implementation using block-wise implicit GEMM

• Modality-specific preprocessing for audio, point clouds, time-series, and video
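
A minimal PyTorch sketch of the dilated re-parameterization idea referenced above, not the authors' implementation: a 13x13 depth-wise conv is trained alongside parallel small dilated depth-wise branches, and at inference every branch is folded into the single large kernel. The branch sizes, dilation rates, and the names DilatedReparamSketch and merge_dilated_branch are illustrative assumptions, and the BatchNorm that follows each branch in the paper is omitted for brevity.

```python
# Hedged sketch of a Dilated Reparam-style depth-wise block (illustrative,
# not the paper's code). BatchNorm after each branch is omitted for brevity.
import torch
import torch.nn as nn
import torch.nn.functional as F


def merge_dilated_branch(weight: torch.Tensor, dilation: int, target_k: int) -> torch.Tensor:
    """Zero-stuff a small dilated depth-wise kernel into an equivalent dense
    kernel, then zero-pad it to the large target kernel size."""
    c, _, k, _ = weight.shape                      # depth-wise weight: (C, 1, k, k)
    eq_k = (k - 1) * dilation + 1                  # equivalent dense kernel size
    dense = weight.new_zeros(c, 1, eq_k, eq_k)
    dense[:, :, ::dilation, ::dilation] = weight   # place taps at dilated offsets
    pad = (target_k - eq_k) // 2
    return F.pad(dense, [pad, pad, pad, pad])      # center inside the large kernel


class DilatedReparamSketch(nn.Module):
    """Training: 13x13 depth-wise conv plus parallel small dilated depth-wise convs.
    Inference: merge() folds every branch into the single large kernel."""

    def __init__(self, channels: int, large_k: int = 13):
        super().__init__()
        self.large_k = large_k
        self.large = nn.Conv2d(channels, channels, large_k, padding=large_k // 2,
                               groups=channels, bias=False)
        self.branch_cfg = [(5, 1), (5, 2), (3, 3)]  # assumed (kernel, dilation) pairs
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, k, padding=(k - 1) * d // 2,
                      dilation=d, groups=channels, bias=False)
            for k, d in self.branch_cfg
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.large(x)
        for conv in self.branches:
            out = out + conv(x)                    # branches share the input, outputs are summed
        return out

    @torch.no_grad()
    def merge(self) -> nn.Conv2d:
        """Return one large depth-wise conv equivalent to the multi-branch block."""
        merged = self.large.weight.clone()
        for conv, (_, d) in zip(self.branches, self.branch_cfg):
            merged += merge_dilated_branch(conv.weight, d, self.large_k)
        fused = nn.Conv2d(merged.size(0), merged.size(0), self.large_k,
                          padding=self.large_k // 2, groups=merged.size(0), bias=False)
        fused.weight.copy_(merged)
        return fused
```

After training, block.merge() yields a single 13x13 depth-wise convolution whose output matches the multi-branch forward pass up to floating-point tolerance, so the extra branches add capacity during training at no inference-time cost.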

-----

Key Insights from this Paper 💡:

• Large kernels improve performance without significant computational overhead

• Structural re-parameterization enhances large kernel effectiveness

• ConvNets can achieve universal perception across modalities (see the preprocessing sketch after this list)

• Large-kernel ConvNets show higher shape bias than traditional ConvNets
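
To make the cross-modality point above concrete, here is a hedged sketch of one way a non-image modality can be mapped onto the 2-D layout a large-kernel ConvNet consumes, using audio as the example. The STFT settings and the helper name audio_to_map are assumptions for illustration, not the paper's exact preprocessing; point clouds, time-series, and video get their own transforms in the paper.

```python
# Hedged sketch: turning a raw audio waveform into a 2-D "image-like" map that a
# large-kernel ConvNet can consume. STFT settings and the helper name are
# illustrative assumptions, not the paper's exact preprocessing.
import torch


def audio_to_map(waveform: torch.Tensor, n_fft: int = 400, hop: int = 160) -> torch.Tensor:
    """waveform: (batch, samples) -> (batch, 1, freq_bins, time_frames)."""
    window = torch.hann_window(n_fft, device=waveform.device)
    spec = torch.stft(waveform, n_fft=n_fft, hop_length=hop,
                      window=window, return_complex=True)
    log_mag = torch.log1p(spec.abs())   # log-magnitude spectrogram, compressed range
    return log_mag.unsqueeze(1)         # add a channel dim for Conv2d


# Example: two 1-second 16 kHz clips become a (2, 1, 201, 101) tensor,
# which can be fed to the same 2-D large-kernel backbone as images.
x = audio_to_map(torch.randn(2, 16000))
```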