Transform complex data patterns into simple symbols for blazing-fast similarity search.
SOFA (Symbolic Fourier Approximation) introduces a fast data series similarity search method using symbolic Fourier approximation and tree indexing, outperforming existing approaches in both speed and accuracy.
-----
https://arxiv.org/abs/2411.17483
🔍 Original Problem:
Existing similarity search methods like SAX struggle with high-frequency data series, leading to poor performance and inefficient indexing. Current approaches fail to capture the intrinsic shape of signals, especially in noisy or high-frequency datasets.
-----
⚡ Solution in this Paper:
→ SOFA combines two key components: SFA (Symbolic Fourier Approximation) for data summarization and a modified MESSI tree index for fast querying.
→ SFA transforms data series into frequency domain using Fourier transform, then applies learned quantization based on actual data distribution.
→ The system selects Fourier coefficients with highest variance to better capture high-frequency components.
→ SOFA implements SIMD optimization for lower bound distance calculations, enabling parallel processing.
-----
🎯 Key Insights:
→ Data-adaptive quantization in frequency domain outperforms fixed binning approaches
→ Variance-based feature selection significantly improves representation of high-frequency signals
→ SIMD optimization with early abandoning enhances query performance
-----
📊 Results:
→ Up to 38x faster than MESSI on high-frequency datasets
→ 2-4x faster than FAISS for exact similarity queries
→ 10x faster than parallel sequential scan
→ Processes queries in 58ms (median case)
Share this post