Skip to content

FFT-accelerated Singular Spectrum Analysis in C. SSA at the speed of FFT. Decompose time series into trend, seasonality & noise. O(N log N) Hankel matvec.

License

Notifications You must be signed in to change notification settings

Tugbars/Singular-Spectrum-Analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SSA-Opt: High-Performance Singular Spectrum Analysis

A fast, header-only C library for Singular Spectrum Analysis with Python bindings, FFT-accelerated operations, and Intel MKL optimization.

17-30× faster than Rssa (the gold-standard R implementation).

1. What is SSA?

Singular Spectrum Analysis decomposes time series into:

  • Trends - Long-term movements
  • Periodic components - Cycles, seasonality
  • Noise - Random fluctuations

Unlike Fourier analysis, SSA is non-parametric and adapts to your data's structure. Ideal for financial analysis, signal processing, and anywhere you need clean signal extraction.

2. Results

2.1.1 Signal Denoising: 159× Noise Reduction

ssa_denoising_best

2.1.2 Signal Denoising with Cadzow

cadzow_comparison

SSA recovers clean signal from heavy noise (SNR: 4.1 → 26.1 dB).

2.2 Trend Extraction: Bitcoin Price Analysis

linkedin_btc_max_accuracy

Ensemble SSA extracts smooth trend from volatile Bitcoin prices (r = 0.987).

2.3.1 Forecasting: Airline Passengers

linkedin_forecast

2-year ahead forecast capturing trend and 12-month seasonality (RMSE: 47.5).

2.3.2 Forecasting: Finance

forecast_finance

3. Benchmarks

3.1 Speed: SSA-Opt vs Rssa(Golden Standard for SSA)

Benchmarked with benchmark_vs_rssa.py on Intel Core i9-14900KF, MKL 2025.0.3:

  • Components: k = 50
  • Window length: L = N/4 (standard SSA choice)
  • Signal: Synthetic (trend + periodic + noise)
benchmark_2x2
N L k SSA-Opt (ms) Rssa (ms) Speedup Correlation
500 125 50 2.0 33.9 17.0× 0.9767
1000 250 50 1.6 46.8 28.5× 0.9938
2000 500 50 2.5 64.3 25.3× 0.9982
5000 1250 50 7.2 148.3 20.5× 0.9995
10000 2500 50 12.2 370.8 30.5× 0.9999
20000 5000 50 26.3 618.9 23.6× 1.0000

Summary: Average 24× speedup, peak 30.5× at N=10,000. Throughput: 625 SSA/sec at N=1000.

Rssa uses PROPACK (Lanczos bidiagonalization). SSA-Opt uses randomized SVD with MKL.

3.2 MKL Configuration Impact

Using mkl_config.h for hybrid CPU optimization (P-core affinity, thread tuning):

N Default MKL Optimized Improvement
1000 2.7 ms 2.1 ms 1.25×
5000 10.7 ms 8.9 ms 1.20×
10000 23.4 ms 20.7 ms 1.13×
20000 41.3 ms 36.1 ms 1.14×

The mkl_config.h header auto-detects hybrid CPUs (Intel 12th-14th gen) and pins threads to P-cores only, avoiding E-core slowdown.

3.3 Accuracy Validation

Test Signal Correlation with True Signal
N=500, SNR=7dB 0.9895
N=1000, SNR=7dB 0.9973
N=10000, SNR=7dB 0.9999
N=20000, SNR=7dB 1.0000

4. Features

Core SSA

  • 3 Decomposition Methods - Sequential power iteration, block power iteration, and randomized SVD
  • FFT-Accelerated - O(N log N) Hankel matrix-vector products via convolution theorem
  • R2C Optimization - Real-to-complex FFT exploits Hermitian symmetry (50% memory savings)
  • Malloc-Free Hot Path - prepare() + decompose_randomized() for zero-allocation streaming

Analysis & Reconstruction

  • Grouped Reconstruction - Combine arbitrary component subsets
  • W-Correlation Matrix - Fast DSYRK-based weighted correlation for component grouping
  • Automatic Pair Detection - Find sine/cosine pairs via singular value + W-correlation matching
  • Variance Explained - Per-component and cumulative variance statistics
  • Component Statistics - Scree plot data, gap detection, automatic rank selection

Forecasting & Prediction

  • R-Forecast (LRF) - Linear Recurrent Formula for extrapolation
  • V-Forecast - Alternative vector forecast method
  • Verticality Check - Stability indicator for forecast reliability

Advanced Features

  • ESPRIT - Frequency/period estimation from eigenvectors
  • Cadzow Iterations - Iterative rank reduction for enhanced denoising
  • Gap Filling - Missing value imputation (iterative + simple methods)
  • MSSA - Multivariate SSA for multi-channel/correlated series

Performance

  • MKL-Optimized - Intel MKL for FFT, BLAS, LAPACK, and RNG
  • Batched FFT - Process multiple vectors in single MKL call
  • Cached FFTs - Pre-computed eigenvector FFTs for fast reconstruction
  • OpenMP Ready - Parallel-friendly design

Integration

  • Header-Only - Single #include "ssa_opt.h"
  • Python Bindings - Full ctypes wrapper with NumPy integration
  • Minimal Dependencies - Just Intel MKL (or compatible BLAS/LAPACK)

5. Quick Start

5.1 Python (Recommended)

from ssa_wrapper import SSA

# Load your data
prices = np.array([...])  # Your time series

# Decompose
ssa = SSA(prices, L=250)
ssa.decompose(k=30)  # Uses fast randomized SVD

# Extract components
trend = ssa.reconstruct([0])
cycle = ssa.reconstruct([1, 2])

# Forecast
forecast = ssa.forecast([0, 1, 2], n_forecast=50)

# Analysis
print(f"Variance explained: {ssa.variance_explained(0, 2):.1%}")
pairs = ssa.find_periodic_pairs()

5.2 C

#define SSA_OPT_IMPLEMENTATION
#include "ssa_opt_r2c.h"
#include "mkl_config.h"

int main() {
    // Initialize MKL (call once at startup)
    mkl_config_ssa_full(1);  // 1 = verbose
    
    // Setup
    SSA_Opt ssa = {0};
    ssa_opt_init(&ssa, signal, N, L);
    
    // Decompose (randomized is fastest)
    ssa_opt_decompose_randomized(&ssa, k, 8);
    
    // Reconstruct trend
    int trend_group[] = {0};
    double* trend = malloc(N * sizeof(double));
    ssa_opt_reconstruct(&ssa, trend_group, 1, trend);
    
    // Cleanup
    ssa_opt_free(&ssa);
    free(trend);
}

5.3 After Decomposition

Access results directly via the struct:

ssa->sigma[i]        // Singular values (descending order)
ssa->eigenvalues[i]  // Squared singular values (variance explained)
ssa->U[i * L]        // Left singular vector i (length L)
ssa->V[i * K]        // Right singular vector i (length K)

6. Algorithm

6.1 Trajectory Matrix (Hankel Embedding)

SSA embeds a 1D time series into a 2D trajectory matrix. Given signal x = [x₀, x₁, ..., x_{N-1}] and window length L, we construct an L × K Hankel matrix where K = N - L + 1:

         Column 0   Column 1   Column 2   ...   Column K-1
       ┌─────────────────────────────────────────────────┐
Row 0  │   x₀        x₁         x₂       ...    x_{K-1}  │
Row 1  │   x₁        x₂         x₃       ...    x_K      │
Row 2  │   x₂        x₃         x₄       ...    x_{K+1}  │
  ⋮    │   ⋮         ⋮          ⋮        ⋱       ⋮       │
Row L-1│  x_{L-1}    x_L       x_{L+1}   ...    x_{N-1}  │
       └─────────────────────────────────────────────────┘

H[i,j] = x[i+j]    (constant along anti-diagonals)

This matrix is never explicitly formed - it would require O(L × K) ≈ O(N²) memory.

6.2 Singular Value Decomposition

We compute the top-k singular triplets of H:

H ≈ σ₀·u₀·v₀ᵀ + σ₁·u₁·v₁ᵀ + ... + σₖ₋₁·uₖ₋₁·vₖ₋₁ᵀ

Each triplet (σᵢ, uᵢ, vᵢ) represents one component. Larger σ = more signal energy.

6.3 Reconstruction (Diagonal Averaging)

Each component forms a rank-1 matrix, then averaged back to 1D along anti-diagonals:

t=0: x̃[0] = X[0,0]
t=1: x̃[1] = (X[0,1] + X[1,0]) / 2
t=2: x̃[2] = (X[0,2] + X[1,1] + X[2,0]) / 3
...

6.4 Forecasting (Linear Recurrent Formula)

SSA signals satisfy a linear recurrence. The LRF extracts coefficients from left singular vectors to predict future values:

x[n] = a₁·x[n-1] + a₂·x[n-2] + ... + a_{L-1}·x[n-L+1]

7. Optimizations

7.1 FFT-Accelerated Matrix-Vector Product

Hankel matrix-vector multiplication equals convolution:

Forward: y = H·v                    Adjoint: z = Hᵀ·u

y[i] = Σⱼ x[i+j]·v[j]              z[j] = Σᵢ x[i+j]·u[i]
     = (x ∗ v)[i]                       = (x ∗ flip(u))[L-1+j]

┌─────────────────────────────────────────────────────────┐
│  conv(a,b) = IFFT( FFT(a) ⊙ FFT(b) )                   │
│                                                         │
│  Complexity: O(N²) direct  →  O(N log N) with FFT       │
└─────────────────────────────────────────────────────────┘

Pre-compute FFT(signal) once at init.
Each matvec: 1 FFT + 1 pointwise multiply + 1 IFFT

7.2 Block Power Method

Process b vectors simultaneously instead of sequentially:

Sequential (b=1):                 Block (b=32):
┌───┐                             ┌───────────────────┐
│ v │  → H·v → u → Hᵀ·u → v       │ v₀ v₁ v₂ ... v₃₁  │   = V_block
└───┘                             └───────────────────┘
                                           ↓
× k components                    H·V_block (32 FFTs batched)
× 100 iterations                           ↓
= 6400 individual FFTs            QR(U_block)
                                           ↓
                                  Hᵀ·U_block (32 FFTs batched)
                                           ↓
                                  = 200 batched calls total

Benefits:
• MKL batched FFT: single function call for 32 transforms
• Memory locality: V_block columns contiguous in memory
• BLAS3: QR uses GEMM instead of GEMV

7.3 Periodic QR

Every iteration:        vs.     Every 5 iterations:

iter 1: QR                      iter 1: (skip)
iter 2: QR                      iter 2: (skip)
iter 3: QR                      iter 3: (skip)
iter 4: QR                      iter 4: (skip)
iter 5: QR                      iter 5: QR ←
  ⋮                               ⋮
100 QR calls                    20 QR calls (5× reduction)

Stability maintained by:
• Final QR before Rayleigh-Ritz extraction
• Deflation orthogonalization against previous blocks each iteration

7.4 Rayleigh-Ritz Extraction

After block iteration converges, vectors span the correct subspace but aren't individually optimal:

┌─────────────────────────────────────────────────────┐
│ 1. Project: M = U_blockᵀ · (H · V_block)   [b × b]  │
│                                                     │
│ 2. Small SVD: M = Uₛ · Σ · Vₛᵀ             [b × b]  │
│                                                     │
│ 3. Rotate:  U_final = U_block · Uₛ                  │
│             V_final = V_block · Vₛ                  │
│             σ_final = diag(Σ)                       │
└─────────────────────────────────────────────────────┘

Cost: one 32×32 SVD (microseconds) → optimal singular values

7.5 GEMM-Based Operations

All multi-vector operations use BLAS Level 3:

Operation BLAS2 (sequential) BLAS3 (block)
Orthogonalize against previous k × GEMV Single GEMM
Project out components k × AXPY Single GEMM

GEMM achieves near-peak FLOPS due to cache reuse and SIMD.

7.6 Randomized SVD

For very large problems, randomized SVD is fastest:

1. Random projection: Y = H · Ω,  Ω is K × (k+p) random
2. Orthogonalize: Q = qr(Y)
3. Project: B = Qᵀ · H
4. SVD of small matrix: B = U_B · Σ · Vᵀ
5. Recover: U = Q · U_B

Complexity: O((k+p) × N log N) vs O(k × iterations × N log N) for power iteration.

7.7 R2C FFT Optimization

Real-to-Complex FFT exploits conjugate symmetry:

  • Output: N/2+1 complex values (vs N for C2C)
  • 50% less memory for FFT buffers
  • ~17% faster than complex-to-complex

7.8 Cached FFTs for W-Correlation

W-correlation matrix requires FFTs of all component reconstructions. With caching:

Operation Without Cache With Cache
wcorr_matrix 2n forward FFTs 0 forward FFTs
Speedup

Call ssa_opt_cache_ffts() once after decomposition if computing W-correlation multiple times.

8. Installation

8.1 Python

# Build the shared library
cd MKL
mkdir build && cd build
cmake .. -DUSE_MKL=ON
cmake --build . --config Release

# Copy ssa.dll/libssa.so to py/ folder
# Then:
cd ../py
python -c "from ssa_wrapper import SSA; print('OK')"

8.2 C/C++ (Header-Only)

#define SSA_OPT_IMPLEMENTATION
#include "ssa_opt_r2c.h"

Link with MKL:

# Windows
cl /O2 your_code.c /I"%MKLROOT%\include" mkl_rt.lib

# Linux
gcc -O3 your_code.c -I${MKLROOT}/include -lmkl_rt -lm

9. Benchmarking

Run the benchmark suite:

cd py

# Speed comparison vs Rssa
python benchmark_vs_rssa.py --speed

# Compare MKL configurations
python benchmark_vs_rssa.py --mkl-config

# Internal method comparison (seq/block/randomized)
python benchmark_vs_rssa.py --internal

# All benchmarks
python benchmark_vs_rssa.py --all

Parameter sensitivity tests:

python ssa_parameter_tests.py --snr        # Noise robustness
python ssa_parameter_tests.py --window     # L sensitivity
python ssa_parameter_tests.py --components # k sweep
python ssa_parameter_tests.py --all

10. Limitations

  • Memory: O(k × N) for storing singular vectors
  • Not for streaming: Batch processing only
  • Minimum size: N > 100 recommended

11. References

  1. Golyandina, N., & Zhigljavsky, A. (2013). Singular Spectrum Analysis for Time Series. Springer.
  2. Korobeynikov, A. (2010). Computation- and space-efficient implementation of SSA. Statistics and Its Interface, 3(3).
  3. Halko, N., Martinsson, P. G., & Tropp, J. A. (2011). Finding structure with randomness. SIAM Review, 53(2).

12. License

GPL 3.0 - See LICENSE file.

About

FFT-accelerated Singular Spectrum Analysis in C. SSA at the speed of FFT. Decompose time series into trend, seasonality & noise. O(N log N) Hankel matvec.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published