Skip to content

Benchmarks

Performance comparisons: rusket vs mlxtend, scaling to 1 Billion rows, and FPMiner throughput.

rusket includes two mining algorithms (FP-Growth and Eclat), both implemented in Rust. These benchmarks compare rusket against mlxtend (pure Python) on synthetic and real-world datasets.

Benchmark environment: Apple Silicon MacBook Air (M-series, arm64, 8 GB RAM). All timings are single-run wall-clock measurements. Your results may vary depending on hardware, thermal throttling, and background load.


Scale Benchmarks: 1M → 1B Rows 🚀

Interactive Chart

Three Input Paths

rusket supports three ways to ingest data at scale:

  1. from_transactions → sparse DataFrame — returns a pandas DataFrame, easy API
  2. Direct CSR → Rust — pass scipy.sparse.csr_matrix directly to fpgrowth(), skips pandas entirely
  3. FPMiner Streaming — memory-safe accumulator for 100M+ rows that don't fit in RAM

In-Memory Scale (fpgrowth)

Scale from_transactions → fpgrowth Direct CSR → fpgrowth Speedup
1M rows (200k txns × 10k items) 5.0s 0.1s 50×
10M rows (2M txns × 50k items) 24.4s 1.2s 20×
50M rows (10M txns × 100k items) 63.1s 4.0s 15×
100M rows (20M txns × 200k items) 134.2s 10.1s 13×
200M rows (40M txns × 200k items) 246.8s 17.6s 14×

Direct CSR is the power-user path

At 100M rows, direct CSR mining takes 1.2 seconds — the bottleneck is entirely the CSR build (24.5s). Compare to the pandas sparse path where mining alone takes 9.0s due to sparse.to_coo().tocsr() overhead.

Out-of-Core Scale (FPMiner Streaming)

For real-world retail datasets scaling to 1 Billion rows, FPMiner uses a memory-safe chunks approach (per-chunk sort + k-way merge).

Benchmark: 2,603 retail items, avg 4.4 items/basket, min_support = 0.1%

Scale add_chunk() mine() Total Time Itemsets Found
50M rows 4.8s 5.6s 10.4s 1,260
100M rows 10.6s 13.9s 24.6s 1,254
200M rows 22.7s 33.2s 55.9s 1,261
300M rows 30.0s 55.4s 85.4s 1,259

vs mlxtend

Dataset rusket (prep + mine) mlxtend (prep + mine) Speedup
50k rows (10k txns, 100 items) 0.0 s 0.1 s ~5×
500k rows (50k txns, 500 items) 0.2 s 1.8 s ~9×
2M rows (500k txns, 2k items) 0.2 s 16.0 s 80×

The "Auto" Routine Algorithm

rusket.mine(method="auto") dynamically selects the algorithm that performs best based on the dataset density (Borgelt 2003 heuristic).

  • FP-Growth: Excellent for dense data.
  • Eclat: Excellent for sparse data (like retail baskets). Eclat directly uses hardware SIMD array-intersections (popcnt) on the TID-lists, resulting in massive speedups (often 5× to 15× faster on sparse arrays).

Real-World Datasets

Dataset Transactions Items rusket mlxtend Speedup
andi_data.txt 8,416 119 9.7 s (22.8M itemsets) TIMEOUT 💥
andi_data2.txt 540,455 2,603 7.9 s 16.2 s

Dense data

On andi_data.txt (~23 items/basket), mlxtend can't finish in 60s. rusket mines 22.8M itemsets in under 10s.


The Power-User Pipeline

import numpy as np
from scipy import sparse as sp
from rusket import FPGrowth

csr = sp.csr_matrix(
    (np.ones(len(txn_ids), dtype=np.int8), (txn_ids, item_ids)),
    shape=(n_transactions, n_items),
)

freq = FPGrowth(csr).mine(min_support=0.001, max_len=3, column_names=item_names)

At 100M rows, the mining step takes 1.2 seconds (not a typo).


🏆 Conquering the 1 Billion Row Challenge

Bottleneck 1: Memory Exhaustion during Ingestion

The Solution: The FPMiner class provides an out-of-core streaming API. It accepts chunks of (txn_id, item_id) pairs, performs a fast \(O(k \log k)\) sort in Rust, buffers them, and uses a k-way merge to stream directly into the final compressed CSR memory block.

Bottleneck 2: Algorithmic Memory Thrashing

The Solution: rusket employs a zero-allocation intersect_count_into() kernel. It pre-allocates a thread-local scratch buffer, intersected in-place, with an early-exit heuristic that aborts the memory scan the moment it proves the remaining bits cannot satisfy min_support.

Bottleneck 3: Sequential Seriality

The Solution: rusket merges tree construction into the parallel worker loop. Conditional trees are collected and mined concurrently inside the rayon thread pool.


Running the benchmarks

uv run maturin develop --release
uv run python benchmarks/bench_scale.py    # Scale benchmark + Plotly chart
uv run python benchmarks/bench_realworld.py
uv run python benchmarks/bench_vs_mlxtend.py
uv run pytest tests/test_benchmark.py -v -s

Why is rusket faster?

Technique Description
Zero-copy CSR indptr/indices passed to Rust as pointer hand-offs
Arena FP-Tree Flat children arena, incremental is_path() tracking
Rayon Parallel conditional mining across CPU cores
Eclat popcount Vec<u64> bitsets + hardware popcnt for support
No Python loops FP-Tree, mining, and metrics all in Rust
pd.factorize O(n) integer encoding, faster than pd.Categorical at scale

Recommender Benchmarks vs LibRecommender

Measured with pytest-benchmark (5 rounds, warmed up, GC disabled). MovieLens 100k dataset (943 users, 1,682 items, 100k ratings). Only model.fit() is timed — no startup or data loading overhead.

Benchmark rusket LibRecommender Speedup
ALS (Cholesky) (64 factors, 15 epochs) 427 ms 1,324 ms 3.1×
ALS (eALS) (64 factors, 15 epochs) 360 ms N/A
BPR (64 factors, 10 epochs) 33 ms 681 ms 20.4×
ItemKNN (k=100) 55 ms 287 ms 5.2×
SVD (64 factors, 20 epochs) 55 ms ❌ TF-only (broken)
EASE 71 ms N/A

Benchmark vs PySpark

PySpark is the industry standard for large-scale recommendation workloads. While rusket runs strictly single-machine (utilizing Rust Rayon threading), it relies on native memory and avoids the JVM serialization overhead, resulting in massive speedups over PySpark local mode.

Benchmark environment: Apple Silicon M-Series, local[*] Spark master, 1 million ALS interactions (20k Users × 10k Items), 50k FPGrowth transactions.

Algorithm rusket pyspark.ml Speedup
FPGrowth (50k txns) 0.027 s 3.89 s 139.8×
ALS (1M interactions) 6.5 s 71.1 s 10.9×