Benchmarks¶

Performance comparisons: rusket vs mlxtend, scaling to 1 Billion rows, and FPMiner throughput.

rusket includes two mining algorithms (FP-Growth and Eclat), both implemented in Rust. These benchmarks compare rusket against mlxtend (pure Python) on synthetic and real-world datasets.

Benchmark environment: Apple Silicon MacBook Air (M-series, arm64, 8 GB RAM). All timings are single-run wall-clock measurements. Your results may vary depending on hardware, thermal throttling, and background load.

Scale Benchmarks: 1M → 1B Rows 🚀¶

Interactive Chart¶

Three Input Paths¶

rusket supports three ways to ingest data at scale:

from_transactions → sparse DataFrame — returns a pandas DataFrame, easy API
Direct CSR → Rust — pass scipy.sparse.csr_matrix directly to fpgrowth(), skips pandas entirely
FPMiner Streaming — memory-safe accumulator for 100M+ rows that don't fit in RAM

In-Memory Scale (fpgrowth)¶

Scale	`from_transactions` → fpgrowth	Direct CSR → fpgrowth	Speedup
1M rows (200k txns × 10k items)	5.0s	0.1s	50×
10M rows (2M txns × 50k items)	24.4s	1.2s	20×
50M rows (10M txns × 100k items)	63.1s	4.0s	15×
100M rows (20M txns × 200k items)	134.2s	10.1s	13×
200M rows (40M txns × 200k items)	246.8s	17.6s	14×

Direct CSR is the power-user path

At 100M rows, direct CSR mining takes 1.2 seconds — the bottleneck is entirely the CSR build (24.5s). Compare to the pandas sparse path where mining alone takes 9.0s due to sparse.to_coo().tocsr() overhead.

Out-of-Core Scale (FPMiner Streaming)¶

For real-world retail datasets scaling to 1 Billion rows, FPMiner uses a memory-safe chunks approach (per-chunk sort + k-way merge).

Benchmark: 2,603 retail items, avg 4.4 items/basket, min_support = 0.1%

Scale	add_chunk()	mine()	Total Time	Itemsets Found
50M rows	4.8s	5.6s	10.4s	1,260
100M rows	10.6s	13.9s	24.6s	1,254
200M rows	22.7s	33.2s	55.9s	1,261
300M rows	30.0s	55.4s	85.4s	1,259

vs mlxtend¶

Dataset	`rusket` (prep + mine)	`mlxtend` (prep + mine)	Speedup
50k rows (10k txns, 100 items)	0.0 s	0.1 s	~5×
500k rows (50k txns, 500 items)	0.2 s	1.8 s	~9×
2M rows (500k txns, 2k items)	0.2 s	16.0 s	80×

The "Auto" Routine Algorithm¶

rusket.mine(method="auto") dynamically selects the algorithm that performs best based on the dataset density (Borgelt 2003 heuristic).

FP-Growth: Excellent for dense data.
Eclat: Excellent for sparse data (like retail baskets). Eclat directly uses hardware SIMD array-intersections (popcnt) on the TID-lists, resulting in massive speedups (often 5× to 15× faster on sparse arrays).

Real-World Datasets¶

Dataset	Transactions	Items	`rusket`	`mlxtend`	Speedup
andi_data.txt	8,416	119	9.7 s (22.8M itemsets)	TIMEOUT 💥	∞
andi_data2.txt	540,455	2,603	7.9 s	16.2 s	2×

Dense data

On andi_data.txt (~23 items/basket), mlxtend can't finish in 60s. rusket mines 22.8M itemsets in under 10s.

The Power-User Pipeline¶

import numpy as np
from scipy import sparse as sp
from rusket import FPGrowth

csr = sp.csr_matrix(
    (np.ones(len(txn_ids), dtype=np.int8), (txn_ids, item_ids)),
    shape=(n_transactions, n_items),
)

freq = FPGrowth(csr).mine(min_support=0.001, max_len=3, column_names=item_names)

At 100M rows, the mining step takes 1.2 seconds (not a typo).

🏆 Conquering the 1 Billion Row Challenge¶

Bottleneck 1: Memory Exhaustion during Ingestion¶

The Solution: The FPMiner class provides an out-of-core streaming API. It accepts chunks of (txn_id, item_id) pairs, performs a fast \(O(k \log k)\) sort in Rust, buffers them, and uses a k-way merge to stream directly into the final compressed CSR memory block.

Bottleneck 2: Algorithmic Memory Thrashing¶

The Solution: rusket employs a zero-allocation intersect_count_into() kernel. It pre-allocates a thread-local scratch buffer, intersected in-place, with an early-exit heuristic that aborts the memory scan the moment it proves the remaining bits cannot satisfy min_support.

Bottleneck 3: Sequential Seriality¶

The Solution: rusket merges tree construction into the parallel worker loop. Conditional trees are collected and mined concurrently inside the rayon thread pool.

Running the benchmarks¶

uv run maturin develop --release
uv run python benchmarks/bench_scale.py    # Scale benchmark + Plotly chart
uv run python benchmarks/bench_realworld.py
uv run python benchmarks/bench_vs_mlxtend.py
uv run pytest tests/test_benchmark.py -v -s

Why is rusket faster?¶

Technique	Description
Zero-copy CSR	`indptr`/`indices` passed to Rust as pointer hand-offs
Arena FP-Tree	Flat children arena, incremental `is_path()` tracking
Rayon	Parallel conditional mining across CPU cores
Eclat popcount	`Vec<u64>` bitsets + hardware `popcnt` for support
No Python loops	FP-Tree, mining, and metrics all in Rust
`pd.factorize`	O(n) integer encoding, faster than `pd.Categorical` at scale

Recommender Benchmarks vs LibRecommender¶

Measured with pytest-benchmark (5 rounds, warmed up, GC disabled). MovieLens 100k dataset (943 users, 1,682 items, 100k ratings). Only model.fit() is timed — no startup or data loading overhead.

Benchmark	rusket	LibRecommender	Speedup
ALS (Cholesky) (64 factors, 15 epochs)	427 ms	1,324 ms	3.1×
ALS (eALS) (64 factors, 15 epochs)	360 ms	N/A	—
BPR (64 factors, 10 epochs)	33 ms	681 ms	20.4×
ItemKNN (k=100)	55 ms	287 ms	5.2×
SVD (64 factors, 20 epochs)	55 ms	❌ TF-only (broken)	—
EASE	71 ms	N/A	—

Benchmark vs PySpark¶

PySpark is the industry standard for large-scale recommendation workloads. While rusket runs strictly single-machine (utilizing Rust Rayon threading), it relies on native memory and avoids the JVM serialization overhead, resulting in massive speedups over PySpark local mode.

Benchmark environment: Apple Silicon M-Series, local[*] Spark master, 1 million ALS interactions (20k Users × 10k Items), 50k FPGrowth transactions.

Algorithm	`rusket`	`pyspark.ml`	Speedup
FPGrowth (50k txns)	0.027 s	3.89 s	139.8×
ALS (1M interactions)	6.5 s	71.1 s	10.9×