Architecture¶
How rusket relies on Rust, PyO3, and Rayon for zero-copy, zero-allocation Python bindings.
rusket is structured as a thin Python layer over a Rust core, compiled as a native extension module via PyO3 and maturin.
Class hierarchy & API conventions¶
rusket has two class families: Miners (frequent pattern discovery) and Recommenders (collaborative filtering). All share common API patterns.
Base classes¶
BaseModel (ABC)
├── Miner ─── FPGrowth, Eclat, FIN, LCM, FPGrowth (+ RuleMinerMixin)
│ PrefixSpan, HUPM (specialized miners)
├── ImplicitRecommender ─── ALS, BPR, EASE, ItemKNN, SVD, LightGCN
├── SequentialRecommender ─── FPMC, SASRec
└── FM (standalone, uses explicit feature matrices)
Common conventions¶
| Convention | Rule |
|---|---|
from_transactions(data, transaction_col, item_col, verbose, **kwargs) |
Class method on every class. Does NOT auto-fit. Call .fit() explicitly on recommenders. |
from_pandas(), from_polars(), from_spark() |
Aliases delegating to from_transactions() |
fit(**args) |
Sklearn-compatible. Required after from_transactions() for recommenders. |
predict(**args) |
Sklearn-compatible alias for recommend_items() (recommenders) or mine() (miners). |
verbose: int |
0 = silent, 1+ = progress logs |
seed: int |
Random-seed parameter (never random_state) |
fitted: bool |
Attribute set to True after .fit() |
__repr__ |
Every class has a __repr__ showing key hyperparameters |
| Docstrings | NumPy-style (Parameters, Returns, Raises) |
Miner interface¶
miner = FPGrowth.from_transactions(df, transaction_col="tid", item_col="item")
freq = miner.mine() # → pd.DataFrame (support, itemsets)
rules = miner.association_rules() # → pd.DataFrame (metrics)
recs = miner.recommend_for_cart(items, n=5) # → list[Any]
Recommender interface¶
# from_transactions() configures the model but does NOT fit
model = ALS.from_transactions(df, user_col="user", item_col="item", factors=64).fit()
ids, scores = model.recommend_items(user_id=0, n=10, exclude_seen=True)
| Method | Available on |
|---|---|
fit(interactions=None) |
All recommenders — accepts sparse matrix or uses data from from_transactions() |
predict(user_id, item_id) |
SVD (rating prediction); others: alias for recommend_items() |
recommend_items(user_id, n, exclude_seen) |
All recommenders |
recommend_users(item_id, n) |
ALS, SVD (others raise NotImplementedError) |
batch_recommend(n, exclude_seen, format) |
ALS, SVD |
user_factors / item_factors |
ALS, BPR, SVD, LightGCN |
Sequential recommenders (FPMC, SASRec)¶
Work on ordered sequences. SASRec also accepts ad-hoc sequences:
model = SASRec.from_transactions(df, user_col="user", item_col="item", timestamp_col="ts").fit()
ids, scores = model.recommend_items([1, 2, 3], n=10)
Repository layout¶
rusket/
├── src/ # Rust (PyO3)
│ ├── lib.rs # Module root — exports to Python
│ ├── fpgrowth.rs # FP-Tree + FP-Growth algorithm
│ ├── association_rules.rs # Rule generation + 12 metrics
│ └── common.rs # Shared helpers
├── python/
│ ├── rusket/ # Primary Python package (pyproject.toml name)
│ │ ├── __init__.py
│ │ ├── fpgrowth.py # Dispatch + numpy conversion
│ │ ├── association_rules.py # Label mapping + Rust call
│ │ └── _validation.py # Input validation helpers
│ └── fpgrowth_pyo3/ # Legacy compat package
│ └── ...
└── tests/
├── conftest.py
├── test_fpbase.py # Shared base test classes
├── test_fpgrowth.py # FP-Growth tests
├── test_association_rules.py # Association rules tests
└── test_benchmark.py # Performance benchmarks
Data flow¶
graph TD
classDef python fill:#4B8BBE,stroke:#306998,stroke-width:2px,color:white;
classDef rust fill:#DEA584,stroke:#000,stroke-width:2px,color:black;
classDef data fill:#FFD43B,stroke:#306998,stroke-width:2px,color:black;
A["Python Caller<br/>rusket.mine(df)"]:::python --> B{"Input Data Type"}:::python
B -->|Dense Pandas| C["C-Contiguous Array<br/>(uint8)"]:::data
B -->|Sparse Pandas| D["CSR Matrix<br/>(indptr, indices)"]:::data
B -->|Polars| E["Numpy View<br/>(Zero-copy)"]:::data
C --> F["Rust FFI<br/>fpgrowth_from_dense"]:::rust
D --> G["Rust FFI<br/>fpgrowth_from_csr"]:::rust
E --> F
F --> H["Tree Construction<br/>(Single Pass)"]:::rust
G --> H
H --> I["Recursive Mining<br/>(Rayon Parallel)"]:::rust
I --> J["Raw Vectors<br/>Vec<(count, Vec<usize>)>"]:::data
J --> K["Python Transformation<br/>Build pd.DataFrame"]:::python
FP-Growth algorithm¶
The Rust implementation follows the classic Han et al. (2000) FP-Growth algorithm:
- Header table scan — count item frequencies; prune items below
min_count. - FP-Tree construction — single-pass over transactions; compress into a prefix-tree structure.
- Recursive mining — for each frequent item, extract the conditional pattern base, build a conditional FP-Tree, and mine it recursively.
- Output — each leaf path materialises as one frequent itemset
(count, items).
Dispatch paths¶
| Path | Rust function | Input shape | Notes |
|---|---|---|---|
| Dense pandas | fpgrowth_from_dense |
[n_rows × n_cols] uint8 |
Contiguous C array |
| Sparse pandas | fpgrowth_from_csr |
CSR indptr + indices |
Zero-copy scipy CSR |
| Polars | fpgrowth_from_dense |
same as dense | Arrow → NumPy view |
Association rules¶
Rule generation is vectorised in Rust:
- For each frequent itemset of length ≥ 2, enumerate all non-empty antecedent / consequent splits.
- Look up antecedent and consequent supports from a pre-built hash map.
- Compute all 12 metrics in a single pass; filter by
(metric, min_threshold). - Return raw integer index lists to Python; Python maps back to column names / tuples.
Building from source¶
# Prerequisites: Rust 1.83+, Python 3.10+, uv
rustup update
uv sync
# Debug build (fast compile, slower runtime)
uv run maturin develop
# Release build (optimised)
uv run maturin develop --release
# Type checking
uv run basedpyright
# Tests
uv run pytest tests/ -x -q
# Cargo lint
cargo check
cargo clippy