Architecture¶

How rusket relies on Rust, PyO3, and Rayon for zero-copy, zero-allocation Python bindings.

rusket is structured as a thin Python layer over a Rust core, compiled as a native extension module via PyO3 and maturin.

Class hierarchy & API conventions¶

rusket has two class families: Miners (frequent pattern discovery) and Recommenders (collaborative filtering). All share common API patterns.

Base classes¶

BaseModel (ABC)
├── Miner ─── FPGrowth, Eclat, FIN, LCM (+ RuleMinerMixin)
│             PrefixSpan, HUPM (specialized miners)
├── ImplicitRecommender ─── ALS, BPR, EASE, ItemKNN, SVD, LightGCN
├── SequentialRecommender ─── FPMC, SASRec
└── FM (standalone, uses explicit feature matrices)

Common conventions¶

Convention	Rule
`from_transactions(data, transaction_col, item_col, verbose, **kwargs)`	Class method on every class. Does NOT auto-fit. Call `.fit()` explicitly on recommenders.
`from_pandas()`, `from_polars()`, `from_spark()`	Aliases delegating to `from_transactions()`
`fit(**args)`	Sklearn-compatible. Required after `from_transactions()` for recommenders.
`predict(**args)`	Sklearn-compatible alias for `recommend_items()` (recommenders) or `mine()` (miners).
`verbose: int`	`0` = silent, `1`+ = progress logs
`seed: int`	Random-seed parameter (never `random_state`)
`fitted: bool`	Attribute set to `True` after `.fit()`
`__repr__`	Every class has a `__repr__` showing key hyperparameters
Docstrings	NumPy-style (`Parameters`, `Returns`, `Raises`)
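The conventions above can be illustrated with a small stand-in class. This is only a sketch: `ToyModel`, its `factors` parameter, and its internals are hypothetical and not part of rusket. Only the method names, the `seed` and `verbose` parameters, and the `fitted` attribute mirror the table.

```python
import random
from typing import Any, Iterable


class ToyModel:
    """Hypothetical model following the conventions listed in the table."""

    def __init__(self, factors: int = 8, seed: int = 42, verbose: int = 0) -> None:
        self.factors = factors
        self.seed = seed        # named `seed`, never `random_state`
        self.verbose = verbose  # 0 = silent, 1+ = progress logs
        self.fitted = False     # set to True by fit()
        self._rows: list = []

    @classmethod
    def from_transactions(
        cls,
        data: Iterable[dict],
        transaction_col: str,
        item_col: str,
        verbose: int = 0,
        **kwargs: Any,
    ) -> "ToyModel":
        """Configure the model and store the data; does NOT fit."""
        model = cls(verbose=verbose, **kwargs)
        model._rows = [(row[transaction_col], row[item_col]) for row in data]
        return model

    def fit(self) -> "ToyModel":
        """Placeholder training step; returns self so calls can be chained."""
        rng = random.Random(self.seed)
        self._weights = [rng.random() for _ in range(self.factors)]
        if self.verbose >= 1:
            print(f"fitted on {len(self._rows)} rows")
        self.fitted = True
        return self

    def __repr__(self) -> str:
        return f"ToyModel(factors={self.factors}, seed={self.seed})"


rows = [{"tid": 1, "item": "a"}, {"tid": 1, "item": "b"}, {"tid": 2, "item": "a"}]
model = ToyModel.from_transactions(rows, transaction_col="tid", item_col="item", factors=4)
print(model.fitted)        # False: from_transactions() does not fit
print(model.fit().fitted)  # True
print(model)               # ToyModel(factors=4, seed=42)
```

Returning `self` from `fit()` is what makes the chained `from_transactions(...).fit()` style used elsewhere on this page possible.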

Miner interface¶

miner = FPGrowth.from_transactions(df, transaction_col="tid", item_col="item")
freq  = miner.mine()                        # → pd.DataFrame (support, itemsets)
rules = miner.association_rules()            # → pd.DataFrame (metrics)
recs  = miner.recommend_for_cart(items, n=5) # → list[Any]

Recommender interface¶

# from_transactions() configures the model but does NOT fit
model = ALS.from_transactions(df, user_col="user", item_col="item", factors=64).fit()
ids, scores = model.recommend_items(user_id=0, n=10, exclude_seen=True)

Method	Available on
`fit(interactions=None)`	All recommenders — accepts sparse matrix or uses data from `from_transactions()`
`predict(user_id, item_id)`	SVD (rating prediction); others: alias for `recommend_items()`
`recommend_items(user_id, n, exclude_seen)`	All recommenders
`recommend_users(item_id, n)`	ALS, SVD (others raise `NotImplementedError`)
`batch_recommend(n, exclude_seen, format)`	ALS, SVD
`user_factors` / `item_factors`	ALS, BPR, SVD, LightGCN
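For factor-based models, one common way to implement a method shaped like `recommend_items` is to score every item by the dot product of user and item factors. The sketch below assumes that approach; it is not taken from rusket's source. The standalone function and its `seen` argument are hypothetical. Only the `(ids, scores)` return shape and the `n` / `exclude_seen` parameters follow the table above.

```python
import heapq


def recommend_items(user_factors, item_factors, seen, user_id, n=10, exclude_seen=True):
    """Score every item by dot product and return the top-n (ids, scores)."""
    user_vec = user_factors[user_id]
    already_seen = seen.get(user_id, set())
    scored = []
    for item_id, item_vec in enumerate(item_factors):
        if exclude_seen and item_id in already_seen:
            continue  # skip items the user has already interacted with
        score = sum(u * v for u, v in zip(user_vec, item_vec))
        scored.append((score, item_id))
    top = heapq.nlargest(n, scored)  # highest scores first
    return [item_id for _, item_id in top], [score for score, _ in top]


user_factors = [[1.0, 0.0], [0.0, 1.0]]
item_factors = [[2.0, 0.0], [1.0, 1.0], [0.0, 3.0]]
seen = {0: {0}}  # user 0 has already interacted with item 0

ids, scores = recommend_items(user_factors, item_factors, seen, user_id=0, n=2)
print(ids, scores)
```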

Sequential recommenders (FPMC, SASRec)¶

Sequential recommenders operate on ordered interaction sequences. SASRec also accepts an ad-hoc item sequence in place of a user ID:

model = SASRec.from_transactions(df, user_col="user", item_col="item", timestamp_col="ts").fit()
ids, scores = model.recommend_items([1, 2, 3], n=10)

Repository layout¶

rusket/
├── src/                          # Rust (PyO3)
│   ├── lib.rs                    # Module root — exports to Python
│   ├── fpgrowth.rs               # FP-Tree + FP-Growth algorithm
│   ├── association_rules.rs      # Rule generation + 12 metrics
│   └── common.rs                 # Shared helpers
├── python/
│   ├── rusket/                  # Primary Python package (pyproject.toml name)
│   │   ├── __init__.py
│   │   ├── fpgrowth.py           # Dispatch + numpy conversion
│   │   ├── association_rules.py  # Label mapping + Rust call
│   │   └── _validation.py        # Input validation helpers
│   └── fpgrowth_pyo3/            # Legacy compat package
│       └── ...
└── tests/
    ├── conftest.py
    ├── test_fpbase.py            # Shared base test classes
    ├── test_fpgrowth.py          # FP-Growth tests
    ├── test_association_rules.py # Association rules tests
    └── test_benchmark.py         # Performance benchmarks

Data flow¶

graph TD
    classDef python fill:#4B8BBE,stroke:#306998,stroke-width:2px,color:white;
    classDef rust fill:#DEA584,stroke:#000,stroke-width:2px,color:black;
    classDef data fill:#FFD43B,stroke:#306998,stroke-width:2px,color:black;

    A["Python Caller<br/>rusket.mine(df)"]:::python --> B{"Input Data Type"}:::python

    B -->|Dense Pandas| C["C-Contiguous Array<br/>(uint8)"]:::data
    B -->|Sparse Pandas| D["CSR Matrix<br/>(indptr, indices)"]:::data
    B -->|Polars| E["Numpy View<br/>(Zero-copy)"]:::data

    C --> F["Rust FFI<br/>fpgrowth_from_dense"]:::rust
    D --> G["Rust FFI<br/>fpgrowth_from_csr"]:::rust
    E --> F

    F --> H["Tree Construction<br/>(Single Pass)"]:::rust
    G --> H

    H --> I["Recursive Mining<br/>(Rayon Parallel)"]:::rust
    I --> J["Raw Vectors<br/>Vec<(count, Vec<usize>)>"]:::data

    J --> K["Python Transformation<br/>Build pd.DataFrame"]:::python

FP-Growth algorithm¶

The Rust implementation follows the classic Han et al. (2000) FP-Growth algorithm:

1. Header table scan: count item frequencies and prune items below min_count.
2. FP-Tree construction: a single pass over the transactions compresses them into a prefix-tree structure.
3. Recursive mining: for each frequent item, extract the conditional pattern base, build a conditional FP-Tree, and mine it recursively.
4. Output: each frequent itemset found during the recursion is emitted as a (count, items) pair.
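The steps above can be sketched in plain Python. This is a sequential teaching sketch, not the Rust implementation: it has no Rayon parallelism, and the names `Node`, `build_tree`, and `mine` are hypothetical. Each conditional pattern base is represented as a list of `(items, weight)` pairs so the same function can recurse on it.

```python
from collections import defaultdict


class Node:
    """One FP-Tree node: an item, its count, and parent/child links."""

    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}


def build_tree(weighted_transactions, min_count):
    """Steps 1-2: count items, prune, then insert each transaction once."""
    freq = defaultdict(int)
    for items, weight in weighted_transactions:
        for item in set(items):
            freq[item] += weight
    freq = {i: c for i, c in freq.items() if c >= min_count}

    root = Node(None, None)
    header = defaultdict(list)  # item -> all tree nodes holding that item
    for items, weight in weighted_transactions:
        # Keep frequent items, most frequent first (ties broken by item).
        kept = sorted((i for i in set(items) if i in freq), key=lambda i: (-freq[i], i))
        node = root
        for item in kept:
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item, node)
                header[item].append(child)
            child.count += weight
            node = child
    return header, freq


def mine(weighted_transactions, min_count, suffix=()):
    """Steps 3-4: recurse on each item's conditional pattern base."""
    header, freq = build_tree(weighted_transactions, min_count)
    results = []
    for item, count in freq.items():
        itemset = (item,) + suffix
        results.append((count, tuple(sorted(itemset))))
        # Conditional pattern base: the prefix path of every node for `item`.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:  # stop at the root
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        results.extend(mine(base, min_count, itemset))
    return results


transactions = [["a", "b"], ["b", "c"], ["a", "b", "c"], ["a", "b"]]
found = sorted(mine([(t, 1) for t in transactions], min_count=2))
for count, items in found:
    print(count, items)
```

With `min_count=2`, the itemsets `{a, c}` and `{a, b, c}` occur only once and are pruned, so only five itemsets are reported.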

Dispatch paths¶

Path	Rust function	Input shape	Notes
Dense pandas	`fpgrowth_from_dense`	`[n_rows × n_cols]` uint8	Contiguous C array
Sparse pandas	`fpgrowth_from_csr`	CSR `indptr + indices`	Zero-copy scipy CSR
Polars	`fpgrowth_from_dense`	same as dense	Arrow → NumPy view
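To make the two input shapes in the table concrete, the following sketch converts the same small 0/1 matrix into a row-major uint8 buffer and into CSR `indptr` / `indices` lists. It uses only the standard library. The helpers `to_dense_uint8` and `to_csr` are hypothetical and are not how rusket performs the conversion, which the table attributes to NumPy and SciPy structures.

```python
def to_dense_uint8(rows):
    """Flatten a 0/1 matrix into one row-major (C-order) uint8 buffer."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    flat = bytes(1 if value else 0 for row in rows for value in row)
    return flat, n_rows, n_cols


def to_csr(rows):
    """Build CSR structure: indices holds the set columns, indptr the row bounds."""
    indptr, indices = [0], []
    for row in rows:
        indices.extend(col for col, value in enumerate(row) if value)
        indptr.append(len(indices))
    return indptr, indices


matrix = [
    [1, 0, 1],
    [0, 0, 0],
    [0, 1, 1],
]

flat, n_rows, n_cols = to_dense_uint8(matrix)
indptr, indices = to_csr(matrix)

print(list(flat), n_rows, n_cols)  # [1, 0, 1, 0, 0, 0, 0, 1, 1] 3 3
print(indptr, indices)             # [0, 2, 2, 4] [0, 2, 1, 2]

# Items of transaction r are indices[indptr[r]:indptr[r + 1]].
print(indices[indptr[2]:indptr[3]])  # [1, 2]
```

The CSR form stores only the non-zero positions, which is why no values array is needed for a binary transaction matrix.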

Association rules¶

Rule generation is vectorised in Rust:

1. For each frequent itemset of length ≥ 2, enumerate all non-empty antecedent / consequent splits.
2. Look up antecedent and consequent supports from a pre-built hash map.
3. Compute all 12 metrics in a single pass; filter by (metric, min_threshold).
4. Return raw integer index lists to Python; Python maps them back to column names / tuples.
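The same procedure can be sketched in plain Python. This is an illustration under assumptions: `generate_rules` is a hypothetical name, and only three standard metrics (support, confidence, lift) are computed rather than the full set of 12. It also assumes the input contains every frequent subset of each itemset, which holds for the output of a complete frequent-itemset miner.

```python
from itertools import combinations


def generate_rules(itemsets, n_transactions, metric="confidence", min_threshold=0.5):
    """Enumerate antecedent/consequent splits and score each rule."""
    # Hash map from itemset to its support count.
    counts = {frozenset(items): count for count, items in itemsets}
    rules = []
    for itemset, count in counts.items():
        if len(itemset) < 2:
            continue
        members = sorted(itemset)
        for size in range(1, len(members)):
            for antecedent in combinations(members, size):
                consequent = tuple(i for i in members if i not in antecedent)
                support = count / n_transactions
                ante_support = counts[frozenset(antecedent)] / n_transactions
                cons_support = counts[frozenset(consequent)] / n_transactions
                confidence = support / ante_support
                metrics = {
                    "support": support,
                    "confidence": confidence,
                    "lift": confidence / cons_support,
                }
                if metrics[metric] >= min_threshold:
                    rules.append((antecedent, consequent, metrics))
    return rules


# (count, items) pairs for four transactions: ab, bc, abc, ab (min_count = 2).
itemsets = [
    (4, ("b",)),
    (3, ("a",)),
    (2, ("c",)),
    (3, ("a", "b")),
    (2, ("b", "c")),
]

rules = generate_rules(itemsets, n_transactions=4, metric="confidence", min_threshold=0.8)
for antecedent, consequent, metrics in rules:
    print(antecedent, "->", consequent, metrics)
```

Here `a -> b` and `c -> b` both have confidence 1.0, while `b -> a` (0.75) and `b -> c` (0.5) fall below the threshold and are dropped.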

Building from source¶

# Prerequisites: Rust 1.83+, Python 3.10+, uv
rustup update
uv sync

# Debug build (fast compile, slower runtime)
uv run maturin develop

# Release build (optimised)
uv run maturin develop --release

# Type checking
uv run basedpyright

# Tests
uv run pytest tests/ -x -q

# Rust compile check and lint
cargo check
cargo clippy