API Reference¶
This file is auto-generated by
scripts/gen_api_reference.py. Do not edit by hand — update the Python docstrings instead.
Functional API¶
Convenience module-level functions. For most use cases these are the only entry points you need.
mine¶
Mine frequent itemsets using the specified algorithm.
This module-level function relies on the Object-Oriented APIs.
from rusket.miners.mine import mine
mine(df: 'pd.DataFrame | pl.DataFrame | np.ndarray | Any', min_support: 'float' = 0.5, null_values: 'bool' = False, use_colnames: 'bool' = True, max_len: 'int | None' = None, method: 'str' = 'fpgrowth', verbose: 'int' = 0, column_names: 'list[str] | None' = None) -> 'pd.DataFrame'
fpgrowth¶
Find frequent itemsets using FP-growth, or Eclat when method="eclat" is passed.
This module-level function relies on the Object-Oriented APIs.
from rusket.miners.fpgrowth import fpgrowth
fpgrowth(df: 'pd.DataFrame | pl.DataFrame | np.ndarray | Any', min_support: 'float' = 0.5, null_values: 'bool' = False, use_colnames: 'bool' = True, max_len: 'int | None' = None, method: 'str' = 'fpgrowth', verbose: 'int' = 0, column_names: 'list[str] | None' = None) -> 'pd.DataFrame'
eclat¶
Find frequent itemsets using the Eclat algorithm.
This module-level function relies on the Object-Oriented APIs.
from rusket.miners.eclat import eclat
eclat(df: 'pd.DataFrame | pl.DataFrame | np.ndarray | Any', min_support: 'float' = 0.5, null_values: 'bool' = False, use_colnames: 'bool' = True, max_len: 'int | None' = None, verbose: 'int' = 0, column_names: 'list[str] | None' = None) -> 'pd.DataFrame'
association_rules¶
Generate association rules from a DataFrame of frequent itemsets.
from rusket.miners.association_rules import association_rules
association_rules(df: 'pd.DataFrame | Any', num_itemsets: 'int | None' = None, df_orig: 'pd.DataFrame | None' = None, null_values: 'bool' = False, metric: 'str' = 'confidence', min_threshold: 'float' = 0.8, support_only: 'bool' = False, return_metrics: 'list[str]' = ['antecedent support', 'consequent support', 'support', 'confidence', 'lift', 'representativity', 'leverage', 'conviction', 'zhangs_metric', 'jaccard', 'certainty', 'kulczynski']) -> 'pd.DataFrame'
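For intuition, several of the metrics listed in return_metrics are simple functions of three support values. A pure-Python sketch of the standard definitions (the helper name is illustrative, not part of the rusket API):

```python
# Illustrative computation of confidence, lift, and leverage for a rule
# A -> C from raw support values. Conceptual sketch only, not the
# library's implementation.

def rule_metrics(support_a, support_c, support_ac):
    """Metrics for a rule A -> C given supp(A), supp(C), supp(A ∪ C)."""
    confidence = support_ac / support_a          # P(C | A)
    lift = confidence / support_c                # ratio to independence
    leverage = support_ac - support_a * support_c  # difference to independence
    return confidence, lift, leverage

# Rule {bread} -> {milk}: bread in 60% of baskets, milk in 50%,
# both together in 40%.
confidence, lift, leverage = rule_metrics(0.6, 0.5, 0.4)
```

A lift above 1 (here 4/3) means the antecedent and consequent co-occur more often than independence would predict.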
prefixspan¶
Mine sequential patterns using the PrefixSpan algorithm.
This function discovers frequent sequences of items across multiple users/sessions. Currently, this assumes sequences where each event consists of a single item (e.g., a sequence of page views or a sequence of individual products bought over time).
from rusket.miners.prefixspan import prefixspan
prefixspan(sequences: 'list[list[int]]', min_support: 'int | float', max_len: 'int | None' = None) -> 'pd.DataFrame'
Parameters
| Parameter | Type | Description |
|---|---|---|
| sequences | list of list of int | A list of sequences, where each sequence is a list of integers representing items. Example: [[1, 2, 3], [1, 3], [2, 3]]. |
| min_support | int or float | The minimum support: an absolute count of sequences the pattern must appear in (int), or a relative fraction of sequences (float). |
| max_len | int, optional | The maximum length of the sequential patterns to mine. |
Returns
| Type | Description |
|---|---|
| pd.DataFrame | A DataFrame containing 'support' and 'sequence' columns. |
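What "support" means here can be sketched in a few lines of pure Python: a sequence supports a pattern when the pattern's items appear in order, not necessarily contiguously. This is a conceptual illustration only, not the library's Rust implementation:

```python
# Support counting for a sequential pattern: a pattern is supported by a
# sequence if its items appear in order (gaps allowed).

def is_subsequence(pattern, sequence):
    it = iter(sequence)                      # consume left to right
    return all(item in it for item in pattern)

def sequence_support(pattern, sequences):
    return sum(is_subsequence(pattern, s) for s in sequences)

sequences = [[1, 2, 3], [1, 3], [2, 3]]
support = sequence_support([1, 3], sequences)  # [1, 3] appears in 2 of 3
```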
hupm¶
Mine high-utility itemsets.
This function discovers combinations of items that generate a high total utility (e.g., profit) across all transactions, even if they aren't the most frequent.
from rusket.miners.hupm import hupm
hupm(transactions: 'list[list[int]]', utilities: 'list[list[float]]', min_utility: 'float', max_len: 'int | None' = None) -> 'pd.DataFrame'
Parameters
| Parameter | Type | Description |
|---|---|---|
| transactions | list of list of int | A list of transactions, where each transaction is a list of item IDs. |
| utilities | list of list of float | A list of identical structure to transactions, but containing the numeric utility (e.g., profit) of that item in that specific transaction. |
| min_utility | float | The minimum total utility required to consider a pattern "high-utility". |
| max_len | int, optional | The maximum length of the itemsets to mine. |
Returns
| Type | Description |
|---|---|
| pd.DataFrame | A DataFrame containing 'utility' and 'itemset' columns. |
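The utility of an itemset is the sum, over all transactions containing every item of the set, of those items' per-transaction utilities. A brute-force pure-Python sketch of that definition (for intuition only; the library mines high-utility sets in Rust without enumerating candidates like this):

```python
# Total utility of an itemset across all transactions that contain it.

def itemset_utility(itemset, transactions, utilities):
    total = 0.0
    for items, utils in zip(transactions, utilities):
        if all(i in items for i in itemset):
            total += sum(u for i, u in zip(items, utils) if i in itemset)
    return total

transactions = [[1, 2, 3], [1, 2], [2, 3]]
utilities = [[5.0, 2.0, 1.0], [5.0, 2.0], [2.0, 1.0]]
u = itemset_utility({1, 2}, transactions, utilities)  # 7.0 + 7.0 = 14.0
```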
sequences_from_event_log¶
Convert an event log DataFrame into the sequence format required by PrefixSpan.
Accepts Pandas, Polars, or PySpark DataFrames. Data is grouped by user_col,
ordered by time_col, and item_col values are collected into sequences.
from rusket.miners.prefixspan import sequences_from_event_log
sequences_from_event_log(df: 'Any', user_col: 'str', time_col: 'str', item_col: 'str') -> 'tuple[list[list[int]], dict[int, Any]]'
Parameters
| Parameter | Type | Description |
|---|---|---|
| df | pd.DataFrame | pl.DataFrame | pyspark.sql.DataFrame | Event log containing users, timestamps, and items. |
| user_col | str | Column name identifying the sequence (e.g., user_id or session_id). |
| time_col | str | Column name for ordering events. |
| item_col | str | Column name for the items. |
Returns
| Type | Description |
|---|---|
| tuple[list[list[int]], dict[int, Any]] | A tuple (sequences, item_mapping): the ordered item-ID sequences, one per user, and a dictionary mapping the integer IDs back to the original item labels. |
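The core of the conversion can be sketched in pure Python: group rows by user, sort each group by timestamp, and collect the items in order. The real function additionally maps arbitrary item labels to integer IDs; the helper below is illustrative only:

```python
# Group an event log into per-user, time-ordered item sequences.

def to_sequences(rows):
    """rows: iterable of (user, time, item) tuples."""
    groups = {}
    for user, time, item in rows:
        groups.setdefault(user, []).append((time, item))
    return [
        [item for _, item in sorted(events)]   # order each user's events by time
        for _, events in sorted(groups.items())
    ]

rows = [("u1", 2, "b"), ("u1", 1, "a"), ("u2", 1, "c")]
seqs = to_sequences(rows)  # [["a", "b"], ["c"]]
```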
mine_hupm¶
Mine high-utility itemsets from a long-format DataFrame.
Converts a Pandas or Polars DataFrame into the required list-of-lists format and runs the High-Utility Pattern Mining (HUPM) algorithm.
from rusket.miners.hupm import mine_hupm
mine_hupm(data: 'Any', transaction_col: 'str', item_col: 'str', utility_col: 'str', min_utility: 'float', max_len: 'int | None' = None) -> 'pd.DataFrame'
Parameters
| Parameter | Type | Description |
|---|---|---|
| data | pd.DataFrame or pl.DataFrame | A long-format DataFrame where each row represents an item in a transaction. |
| transaction_col | str | Column name identifying the transaction ID. |
| item_col | str | Column name identifying the item ID (must be numeric integers). |
| utility_col | str | Column name identifying the numeric utility (e.g. price, profit) of the item. |
| min_utility | float | The minimum total utility required to consider a pattern "high-utility". |
| max_len | int, optional | Maximum length of the itemsets to mine. |
Returns
| Type | Description |
|---|---|
| pd.DataFrame | A DataFrame containing 'utility' and 'itemset' columns. |
mine_duckdb¶
Stream directly from a DuckDB query via Arrow RecordBatches.
This is extremely memory efficient, bypassing Pandas entirely.
from rusket.miners.streaming import mine_duckdb
mine_duckdb(con: 'Any', query: 'str', n_items: 'int', txn_col: 'str', item_col: 'str', min_support: 'float' = 0.5, max_len: 'int | None' = None, chunk_size: 'int' = 1000000) -> 'pd.DataFrame'
mine_spark¶
Stream natively from a PySpark DataFrame on Databricks via Arrow.
Uses toLocalIterator() to fetch Arrow chunks incrementally on the
driver node, avoiding massive memory spikes.
from rusket.miners.streaming import mine_spark
mine_spark(spark_df: 'Any', n_items: 'int', txn_col: 'str', item_col: 'str', min_support: 'float' = 0.5, max_len: 'int | None' = None) -> 'pd.DataFrame'
from_transactions¶
Convert long-format transactional data to a one-hot boolean matrix.
The return type mirrors the input type:
- Polars DataFrame → Polars DataFrame
- Pandas DataFrame → Pandas DataFrame
- Spark DataFrame → Spark DataFrame
- list[list[...]] → Pandas DataFrame
from rusket.miners.transactions import from_transactions
from_transactions(data: 'DataFrame | Sequence[Sequence[str | int]] | Any', transaction_col: 'str | None' = None, item_col: 'str | None' = None, min_item_count: 'int' = 1, verbose: 'int' = 0) -> 'Any'
Parameters
| Parameter | Type | Description |
|---|---|---|
| data | DataFrame or list of lists | A Pandas / Polars / Spark DataFrame with at least two columns (one for the transaction identifier and one for the item), or a list of lists where each inner list contains the items of a single transaction, e.g. [["bread", "milk"], ["bread", "eggs"]]. |
| transaction_col | str, optional | Name of the column that identifies transactions. If None the first column is used. Ignored for list-of-lists input. |
| item_col | str, optional | Name of the column that contains item values. If None the second column is used. Ignored for list-of-lists input. |
| min_item_count | int, default=1 | Minimum number of times an item must appear to be included in the resulting one-hot-encoded matrix. |
Returns
| Type | Description |
|---|---|
| DataFrame | A boolean DataFrame (same type as input) ready for :func:rusket.fpgrowth or :func:rusket.eclat. Column names correspond to the unique items. |
Examples
>>> import rusket
>>> import pandas as pd
>>> df = pd.DataFrame({
... "order_id": [1, 1, 1, 2, 2, 3],
... "item": [3, 4, 5, 3, 5, 8],
... })
>>> ohe = rusket.from_transactions(df)
>>> freq = rusket.fpgrowth(ohe, min_support=0.5, use_colnames=True)
from_transactions_csr¶
Convert long-format transactional data to a CSR matrix + column names.
Unlike :func:from_transactions, this returns a raw
scipy.sparse.csr_matrix that can be passed directly to
:func:rusket.fpgrowth or :func:rusket.eclat — no pandas overhead.
For billion-row datasets, this processes data in chunks of chunk_size
rows, keeping peak memory to one chunk + the running CSR.
from rusket.miners.transactions import from_transactions_csr
from_transactions_csr(data: 'DataFrame | str | Any', transaction_col: 'str | None' = None, item_col: 'str | None' = None, chunk_size: 'int' = 10000000) -> 'tuple[Any, list[str]]'
Parameters
| Parameter | Type | Description |
|---|---|---|
| data | DataFrame or str | A Pandas DataFrame with at least two columns; a Polars or Spark DataFrame (converted internally); or a file path (str / Path) to a Parquet file, read in chunks. |
| transaction_col | str, optional | Name of the transaction-id column. Defaults to the first column. |
| item_col | str, optional | Name of the item column. Defaults to the second column. |
| chunk_size | int, default=10_000_000 | Number of rows per chunk. Lower values use less memory. |
Returns
| Type | Description |
|---|---|
| tuple[scipy.sparse.csr_matrix, list[str]] | A CSR matrix and the list of column (item) names. Pass both straight to the miners: csr, names = from_transactions_csr(df); freq = fpgrowth(csr, min_support=0.001, use_colnames=True, column_names=names). |
Examples
>>> import rusket
>>> csr, names = rusket.from_transactions_csr("orders.parquet")
>>> freq = rusket.fpgrowth(csr, min_support=0.001,
... use_colnames=True, column_names=names)
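The CSR layout itself is easy to picture: indptr[t]..indptr[t+1] delimits the item indices of transaction t inside the flat indices array. A pure-Python sketch of that structure (the library builds it in Rust; this is for intuition only):

```python
# Build CSR-style (indptr, indices) arrays from a list of transactions.

def build_csr(transactions):
    indptr, indices = [0], []
    for items in transactions:
        indices.extend(items)          # flat item-index list
        indptr.append(len(indices))    # where the next transaction starts
    return indptr, indices

indptr, indices = build_csr([[0, 2], [1], [0, 1, 2]])
# indptr == [0, 2, 3, 6]; transaction 2 is indices[indptr[2]:indptr[3]]
```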
from_pandas¶
Shorthand for from_transactions(df, transaction_col, item_col).
from rusket.miners.transactions import from_pandas
from_pandas(df: 'pd.DataFrame', transaction_col: 'str | None' = None, item_col: 'str | None' = None, min_item_count: 'int' = 1, verbose: 'int' = 0) -> 'pd.DataFrame'
from_polars¶
Shorthand for from_transactions(df, transaction_col, item_col).
from rusket.miners.transactions import from_polars
from_polars(df: 'pl.DataFrame', transaction_col: 'str | None' = None, item_col: 'str | None' = None, min_item_count: 'int' = 1, verbose: 'int' = 0) -> 'pl.DataFrame'
from_spark¶
Shorthand for from_transactions(df, transaction_col, item_col).
from rusket.miners.transactions import from_spark
from_spark(df: 'SparkDataFrame', transaction_col: 'str | None' = None, item_col: 'str | None' = None, min_item_count: 'int' = 1, verbose: 'int' = 0) -> 'SparkDataFrame'
from_arrow¶
Convert a PyArrow Table in long format to a one-hot boolean PyArrow Table.
This is a zero-copy-friendly shorthand for from_transactions(table, ...). The
input table must have at least two columns: one for the transaction identifier and
one for the item. The returned table has boolean columns (one per unique item).
from rusket.miners.transactions import from_arrow
from_arrow(table: 'pa.Table', transaction_col: 'str | None' = None, item_col: 'str | None' = None, min_item_count: 'int' = 1, verbose: 'int' = 0) -> 'pa.Table'
Parameters
| Parameter | Type | Description |
|---|---|---|
| table | pa.Table | A pyarrow.Table with at least two columns (transaction id + item). |
| transaction_col | str, optional | Name of the transaction-id column. Defaults to the first column. |
| item_col | str, optional | Name of the item column. Defaults to the second column. |
| min_item_count | int, default=1 | Minimum occurrences for an item to be included. |
| verbose | int | Verbosity level. |
Returns
| Type | Description |
|---|---|
| pyarrow.Table | A boolean Table ready for :func:rusket.fpgrowth / :func:rusket.eclat. |
evaluate¶
Evaluate a trained recommendation model on a test set.
Compute metrics like NDCG@k, Hit Rate@k, Precision@k, and Recall@k using fast native Rust evaluation loops.
When a model has _user_labels / _item_labels (set by
from_transactions()), the test IDs are automatically mapped to
internal 0-based indices so that recommend_items() receives valid
indices and the recommended item indices can be compared with the
ground truth.
from rusket.evaluation.metrics import evaluate
evaluate(model: 'Any', test_interactions: 'Any', k: 'int' = 10, metrics: 'list[MetricName] | None' = None) -> 'dict[str, float]'
Parameters
| Parameter | Type | Description |
|---|---|---|
| model | Any | A trained recommendation model supporting recommend_items(user_id, k, exclude_seen). |
| test_interactions | np.ndarray or pd.DataFrame | Ground truth test interactions. Must either have columns "user" and "item", or be a 2D array. |
| k | int, default=10 | The cutoff rank for evaluation. |
| metrics | list of str, optional | Metrics to compute. Default: ["ndcg", "hr", "precision", "recall"]. |
Returns
| Type | Description |
|---|---|
| dict[str, float] | Dictionary of averaged metric values. |
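For intuition, here is what the four default metrics compute for a single user, using their standard definitions (the library averages these over all users in Rust; the helper below is illustrative, not part of the rusket API):

```python
# Ranking metrics at cutoff k for one user.
# `recommended` is the model's ranked list; `relevant` the held-out items.
import math

def metrics_at_k(recommended, relevant, k):
    topk = recommended[:k]
    hits = [item in relevant for item in topk]
    # DCG discounts hits by log2 of their (1-based) rank + 1.
    dcg = sum(1 / math.log2(rank + 2) for rank, hit in enumerate(hits) if hit)
    idcg = sum(1 / math.log2(rank + 2) for rank in range(min(len(relevant), k)))
    return {
        "hr": float(any(hits)),              # was anything relevant retrieved?
        "precision": sum(hits) / k,
        "recall": sum(hits) / len(relevant),
        "ndcg": dcg / idcg if idcg else 0.0,
    }

m = metrics_at_k([7, 3, 9], relevant={3}, k=3)  # one hit, at rank 2
```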
train_test_split¶
Split interactions into random train and test sets.
from rusket.evaluation.splitting import train_test_split
train_test_split(df: 'pd.DataFrame', user_col: 'str', item_col: 'str', test_size: 'float' = 0.2, random_state: 'int | None' = None) -> 'tuple[pd.DataFrame, pd.DataFrame]'
Parameters
| Parameter | Type | Description |
|---|---|---|
| df | pd.DataFrame | The interaction dataframe. |
| user_col | str | Name of the user column. |
| item_col | str | Name of the item column. |
| test_size | float, default=0.2 | Fraction of the data to put in the test set. |
| random_state | int, optional | Random seed (currently unused by the Rust backend; reserved for future use). |
Returns
| Type | Description |
|---|---|
| tuple[pd.DataFrame, pd.DataFrame] | train_df, test_df |
leave_one_out_split¶
Leave exactly one interaction per user for the test set.
If a timestamp column is provided, the latest interaction is left out. If no timestamp is provided, a random interaction is chosen.
from rusket.evaluation.splitting import leave_one_out_split
leave_one_out_split(df: 'pd.DataFrame', user_col: 'str', item_col: 'str', timestamp_col: 'str | None' = None) -> 'tuple[pd.DataFrame, pd.DataFrame]'
Parameters
| Parameter | Type | Description |
|---|---|---|
| df | pd.DataFrame | The interaction dataframe. |
| user_col | str | Name of the user column (should be numeric, ideally encodable to i32, e.g. a pandas integer dtype). |
| item_col | str | Name of the item column. |
| timestamp_col | str, optional | Name of the timestamp or ordering column. |
Returns
| Type | Description |
|---|---|
| tuple[pd.DataFrame, pd.DataFrame] | train_df, test_df |
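The splitting rule for the timestamp case can be sketched in pure Python: for each user, the interaction with the greatest timestamp goes to the test set and the rest stay in train (the real function handles DataFrames and the random no-timestamp case; this helper is illustrative only):

```python
# Leave-one-out split: the latest interaction per user becomes test data.

def leave_one_out(rows):
    """rows: list of (user, item, timestamp) tuples."""
    latest = {}
    for user, item, ts in rows:
        if user not in latest or ts > latest[user][2]:
            latest[user] = (user, item, ts)
    test = set(latest.values())
    train = [r for r in rows if r not in test]
    return train, sorted(test)

rows = [(1, "a", 10), (1, "b", 20), (2, "c", 5)]
train, test = leave_one_out(rows)
# train == [(1, "a", 10)], test == [(1, "b", 20), (2, "c", 5)]
```

Note that a user with a single interaction ends up with no training rows, which is inherent to leave-one-out splitting.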
pca¶
Project data into n_components dimensions using PCA.
from rusket.viz.pca import pca
pca(x: 'npt.NDArray[Any]', n_components: 'int' = 2, svd_solver: 'str' = 'auto') -> 'ProjectedSpace'
Parameters
| Parameter | Type | Description |
|---|---|---|
| x | array-like of shape (n_samples, n_features) | The data to project. |
| n_components | int, default=2 | Number of dimensions to keep. |
| svd_solver | {"auto", "exact", "randomized"}, default="auto" | SVD solver to use. |
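What "project onto principal components" means can be shown in miniature: center the data, form the covariance matrix, and project onto its leading eigenvector, found here by power iteration. A 2-D pure-Python sketch for intuition only; the library uses proper SVD solvers:

```python
# Leading principal direction of 2-D points via power iteration.

def first_component(points, iters=100):
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    # 2x2 covariance matrix entries
    cxx = sum(x * x for x, _ in centered) / n
    cxy = sum(x * y for x, y in centered) / n
    cyy = sum(y * y for _, y in centered) / n
    v = (1.0, 0.0)
    for _ in range(iters):
        w = (cxx * v[0] + cxy * v[1], cxy * v[0] + cyy * v[1])
        norm = (w[0] ** 2 + w[1] ** 2) ** 0.5
        v = (w[0] / norm, w[1] / norm)
    return v

# Points on the line y = x: the first component is the (1, 1) direction.
v = first_component([(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)])
```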
pca2¶
Project data into exactly 2 dimensions using PCA.
from rusket.viz.pca import pca2
pca2(x: 'npt.NDArray[Any]', svd_solver: 'str' = 'auto') -> 'ProjectedSpace'
pca3¶
Project data into exactly 3 dimensions using PCA.
from rusket.viz.pca import pca3
pca3(x: 'npt.NDArray[Any]', svd_solver: 'str' = 'auto') -> 'ProjectedSpace'
OOP Mining API¶
All mining classes share a common Miner.from_transactions() / .mine() interface. FPGrowth, Eclat, FIN, LCM, and HUPM also inherit RuleMinerMixin which adds .association_rules() and .recommend_items() helpers.
FPGrowth¶
FP-Growth frequent itemset miner.
This class wraps the fast, core Rust FP-Growth implementation.
from rusket.miners.fpgrowth import FPGrowth
FPGrowth(data: 'pd.DataFrame | pl.DataFrame | np.ndarray | Any', item_names: 'list[str] | None' = None, min_support: 'float' = 0.5, null_values: 'bool' = False, use_colnames: 'bool' = True, max_len: 'int | None' = None, verbose: 'int' = 0, **kwargs: 'Any')
FPGrowth.mine¶
Execute the FP-growth algorithm on the stored data.
Returns
| Type | Description |
|---|---|
| pandas.DataFrame | DataFrame with two columns: - support: the support score. - itemsets: list of items (indices or column names). |
Eclat¶
Eclat frequent itemset miner.
Eclat is typically faster than FP-growth on dense datasets due to efficient vertical bitset intersection logic.
from rusket.miners.eclat import Eclat
Eclat(data: 'pd.DataFrame | pl.DataFrame | np.ndarray | Any', item_names: 'list[str] | None' = None, min_support: 'float' = 0.5, null_values: 'bool' = False, use_colnames: 'bool' = True, max_len: 'int | None' = None, verbose: 'int' = 0, **kwargs: 'Any')
Eclat.mine¶
Execute the Eclat algorithm on the stored data.
Returns
| Type | Description |
|---|---|
| pandas.DataFrame | DataFrame with two columns: - support: the support score. - itemsets: list of items (indices or column names). |
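Eclat's vertical representation is easy to picture with plain Python sets: each item maps to the set of transaction IDs containing it (its "tidset"), and the support of an itemset is the size of the intersection of its members' tidsets. The library does this with packed bitsets in Rust; sets are shown here for clarity only:

```python
# Vertical (tidset) view of a 4-transaction dataset.
tidsets = {
    "bread": {0, 1, 3},
    "milk": {0, 1, 2},
    "eggs": {2, 3},
}
n_transactions = 4

# Support of {bread, milk} = |tid(bread) ∩ tid(milk)| / n_transactions
support = len(tidsets["bread"] & tidsets["milk"]) / n_transactions  # 0.5
```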
PrefixSpan¶
Sequential Pattern Mining (PrefixSpan) model.
This class discovers frequent sequences of items across multiple users/sessions.
from rusket.miners.prefixspan import PrefixSpan
PrefixSpan(data: 'list[list[int]]', min_support: 'int | float', max_len: 'int | None' = None, item_mapping: 'dict[int, Any] | None' = None)
PrefixSpan.mine¶
Mine sequential patterns using PrefixSpan.
PrefixSpan.mine(**kwargs: 'Any') -> 'pd.DataFrame'
Returns
| Type | Description |
|---|---|
| pd.DataFrame | A DataFrame containing 'support' and 'sequence' columns. Sequences are mapped back to original item names if from_transactions was used. |
HUPM¶
High-Utility Pattern Mining (HUPM) model.
This class discovers combinations of items that generate a high total utility (e.g., profit) across all transactions, even if they aren't the most frequent.
from rusket.miners.hupm import HUPM
HUPM(transactions: 'list[list[int]]', utilities: 'list[list[float]]', min_utility: 'float', max_len: 'int | None' = None)
HUPM.mine¶
Mine high-utility itemsets.
Returns
| Type | Description |
|---|---|
| pd.DataFrame | A DataFrame containing 'utility' and 'itemset' columns. |
FPMiner¶
Streaming FP-Growth / Eclat accumulator for billion-row datasets.
Feeds (transaction_id, item_id) integer arrays to Rust one chunk at a
time. Rust accumulates per-transaction item lists in a
HashMap<i64, Vec<i32>>. Peak Python memory = one chunk.
from rusket.miners.streaming import FPMiner
FPMiner(n_items: 'int', max_ram_mb: 'int | None' = -1, hint_n_transactions: 'int | None' = None) -> 'None'
Parameters
| Parameter | Type | Description |
|---|---|---|
| n_items | int | Number of distinct items (column count). All item IDs fed via :meth:add_chunk must be in [0, n_items). |
Examples
Process a Parquet file 10 M rows at a time:
>>> import pyarrow.parquet as pq
>>> from rusket import FPMiner
>>> miner = FPMiner(n_items=500_000)
>>> for batch in pq.ParquetFile("orders.parquet").iter_batches(batch_size=10_000_000):
...     chunk = batch.to_pandas()
...     txn = chunk["txn_id"].to_numpy(dtype="int64")
...     item = chunk["item_idx"].to_numpy(dtype="int32")
...     miner.add_chunk(txn, item)
>>> freq = miner.mine(min_support=0.001, max_len=3, use_colnames=True)
FPMiner.add_arrow_batch¶
Feed a PyArrow RecordBatch directly into the miner. Zero-copy extraction is used if types match (Int64/Int32).
FPMiner.add_arrow_batch(batch: 'Any', txn_col: 'str', item_col: 'str') -> 'FPMiner'
FPMiner.add_chunk¶
Feed a chunk of (transaction_id, item_id) pairs.
FPMiner.add_chunk(txn_ids: 'np.ndarray', item_ids: 'np.ndarray') -> 'FPMiner'
Parameters
| Parameter | Type | Description |
|---|---|---|
| txn_ids | np.ndarray[int64] | 1-D array of transaction identifiers (arbitrary 64-bit integers). |
| item_ids | np.ndarray[int32] | 1-D array of item column indices (0-based). |
Returns
| Type | Description |
|---|---|
| FPMiner | The miner itself, enabling method chaining. |
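The accumulator semantics can be mirrored in a few lines of pure Python: each chunk appends item IDs to a per-transaction list keyed by transaction ID, just like the Rust HashMap<i64, Vec<i32>> described above. Illustrative only; the class name below is hypothetical:

```python
# Minimal per-transaction accumulator mirroring FPMiner's chunk feeding.

class TinyAccumulator:
    def __init__(self):
        self.txns = {}

    def add_chunk(self, txn_ids, item_ids):
        for t, i in zip(txn_ids, item_ids):
            self.txns.setdefault(t, []).append(i)
        return self  # returns self for chaining, like FPMiner

acc = TinyAccumulator()
acc.add_chunk([10, 10, 11], [0, 2, 1]).add_chunk([11, 12], [3, 0])
# acc.txns == {10: [0, 2], 11: [1, 3], 12: [0]}
```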
FPMiner.fit¶
Sklearn-compatible alias for mine(). Runs the mining algorithm.
Returns
| Type | Description |
|---|---|
| FPMiner | The fitted miner (self). |
FPMiner.mine¶
Mine frequent itemsets from all accumulated transactions.
FPMiner.mine(min_support: 'float' = 0.5, max_len: 'int | None' = None, use_colnames: 'bool' = True, column_names: 'list[str] | None' = None, method: "typing.Literal['fpgrowth', 'eclat']" = 'fpgrowth', verbose: 'int' = 0) -> 'pd.DataFrame'
Parameters
| Parameter | Type | Description |
|---|---|---|
| min_support | float | Minimum support threshold in (0, 1]. |
| max_len | int | None | Maximum itemset length. |
| use_colnames | bool | If True, itemsets contain column names instead of indices. |
| column_names | list[str] | None | Column names to use when use_colnames=True. |
| method | "fpgrowth" | "eclat" | Mining algorithm to use. |
| verbose | int | Level of verbosity: >0 prints progress logs and times. |
Returns
| Type | Description |
|---|---|
| pd.DataFrame | Columns support and itemsets. |
FPMiner.predict¶
Return the last mined result, running fit() first if none exists.
FPMiner.predict(**kwargs: 'Any') -> 'pd.DataFrame'
Returns
| Type | Description |
|---|---|
| pd.DataFrame | The frequent itemsets. |
FPMiner.reset¶
Free all accumulated data.
FIN¶
FIN frequent itemset miner, based on the Nodeset data structure.
This class wraps the fast core Rust FIN implementation.
from rusket.miners.fin import FIN
FIN(data: 'pd.DataFrame | pl.DataFrame | np.ndarray | Any', item_names: 'list[str] | None' = None, min_support: 'float' = 0.5, null_values: 'bool' = False, use_colnames: 'bool' = True, max_len: 'int | None' = None, verbose: 'int' = 0, **kwargs: 'Any')
FIN.mine¶
Execute the FIN algorithm on the stored data.
Returns
| Type | Description |
|---|---|
| pandas.DataFrame | DataFrame with two columns: - support: the support score. - itemsets: list of items (indices or column names). |
LCM¶
LCM (Linear time Closed itemset Miner).
This class wraps the fast core Rust LCM implementation using Prefix-Preserving Closure Extension. It produces only closed frequent itemsets, offering massive memory savings and faster execution out-of-the-box compared to classic algorithms on dense datasets.
from rusket.miners.lcm import LCM
LCM(data: 'pd.DataFrame | pl.DataFrame | np.ndarray | Any', item_names: 'list[str] | None' = None, min_support: 'float' = 0.5, null_values: 'bool' = False, use_colnames: 'bool' = True, max_len: 'int | None' = None, verbose: 'int' = 0, **kwargs: 'Any')
LCM.mine¶
Execute the LCM algorithm on the stored data to find closed itemsets.
Returns
| Type | Description |
|---|---|
| pandas.DataFrame | DataFrame with two columns: - support: the support score. - itemsets: list of items (indices or column names). |
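What "closed" means is worth a tiny illustration: an itemset is closed if no strict superset has the same support. A brute-force check for intuition only; LCM avoids this enumeration entirely via prefix-preserving closure extension:

```python
# Brute-force closedness check over a toy transaction database.

def support(itemset, transactions):
    return sum(itemset <= t for t in transactions)

def is_closed(itemset, transactions, universe):
    s = support(itemset, transactions)
    return all(
        support(itemset | {extra}, transactions) < s
        for extra in universe - itemset
    )

txns = [{1, 2, 3}, {1, 2}, {2, 3}]
# Item 1 always co-occurs with 2, so {1} is not closed but {1, 2} is.
```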
RuleMinerMixin — Shared Miner Interface¶
FPGrowth, Eclat, FIN, LCM, and HUPM all inherit these methods from RuleMinerMixin. You do not construct RuleMinerMixin directly.
RuleMinerMixin.association_rules¶
Generate association rules from the mined frequent itemsets.
from rusket.model._mixins import RuleMinerMixin
RuleMinerMixin.association_rules(metric: 'str' = 'confidence', min_threshold: 'float' = 0.8, return_metrics: 'list[str] | None' = None) -> 'pd.DataFrame'
Parameters
| Parameter | Type | Description |
|---|---|---|
| metric | str, default='confidence' | The metric to evaluate if a rule is of interest. |
| min_threshold | float, default=0.8 | The minimum threshold for the evaluation metric. |
| return_metrics | list[str] | None, default=None | List of metrics to include in the resulting DataFrame. Defaults to all available metrics. |
Returns
| Type | Description |
|---|---|
| pd.DataFrame | DataFrame of strong association rules. |
RuleMinerMixin.recommend_items¶
Deprecated: use :meth:recommend_for_cart instead.
RuleMinerMixin.recommend_items(items: 'list[Any]', n: 'int' = 5) -> 'list[Any]'
RuleMinerMixin._invalidate_rules_cache¶
Clear the cached association rules (call after re-mining).
RuleMinerMixin._invalidate_rules_cache() -> 'None'
Recommenders¶
ALS¶
Implicit ALS collaborative filtering model.
from rusket.recommenders.als import ALS
ALS(factors: 'int' = 64, regularization: 'float' = 0.01, alpha: 'float' = 40.0, iterations: 'int' = 15, seed: 'int' = 42, verbose: 'int' = 0, cg_iters: 'int' = 10, use_cholesky: 'bool' = False, use_eals: 'bool' = False, eals_iters: 'int' = 1, anderson_m: 'int' = 0, popularity_weighting: 'str' = 'none', use_biases: 'bool' = False, alpha_view: 'float' = 10.0, view_target: 'float' = 0.5, use_cuda: 'bool | None' = None, **kwargs: 'Any') -> 'None'
Parameters
| Parameter | Type | Description |
|---|---|---|
| factors | int | Number of latent factors. |
| regularization | float | L2 regularization weight. |
| alpha | float | Confidence scaling: confidence = 1 + alpha * r. |
| iterations | int | Number of ALS outer iterations. |
| seed | int | Random seed. |
| cg_iters | int | Conjugate Gradient iterations per user/item solve (ignored when use_cholesky=True). Reduce to 3 for very large datasets. |
| use_cholesky | bool | Use a direct Cholesky solve instead of iterative CG. Exact solution; faster when users have many interactions relative to factors. |
| use_eals | bool | Use element-wise ALS (eALS). Usually faster than Cholesky/CG and less memory intensive. |
| eals_iters | int | Number of inner iterations for eALS (default 1). |
| anderson_m | int | History window for Anderson Acceleration of the outer ALS loop (default 0 = disabled; recommended: 5). ALS is a fixed-point iteration (U, V) → F(U, V); Anderson mixing extrapolates over the last m residuals to reach the fixed point faster, typically reducing the number of outer iterations by 30–50% at identical recommendation quality. Example: ALS(iterations=10, cg_iters=3, anderson_m=5) matches the quality of the baseline ALS(iterations=15, cg_iters=3) while being ~2.5× faster. Memory overhead: m copies of the full (U ‖ V) matrix (~57 MB per copy at 25M ratings, k=64). |
| popularity_weighting | str | Weighting scheme for missing data in eALS. Items that are frequently interacted-with provide stronger negative signals when not chosen. Options: "none" (uniform, default), "sqrt", "log", "linear". Only used when use_eals=True. |
| use_biases | bool | If True, learn global bias (μ), user biases (b_u), and item biases (b_i) so that prediction becomes μ + b_u + b_i + w_u · h_i. |
| alpha_view | float | Confidence scaling for view interactions in VALS mode. Pass view_matrix to fit() to enable. Default 10.0. |
| view_target | float | Target value for view interactions (between 0.0 and 1.0). Purchases always target 1.0. Default 0.5. |
| use_cuda | bool or None | If True, use CUDA acceleration (CuPy or PyTorch) for batch recommendation; falls back to CPU if no CUDA backend is found. Default None. |
Examples
Fold in a new user without retraining the entire model matrix:
>>> import rusket
>>> import numpy as np
>>> from scipy.sparse import csr_matrix
>>> # Fit model on some data
>>> model = rusket.ALS(factors=8).fit(csr_matrix(np.random.randint(0, 2, size=(10, 20))))
>>> # New user interacts with items 3, 5, and 12
>>> latent_factors = model.recalculate_user([3, 5, 12])
>>> # `latent_factors` is a 1D array of length `factors=8`
ALS.batch_recommend¶
Top-N items for all users efficiently computed in parallel.
ALS.batch_recommend(n: 'int' = 10, exclude_seen: 'bool' = True, format: "Literal['pandas', 'polars', 'spark']" = 'polars') -> 'Any'
Parameters
| Parameter | Type | Description |
|---|---|---|
| n | int, default=10 | The number of items to recommend per user. |
| exclude_seen | bool, default=True | Whether to exclude items the user has already interacted with. |
| format | str, default="polars" | The DataFrame format to return. One of "pandas", "polars", or "spark". |
Returns
| Type | Description |
|---|---|
| DataFrame | A DataFrame with columns user_id, item_id, and score. |
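The scoring step behind batch recommendation, in miniature: each user's item scores are the dot products of their factor vector with every item factor vector; seen items are masked out and the top-n of the remainder kept. Pure-Python sketch for intuition; the library parallelizes this in Rust (the helper name is hypothetical):

```python
# Top-N item recommendation for one user from latent factor vectors.

def top_n(user_factors, item_factors, seen, n):
    scores = [
        sum(u * v for u, v in zip(user_factors, item))  # dot product
        for item in item_factors
    ]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [i for i in ranked if i not in seen][:n]

user = [1.0, 0.0]
items = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]  # one factor row per item
recs = top_n(user, items, seen={0}, n=2)  # item 0 masked out -> [2, 1]
```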
ALS.build_ann_index¶
Build an Approximate Nearest Neighbor index from item factors.
ALS.build_ann_index(backend: 'str' = 'native', index_type: 'str' = 'hnsw', **kwargs: 'Any') -> 'Any'
Parameters
| Parameter | Type | Description |
|---|---|---|
| backend | str | "native" uses the built-in Rust random-projection forest (:class:~rusket.ApproximateNearestNeighbors). "faiss" uses FAISS (requires pip install faiss-cpu). |
| index_type | str | For "faiss" backend: "flat", "hnsw", "ivfflat", "ivfpq". Ignored for "native" backend. |
| **kwargs | Any | Additional arguments passed to the index builder. |
Returns
| Type | Description |
|---|---|
| index object | A fitted ANN index with a query() / kneighbors() method. |
ALS.fit¶
Fit the model to the user-item interaction matrix.
ALS.fit(interactions: 'Any' = None, *, view_matrix: 'Any' = None) -> 'ALS'
Parameters
| Parameter | Type | Description |
|---|---|---|
| interactions | sparse matrix or numpy array, optional | If None, uses the matrix prepared by from_transactions(). |
| view_matrix | sparse matrix or numpy array, optional | Optional view/browse interaction matrix (same shape as interactions). When provided, enables VALS mode: views are treated as weaker positive signals with confidence alpha_view targeting view_target. |
Raises
| Exception | Condition |
|---|---|
| RuntimeError | |
| TypeError |
ALS.recalculate_user¶
Calculate the latent factors for a new or existing user given their interacted items.
ALS.recalculate_user(user_items: 'Any') -> 'np.ndarray'
Parameters
| Parameter | Type | Description |
|---|---|---|
| user_items | list of int or 1D array-like | The item indices the user has interacted with. If the model was fitted using a DataFrame with item names, these should be the mapped item indices from 0 to n_items - 1. Note: Confidence values for interactions are currently treated as 1. |
Returns
| Type | Description |
|---|---|
| ndarray | A 1D numpy array of shape (factors,) containing the user's latent factors. |
Raises
| Exception | Condition |
|---|---|
| RuntimeError | |
| ValueError |
ALS.recommend_items¶
Top-N items for a user. Set exclude_seen=False to include already-seen items.
ALS.recommend_items(user_id: 'int', n: 'int' = 10, exclude_seen: 'bool' = True) -> 'tuple[Any, Any]'
ALS.recommend_users¶
Top-N users for an item.
ALS.recommend_users(item_id: 'int', n: 'int' = 10) -> 'tuple[Any, Any]'
eALS¶
Element-wise ALS (eALS) collaborative filtering model.
A convenience wrapper around :class:ALS that sets use_eals=True by default.
eALS updates latent factors element-by-element rather than block-wise, which
is often faster and less memory-intensive for implicit datasets while yielding
comparable or better recommendation quality.
from rusket.recommenders.als import eALS
eALS(*args: 'Any', use_eals: 'bool' = True, **kwargs: 'Any') -> 'None'
Parameters
| Parameter | Type | Description |
|---|---|---|
| factors | int | Number of latent factors. |
| regularization | float | L2 regularisation weight. |
| alpha | float | Confidence scaling: confidence = 1 + alpha * r. |
| iterations | int | Number of ALS outer iterations. |
| seed | int | Random seed. |
| eals_iters | int | Number of inner iterations for eALS (default 1). |
| **kwargs | Any | Additional arguments passed to ALS. |
BPR¶
Bayesian Personalized Ranking (BPR) model for implicit feedback.
BPR optimizes for ranking rather than reconstruction error (as ALS does). It samples pairs of a positive item the user has interacted with and a negative item they have not, and adjusts the latent factors so that the positive item scores higher than the negative one.
from rusket.recommenders.bpr import BPR
BPR(factors: 'int' = 64, learning_rate: 'float' = 0.05, regularization: 'float' = 0.01, iterations: 'int' = 150, seed: 'int' = 42, verbose: 'int' = 0, use_cuda: 'bool | None' = None, **kwargs: 'Any') -> 'None'
Parameters
| Parameter | Type | Description |
|---|---|---|
| factors | int | Number of latent factors (default: 64). |
| learning_rate | float | SGD learning rate (default: 0.05). |
| regularization | float | L2 regularization weight (default: 0.01). |
| iterations | int | Number of passes over the entire interaction dataset (default: 150). |
| seed | int | Random seed for Hogwild! SGD sampling (default: 42). |
| use_cuda | bool | If True, use CUDA acceleration (CuPy or PyTorch) for recommendation. Falls back to CPU if no CUDA backend found. Default False. |
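The pairwise update described above can be sketched as a single SGD step on the BPR-Opt objective. This is an illustrative toy loop, not the library's Hogwild! Rust kernel; all names are hypothetical:

```python
import numpy as np

def bpr_step(P, Q, u, i, j, lr=0.05, reg=0.01):
    """One SGD step: push positive item i above negative item j for user u."""
    x_uij = P[u] @ (Q[i] - Q[j])             # score margin
    g = 1.0 / (1.0 + np.exp(x_uij))          # sigmoid(-x): gradient weight
    P[u] += lr * (g * (Q[i] - Q[j]) - reg * P[u])
    Q[i] += lr * (g * P[u] - reg * Q[i])
    Q[j] += lr * (-g * P[u] - reg * Q[j])

rng = np.random.default_rng(42)
P = rng.normal(scale=0.1, size=(10, 4))      # user factors
Q = rng.normal(scale=0.1, size=(20, 4))      # item factors
before = P[0] @ (Q[1] - Q[2])
for _ in range(50):
    bpr_step(P, Q, u=0, i=1, j=2)
after = P[0] @ (Q[1] - Q[2])
print(after > before)  # the positive-vs-negative margin grows: True
```

In the real trainer the (u, i, j) triples are drawn at random from the interaction matrix each epoch.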
BPR.fit¶
Fit the BPR model to the user-item interaction matrix.
Parameters
| Parameter | Type | Description |
|---|---|---|
| interactions | sparse matrix or numpy array, optional | If None, uses the matrix prepared by from_transactions(). |
Raises
| Exception | Condition |
|---|---|
| RuntimeError | |
| TypeError |
BPR.recommend_items¶
Top-N items for a user.
from rusket.recommenders.bpr import BPR.recommend_items
BPR.recommend_items(user_id: 'int', n: 'int' = 10, exclude_seen: 'bool' = True) -> 'tuple[Any, Any]'
FM¶
Factorization Machines (FM) context-aware model for predictive tasks (e.g. CTR).
This model supports binary classification tasks using Log Loss (Binary Cross Entropy). Inputs should be formatted as a scipy sparse CSR matrix where features are binary (0/1). Each row is a sample consisting of User, Item, and Context features.
from rusket.recommenders.fm import FM
FM(factors: 'int' = 8, learning_rate: 'float' = 0.05, regularization: 'float' = 0.01, iterations: 'int' = 100, seed: 'int' = 42, verbose: 'int' = 0, use_cuda: 'bool | None' = None, **kwargs: 'Any') -> 'None'
Parameters
| Parameter | Type | Description |
|---|---|---|
| factors | int | Number of latent factors for the cross terms (default: 8). |
| learning_rate | float | SGD learning rate (default: 0.05). |
| regularization | float | L2 regularization weight (default: 0.01). |
| iterations | int | Number of training epochs (default: 100). |
| seed | int | Random seed for SGD sampling (default: 42). |
| verbose | int | Verbosity level; whether to print training progress (default: 0). |
| use_cuda | bool | If True, use GPU acceleration (CuPy or PyTorch) for recommendation. Falls back to CPU if no GPU backend is found. Default False. |
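The cross-term trick that makes second-order FMs linear-time is easy to show in numpy. A hedged sketch of the prediction rule only (hypothetical helper; the library's trained weights and kernels are not shown here):

```python
import numpy as np

def fm_predict_proba(X, w0, w, V):
    """sigmoid(w0 + X w + 0.5 * sum_f ((X V)_f^2 - (X^2)(V^2)_f)) per sample."""
    linear = X @ w
    XV = X @ V                                        # (n_samples, factors)
    pairwise = 0.5 * (XV ** 2 - (X ** 2) @ (V ** 2)).sum(axis=1)
    return 1.0 / (1.0 + np.exp(-(w0 + linear + pairwise)))

rng = np.random.default_rng(0)
X = (rng.random((5, 12)) < 0.3).astype(np.float64)    # binary context features
w = rng.normal(size=12)                               # first-order weights
V = rng.normal(scale=0.1, size=(12, 8))               # factor matrix for cross terms
p = fm_predict_proba(X, 0.0, w, V)
print(p.shape)  # (5,)
```

Each row of `X` is one User + Item + Context sample, matching the input format described above.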
FM.fit¶
Fit the FM model to Context-aware Data.
Parameters
| Parameter | Type | Description |
|---|---|---|
| X | scipy.sparse.csr_matrix or numpy array | Sparse binary feature matrix of shape (n_samples, n_features). Each row represents a single interaction with all its context features. |
| y | numpy.ndarray | Binary target labels (0.0 or 1.0) of shape (n_samples,). |
FM.predict¶
Alias for predict_proba.
FM.predict_proba¶
Predict the probability (CTR) of interactions.
Parameters
| Parameter | Type | Description |
|---|---|---|
| X | scipy.sparse.csr_matrix or numpy array | Sparse binary feature matrix of shape (n_samples, n_features). |
Returns
| Name | Type | Description |
|---|---|---|
| numpy.ndarray | Predicted probabilities of shape (n_samples,). |
Recommender¶
Hybrid recommender combining ALS collaborative filtering, semantic similarities, and association rules.
from rusket.recommenders.recommend import Recommender
Recommender(model: 'Any | None' = None, rules_df: 'pd.DataFrame | None' = None, item_embeddings: 'np.ndarray | None' = None)
Recommender.predict_next_chunk¶
Batch-rank the next best products for every user in user_history_df.
from rusket.recommenders.recommend import Recommender.predict_next_chunk
Recommender.predict_next_chunk(user_history_df: 'pd.DataFrame', user_col: 'str' = 'user_id', k: 'int' = 5) -> 'pd.DataFrame'
Recommender.recommend_for_cart¶
Suggest items to add to an active cart using association rules.
from rusket.recommenders.recommend import Recommender.recommend_for_cart
Recommender.recommend_for_cart(cart_items: 'list[int]', n: 'int' = 5) -> 'list[int]'
Recommender.recommend_for_user¶
Top-N recommendations for a user via Hybrid ALS + Semantic.
from rusket.recommenders.recommend import Recommender.recommend_for_user
Recommender.recommend_for_user(user_id: 'int', n: 'int' = 5, alpha: 'float' = 0.5, target_item_for_semantic: 'int | None' = None) -> 'tuple[np.ndarray, np.ndarray]'
Parameters
| Parameter | Type | Description |
|---|---|---|
| user_id | int | The user ID to generate recommendations for. |
| n | int, default=5 | Number of items to return. |
| alpha | float, default=0.5 | Weight blending CF vs Semantic. alpha=1.0 is pure CF. alpha=0.0 is pure semantic. |
| target_item_for_semantic | int | None, default=None | If provided, semantic similarity is computed against this item. If None, and alpha < 1.0, it computes semantic similarity against the user's most recently interacted item (if history is available) or falls back to pure CF. |
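The alpha blend is a convex combination of the two score vectors. A minimal illustration of the idea (not the library's internal scoring; the min-max normalization step is an assumption for comparability):

```python
import numpy as np

def blend(cf_scores, semantic_scores, alpha=0.5, n=5):
    """score = alpha * CF + (1 - alpha) * semantic over min-max-normalized scores."""
    def minmax(s):
        span = s.max() - s.min()
        return (s - s.min()) / span if span > 0 else np.zeros_like(s)
    blended = alpha * minmax(cf_scores) + (1 - alpha) * minmax(semantic_scores)
    top = np.argsort(-blended)[:n]
    return top, blended[top]

cf = np.array([0.1, 0.9, 0.4])        # collaborative-filtering scores
sem = np.array([0.8, 0.2, 0.5])       # semantic-similarity scores
ids, scores = blend(cf, sem, alpha=0.5, n=2)
```

With alpha=1.0 the semantic term vanishes and the ranking is pure CF; with alpha=0.0 it is pure semantic.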
NextBestAction¶
Hybrid recommender combining ALS collaborative filtering, semantic similarities, and association rules.
from rusket.recommenders.recommend import NextBestAction
NextBestAction(model: 'Any | None' = None, rules_df: 'pd.DataFrame | None' = None, item_embeddings: 'np.ndarray | None' = None)
NextBestAction.predict_next_chunk¶
Batch-rank the next best products for every user in user_history_df.
from rusket.recommenders.recommend import NextBestAction.predict_next_chunk
NextBestAction.predict_next_chunk(user_history_df: 'pd.DataFrame', user_col: 'str' = 'user_id', k: 'int' = 5) -> 'pd.DataFrame'
NextBestAction.recommend_for_cart¶
Suggest items to add to an active cart using association rules.
from rusket.recommenders.recommend import NextBestAction.recommend_for_cart
NextBestAction.recommend_for_cart(cart_items: 'list[int]', n: 'int' = 5) -> 'list[int]'
NextBestAction.recommend_for_user¶
Top-N recommendations for a user via Hybrid ALS + Semantic.
from rusket.recommenders.recommend import NextBestAction.recommend_for_user
NextBestAction.recommend_for_user(user_id: 'int', n: 'int' = 5, alpha: 'float' = 0.5, target_item_for_semantic: 'int | None' = None) -> 'tuple[np.ndarray, np.ndarray]'
Parameters
| Parameter | Type | Description |
|---|---|---|
| user_id | int | The user ID to generate recommendations for. |
| n | int, default=5 | Number of items to return. |
| alpha | float, default=0.5 | Weight blending CF vs Semantic. alpha=1.0 is pure CF. alpha=0.0 is pure semantic. |
| target_item_for_semantic | int | None, default=None | If provided, semantic similarity is computed against this item. If None, and alpha < 1.0, it computes semantic similarity against the user's most recently interacted item (if history is available) or falls back to pure CF. |
EASE¶
Embarrassingly Shallow Autoencoders for Sparse Data (EASE).
An implicit collaborative filtering algorithm that computes a closed-form item-item similarity matrix by solving a ridge regression problem. EASE often achieves state-of-the-art recommendation quality and very fast inference, particularly on datasets with strong item-item correlations.
from rusket.recommenders.ease import EASE
EASE(regularization: 'float' = 500.0, verbose: 'int' = 0, use_cuda: 'bool | None' = None, **kwargs: 'Any') -> 'None'
Parameters
| Parameter | Type | Description |
|---|---|---|
| regularization | float | L2 regularization weight (lambda). Higher values encourage smaller weights and reduce overfitting. Default is 500.0. |
| use_cuda | bool | If True, use GPU acceleration (CuPy or PyTorch) for recommendation. Falls back to CPU if no GPU backend is found. Default False. |
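The closed-form solution behind EASE is compact enough to sketch directly. A dense toy version of the ridge-regression weights (the library's implementation is Rust-accelerated and sparse-aware; names here are illustrative):

```python
import numpy as np

def ease_weights(X, lam=500.0):
    """B[i, j] = -P[i, j] / P[j, j] with P = (X^T X + lam*I)^(-1); diag(B) = 0."""
    G = X.T @ X + lam * np.eye(X.shape[1])   # regularized Gram matrix
    P = np.linalg.inv(G)
    B = P / (-np.diag(P))                    # divide each column j by -P[j, j]
    np.fill_diagonal(B, 0.0)                 # an item must not explain itself
    return B

X = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1]], dtype=float)       # 3 users x 3 items
B = ease_weights(X, lam=1.0)
scores = X @ B                               # per-user scores for ranking unseen items
```

Inference is then a single sparse matrix product, which is why EASE serves recommendations so quickly.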
EASE.fit¶
Fit the model to the user-item interaction matrix (Rust-accelerated).
Parameters
| Parameter | Type | Description |
|---|---|---|
| interactions | sparse matrix or numpy array, optional | If None, uses the matrix prepared by from_transactions(). |
EASE.recommend_items¶
Top-N items for a user. Set exclude_seen=False to include already-seen items.
from rusket.recommenders.ease import EASE.recommend_items
EASE.recommend_items(user_id: 'int', n: 'int' = 10, exclude_seen: 'bool' = True) -> 'tuple[Any, Any]'
ItemKNN¶
Ultra-fast Sparse Item-Item K-Nearest Neighbors Recommender.
Computes an item-item similarity matrix and only retains the top-K neighbors per item. Similarity methods include BM25, TF-IDF, Cosine, or unweighted Count.
from rusket.recommenders.item_knn import ItemKNN
ItemKNN(method: "Literal['bm25', 'tfidf', 'cosine', 'count']" = 'bm25', k: 'int' = 20, bm25_k1: 'float' = 1.2, bm25_b: 'float' = 0.75, verbose: 'int' = 0, use_cuda: 'bool | None' = None, **kwargs: 'Any')
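Before the neighbor search, the interaction matrix is reweighted by the chosen similarity method. A hedged dense sketch of BM25 weighting (a hypothetical helper using the standard BM25 form with the k1/b parameters above, not the library's sparse kernel):

```python
import numpy as np

def bm25_weight(X, k1=1.2, b=0.75):
    """BM25-weight a dense user-item matrix: idf(i) * tf*(k1+1) / (tf + k1*norm)."""
    N = X.shape[0]
    df = (X > 0).sum(axis=0)                           # users who touched each item
    idf = np.log((N - df + 0.5) / (df + 0.5) + 1.0)    # smoothed, always positive
    row_len = X.sum(axis=1, keepdims=True)             # interactions per user
    avg_len = row_len.mean()
    denom = X + k1 * (1 - b + b * row_len / avg_len)
    return idf * (X * (k1 + 1)) / denom

X = np.array([[1, 1, 0],
              [1, 0, 1],
              [1, 0, 0]], dtype=float)
W = bm25_weight(X)
```

Cosine similarity over the weighted rows then favors co-occurrence with rare items over globally popular ones.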
ItemKNN.fit¶
Fit the ItemKNN model.
from rusket.recommenders.item_knn import ItemKNN.fit
ItemKNN.fit(interactions: 'Any' = None) -> 'ItemKNN'
Parameters
| Parameter | Type | Description |
|---|---|---|
| interactions | scipy.sparse.csr_matrix, optional | A sparse matrix of shape (n_users, n_items). If None, uses the matrix prepared by from_transactions(). |
Returns
| Name | Type | Description |
|---|---|---|
| ItemKNN | The fitted model. |
ItemKNN.recommend_items¶
Top-N items for a user.
from rusket.recommenders.item_knn import ItemKNN.recommend_items
ItemKNN.recommend_items(user_id: 'int', n: 'int' = 10, exclude_seen: 'bool' = True) -> 'tuple[Any, Any]'
Parameters
| Parameter | Type | Description |
|---|---|---|
| user_id | int | The user ID to generate recommendations for. |
| n | int, default=10 | Number of items to return. |
| exclude_seen | bool, default=True | Whether to exclude items the user has already interacted with. |
Returns
| Name | Type | Description |
|---|---|---|
| tuple[np.ndarray, np.ndarray] | (item_ids, scores) sorted by descending score. |
FPMC¶
Factorizing Personalized Markov Chains (FPMC) model for sequential recommendation.
FPMC combines Matrix Factorization (modeling user preferences) and Markov Chains (modeling sequential transitions between items). It is highly effective for tasks where both personal taste and sequential behavior matter (e.g., next-basket delivery).
from rusket.sequential.fpmc import FPMC
FPMC(factors: 'int' = 64, learning_rate: 'float' = 0.05, regularization: 'float' = 0.01, iterations: 'int' = 150, seed: 'int' = 42, time_aware: 'bool' = False, max_time_steps: 'int' = 256, verbose: 'int' = 0, use_cuda: 'bool | None' = None, **kwargs: 'Any') -> 'None'
Parameters
| Parameter | Type | Description |
|---|---|---|
| factors | int | Number of latent factors (default: 64). |
| learning_rate | float | SGD learning rate (default: 0.05). |
| regularization | float | L2 regularization weight (default: 0.01). |
| iterations | int | Number of passes over the transitions (default: 150). |
| seed | int | Random seed for sampling (default: 42). |
| verbose | int | Verbosity level; whether to print training progress (default: 0). |
| use_cuda | bool | If True, use GPU acceleration (CuPy or PyTorch) for recommendation. Falls back to CPU if no GPU backend is found. Default False. |
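FPMC's next-item score is the sum of a user-preference inner product and a last-item transition inner product. A minimal numpy sketch of that scoring rule (illustrative embedding matrices; not the library's trained factors):

```python
import numpy as np

def fpmc_scores(VU, VIU, VL, VIL, user, last_item):
    """score(i) = <v_user, v_i^(item-user)> + <v_last, v_i^(item-last)> for all i."""
    return VIU @ VU[user] + VIL @ VL[last_item]

rng = np.random.default_rng(1)
n_items, f = 50, 8
VU = rng.normal(size=(10, f))        # user embeddings (MF part)
VIU = rng.normal(size=(n_items, f))  # item embeddings paired with users
VL = rng.normal(size=(n_items, f))   # last-item embeddings (Markov-chain part)
VIL = rng.normal(size=(n_items, f))  # item embeddings paired with last items
s = fpmc_scores(VU, VIU, VL, VIL, user=0, last_item=7)
top = np.argsort(-s)[:10]
```

Dropping the first term gives a plain first-order Markov chain; dropping the second gives plain matrix factorization.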
FPMC.fit¶
Fit the FPMC model to a list of sequential interactions.
from rusket.sequential.fpmc import FPMC.fit
FPMC.fit(sequences: 'list[list[int]] | None' = None, timestamps: 'list[list[int]] | None' = None, n_items: 'int | None' = None) -> 'FPMC'
Parameters
| Parameter | Type | Description |
|---|---|---|
| sequences | list of list of int, optional | List of item sequences, where each sequence belongs to a unique user. Users are assigned IDs from 0 to len(sequences)-1. If None, uses data prepared by from_transactions(). |
| timestamps | list of list of int, optional | Corresponding unix timestamps for sequences if time_aware is True. |
| n_items | int | None | Maximum number of items. If None, it is inferred from data. |
FPMC.recommend_items¶
Top-N sequential items for a user.
from rusket.sequential.fpmc import FPMC.recommend_items
FPMC.recommend_items(user_id: 'int', timestamp: 'int | None' = None, n: 'int' = 10, exclude_seen: 'bool' = True) -> 'tuple[Any, Any]'
SVD¶
Funk SVD collaborative filtering model.
Biased matrix factorization trained with SGD: r̂_ui = μ + b_u + b_i + p_u · q_i
from rusket.recommenders.svd import SVD
SVD(factors: 'int' = 64, learning_rate: 'float' = 0.005, regularization: 'float' = 0.02, iterations: 'int' = 20, seed: 'int' = 42, verbose: 'int' = 0, use_cuda: 'bool | None' = None, **kwargs: 'Any') -> 'None'
Parameters
| Parameter | Type | Description |
|---|---|---|
| factors | int | Number of latent factors. |
| learning_rate | float | SGD learning rate. |
| regularization | float | L2 regularisation weight. |
| iterations | int | Number of SGD epochs. |
| seed | int | Random seed for reproducibility. |
| verbose | int | Verbosity level (0 = silent, 1+ = progress). |
| use_cuda | bool | If True, use CUDA acceleration (CuPy or PyTorch) for recommendation scoring. Falls back to CPU if no CUDA backend found. Default False. |
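The prediction rule above is short enough to state directly in code. A sketch of Funk SVD inference only (toy parameters, not a trained model):

```python
import numpy as np

def svd_predict(mu, b_u, b_i, P, Q, user_id, item_id):
    """r_hat = mu + b_u + b_i + p_u . q_i  (biased matrix factorization)."""
    return mu + b_u[user_id] + b_i[item_id] + P[user_id] @ Q[item_id]

mu = 3.5                              # global mean rating
b_u = np.array([0.2, -0.1])           # user biases
b_i = np.array([0.3, 0.0, -0.2])      # item biases
P = np.zeros((2, 4))                  # user factors (zeroed for the toy example)
Q = np.zeros((3, 4))                  # item factors
r_hat = svd_predict(mu, b_u, b_i, P, Q, user_id=0, item_id=2)
print(r_hat)  # 3.5 + 0.2 - 0.2 + 0 = 3.5
```

During training, SGD nudges mu-residuals into the biases first and the factor products capture what remains.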
SVD.batch_recommend¶
Top-N items for all users efficiently computed in parallel.
from rusket.recommenders.svd import SVD.batch_recommend
SVD.batch_recommend(n: 'int' = 10, exclude_seen: 'bool' = True, format: 'str' = 'polars') -> 'Any'
Parameters
| Parameter | Type | Description |
|---|---|---|
| n | int | Number of items per user. |
| exclude_seen | bool | Whether to filter already-seen items. |
| format | str | Output format: "polars" or "pandas". |
Returns
| Name | Type | Description |
|---|---|---|
| DataFrame | A DataFrame with columns user_id, item_id, and score. |
SVD.fit¶
Fit the model to the user-item interaction matrix.
Parameters
| Parameter | Type | Description |
|---|---|---|
| interactions | scipy.sparse matrix, np.ndarray, pd.DataFrame, or polars DataFrame, optional | User-item interaction matrix with explicit ratings. If None, uses the matrix prepared by from_transactions(). |
Returns
| Name | Type | Description |
|---|---|---|
| self | SVD | The fitted model. |
SVD.predict¶
Predict the rating for a user-item pair.
from rusket.recommenders.svd import SVD.predict
SVD.predict(user_id: 'int', item_id: 'int') -> 'float'
Parameters
| Parameter | Type | Description |
|---|---|---|
| user_id | int | User index. |
| item_id | int | Item index. |
Returns
| Name | Type | Description |
|---|---|---|
| float | Predicted rating. |
SVD.recommend_items¶
Top-N items for a user.
from rusket.recommenders.svd import SVD.recommend_items
SVD.recommend_items(user_id: 'int', n: 'int' = 10, exclude_seen: 'bool' = True) -> 'tuple[Any, Any]'
Parameters
| Parameter | Type | Description |
|---|---|---|
| user_id | int | User index. |
| n | int | Number of items to recommend. |
| exclude_seen | bool | Whether to filter already-seen items. |
Returns
| Name | Type | Description |
|---|---|---|
| tuple[np.ndarray, np.ndarray] | (item_ids, scores) |
SVD.recommend_users¶
Top-N users for an item.
from rusket.recommenders.svd import SVD.recommend_users
SVD.recommend_users(item_id: 'int', n: 'int' = 10) -> 'tuple[Any, Any]'
LightGCN¶
LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation.
A state-of-the-art collaborative filtering model that propagates embeddings over the user–item bipartite graph without non-linear transformations.
Typical training time on ml-100k: < 0.5s/epoch.
from rusket.recommenders.lightgcn import LightGCN
LightGCN(factors: 'int' = 64, k_layers: 'int' = 3, learning_rate: 'float' = 0.001, lambda_: 'float' = 0.0001, ssl_ratio: 'float' = 0.0, ssl_temp: 'float' = 0.2, ssl_weight: 'float' = 0.1, iterations: 'int' = 20, seed: 'int | None' = None, verbose: 'int' = 0, use_cuda: 'bool | None' = None, **kwargs: 'Any') -> 'None'
Parameters
| Parameter | Type | Description |
|---|---|---|
| factors | int | Embedding dimensionality (latent factors). |
| k_layers | int | Number of graph-propagation layers (1–4). |
| learning_rate | float | Adam learning rate. |
| lambda_ | float | L2 regularization coefficient. |
| iterations | int | Number of training epochs. |
| seed | int or None | Seed for reproducible training. |
| verbose | int | Print training progress. |
| use_cuda | bool | If True, use GPU acceleration (CuPy or PyTorch) for recommendation. Falls back to CPU if no GPU backend is found. Default False. |
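LightGCN's propagation has no weight matrices or non-linearities: each layer multiplies the embeddings by the symmetric-normalized bipartite adjacency, and the final embedding averages all layers. A dense toy sketch (illustrative only; the library operates on sparse matrices in Rust):

```python
import numpy as np

def lightgcn_embeddings(R, E0, k_layers=3):
    """E^(k+1) = A_hat @ E^(k); return the mean of E^(0..k_layers)."""
    n_users, n_items = R.shape
    A = np.zeros((n_users + n_items, n_users + n_items))
    A[:n_users, n_users:] = R                 # user -> item edges
    A[n_users:, :n_users] = R.T               # item -> user edges
    d = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(d)
    nz = d > 0
    d_inv_sqrt[nz] = d[nz] ** -0.5
    A_hat = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]   # D^-1/2 A D^-1/2
    layers = [E0]
    for _ in range(k_layers):
        layers.append(A_hat @ layers[-1])
    return np.mean(layers, axis=0)

R = np.array([[1, 0, 1], [0, 1, 1]], dtype=float)    # 2 users x 3 items
E0 = np.random.default_rng(0).normal(size=(5, 4))    # stacked user+item embeddings
E = lightgcn_embeddings(R, E0, k_layers=3)
print(E.shape)  # (5, 4)
```

Scoring is then the inner product between a user row and an item row of the averaged embedding matrix.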
LightGCN.fit¶
Fit the model to a user-item interaction matrix.
from rusket.recommenders.lightgcn import LightGCN.fit
LightGCN.fit(interactions: 'Any' = None) -> 'LightGCN'
Parameters
| Parameter | Type | Description |
|---|---|---|
| interactions | scipy.sparse.csr_matrix or numpy.ndarray, optional | A sparse or dense user-item interaction matrix. If None, uses data prepared by from_transactions(). |
Returns
| Name | Type | Description |
|---|---|---|
| LightGCN | The fitted model. |
LightGCN.recommend_items¶
Top-N items for a user.
from rusket.recommenders.lightgcn import LightGCN.recommend_items
LightGCN.recommend_items(user_id: 'int', n: 'int' = 10, exclude_seen: 'bool' = True) -> 'tuple[np.ndarray, np.ndarray]'
Parameters
| Parameter | Type | Description |
|---|---|---|
| user_id | int | Original user ID (before encoding). |
| n | int, default=10 | Number of recommendations. |
| exclude_seen | bool, default=True | Whether to exclude items the user has already interacted with. |
Returns
| Name | Type | Description |
|---|---|---|
| tuple[np.ndarray, np.ndarray] | (item_ids, scores) sorted by descending score. |
SASRec¶
SASRec – Self-Attentive Sequential Recommendation.
Applies a causal Transformer to user interaction sequences to predict the next item. Significantly outperforms Markov-chain methods like FPMC on long sequences.
from rusket.sequential.sasrec import SASRec
SASRec(factors: 'int' = 64, n_layers: 'int' = 2, max_seq: 'int' = 50, learning_rate: 'float' = 0.0005, lambda_: 'float' = 0.0001, iterations: 'int' = 20, seed: 'int | None' = None, time_aware: 'bool' = False, max_time_steps: 'int' = 256, verbose: 'int' = 0, use_cuda: 'bool | None' = None, **kwargs: 'Any') -> 'None'
Parameters
| Parameter | Type | Description |
|---|---|---|
| factors | int | Embedding dimensionality. |
| n_layers | int | Number of Transformer blocks. |
| max_seq | int | Maximum input sequence length (older items are dropped). |
| learning_rate | float | SGD learning rate (decays during training). |
| lambda_ | float | L2 regularization. |
| iterations | int | Number of training epochs. |
| seed | int or None | Seed for reproducibility. |
| time_aware | bool | If true, incorporates timestamp deltas into sequential modeling. |
| max_time_steps | int | Maximum number of time bins (e.g. days) to consider for time-awareness. |
| verbose | int | Print epoch progress. |
| use_cuda | bool | If True, use GPU acceleration (CuPy or PyTorch) for recommendation. Falls back to CPU if no GPU backend found. Default False. |
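The causal Transformer mentioned above restricts attention so that position t can only look at positions up to t. A single-head numpy sketch of that masked attention step (illustrative; the library's trainer adds layer norm, feed-forward blocks, and position embeddings):

```python
import numpy as np

def causal_self_attention(X, Wq, Wk, Wv):
    """One attention head with a causal mask: position t attends only to <= t."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores[future] = -np.inf                   # block attention to future items
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))                    # sequence of 6 item embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = causal_self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (6, 8)
```

Because of the mask, the first position's output depends only on the first item, which is what makes next-item training valid.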
SASRec.fit¶
from rusket.sequential.sasrec import SASRec.fit
SASRec.fit(sequences: 'list[list[int]] | None' = None, timestamps: 'list[list[int]] | None' = None) -> 'SASRec'
SASRec.recommend_items¶
Top-N items for a user or an ad-hoc sequence.
from rusket.sequential.sasrec import SASRec.recommend_items
SASRec.recommend_items(user_id: 'int | list[int]', timestamps: 'list[int] | None' = None, n: 'int' = 10, exclude_seen: 'bool' = True) -> 'tuple[np.ndarray, np.ndarray]'
Parameters
| Parameter | Type | Description |
|---|---|---|
| user_id | int or list[int] | The ID of the user (implicitly 0 to len(sequences)-1 from fit), or a list of items representing an ad-hoc sequence. |
| timestamps | list[int], optional | Corresponding unix timestamps if user_id is a list of items and time_aware=True. |
| n | int, default=10 | Number of recommendations. |
| exclude_seen | bool, default=True | Whether to exclude items the user has already interacted with. |
Returns
| Name | Type | Description |
|---|---|---|
| tuple[np.ndarray, np.ndarray] | (item_ids, scores) sorted by descending score. |
PopularityRecommender¶
Recommend items by global popularity (interaction count).
A non-personalised baseline that ranks every item by the total number of interactions it received. Useful as a sanity-check baseline when evaluating more sophisticated models.
from rusket.recommenders.popularity import PopularityRecommender
PopularityRecommender(verbose: 'int' = 0, **kwargs: 'Any') -> 'None'
Parameters
| Parameter | Type | Description |
|---|---|---|
| verbose | int, default=0 | Verbosity level. |
PopularityRecommender.fit¶
Fit the model by counting interactions per item.
from rusket.recommenders.popularity import PopularityRecommender.fit
PopularityRecommender.fit(interactions: 'Any' = None) -> 'PopularityRecommender'
Parameters
| Parameter | Type | Description |
|---|---|---|
| interactions | sparse matrix or numpy array, optional | User-item interaction matrix. If None, uses the matrix prepared by from_transactions(). |
Returns
| Name | Type | Description |
|---|---|---|
| PopularityRecommender | The fitted model. |
PopularityRecommender.recommend_items¶
Return the n most popular items for a user.
from rusket.recommenders.popularity import PopularityRecommender.recommend_items
PopularityRecommender.recommend_items(user_id: 'int', n: 'int' = 10, exclude_seen: 'bool' = True) -> 'tuple[np.ndarray, np.ndarray]'
Parameters
| Parameter | Type | Description |
|---|---|---|
| user_id | int | Internal user index. |
| n | int, default=10 | Number of items to return. |
| exclude_seen | bool, default=True | Whether to exclude items the user has already interacted with. |
Returns
| Name | Type | Description |
|---|---|---|
| tuple[np.ndarray, np.ndarray] | (item_ids, scores) sorted by descending popularity. |
ContentBased¶
Content-based recommender using TF-IDF vectorization and cosine similarity.
Recommends items similar to a given item based on textual features (descriptions, tags, genres, etc.).
from rusket.recommenders.content_based import ContentBased
ContentBased(max_features: 'int' = 5000, ngram_range: 'tuple[int, int]' = (1, 2), **kwargs: 'Any') -> 'None'
Parameters
| Parameter | Type | Description |
|---|---|---|
| max_features | int, default=5000 | Maximum number of TF-IDF features to extract. |
| ngram_range | tuple[int, int], default=(1, 2) | Range of n-grams for TF-IDF vectorisation. |
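The TF-IDF plus cosine-similarity pipeline can be shown end to end on toy data. A whitespace-tokenized, unigram-only sketch (not the library's Rust-accelerated vectorizer, which also supports n-grams and a feature cap):

```python
import numpy as np

def tfidf_cosine_sim(docs):
    """Build TF-IDF vectors over whitespace tokens and return pairwise cosine sims."""
    vocab = sorted({t for d in docs for t in d.split()})
    idx = {t: j for j, t in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for i, d in enumerate(docs):
        for t in d.split():
            tf[i, idx[t]] += 1
    df = (tf > 0).sum(axis=0)
    idf = np.log(len(docs) / df) + 1.0        # smoothed inverse document frequency
    X = tf * idf
    X /= np.linalg.norm(X, axis=1, keepdims=True)   # L2-normalize rows
    return X @ X.T                                   # cosine similarity matrix

S = tfidf_cosine_sim(["action thriller", "action comedy", "romantic comedy"])
```

Here the two "action" titles come out more similar to each other than to the romance, which is exactly the signal recommend_similar ranks by.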
ContentBased.fit¶
Compute TF-IDF vectors and the pairwise cosine similarity matrix (Rust-accelerated).
Returns
| Name | Type | Description |
|---|---|---|
| ContentBased | The fitted model. |
ContentBased.recommend_similar¶
Find the n most similar items to a given item.
from rusket.recommenders.content_based import ContentBased.recommend_similar
ContentBased.recommend_similar(item: 'Any', n: 'int' = 10) -> 'tuple[np.ndarray, np.ndarray]'
Parameters
| Parameter | Type | Description |
|---|---|---|
| item | Any | Item ID (as it appeared in item_col of the source DataFrame). |
| n | int, default=10 | Number of similar items to return. |
Returns
| Name | Type | Description |
|---|---|---|
| tuple[np.ndarray, np.ndarray] | (item_ids, similarity_scores) sorted by descending similarity. |
HybridRecommender¶
Weighted ensemble of multiple recommendation models.
Blends the output of several pre-fitted models by combining their
recommend_items scores with configurable weights.
from rusket.recommenders.hybrid import HybridRecommender
HybridRecommender(models_and_weights: 'list[tuple[Any, float]]') -> 'None'
Parameters
| Parameter | Type | Description |
|---|---|---|
| models_and_weights | list[tuple[Any, float]] | List of (model, weight) pairs. Each model must implement recommend_items(user_id, n, exclude_seen) -> (ids, scores). |
HybridRecommender.fit¶
No-op — constituent models must be pre-fitted.
from rusket.recommenders.hybrid import HybridRecommender.fit
HybridRecommender.fit() -> 'HybridRecommender'
HybridRecommender.recommend_items¶
Blend recommendations from all constituent models.
For each model, requests a large candidate set (n * 3), maps item
scores into a shared score vector, applies the weight, and returns the
top-n from the blended result.
from rusket.recommenders.hybrid import HybridRecommender.recommend_items
HybridRecommender.recommend_items(user_id: 'int', n: 'int' = 10, exclude_seen: 'bool' = True) -> 'tuple[np.ndarray, np.ndarray]'
Parameters
| Parameter | Type | Description |
|---|---|---|
| user_id | int | Internal user index. |
| n | int, default=10 | Number of items to return. |
| exclude_seen | bool, default=True | Whether to exclude items already seen. |
Returns
| Name | Type | Description |
|---|---|---|
| tuple[np.ndarray, np.ndarray] | (item_ids, blended_scores) sorted by descending score. |
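The candidate-expansion and score-mapping steps described above can be sketched with stub models. This is an illustrative stand-in for the blending logic, not the class's actual implementation:

```python
import numpy as np

def blend_models(models_and_weights, user_id, n, n_items):
    """Map each model's (ids, scores) into a shared vector, weight, take top-n."""
    blended = np.zeros(n_items)
    for model, weight in models_and_weights:
        ids, scores = model.recommend_items(user_id, n=n * 3, exclude_seen=True)
        blended[np.asarray(ids)] += weight * np.asarray(scores)
    top = np.argsort(-blended)[:n]
    return top, blended[top]

class Stub:  # stand-in for any fitted model with recommend_items
    def __init__(self, ids, scores):
        self._ids, self._scores = ids, scores
    def recommend_items(self, user_id, n=10, exclude_seen=True):
        return self._ids[:n], self._scores[:n]

m1 = Stub([0, 1, 2], [3.0, 2.0, 1.0])
m2 = Stub([1, 2, 3], [3.0, 2.0, 1.0])
ids, scores = blend_models([(m1, 0.5), (m2, 0.5)], user_id=0, n=2, n_items=4)
print(ids, scores)
```

Item 1 wins here because both stubs rank it highly, which is the point of the weighted ensemble.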
NMF¶
Non-negative Matrix Factorization for collaborative filtering.
Decomposes the user-item interaction matrix R into two non-negative
matrices W (users × factors) and H (factors × items) such that
R ≈ W @ H. The multiplicative update rules guarantee non-negativity
without a projection step.
from rusket.recommenders.nmf import NMF
NMF(factors: 'int' = 64, iterations: 'int' = 100, regularization: 'float' = 0.01, seed: 'int' = 42, verbose: 'int' = 0, use_cuda: 'bool | None' = None, **kwargs: 'Any') -> 'None'
Parameters
| Parameter | Type | Description |
|---|---|---|
| factors | int, default=64 | Number of latent factors. |
| iterations | int, default=100 | Number of multiplicative update iterations. |
| regularization | float, default=0.01 | L2 regularisation penalty applied to both W and H. |
| seed | int, default=42 | Random seed for initialisation. |
| verbose | int, default=0 | Verbosity level. |
| use_cuda | bool | If True, use GPU acceleration (CuPy or PyTorch) for recommendation. Falls back to CPU if no GPU backend is found. Default False. |
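The multiplicative update rules are short enough to sketch in full. A dense toy trainer (illustrative only; the library's fit is Rust-accelerated and the epsilon guard is an assumption for numerical safety):

```python
import numpy as np

def nmf_fit(R, factors=2, iterations=100, reg=0.01, seed=42):
    """Multiplicative updates: W *= (R H^T)/(W H H^T + reg*W), and similarly for H."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    W = rng.random((n_users, factors))
    H = rng.random((factors, n_items))
    eps = 1e-9                                 # avoid division by zero
    for _ in range(iterations):
        W *= (R @ H.T) / (W @ H @ H.T + reg * W + eps)
        H *= (W.T @ R) / (W.T @ W @ H + reg * H + eps)
    return W, H

R = np.array([[5, 3, 0],
              [4, 0, 1],
              [1, 1, 5]], dtype=float)
W, H = nmf_fit(R)
```

Because every factor in the update is non-negative, W and H stay non-negative at every iteration with no projection step, as noted above.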
NMF.fit¶
Fit via multiplicative update rules (Rust-accelerated).
Parameters
| Parameter | Type | Description |
|---|---|---|
| interactions | sparse matrix or numpy array, optional | User-item interaction matrix. If None, uses the matrix prepared by from_transactions(). |
Returns
| Name | Type | Description |
|---|---|---|
| NMF | The fitted model. |
NMF.recommend_items¶
Top-N items for a user via W @ H^T.
from rusket.recommenders.nmf import NMF.recommend_items
NMF.recommend_items(user_id: 'int', n: 'int' = 10, exclude_seen: 'bool' = True) -> 'tuple[np.ndarray, np.ndarray]'
Parameters
| Parameter | Type | Description |
|---|---|---|
| user_id | int | Internal user index. |
| n | int, default=10 | Number of items to return. |
| exclude_seen | bool, default=True | Whether to exclude already-seen items. |
Returns
| Name | Type | Description |
|---|---|---|
| tuple[np.ndarray, np.ndarray] | (item_ids, scores) sorted by descending score. |
Analytics & Utilities¶
score_potential¶
Cross-selling potential scores — shape (n_users, n_items) or (n_users, len(target_categories)).
Items the user has already interacted with are masked to -inf.
from rusket.recommenders.recommend import score_potential
score_potential(user_history: 'list[list[int]]', model: 'Any', target_categories: 'list[int] | None' = None) -> 'np.ndarray'
similar_items¶
Find the most similar items to a given item ID based on latent factors.
Computes cosine similarity between the specified item's latent vector
and all other item vectors in the item_factors matrix.
from rusket._internal.similarity import similar_items
similar_items(model: 'SupportsItemFactors', item_id: 'int', n: 'int' = 5) -> 'tuple[np.ndarray, np.ndarray]'
Parameters
| Parameter | Type | Description |
|---|---|---|
| model | SupportsItemFactors | A fitted model instance with an item_factors property. |
| item_id | int | The internal integer index of the target item. |
| n | int | Number of most similar items to return. |
Returns
| Name | Type | Description |
|---|---|---|
| tuple[np.ndarray, np.ndarray] | (item_ids, cosine_similarities) sorted in descending order. |
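The cosine computation described above is a one-liner over the factor matrix. A minimal sketch (hypothetical helper that mirrors the documented signature, with the query item masked out of its own results):

```python
import numpy as np

def similar_items(item_factors, item_id, n=5):
    """Cosine similarity between one item's latent vector and all others."""
    norms = np.linalg.norm(item_factors, axis=1)
    sims = (item_factors @ item_factors[item_id]) / (norms * norms[item_id] + 1e-12)
    sims[item_id] = -np.inf                    # exclude the query item itself
    top = np.argsort(-sims)[:n]
    return top, sims[top]

F = np.array([[1.0, 0.0],
              [0.9, 0.1],
              [0.0, 1.0]])                     # toy item_factors
ids, sims = similar_items(F, item_id=0, n=2)
print(ids)  # nearest neighbour of item 0 is item 1
```

Any fitted model exposing item_factors can be queried this way.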
find_substitutes¶
Substitute/cannibalizing products via negative association rules.
Items with high individual support but low co-occurrence (lift < 1.0) likely cannibalize each other.
from rusket._internal.analytics import find_substitutes
find_substitutes(rules_df: 'pd.DataFrame', max_lift: 'float' = 0.8) -> 'pd.DataFrame'
Parameters
| Parameter | Type | Description |
|---|---|---|
| rules_df | pd.DataFrame | DataFrame output from rusket.association_rules. |
| max_lift | float | Upper bound for lift; lift < 1.0 implies negative correlation. |
Returns
| Name | Type | Description |
|---|---|---|
| substitutes | pd.DataFrame | Sorted ascending by lift (most severe cannibalization first). |
customer_saturation¶
Customer saturation by unique items/categories bought, split into deciles.
from rusket._internal.analytics import customer_saturation
customer_saturation(df: 'pd.DataFrame', user_col: 'str', category_col: 'str | None' = None, item_col: 'str | None' = None) -> 'pd.DataFrame'
Parameters
| Parameter | Type | Description |
|---|---|---|
| df | pd.DataFrame | Interaction DataFrame. |
| user_col | str | Column identifying the user. |
| category_col | str or None | Category column (optional; at least one of category/item is required). |
| item_col | str or None | Item column (optional). |
Returns
| Name | Type | Description |
|---|---|---|
| saturation | pd.DataFrame | DataFrame with unique_count, saturation_pct, and decile columns. |
export_item_factors¶
Exports latent item factors as a DataFrame for Vector DBs.
This format is ideal for ingesting into FAISS, Pinecone, or Qdrant for Retrieval-Augmented Generation (RAG) and semantic search.
from rusket.export.factors import export_item_factors
export_item_factors(model: 'SupportsItemFactors', include_labels: 'bool' = True, normalize: 'bool' = False, format: 'str' = 'pandas') -> 'Any'
Parameters
| Parameter | Type | Description |
|---|---|---|
| model | SupportsItemFactors | A fitted model instance with an item_factors property. |
| include_labels | bool, default=True | Whether to include the string item labels (if available from the model's fitting method). |
| normalize | bool, default=False | Whether to L2-normalize the factors before export. |
| format | str, default="pandas" | The DataFrame format to return. One of "pandas", "polars", or "spark". |
Returns
| Name | Type | Description |
|---|---|---|
| Any | A DataFrame where each row is an item with columns item_id, optionally item_label, and vector (a dense 1-D numpy array of the item's latent factors). |
Examples
>>> model = rusket.ALS(factors=32).fit(interactions)
>>> df = rusket.export_item_factors(model)
>>> # Ingest into FAISS / Pinecone / Qdrant
>>> vectors = np.stack(df["vector"].values)
PCA¶
Principal Component Analysis (PCA).
Linear dimensionality reduction using Singular Value Decomposition
of the centred data, computed entirely in Rust via the faer crate.
Parameters
| Parameter | Type | Description |
|---|---|---|
| n_components | int | Number of principal components to keep. |
Attributes (available after fit())
| Attribute | Type | Description |
|---|---|---|
| components_ | np.ndarray | Principal axes in feature space, shape (n_components, n_features). |
| explained_variance_ | np.ndarray | Variance explained per component (uses n - 1 degrees of freedom). |
| explained_variance_ratio_ | np.ndarray | Fraction of total variance explained per component. |
| singular_values_ | np.ndarray | Singular values corresponding to each component. |
| mean_ | np.ndarray | Per-feature empirical mean estimated from the training data. |
| n_components_ | int | Number of components that were actually fitted (may be less than requested if n_components > min(n_samples, n_features)). |
Examples
>>> import numpy as np
>>> import rusket
>>> X = np.random.default_rng(42).standard_normal((100, 10)).astype(np.float32)
>>> pca = rusket.PCA(n_components=3)
>>> pca.fit(X)
PCA(n_components=3)
>>> pca.transform(X).shape
(100, 3)
>>> pca.explained_variance_ratio_.sum() # close to fraction of total
0.4...
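The attributes listed above follow the standard SVD construction. As a sketch, the same quantities can be computed with plain numpy (this mirrors the definitions, not rusket's Rust internals):

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.standard_normal((100, 10)).astype(np.float32)
n_components = 3

# mean_ is the per-feature empirical mean; PCA operates on centred data.
mean_ = X.mean(axis=0)
Xc = X - mean_

# Thin SVD of the centred matrix.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

components_ = Vt[:n_components]            # shape (n_components, n_features)
singular_values_ = S[:n_components]
explained_variance_ = S[:n_components] ** 2 / (X.shape[0] - 1)  # n - 1 dof
total_var = Xc.var(axis=0, ddof=1).sum()
explained_variance_ratio_ = explained_variance_ / total_var

# transform(X) projects the centred data onto the principal axes.
X_new = Xc @ components_.T
print(X_new.shape)  # (100, 3)
```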
PCA.fit¶
Fit PCA on the data matrix X.
Parameters
| Parameter | Type | Description |
|---|---|---|
| X | array-like of shape (n_samples, n_features) | Training data. |
Returns
| Name | Type | Description |
|---|---|---|
| self | PCA | The fitted estimator. |
PCA.fit_transform¶
Fit the model with X and apply dimensionality reduction.
from rusket.viz.pca import PCA
PCA.fit_transform(X: 'npt.NDArray[Any]') -> 'npt.NDArray[np.float32]'
Parameters
| Parameter | Type | Description |
|---|---|---|
| X | array-like of shape (n_samples, n_features) | Training data. |
Returns
| Name | Type | Description |
|---|---|---|
| X_new | ndarray of shape (n_samples, n_components) | The transformed training data. |
PCA.transform¶
Apply dimensionality reduction to X.
from rusket.viz.pca import PCA
PCA.transform(X: 'npt.NDArray[Any]') -> 'npt.NDArray[np.float32]'
Parameters
| Parameter | Type | Description |
|---|---|---|
| X | array-like of shape (n_samples, n_features) | The data to project onto the fitted components. |
Returns
| Name | Type | Description |
|---|---|---|
| X_new | ndarray of shape (n_samples, n_components) | The projected data. |
Pipeline¶
Multi-stage recommendation pipeline.
Composes multiple recommendation models into a retrieve → rerank → filter funnel, following the architecture used by production recommendation systems at Twitter/X, YouTube, and Spotify.
from rusket.evaluation.pipeline import Pipeline
Pipeline(retrieve: 'Any | list[Any] | None' = None, rerank: 'Any | None' = None, rules: 'Any | list[Any] | None' = None, filter: 'Callable[[list[Any], list[float]], tuple[list[Any], list[float]]] | None' = None, merge_strategy: "Literal['max', 'mean', 'sum']" = 'max') -> 'None'
Parameters
| Parameter | Type | Description |
|---|---|---|
| retrieve | list or single model | One or more ImplicitRecommender instances used for candidate generation. Each model's recommend_items() is called and results are merged. |
| rerank | model, optional | An ImplicitRecommender used to re-score the merged candidate set. Typically a heavier model (e.g. BPR or LightGCN) that produces higher-quality rankings on a smaller candidate pool. |
| rules | model or list, optional | One or more RuleBasedRecommender instances. Rules are evaluated for the user's history and injected into the candidate set after re-ranking, with an artificially boosted score (e.g., +1,000,000) to ensure they always surface at the very top of the final recommendations. |
| filter | callable, optional | A function (item_ids, scores) -> (filtered_ids, filtered_scores) applied at the very end. Use for block lists, category restrictions, recency filters, NSFW removal, etc. |
| merge_strategy | {'max', 'mean', 'sum'}, default='max' | How to combine scores when multiple retrievers return the same item. |
Examples
>>> pipeline = Pipeline(
... retrieve=[als, item_knn],
... rerank=bpr,
... rules=my_curated_rules,
... filter=lambda ids, sc: (
... [i for i in ids if i not in blocked_set],
... [s for i, s in zip(ids, sc) if i not in blocked_set],
... ),
... )
>>> items, scores = pipeline.recommend(user_id=42, n=10)
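How merge_strategy combines scores when several retrievers return the same item can be sketched in plain Python. This is a simplified stand-in, not the library's implementation:

```python
from collections import defaultdict

def merge_candidates(retriever_results, strategy="max"):
    """Combine (item_ids, scores) pairs from several retrievers."""
    per_item = defaultdict(list)
    for items, vals in retriever_results:
        for item, val in zip(items, vals):
            per_item[item].append(val)
    reduce = {"max": max, "sum": sum, "mean": lambda v: sum(v) / len(v)}[strategy]
    merged = {item: reduce(vals) for item, vals in per_item.items()}
    # Sort by descending combined score, as the pipeline would.
    return sorted(merged.items(), key=lambda kv: -kv[1])

# Two retrievers both surface item 7 with different scores.
a = ([7, 3], [0.9, 0.5])
b = ([7, 8], [0.4, 0.7])
print(merge_candidates([a, b], "max"))  # [(7, 0.9), (8, 0.7), (3, 0.5)]
```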
Pipeline.recommend¶
Run the full pipeline for a single user.
from rusket.evaluation.pipeline import Pipeline
Pipeline.recommend(user_id: 'int | Any', n: 'int' = 10, exclude_seen: 'bool' = True, retrieve_k: 'int | None' = None) -> 'tuple[np.ndarray, np.ndarray]'
Parameters
| Parameter | Type | Description |
|---|---|---|
| user_id | int or any | The user to generate recommendations for. |
| n | int, default=10 | Number of final items to return. |
| exclude_seen | bool, default=True | Whether to exclude items the user has already interacted with. |
| retrieve_k | int, optional | Number of candidates per retriever. Defaults to n * 10 to produce a wide candidate pool for re-ranking. |
Returns
| Type | Description |
|---|---|
| tuple[np.ndarray, np.ndarray] | (item_ids, scores) arrays, sorted by descending score. |
Pipeline.recommend_batch¶
Batch recommendations for multiple users.
Uses the Rust-accelerated fast path when all models expose
user_factors / item_factors and share the same user indexing.
Falls back to the Python per-user loop otherwise.
from rusket.evaluation.pipeline import Pipeline
Pipeline.recommend_batch(user_ids: 'list[int | Any] | np.ndarray | None' = None, n: 'int' = 10, exclude_seen: 'bool' = True, retrieve_k: 'int | None' = None, format: 'str' = 'pandas') -> 'Any'
Parameters
| Parameter | Type | Description |
|---|---|---|
| user_ids | list or array, optional | Users to score. If None, uses all users from the first retriever. |
| n | int, default=10 | Items per user. |
| exclude_seen | bool, default=True | Whether to exclude items users have already interacted with. |
| retrieve_k | int, optional | Candidates per retriever (default: n * 10). |
| format | str, default='pandas' | Output format: 'pandas', 'polars', or 'records'. |
Returns
| Type | Description |
|---|---|
| DataFrame or list of dicts | Columns: user_id, item_ids, scores. |
Visualization (rusket.viz)¶
Graph and visualization utilities. Requires networkx (pip install networkx).
rusket.viz.to_networkx¶
Convert a Rusket association rules DataFrame into a NetworkX Directed Graph.
Nodes represent individual items. Directed edges represent rules
(antecedent → consequent). Edge weights are set by the edge_attr
parameter (typically lift or confidence).
This is extremely useful for running community detection algorithms (e.g., Louvain, Girvan-Newman) to automatically discover product clusters, or for visualising cross-selling patterns as a force-directed graph.
from rusket.viz.plots import to_networkx
rusket.viz.to_networkx(rules_df: 'pd.DataFrame', source_col: 'str' = 'antecedents', target_col: 'str' = 'consequents', edge_attr: 'str' = 'lift') -> 'networkx.DiGraph'
Parameters
| Parameter | Type | Description |
|---|---|---|
| rules_df | pd.DataFrame | A Pandas DataFrame generated by rusket.association_rules(). |
| source_col | str, default='antecedents' | Column name containing antecedents (graph edge sources). |
| target_col | str, default='consequents' | Column name containing consequents (graph edge targets). |
| edge_attr | str, default='lift' | The metric to use as edge weight/thickness. |
Returns
| Type | Description |
|---|---|
| networkx.DiGraph | A directed graph of the association rules. If rules_df is empty, returns an empty DiGraph. |
Notes
Requires the networkx package (pip install networkx). When multiple rules produce the same directed edge, only the highest-weight rule is retained.
Examples
>>> import rusket
>>> G = rusket.viz.to_networkx(rules_df, edge_attr="lift")
>>> # Community detection with networkx
>>> import networkx.algorithms.community as nx_comm
>>> communities = nx_comm.greedy_modularity_communities(G.to_undirected())
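The note above — only the highest-weight rule survives per directed edge — can be sketched without networkx. The rule rows here are hand-built dicts mimicking an association-rules frame, not real library output:

```python
def best_edges(rules, weight_key="lift"):
    """Keep the highest-weight rule for each (source, target) pair."""
    edges = {}
    for rule in rules:
        for src in rule["antecedents"]:
            for dst in rule["consequents"]:
                w = rule[weight_key]
                if (src, dst) not in edges or w > edges[(src, dst)]:
                    edges[(src, dst)] = w
    return edges

rules = [
    {"antecedents": ["bread"], "consequents": ["butter"], "lift": 1.2},
    {"antecedents": ["bread", "milk"], "consequents": ["butter"], "lift": 2.5},
]
# Both rules map bread -> butter; only the lift=2.5 edge is kept.
print(best_edges(rules)[("bread", "butter")])  # 2.5
```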
Distributed Spark API (rusket.spark)¶
All functions in rusket.spark distribute computation across PySpark partitions using Apache Arrow (zero-copy) for maximum throughput.
rusket.spark.mine_grouped¶
Distribute Market Basket Analysis across PySpark partitions.
This function groups a PySpark DataFrame by group_col and applies
rusket.mine to each group concurrently across the cluster.
It assumes the input PySpark DataFrame is formatted like a dense boolean matrix (One-Hot Encoded) per group, where rows are transactions.
from rusket.integrations.spark import mine_grouped
rusket.spark.mine_grouped(df: 'Any', group_col: 'str', min_support: 'float' = 0.5, max_len: 'int | None' = None, method: 'str' = 'fpgrowth', use_colnames: 'bool' = True) -> 'Any'
Parameters
| Parameter | Type | Description |
|---|---|---|
| df | pyspark.sql.DataFrame | The input PySpark DataFrame. |
| group_col | str | The column to group by (e.g. store_id). |
| min_support | float, default=0.5 | Minimum support threshold. |
| max_len | int, optional | Maximum itemset length. |
| method | str, default='fpgrowth' | Algorithm to use: 'fpgrowth' or 'eclat'. |
| use_colnames | bool, default=True | If True, returns item names instead of column indices. Must be True for PySpark applyInArrow schema consistency. |
Returns
| Type | Description |
|---|---|
| pyspark.sql.DataFrame | A PySpark DataFrame containing: group_col, support (float), itemsets (array of strings). |
rusket.spark.rules_grouped¶
Distribute Association Rule Mining across PySpark partitions.
This takes the frequent itemsets DataFrame (output of mine_grouped)
and applies association_rules uniformly across the groups.
from rusket.integrations.spark import rules_grouped
rusket.spark.rules_grouped(df: 'Any', group_col: 'str', num_itemsets: 'dict[Any, int] | int', metric: 'str' = 'confidence', min_threshold: 'float' = 0.8) -> 'Any'
Parameters
| Parameter | Type | Description |
|---|---|---|
| df | pyspark.sql.DataFrame | The PySpark DataFrame containing frequent itemsets. |
| group_col | str | The column to group by. |
| num_itemsets | dict or int | A dictionary mapping group IDs to their total transaction count, or a single integer if all groups have the same number of transactions. |
| metric | str, default='confidence' | The metric to filter by (e.g. "confidence", "lift"). |
| min_threshold | float, default=0.8 | The minimal threshold for the evaluation metric. |
Returns
| Type | Description |
|---|---|
| pyspark.sql.DataFrame | A DataFrame containing antecedents, consequents, and all rule metrics, prepended with the group_col. |
rusket.spark.prefixspan_grouped¶
Distribute Sequential Pattern Mining (PrefixSpan) across PySpark partitions.
This function groups a PySpark DataFrame by group_col and applies
PrefixSpan.from_transactions to each group concurrently across the cluster.
from rusket.integrations.spark import prefixspan_grouped
rusket.spark.prefixspan_grouped(df: 'Any', group_col: 'str', user_col: 'str', time_col: 'str', item_col: 'str', min_support: 'int' = 1, max_len: 'int | None' = None) -> 'Any'
Parameters
| Parameter | Type | Description |
|---|---|---|
| df | pyspark.sql.DataFrame | The input PySpark DataFrame. |
| group_col | str | The column to group by (e.g. store_id). |
| user_col | str | The column identifying the sequence within each group (e.g., user_id or session_id). |
| time_col | str | The column used for ordering events within a sequence. |
| item_col | str | The column containing the items. |
| min_support | int, default=1 | The minimum absolute support (number of sequences a pattern must appear in). |
| max_len | int, optional | Maximum length of the sequential patterns to mine. |
Returns
| Type | Description |
|---|---|
| pyspark.sql.DataFrame | A PySpark DataFrame containing: group_col, support (long/int64), sequence (array of strings). |
rusket.spark.hupm_grouped¶
Distribute High-Utility Pattern Mining (HUPM) across PySpark partitions.
This function groups a PySpark DataFrame by group_col and applies
HUPM.from_transactions to each group concurrently across the cluster.
from rusket.integrations.spark import hupm_grouped
rusket.spark.hupm_grouped(df: 'Any', group_col: 'str', transaction_col: 'str', item_col: 'str', utility_col: 'str', min_utility: 'float', max_len: 'int | None' = None) -> 'Any'
Parameters
| Parameter | Type | Description |
|---|---|---|
| df | pyspark.sql.DataFrame | The input PySpark DataFrame. |
| group_col | str | The column to group by (e.g. store_id). |
| transaction_col | str | The column identifying the transaction within each group. |
| item_col | str | The column containing the numeric item IDs. |
| utility_col | str | The column containing the numeric utility (e.g., profit) of the item in the transaction. |
| min_utility | float | The minimum total utility required to consider a pattern "high-utility". |
| max_len | int, optional | Maximum length of the itemsets to mine. |
Returns
| Type | Description |
|---|---|
| pyspark.sql.DataFrame | A PySpark DataFrame containing: group_col, utility (double/float64), itemset (array of longs/int64). |
rusket.spark.recommend_batches¶
Distribute Batch Recommendations across PySpark partitions.
This function uses mapInArrow to process partitions of users concurrently,
applying a pre-fitted Recommender (or ALS) to each chunk.
from rusket.integrations.spark import recommend_batches
rusket.spark.recommend_batches(df: 'Any', model: 'Any', user_col: 'str' = 'user_id', k: 'int' = 5) -> 'Any'
Parameters
| Parameter | Type | Description |
|---|---|---|
| df | pyspark.sql.DataFrame | The PySpark DataFrame containing user histories (must contain user_col). |
| model | Recommender or ALS | The pre-trained model instance to use for scoring. |
| user_col | str, default='user_id' | The column identifying the user. |
| k | int, default=5 | The number of top recommendations to return per user. |
Returns
| Type | Description |
|---|---|
| pyspark.sql.DataFrame | A DataFrame with two columns: user_col and recommended_items (array of longs/int64). |
rusket.spark.to_spark¶
Convert a Pandas or Polars DataFrame into a PySpark DataFrame.
from rusket.integrations.spark import to_spark
rusket.spark.to_spark(spark_session: 'Any', df: 'Any') -> 'Any'
Parameters
| Parameter | Type | Description |
|---|---|---|
| spark_session | SparkSession | The active PySpark SparkSession. |
| df | pd.DataFrame or pl.DataFrame | The DataFrame to convert. |
Returns
| Type | Description |
|---|---|
| pyspark.sql.DataFrame | The resulting PySpark DataFrame. |