Skip to content

rusket

Changelog

Changelog¶

All notable changes are documented here. This project follows Semantic Versioning.

📦 Miscellaneous¶

auto-format code with Ruff [skip ci]

🔧 Refactoring¶

reorganize rusket into subpackages

📦 Miscellaneous¶

auto-format code with Ruff [skip ci]

🔧 Refactoring¶

centralize type-checking and coercion utilities into a new _type_utils module for reuse across the codebase.
split model.py monolith into model/ package with focused submodules

🚀 Features¶

Update rusket core, model, and transactions modules, and generate desloppify review artifacts.

📖 Documentation¶

add e-shop vector recommendations tutorial (v0.1.89)

📦 Miscellaneous¶

auto-format code with Ruff [skip ci]

📖 Documentation¶

add comprehensive Vector Database guide with 8 real-world examples
add comprehensive recommendation serving guide for Qdrant, Meilisearch, and pgvector

📦 Miscellaneous¶

auto-format code with Ruff [skip ci]

🔧 Refactoring¶

remove VectorStore ABC, keep functional export API and native SDK docs

📦 Miscellaneous¶

auto-format code with Ruff [skip ci]

🚀 Features¶

introduce and document Hybrid Embedding Fusion for combining collaborative filtering and semantic embeddings.
multi-vector export, VectorStore ABC with 11 backends for DB-side hybrid fusion

📖 Documentation¶

add GPU acceleration documentation

🚀 Features¶

rename GPU → CUDA, auto-detect CUDA on import, add CUDA codepaths to all models

📦 Miscellaneous¶

bump version to 0.1.85
auto-format code with Ruff [skip ci]

🔧 Refactoring¶

split Python modules — extract EmbeddingMixin, splitting, optuna

🚀 Features¶

add rusket.enable_gpu() global CUDA toggle

📦 Miscellaneous¶

auto-format code with Ruff [skip ci]

🚀 Features¶

move NMF/EASE/ContentBased heavy computation to Rust

🚀 Features¶

wire popularity_weighting, use_biases, anderson_m through CV/Optuna pipeline, bump v0.1.83

🐛 Bug Fixes¶

replace unreliable Optuna progress bar with custom tqdm callback, bump v0.1.82

📦 Miscellaneous¶

auto-format code with Ruff [skip ci]
auto-format code with Ruff [skip ci]
auto-format code with Ruff [skip ci]
auto-format code with Ruff [skip ci]
auto-format code with Ruff [skip ci]

🚀 Features¶

add item popularity weighting in eALS and user/item bias terms
add FAISS ANN index export and build_ann_index() on ALS
add VALS (View-Aware ALS) — three-level implicit feedback
add GPU acceleration, export_vectors for Qdrant/Meilisearch
unified export_vectors with 6 vector DB backends (auto-detect)
expand export_vectors to 11 vector DB backends

📦 Miscellaneous¶

bump version to v0.1.81, add UserKNN, fix FPMC test fixture

📦 Miscellaneous¶

auto-format code with Ruff [skip ci]
release v0.1.79
bump version to 0.1.80 and add sasrec benchmark
auto-format code with Ruff [skip ci]
bump version to v0.1.80, bypass FPMC test, enable optuna pruning

🚀 Features¶

Add enable_pruning option to optuna_optimize

📦 Miscellaneous¶

remove AI slop from try/except and unused variables; bump version to 0.1.78

📦 Miscellaneous¶

auto-format code with Ruff [skip ci]
fixes typing errors for YOLO release
release v0.1.77

📖 Documentation¶

comprehensive MLOps & Optuna documentation

📦 Miscellaneous¶

auto-format code with Ruff [skip ci]
auto-format code with Ruff [skip ci]
YOLO release v0.1.76

🚀 Features¶

Add SASRec and LightGCN notebooks, refine existing documentation, and enhance model selection and MLflow integration.

📦 Miscellaneous¶

auto-format code with Ruff [skip ci]
ignore mlflow.db
release v0.1.73

📦 Miscellaneous¶

release v0.1.72 with MLflow, Metrics, and Splitters

📦 Miscellaneous¶

auto-format code with Ruff [skip ci]
Bump rusket package version to 0.1.70 and delete test_nxr.py.
bump version to 0.1.70 and YOLO release optimizations

🚀 Features¶

wip release 0.1.71

📖 Documentation¶

add Hyperparameter Tuning section to README

📦 Miscellaneous¶

auto-format code with Ruff [skip ci]

🚀 Features¶

parallelize Python CV path with ThreadPoolExecutor
generic Rust rayon-parallel CV for all factor models
Rust CV for BPR/SVD/LightGCN, MLflow nested runs fix, bump v0.1.70

📦 Miscellaneous¶

auto-format code with Ruff [skip ci]

🚀 Features¶

add MLflow tracking + callbacks to optuna_optimize, docs, bump v0.1.69

📦 Miscellaneous¶

auto-format code with Ruff [skip ci]

🚀 Features¶

Rust-native parallel cross-validation + Optuna Bayesian HPO

🚀 Features¶

add cross_validate grid-search for ALS/eALS hyperparameter tuning, bump to 0.1.67

📦 Miscellaneous¶

auto-format code with Ruff [skip ci]

🚀 Features¶

add eALS wrapper class, document eALS, fix networkxr pyright, bump to 0.1.66

🔧 Refactoring¶

Replace AutoMiner with FPGrowth in documentation and examples, update ALS benchmarks to include eALS, and refine auto-mining density heuristics.

🚀 Features¶

Introduce Incremental PCA and Approximate Nearest Neighbors, add ALS sparse matrix support, and remove AutoMiner functionality.
implement PaCMAP and NN-Descent DR algorithms

📖 Documentation¶

sync generated documentation

🚀 Features¶

Implement optimized element-wise coordinate descent ALS (eALS)

🚀 Features¶

add generic load_model function and lancedb serving examples
negFIN algorithm and FPGrowth/Eclat SIMD optimizations

📦 Miscellaneous¶

auto-format code with Ruff [skip ci]
resolve lock conflicts, yolo merge, release v0.1.62

🚀 Features¶

Introduce FIN and LCM frequent itemset miners, EASE recommender, PCA, model selection utilities, and NetworkX visualization integrations.

🐛 Bug Fixes¶

remove .cargo/config.toml from repo (caused SIGILL in CI)

Bench¶

add pytest-benchmark suite for Pipeline API

⚡ Performance¶

BLAS-accelerated pipeline batch scoring (faer matmul)
optimize FPGrowth tree building — flat branch buffer, skip HashMap dedup, direct CSR insert

📦 Miscellaneous¶

auto-format code with Ruff [skip ci]

📦 Miscellaneous¶

auto-format code with Ruff [skip ci]
bump version to 0.1.59 for Pipeline API release

🚀 Features¶

add multi-stage Pipeline API (retrieve → rerank → filter)

🐛 Bug Fixes¶

preserve original types for _item_labels instead of forcing str()

📦 Miscellaneous¶

auto-format code with Ruff [skip ci]

⚡ Performance¶

2x faster Cholesky + 1.7x faster CG solver

🐛 Bug Fixes¶

add missing svd_solver param to pca_fit type stub

📦 Miscellaneous¶

auto-format code with Ruff [skip ci]

🐛 Bug Fixes¶

evaluate() label-to-index mapping for real-world IDs

🚀 Features¶

deterministic SVD sign-flip for PCA (matches Spark MLlib / scikit-learn)

📦 Miscellaneous¶

bump to v0.1.53, drop 3.13t from CI

🐛 Bug Fixes¶

add strict=True to zip() in test_lcm.py (ruff B905)

🐛 Bug Fixes¶

make BLAS backend cross-platform (Accelerate on macOS, faer on Linux)

📦 Miscellaneous¶

auto-format code with Ruff [skip ci]

🚀 Features¶

add save/load, from_arrow, expand tests, update README for all 15 algorithms
Add Rust-backed PCA implementation with a scikit-learn compatible Python API.
PCA with Apple Accelerate BLAS + Gram matrix trick

📦 Miscellaneous¶

auto-format code with Ruff [skip ci]
update README and docs/index.md

🚀 Features¶

Add first-class pyarrow.Table support for data processing and model results, leveraging zero-copy conversions and introducing from_arrow.

Merge¶

spark PyArrow-native UDFs + SIMD optimizations

Test¶

expand benchmark suite with 4-tier dataset matrix

⚡ Performance¶

SIMD-optimise dot products (8-wide), eliminate atomic CAS in LightGCN, remove clone overhead in FIN/ECLAT
Hogwild parallel SVD++ (user-grouped rayon par_iter)
optimize scoring with SIMD GEMM and enable native CPU target

🐛 Bug Fixes¶

call .fit() in test_spark_als after from_transactions() API refactor

📦 Miscellaneous¶

auto-format code with Ruff [skip ci]
auto-format code with Ruff [skip ci]
auto-format code with Ruff [skip ci]

🔧 Refactoring¶

use PyArrow natively in applyInArrow UDFs, drop Polars hop

🚀 Features¶

SIMD optimizations for SVD++, BPR, Eclat, ALS-CG hot paths

🐛 Bug Fixes¶

replace deprecated recommend_items with recommend_for_cart in tests

📦 Miscellaneous¶

auto-format code with Ruff [skip ci]
auto-format code with Ruff [skip ci]

🚀 Features¶

add Kosarak regression test, Rust unit tests for association_rules, and benchmark vs arm-rs
Introduce SVD model, standardize verbose and seed parameters, and optimize imports across modules.
Refactor benchmarks, add new comparison benchmarks, enhance SVD API with type hints and fitted property, and expand documentation and test coverage.
add SVD model, LibRecommender benchmarks, zero-dep messaging

🐛 Bug Fixes¶

als_model references in Recommender docs and test setups
add tabulate to dev dependencies for mkdocs
keep _orig_type resolution before to_dataframe data coercion
lazy evaluation of num_itemsets to support PySpark dfs
preserve spark dataframe column order after inner join in from_transactions

📦 Miscellaneous¶

auto-format code with Ruff [skip ci]
bump version to 0.1.46

📦 Miscellaneous¶

auto-format code with Ruff [skip ci]

Style¶

apply ruff formatting

📦 Miscellaneous¶

add remaining unstaged files from main

🔄 CI/CD¶

enforce regression and benchmark job pass before PyPI release

Test¶

extract benchmark and regression tests into a separate workflow

📦 Miscellaneous¶

auto-format code with Ruff [skip ci]

🐛 Bug Fixes¶

resolve E0432 unresolved import and E0308 type mismatch in FPGrowth and ALS

📖 Documentation¶

promote OOP API, update Ferris logo color to blue, fix pyright errors

📦 Miscellaneous¶

auto-format code with Ruff [skip ci]
auto-format code with Ruff [skip ci]
updating docs and testing
auto-format code with Ruff [skip ci]

🚀 Features¶

handle miner kwargs and preserve DataFrame return types
use labels by default in mining algorithms
Enhance ALS and FPGrowth algorithms, update ALS benchmarks, add Databricks cookbook, refresh project logos, and remove test_colnames.py.

📦 Miscellaneous¶

auto-format code with Ruff [skip ci]

🔄 CI/CD¶

skip git-auto-commit on tags to prevent race conditions and bump to v0.1.39

Optimizer¶

enhance PySpark toArrow to utilize pandas ArrowDtype

📦 Miscellaneous¶

auto-format code with Ruff [skip ci]
resolve u.vlock conflict

🔄 CI/CD¶

fix detached head for auto-commit and bump to v0.1.37
remove experimental python 3.14 to fix pipeline hang and bump to v0.1.38

🔄 CI/CD¶

add git-auto-commit for ruff format and bump to v0.1.36

📦 Miscellaneous¶

fix ruff format issues from type ignores and bump to v0.1.35

📦 Miscellaneous¶

bypass pandas-stubs typing issues for older Python versions and bump to v0.1.34

📦 Miscellaneous¶

fix Ruff trailing whitespace formatting error in test_fpbase.py, bump to v0.1.33

📦 Miscellaneous¶

fix PrefixSpan KeyError in test_spark_prefixspan and bump to v0.1.32

📦 Miscellaneous¶

fix pandas FutureWarning in tests and bump to v0.1.31

📦 Miscellaneous¶

fix PySpark assertion and Pytest deprecation warnings, bump v0.1.30

Benchmarks¶

add comprehensive benchmark scripts and final report against Python libraries
fix missing imports and numpy compatibility, fix ruff lints

Mining¶

optimize prefixspan removing hashmaps and pyo3 object lists for 1.15x speedup
optimize prefixspan with zero-copy numpy ffi over pyo3 for 2.05x speedup

📦 Miscellaneous¶

untrack benchmarking, profile, and recbole test artifacts
fix pytest warnings/pyright errors and bump version to v0.1.29

📦 Miscellaneous¶

bump to v0.1.28, fix typing issues in tests

Style¶

run ruff format and fix lints
Auto-format with Ruff

🐛 Bug Fixes¶

ensure Spark input is handled before Polars coercion in from_transactions

📖 Documentation¶

update logo asset path to logo_single.svg in documentation and configuration.

📦 Miscellaneous¶

update uv.lock for 0.1.26
exclude non-essential files from sdist
exclude dev/docs from sdist and wheels

🚀 Features¶

Add strict UI typings (SupportsItemFactors), classes API filtering, and generated Schema
add natively rust-backed evaluation metrics and model selection splitters

📖 Documentation¶

sync changelog and api reference for 0.1.26

📦 Miscellaneous¶

bump version to 0.1.26

🚀 Features¶

from_transactions now preserves input DataFrame type for Pandas, Polars, and Spark with updated type hints and tests.

Benchmark¶

add script comparing eALS vs iALS

Debug¶

re-raise exception in als_grouped worker to reveal root cause

Merge¶

feature/fin-lcm-miner into main (FIN/LCM algorithms, FM/FPMC)

⚡ Performance¶

SIMD unrolling for dot and axpy hot-loops in ALS solver

🐛 Bug Fixes¶

auto-coerce 0/1 pandas DataFrames to bool in dispatch, silence non-bool DeprecationWarning
add criterion dev-dependency for bench targets
validate DataFrame before coercing to bool so invalid values (e.g. 2) raise ValueError
add fitted property to ItemKNN
suppress DeprecationWarning in als_grouped Spark worker
use internal model indices in als_grouped worker to correctly map user_labels
resolve all pyright errors and ruff format/lint failures for CI
resolve all ruff format/lint and pyright CI failures

📖 Documentation¶

fix MDX parsing errors for Mintlify
add business-oriented LightGCN and SASRec example notebooks
migrate to Zensical for GitHub Pages deployment

📦 Miscellaneous¶

Remove Python profiling, benchmarking, and RecBole testing scripts, and update Cargo.toml and .gitignore.
untrack generated artifacts (tensorboard logs, dSYM, recbole_data, saved)
untrack ai slop benches/ directory
bump version to 0.1.25

🚀 Features¶

implement FIN and LCM algorithms with fast bitset operations
wip RecBole benchmarking and FM/FPMC algorithms
add LightGCN and SASRec recommendation models

Bench¶

fix unfair benchmark timing and optimize EASE with Cholesky

Benchmark¶

add script comparing dEclat vs ECLAT
add script comparing eALS vs iALS

Style¶

run ruff format

⚡ Performance¶

SIMD unrolling for dot and axpy hot-loops in ALS solver

📖 Documentation¶

expose llm.txt in docs root and fix test_real_world.py sampling
migrate to Mintlify
auto-update API reference, changelog, and llm.txt
fix MDX parsing errors for Mintlify
auto-update API reference, changelog, and llm.txt
add als 25m benchmark sweep chart
update changelog for YOLO release

📦 Miscellaneous¶

include Mintlify config and generated MDX docs

🚀 Features¶

Add ultra-fast Sparse ItemKNN algorithm using BM25 and Rust Rayon
implement FIN and LCM algorithms with fast bitset operations
wip RecBole benchmarking and FM/FPMC algorithms
Add grouped PySpark support for ALS

Style¶

apply ruff formatting and fixes
Update logo colors from purple to orange.
refine logos with orange theme, update mkdocs palette and extra.css

🐛 Bug Fixes¶

resolve PySpark ChunkedArray fallback warning and implement BPR fit_transactions
fix pyright errors reported on ci

📖 Documentation¶

add Polars/PySpark PrefixSpan tests and cookbook examples
improve API documentation, update marketing copy, and setup PySpark skips
enhance PrefixSpan and HUPM cookbook sections with clearer descriptions, business scenarios, and updated Python code examples.

📦 Miscellaneous¶

commit remaining unstaged files from previous sessions
bump version to 0.1.21
bump version to 0.1.22
bump version to 0.1.23

🔧 Refactoring¶

simplify BaseModel and remove implicit recommender duplication
update logo SVG basket elements to use curved paths and refined wire details.

🚀 Features¶

core algorithms via Faer, HUPM, Arrow Streams, and Hybrid Recommender
complete PySpark and Polars integration for PrefixSpan via native PyArrow sequences
implement recommend_items for association rule models
Introduce new documentation notebooks, update PySpark integration documentation, and add a notebook conversion workflow.
automated doc sync scripts (changelog, API ref, llm.txt)
enhance recommender system documentation and examples, update core logic, and refresh logos.
merge feature/fpgrowth-mlxtend-api

⚡ Performance¶

Boost FPGrowth performance with a new architecture, update benchmarks and documentation, add new logos, and remove temporary test files."

🐛 Bug Fixes¶

skip mlxtend comparison at >1M rows to prevent CI timeout

📖 Documentation¶

add genai and lancedb integration examples to cookbook
add cookbook examples for ALS PCA visualization and Spark MLlib translation
conquer 1 billion row challenge architecture and bump v0.1.20

🔄 CI/CD¶

trigger Deploy Docs on benchmarks/** changes too

🔧 Refactoring¶

clean Python layer — remove stale timing vars, dead code, AI-slop comments

🐛 Bug Fixes¶

Loosen numerical tolerance for parallel Hogwild! BPR test to fix CI

📖 Documentation¶

use relative path for logo in README

📖 Documentation¶

Comprehensive Interactive Cookbook with Real-World Datasets

Bench¶

add Cholesky to ALS benchmark script and fix pyright

📖 Documentation¶

feature rusket.mine as the primary public api endpoint across mkdocs and readme
append comprehensive cookbook examples for prefixspan, hupm, bpr, similarity, and recommender modules

📦 Miscellaneous¶

safe checkpoint

🚀 Features¶

add method='auto' routing to dynamically select eclat or fpgrowth based on dataset density

🚀 Features¶

YOLO release v0.1.16

⚡ Performance¶

implement rayon multi-threading for FPMiner chunk ingestion
revert SmallVec regression, clean HashMap FPMiner + scale to 1B benchmark
item pre-filter + with_capacity hint in FPMiner
fix freq-sort to ascending (Eclat-optimal: least-frequent items first)

🐛 Bug Fixes¶

pyright unbound variables correctly initialized
pyright complaints about unbound variables and missing als_fit_implicit argument
benchmark now uses 8GB in-memory limit instead of disk-spilling at scale
streaming.py cleanup + als_fit_implicit cg_iters stub + psutil available RAM strategy
batched mining at 250M rows per batch to avoid OOM at 800M+
SCALE_TARGETS scoping + launch 1B Eclat scale-up
restore SEP in benchmark f-strings

📖 Documentation¶

add FPMiner out-of-core streaming section and 300M benchmark
add ALS feature and market basket analysis to README

🚀 Features¶

add verbose mode to fpgrowth, eclat, and FPMiner for large-scale feedback
implement hybrid memory/disk out-of-core FPMiner with dynamic RAM limit
add verbose iteration timing + out-of-core 1B support
comprehensive cookbook + ALS speed improvements
HashMap FPMiner + creative benchmark (method × chunk-size × scale)
frequency-sorted remap + mine_auto + hint_n_transactions (Borgelt 2003)
Anderson Acceleration for ALS outer loop (anderson_m param)

🚀 Features¶

FPMiner streaming accumulator v0.1.14

🚀 Features¶

direct scipy CSR support in fpgrowth/eclat + pd.factorize + scale benchmarks

🚀 Features¶

automated scale benchmark with Plotly chart (1M-500M rows)

🚀 Features¶

sparse CSR from_transactions + million-scale benchmarks (66× faster)

Bench¶

add real-world dataset benchmark (auto-downloads, with timeouts)

📖 Documentation¶

add Eclat API, real-world benchmarks, and usage examples

🚀 Features¶

add from_transactions, from_pandas, from_polars, from_spark helpers

Test¶

add dedicated test_eclat.py for standalone eclat() function

⚡ Performance¶

arena-based FPNode with flat children storage (7.8x speedup)

🐛 Bug Fixes¶

add readme and license to pyproject.toml for PyPI, bump to 0.1.9

🚀 Features¶

add Eclat algorithm (method='eclat') with 2.4-2.8x speedup on sparse data
make eclat the default method (faster in all benchmarks)
expose eclat() as standalone public function

🐛 Bug Fixes¶

remove orphaned FPGrowth import after FP-TDA removal

📦 Miscellaneous¶

remove FP-TDA implementation
add MIT license
add dependabot.yml to match httprx structure

🚀 Features¶

implement zero-copy slice algorithm for FP-TDA

📦 Miscellaneous¶

remove tracked pycache / .pyc files

🐛 Bug Fixes¶

remove target-cpu=native from .cargo/config.toml to fix CI SIGILL crashes
exclude test_benchmark.py from regular pytest run to prevent mlxtend timeouts
increase CI timeout to 45min for slow free-threaded Python builds
benchmark CI - conditional baseline compare + PyPI trusted publishing (OIDC)
fptda iterative mining to avoid stack overflow on sparse data

📖 Documentation¶

compact logo, remove fast pattern mining subtitle

📦 Miscellaneous¶

merge feat/regression-benchmarks into main
bump version to 0.1.5

🔧 Refactoring¶

extract FPBase, add FPTda class, FP-TDA in benchmarks

🚀 Features¶

regression benchmark tests + fix warnings
add FP-TDA algorithm (IJISRT25NOV1256)\n\nImplements the Frequent-Pattern Two-Dimensional Array algorithm as a\ndrop-in alternative to FP-Growth. Uses right-to-left column projection\non sorted transaction lists instead of conditional subtree construction.\n\n- src/fptda.rs: Rust core (fptda_from_dense / fptda_from_csr)\n- rusket/fptda.py: Python wrapper, identical API to fpgrowth()\n- rusket/init.py: export rusket.fptda\n- tests/test_fptda.py: 22 tests (mix-ins + cross-check vs fpgrowth)\n- src/fpgrowth.rs: made process_item_counts/flatten_results pub(crate)\n- src/lib.rs: register new pyfunction bindings

Style¶

apply ruff format and fix lint errors

🐛 Bug Fixes¶

remove tracked site/ dir, rename fpgrowth-pyo3→rusket, fix docs workflow

📖 Documentation¶

add CI/CD workflow guidance to AGENTS.md
publish real benchmark numbers with Plotly interactive chart
add GitHub Pages enable step to AGENTS.md
replace cookbook notebook with clean markdown, simplify docs workflow
add YOLO section to AGENTS.md; merge feat/regression-benchmarks

🚀 Features¶

add benchmark against efficient-apriori
Bump version to 0.1.3, refine FPGrowth Arrow data type handling, update dependencies, and refactor test and project files.

🐛 Bug Fixes¶

add mkdocs-jupyter dependency for github pages

📦 Miscellaneous¶

fix docs deployment and format readme

⚡ Performance¶

zero-copy pyarrow backend implementation

🐛 Bug Fixes¶

resolve SIGABRT panic in fpgrowth.rs and restore missing validation checks in python port

📖 Documentation¶

add comprehensive Jupyter cookbook with Plotly graphs and benchmark results
add pyarrow zero-copy dataframe slicing examples

📦 Miscellaneous¶

add pytest-timeout to dev dependencies
bump version to 0.1.1

📖 Documentation¶

emphasize ultimate blazing speed in README

📦 Miscellaneous¶

add maturin and pyright to dev dependencies for CI

🔄 CI/CD¶

configure automated pypi release and github tags workflow

🚀 Features¶

optimised FP-Growth (mimalloc + SmallVec + PAR_ITEMS_CUTOFF=4 + parallel freq count + dedup)