Index

Blazing-fast fuzzy string matching — implemented entirely in Rust.
Built entirely by AI. Designed to beat RapidFuzz.

The Story

rustfuzz started as an experiment: can an AI agent, starting from scratch, build a fuzzy-matching library that outperforms RapidFuzz — one of the best-optimised C++ string-matching libraries in the Python ecosystem?

No human wrote the Rust. No human tuned the algorithm parameters. The AI drove every iteration, read every benchmark result, and decided what to rewrite next.

The answer the AI kept coming back to: Rust + PyO3 + tight Python-boundary design.

The Development Loop

Every feature and optimisation went through the same cycle:

flowchart LR
    R["🔍 Research<br>Profiler output<br>& algorithm gaps"]
    B["🦀 Build<br>Rust core<br>via PyO3"]
    T["✅ Test<br>All tests must pass<br>before proceeding"]
    BM["📊 Benchmark<br>vs RapidFuzz<br>& record results"]
    RP["🔁 Repeat<br>Find the next<br>bottleneck"]

    R --> B --> T --> BM --> RP --> R

    style R fill:#6366f1,color:#fff,stroke:none
    style B fill:#a855f7,color:#fff,stroke:none
    style T fill:#ef4444,color:#fff,stroke:none
    style BM fill:#22c55e,color:#fff,stroke:none
    style RP fill:#f59e0b,color:#fff,stroke:none

Each stage of an iteration had a specific job:

Research — where is the remaining Python overhead? What does the profiler show?
Build — move that hot path into Rust. Eliminate copies, reduce allocations, avoid iterator protocol overhead.
Test — the full test suite must pass before proceeding. No broken correctness, no skipped edge cases.
Benchmark — run head-to-head comparisons vs RapidFuzz. Numbers don't lie.
Repeat — the next bottleneck is always waiting.
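The Test and Benchmark stages above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual harness: `baseline_ratio`, `candidate_ratio`, and `check_and_benchmark` are hypothetical names, and `difflib.SequenceMatcher` stands in for the two libraries being compared.

```python
import timeit
from difflib import SequenceMatcher


def baseline_ratio(a: str, b: str) -> float:
    """Stand-in for the reference implementation (hypothetical)."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def candidate_ratio(a: str, b: str) -> float:
    """Stand-in for the optimised implementation: adds an equality fast path."""
    if a == b:
        return 100.0
    return baseline_ratio(a, b)


def check_and_benchmark(baseline, candidate, cases, repeat=3, number=50):
    # Test: the candidate must agree with the baseline on every case
    # before any timing is taken.
    for a, b in cases:
        assert abs(baseline(a, b) - candidate(a, b)) < 1e-9, (a, b)

    # Benchmark: take the best of several repeats to reduce timing noise.
    def best_time(fn):
        return min(
            timeit.repeat(
                lambda: [fn(a, b) for a, b in cases],
                repeat=repeat,
                number=number,
            )
        )

    t_base = best_time(baseline)
    t_cand = best_time(candidate)
    return {"baseline_s": t_base, "candidate_s": t_cand, "speedup": t_base / t_cand}


cases = [
    ("hello world", "hello wrold"),
    ("fuzzy wuzzy", "wuzzy fuzzy"),
    ("new york", "new york"),
]
report = check_and_benchmark(baseline_ratio, candidate_ratio, cases)
print(report)
```

Running the correctness check first mirrors the rule that tests must pass before benchmarking, so a faster but wrong candidate never produces a number.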

Why This Matters

RapidFuzz is exceptional: its C++ core, SIMD intrinsics, and years of optimisation make it a formidable target. The goal of this project was never to dismiss it, but to prove that:

AI can drive non-trivial systems programming — not just generate boilerplate.
Rust + PyO3 can match C++ at the Python boundary — with the added safety guarantees Rust provides.
Iterative AI-driven optimisation works — each benchmark loop produced measurable gains.

Features

Feature	Description
⚡ Blazing Fast	Core algorithms in Rust: minimal Python-boundary overhead, no GIL bottlenecks
🧠 Smart Matching	ratio, partial_ratio, token sort/set, Levenshtein, Jaro-Winkler, and more
🔒 Memory Safe	Rust's borrow checker — no segfaults, no buffer overflows
🐍 Pythonic API	Typed Python interface — `import rustfuzz.fuzz as fuzz` and go
📦 No Build Step	Pre-compiled wheels for Python 3.10–3.14 on Linux, macOS, and Windows
🏔️ Big Data Ready	Excels in 1 Billion Row Challenge benchmarks, crushing high-throughput tasks
🔍 3-Way Hybrid Search	BM25 + Fuzzy + Dense embeddings via RRF — 25ms at 1M docs, all in Rust
📄 Document Objects	First-class `Document(content, metadata)` + LangChain compatibility
🧩 Ecosystem Integrations	BM25, Hybrid Search, and LangChain Retrievers for Vector DBs
🎯 Retriever	Batteries-included SOTA search — auto-selects BM25, embeddings (OpenAI/Cohere/HF), and reranker

Installation

pip install rustfuzz
# or with uv:
uv pip install rustfuzz

Quick Example

import rustfuzz.fuzz as fuzz
from rustfuzz.distance import Levenshtein, JaroWinkler
from rustfuzz import process

# Similarity ratios
fuzz.ratio("hello world", "hello wrold")            # ~96.0
fuzz.partial_ratio("hello", "say hello world")      # 100.0
fuzz.token_sort_ratio("fuzzy wuzzy", "wuzzy fuzzy") # 100.0

# Edit distance
Levenshtein.distance("kitten", "sitting")           # 3
JaroWinkler.similarity("martha", "marhta")          # ~0.96

# Batch matching
process.extractOne("new york", ["New York", "Newark", "Los Angeles"])
# ('New York', 100.0, 0)
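For readers who want to see what two of these calls compute, here is a pure-Python reference sketch. It is not rustfuzz's implementation (which lives in Rust). `levenshtein` is the textbook dynamic-programming edit distance, and `token_sort_key` is a hypothetical helper showing the normalisation idea behind token-sort matching.

```python
def levenshtein(a: str, b: str) -> int:
    """Textbook edit distance: insertions, deletions, substitutions cost 1."""
    # Keep the shorter string in the inner loop to bound row size.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + 1,               # delete ca
                    current[j - 1] + 1,            # insert cb
                    previous[j - 1] + (ca != cb),  # substitute, or match for free
                )
            )
        previous = current
    return previous[-1]


def token_sort_key(s: str) -> str:
    """Split on whitespace, sort the tokens, and rejoin them."""
    return " ".join(sorted(s.split()))


print(levenshtein("kitten", "sitting"))  # 3
print(token_sort_key("fuzzy wuzzy") == token_sort_key("wuzzy fuzzy"))  # True
```

Because both inputs normalise to the same string once their tokens are sorted, a token-sort comparison of "fuzzy wuzzy" and "wuzzy fuzzy" is a comparison of identical strings, which is why it scores 100.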

3-Way Hybrid Search

from rustfuzz.search import Document, HybridSearch

docs = [
    Document("Apple iPhone 15 Pro Max", {"brand": "Apple", "price": 1199}),
    Document("Samsung Galaxy S24 Ultra", {"brand": "Samsung", "price": 1299}),
    Document("Google Pixel 8 Pro", {"brand": "Google", "price": 699}),
]

hs = HybridSearch(docs, embeddings=[[1, 0, 0], [0.9, 0.1, 0], [0.1, 0.9, 0]])

# Typo-tolerant + semantic search — all in Rust
results = hs.search("appel iphon", query_embedding=[1, 0, 0], n=1)
text, score, meta = results[0]
print(f"{text} — ${meta['price']}")
# Apple iPhone 15 Pro Max — $1199
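The "RRF" in the feature list refers to Reciprocal Rank Fusion, which merges several ranked lists by summing 1 / (k + rank) for each document. The sketch below shows the general technique in plain Python. The function name `rrf_fuse`, the document ids, and the constant k = 60 (the value commonly used in the literature) are assumptions for illustration; the constant and tie-breaking that rustfuzz uses internally are not stated here.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of document ids with Reciprocal Rank Fusion.

    Each ranking lists ids from best to worst. A document earns
    1 / (k + rank) from every ranking it appears in (rank starts at 1).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical rankings from three retrievers over the same corpus.
bm25_ranking = ["iphone", "pixel", "galaxy"]
fuzzy_ranking = ["iphone", "galaxy", "pixel"]
dense_ranking = ["galaxy", "iphone", "pixel"]

fused = rrf_fuse([bm25_ranking, fuzzy_ranking, dense_ranking])
print([doc_id for doc_id, _ in fused])  # ['iphone', 'galaxy', 'pixel']
```

RRF only uses rank positions, so the three retrievers do not need comparable score scales, which is what makes it a convenient way to combine lexical, fuzzy, and embedding-based results.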

Custom BM25 variants via fluent builder

You can seamlessly construct a HybridSearch model using any of the advanced BM25 variants (BM25L, BM25Plus, BM25T) via the .to_hybrid() builder method:

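from rustfuzz.search import BM25L

# Reuses `docs` from the hybrid search example above.
# Toy 3-dimensional vectors, one per document, in the same order as `docs`.
embeddings = [[1, 0, 0], [0.9, 0.1, 0], [0.1, 0.9, 0]]

results = (
    BM25L(docs, delta=0.5, b=0.8)
    .to_hybrid(embeddings=embeddings)
    .filter('brand = "Apple"')
    .match("iphone pro", n=10)
)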
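To give a feel for what the `delta` and `b` parameters control, here is a sketch of the per-term BM25L score as it is usually given in the information-retrieval literature. This is an assumption about the general formula, not a description of rustfuzz's internals: `bm25l_term_score` is a hypothetical name, and the default `k1=1.5` is chosen only for illustration.

```python
import math


def bm25l_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq,
                     k1=1.5, b=0.8, delta=0.5):
    """Score contribution of one query term for one document under BM25L.

    tf          -- occurrences of the term in the document
    doc_len     -- document length in tokens
    avg_doc_len -- average document length in the corpus
    n_docs      -- number of documents in the corpus
    doc_freq    -- number of documents containing the term
    """
    if tf == 0:
        return 0.0  # only terms present in the document contribute
    idf = math.log((n_docs + 1) / (doc_freq + 0.5))
    # Length-normalised term frequency; b controls how strongly
    # long documents are penalised.
    c = tf / (1.0 - b + b * doc_len / avg_doc_len)
    # delta shifts the normalised frequency upward so that very long
    # documents are not over-penalised.
    return idf * ((k1 + 1.0) * (c + delta)) / (k1 + c + delta)


short_doc = bm25l_term_score(tf=1, doc_len=10, avg_doc_len=10, n_docs=3, doc_freq=1)
long_doc = bm25l_term_score(tf=1, doc_len=40, avg_doc_len=10, n_docs=3, doc_freq=1)
print(short_doc > long_doc)  # True: same tf, longer document scores lower
```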

Cookbook Recipes 🧑‍🍳

Recipe	Description
Introduction	Get started — basic matching and terminology
Advanced Matching	Partial ratios, token sorts, score cutoffs
Benchmarks	Head-to-head speed comparisons vs RapidFuzz
Vector DB Hybrid Search	BM25 + dense embeddings with Qdrant, LanceDB, FAISS & more
LangChain Integration	Use rustfuzz as a LangChain Retriever
Real-World Examples	Entity resolution, deduplication & production patterns
Fuzzy Full Join	Multi-array fuzzy joins with MultiJoiner & RRF fusion
3-Way Hybrid Search	BM25 + Fuzzy + Dense via RRF — Document & LangChain support
EmbedAnything	Rust-native embeddings — dense + sparse, no PyTorch needed
Retriever	Batteries-included SOTA search — auto-selects BM25, embeddings & reranker

Start exploring from the navigation menu on the left!