Filtering & Sorting Search Results
rustfuzz provides a Meilisearch-compatible filtering and sorting API that lets you narrow down search results by metadata attributes — with filtering executed at the Rust level for maximum performance.
The API uses a fluent builder pattern, so you can chain operations naturally:
results = (
bm25
.filter('brand = "Apple" AND price > 500')
.sort("price:asc")
.get_top_n("iphone pro max", n=10)
)
Setup
Filtering and sorting require metadata to be attached to your documents. The easiest way is using rustfuzz.Document:
from rustfuzz import Document
from rustfuzz.search import BM25
docs = [
Document("Apple iPhone 15 Pro Max 256GB", {"brand": "Apple", "category": "phone", "price": 1199, "in_stock": True, "year": 2024}),
Document("Samsung Galaxy S24 Ultra", {"brand": "Samsung", "category": "phone", "price": 1299, "in_stock": True, "year": 2024}),
Document("Google Pixel 8 Pro", {"brand": "Google", "category": "phone", "price": 699, "in_stock": False, "year": 2023}),
Document("Apple MacBook Pro M3 Max", {"brand": "Apple", "category": "laptop", "price": 2499, "in_stock": True, "year": 2024}),
Document("Samsung Galaxy Tab S9", {"brand": "Samsung", "category": "tablet", "price": 799, "in_stock": True, "year": 2023}),
]
bm25 = BM25(docs)
Works with all search engines: BM25, BM25L, BM25Plus, BM25T, and HybridSearch.
Filtering
Basic Filter
# Only Apple products
results = bm25.filter('brand = "Apple"').get_top_n("pro", n=5)
for text, score, meta in results:
print(f" {text} — ${meta['price']}")
Filter Syntax Reference
The filter language mirrors the Meilisearch filter syntax:
| Operator | Example | Description |
|---|---|---|
= |
brand = "Apple" |
Equality |
!= |
brand != "Apple" |
Inequality |
> |
price > 1000 |
Greater than |
>= |
price >= 1000 |
Greater than or equal |
< |
price < 500 |
Less than |
<= |
price <= 500 |
Less than or equal |
TO |
price 100 TO 500 |
Inclusive range |
IN |
brand IN ["Apple", "Google"] |
Value in list |
EXISTS |
brand EXISTS |
Attribute exists |
IS NULL |
discount IS NULL |
Value is null |
IS EMPTY |
tags IS EMPTY |
Empty string or list |
AND |
brand = "Apple" AND price > 500 |
Logical AND |
OR |
brand = "Apple" OR brand = "Google" |
Logical OR |
NOT |
NOT brand = "Apple" |
Logical negation |
() |
(brand = "Apple" OR brand = "Google") AND price > 500 |
Grouping |
Nested Attributes (Dot Notation)
Access nested metadata with dot notation:
metadata = [
{"specs": {"weight_g": 200, "battery_mah": 4000}},
{"specs": {"weight_g": 350, "battery_mah": 5000}},
]
results = bm25.filter("specs.weight_g < 300").get_top_n("phone", n=5)
Combining Filters
Chain multiple .filter() calls — they're combined with AND:
results = (
bm25
.filter('brand = "Samsung"')
.filter("price < 1000")
.filter("in_stock = true")
.get_top_n("galaxy", n=10)
)
Or use a single expression:
results = bm25.filter(
'(brand = "Apple" OR brand = "Samsung") AND price 500 TO 2000 AND in_stock = true'
).get_top_n("pro", n=10)
Sorting
Basic Sort
# Sort by price ascending
results = bm25.sort("price:asc").get_top_n("phone", n=10)
# Sort by price descending
results = bm25.sort("price:desc").get_top_n("phone", n=10)
Multi-Key Sort
# Sort by brand ascending, then by price descending within each brand
results = bm25.sort(["brand:asc", "price:desc"]).get_top_n("phone", n=10)
You can also use a comma-separated string:
Missing Values
Documents with missing sort attributes are always placed at the end of results, regardless of sort direction.
Fluent Builder API
The .filter() and .sort() methods return a SearchQuery builder that accumulates operations lazily. You can terminate the chain with any search method:
Terminal Methods
| Method | Description |
|---|---|
.match(query, n) |
Search with filter/sort (BM25 + HybridSearch) |
.get_top_n(query, n) |
BM25 / Hybrid top-N search |
.get_top_n_rrf(query, n) |
BM25 + fuzzy RRF search |
.get_top_n_fuzzy(query, n) |
BM25 + fuzzy hybrid |
.collect() |
Execute a deferred .search() call |
Using .match() (recommended)
.match() is the simplest terminal method — it executes immediately:
# Filter → sort → match (executes immediately)
results = (
bm25
.filter('brand = "Apple" AND price > 500')
.sort("price:asc")
.match("pro max", n=10)
)
Lazy Execution with .search() + .collect()
For advanced use cases, .search() is lazy and requires .collect():
# Build the query lazily, execute with .collect()
results = (
bm25
.filter('category IN ["phone", "tablet"]')
.filter("in_stock = true")
.sort(["price:asc", "rating:desc"])
.search("pro max", n=20)
.collect()
)
HybridSearch Integration
Filter and sort work seamlessly with HybridSearch (BM25 + Fuzzy + Dense embeddings):
from rustfuzz import Document
from rustfuzz.search import HybridSearch
docs = [
Document("Apple iPhone 15 Pro Max", {"brand": "Apple", "price": 1199}),
Document("Samsung Galaxy S24 Ultra", {"brand": "Samsung", "price": 1299}),
Document("Google Pixel 8 Pro", {"brand": "Google", "price": 699}),
]
# Provide dense embeddings for your corpus (e.g. from EmbedAnything or OpenAI)
embeddings = [
[0.9, 0.1, 0.0], # Apple
[0.1, 0.8, 0.1], # Samsung
[0.0, 0.2, 0.8], # Google
]
hs = HybridSearch(docs, embeddings=embeddings)
# Provide the embedding for your search query
query_emb = [0.85, 0.15, 0.0] # Embedding for "iphone pro"
# Filter + sort + semantic search (executes immediately)
results = (
hs
.filter('brand = "Apple"')
.sort("price:asc")
.match("iphone pro", n=5, query_embedding=query_emb)
)
Performance
Filtering is implemented at the Rust level:
- Python evaluates the filter expression against metadata to create a boolean mask
- The mask is passed to Rust's
get_top_n_filtered()which zeroes out scores for excluded documents - Rust's optimised heap selection skips masked documents entirely
This means filtering adds near-zero overhead — the dominant cost is still the BM25 scoring, which runs in Rust with Rayon parallelism.
DataFrame Integration
Build a filtered index directly from a DataFrame:
import polars as pl
df = pl.DataFrame({
"product": ["iPhone 15", "Galaxy S24", "Pixel 8"],
"brand": ["Apple", "Samsung", "Google"],
"price": [1199, 1299, 699],
})
bm25 = BM25.from_column(df, "product", metadata_columns=["brand", "price"])
# Filter by metadata extracted from the DataFrame
results = bm25.filter("price > 1000").get_top_n("phone", n=5)