E-Shop Vector Recommendations¶
Build a complete e-shop recommendation engine backed by a vector database. This tutorial shows four high-value patterns that every online store needs:
| Pattern | Signal | Use Case |
|---|---|---|
| Personalised "For You" | User → Items | Homepage carousel, email campaigns |
| Similar Items | Item → Items | "You might also like" on PDP |
| Also Bought | Cart → Items | Cross-sell widget, checkout upsells |
| Customer Lookalikes | User → Users → Items | Cold-start for new users, CRM segments |
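All patterns follow the same workflow: train a latent-factor model on the purchase log, export the user and item factor matrices to a vector database, then serve recommendations with native SDK queries.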
Synthetic E-Shop Data¶
We'll use a small synthetic dataset throughout. Replace it with your real purchase history and product catalogue.
import numpy as np
import pandas as pd
import rusket

# ── Purchase history ──────────────────────────────────────────────
np.random.seed(42)
n_users, n_items = 500, 200
rows = []
for user_id in range(n_users):
    # Each user buys 5 to 20 distinct items
    bought = np.random.choice(n_items, size=np.random.randint(5, 21), replace=False)
    for item_id in bought:
        rows.append({"user_id": user_id, "item_id": int(item_id)})
purchases = pd.DataFrame(rows)

# ── Product catalogue ─────────────────────────────────────────────
categories = ["Electronics", "Clothing", "Home & Kitchen", "Sports", "Books"]
catalog = pd.DataFrame({
    "item_id": range(n_items),
    "name": [f"Product {i}" for i in range(n_items)],
    "category": np.random.choice(categories, n_items),
    "price": np.round(np.random.uniform(5, 500, n_items), 2),
    "in_stock": np.random.choice([True, True, True, False], n_items),  # 75 % in stock
})
print(f"Purchases: {len(purchases):,} | Users: {n_users} | Items: {n_items}")
print(purchases.head())
Step 1: Train the Recommender¶
ALS produces both user and item factor matrices. Both are needed for the patterns below.
als = (
    rusket.ALS(factors=64, iterations=15, regularization=0.1)
    .from_transactions(purchases, user_col="user_id", item_col="item_id")
    .fit()
)
print(f"User factors : {als.user_factors.shape}")  # (500, 64)
print(f"Item factors : {als.item_factors.shape}")  # (200, 64)
Any latent-factor model works
BPR, SVD, NMF, or LightGCN all expose .user_factors / .item_factors.
Pick whichever gives you the best offline metrics.
Step 2: Export to a Vector Database¶
We'll use Qdrant for the examples, but every pattern works with any supported backend.
Export item embeddings¶
from qdrant_client import QdrantClient

client = QdrantClient("localhost", port=6333)

# Item embeddings: used for "Similar Items" and "Also Bought"
rusket.export_vectors(
    als.item_factors,
    client=client,
    collection_name="shop_items",
    ids=catalog["item_id"].tolist(),
    payloads=catalog.drop(columns="item_id").to_dict("records"),
)
Export user embeddings¶
# User embeddings: used for "For You" and "Customer Lookalikes"
rusket.export_vectors(
    als.user_factors,
    client=client,
    collection_name="shop_users",
)
Pattern 1: Personalised "For You"¶
"Show this user items they're most likely to buy."
Query the item collection with the user's embedding vector. Items closest to the user in latent space are the strongest recommendations.
from qdrant_client.models import FieldCondition, Filter, MatchValue

def recommend_for_you(
    user_id: int, n: int = 10, category: str | None = None
) -> list[dict]:
    """Homepage 'For You' carousel."""
    user_vector = als.user_factors[user_id].tolist()
    # Always hide out-of-stock items; optionally restrict to one category
    filters = [FieldCondition(key="in_stock", match=MatchValue(value=True))]
    if category:
        filters.append(
            FieldCondition(key="category", match=MatchValue(value=category))
        )
    results = client.query_points(
        collection_name="shop_items",
        query=user_vector,
        query_filter=Filter(must=filters),
        limit=n,
        with_payload=True,
    )
    return [
        {"item_id": r.id, "score": round(r.score, 4), **r.payload}
        for r in results.points
    ]

# Try it
recs = recommend_for_you(user_id=0, n=5)
for r in recs:
    print(f" {r['item_id']:>3} {r['name']:<15} {r['category']:<16} score={r['score']}")
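To make the ranking concrete, the sketch below reproduces by brute force what the vector query in `recommend_for_you` computes. It assumes the collection ranks by dot product, which is how a matrix-factorisation model such as ALS scores a user against an item; the export call above does not show which distance metric the collection uses, so treat that as an assumption. The `brute_force_for_you` helper and the 2-factor toy vectors are hypothetical, and the code is plain Python so it runs without NumPy or Qdrant.

```python
import heapq

def brute_force_for_you(user_vec, item_vecs, in_stock, n=2):
    """Rank in-stock items by dot product with the user vector.

    user_vec  : list[float], one user's latent factors
    item_vecs : dict[int, list[float]], item_id -> latent factors
    in_stock  : dict[int, bool], item_id -> availability flag
    """
    scored = (
        (sum(u * v for u, v in zip(user_vec, vec)), item_id)
        for item_id, vec in item_vecs.items()
        if in_stock[item_id]  # mirrors the in_stock payload filter
    )
    # nlargest returns the n highest (score, item_id) pairs, best first
    return [(item_id, score) for score, item_id in heapq.nlargest(n, scored)]

# Hypothetical 2-factor embeddings (values chosen to be exact in binary)
user = [1.0, 0.5]
items = {0: [0.75, 0.25], 1: [0.25, 0.75], 2: [1.0, 1.0], 3: [-0.5, 0.25]}
stock = {0: True, 1: True, 2: False, 3: True}

top = brute_force_for_you(user, items, stock, n=2)
print(top)  # [(0, 0.875), (1, 0.625)]
```

Item 2 has the highest raw score (1.5) but is filtered out because it is out of stock, which is the behaviour the `in_stock` filter gives you inside the database. A vector database returns the same kind of ranking without scanning every item, typically through an approximate nearest-neighbour index.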
Pattern 2: Similar Items¶
"Customers who viewed this item also looked at…"
Query the item collection with another item's embedding. Nearest neighbours in item space share buying patterns.
def similar_items(item_id: int, n: int = 5) -> list[dict]:
    """'You might also like' on the product detail page."""
    item_vector = als.item_factors[item_id].tolist()
    results = client.query_points(
        collection_name="shop_items",
        query=item_vector,
        limit=n + 1,  # +1 to exclude the query item itself
        with_payload=True,
    )
    return [
        {"item_id": r.id, "score": round(r.score, 4), **r.payload}
        for r in results.points
        if r.id != item_id
    ][:n]

similars = similar_items(item_id=42, n=5)
for s in similars:
    print(f" {s['item_id']:>3} {s['name']:<15} {s['category']:<16} sim={s['score']}")
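The `limit=n + 1` over-fetch exists because the query item is itself stored in the collection, so under cosine similarity it is its own nearest neighbour and occupies one result slot. A minimal plain-Python sketch of that behaviour, assuming cosine similarity and hypothetical 2-factor vectors (`cosine` and `brute_force_similar` are illustrative helpers, not part of rusket or Qdrant):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def brute_force_similar(item_id, item_vecs, n=2):
    """Return the n nearest items to item_id, excluding item_id itself."""
    query = item_vecs[item_id]
    ranked = sorted(
        item_vecs,
        key=lambda other: cosine(query, item_vecs[other]),
        reverse=True,
    )
    # Over-fetch by one, then drop the query item, as similar_items() does
    return [i for i in ranked[: n + 1] if i != item_id][:n]

item_vecs = {0: [3.0, 4.0], 1: [3.0, 5.0], 2: [4.0, 3.0], 3: [-3.0, -4.0]}
print(cosine(item_vecs[0], item_vecs[0]))  # 1.0, the item matches itself
neighbours = brute_force_similar(0, item_vecs, n=2)
print(neighbours)  # [1, 2]
```

Without the extra slot, asking for two neighbours would return item 0 itself plus only one real neighbour.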
Pattern 3: "Customers Who Bought X Also Bought Y"¶
Cross-sell widget on the cart or checkout page.
Idea: Average the embeddings of items currently in the cart to create a session vector, then search for the closest items that are not already in the cart.
import numpy as np
from qdrant_client.models import FieldCondition, Filter, MatchValue

def also_bought(cart_item_ids: list[int], n: int = 5) -> list[dict]:
    """Cross-sell: items frequently co-purchased with the current cart."""
    if not cart_item_ids:
        return []  # an empty cart has no centroid to query with
    # Build a centroid from all cart item embeddings
    cart_vectors = np.array([als.item_factors[i] for i in cart_item_ids])
    centroid = cart_vectors.mean(axis=0).tolist()
    results = client.query_points(
        collection_name="shop_items",
        query=centroid,
        query_filter=Filter(
            must=[FieldCondition(key="in_stock", match=MatchValue(value=True))]
        ),
        limit=n + len(cart_item_ids),  # over-fetch: cart items are dropped below
        with_payload=True,
    )
    # Remove items already in the cart
    in_cart = set(cart_item_ids)
    return [
        {"item_id": r.id, "score": round(r.score, 4), **r.payload}
        for r in results.points
        if r.id not in in_cart
    ][:n]

# User has items 10 and 55 in cart
cross_sells = also_bought(cart_item_ids=[10, 55], n=5)
for c in cross_sells:
    print(f" {c['item_id']:>3} {c['name']:<15} {c['category']:<16} rel={c['score']}")
Why centroid averaging works
ALS learns a latent space where items bought together are close. Averaging the cart vectors moves the query point toward the cluster of items that overlap with all cart members, which is a natural cross-sell signal.
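A small worked example shows the effect. The 2-factor vectors below are hypothetical and hand-picked so the arithmetic is easy to follow; real ALS factors will not be this clean. The candidate that sits between both cart items outscores candidates aligned with only one of them.

```python
def centroid(vectors):
    """Element-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical embeddings; items 10 and 55 are in the cart
item_vecs = {
    10: [1.0, 0.0],
    55: [0.0, 1.0],
    7: [0.5, 0.5],    # aligned with both cart items
    8: [1.0, -1.0],   # aligned with item 10, opposed to item 55
    9: [-1.0, 1.0],   # aligned with item 55, opposed to item 10
}
cart = [10, 55]
query = centroid([item_vecs[i] for i in cart])

# Score every non-cart item against the cart centroid
scores = {i: dot(query, v) for i, v in item_vecs.items() if i not in cart}
best = max(scores, key=scores.get)
print(query)   # [0.5, 0.5]
print(scores)  # {7: 0.5, 8: 0.0, 9: 0.0}
print(best)    # 7
```

Items 8 and 9 each match one cart item strongly, but the opposing component cancels it out once the cart is averaged, so only item 7 survives as a cross-sell candidate.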
Pattern 4: Customer Lookalikes¶
"Find users with similar taste, then recommend what they bought."
This is invaluable for cold-start users (few purchases) and CRM segmentation ("find 1,000 users who look like our VIPs").
4a: Find similar customers¶
def find_similar_customers(user_id: int, n: int = 5) -> list[dict]:
    """Find users with the most similar purchase behaviour."""
    user_vector = als.user_factors[user_id].tolist()
    results = client.query_points(
        collection_name="shop_users",
        query=user_vector,
        limit=n + 1,  # +1 to exclude the query user itself
        with_payload=True,
    )
    return [
        {"user_id": r.id, "score": round(r.score, 4)}
        for r in results.points
        if r.id != user_id
    ][:n]

lookalikes = find_similar_customers(user_id=0, n=5)
print("Users most similar to user 0:")
for lk in lookalikes:
    print(f" user {lk['user_id']:>3} similarity={lk['score']}")
4b: Recommend from lookalike purchases¶
from collections import Counter

def recommend_from_lookalikes(
    user_id: int, n_neighbours: int = 10, n_recs: int = 10
) -> list[dict]:
    """Recommend items that similar users bought but this user hasn't."""
    # Items this user already bought
    user_items = set(
        purchases.loc[purchases["user_id"] == user_id, "item_id"]
    )
    # Find lookalike users
    neighbours = find_similar_customers(user_id, n=n_neighbours)
    # Collect their purchases, weighted by similarity
    item_scores: Counter = Counter()
    for nb in neighbours:
        nb_items = set(
            purchases.loc[purchases["user_id"] == nb["user_id"], "item_id"]
        )
        for item_id in nb_items - user_items:
            item_scores[item_id] += nb["score"]
    # Return top-N
    top_items = item_scores.most_common(n_recs)
    results = []
    for item_id, score in top_items:
        row = catalog.loc[catalog["item_id"] == item_id].iloc[0]
        results.append({
            "item_id": item_id,
            "name": row["name"],
            "category": row["category"],
            "score": round(score, 4),
        })
    return results

recs = recommend_from_lookalikes(user_id=0, n_recs=5)
for r in recs:
    print(f" {r['item_id']:>3} {r['name']:<15} {r['category']:<16} score={r['score']}")
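The similarity-weighted vote inside `recommend_from_lookalikes` is easier to check with numbers small enough to add by hand. The sketch below isolates just that step; the `vote` helper, the similarity values, and the purchase sets are all hypothetical.

```python
from collections import defaultdict

def vote(user_items, neighbours):
    """Sum neighbour similarities per unseen item, highest total first.

    user_items : set of item IDs the target user already bought
    neighbours : list of (similarity, set_of_item_ids) pairs
    """
    totals = defaultdict(float)
    for similarity, items in neighbours:
        for item_id in items - user_items:  # skip what the user already owns
            totals[item_id] += similarity
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

user_items = {1}
neighbours = [
    (0.75, {1, 2}),  # closest neighbour
    (0.5, {2, 3}),
    (0.25, {3, 4}),  # most distant neighbour
]
ranking = vote(user_items, neighbours)
print(ranking)  # [(2, 1.25), (3, 0.75), (4, 0.25)]
```

Item 2 wins because two close neighbours bought it (0.75 + 0.5), item 1 never appears because the user already owns it, and item 4 ranks last because only the most distant neighbour vouches for it.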
Putting It Together: FastAPI E-Shop Service¶
Wrap all four patterns in a small FastAPI service:
from fastapi import FastAPI, Query as Q
from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchValue
import numpy as np
import rusket

app = FastAPI(title="E-Shop Recommendations")
client = QdrantClient("localhost", port=6333)
model = rusket.load_model("trained_als.pkl")  # persist and reload

# Handlers are plain `def`: QdrantClient is synchronous, so FastAPI runs them
# in its threadpool instead of letting them block the event loop.

@app.get("/for-you/{user_id}")
def for_you(user_id: int, n: int = Q(default=10, le=50)):
    """Personalised recommendations."""
    vec = model.user_factors[user_id].tolist()
    res = client.query_points(
        collection_name="shop_items", query=vec,
        query_filter=Filter(must=[
            FieldCondition(key="in_stock", match=MatchValue(value=True))
        ]),
        limit=n, with_payload=True,
    )
    return [{"id": r.id, "score": r.score, **r.payload} for r in res.points]

@app.get("/similar/{item_id}")
def similar(item_id: int, n: int = Q(default=5, le=20)):
    """Items similar to the given item."""
    vec = model.item_factors[item_id].tolist()
    res = client.query_points(
        collection_name="shop_items", query=vec,
        limit=n + 1, with_payload=True,
    )
    return [
        {"id": r.id, "score": r.score, **r.payload}
        for r in res.points if r.id != item_id
    ][:n]

@app.post("/also-bought")
def also_bought(cart: list[int], n: int = Q(default=5, le=20)):
    """Cross-sell based on current cart contents."""
    if not cart:
        return []  # an empty cart has no centroid to query with
    vecs = np.array([model.item_factors[i] for i in cart])
    centroid = vecs.mean(axis=0).tolist()
    res = client.query_points(
        collection_name="shop_items", query=centroid,
        query_filter=Filter(must=[
            FieldCondition(key="in_stock", match=MatchValue(value=True))
        ]),
        limit=n + len(cart), with_payload=True,
    )
    return [
        {"id": r.id, "score": r.score, **r.payload}
        for r in res.points if r.id not in cart
    ][:n]

@app.get("/lookalikes/{user_id}")
def lookalikes(user_id: int, n: int = Q(default=10, le=50)):
    """Find customers with similar taste."""
    vec = model.user_factors[user_id].tolist()
    res = client.query_points(
        collection_name="shop_users", query=vec,
        limit=n + 1, with_payload=True,
    )
    return [
        {"id": r.id, "score": r.score}
        for r in res.points if r.id != user_id
    ][:n]
Architecture Overview¶
┌──────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│  Purchase Log    │────▶│ rusket.ALS.fit() │────▶│  User & Item     │
│  (DataFrame)     │     │                  │     │  Factor Matrices │
└──────────────────┘     └──────────────────┘     └────────┬─────────┘
                                                           │
                              rusket.export_vectors()      │
                                  ┌────────────────────────┘
                                  ▼
                         ┌──────────────────┐
                         │   Vector DB      │
                         │ ┌──────────────┐ │
                         │ │ shop_items   │ │  ← item embeddings + catalogue metadata
                         │ └──────────────┘ │
                         │ ┌──────────────┐ │
                         │ │ shop_users   │ │  ← user embeddings
                         │ └──────────────┘ │
                         └────────┬─────────┘
                                  │
             Native SDK queries (Qdrant / pgvector / …)
                                  │
                 ┌────────────────┼────────────────┐
                 ▼                ▼                ▼
            ┌──────────┐   ┌────────────┐   ┌──────────────┐
            │ For You  │   │  Similar   │   │ Also Bought  │
            │ (user →  │   │  Items     │   │ (cart avg →  │
            │  items)  │   │ (item →    │   │  items)      │
            └──────────┘   │  items)    │   └──────────────┘
                           └────────────┘
Tips & Variations¶
Use BPR or LightGCN for implicit feedback
ALS works well, but rusket.BPR and rusket.LightGCN often yield
better embeddings for click/purchase data with no explicit ratings.
Hybrid embeddings for richer signals
Fuse CF vectors with text/image embeddings using
rusket.HybridEmbeddingIndex and export as multi-vectors.
See the Vector DB docs for details.
pgvector alternative
Every pattern above works identically with PostgreSQL + pgvector. Just replace the Qdrant SDK calls with SQL queries; see the "Also Bought" SQL example.
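As a rough illustration of what such a replacement might look like (this is a sketch, not the SQL example referenced above), the snippet below builds a parameterised "Also Bought" query. The table layout `shop_items(item_id, name, category, in_stock, embedding)` and the helper names are assumptions; `<=>` is pgvector's cosine-distance operator, so `1 - distance` gives cosine similarity.

```python
def to_pgvector_literal(vec):
    """Format a Python sequence as a pgvector text literal such as '[0.5,0.25]'."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"

# Hypothetical table: shop_items(item_id int, name text, category text,
#                                in_stock bool, embedding vector(64))
ALSO_BOUGHT_SQL = """
SELECT item_id, name, category,
       1 - (embedding <=> %(centroid)s::vector) AS score
FROM shop_items
WHERE in_stock
  AND item_id <> ALL(%(cart)s)
ORDER BY embedding <=> %(centroid)s::vector
LIMIT %(n)s
"""

def also_bought_params(cart_vectors, cart_item_ids, n=5):
    """Build query parameters: the cart centroid plus the IDs to exclude."""
    centroid = [sum(col) / len(cart_vectors) for col in zip(*cart_vectors)]
    return {
        "centroid": to_pgvector_literal(centroid),
        "cart": list(cart_item_ids),
        "n": n,
    }

params = also_bought_params([[1.0, 0.0], [0.0, 1.0]], [10, 55], n=5)
print(params)  # {'centroid': '[0.5,0.5]', 'cart': [10, 55], 'n': 5}
# With a driver such as psycopg (assumed): cursor.execute(ALSO_BOUGHT_SQL, params)
```

Excluding cart items in the `WHERE` clause means the query can use `LIMIT n` directly, with no over-fetch and post-filter step as in the Qdrant version.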
Refresh cadence
Retrain and re-export embeddings on a schedule (e.g. nightly). Vector DB queries keep working during the export via collection aliases.
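The alias idea can be modelled in a few lines. The class below is a toy in-memory stand-in, not Qdrant's API, and every name in it is hypothetical: readers resolve an alias on each query, a refresh loads the new embeddings under a fresh physical name, and only then is the alias repointed.

```python
class AliasedStore:
    """Toy in-memory stand-in for a vector DB with collection aliases."""

    def __init__(self):
        self.collections = {}  # physical name -> {point_id: vector}
        self.aliases = {}      # alias -> physical name

    def query(self, alias):
        """Readers always resolve the alias, never a physical name."""
        return self.collections[self.aliases[alias]]

    def refresh(self, alias, new_name, vectors):
        """Load a fresh collection, then repoint the alias in one step."""
        self.collections[new_name] = dict(vectors)  # 1. build alongside the old one
        old_name = self.aliases.get(alias)
        self.aliases[alias] = new_name              # 2. switch readers over
        if old_name is not None:
            del self.collections[old_name]          # 3. drop the stale copy
        return old_name

store = AliasedStore()
store.refresh("shop_items", "shop_items_v1", {0: [0.1, 0.2]})
before = store.query("shop_items")

replaced = store.refresh("shop_items", "shop_items_v2", {0: [0.3, 0.4]})
after = store.query("shop_items")
print(replaced, before, after)
```

Because the old collection stays queryable until step 2, there is no window in which `query("shop_items")` fails or returns a half-loaded index. With a real backend, the export would target the new physical name and the switch would use the database's own alias operation.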