Reranking
Reranking re-scores the initial retrieval candidates with a more powerful model before passing them to the LLM. retrievalagent fetches top_k × retrieval_factor documents, reranks them, and surfaces the top rerank_top_n.
Reranker aliases
| Alias | Class | Default model | Install |
|---|---|---|---|
"cohere" |
CohereReranker |
rerank-v3.5 |
retrievalagent[cohere] |
"huggingface" / "hf" |
HuggingFaceReranker |
cross-encoder/ms-marco-MiniLM-L-6-v2 |
retrievalagent[huggingface] |
"jina" |
JinaReranker |
jina-reranker-v2-base-multilingual |
retrievalagent[jina] |
"llm" |
LLMReranker |
(agent's LLM) | (built-in) |
"rerankers" |
RerankersReranker |
ColBERT, Flashrank, RankGPT… | retrievalagent[rerankers] |
"embed-anything" |
EmbedAnythingReranker |
jina-reranker-v1-turbo-en |
retrievalagent[embed-anything] |
Usage
from retrievalagent import init_agent
# Cohere
rag = init_agent("docs", model="openai:gpt-5.4", reranker="cohere")
# HuggingFace — runs locally, no API key
rag = init_agent("docs", model="openai:gpt-5.4", reranker="huggingface")
# HuggingFace with a multilingual model
rag = init_agent("docs", model="openai:gpt-5.4", reranker="huggingface",
reranker_model="cross-encoder/mmarco-mMiniLMv2-L12-H384-v1")
# Jina (uses JINA_API_KEY)
rag = init_agent("docs", model="openai:gpt-5.4", reranker="jina")
# ColBERT via the rerankers library
rag = init_agent("docs", model="openai:gpt-5.4", reranker="rerankers",
reranker_model="colbert-ir/colbertv2.0",
reranker_kwargs={"model_type": "colbert"})
# Pre-built instance
from retrievalagent import CohereReranker
rag = init_agent("docs", reranker=CohereReranker(model="rerank-v3.5", api_key="..."))
Tuning
rag = init_agent(
"docs",
model="openai:gpt-5.4",
reranker="cohere",
top_k=10, # final result count returned to the LLM
rerank_top_n=5, # how many of top_k the reranker sees
retrieval_factor=4, # fetch top_k × retrieval_factor candidates first
)
With these defaults retrievalagent fetches 40 candidates (10 × 4), reranks all 40, then passes the top 5 to the quality gate and LLM. Increase retrieval_factor for better recall at the cost of reranker latency.
Expert reranker
A second, more expensive reranker run on a smaller candidate set for high-stakes queries:
from retrievalagent import Agent, CohereReranker
from retrievalagent.rerankers import HuggingFaceReranker
rag = Agent(
index="docs",
reranker=HuggingFaceReranker(), # fast first pass
expert_reranker=CohereReranker(), # precision second pass
expert_top_n=3, # only top-3 go to expert
expert_threshold=0.7, # only if first-pass confidence < 0.7
)