## Configuration
All parameters can be set via constructor arguments or environment variables.
### Core parameters
| Parameter | Env var | Default | Description |
|---|---|---|---|
| `top_k` | `RAG_TOP_K` | `10` | Final result count returned to the LLM |
| `rerank_top_n` | `RAG_RERANK_TOP_N` | `5` | Candidates passed to the reranker |
| `retrieval_factor` | `RAG_RETRIEVAL_FACTOR` | `4` | Over-retrieval multiplier (`top_k × factor` fetched before reranking) |
| `max_iter` | `RAG_MAX_ITER` | `3` | Maximum retrieve-rewrite cycles |
| `semantic_ratio` | `RAG_SEMANTIC_RATIO` | `0.5` | Hybrid search semantic weight (0 = pure BM25, 1 = pure vector) |
| `fusion` | `RAG_FUSION` | `"rrf"` | Score fusion: `"rrf"` (Reciprocal Rank Fusion) or `"dbsf"` (Distribution-Based) |
| `hyde_min_words` | `RAG_HYDE_MIN_WORDS` | `8` | Minimum query word count to trigger HyDE |
| `verbose` | `RAG_VERBOSE` | `0` | Set to `1` to log pipeline steps |
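For example, the two configurations below are equivalent. This is a minimal sketch; that explicit constructor arguments take precedence over env vars when both are set is an assumption, not documented behavior.

```python
import os

from retrievalagent import Agent

# Via environment variables (assumed to be read at construction time):
os.environ["RAG_TOP_K"] = "20"
os.environ["RAG_SEMANTIC_RATIO"] = "0.7"
rag = Agent(index="docs")

# Or the same settings as explicit constructor arguments:
rag = Agent(index="docs", top_k=20, semantic_ratio=0.7)
```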
### Constructor reference
```python
from retrievalagent import Agent

rag = Agent(
    index="docs",                 # collection / index name
    backend=backend,              # SearchBackend instance (default: InMemoryBackend)
    collections=None,             # dict[str, SearchBackend] for multi-collection mode
    collection_descriptions={},   # human-readable descriptions for routing LLM
    llm=None,                     # utility LLM — synonyms, rewrite, quality-gate (default: gpt-5.4-mini)
    gen_llm=None,                 # generation LLM — final cited answer (default: gpt-5.5)
    reranker=None,                # reranker instance or alias string
    top_k=10,
    rerank_top_n=5,
    retrieval_factor=4,
    max_iter=3,
    semantic_ratio=0.5,
    fusion="rrf",
    instructions="",              # extra text appended to the system prompt
    embed_fn=None,                # callable (str) -> list[float]
    boost_fn=None,                # callable (doc_dict) -> float for business-signal boosting
    filter=None,                  # always-on Meilisearch-style filter expression
    hyde_min_words=8,
    hyde_style_hint="",
    auto_strategy=True,           # sample docs at init and auto-configure
    group_field="",
    name_field="",
    verbose=False,
)
```
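`embed_fn` and `boost_fn` are the main extension points. A minimal sketch of both; the hash-based embedding and the `is_featured` field are illustrative placeholders, not part of the library:

```python
import hashlib

from retrievalagent import Agent

def embed(text: str) -> list[float]:
    # Toy deterministic embedding for illustration only; use a real embedding model in practice.
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:16]]

def boost(doc: dict) -> float:
    # Business-signal boost: favor docs carrying a (hypothetical) "is_featured" flag.
    return 1.2 if doc.get("is_featured") else 1.0

rag = Agent(index="docs", embed_fn=embed, boost_fn=boost)
```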
### Model routing
Three LLM slots serve different roles:
| Slot | Default | Used for |
|---|---|---|
| `llm` | gpt-5.4-mini | synonym expansion, spell-correction, rewrite, quality-gate |
| `gen_llm` | gpt-5.5 | final cited answer |
| `grader_llm` | inherits `gen_llm` | answer grader; set separately via `config.grader_model` |
Use a cheap model for `llm`; it fires on every query. `gen_llm` fires only once per answer.
```python
rag = Agent.from_model(
    "openai:gpt-5.4-mini",        # llm — utility calls
    index="docs",
    gen_model="openai:gpt-5.5",   # gen_llm — generation
)
```
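To give the grader its own model, the table above names `config.grader_model`. A sketch of that override; the exact attribute path is inferred from the table, so treat it as indicative rather than definitive:

```python
rag = Agent.from_model("openai:gpt-5.4-mini", index="docs", gen_model="openai:gpt-5.5")
rag.config.grader_model = "openai:gpt-5.4-mini"  # cheap grader instead of inheriting gen_llm
```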
### Pipeline
The pipeline runs a lean parallel fan-out on every query:
```
prepare → [keyword_search ∥ synonym_search] → evaluate
        → quality_gate → [semantic_backup →] merge_rerank → generate
```
- `keyword_search`: pure BM25 on `state.query`
- `synonym_search`: a single LLM call produces spell-correction + synonym/alias expansion + negation extraction, then fans out BM25 searches in parallel across all expanded terms
- `quality_gate`: if `max(keyword_score, synonym_score) < threshold`, triggers `semantic_backup` (full vector search) before merging; see the sketch after this list
- `merge_rerank`: deduplicates all pools, applies MMR diversity (`lam=0.7`), reranks, applies boosts
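A sketch of the gate's decision, assuming normalized scores; the `0.35` threshold is an assumed placeholder, not the library's default:

```python
def needs_semantic_backup(keyword_score: float, synonym_score: float,
                          threshold: float = 0.35) -> bool:
    # True means both BM25 pools look weak, so the full vector
    # search (semantic_backup) should run before merging.
    return max(keyword_score, synonym_score) < threshold
```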
Spell-correction is treated as an additional parallel BM25 term; the original `state.query` is never overwritten:

```
query "troceknbeton"
  → keyword_search: BM25("troceknbeton")
  → synonym_search: BM25("trockenbeton") + BM25("Sichtbeton") + …
  → merge_rerank:   fuse + dedup + MMR + rerank
```
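The fuse step with `fusion="rrf"` can be pictured like this; a conceptual sketch of Reciprocal Rank Fusion, not the library's internal code:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> dict[str, float]:
    # score(d) = sum over each ranked list of 1 / (k + rank of d in that list)
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return scores

# Fuse a keyword pool and a synonym pool (illustrative doc ids):
fused = rrf([["doc3", "doc1", "doc2"], ["doc1", "doc4"]])
print(max(fused, key=fused.get))  # doc1 wins: it ranks well in both lists
```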
#### Negative filter extraction
The synonym node extracts negated terms from the query and stores them in `RAGState.excluded_terms`. The merge node then post-filters any doc whose text contains an excluded term. This works for any negated concept, not just brand names.
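A sketch of that post-filter; the `text` field name is an assumption about the doc schema:

```python
def drop_excluded(docs: list[dict], excluded_terms: list[str]) -> list[dict]:
    # Keep only docs whose text mentions none of the excluded terms (case-insensitive).
    lowered = [term.lower() for term in excluded_terms]
    return [d for d in docs if not any(term in d["text"].lower() for term in lowered)]

docs = [{"text": "Trockenbeton für Außenbereiche"}, {"text": "Sichtbeton, glatt"}]
print(drop_excluded(docs, ["Sichtbeton"]))  # only the Trockenbeton doc survives
```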
### `init_agent` parameters
`init_agent` is a convenience wrapper that accepts string aliases and builds the backend, LLM, and reranker for you:
```python
from retrievalagent import init_agent

rag = init_agent(
    index="docs",                 # collection name (omit when using collections=)
    collections=None,             # list[str] or dict[str, description]
    model="openai:gpt-5.4",       # "provider:model" string
    gen_model=None,               # separate generation model (defaults to model)
    backend="memory",             # backend alias or SearchBackend instance
    backend_url=None,             # backend server URL
    backend_kwargs={},            # extra kwargs passed to the backend constructor
    reranker=None,                # reranker alias or instance
    reranker_model=None,          # model name for the reranker
    reranker_kwargs={},           # extra kwargs for the reranker constructor
    embed_fn=None,
    auto_strategy=True,
    **agent_kwargs,               # any Agent constructor kwarg
)
```
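For instance, multi-collection routing can be set up by passing `collections` as a dict of descriptions; the collection names and descriptions here are illustrative:

```python
from retrievalagent import init_agent

rag = init_agent(
    collections={
        "docs": "product documentation and how-to guides",
        "faq": "short customer FAQ answers",
    },
    model="openai:gpt-5.4-mini",
    gen_model="openai:gpt-5.5",
)
```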
### LLM cache
`retrievalagent` can cache LLM calls (preprocessing, HyDE, quality gate) to avoid redundant API traffic during development:
| Env var | Default | Description |
|---|---|---|
| `RAG_CACHE` | `0` | Set to `1` to enable |
| `RAG_CACHE_DIR` | (none) | Path for disk-based JSON cache |
| `RAG_CACHE_PG_URL` | (none) | PostgreSQL connection string for persistent cache |
Disk cache example:
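A minimal sketch using the env vars above; the cache directory path is illustrative:

```python
import os

# Enable the disk cache before constructing the agent, so repeated
# preprocessing, HyDE, and quality-gate calls are reused across runs.
os.environ["RAG_CACHE"] = "1"
os.environ["RAG_CACHE_DIR"] = ".rag_cache"  # JSON cache files land here

from retrievalagent import init_agent

rag = init_agent(index="docs", model="openai:gpt-5.4-mini")
```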