Configuration

All parameters can be set via constructor arguments or environment variables.

Core parameters

| Parameter | Env var | Default | Description |
|---|---|---|---|
| `top_k` | `RAG_TOP_K` | `10` | Final result count returned to the LLM |
| `rerank_top_n` | `RAG_RERANK_TOP_N` | `5` | Candidates passed to the reranker |
| `retrieval_factor` | `RAG_RETRIEVAL_FACTOR` | `4` | Over-retrieval multiplier (`top_k × factor` fetched before reranking) |
| `max_iter` | `RAG_MAX_ITER` | `3` | Maximum retrieve-rewrite cycles |
| `semantic_ratio` | `RAG_SEMANTIC_RATIO` | `0.5` | Hybrid search semantic weight (0 = pure BM25, 1 = pure vector) |
| `fusion` | `RAG_FUSION` | `"rrf"` | Score fusion: `"rrf"` (Reciprocal Rank Fusion) or `"dbsf"` (Distribution-Based Score Fusion) |
| `hyde_min_words` | `RAG_HYDE_MIN_WORDS` | `8` | Minimum query word count to trigger HyDE |
| `verbose` | `RAG_VERBOSE` | `0` | Set to `1` to log pipeline steps |
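Environment variables are read as strings and cast to each parameter's type. A minimal sketch of how such env-var fallbacks are commonly resolved (the `env_or` helper is illustrative, not part of the library):

```python
import os

def env_or(name: str, default, cast=str):
    """Illustrative helper: read an env var, falling back to a default."""
    raw = os.environ.get(name)
    return default if raw is None else cast(raw)

# e.g. RAG_TOP_K=20 exported in the shell before launching the app
os.environ["RAG_TOP_K"] = "20"

top_k = env_or("RAG_TOP_K", 10, int)                  # overridden -> 20
semantic_ratio = env_or("RAG_SEMANTIC_RATIO", 0.5, float)  # default -> 0.5
verbose = env_or("RAG_VERBOSE", 0, int) == 1          # default -> False
```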

Constructor reference

from retrievalagent import Agent

rag = Agent(
    index="docs",               # collection / index name
    backend=backend,            # SearchBackend instance (default: InMemoryBackend)
    collections=None,           # dict[str, SearchBackend] for multi-collection mode
    collection_descriptions={}, # human-readable descriptions for routing LLM
    llm=None,                   # utility LLM — synonyms, rewrite, quality-gate (default: gpt-5.4-mini)
    gen_llm=None,               # generation LLM — final cited answer (default: gpt-5.5)
    reranker=None,              # reranker instance or alias string
    top_k=10,
    rerank_top_n=5,
    retrieval_factor=4,
    max_iter=3,
    semantic_ratio=0.5,
    fusion="rrf",
    instructions="",            # extra text appended to the system prompt
    embed_fn=None,              # callable (str) -> list[float]
    boost_fn=None,              # callable (doc_dict) -> float for business-signal boosting
    filter=None,                # always-on Meilisearch-style filter expression
    hyde_min_words=8,
    hyde_style_hint="",
    auto_strategy=True,         # sample docs at init and auto-configure
    group_field="",
    name_field="",
    verbose=False,
)
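`embed_fn` and `boost_fn` are plain callables. A hedged sketch of the expected shapes — the hash-based embedding is a toy stand-in for a real encoder, and the exact way the boost value is combined with the relevance score is the library's concern:

```python
import hashlib

def embed_fn(text: str) -> list[float]:
    """Toy embedding: deterministic pseudo-vector derived from the text.
    In practice this would call a real embedding model."""
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255.0 for b in digest[:8]]  # 8-dim toy vector

def boost_fn(doc: dict) -> float:
    """Business-signal boost: e.g. prefer docs flagged as featured.
    The returned factor is folded into the doc's relevance score."""
    return 1.2 if doc.get("featured") else 1.0
```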

Model routing

Three LLM slots serve different roles:

| Slot | Default | Used for |
|---|---|---|
| `llm` | gpt-5.4-mini | Synonym expansion, spell-correction, rewrite, quality-gate |
| `gen_llm` | gpt-5.5 | Final cited answer |
| `grader_llm` | inherits `gen_llm` | Answer grader; set separately via `config.grader_model` |

Use a cheap model for llm — it fires on every query. gen_llm only fires once per answer.

rag = Agent.from_model(
    "openai:gpt-5.4-mini",   # llm — utility calls
    index="docs",
    gen_model="openai:gpt-5.5",  # gen_llm — generation
)

Pipeline

The pipeline runs a lean parallel fan-out on every query:

prepare → [keyword_search ∥ synonym_search] → evaluate
        → quality_gate → [semantic_backup →] merge_rerank → generate
  • keyword_search — pure BM25 on state.query
  • synonym_search — single LLM call produces spell-correction + synonym/alias expansion + negation extraction, then fans out BM25 searches in parallel across all expanded terms
  • quality_gate — if max(keyword_score, synonym_score) < threshold, triggers semantic_backup (full vector search) before merging
  • merge_rerank — deduplicates all pools, applies MMR diversity (lam=0.7), reranks, applies boosts
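The fusion step for the default fusion="rrf" can be sketched with textbook Reciprocal Rank Fusion, where each pool is a ranked list of doc ids (the constant k=60 is the common RRF default, not necessarily the library's value):

```python
from collections import defaultdict

def rrf_fuse(pools: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: each doc scores sum(1 / (k + rank))
    over every pool it appears in; docs are sorted by that score."""
    scores = defaultdict(float)
    for pool in pools:
        for rank, doc_id in enumerate(pool, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_pool = ["a", "b", "c"]   # BM25 ranking
semantic_pool = ["b", "d", "a"]  # vector ranking
fused = rrf_fuse([keyword_pool, semantic_pool])
```

Docs appearing high in both pools ("b", "a") outrank docs found by only one retriever.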

Spell-correction is treated as an additional parallel BM25 term — the original state.query is never overwritten:

query "troceknbeton"
  → keyword_search: BM25("troceknbeton")
  → synonym_search: BM25("trockenbeton") + BM25("Sichtbeton") + …
  → merge_rerank: fuse + dedup + MMR + rerank

Negative filter extraction

The synonym node extracts negated terms from the query and stores them in RAGState.excluded_terms. The merge node post-filters any doc whose text contains an excluded term.

"cola aber nicht zero" → excluded_terms=["zero"]
                       → docs mentioning "zero" removed after merge

Works for any negated concept (not brand-specific).
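A minimal sketch of the post-merge exclusion filter, assuming simple case-insensitive substring matching (the library's actual matching rules may differ):

```python
def apply_excluded_terms(docs: list[dict], excluded_terms: list[str]) -> list[dict]:
    """Drop any doc whose text mentions an excluded term."""
    lowered = [t.lower() for t in excluded_terms]
    return [
        d for d in docs
        if not any(term in d.get("text", "").lower() for term in lowered)
    ]

docs = [
    {"id": 1, "text": "Cola Classic 1L"},
    {"id": 2, "text": "Cola Zero 1L"},
]
kept = apply_excluded_terms(docs, ["zero"])  # doc 2 is removed
```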

init_agent parameters

init_agent is a convenience wrapper that accepts string aliases and builds the backend, LLM, and reranker for you:

from retrievalagent import init_agent

rag = init_agent(
    index="docs",               # collection name (omit when using collections=)
    collections=None,           # list[str] or dict[str, description]
    model="openai:gpt-5.4",      # "provider:model" string
    gen_model=None,             # separate generation model (defaults to model)
    backend="memory",           # backend alias or SearchBackend instance
    backend_url=None,           # backend server URL
    backend_kwargs={},          # extra kwargs passed to the backend constructor
    reranker=None,              # reranker alias or instance
    reranker_model=None,        # model name for the reranker
    reranker_kwargs={},         # extra kwargs for the reranker constructor
    embed_fn=None,
    auto_strategy=True,
    **agent_kwargs,             # any Agent constructor kwarg
)

LLM cache

retrievalagent can cache LLM calls (preprocessing, HyDE, quality gate) to avoid redundant API calls during development:

| Env var | Default | Description |
|---|---|---|
| `RAG_CACHE` | `0` | Set to `1` to enable |
| `RAG_CACHE_DIR` | (none) | Path for a disk-based JSON cache |
| `RAG_CACHE_PG_URL` | (none) | PostgreSQL connection string for a persistent cache |

Disk cache example:

RAG_CACHE=1 RAG_CACHE_DIR=./.cache python my_app.py
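The idea behind the disk cache can be sketched as a JSON-on-disk store keyed by a hash of the prompt (illustrative only; the library's actual cache layout and `DiskCache` class name are internal assumptions here):

```python
import hashlib
import json
from pathlib import Path

class DiskCache:
    """Illustrative JSON-on-disk cache keyed by a hash of the prompt."""

    def __init__(self, cache_dir: str):
        self.dir = Path(cache_dir)
        self.dir.mkdir(parents=True, exist_ok=True)

    def _path(self, prompt: str) -> Path:
        key = hashlib.sha256(prompt.encode()).hexdigest()
        return self.dir / f"{key}.json"

    def get(self, prompt: str):
        p = self._path(prompt)
        return json.loads(p.read_text()) if p.exists() else None

    def set(self, prompt: str, response: str) -> None:
        self._path(prompt).write_text(json.dumps(response))
```

Identical prompts hit the cache on the second call, which is what makes repeated development runs cheap.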