## Configuration
All parameters can be set via constructor arguments or environment variables.
### Core parameters
| Parameter | Env var | Default | Description |
|---|---|---|---|
| `top_k` | `RAG_TOP_K` | `10` | Final result count returned to the LLM |
| `rerank_top_n` | `RAG_RERANK_TOP_N` | `5` | Candidates passed to the reranker |
| `retrieval_factor` | `RAG_RETRIEVAL_FACTOR` | `4` | Over-retrieval multiplier (`top_k × factor` fetched before reranking) |
| `max_iter` | `RAG_MAX_ITER` | `3` | Maximum retrieve-rewrite cycles |
| `semantic_ratio` | `RAG_SEMANTIC_RATIO` | `0.5` | Hybrid search semantic weight (0 = pure BM25, 1 = pure vector) |
| `fusion` | `RAG_FUSION` | `"rrf"` | Score fusion: `"rrf"` (Reciprocal Rank Fusion) or `"dbsf"` (Distribution-Based) |
| `hyde_min_words` | `RAG_HYDE_MIN_WORDS` | `8` | Minimum query word count to trigger HyDE |
| `verbose` | `RAG_VERBOSE` | `0` | Set to `1` to log pipeline steps |
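For example, the two configurations below are equivalent. This is a minimal sketch; that explicit constructor arguments take precedence over env vars when both are set is an assumption, not documented behavior.

```python
import os

from retrievalagent import Agent

# Via environment variables (assumed to be read at construction time):
os.environ["RAG_TOP_K"] = "20"
os.environ["RAG_SEMANTIC_RATIO"] = "0.7"
rag = Agent(index="docs")

# Or the same settings as explicit constructor arguments:
rag = Agent(index="docs", top_k=20, semantic_ratio=0.7)
```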
### Constructor reference
```python
from retrievalagent import Agent

rag = Agent(
    index="docs",                 # collection / index name
    backend=backend,              # SearchBackend instance (default: InMemoryBackend)
    collections=None,             # dict[str, SearchBackend] for multi-collection mode
    collection_descriptions={},   # human-readable descriptions for routing LLM
    llm=None,                     # utility LLM — synonyms, rewrite, quality-gate (default: gpt-5.4-mini)
    gen_llm=None,                 # generation LLM — final cited answer (default: gpt-5.5)
    reranker=None,                # reranker instance or alias string
    top_k=10,
    rerank_top_n=5,
    retrieval_factor=4,
    max_iter=3,
    semantic_ratio=0.5,
    fusion="rrf",
    instructions="",              # extra text appended to the system prompt
    embed_fn=None,                # callable (str) -> list[float]
    boost_fn=None,                # callable (doc_dict) -> float for business-signal boosting
    filter=None,                  # always-on Meilisearch-style filter expression
    hyde_min_words=8,
    hyde_style_hint="",
    auto_strategy=True,           # sample docs at init and auto-configure
    group_field="",
    name_field="",
    verbose=False,
)
```
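`embed_fn` and `boost_fn` are the main extension points. A minimal sketch of both; the hash-based embedding and the `is_featured` field are illustrative placeholders, not part of the library:

```python
import hashlib

from retrievalagent import Agent

def embed(text: str) -> list[float]:
    # Toy deterministic embedding for illustration only; use a real embedding model in practice.
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:16]]

def boost(doc: dict) -> float:
    # Business-signal boost: favor docs carrying a (hypothetical) "is_featured" flag.
    return 1.2 if doc.get("is_featured") else 1.0

rag = Agent(index="docs", embed_fn=embed, boost_fn=boost)
```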
### Model routing
Three LLM slots serve different roles:
| Slot | Default | Used for |
|---|---|---|
| `llm` | gpt-5.4-mini | synonym expansion, spell-correction, rewrite, quality-gate |
| `gen_llm` | gpt-5.5 | final cited answer |
| `grader_llm` | inherits `gen_llm` | answer grader; set separately via `config.grader_model` |
Use a cheap model for `llm`; it fires on every query. `gen_llm` fires only once per answer.
```python
rag = Agent.from_model(
    "openai:gpt-5.4-mini",        # llm — utility calls
    index="docs",
    gen_model="openai:gpt-5.5",   # gen_llm — generation
)
```
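To give the grader its own model, the table above names `config.grader_model`. A sketch of that override; the exact attribute path is inferred from the table, so treat it as indicative rather than definitive:

```python
rag = Agent.from_model("openai:gpt-5.4-mini", index="docs", gen_model="openai:gpt-5.5")
rag.config.grader_model = "openai:gpt-5.4-mini"  # cheap grader instead of inheriting gen_llm
```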
### Pipeline
The pipeline runs a lean parallel fan-out on every query:
```
prepare → [keyword_search ∥ synonym_search] → evaluate
        → quality_gate → [semantic_backup →] merge_rerank → generate
```
- `keyword_search`: pure BM25 on `state.query`
- `synonym_search`: a single LLM call produces spell-correction + synonym/alias expansion + negation extraction, then fans out BM25 searches in parallel across all expanded terms
- `quality_gate`: if `max(keyword_score, synonym_score) < threshold`, triggers `semantic_backup` (full vector search) before merging; see the sketch after this list
- `merge_rerank`: deduplicates all pools, applies MMR diversity (`lam=0.7`), reranks, applies boosts
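A sketch of the gate's decision, assuming normalized scores; the `0.35` threshold is an assumed placeholder, not the library's default:

```python
def needs_semantic_backup(keyword_score: float, synonym_score: float,
                          threshold: float = 0.35) -> bool:
    # True means both BM25 pools look weak, so the full vector
    # search (semantic_backup) should run before merging.
    return max(keyword_score, synonym_score) < threshold
```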
Spell-correction is treated as an additional parallel BM25 term; the original `state.query` is never overwritten:

```
query "troceknbeton"
  → keyword_search: BM25("troceknbeton")
  → synonym_search: BM25("trockenbeton") + BM25("Sichtbeton") + …
  → merge_rerank:   fuse + dedup + MMR + rerank
```
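The fuse step with `fusion="rrf"` can be pictured like this; a conceptual sketch of Reciprocal Rank Fusion, not the library's internal code:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> dict[str, float]:
    # score(d) = sum over each ranked list of 1 / (k + rank of d in that list)
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return scores

# Fuse a keyword pool and a synonym pool (illustrative doc ids):
fused = rrf([["doc3", "doc1", "doc2"], ["doc1", "doc4"]])
print(max(fused, key=fused.get))  # doc1 wins: it ranks well in both lists
```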
#### Negative filter extraction
The synonym node extracts negated terms from the query and stores them in `RAGState.excluded_terms`. The merge node then post-filters any doc whose text contains an excluded term. This works for any negated concept, not just brand names.
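A sketch of that post-filter; the `text` field name is an assumption about the doc schema:

```python
def drop_excluded(docs: list[dict], excluded_terms: list[str]) -> list[dict]:
    # Keep only docs whose text mentions none of the excluded terms (case-insensitive).
    lowered = [term.lower() for term in excluded_terms]
    return [d for d in docs if not any(term in d["text"].lower() for term in lowered)]

docs = [{"text": "Trockenbeton für Außenbereiche"}, {"text": "Sichtbeton, glatt"}]
print(drop_excluded(docs, ["Sichtbeton"]))  # only the Trockenbeton doc survives
```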
### `init_agent` parameters
`init_agent` is a convenience wrapper that accepts string aliases and builds the backend, LLM, and reranker for you:
```python
from retrievalagent import init_agent

rag = init_agent(
    index="docs",                 # collection name (omit when using collections=)
    collections=None,             # list[str] or dict[str, description]
    model="openai:gpt-5.4",       # "provider:model" string
    gen_model=None,               # separate generation model (defaults to model)
    backend="memory",             # backend alias or SearchBackend instance
    backend_url=None,             # backend server URL
    backend_kwargs={},            # extra kwargs passed to the backend constructor
    reranker=None,                # reranker alias or instance
    reranker_model=None,          # model name for the reranker
    reranker_kwargs={},           # extra kwargs for the reranker constructor
    embed_fn=None,
    auto_strategy=True,
    **agent_kwargs,               # any Agent constructor kwarg
)
```
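For instance, multi-collection routing can be set up by passing `collections` as a dict of descriptions; the collection names and descriptions here are illustrative:

```python
from retrievalagent import init_agent

rag = init_agent(
    collections={
        "docs": "product documentation and how-to guides",
        "faq": "short customer FAQ answers",
    },
    model="openai:gpt-5.4-mini",
    gen_model="openai:gpt-5.5",
)
```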
### LLM cache
`retrievalagent` can cache LLM calls (preprocessing, HyDE, quality gate) to avoid redundant API traffic during development:
| Env var | Default | Description |
|---|---|---|
| `RAG_CACHE` | `0` | Set to `1` to enable |
| `RAG_CACHE_DIR` | (none) | Path for disk-based JSON cache |
| `RAG_CACHE_PG_URL` | (none) | PostgreSQL connection string for persistent cache |
Disk cache example:
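A minimal sketch using the env vars above; the cache directory path is illustrative:

```python
import os

# Enable the disk cache before constructing the agent, so repeated
# preprocessing, HyDE, and quality-gate calls are reused across runs.
os.environ["RAG_CACHE"] = "1"
os.environ["RAG_CACHE_DIR"] = ".rag_cache"  # JSON cache files land here

from retrievalagent import init_agent

rag = init_agent(index="docs", model="openai:gpt-5.4-mini")
```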