# How It Works
retrAI implements a reinforcement-learning-inspired agent loop using LangGraph. Instead of searching a state space, it uses a large language model to reason about what needs to change and takes targeted code-editing actions until the goal is verified as achieved.
## The Agent Loop
```mermaid
graph TD
    S([START]) --> plan
    plan["🧠 Plan<br/><small>LLM decides next actions</small>"]
    plan -->|has tool calls| act
    plan -->|no tool calls| evaluate
    act["⚡ Act<br/><small>Execute tools</small>"]
    act --> evaluate
    evaluate["🎯 Evaluate<br/><small>Run goal.check()</small>"]
    evaluate -->|"✅ achieved"| E([END])
    evaluate -->|"🛑 max iterations"| E
    evaluate -->|"continue + HITL"| human_check
    evaluate -->|"🔄 continue"| plan
    human_check["👤 Human Check<br/><small>Approve or abort</small>"]
    human_check -->|approve| plan
    human_check -->|abort| E
    style plan fill:#7c3aed,color:#fff
    style act fill:#2563eb,color:#fff
    style evaluate fill:#059669,color:#fff
    style human_check fill:#d97706,color:#fff
```
### Nodes
| Node | Responsibility |
|---|---|
| `plan` | Calls the LLM with the full conversation history. Extracts tool calls from the response. Injects the goal's system prompt. |
| `act` | Executes each tool call (bash, file read/write, grep, etc.). Appends results to the message history. |
| `evaluate` | Calls `goal.check()`. Injects a status message. Increments the iteration counter. Tracks token usage. |
| `human_check` | (HITL only) Interrupts graph execution via LangGraph's `interrupt()`. Waits for a human resume signal. |
### Routing Logic
After each node, a router function decides the next step:
- After `plan`: has pending tool calls → `act`, otherwise → `evaluate`
- After `evaluate`: goal achieved → `END`; max iterations reached → `END`; HITL enabled → `human_check`; otherwise → `plan`
- After `human_check`: approved → `plan`, aborted → `END`
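These routing rules can be sketched as plain Python functions. This is illustrative, not retrAI's actual code: the function names and the `human_approved` key are assumptions, and the real routers operate on the `AgentState` described below.

```python
def route_after_plan(state: dict) -> str:
    # Pending tool calls mean the LLM wants to act before being judged.
    return "act" if state["pending_tool_calls"] else "evaluate"

def route_after_evaluate(state: dict) -> str:
    # Achieved goals and exhausted budgets both end the run.
    if state["goal_achieved"] or state["iteration"] >= state["max_iterations"]:
        return "END"
    return "human_check" if state["hitl_enabled"] else "plan"

def route_after_human_check(state: dict) -> str:
    # `human_approved` is an illustrative key; an abort ends the run.
    return "plan" if state.get("human_approved") else "END"
```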
## State
The entire agent state is a single TypedDict that flows through the graph:
```python
from typing import Annotated, TypedDict

from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages

# ToolCall and ToolResult are retrAI's own types.

class AgentState(TypedDict):
    messages: Annotated[list[BaseMessage], add_messages]
    pending_tool_calls: list[ToolCall]
    tool_results: list[ToolResult]
    goal_achieved: bool
    goal_reason: str
    iteration: int
    max_iterations: int
    hitl_enabled: bool
    model_name: str
    cwd: str
    run_id: str
    total_tokens: int
    estimated_cost_usd: float
    failed_strategies: list[str]
    consecutive_failures: int
```
Two things are injected via LangGraph's `config["configurable"]` rather than stored in state:

- `goal` — the `GoalBase` instance, used by the `evaluate` node
- `event_bus` — the `AsyncEventBus`, used by all nodes to emit events
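As a sketch, a node can pull these injected objects out of the runtime config like this. The `check()` return shape is an assumption for illustration; retrAI's actual `GoalBase` API may differ, and event emission is elided.

```python
def evaluate_node(state: dict, config: dict) -> dict:
    goal = config["configurable"]["goal"]        # GoalBase instance
    bus = config["configurable"]["event_bus"]    # AsyncEventBus (emission elided)
    result = goal.check()  # assumed: {"achieved": bool, "reason": str}
    return {
        "goal_achieved": result["achieved"],
        "goal_reason": result["reason"],
        "iteration": state["iteration"] + 1,
    }
```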
## Tools Available to the Agent
The agent has a rich set of tools for code manipulation:
| Tool | Description |
|---|---|
| `bash_exec` | Run any shell command with configurable timeout |
| `file_read` | Read file contents |
| `file_write` | Write/overwrite files, creating directories as needed |
| `file_patch` | Apply targeted patches to files |
| `grep_search` | Search for patterns across files |
| `find_files` | List files matching a pattern |
| `git_diff` | Show unstaged changes |
| `run_pytest` | Run pytest with structured JSON report |
| `python_exec` | Execute Python code in a sandbox |
| `js_exec` | Execute JavaScript/TypeScript via Bun |
| `ml_train` | Train ML models in a sandbox |
| `sql_bench` | Benchmark SQL queries |
| `web_search` | Search the web for documentation |
| `visualize` | Generate charts and plots |
| `hypothesis_test` | Run statistical hypothesis tests |
**Auto-context injection:** On the first iteration, the agent automatically reads key project files (`pyproject.toml`, `package.json`, etc.) to build context before making changes.
## Event System
Every agent action publishes a structured `AgentEvent`:
```mermaid
graph LR
    E[AgentEvent] --> WS["🌐 WebSocket"]
    E --> TUI["📟 TUI"]
    E --> CLI["💻 CLI Logger"]
    style E fill:#7c3aed,color:#fff
```
| Event Kind | When Emitted |
|---|---|
| `step_start` | A graph node begins execution |
| `tool_call` | The LLM requests a tool |
| `tool_result` | A tool execution completes |
| `llm_usage` | Token usage from an LLM call |
| `goal_check` | Goal evaluated |
| `human_check_required` | HITL gate reached |
| `iteration_complete` | Full iteration done |
| `run_end` | Run finished (achieved/failed) |
The `AsyncEventBus` uses per-subscriber `asyncio.Queue` objects for concurrent fan-out to all consumers.
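A minimal sketch of that fan-out pattern, showing only the core idea; retrAI's real bus presumably also handles unsubscription and shutdown:

```python
import asyncio

class MiniEventBus:
    """Fan-out sketch: every subscriber gets its own queue, so one slow
    consumer (e.g. a WebSocket client) never delays events for the rest."""

    def __init__(self) -> None:
        self._queues: list[asyncio.Queue] = []

    def subscribe(self) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._queues.append(queue)
        return queue

    async def publish(self, event: dict) -> None:
        for queue in self._queues:
            await queue.put(event)  # each subscriber sees every event
```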
## Agent Memory
retrAI can persist learned strategies across runs. After each successful run, the agent extracts what worked:
Stored in `.retrai/memory.json`:

```json
{
  "strategies": [
    {
      "goal": "pytest",
      "pattern": "When tests fail due to missing imports, check __init__.py",
      "success_count": 3
    }
  ]
}
```
On subsequent runs, high-confidence strategies are injected into the system prompt so the agent doesn't repeat mistakes.
## HITL (Human-in-the-Loop)
With `--hitl`, the agent pauses after each `evaluate` step using LangGraph's `interrupt()` mechanism:
- The `human_check` node publishes a `human_check_required` event
- The graph is suspended — persisted in the `MemorySaver` checkpointer
- A human calls `POST /api/runs/{id}/resume` with `{"decision": "approve"}` or `{"decision": "abort"}`
- The graph resumes from the checkpoint
This allows humans to review each iteration before the agent continues.
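Handling of the resume payload can be sketched like this; the function name is illustrative, and the endpoint's actual validation lives in retrAI's server code.

```python
def resolve_human_decision(payload: dict) -> str:
    """Map the POST /api/runs/{id}/resume body onto the human_check
    routing outcome (approve -> plan, abort -> END)."""
    decision = payload.get("decision")
    if decision not in {"approve", "abort"}:
        raise ValueError(f"unknown decision: {decision!r}")
    return "plan" if decision == "approve" else "END"
```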