The agent loop
LLMAgent._drive is the core execution loop behind every LLMAgent.run(). It's a ReAct-style loop (think → tool call → observe → reply) with parallel tool dispatch, a handoff short-circuit, and six hook fire points. This page documents what happens each turn — useful when debugging hook ordering, tool-call routing, or subagent handoffs.
One turn at a glance
Key invariants
Per-run history isolation. Every
run()owns its ownmessageslist even when multiple agents share one LLM. ParallelLLMAgent.run()calls never touch each other's history — the only serialisation point isLLM._inference_lockinsidecomplete().Handoff-as-return-value. A synthetic
handoff_to_<agent>tool returns aHandoffsentinel; the loop transfers control to the target without a second LLM hop on the router. Matches OpenAI Agents SDK semantics (edgevox/agents/base.py).Interrupt pre-emption at three points. The loop checks
ctx.should_stop()(a) before each hop, (b) between parallel tool dispatches, and (c) — viastop_eventthreaded into llama-cpp — between sampled tokens.Six fire points, in fixed order.
ON_RUN_START → BEFORE_LLM → AFTER_LLM → (BEFORE_TOOL → dispatch → AFTER_TOOL)* → ON_RUN_END. TheBEFORE_TOOL/AFTER_TOOLpair fires per tool in parallel; the others are per-turn. Hooks see payloads defined inhooks.Hop budget. Default
max_tool_hops=3. Hitting it ends the turn with the lastcontentas the fallback reply — no runaway loops.
Typed context plumbing
LLMAgent.run() publishes the running tool registry and LLM onto ctx via typed fields (not magic scratchpad keys):
@dataclass
class AgentContext:
...
tool_registry: ToolRegistry | None = None # set during run()
llm: LLM | None = None # set during run()
hook_state: dict[int, dict[str, Any]] = field(default_factory=dict)
state: dict[str, Any] = field(default_factory=dict) # user scratchHooks that need the LLM (e.g. ContextCompactionHook for summarisation, TokenBudgetHook for exact token counts) read ctx.llm. Hooks that need per-instance state (fingerprint counters, retry budgets) write to ctx.hook_state[id(self)]. The legacy ctx.state["__tool_registry__"] / ["__llm__"] keys still work for one release but are deprecated.
Parallel tool dispatch
When the LLM emits N tool calls in one response, _dispatch_batch runs them on a ThreadPoolExecutor(max_workers=min(N, 8)). Order of results is preserved. A hook that returns end_turn at BEFORE_TOOL or AFTER_TOOL short-circuits the whole batch; the first such HookResult wins.
Skill dispatch (long-running goal-oriented tasks) runs synchronously inside the same worker slot but polls ctx.should_stop() every 50 ms so a barge-in cancels the skill cleanly.
Tool-call parsing
llm.complete() returns either a structured tool_calls field (chat templates that natively emit OpenAI-shaped function calls) or free-form content (most SLMs). For the free-form case the parser chain runs against the raw content first — which recovers tool calls emitted inside <think>...</think> blocks (Qwen3 et al.) — then falls back to the <think>-stripped content. The user-facing reply is always derived from the stripped text so chain-of-thought never reaches TTS. See tool-calling.
Handoff path
The subagent runs with a fresh Session (clean history) but the same deps, bus, hooks (ctx-level), blackboard, memory, interrupt, artifacts. Typed tool_registry / llm on the subagent ctx are installed by the target's own run() — not pre-seeded by the router.
Streaming
run_stream() currently yields the final reply as a single chunk. Multi-hop flows don't stream tokens because we need to buffer the full response before knowing whether it contains tool calls. A streaming path for final-hop text — interrupt-polled between token batches — is on the roadmap; it'll keep the multi-hop path buffered.
Debugging
- Attach
TimingHook()to measure per-fire-point latency. - Attach
EchoingHook()to trace every fire point + payload. - Attach
AuditLogHook("/tmp/audit.jsonl")to record turns for offline replay. - Compose
default_slm_hooks()fromedgevox.llm.hooks_slmto layer loop detection, echoed-payload substitution, and schema-retry on small models.
See also
hooks— authoring hooks; payload shapes per fire point.interrupt— barge-in + cancel-token semantics.tool-calling— parser chain and grammar-constrained decoding.