Skip to content

Configuration — Turn features on and off

The one-sentence EdgeVox philosophy: everything is optional. Every STT backend, TTS voice, memory store, VAD classifier, echo-cancellation strategy, hook, and workflow is a plug-in. You decide which of them load, and the framework degrades cleanly when something is missing.

This page is the single place to look when you want to enable, swap, or disable a feature. Each section follows the same shape: what it is → how to turn it on → how to turn it off → alternatives.

Install what you need, nothing more

EdgeVox ships a minimal core (pip install edgevox) that boots with a working default for every layer. Optional capabilities live behind pip install 'edgevox[extra]' flags — none of them are auto-installed, so your environment stays lean.

bash
# Minimal install — voice pipeline + LLM + chess + agent framework
pip install edgevox

# Opt-in extras (combine any subset — all stackable)
pip install 'edgevox[gpu]'             # onnxruntime-gpu for CUDA STT / VAD
pip install 'edgevox[dtln]'            # neural echo cancellation
pip install 'edgevox[sim]'             # 2-D IR-SIM robot sim
pip install 'edgevox[sim-mujoco]'      # 3-D MuJoCo tabletop / humanoid
pip install 'edgevox[desktop]'         # RookApp PySide6 desktop app
pip install 'edgevox[voice-vad]'       # WebRTC VAD backend
pip install 'edgevox[memory-vec]'      # VectorMemoryStore via sqlite-vec
pip install 'edgevox[dev]'             # ruff, pytest, pre-commit

Nothing here forces extras on anyone else — the CI publishes wheels for the minimal install, and every runtime import of an optional feature is wrapped in a try: import … that surfaces a clear pip install 'edgevox[X]' hint if the dep is missing.


1. Speech-to-text backends

Two backends ship; more via the BaseSTT Protocol.

BackendBest forOpt-in
faster-whisperEnglish + 99 other languagesdefault, no extra install
sherpa-onnx (zipformer)Vietnamesedefault, no extra install

Turn on a specific backend:

python
from edgevox.stt import create_stt

stt = create_stt(language="vi", backend="sherpa")   # explicit
stt = create_stt(language="en")                      # let the language config pick

Disable STT entirely (text-mode agents): don't instantiate one. The LLMAgent has no STT dependency; text-mode examples (edgevox-agent robot-panda --text-mode) show the pattern.

Write your own — subclass BaseSTT, implement transcribe(audio, language) -> str, either inject directly into the agent or add a branch to create_stt().


2. Text-to-speech backends

Four backends ship.

BackendLanguagesVoice count
Kokoro (MIT)9 languages, 56 voicesdefault for English + 8 others
Piper (MIT)40+ languagesdefault for Vietnamese, German, Russian, Arabic, Indonesian
Supertonic (Apache-2)Koreandefault for ko
PyThaiTTS (Apache-2)Thaidefault for th

Swap voice / backend:

python
from edgevox.tts import create_tts

tts = create_tts(language="en", voice="af_bella")                  # Kokoro voice
tts = create_tts(language="en", voice="en_US-amy-medium", backend="piper")

Disable TTS entirely (headless agents): don't instantiate. The agent's reply string is still returned from agent.run(); you just don't synthesize.


3. LLM — any GGUF via llama-cpp-python

One backend, one class (edgevox.llm.LLM), any model. The model_path argument accepts a local .gguf, a HuggingFace-flavoured shorthand (hf:repo:file.gguf), or a path resolved by edgevox-setup.

python
from edgevox.llm import LLM

llm = LLM(model_path="gemma-3-4b-it-E2B-q4_k_m.gguf")
llm = LLM(model_path="hf:bartowski/Llama-3.2-1B-Instruct-GGUF:Llama-3.2-1B-Instruct-Q4_K_M.gguf")
llm = LLM(model_path="/abs/path/to/my-model.gguf", n_ctx=8192, n_gpu_layers=-1)

Swap by model only: change model_path. Swap backend entirely: not supported via the built-in LLM class — write your own that conforms to the chat_stream(...) / count_tokens(...) shape and pass it to LLMAgent.bind_llm(...).


4. Memory stores — three implementations

All three implement the same MemoryStore Protocol, so swapping is a one-line change.

ClassBackingWhen to pick it
JSONMemoryStoredebounced JSON fileprototyping, human-readable inspection
SQLiteMemoryStorestdlib sqlite3 + WALrecommended default — crash-safe, multi-process-safe
VectorMemoryStoresqlite-vec extension + embed_fnsemantic retrieval over facts (search_facts("what's safe to cook?"))

Swap the default JSON store for SQLite:

python
# Before
from edgevox.agents.memory import JSONMemoryStore
store = JSONMemoryStore("./memory.json")

# After — crash-safe, same Protocol, no other changes
from edgevox.agents import SQLiteMemoryStore
store = SQLiteMemoryStore("./memory.db")

Opt in to vector search:

bash
pip install 'edgevox[memory-vec]'
python
from llama_cpp import Llama
from edgevox.agents import VectorMemoryStore, llama_embed

# Any embedding-enabled GGUF works; nomic-embed-text is a good small default.
embedder = Llama(
    model_path="nomic-embed-text-v1.5.Q4_K_M.gguf",
    embedding=True,
    n_ctx=2048,
    verbose=False,
)

store = VectorMemoryStore("./vec.db", embed_fn=llama_embed(embedder))
store.add_fact("user.allergies", "peanuts, shellfish")
hits = store.search_facts("what's safe to cook?", k=5)
for fact, distance in hits:
    print(f"{distance:.3f}  {fact.key}: {fact.value}")

llama_embed accepts either the framework's LLM (if it was built with an embedding-capable backend) or a raw llama_cpp.Llama instance. For most users the latter is simpler — spinning up a second, dedicated embedding model keeps it from fighting the main LLM for sampling time.

Disable memory entirely: don't register MemoryInjectionHook on the agent. The agent runs fine without a memory store; each turn just starts fresh.

Expose memory to the LLM itself (memory-as-tools):

python
from edgevox.agents.memory_tools import memory_tools

agent = LLMAgent(
    ...,
    tools=[*memory_tools(store), ...],      # remember_fact / forget_fact / recall_fact
)

Filter with memory_tools(store, include=("recall_fact",)) if you only want the LLM to read, not write.

See memory.md for the full data model.


5. Barge-in VAD backends

Four backends behind one BargeInVADWatcher Protocol. Pick based on accuracy/latency/weight trade-offs.

BackendClassInstallAccuracy
EnergyEnergyBargeInWatcherbuilt-inbaseline; 5-15 % false triggers in noisy rooms
WebRTCWebRTCVADWatcheredgevox[voice-vad]GMM baseline; large improvement over RMS
Silero v6SileroVADWatcherno extra install (reuses faster-whisper's ONNX)~1-2 % false triggers
TENTENVADWatcheronnxruntime (core) + nrl-ai/edgevox-models fetchlowest latency, 306 KB model

Turn on via the factory:

python
from edgevox.agents import create_vad_watcher

watcher = create_vad_watcher(
    "silero",                              # or "energy" / "webrtc" / "ten"
    controller,
    is_tts_playing=player.is_playing,
)
threading.Thread(target=watcher.run, args=(mic_stream,), daemon=True).start()

Turn off barge-in entirely: don't attach any watcher to your InterruptController. Interrupts still fire via ctx.interrupt.trigger(...) — you just lose the mic-driven path.


6. Echo cancellation backends

The player pushes its output signal into the recorder's AEC reference so self-triggering on TTS is suppressed.

BackendInstallNotes
nonebuilt-inno processing; rely on VAD echo-floor
nlmsbuilt-inclassic adaptive filter, ~10 LOC
specsubbuilt-inspectral subtraction
dtlnedgevox[dtln]neural AEC via TFLite (Apache-2 model)
python
from edgevox.audio.aec import create_aec

recorder.set_aec(create_aec("dtln"))
recorder.set_aec(create_aec("none"))       # or disable

7. Hooks — the main agent extension point

Hooks fire at six points in LLMAgent.run(): on_run_start, before_llm, after_llm, before_tool, after_tool, on_run_end. They're priority-ordered and composable.

Three categories ship:

CategoryHooksWhen to enable
Always-on basicsMemoryInjectionHook, NotesInjectorHook, PersistSessionHook, TokenBudgetHook, ContextCompactionHookanything with memory / long context
SLM hardeningdefault_slm_hooks() → loop-break, repetition guard, empty-args canary, name-validatesmall models (1-4 B params)
ObservabilityTimingHook, EchoingHook, EpisodeLoggerHookdebugging, metrics, audit trails
Safety / guardrailsSafetyGuardrailHook, ToolOutputTruncatorHookuser-facing / production

Turn on a hook:

python
from edgevox.agents import LLMAgent, MemoryInjectionHook, TimingHook

agent = LLMAgent(
    ...,
    hooks=[
        MemoryInjectionHook(memory_store=store),
        TimingHook(),
    ],
)

Turn off a hook: remove it from the hooks=[...] list. There's no global enable/disable — the list is the source of truth.

Turn on all SLM hardening at once:

python
from edgevox.llm.hooks_slm import default_slm_hooks

agent = LLMAgent(..., hooks=[*default_slm_hooks(), MemoryInjectionHook(store)])

Write your own — a hook is any callable with a points frozenset of fire-point names and a __call__(point, ctx, payload) method. See hooks.md.


8. Workflows — compose multiple agents without an LLM

All workflows implement the Agent Protocol, so they nest and compose arbitrarily.

WorkflowShapeWhen
SequenceA → B → Cpipeline of deterministic steps
Fallbacktry A; on fail try Bbest-effort paths
Looprepeat A until predicatepolling, refinement
Parallelfan out, fan inconcurrent tool calls
Routerpick one of Ndispatch
RetryA with backoffflaky tools
TimeoutA with deadlinelatency ceilings
SupervisorA oversees workersOTP-style restart
Orchestratorplan → dispatch → reducemulti-step task decomposition

Turn on: just construct and call. Turn off: don't use; LLMAgent alone works standalone.


9. Multi-agent coordination

Independent from workflows — these are for emergent patterns (supervisor watches blackboard, background planner reacts to bus events, etc.).

PrimitivePurpose
Blackboardthread-safe shared K/V with watchers; post_request(key, task) for request/reply
AgentMessage + send_message / subscribe_inboxdirect agent-to-agent messages over the bus
BackgroundAgentwraps any agent in a background thread that reacts to bus / blackboard triggers
AgentPoolstarts/stops a set of agents, shares the context

Turn on:

python
from edgevox.agents import Blackboard

bb = Blackboard()                               # sync watchers
bb = Blackboard(async_watchers=True)            # non-blocking watcher dispatch
fut = bb.post_request("plan.request", {"goal": "pick cup"}, timeout=5.0)

Turn off: just don't instantiate. Single-agent LLMAgent.run() has no bus / blackboard dependency.

See multiagent.md for composition patterns.


10. Simulation tiers

Three sim environments, all conforming to SimEnvironment. Agents don't know which one they're running in — swap by build time or CLI flag.

TierClassInstallGood for
Tier 0ToyWorldstdlibunit tests, offline CI
Tier 1IrSimEnvironmentedgevox[sim]2-D mobile robot navigation
Tier 2aMujocoArmEnvironmentedgevox[sim-mujoco]3-D Franka tabletop
Tier 2bMujocoHumanoidEnvironmentedgevox[sim-mujoco]Unitree G1 / H1 with procedural gait

Turn on: pip install 'edgevox[sim]' (or [sim-mujoco]) and pass the environment instance via ctx.deps.

Turn off: use ToyWorld or none at all — LLMAgent doesn't require an environment.


11. ROS2 integration

edgevox.integrations.ros2_* modules bridge agent skills to ROS2 topics / services / actions. Only loads if rclpy is importable (ROS2 is a system package, not a PyPI dep).

bash
source /opt/ros/jazzy/setup.bash
edgevox-agent robot-external --text-mode

Turn off: don't source a ROS2 workspace. The modules raise ImportError on load; the framework handles it and continues without ROS2.


12. Desktop apps (RookApp)

RookApp is an opt-in PySide6 app shipped as edgevox[desktop] plus the edgevox-chess-robot console script.

bash
pip install 'edgevox[desktop]'
edgevox-chess-robot --persona trash_talker

Configure via CLI flags or in-app ☰ → Settings… (persona, engine, skill, theme, voice, debug mode). Preferences persist via QSettings.

Turn off: just don't install the extra — the core edgevox package has no Qt dependency.



Writing your own components

Every category above is a Protocol — a typed shape the framework calls. Your custom class only has to match that shape. No registration needed unless you want factory-name lookup (e.g. create_stt("my-backend")).

Custom STT backend

python
from edgevox.stt import BaseSTT
import numpy as np

class MySTT(BaseSTT):
    _backend_name = "mystt"     # feeds the default ``display_name`` property

    def transcribe(self, audio: np.ndarray, language: str = "en") -> str:
        # audio is a float32 numpy array @ 16 kHz
        return self._my_model(audio)

Drop into a pipeline by passing your instance wherever the default create_stt(...) result would go — PipelineConfig(stt=MySTT(), ...) for the streaming pipeline, or the stt= kwarg of whatever higher-level factory you're using. STT isn't attached to the LLMAgent directly — it lives one layer up, producing the text that the agent's run(...) consumes.

Custom TTS backend

python
from edgevox.tts import BaseTTS
import numpy as np

class MyTTS(BaseTTS):
    sample_rate = 24_000
    _backend_name = "mytts"

    def synthesize(self, text: str) -> np.ndarray:
        return self._model.run(text)

    def synthesize_stream(self, text: str):
        # Optional — default yields one chunk. Stream for lower TTFA.
        for sentence in split(text):
            yield self._model.run(sentence)

Same injection story as STT: the pipeline owns the TTS instance, not the agent.

Custom memory store

Implement MemoryStore (see memory.md for the full method list):

python
from edgevox.agents.memory import MemoryStore, Fact, Preference, Episode

class RedisMemoryStore:
    def add_fact(self, key, value, *, scope="global", source=""): ...
    def get_fact(self, key, *, scope="global"): ...
    def facts(self, *, scope=None): ...
    def forget_fact(self, key, *, scope="global"): ...
    def set_preference(self, key, value): ...
    def preferences(self): ...
    def add_episode(self, kind, payload, outcome, *, agent=""): ...
    def recent_episodes(self, n=5, *, kind=None): ...
    def render_for_prompt(self, *, max_facts=20, max_episodes=5): ...

# isinstance check against the runtime-checkable Protocol works:
assert isinstance(RedisMemoryStore(), MemoryStore)

Custom barge-in VAD watcher

Implement BargeInVADWatcher — just run(frames) + stop():

python
from edgevox.agents import BargeInVADWatcher, InterruptController

class MyVADWatcher:
    def __init__(self, controller: InterruptController, *, is_tts_playing):
        self._controller = controller
        self._is_tts = is_tts_playing
        self._stopped = False

    def stop(self):
        self._stopped = True

    def run(self, frames):
        for f in frames:
            if self._stopped:
                return
            if self._my_classifier(f):
                self._controller.trigger(reason="user_speech_custom")

Custom hook

A hook is any callable with a points frozenset and a __call__(point, ctx, payload):

python
from edgevox.agents.hooks import BEFORE_LLM, AFTER_LLM

class LoggingHook:
    points = frozenset({BEFORE_LLM, AFTER_LLM})
    priority = 0  # observability — runs after business hooks

    def __call__(self, point, ctx, payload):
        if point == BEFORE_LLM:
            log.info("messages in: %d", len(payload["messages"]))
        elif point == AFTER_LLM:
            log.info("reply: %r", payload.get("content", "")[:80])

agent = LLMAgent(..., hooks=[LoggingHook()])

Priority guide: Safety=100, Business=50, Observability=0. The built-ins follow this scale so yours slots cleanly in order.

Custom workflow

Implement the Agent Protocol — name: str, run(task, ctx) -> AgentResult, run_stream(task, ctx) -> Iterator[str]. LLMAgent and every shipped workflow already do this, so your class composes with them.

python
from collections.abc import Iterator

from edgevox.agents import AgentContext, AgentResult
from edgevox.agents.base import Agent

class RoundRobin:
    """Alternate between sub-agents on successive calls."""

    def __init__(self, name: str, agents: list[Agent]):
        self.name = name
        self._agents = agents
        self._idx = 0

    def run(self, task: str, ctx: AgentContext) -> AgentResult:
        a = self._agents[self._idx % len(self._agents)]
        self._idx += 1
        return a.run(task, ctx)

    def run_stream(self, task: str, ctx: AgentContext) -> Iterator[str]:
        a = self._agents[self._idx % len(self._agents)]
        self._idx += 1
        yield from a.run_stream(task, ctx)

Anywhere the framework takes an Agent (workflow child, handoff target, background worker) you can pass this.

Custom LLM backend

Match the surface LLMAgent calls — a single complete(...) method that returns an OpenAI-shaped response dict:

python
class MyLLM:
    def complete(
        self,
        messages: list[dict],
        *,
        tools: list[dict] | None = None,
        tool_choice: str | dict = "auto",
        stream: bool = False,
        stop_event: threading.Event | None = None,
        grammar: object | None = None,
    ) -> dict:
        # Must return {"choices": [{"message": {"content": str,
        #                                        "tool_calls": list | None}}]}
        ...

    def count_tokens(self, text: str) -> int:
        # Only used by TokenBudgetHook / Compactor when passed ``ctx.llm``.
        return len(text) // 4   # stub

agent = LLMAgent(...)
agent.bind_llm(MyLLM())

stop_event is how barge-in halts generation mid-decode — your backend should poll it in the sampling loop. grammar is optional (llama-cpp GBNF or equivalent); backends that can't grammar-constrain can ignore it. The agent loop falls back gracefully for older shims via a TypeError catch — so it's safe to implement only the subset you support.


Settings reference

Every knob, by component. Defaults are what you get from a bare LLMAgent(...) and create_stt() / create_tts() / ... calls.

LLMAgent

argdefaultmeaning
namerequiredhuman-facing identifier
descriptionrequiredadvertised to handoff targets + workflows
instructionsrequiredsystem prompt
toolsNonelist of @tool callables, Tool objects, or a ToolRegistry
skillsNonelist of Skill objects (cancellable long-running tasks)
llmNonepre-bound LLM instance; otherwise set via agent.bind_llm(...)
handoffsNoneagents this one can hand off to
hooksNonelist of hook objects
max_tool_hops3tool-call hops per turn before abort
tool_choice_policy"auto""auto" / "required_first_hop" / "required_always"

Parallel tool-call dispatch happens automatically inside _drive when the LLM emits multiple tool_calls in a single response — no flag needed. Agent events are always published via ctx.on_event / the bus; subscribers attach via ctx.bus.subscribe(...).

MemoryStore implementations

storeconstructor argsnotable defaults
JSONMemoryStorepath, autoload=Trueflush debounce 2 s, episode ring 500
SQLiteMemoryStorepathWAL mode on, synchronous=NORMAL, episode ring 500
VectorMemoryStorepath, embed_fn, embedding_dim=Noneprobes dim via embed_fn("dimension probe") when not given

All three honour max_facts / max_episodes on render_for_prompt.

InterruptController + watchers

knobclassdefaultmeaning
policycontrollerInterruptPolicy()whether cancels also interrupt LLM, TTS, skills
cancel_llmpolicyTruethread cancel into llama-cpp stopping_criteria
cancel_ttspolicyTrueflush player on trigger
cancel_skillspolicyFalsesignal running skills (opt-in; not all are cancellable)
frame_msenergy / webrtc / ten20 / 20 / 16VAD frame duration
aggressivenesswebrtc20-3; higher trades recall for precision
thresholdsilero / ten0.4 / 0.5speech-probability cutoff
sustained_speech_msall VAD watchers120consecutive-speech window before trigger
tts_release_msall VAD watchers180refractory after TTS stops
echo_suppression_ratioenergy only2.0mic/TTS energy ratio required
echo_floor_window_msenergy only200per-segment calibration window

Blackboard

knobdefaultmeaning
async_watchersFalsefan out watchers via thread pool
max_watcher_workers4pool size when async

Compactor

knobdefaultmeaning
trigger_tokens4000summarise when the session crosses this count
keep_last_turns4never summarise the most-recent N user/assistant turns

STT

Per language, resolved via edgevox.core.config.get_lang(code). Override per-call:

python
create_stt(language="en", model_size="large-v3", device="cuda")
# model_size="sherpa" routes to the Sherpa-ONNX Vietnamese backend

TTS

python
create_tts(language="en", voice="af_heart", backend="kokoro")
# backend one of "kokoro" / "piper" / "supertonic" / "pythaitts" (or None for language default)

RookApp (desktop)

Env vars (also exposed as CLI flags and in-app Settings):

envdefaultmeaning
EDGEVOX_CHESS_PERSONAcasualgrandmaster / casual / trash_talker
EDGEVOX_CHESS_ENGINEpersona defaultstockfish / maia
EDGEVOX_CHESS_USER_PLAYSwhitewhite / black
EDGEVOX_CHESS_STOCKFISH_SKILLpersona default0-20
EDGEVOX_CHESS_MAIA_WEIGHTSrequired when engine=maia
EDGEVOX_MEMORY_DIR~/.edgevox/memoryoverride for default store location
EDGEVOX_TEN_VAD_MODELauto-fetchoverride to a local TEN VAD ONNX path

CLI

flagwhat it does
edgevox --text-modedisables STT + TTS, terminal chat only
edgevox --simple-uiheadless rich-console voice loop
edgevox --web-uiFastAPI + WebSocket server
edgevox-agent <name>run one of the built-in example agents
edgevox-setupdownload all default models
edgevox-chess-robotRookApp desktop entry point

What this doesn't cover

  • Custom transports (gRPC, MQTT, …). The EventBus is the canonical pub/sub; replace it with your own and the workflows + multi-agent primitives work unchanged.

When in doubt: look at the Protocol — that's the actual contract. The built-in classes are one implementation each; you can always write another.

Offline voice agent framework for robots