
EdgeVox

Voice agents for robots.

Sub-second voice pipeline. Plug-and-play harness. Fully on-device.

§ 01 — Four corners

What EdgeVox actually is.

A small framework with a wide surface — the harness, the voice pipeline, the robot bridge, and a reference desktop app.

§ 01

Agents & tools.

@tool and @skill decorators, LLMAgent with handoffs, nine composable workflows (Sequence, Fallback, Loop, Parallel, Router, Supervisor, Orchestrator, Retry, Timeout), cancellable skills with GoalHandle.

agent loop
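
As a rough illustration of the decorator-plus-workflow idea in this card, the sketch below registers plain functions as tools and composes two of them with a fallback. Every name here (`tool`, `TOOLS`, `fallback`, the two grasp functions) is hypothetical and self-contained; this page does not show EdgeVox's real signatures, so treat it as a picture of the pattern rather than the framework's API.

```python
from typing import Callable, Dict, Optional

# Hypothetical registry; EdgeVox's real @tool decorator may work differently.
TOOLS: Dict[str, Callable[..., str]] = {}


def tool(fn: Callable[..., str]) -> Callable[..., str]:
    """Register a function under its own name so an agent can look it up."""
    TOOLS[fn.__name__] = fn
    return fn


def fallback(*steps: Callable[[str], str]) -> Callable[[str], str]:
    """Compose steps so that the first one that does not raise wins."""

    def run(arg: str) -> str:
        last_error: Optional[Exception] = None
        for step in steps:
            try:
                return step(arg)
            except Exception as exc:  # any failure moves on to the next step
                last_error = exc
        raise RuntimeError("all fallback steps failed") from last_error

    return run


@tool
def grasp_precise(obj: str) -> str:
    # Simulated failure, standing in for a missing sensor.
    raise RuntimeError("no depth camera")


@tool
def grasp_coarse(obj: str) -> str:
    return f"grasped {obj} (coarse)"


# Try the precise grasp first, then degrade to the coarse one.
pick = fallback(TOOLS["grasp_precise"], TOOLS["grasp_coarse"])
result = pick("red cube")
print(result)
```

A sequence, retry, or timeout wrapper would follow the same shape: a function that takes steps and returns a new callable, which keeps composition independent of any particular tool.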
§ 03

Robotics & sim.

ROS2-native — voice + robot_state + agent_event, TF2, Nav2 cmd_vel, execute_skill action server. ToyWorld · IR-SIM · MuJoCo Franka · Unitree G1/H1 · external Gazebo/Isaac.

robotics & sim
§ 04

Ships as an app.

RookApp is the reference PySide6 build — Qt UI + LLMAgent + llama.cpp + Stockfish in one Python process. No browser, no web server, no Tauri. The framework runs end-user products, not just demos.

rookapp guide
§ 02 — Demos

Things you can run today.

One pip install, one process, one warm laptop. Each shot below is a real screen from this repo.

Voice pipeline TUI — streaming STT · LLM · TTS with VAD barge-in
MuJoCo · Franka arm — voice-controlled pick-and-place
Unitree G1 humanoid — procedural gait + ONNX policy slot
RookApp desktop — offline chess partner in one Python process
§ 03 — Try it

Six entrypoints, one install.

Every edgevox-agent invocation composes with --ros2 for the full topic surface.

```bash
edgevox                                       # voice pipeline TUI
edgevox-agent robot-panda --text-mode         # MuJoCo Franka pick-and-place
edgevox-agent robot-irsim --text-mode         # IR-SIM 2D navigation
edgevox-agent robot-humanoid --simple-ui      # Unitree G1 humanoid (auto-fetched)
edgevox-agent robot-external --text-mode      # drive any external ROS2 sim / robot
edgevox-chess-robot                           # RookApp: PySide6 desktop chess partner
```

Any edgevox-agent invocation composes with --ros2 to publish /edgevox/robot_state + /agent_event, accept cmd_vel / goal_pose + text_input, and expose the execute_skill action.

§ 04 — Principles

What we will and won't do.

A small set of rules the codebase actually enforces — not aspirations.

01
Plug-and-play, not patchable.
Every layer — STT, TTS, LLM, VAD, hooks, skills, tools, parsers — swaps via Protocols and registries. New behaviour lands as a plugin, never a conditional in core.
02
Offline by default.
No cloud APIs, no telemetry, no analytics. Whisper, Gemma, Kokoro, Piper, Supertonic, PyThaiTTS — every model runs on your hardware. Period.
03
Streaming is the contract.
STT < 0.5 s, LLM first token < 0.4 s, TTS first chunk < 0.1 s. No blocking calls hold the loop. Latency regressions block the merge.
04
Hardware-aware, never hardware-bound.
CUDA, Metal, CPU — every backend degrades gracefully. A missing accelerator is a config decision, not a crash.
05
Safety preempts the LLM.
SafetyMonitor halts skills before the LLM is consulted: stop-words land in the reactive layer, so the halt path never waits on a model round-trip.
06
MIT, no copyleft contamination.
The license of every dependency is verified when it is added. GPL/AGPL/SSPL are refused. Your downstream stays unencumbered.
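
Principle 01 (layers that swap via Protocols and registries) can be pictured with a minimal sketch. It assumes a TTS layer with a single `synthesize` method; `TTSBackend`, `TTS_REGISTRY`, `register_tts`, `make_tts`, and the two toy backends are illustrative names only and are not taken from the EdgeVox codebase.

```python
from typing import Callable, Dict, Protocol


class TTSBackend(Protocol):
    """Structural interface: any class with this method counts as a backend."""

    def synthesize(self, text: str) -> bytes: ...


# Hypothetical registry mapping a config name to a backend factory.
TTS_REGISTRY: Dict[str, Callable[[], TTSBackend]] = {}


def register_tts(name: str) -> Callable[[type], type]:
    """Class decorator that adds a backend to the registry under `name`."""

    def decorator(cls: type) -> type:
        TTS_REGISTRY[name] = cls
        return cls

    return decorator


@register_tts("silent")
class SilentTTS:
    """Toy backend: one zero byte per input character."""

    def synthesize(self, text: str) -> bytes:
        return bytes(len(text))


@register_tts("beep")
class BeepTTS:
    """Toy backend: one BEL byte per input character."""

    def synthesize(self, text: str) -> bytes:
        return b"\x07" * len(text)


def make_tts(name: str) -> TTSBackend:
    """Look a backend up by name; the caller never branches on backend type."""
    try:
        return TTS_REGISTRY[name]()
    except KeyError:
        known = sorted(TTS_REGISTRY)
        raise ValueError(f"unknown TTS backend {name!r}; known: {known}") from None


audio = make_tts("beep").synthesize("hi")
print(audio)
```

In this sketch, adding a third backend means writing one decorated class; `make_tts` and its callers stay unchanged, which is the property the principle describes as a plugin rather than a conditional in core.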

Offline voice agent framework for robots