Configuration

EdgeVox detects the available hardware and picks default models and settings to match. You can override these defaults with CLI flags or environment variables.

Auto-Detection

STT Model Selection

EdgeVox picks the Whisper model based on available resources:

CUDA GPU (>= 8GB VRAM)  → large-v3-turbo (cuda, float16)
CUDA GPU (< 8GB VRAM)   → small (cuda, float16)
CPU (>= 32GB RAM)       → large-v3-turbo (cpu, int8)
CPU (>= 16GB RAM)       → medium (cpu, int8)
CPU (< 16GB RAM)        → small (cpu, int8)

Override the selection with `--stt` and `--stt-device`.
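The selection table can be read as a small decision function. The sketch below is a hypothetical illustration, not EdgeVox's actual code. The function name `pick_whisper_model`, its parameters, and the assumption that explicit `--stt`/`--stt-device` values win over auto-detection are all ours. Probing VRAM and RAM is left to the caller.

```python
from typing import NamedTuple, Optional


class SttChoice(NamedTuple):
    model: str
    device: str
    compute_type: str


def pick_whisper_model(
    vram_gb: Optional[float],
    ram_gb: float,
    model_override: Optional[str] = None,
    device_override: Optional[str] = None,
) -> SttChoice:
    """Hypothetical helper mirroring the documented selection table.

    vram_gb is None when no CUDA GPU is detected.
    """
    if vram_gb is not None:
        # GPU path: model size depends only on VRAM.
        model, device = ("large-v3-turbo" if vram_gb >= 8 else "small"), "cuda"
    elif ram_gb >= 32:
        model, device = "large-v3-turbo", "cpu"
    elif ram_gb >= 16:
        model, device = "medium", "cpu"
    else:
        model, device = "small", "cpu"

    # Assumption: explicit overrides (cf. --stt / --stt-device) win over
    # auto-detection.
    model = model_override or model
    device = device_override or device
    # Pair compute type with device as in the table: float16 on CUDA, int8 on CPU.
    compute_type = "float16" if device == "cuda" else "int8"
    return SttChoice(model, device, compute_type)


print(pick_whisper_model(vram_gb=6, ram_gb=64))
# SttChoice(model='small', device='cuda', compute_type='float16')
```

Deriving the compute type from the final device keeps a forced CPU run on int8, matching the pairing in the table.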

For Vietnamese, EdgeVox uses Sherpa-ONNX Zipformer (30M int8) for STT by default and falls back to Whisper automatically.
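One way to express "Zipformer first, Whisper as fallback" is a loader that tries backends in order. This is a hypothetical sketch. The loader callables stand in for whatever actually constructs the Sherpa-ONNX and Whisper recognizers. Treating any exception as the fallback trigger is also our assumption, since the exact failure conditions are not documented here.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def load_with_fallback(loaders: Sequence[Tuple[str, Callable[[], T]]]) -> Tuple[str, T]:
    """Try each (name, loader) in order and return the first that succeeds.

    Hypothetical helper: falling back on any exception is an assumption,
    not documented EdgeVox behavior.
    """
    errors = []
    for name, loader in loaders:
        try:
            return name, loader()
        except Exception as exc:  # any load failure moves on to the next backend
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all STT backends failed: " + "; ".join(errors))


def load_zipformer():
    # Hypothetical loader: simulate the Sherpa-ONNX model being unavailable.
    raise FileNotFoundError("zipformer model not found")


def load_whisper():
    return "whisper-recognizer"  # placeholder for a real recognizer object


name, recognizer = load_with_fallback(
    [("sherpa-zipformer", load_zipformer), ("whisper", load_whisper)]
)
print(name)  # whisper
```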

TTS Selection

The TTS backend is chosen per language by the language configuration in `edgevox/core/config.py`:

Kokoro-82M: English, French, Spanish, Hindi, Italian, Portuguese, Japanese, Chinese
Piper ONNX: Vietnamese, German, Russian, Arabic, Indonesian
Supertonic: Korean
PyThaiTTS: Thai

Override with the `--tts` flag.
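The language-to-backend mapping amounts to a lookup table with an optional override. The sketch below is illustrative only and does not reproduce the contents of `edgevox/core/config.py`. The ISO 639-1 language codes, the backend identifiers (`kokoro`, `piper`, `supertonic`, `pythaitts`), and the function `pick_tts` are all assumptions.

```python
from typing import Optional

# Hypothetical backend names keyed by ISO 639-1 codes, following the documented list.
TTS_BY_LANGUAGE = {
    **dict.fromkeys(["en", "fr", "es", "hi", "it", "pt", "ja", "zh"], "kokoro"),
    **dict.fromkeys(["vi", "de", "ru", "ar", "id"], "piper"),
    "ko": "supertonic",
    "th": "pythaitts",
}


def pick_tts(language: str, override: Optional[str] = None) -> str:
    """Return the TTS backend for a language; an explicit override (cf. --tts) wins."""
    if override:
        return override
    try:
        return TTS_BY_LANGUAGE[language]
    except KeyError:
        raise ValueError(f"no TTS backend configured for language {language!r}") from None


print(pick_tts("vi"))                     # piper
print(pick_tts("en", override="piper"))   # piper
```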

Environment Variables

Variable	Description
`EDGEVOX_MODEL_PATH`	Path to LLM GGUF file
`CUDA_VISIBLE_DEVICES`	GPU selection for multi-GPU systems
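Both variables are ordinary process environment variables, so they can also be set from Python. The following is a minimal sketch that assumes EdgeVox is launched in the same process. `CUDA_VISIBLE_DEVICES` only takes effect if it is set before any library initializes CUDA. The helper `resolve_model_path` is hypothetical, and so are its CLI-over-environment precedence and its `.gguf` check.

```python
import os
from pathlib import Path
from typing import Optional

# Unless already set, expose only the second GPU to this process. This must
# run before any library initializes CUDA, which reads the variable at startup.
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "1")


def resolve_model_path(cli_path: Optional[str] = None) -> Optional[Path]:
    """Hypothetical helper: a CLI value wins, else EDGEVOX_MODEL_PATH, else None."""
    raw = cli_path or os.environ.get("EDGEVOX_MODEL_PATH")
    if not raw:
        return None  # no model path configured
    path = Path(raw).expanduser()
    if path.suffix.lower() != ".gguf":
        raise ValueError(f"expected a .gguf file, got {path}")
    return path
```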

Model Hosting

Models are downloaded automatically to the Hugging Face cache (`~/.cache/huggingface/`). Most TTS and STT models are consolidated in the `nrl-ai/edgevox-models` repository, with automatic fallback to their upstream sources.
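The "consolidated repo first, upstream as fallback" strategy can be sketched with `huggingface_hub.hf_hub_download`. That function downloads a file into the Hugging Face cache, by default under `~/.cache/huggingface/hub` unless `HF_HOME` points elsewhere, and returns its local path. This is a sketch under assumptions. The function `fetch_model_file`, the repo ordering, and the placeholder filenames are hypothetical, and EdgeVox's own download code may differ.

```python
from typing import Callable, Optional, Sequence


def fetch_model_file(
    filename: str,
    repos: Sequence[str],
    download: Optional[Callable[..., str]] = None,
) -> str:
    """Try each repo in order and return the local path of the first hit.

    Hypothetical helper; `download` defaults to huggingface_hub.hf_hub_download.
    """
    if download is None:
        from huggingface_hub import hf_hub_download  # imported lazily
        download = hf_hub_download
    last_error: Optional[Exception] = None
    for repo_id in repos:
        try:
            return download(repo_id=repo_id, filename=filename)
        except Exception as exc:  # e.g. file missing from this repo, network error
            last_error = exc
    raise RuntimeError(f"could not fetch {filename!r} from {list(repos)}") from last_error


# Hypothetical usage with placeholder names; the consolidated repo is tried first:
# path = fetch_model_file("<file-in-repo>", ["nrl-ai/edgevox-models", "<upstream-repo>"])
```

Injecting `download` keeps the fallback logic testable without network access.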

Model	Source	Size
Whisper large-v3-turbo	`deepdml/faster-whisper-large-v3-turbo-ct2`	~1.5GB
Sherpa Zipformer (vi)	`nrl-ai/edgevox-models`	~30MB
Gemma 4 E2B IT	(local GGUF)	~2.5GB
Kokoro-82M	`nrl-ai/edgevox-models`	~338MB
Supertonic-2	`nrl-ai/edgevox-models`	~255MB
PyThaiTTS	`nrl-ai/edgevox-models`	~163MB
Piper voices	`nrl-ai/edgevox-models`	~50-100MB each
Silero VAD	`snakers4/silero-vad`	~2MB
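To budget disk space, the approximate sizes in the table can be summed for the components a given setup uses. The sketch below assumes an English pipeline on a machine where Whisper large-v3-turbo is auto-selected, together with Kokoro-82M, Silero VAD and the Gemma 4 E2B GGUF. The component keys are hypothetical labels. Piper voices are left out because their size is a range. The figures are the table's rough values, with 1GB treated as 1000MB, so the result is only an estimate.

```python
# Approximate sizes in MB from the model table (1GB treated as 1000MB).
APPROX_SIZE_MB = {
    "whisper-large-v3-turbo": 1500,
    "sherpa-zipformer-vi": 30,
    "gemma-4-e2b-it-gguf": 2500,
    "kokoro-82m": 338,
    "supertonic-2": 255,
    "pythaitts": 163,
    "silero-vad": 2,
}


def estimate_disk_mb(components):
    """Sum the approximate download sizes of the chosen components."""
    return sum(APPROX_SIZE_MB[name] for name in components)


english_setup = ["whisper-large-v3-turbo", "kokoro-82m", "silero-vad", "gemma-4-e2b-it-gguf"]
print(estimate_disk_mb(english_setup))  # 4340, i.e. roughly 4.3GB
```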
