FEATURES

Everything VoxisLive does, in depth.

VoxisLive is a Windows app that translates any audio playing on your PC into natural speech in your language, about two seconds behind the speaker. Here is every major capability, explained.

Driverless system-audio capture

VoxisLive reads your Windows audio mix directly through WASAPI process loopback. WASAPI (the Windows Audio Session API) is the same low-level interface that screen recorders use to capture what is playing. Capture needs no VB-CABLE, no virtual sound device and no audio routing. Install the app and it hears what you hear, immediately, on Windows 10 and 11.

Capture also excludes VoxisLive's own output, so the app never re-translates its own voice, even in a two-way conversation.
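The self-exclusion corresponds to a documented WASAPI option. The Python sketch below uses ctypes to mirror the Windows SDK's AUDIOCLIENT_ACTIVATION_PARAMS structure and fills it to request process-loopback capture of every process except the caller's own process tree. It only builds the parameter block. A real capture client would wrap it in a PROPVARIANT blob and pass it to ActivateAudioInterfaceAsync on the process-loopback virtual device, and that part is omitted here. That VoxisLive's engine works exactly this way is an assumption, and the helper name exclude_self_params is hypothetical.

```python
import ctypes
import os
from typing import Optional

# Enum values from the Windows SDK (AUDIOCLIENT_ACTIVATION_TYPE, PROCESS_LOOPBACK_MODE).
AUDIOCLIENT_ACTIVATION_TYPE_DEFAULT = 0
AUDIOCLIENT_ACTIVATION_TYPE_PROCESS_LOOPBACK = 1
PROCESS_LOOPBACK_MODE_INCLUDE_TARGET_PROCESS_TREE = 0
PROCESS_LOOPBACK_MODE_EXCLUDE_TARGET_PROCESS_TREE = 1


class AUDIOCLIENT_PROCESS_LOOPBACK_PARAMS(ctypes.Structure):
    _fields_ = [
        ("TargetProcessId", ctypes.c_uint32),   # DWORD
        ("ProcessLoopbackMode", ctypes.c_int),  # PROCESS_LOOPBACK_MODE enum
    ]


class _ActivationUnion(ctypes.Union):
    # The SDK declares this as an unnamed union with a single member.
    _fields_ = [("ProcessLoopbackParams", AUDIOCLIENT_PROCESS_LOOPBACK_PARAMS)]


class AUDIOCLIENT_ACTIVATION_PARAMS(ctypes.Structure):
    _anonymous_ = ("u",)  # lets callers write params.ProcessLoopbackParams directly
    _fields_ = [
        ("ActivationType", ctypes.c_int),  # AUDIOCLIENT_ACTIVATION_TYPE enum
        ("u", _ActivationUnion),
    ]


def exclude_self_params(pid: Optional[int] = None) -> AUDIOCLIENT_ACTIVATION_PARAMS:
    """Hypothetical helper: capture all audio except `pid` and its child processes."""
    params = AUDIOCLIENT_ACTIVATION_PARAMS()
    params.ActivationType = AUDIOCLIENT_ACTIVATION_TYPE_PROCESS_LOOPBACK
    loopback = params.ProcessLoopbackParams  # view into params' buffer, not a copy
    loopback.TargetProcessId = os.getpid() if pid is None else pid
    loopback.ProcessLoopbackMode = PROCESS_LOOPBACK_MODE_EXCLUDE_TARGET_PROCESS_TREE
    return params
```

Passing the translator's own process ID with the exclude mode is what lets it capture the whole mix, including a call app, without hearing its own translated voice.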

79 languages, swappable mid-session

Choose the language you hear and the language you speak from 79 options, and swap the pair in one click without stopping the session. Source-language auto-detection handles audio that mixes several languages.

Two-way meeting mode

Meeting mode runs two live sessions at once: the other party is translated into your language and played through your speakers, while your own speech is translated into theirs and sent to the call through a virtual microphone. It works alongside Teams, Zoom, Meet, Webex and Discord, and no bot ever appears in the participant list.
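Conceptually, the two directions are mirror images of each other. The sketch below is a hypothetical model, not VoxisLive's code: the Session type, its field names, the endpoint labels and the language codes are all illustrative assumptions. It shows that the outbound session is the inbound one with its language pair swapped (the same operation as the one-click swap) and its audio endpoints changed.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Session:
    """One live translation direction (hypothetical model)."""
    source_lang: str  # language being heard
    target_lang: str  # language the translated voice speaks
    capture: str      # where the input audio comes from
    playback: str     # where the translated voice is sent

    def swapped(self) -> "Session":
        # Swap the language pair without touching the audio routing.
        return replace(self, source_lang=self.target_lang, target_lang=self.source_lang)


def meeting_sessions(my_lang: str, their_lang: str) -> Tuple[Session, Session]:
    # Inbound: call audio, captured by loopback (which excludes the app's own
    # output), is translated into your language and played on your speakers.
    inbound = Session(their_lang, my_lang,
                      capture="system loopback", playback="speakers")
    # Outbound: your microphone is translated into their language and handed
    # to the conferencing app through a virtual microphone.
    outbound = replace(inbound.swapped(),
                       capture="physical microphone", playback="virtual microphone")
    return inbound, outbound
```

For example, meeting_sessions("en", "ja") yields an inbound Japanese-to-English session and an outbound English-to-Japanese one.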

Screenshot: VoxisLive Meeting mode translating both directions (light theme).

A native simultaneous interpreter, not a pipeline

Captured speech is streamed to a multimodal real-time model that recognizes, translates and re-speaks it in a single low-latency pass, much as a human interpreter works in a booth. It begins translating while the speaker is still talking and stays roughly two seconds behind; professional simultaneous interpreters typically work two to four seconds behind.
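Staying about two seconds behind depends on not waiting for the sentence to end. A minimal sketch, assuming the engine forwards captured audio as 16 kHz, 16-bit mono PCM in fixed 100 ms frames (all three values are assumptions, as are the helper names): each frame can be sent the moment it is filled, so the model receives the start of a sentence while the speaker is still saying the end of it.

```python
from typing import Iterator

SAMPLE_RATE_HZ = 16_000  # assumed capture rate after resampling
BYTES_PER_SAMPLE = 2     # assumed 16-bit mono PCM
FRAME_MS = 100           # assumed send interval


def frame_size_bytes(frame_ms: int = FRAME_MS) -> int:
    # Number of bytes that hold `frame_ms` milliseconds of audio.
    return SAMPLE_RATE_HZ * frame_ms // 1000 * BYTES_PER_SAMPLE


def stream_frames(pcm: bytes, frame_ms: int = FRAME_MS) -> Iterator[bytes]:
    """Yield fixed-length frames in order; the last frame may be shorter.

    A live engine would send each frame upstream as soon as it is yielded
    instead of collecting the whole utterance first.
    """
    size = frame_size_bytes(frame_ms)
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]
```

At these settings a frame is 3,200 bytes, and framing itself adds at most 100 ms of buffering, a small share of the roughly two-second lag; the rest comes from the model and the network.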

Psychoacoustic ducking

While the translated voice speaks, the original audio is automatically lowered, as in professional simultaneous interpretation, and restored when the translated line ends. You always know who is talking.
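Ducking is essentially a gain envelope on the original audio. The sketch below computes one gain value per audio frame: it ramps down to a ducked level while the translated voice is active and back up to full volume afterwards, so the change is gradual rather than a hard cut. The -12 dB depth, the attack and release lengths and the function name are illustrative assumptions, not VoxisLive's settings, and a straight-line ramp is used for simplicity.

```python
import math
from typing import Iterable, List


def ducking_gains(speaking: Iterable[bool], duck_db: float = -12.0,
                  attack_frames: int = 2, release_frames: int = 5) -> List[float]:
    """Per-frame linear gain to apply to the original audio (hypothetical).

    `speaking` holds one flag per frame, True while the translated voice plays.
    The gain ramps linearly down to the ducked level over `attack_frames`
    and back up to 1.0 over `release_frames`.
    """
    floor = math.pow(10.0, duck_db / 20.0)  # -12 dB is about 0.251 linear
    down_step = (1.0 - floor) / attack_frames
    up_step = (1.0 - floor) / release_frames
    gain = 1.0
    gains: List[float] = []
    for active in speaking:
        if active:
            gain = max(floor, gain - down_step)
        else:
            gain = min(1.0, gain + up_step)
        gains.append(gain)
    return gains
```

Multiplying each frame of the original mix by its gain before adding the translated voice produces the ducking effect.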

Live bilingual transcript & export

Every session produces a searchable two-column transcript, with each original line and its translation side by side. Export it as TXT, SRT or VTT when the session ends.
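SRT and WebVTT are both plain-text cue formats, which makes a bilingual export straightforward. The sketch below writes each transcript line as one cue, with the original text on the first line and the translation on the second; that layout, the Cue type and the function names are assumptions about how such an export could look, not VoxisLive's exact output. The visible differences between the two formats are SRT's numbered cues, VTT's WEBVTT header, and the millisecond separator (a comma in SRT, a period in VTT).

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Cue:
    """One transcript line (hypothetical model)."""
    start_ms: int    # cue start, milliseconds from session start
    end_ms: int      # cue end, milliseconds from session start
    original: str    # source-language text
    translated: str  # text in your language


def _timestamp(ms: int, sep: str) -> str:
    # HH:MM:SS, then `sep`, then milliseconds: "," for SRT, "." for VTT.
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}{sep}{millis:03d}"


def to_srt(cues: Iterable[Cue]) -> str:
    # SRT: numbered cues, comma before milliseconds, blank line between cues.
    blocks: List[str] = []
    for index, cue in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n"
            f"{_timestamp(cue.start_ms, ',')} --> {_timestamp(cue.end_ms, ',')}\n"
            f"{cue.original}\n{cue.translated}\n"
        )
    return "\n".join(blocks)


def to_vtt(cues: Iterable[Cue]) -> str:
    # WebVTT: "WEBVTT" header, period before milliseconds, cue numbers optional.
    blocks = [
        f"{_timestamp(cue.start_ms, '.')} --> {_timestamp(cue.end_ms, '.')}\n"
        f"{cue.original}\n{cue.translated}\n"
        for cue in cues
    ]
    return "WEBVTT\n\n" + "\n".join(blocks)
```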

On-screen subtitles, if you want them

An optional always-on-top overlay floats over any app or game and shows two-tier captions: the source line and its translation. The spoken voice is the product; captions are there when you need them.

Screenshot: VoxisLive translating a video with a live bilingual transcript.

Private by design

VoxisLive never joins your call as a participant and is not a browser bot. On-device voice-activity detection (VAD) keeps silence and non-speech audio on your machine; only detected speech segments are sent to the translation model, and no audio is retained after the session. The open-source bring-your-own-key (BYOK) build sends audio directly to Google under your own API key, and VoxisLive servers are never involved.
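A voice-activity gate sits between capture and the network. As a stand-in for whatever detector VoxisLive actually uses, the sketch below uses the simplest possible technique, an RMS energy threshold over 16-bit PCM frames, with a short hangover so the tails of words are not clipped. Unlike a real speech detector, an energy gate only separates silence from sound and would also pass music. The threshold, the hangover length and the function names are illustrative assumptions; only frames the gate yields would be sent to the translation model.

```python
import math
from array import array
from typing import Iterable, Iterator


def frame_rms(frame: bytes) -> float:
    """Root-mean-square level of one frame of 16-bit signed PCM (native byte order)."""
    samples = array("h", frame)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def speech_frames(frames: Iterable[bytes], threshold: float = 500.0,
                  hangover: int = 3) -> Iterator[bytes]:
    """Yield frames above `threshold`, plus up to `hangover` quieter frames after each.

    Everything else is dropped locally and never leaves the machine.
    """
    quiet_run = hangover + 1  # start in the "silence" state
    for frame in frames:
        if frame_rms(frame) >= threshold:
            quiet_run = 0
        else:
            quiet_run += 1
        if quiet_run <= hangover:
            yield frame
```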

Open-core

The desktop engine is open source on GitHub. Run it with your own Gemini API key at no charge from VoxisLive, audit the full audio pipeline, or install the managed app from the Microsoft Store with prepaid minutes and zero configuration.

FAQ

Common questions

01. Does VoxisLive need a virtual audio cable?
No. VoxisLive uses driverless WASAPI process loopback, built into Windows 10 and 11. There is no VB-CABLE, virtual audio driver or routing utility to install, and your audio setup is left unchanged.
02. Is the translation spoken or subtitles?
It is spoken. VoxisLive delivers real-time speech-to-speech translation in a natural voice. A live bilingual transcript, exportable as TXT, SRT or VTT, and an optional on-screen caption overlay are also available.
03. How far behind the speaker is the translation?
About two seconds, depending on utterance length and network latency. The model starts translating while the speaker is still talking instead of waiting for the sentence to end.
04. Which apps does it work with?
Anything that plays audio on Windows: browsers, desktop players, games, and conferencing apps like Teams, Zoom, Meet, Webex or Discord. Capture happens at the OS audio layer, so the source app is irrelevant.

Free to try · 10 minutes on us

Hear every language, in real time.

Runs on Windows 10 and 11 — no drivers, no setup ritual, no bot in your call.