Translate System Audio on Windows in Real Time

VoxisLive translates the sound coming out of your Windows PC — from any app — into your language in real time, and speaks it back aloud. You don't pick a file or paste a link. It taps the whole system audio mix as it plays and gives you a spoken translation, so whatever program is making noise right now becomes something you can understand.

Translate Any App's Audio — Not Just One File

Most translation tools make you commit to a source up front: upload this video, paste that URL, share this specific tab. VoxisLive works the opposite way. It listens to the Windows system audio output — the same mixed signal that reaches your speakers — so it doesn't care which application produced the sound. A browser playing a foreign news stream, a desktop video player, a Steam game with Japanese voice acting, a conferencing window, a podcast in a media app: if it plays through Windows, VoxisLive can translate it.

That means there's nothing to configure per source. You don't switch the app into a special mode or select a track. You start VoxisLive, choose one of the 79 supported target languages, and play whatever you want. The translation follows the audio when you move from one app to another, because to the capture layer all of it is just system sound.

How Driverless System-Audio Capture Works

VoxisLive captures the system mix using Windows WASAPI process-loopback, the built-in mechanism Windows exposes for recording what's already playing through your sound device. There is no VB-CABLE, no virtual audio driver, and no audio-routing utility to install — which is the key difference from older "record what you hear" setups that depended on a virtual cable. On Windows 10 and Windows 11 it simply works after install.

Process-loopback capture also lets VoxisLive exclude its own output from what it listens to, so the spoken translation it produces is never fed back in and re-translated, a problem that plagues naive loopback approaches. The captured speech is streamed to a native simultaneous interpreter model that recognises, translates, and re-speaks in one low-latency pass, staying just a few seconds behind the original speaker rather than waiting for a full sentence to finish.
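To illustrate why streaming keeps the delay short, here is a minimal sketch of cutting captured audio into small fixed-duration chunks that can be sent onward as they arrive, instead of buffering a whole sentence. It is not VoxisLive's actual code: the function name `iter_chunks`, the 16 kHz sample rate, and the 100 ms chunk length are all assumptions chosen for the example.

```python
from typing import Iterator, Sequence


def iter_chunks(
    samples: Sequence[int],
    sample_rate: int = 16_000,  # assumed rate, not a documented VoxisLive value
    chunk_ms: int = 100,        # assumed chunk length, not a documented value
) -> Iterator[Sequence[int]]:
    """Yield consecutive fixed-duration slices of a mono PCM sample sequence.

    The last slice may be shorter than the others if the input length is not
    a multiple of the chunk size.
    """
    frames_per_chunk = sample_rate * chunk_ms // 1000
    if frames_per_chunk <= 0:
        raise ValueError("sample_rate and chunk_ms must give at least one frame")
    for start in range(0, len(samples), frames_per_chunk):
        yield samples[start:start + frames_per_chunk]


if __name__ == "__main__":
    # One second of silence at the assumed rate becomes ten 100 ms chunks.
    one_second = [0] * 16_000
    print(sum(1 for _ in iter_chunks(one_second)))
```

Smaller chunks reduce the wait before the first audio reaches the model, at the cost of more messages per second.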

For one-way listening, the Video / Game mode ducks the original audio so the spoken translation sits clearly on top. For two-way conversations, Meeting mode runs a second session that translates your own speech into the other party's language through a virtual microphone. Nothing joins the call as a bot, and no participant sees an extra attendee. A deeper walkthrough of the pipeline lives on the how it works page, and the full range of scenarios is covered under use cases.
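Ducking, as used in Video / Game mode, can be pictured as a gain mix: the original audio is turned down whenever the translation is audible, then the two are summed. The sketch below is illustrative only and makes several assumptions that do not come from VoxisLive itself: the function name `duck_mix`, a duck gain of 0.25, mono float samples in the range -1.0 to 1.0, and a simple per-sample activity threshold.

```python
from typing import List, Sequence


def duck_mix(
    original: Sequence[float],
    translation: Sequence[float],
    duck_gain: float = 0.25,   # assumed attenuation for the original audio
    threshold: float = 1e-4,   # assumed level above which translation counts as active
) -> List[float]:
    """Mix two mono float streams, attenuating the original while the translation plays."""
    length = max(len(original), len(translation))
    mixed: List[float] = []
    for i in range(length):
        # Treat a stream that has ended as silence.
        o = original[i] if i < len(original) else 0.0
        t = translation[i] if i < len(translation) else 0.0
        gain = duck_gain if abs(t) > threshold else 1.0
        # Clamp the sum so it stays inside the valid sample range.
        mixed.append(max(-1.0, min(1.0, o * gain + t)))
    return mixed


if __name__ == "__main__":
    # The original is left alone while the translation is silent, then ducked.
    print(duck_mix([0.8, 0.8], [0.0, 0.5]))
```

A production mixer would normally smooth the gain change over tens of milliseconds rather than switching per sample, which avoids audible clicks. That smoothing is left out here to keep the idea visible.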

Spoken Output, Not On-Screen Text

The result you hear is a natural voice speaking your language, not a strip of captions across the screen, and that is what separates VoxisLive from subtitle-based tools. For comparison, StreamVox produces subtitles across 49+ languages, while VoxisLive delivers speech-to-speech translation across 79. A live transcript is still available alongside the audio and can be exported as TXT, SRT, or VTT with searchable history, but the thing you actually listen to is the translation itself, spoken aloud.
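For readers unfamiliar with the export formats, the sketch below shows how timed transcript segments map onto SRT and WebVTT text. It is a generic illustration of the two file formats, not VoxisLive's exporter: the `Segment` class and the `to_srt` and `to_vtt` functions are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Segment:
    """One transcript line with start and end times in seconds (hypothetical shape)."""
    start: float
    end: float
    text: str


def _timestamp(seconds: float, separator: str) -> str:
    """Format seconds as HH:MM:SS<separator>mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{millis:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """SRT: numbered cues, comma before the milliseconds, blank line between cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{_timestamp(seg.start, ',')} --> {_timestamp(seg.end, ',')}"
        cues.append(f"{index}\n{timing}\n{seg.text}\n")
    return "\n".join(cues)


def to_vtt(segments: Iterable[Segment]) -> str:
    """WebVTT: 'WEBVTT' header line, dot before the milliseconds, cue numbers optional."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        lines.append(f"{_timestamp(seg.start, '.')} --> {_timestamp(seg.end, '.')}")
        lines.append(seg.text)
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [Segment(0.0, 2.5, "Hello"), Segment(2.5, 5.0, "Welcome back")]
    print(to_srt(demo))
    print(to_vtt(demo))
```

The visible differences between the two formats are small: WebVTT starts with a header line and uses a dot in timestamps, while SRT numbers each cue and uses a comma.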

VoxisLive ships primarily through the Microsoft Store for a one-click install, and there is also a free, open-source build on GitHub where you bring your own API key. To get going on Windows 10 or 11, download VoxisLive, or check the pricing plans to pick a tier.

Common questions

Can VoxisLive translate audio from any app on Windows?

Yes. VoxisLive captures the whole Windows system audio mix, so it translates sound from any program that plays through your audio device — browsers, desktop video players, games, conferencing apps, and media players. You do not select a file or a single app; it translates whatever is currently playing.

Do I need a virtual audio cable to translate system sound?

No. VoxisLive uses driverless WASAPI process-loopback, the capture interface built into Windows 10 and 11. There is no VB-CABLE, no virtual audio driver, and no meeting bot to install. It also excludes its own output from the capture, so it never re-translates the voice it just spoke.

Does VoxisLive show subtitles or speak the translation?

It speaks the translation aloud in a natural voice in your target language — it is speech-to-speech, not subtitles. A live transcript is available on the side and can be exported as TXT, SRT, or VTT, but the primary output you hear is spoken audio.

How many languages can VoxisLive translate into?

VoxisLive translates into 79 target languages. The underlying model is a native simultaneous interpreter, so it translates as the speaker talks and stays only a few seconds behind the original audio.

Translate everything your PC plays, in real time.
