**VoxisLive for Gaming — Translate Game and Stream Audio in Real Time on Windows**
VoxisLive translates game audio, cutscenes, and live Twitch or YouTube streams into spoken English — or your language — in real time on Windows, with no virtual audio cable or driver install. WASAPI loopback captures any game's audio directly. Translation is spoken aloud, so you keep your eyes on the screen and your hands on the controller.
Playing a Japanese or Korean game that hasn't been localized yet
The global catalog of games available in Japanese, Korean, and Chinese dwarfs what Western publishers choose to localize. Thousands of RPGs, visual novels, strategy titles, and action games remain untranslated, either indefinitely or for years after the original release. Fan translation patches exist for some titles, but they require technical setup, they often lag behind official game updates, and they cover a fraction of the catalog.
VoxisLive approaches the problem from the audio output rather than the game files. If the game speaks its dialogue aloud, VoxisLive can translate that dialogue into your language and speak it back to you, in real time, while you play. No patch. No file modification. No ROM editing. You install VoxisLive on your Windows PC, open the game, and start a translation session.
How does VoxisLive translate game audio in real time on Windows?
VoxisLive uses Windows WASAPI loopback to read the audio your sound card is currently playing. This is the same low-level audio API that Windows uses internally — it requires no virtual audio cable, no additional audio driver, and no modification to the game itself. The moment a character speaks, an NPC delivers a line, or a cutscene begins, VoxisLive's on-device speech detection identifies the speech segment and passes it to Gemini Live for translation.
The translated audio comes back as a synthesized spoken voice and plays through a separate output channel — typically a second audio device or a headphone mix. You hear the translation spoken aloud. You do not read subtitles. You do not pause the game to consult a translation window. You keep playing.
The entire pipeline (WASAPI capture, on-device speech detection, Gemini Live translation, spoken output) runs with low enough latency that, in most games, a voiced dialogue line is translated before the character's animation ends. Unvoiced text is not translated because there is no audio signal to capture; VoxisLive is an audio translation tool, not an OCR engine.
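To make the speech detection step more concrete, here is a minimal sketch of an energy-based detector that finds speech segments in captured PCM audio. It is an illustration only: the page does not describe how VoxisLive's on-device detector works, and the sample rate, frame length, threshold, and all function names below are assumptions chosen for the example.

```python
import math
import struct

# All constants and names here are hypothetical, not VoxisLive internals.
SAMPLE_RATE = 16_000                              # assumed capture rate in Hz
FRAME_MS = 20                                     # analysis window length
FRAME_SAMPLES = SAMPLE_RATE * FRAME_MS // 1000    # samples per frame


def frame_rms(frame: bytes) -> float:
    """Root-mean-square level of one frame of 16-bit little-endian mono PCM."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[:count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def find_speech_segments(pcm: bytes, threshold: float = 500.0,
                         hangover_frames: int = 5):
    """Return (start_ms, end_ms) spans whose energy reaches the threshold.

    hangover_frames keeps a segment open across short pauses, so a brief
    gap between words does not split one line of dialogue in two.
    """
    frame_bytes = FRAME_SAMPLES * 2
    n_frames = len(pcm) // frame_bytes
    segments = []
    start = None      # index of the first loud frame in the open segment
    quiet = 0         # consecutive quiet frames seen inside the open segment
    for i in range(n_frames):
        frame = pcm[i * frame_bytes:(i + 1) * frame_bytes]
        if frame_rms(frame) >= threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet > hangover_frames:
                end = i - quiet + 1               # index of first quiet frame
                segments.append((start * FRAME_MS, end * FRAME_MS))
                start, quiet = None, 0
    if start is not None:                         # audio ended mid-segment
        segments.append((start * FRAME_MS, (n_frames - quiet) * FRAME_MS))
    return segments


def synth_tone(ms: int, amplitude: int = 8000, hz: float = 440.0) -> bytes:
    """Test helper: a sine tone standing in for speech (amplitude 0 = silence)."""
    n = SAMPLE_RATE * ms // 1000
    samples = [int(amplitude * math.sin(2 * math.pi * hz * k / SAMPLE_RATE))
               for k in range(n)]
    return struct.pack(f"<{n}h", *samples)
```

In a real pipeline, each detected segment would be the unit handed to the translation service. A production detector would likely use a trained voice-activity model rather than a fixed energy threshold, because game music and sound effects are also loud.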
Why does speaking the translation matter more than showing subtitles?
Every subtitling or overlay-based translation tool for games presents the same trade-off: reading takes attention that would otherwise go to the screen. In an action game, a fighting game, or any title with moment-to-moment spatial demands, moving your gaze to a subtitle bar costs you positional awareness, reaction time, and immersion simultaneously.
VoxisLive delivers translation as audio because audio is processed through a different cognitive channel than vision. You can hear a translated dialogue line without looking anywhere other than where you already are. For games where listening to the original voice performance matters — a story-driven RPG, a voiced adventure game, a dialogue-heavy JRPG — you can run the original audio and the translated audio simultaneously and let your brain process both without switching focus.
For streamers, the spoken translation creates a different opportunity: if you route the translated output into your broadcast mix, viewers hear the translated content through your stream without any overlay cluttering the capture.
Can VoxisLive translate a Twitch or YouTube stream in real time?
Yes. VoxisLive does not distinguish between a game, a video file, a Twitch stream, or a YouTube video. All of them produce audio through your Windows sound card, and WASAPI loopback captures all of them identically. If you are watching a Japanese-language Twitch stream, a Korean esports broadcast, or a Spanish-language YouTube gaming channel, open VoxisLive, select your system audio output as the capture source, set your translation language, and start the session.
This also applies to video-on-demand content: any streaming service whose audio plays through your Windows output can be translated. The constraint is audio quality — compressed or low-bitrate streams may produce lower-accuracy speech detection — but for any stream at standard broadcast quality, the translation pipeline performs the same as it does on local game audio.
Does VoxisLive require a virtual audio cable or special audio driver?
No. Virtual audio cables (software tools like VB-Audio's VB-CABLE or Virtual Audio Cable) work by inserting a virtual audio device into Windows that routes audio between applications. They require driver installation, persist as system-level software, and add a routing step that introduces latency and occasional compatibility issues with games that use exclusive audio mode or anti-cheat systems.
VoxisLive bypasses all of that. WASAPI loopback is a native Windows API available since Windows Vista. It reads from the audio mix your output device is already rendering, with no driver required, no routing change, and no new device appearing in your audio settings. VoxisLive installs as a standard application and uninstalls completely. Your audio setup is unchanged.
This matters in gaming specifically because many competitive games, launchers, or anti-cheat systems flag virtual audio devices as unusual or potentially adversarial software. VoxisLive does not install a driver of any kind.
What games and languages work with VoxisLive?
Any game that runs on Windows and produces voiced audio can be used with VoxisLive. The app captures at the OS audio layer, so game engine, launcher, DRM system, and store platform are all irrelevant. This includes Steam, Epic Games Store, GOG, Microsoft Store, and games run directly from executables.
Language support is determined by Gemini Live's supported languages, which include Japanese, Korean, Mandarin Chinese, Spanish, French, German, Portuguese, Italian, Russian, Arabic, Hindi, and other major languages. Translation direction is fully configurable: you can translate from Japanese to English, Korean to English, or between any two supported languages.
VoxisLive also works with VRChat, social VR environments, and voice-chat-heavy multiplayer games where other players may be speaking a language you don't understand — the same WASAPI loopback that captures game audio captures voice chat audio routed through the same output device.
Getting started with game audio translation on VoxisLive
Download the Windows installer from the download page and run it. No reboot is required and no driver installs. Open VoxisLive and select your playback device as the audio source. If you are on the free Developer plan, enter your Gemini API key. If you have a Creator or Pro subscription, your managed minutes are ready with no additional configuration. Select your translation language pair and start the session, then launch your game.
For most games, VoxisLive handles the full translation loop automatically. The how it works page explains the WASAPI capture and Gemini Live pipeline in detail if you want to understand the architecture before committing. Translation plans, minute allocations, and the BYOK vs. managed-minutes options are covered on the pricing page. The VoxisLive home page has a full overview of all the audio sources the app supports beyond gaming.
Common questions
Can VoxisLive translate a game that uses Japanese voice acting but has no English dub?
A: Yes, as long as the game has voiced dialogue. VoxisLive captures whatever audio your Windows sound card is playing using WASAPI loopback. If Japanese voice lines are playing, VoxisLive detects the speech, translates it with Gemini Live, and speaks the translation back in English (or any supported target language). No game modification, patch, or file access is needed.
Will VoxisLive conflict with my game's anti-cheat software?
A: VoxisLive does not inject code into game processes, does not install a driver, and does not create a virtual audio device. It reads from the Windows WASAPI loopback API — the same API Windows itself uses. Because VoxisLive operates entirely outside the game process at the OS audio level, it presents no surface that anti-cheat systems monitor for injection or hooking.
Can I use VoxisLive while streaming to Twitch with OBS?
A: Yes. VoxisLive runs as a background application and does not alter your audio routing or add a device to your Windows audio stack. OBS and your streaming software capture audio from the same devices they always have. VoxisLive's translated output goes to a separate audio channel (your headphones or a secondary device), so it does not appear in your stream capture unless you explicitly route it there.
How much translation time does gaming use, and which plan makes sense?
A: Consumption depends on how much voiced audio your game contains. A heavily voiced JRPG with continuous dialogue will consume more minutes per hour than a game with sparse cutscenes. The Creator plan (700 managed minutes per month, $19/mo) is suitable for casual-to-moderate gaming sessions. The Pro plan (1,500 minutes, $39/mo) covers heavier use and includes a commercial license for streamers who translate content as part of their broadcast. The BYOK Developer plan uses your own Gemini API key billed directly by Google, making it cost-effective for high-volume users who want direct cost visibility. Full plan details are on the pricing page.
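As a rough way to compare the plans above, the following sketch turns the quoted allowances into hours of play and cost per managed minute. It assumes that minutes are consumed only while voiced audio is being translated, which this page implies but does not state, and the voiced-minutes-per-hour figures in the usage note are hypothetical examples rather than measurements.

```python
# Plan figures are taken from the answer above; everything else is illustrative.
PLANS = {
    "Creator": {"minutes": 700, "usd_per_month": 19},
    "Pro": {"minutes": 1500, "usd_per_month": 39},
}


def hours_of_play(plan_minutes: float, voiced_minutes_per_hour: float) -> float:
    """Hours of gameplay a monthly allowance covers.

    Assumes (hypothetically) that only voiced audio consumes minutes.
    """
    if voiced_minutes_per_hour <= 0:
        raise ValueError("voiced_minutes_per_hour must be positive")
    return plan_minutes / voiced_minutes_per_hour


def usd_per_minute(plan_name: str) -> float:
    """Effective price of one managed minute on the named plan."""
    plan = PLANS[plan_name]
    return plan["usd_per_month"] / plan["minutes"]
```

For example, if a dialogue-heavy game averaged 40 voiced minutes per hour, `hours_of_play(700, 40)` gives 17.5 hours on Creator and `hours_of_play(1500, 40)` gives 37.5 hours on Pro. A game with 10 voiced minutes per hour would stretch the Creator allowance to 70 hours. Per minute, the two plans work out to roughly $0.027 (Creator) and $0.026 (Pro).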
---
Hear every language, in real time.