WhisperJAV GUI User Guide¶

Screenshots from v1.8.6 · Windows 11

Table of Contents¶

Launching the App
Interface Overview
Adding Files
Choosing an Output Location
Basic Transcription
Advanced Options
Ensemble Mode (Two-Pass)
AI Subtitle Translation
Running a Job
Console Output
Menus & Dialogs
Keyboard Shortcuts
Common Workflows

WhisperJAV GUI overview

1. Launching the App¶

After installation, launch WhisperJAV from:

Desktop shortcut — created automatically by the installer
Start menu — search "WhisperJAV"
Command line — whisperjav-gui

On first launch, the app runs a preflight check to verify that FFmpeg, CUDA, and the required Python dependencies are available. Any issues are reported in the console at the bottom of the window.
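
For readers who want to run a similar check by hand, the following is a minimal sketch of what a preflight probe might look like. It is not WhisperJAV's actual implementation: the function name `preflight_report` is hypothetical, and using PyTorch to detect CUDA is an assumption about how GPU availability could be tested.

```python
import importlib.util
import shutil


def preflight_report() -> dict[str, bool]:
    """Return a name -> available mapping for a few external dependencies."""
    report = {
        # FFmpeg must be discoverable on PATH to extract audio.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # Only checks that the package is installed; it is not imported here.
        "torch": importlib.util.find_spec("torch") is not None,
        "cuda": False,
    }
    # Assumption: CUDA is probed through PyTorch when it is installed.
    if report["torch"]:
        try:
            import torch

            report["cuda"] = bool(torch.cuda.is_available())
        except Exception:
            # A broken install counts as "CUDA not available" in this sketch.
            report["cuda"] = False
    return report


if __name__ == "__main__":
    for name, ok in preflight_report().items():
        print(f"{name}: {'OK' if ok else 'MISSING'}")
```

If `cuda` comes back as missing, the "Accept CPU-only mode" option described in Section 6 is the relevant setting.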

2. Interface Overview¶

The GUI is divided into five areas, top to bottom:

Area	Purpose
Header bar	Theme switcher (4 themes) and update check button
Source	Add video/audio files to process
Destination	Where to save output SRT files
Options tabs	Four tabs of configuration (see sections 5–8)
Run controls & Console	Progress bar, Start/Cancel, and real-time log output

Themes¶

Click the palette icon in the header to cycle through themes:

Theme	Description
Default	Light theme with blue accents
Google	Material Design inspired
Carbon	IBM Carbon dark theme
Primer	GitHub-style neutral palette

3. Adding Files¶

WhisperJAV accepts video and audio files in any format FFmpeg supports (MP4, MKV, AVI, WAV, MP3, FLAC, M4B, etc.).

Adding files¶

Drag and drop files directly onto the file list area
Add File(s) button — opens a multi-select file dialog
Add Folder button — adds all media files from a folder (non-recursive)

Each file appears in the list with its filename and duration.
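
To make "non-recursive" concrete, here is a small sketch of how a folder could be scanned for media files without descending into subfolders. The extension set and the name `list_media_files` are illustrative assumptions; the app itself accepts any format FFmpeg can read and may select files differently.

```python
from pathlib import Path

# Illustrative subset of extensions, not the app's real filter.
MEDIA_EXTENSIONS = {".mp4", ".mkv", ".avi", ".wav", ".mp3", ".flac", ".m4b"}


def list_media_files(folder: str) -> list[Path]:
    """Return media files directly inside `folder`, ignoring subfolders."""
    return sorted(
        entry
        for entry in Path(folder).iterdir()  # iterdir() does not recurse
        if entry.is_file() and entry.suffix.lower() in MEDIA_EXTENSIONS
    )
```

Files inside subfolders are skipped, so add each subfolder separately if you need them.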

Managing the file list¶

Remove Selected — removes highlighted files from the list
Clear — removes all files

4. Choosing an Output Location¶

By default, SRT files are saved next to the source video. Uncheck "Save next to source video" to pick a custom output folder.

Setting	Behavior
Checked (default)	Output SRT saved in the same folder as each source video
Unchecked	All SRT files saved to the chosen output directory

When unchecked, use Browse to select a folder, or Open to view it in File Explorer.
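
The two behaviors in the table can be summarized with a short sketch. The function name `resolve_srt_path` and its parameters are hypothetical, and the real output filename may carry extra suffixes; the sketch only shows which folder is chosen.

```python
from pathlib import Path
from typing import Optional


def resolve_srt_path(
    source: str,
    save_next_to_source: bool = True,
    output_dir: Optional[str] = None,
) -> Path:
    """Pick the SRT location for one source file."""
    source_path = Path(source)
    srt_name = source_path.with_suffix(".srt").name
    if save_next_to_source:
        # Checked (default): same folder as the source video.
        return source_path.with_name(srt_name)
    if output_dir is None:
        raise ValueError("output_dir is required when not saving next to the source")
    # Unchecked: every SRT goes to the single chosen folder.
    return Path(output_dir) / srt_name
```

Note that with a single output folder, two source files with the same base name in different folders would map to the same SRT path.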

5. Basic Transcription¶

The Transcription Mode tab (Tab 1) controls the core transcription pipeline.

Transcription Mode tab with default settings

Mode¶

Selects the processing pipeline. Each mode makes a different trade-off between speed and accuracy.

Mode dropdown expanded showing all pipeline options

Mode	Backend	Scene Detection	VAD	Best For
Fidelity	Whisper	Yes	Full	Maximum accuracy, slow
Balanced (default)	Whisper	Yes	Yes	General use
Fast	Whisper	Yes	No	Quick results with scene awareness
Faster	Faster-Whisper	No	No	Maximum speed, minimal processing
Transformers	HuggingFace	Yes	Yes	Alternative backend

Sensitivity¶

Controls how aggressively the system detects and segments speech.

Sensitivity	Description
Aggressive (default)	Lower thresholds, captures more speech including quiet passages
Balanced	Middle ground
Conservative	Higher thresholds, fewer false positives, may miss quiet speech

Source Language¶

The language spoken in the video. Affects Whisper's language hint and post-processing rules.

Japanese (default) — optimized with Japanese-specific regrouping, particle detection, aizuchi handling
Korean, Chinese, English — standard Whisper processing

Subtitle Output¶

Native (default) — subtitles in the original spoken language
Direct-to-English — Whisper translates to English during transcription (lower quality than dedicated translation)

6. Advanced Options¶

The Advanced tab (Tab 2) provides additional controls for troubleshooting and fine-tuning.

Advanced Options tab

Option	Default	Description
Model override	Off	When checked, forces a specific Whisper model size instead of the pipeline default
Model dropdown	Large V2	Only visible when model override is checked. Options: Large V2, Large V3, Turbo
Output format	SRT	Output format: SRT, VTT, or Both
Async processing	Off	Enables asynchronous pipeline execution
Debug logging	Off	Writes detailed debug logs to `whisperjav.log`
Keep temp files	Off	Preserves intermediate audio chunks and processing artifacts
Custom temp dir	System default	When "Keep temp files" is on, optionally choose where to store them
Accept CPU-only mode	Off	Allows running without CUDA GPU (much slower, but works)

7. Ensemble Mode (Two-Pass)¶

The Ensemble tab (Tab 3) lets you run two different pipelines and merge their results for higher accuracy. This is the most powerful mode.

Ensemble Mode tab overview

How Ensemble Works¶

Pass 1 processes the video with one pipeline configuration
Pass 2 (optional) processes the same video with a different configuration
The two SRT outputs are merged using a configurable strategy

This leverages the strengths of different backends — for example, Whisper for timing accuracy and Qwen3-ASR for text quality.

Pass Configuration¶

Pass 1 is always active. Both passes have identical controls:

Control	Options	Default
Pipeline	Balanced, Fast, Faster, Fidelity, Transformers, Qwen3-ASR, ChronosJAV	Balanced
Sensitivity	Aggressive, Balanced, Conservative	Aggressive
Scene Detector	Auditok, Silero, Semantic, None	Semantic
Speech Enhancer	None, FFmpeg DSP, ZipEnhancer, ClearVoice, BS-RoFormer	None
Speech Segmenter	Silero v6.2, v4.0, v3.1, Whisper VAD, TEN, None	Silero v6.2
Model	Depends on pipeline — Large V2/V3/Turbo (Whisper) or 1.7B/0.6B (Qwen)	Pipeline default

Available Pipelines¶

The pipeline dropdown groups options by backend:

Pipeline dropdown showing all available backends

Whisper-Based: Balanced, Fast, Faster, Fidelity
HuggingFace: Transformers
ChronosJAV: Qwen3-ASR, Anime-Whisper

Customize Parameters¶

Click the Customize button on a pass to open the parameter tuning modal. This gives fine-grained control over model, quality, segmenter, enhancer, scene detection, and context parameters.

Customize Parameters modal — Quality tab

The modal has tabs for Model, Quality, Segmenter, Enhancer, and Scene settings. A badge shows DEFAULT or CUSTOM to indicate whether parameters have been modified.

Use Save Preset to save your configuration for reuse, or Load Preset to restore a saved configuration. Presets persist across sessions.

Pass 2 Configuration¶

Check the Pass 2 checkbox to enable the second pass. Controls are identical to Pass 1.

When disabled, the row is greyed out and all controls are inactive.

Speech Enhancement: FFmpeg DSP¶

When FFmpeg DSP is selected as the speech enhancer, an additional panel appears with 8 audio processing effects:

Effect	Description
Loudness Normalization	Normalize overall loudness to a standard level
Dynamic Normalization	Even out volume differences between quiet and loud sections
Compression	Reduce dynamic range
Denoise	Remove background noise
High-pass Filter	Remove low-frequency rumble
Low-pass Filter	Remove high-frequency hiss
De-esser	Reduce harsh sibilance (s/t sounds)
Amplify	Boost overall volume
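
Each effect in the table has a close counterpart among FFmpeg's standard audio filters. The sketch below shows how a selection of effects could be turned into an `-af` filter chain. The mapping, the parameter values (80 Hz, 8000 Hz, 1.5x gain), and the function names are assumptions for illustration; the filters and settings WhisperJAV actually applies are not documented here.

```python
# Assumed mapping from GUI effect names to standard FFmpeg audio filters.
EFFECT_FILTERS = {
    "Loudness Normalization": "loudnorm",
    "Dynamic Normalization": "dynaudnorm",
    "Compression": "acompressor",
    "Denoise": "afftdn",
    "High-pass Filter": "highpass=f=80",    # cutoff chosen for illustration
    "Low-pass Filter": "lowpass=f=8000",    # cutoff chosen for illustration
    "De-esser": "deesser",
    "Amplify": "volume=1.5",                # gain chosen for illustration
}


def build_filter_chain(enabled: list[str]) -> str:
    """Join the filters for the enabled effects, in table order."""
    return ",".join(f for name, f in EFFECT_FILTERS.items() if name in enabled)


def build_ffmpeg_command(src: str, dst: str, enabled: list[str]) -> list[str]:
    """Build an ffmpeg argument list; no -af is added when nothing is enabled."""
    cmd = ["ffmpeg", "-y", "-i", src]
    chain = build_filter_chain(enabled)
    if chain:
        cmd += ["-af", chain]
    return cmd + [dst]
```

For example, enabling Denoise and High-pass Filter yields the chain `afftdn,highpass=f=80`, which can be tested on a short clip from the command line before committing to a long job.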

Merge Strategy¶

When Pass 2 is enabled, choose how the two outputs are combined:

Strategy	Description
Pass 1 Primary (default)	Uses Pass 1 as the base, fills gaps from Pass 2
Smart Merge	Intelligently selects the best subtitle from each pass based on quality heuristics
Full Merge	Combines all subtitles from both passes, resolving overlaps
Longest	Picks the longer (more detailed) subtitle when passes overlap
Pass 2 Primary	Uses Pass 2 as the base, fills gaps from Pass 1
Pass 1 Overlap (30%)	Pass 1 base, requires 30% time overlap to merge from Pass 2
Pass 2 Overlap (30%)	Pass 2 base, requires 30% time overlap to merge from Pass 1
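
As a rough illustration of a "primary with gap filling" strategy, the sketch below keeps every subtitle from the primary pass and adds subtitles from the other pass only where they overlap nothing in the primary. This is one plausible reading of "fills gaps", not the app's actual merge logic; the `Cue` type and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def overlaps(a: Cue, b: Cue) -> bool:
    """True when the two time ranges share any duration."""
    return a.start < b.end and b.start < a.end


def merge_primary(primary: list[Cue], secondary: list[Cue]) -> list[Cue]:
    """Keep all primary cues; add secondary cues that overlap no primary cue."""
    fillers = [s for s in secondary if not any(overlaps(s, p) for p in primary)]
    return sorted(primary + fillers, key=lambda c: c.start)


pass1 = [Cue(0.0, 2.0, "A"), Cue(5.0, 7.0, "B")]
pass2 = [Cue(0.5, 2.5, "a"), Cue(3.0, 4.0, "x")]
merged = merge_primary(pass1, pass2)
print([c.text for c in merged])  # ['A', 'x', 'B']
```

Swapping the two arguments gives the corresponding "Pass 2 Primary" behavior under the same assumption.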

Serial Mode¶

Check "Finish each file" to complete each file fully (Pass 1 → Pass 2 → Merge) before starting the next. Useful when processing multiple files — you see results as they finish instead of waiting for the entire batch.
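
The difference in processing order can be sketched as follows. The stage names and the function `job_order` are hypothetical, and the batch ordering shown for the unchecked case (every file through one stage before the next stage begins) is an assumption based on the description above.

```python
from typing import Iterator

STAGES = ("pass1", "pass2", "merge")


def job_order(files: list[str], finish_each_file: bool) -> Iterator[tuple[str, str]]:
    """Yield (file, stage) pairs in the order they would be processed."""
    if finish_each_file:
        # Serial mode: one file goes through every stage before the next starts.
        for name in files:
            for stage in STAGES:
                yield name, stage
    else:
        # Assumed batch mode: each stage runs over all files in turn.
        for stage in STAGES:
            for name in files:
                yield name, stage
```

With two files, serial mode produces the first merged SRT after three steps, whereas the assumed batch order produces it after five.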

Inline AI Translation (Ensemble)¶

Check "AI-translate" (located after the Merge Strategy selector) to automatically translate the merged output. Checking it reveals an inline provider/model selector and a settings button.

8. AI Subtitle Translation¶

The AI SRT Translate tab (Tab 4) is a standalone tool for translating existing SRT files using AI language models.

AI SRT Translate tab — full view

Provider & Model¶

Provider	Notes
Local	Uses a local LLM server (llama-cpp). Free, private, no API key needed. Requires a GPU with about 8 GB of VRAM.
DeepSeek	Cloud API. Cost-effective, good quality for CJK languages.
Gemini	Google's API. Good multilingual support.
Claude	Anthropic's API. High quality, higher cost.
GPT	OpenAI's API. Widely available.
OpenRouter	Meta-router supporting many models.
GLM	Zhipu AI. Good for Chinese-related tasks.
Groq	Fast inference cloud provider.
Custom	Any OpenAI-compatible endpoint.

Each provider populates a Model dropdown with available models. Use Custom model override to specify a model ID not in the list.

API Key & Connection Test¶

For cloud providers, enter your API key and click Test Connection to verify it works. A status icon shows the result.

Green checkmark: connection successful
Red X: connection failed (check key and endpoint)

The Local provider does not require an API key — it starts a llama-cpp server automatically.

Language & Tone¶

Setting	Options	Default
Source Language	Japanese, Korean, Chinese	Japanese
Target Language	English, Chinese, Indonesian, Spanish	English
Tone/Style	Standard, Adult-Explicit	Standard

Standard tone produces clean, natural translations. Adult-Explicit uses specialized instructions tuned for JAV dialogue with appropriate vocabulary.

Advanced Settings¶

Click the collapsible Advanced Settings section to reveal additional options:

Setting	Default	Description
Movie Title	(empty)	Provides context to the AI for better translation
Actress Names	(empty)	Helps the AI correctly handle character names
Plot Summary	(empty)	Additional context for the AI translator
Scene Threshold	60 sec	How the translator groups subtitles into scenes for batch processing
Max Batch Size	30	Maximum subtitles per translation batch
Max Retries	3	Retry count for failed API calls
Rate Limit	(provider default)	Requests per minute limit
Custom Endpoint	(empty)	Override the default API endpoint URL
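
To show how Scene Threshold and Max Batch Size could interact, here is a minimal sketch. It assumes that a new scene starts whenever the silence between two consecutive subtitles exceeds the threshold, and that each scene is then cut into batches no larger than the maximum. Both the interpretation and the function names are assumptions, not the translator's documented behavior.

```python
Cue = tuple[float, float, str]  # (start seconds, end seconds, text)


def split_into_scenes(cues: list[Cue], scene_threshold: float = 60.0) -> list[list[Cue]]:
    """Start a new scene when the gap since the previous cue exceeds the threshold."""
    scenes: list[list[Cue]] = []
    for cue in cues:
        previous_end = scenes[-1][-1][1] if scenes else None
        if previous_end is not None and cue[0] - previous_end <= scene_threshold:
            scenes[-1].append(cue)
        else:
            scenes.append([cue])
    return scenes


def split_into_batches(
    cues: list[Cue],
    scene_threshold: float = 60.0,
    max_batch_size: int = 30,
) -> list[list[Cue]]:
    """Cut each scene into consecutive batches of at most max_batch_size cues."""
    batches: list[list[Cue]] = []
    for scene in split_into_scenes(cues, scene_threshold):
        for i in range(0, len(scene), max_batch_size):
            batches.append(scene[i:i + max_batch_size])
    return batches
```

Under these assumptions, a lower threshold produces more, smaller scenes, and a smaller batch size produces more API calls per scene.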

Translation Progress¶

Translation has its own progress bar and Start/Cancel buttons at the bottom of Tab 4. The main Run Controls section is hidden while Tab 4 is active.

9. Running a Job¶

Starting¶

Add one or more files (Section 3)
Configure options on the relevant tab
Click Start

The progress bar shows overall completion with a percentage. The status label describes the current stage (e.g., "Extracting audio...", "Transcribing scene 3/12...").

Cancelling¶

Click Cancel to stop the current job. The process terminates and any partial output is preserved.

When processing is running¶

All file selection and configuration controls are disabled
The Cancel button becomes active
Real-time progress appears in both the progress bar and the console

Completion¶

When finished, the status shows "Completed" and the SRT file path is printed in the console. The output SRT is ready to use with any media player.

10. Console Output¶

The console at the bottom shows real-time log messages from the processing pipeline.

Color	Meaning
Green	Success messages (file saved, processing complete)
Yellow	Warnings (fallback activated, parameter adjusted)
Red	Errors (file not found, CUDA failure)
White/Gray	Informational messages (progress, stage transitions)

Click Clear to reset the console output.

11. Menus & Dialogs¶

About Dialog¶

Press F1 or open it from the header bar. Shows version info, feature list, and keyboard shortcuts.

Update Check¶

Click the update button in the header to check for new versions.

Stable Release track — shows the latest published release, release notes, and an update button
Development track — shows how many commits ahead the dev branch is, with recent commit messages

Translation Settings Modal¶

Accessible from the Translation Settings button in the Ensemble tab's AI-translate row. Provides the same configuration as Tab 4 in a compact modal.

Translation Settings modal

12. Keyboard Shortcuts¶

Shortcut	Action
Ctrl+O	Open file dialog (add files)
Ctrl+R	Start processing
Escape	Cancel current job / close dialogs
F1	Open About dialog
Arrow keys	Navigate file list

13. Common Workflows¶

Quick Transcription (Fastest)¶

Drag a video file onto the app
Leave defaults (Balanced mode, Aggressive sensitivity, Japanese)
Click Start
SRT appears next to your video file

High-Quality Transcription (Ensemble)¶

Add files
Go to Ensemble tab
Pass 1: Balanced pipeline, Semantic scene detection (defaults)
Enable Pass 2: Select Qwen3-ASR pipeline
Merge Strategy: Smart Merge
Click Start
Two passes run sequentially, then results are merged

Transcribe + Translate in One Step¶

Add files
Go to Ensemble tab
Configure passes as desired
Check "AI-translate"
Select provider, enter API key if needed
Click Start
Get translated SRT automatically after transcription

Translate an Existing SRT¶

Go to AI SRT Translate tab (Tab 4)
Add your SRT file(s)
Select provider and model
Enter API key and test connection
Set target language
Click Start

CPU-Only Mode (No GPU)¶

Go to Advanced tab
Check "Accept CPU-only mode"
Use Faster mode for best speed without GPU
Processing will be significantly slower but functional