ChronosJAV Pipeline¶
ChronosJAV is a dedicated pipeline for anime and JAV content, built around speech models specifically trained on Japanese anime and adult content dialogue.
Inspired by the temporal-awareness approach in ChronusOmni (Chen et al., 2025).
Available Models¶
| Model | Size | Strengths |
|---|---|---|
| anime-whisper | ~4 GB | Best quality for anime/JAV dialogue. Fine-tuned Whisper large-v3. |
| Kotoba v2.1 | ~2 GB | Lighter weight with punctuation support. Good balance of speed and quality. |
| Kotoba v2.0 | ~2 GB | Lighter weight, no punctuation. Fastest of the three. |
Tip
Start with anime-whisper for best results. Switch to Kotoba if you need faster processing or have limited GPU memory.
How to Use¶
GUI¶
- Go to the Ensemble tab
- Set Pipeline to ChronosJAV
- Select a Model from the dropdown
- Click Start
As Part of Ensemble¶
For maximum quality, combine ChronosJAV with another pipeline:
- Pass 1: ChronosJAV with anime-whisper
- Pass 2: Qwen3-ASR or Balanced
- Merge Strategy: Smart Merge
Technical Details¶
ChronosJAV uses different defaults than the standard Whisper pipelines:
| Setting | ChronosJAV Default | Standard Default |
|---|---|---|
| Decoding | Greedy (beam=1) | Beam search (beam=5) |
| Speech Segmenter | TEN VAD | Silero v6.2 |
| Timestamp Mode | VAD-only | Full alignment |
| Cleaner | Passthrough | Standard sanitizer |
These defaults are optimized for anime/JAV content. The greedy decoding with TEN VAD segmentation produces tighter subtitle timing and eliminates oversized subtitle blocks.
First Run¶
On first use, the model is downloaded from HuggingFace (~2-4 GB depending on model). This is a one-time download — subsequent runs use the cached model.
Models are cached in your HuggingFace cache directory:
- Windows:
C:\Users\<you>\.cache\huggingface\ - macOS/Linux:
~/.cache/huggingface/