Qwen3-ASR Pipeline
Qwen3-ASR is an alternative ASR engine built on Alibaba's Qwen architecture rather than Whisper. It offers strong Japanese text quality.
Models
| Model | Size | Notes |
|---|---|---|
| Qwen3-ASR 1.7B | ~4 GB | Full model, best quality |
| Qwen3-ASR 0.6B | ~2 GB | Smaller, faster, slightly lower quality |
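The trade-off in the table can be expressed as a small selection helper. This is only a sketch: the sizes are the approximate figures from the table, and `MODELS` and `pick_model` are hypothetical names used for illustration, not part of WhisperJAV.

```python
from typing import Optional

# Approximate sizes in GB, copied from the table above (best quality first).
MODELS = [
    ("Qwen3-ASR 1.7B", 4.0),
    ("Qwen3-ASR 0.6B", 2.0),
]


def pick_model(budget_gb: float) -> Optional[str]:
    """Return the highest-quality model whose approximate size fits the budget."""
    for name, size_gb in MODELS:
        if size_gb <= budget_gb:
            return name
    return None  # Neither model fits within the given budget.


if __name__ == "__main__":
    for budget in (5.0, 3.0, 1.0):
        print(budget, "GB ->", pick_model(budget))
```

The budget is whatever resource you are constrained by; a real check would also leave headroom beyond the listed size.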
How to Use
GUI (Ensemble Tab)
- Go to the Ensemble tab
- Set Pipeline to Qwen3-ASR
- Select model size
- Click Start
CLI
As Ensemble Pass 2
Qwen3-ASR pairs well with Whisper-based pipelines:
- Pass 1: Balanced (Whisper — good timing)
- Pass 2: Qwen3-ASR (good text quality)
- Merge Strategy: Smart Merge
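The idea behind this pairing, keeping timing from one pass and text from the other, can be illustrated with a toy merge. This is not the Smart Merge algorithm, whose internals are not described here. It is a simplified sketch that assumes each pass yields segments with start and end times in seconds, and `Segment`, `overlap`, and `merge_timing_and_text` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str


def overlap(a: Segment, b: Segment) -> float:
    """Length in seconds of the time range shared by two segments."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def merge_timing_and_text(timing_pass: List[Segment],
                          text_pass: List[Segment]) -> List[Segment]:
    """Keep timestamps from timing_pass; take text from the best-overlapping
    text_pass segment, falling back to the original text if none overlaps."""
    merged = []
    for seg in timing_pass:
        best = max(text_pass, key=lambda other: overlap(seg, other), default=None)
        if best is not None and overlap(seg, best) > 0:
            merged.append(Segment(seg.start, seg.end, best.text))
        else:
            merged.append(seg)
    return merged


if __name__ == "__main__":
    pass1 = [Segment(0.0, 2.0, "a"), Segment(5.0, 6.0, "b")]
    pass2 = [Segment(0.1, 2.2, "A!")]
    for s in merge_timing_and_text(pass1, pass2):
        print(s)
```

A practical merge would also need to handle one text segment spanning several timing segments, which this sketch ignores.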
Requirements
- HuggingFace extra must be installed: pip install whisperjav[huggingface]
- Requires transformers and accelerate packages
- First run downloads the model from HuggingFace
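Before the first run, it can help to confirm that the required packages are importable. The following is a minimal sketch using only the Python standard library. `missing_packages` is a hypothetical helper name, and the check only tests for presence, not versions.

```python
import importlib.util
from typing import Iterable, List


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Package names taken from the requirements listed above.
    missing = missing_packages(["transformers", "accelerate"])
    if missing:
        print("Missing:", ", ".join(missing))
        print("Try: pip install whisperjav[huggingface]")
    else:
        print("HuggingFace dependencies found.")
```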
Strengths and Limitations
Strengths:
- Excellent Japanese text quality
- Good handling of casual/colloquial speech
- Strong punctuation and sentence structure
Limitations:
- Timing can differ from Whisper-based pipelines
- Apple Silicon: currently CPU-only (MPS is not yet supported for the forced aligner)
- Requires the HuggingFace extra to be installed
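The Apple Silicon limitation amounts to a device choice that skips MPS. The sketch below shows one way that choice could look. It is an illustration under the assumption that PyTorch is the backend, not WhisperJAV's actual device logic, and `choose_device` and `detect_device` are hypothetical names.

```python
def choose_device(cuda_available: bool, mps_available: bool = False) -> str:
    """Pick a device string; MPS is deliberately ignored and falls back to CPU."""
    if cuda_available:
        return "cuda"
    # mps_available is accepted but unused: per the limitation above,
    # Apple Silicon runs on CPU for now.
    return "cpu"


def detect_device() -> str:
    """Probe PyTorch if it is installed; otherwise assume CPU."""
    try:
        import torch
    except ImportError:
        return "cpu"
    mps = getattr(torch.backends, "mps", None)
    mps_available = bool(mps is not None and mps.is_available())
    return choose_device(torch.cuda.is_available(), mps_available)


if __name__ == "__main__":
    print("Selected device:", detect_device())
```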