Long-form speech recognition that keeps a full recording in context in a single pass.
Accepts up to 60 minutes of audio in one run, preserving speaker continuity and timestamps, and uses custom hotwords to keep domain-specific phrasing accurate.
- Structured output covering speaker, timestamp, and transcript content
- Native multilingual coverage across 50+ languages
- Public Playground plus Hugging Face access
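
To make the structured output concrete, here is a minimal Python sketch of how a consumer might hold and render segments that carry a speaker, timestamps, and transcript text. The `Segment` class, its field names, and the helper functions are hypothetical and chosen for illustration only; the model's actual output schema may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Segment:
    """One transcript segment. Field names are assumptions, not the model's schema."""
    speaker: str   # speaker label, e.g. "Speaker 1"
    start: float   # segment start, in seconds
    end: float     # segment end, in seconds
    text: str      # transcribed content


def format_timestamp(seconds: float) -> str:
    """Render seconds as MM:SS, which covers recordings up to 60 minutes."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def render(segments: list[Segment]) -> str:
    """Return one line per segment, ordered by start time."""
    ordered = sorted(segments, key=lambda s: s.start)
    return "\n".join(
        f"[{format_timestamp(s.start)}-{format_timestamp(s.end)}] {s.speaker}: {s.text}"
        for s in ordered
    )


def speaking_time(segments: list[Segment]) -> dict[str, float]:
    """Sum the duration of each speaker's segments, in seconds."""
    totals: dict[str, float] = {}
    for s in segments:
        totals[s.speaker] = totals.get(s.speaker, 0.0) + (s.end - s.start)
    return totals


# Illustrative data only; not real model output.
segments = [
    Segment("Speaker 2", 75.0, 80.5, "Sure."),
    Segment("Speaker 1", 0.0, 4.0, "Hello"),
]
print(render(segments))
print(speaking_time(segments))
```

Keeping speaker, timing, and text as separate fields lets downstream code sort, filter, or aggregate by any of them without re-parsing the transcript.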