Automatic speech recognition (ASR) models transcribe and translate speech into text. They can be used in AI assistants and other conversational AI applications. We provide wrapper code that runs ASR models efficiently in streaming mode. On the i.MX 95 microprocessing unit (MPU), we support Neutron neural processing unit (NPU) acceleration.
Supported models: Moonshine-tiny, Moonshine-base, Whisper-tiny.en, Whisper-base.en, Whisper-small.en, and Whisper-medium.en.

| Model | Model size [parameters] | Weights format | WER* [%] | Cores | Time to first token [s] | Full transcription latency after 3 s of speech [s] | Full transcription latency after 6 s of speech [s] | Full transcription latency after 9 s of speech [s] | Library dependency |
|---|---|---|---|---|---|---|---|---|---|
| Moonshine-tiny | 27M | Q8 | 5.92 | 6x Cortex-A55 | 0.15 | 0.38 | 0.64 | 0.85 | ONNX |
| Moonshine-base | 61M | Q8 | 4.06 | 6x Cortex-A55 | 0.29 | 0.64 | 0.99 | 1.63 | ONNX |
| Whisper-tiny.en | 39M | Q8 | 7.30 | 6x Cortex-A55 | 0.17 | 0.37 | 0.56 | 0.74 | ONNX |
| Whisper-base.en | 74M | Q8 | 5.11 | 6x Cortex-A55 | 0.28 | 0.59 | 0.97 | 1.23 | ONNX |
| Whisper-small.en | 244M | Q8 | 3.76 | 6x Cortex-A55 | 0.69 | 1.36 | 2.17 | 2.92 | ONNX |
| Whisper-medium.en | 769M | Q8 | 3.93 | 6x Cortex-A55 | 2.06 | 3.72 | 5.60 | 8.21 | ONNX |

*Computed in streaming mode on the LibriSpeech test-clean set.
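
For readers unfamiliar with the WER column, the sketch below shows how a corpus-level word error rate is commonly computed: the word-level edit distance (substitutions, insertions, deletions) summed over all utterances, divided by the total number of reference words. It is an illustration only, not the evaluation script used for the table. The function names `word_edit_distance` and `corpus_wer` are hypothetical, and the text normalization (lowercasing and whitespace splitting) is an assumption; the normalization behind the reported numbers is not stated here.

```python
from typing import Iterable, List, Tuple


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance between two word sequences."""
    # prev[j] is the distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Corpus-level WER in percent over (reference, hypothesis) pairs."""
    errors = 0
    ref_words = 0
    for reference, hypothesis in pairs:
        # Assumed normalization: lowercase and split on whitespace.
        ref = reference.lower().split()
        hyp = hypothesis.lower().split()
        errors += word_edit_distance(ref, hyp)
        ref_words += len(ref)
    if ref_words == 0:
        raise ValueError("references contain no words")
    return 100.0 * errors / ref_words


if __name__ == "__main__":
    samples = [
        ("the cat sat on the mat", "the cat sat on mat"),  # 1 deletion
        ("hello world", "hello word"),                     # 1 substitution
    ]
    print(f"WER: {corpus_wer(samples):.2f} %")
```

Summing errors and reference words across the whole test set before dividing weights long utterances more heavily than averaging per-utterance rates would, which is the usual convention for reporting WER on a benchmark set.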