Automatic speech recognition (ASR) models can transcribe and translate speech into text. They can be used for AI assistants and conversational AI. We provide wrapper code around the ASR models so that they run efficiently in streaming mode. On the i.MX 95 MPU, we support Neutron NPU acceleration.
Models supported: Moonshine-tiny, Moonshine-base, Whisper-tiny.en, Whisper-base.en, Whisper-small.en, Whisper-medium.en
| Model | Model size [parameters] | Weights format | WER* [%] | Cores | Time To First Token [seconds] | Full transcription latency following X = 3 s of speech [seconds] | Full transcription latency following X = 6 s of speech [seconds] | Full transcription latency following X = 9 s of speech [seconds] | Library dependency |
|---|---|---|---|---|---|---|---|---|---|
| Moonshine-tiny | 27M | Q8 | 5.92 | 6x Cortex-A55 | 0.15 | 0.38 | 0.64 | 0.85 | ONNX |
| Moonshine-base | 61M | Q8 | 4.06 | 6x Cortex-A55 | 0.29 | 0.64 | 0.99 | 1.63 | ONNX |
| Whisper-tiny.en | 39M | Q8 | 7.30 | 6x Cortex-A55 | 0.17 | 0.37 | 0.56 | 0.74 | ONNX |
| Whisper-base.en | 74M | Q8 | 5.11 | 6x Cortex-A55 | 0.28 | 0.59 | 0.97 | 1.23 | ONNX |
| Whisper-small.en | 244M | Q8 | 3.76 | 6x Cortex-A55 | 0.69 | 1.36 | 2.17 | 2.92 | ONNX |
| Whisper-medium.en | 769M | Q8 | 3.93 | 6x Cortex-A55 | 2.06 | 3.72 | 5.60 | 8.21 | ONNX |
*Computed in streaming mode on the LibriSpeech test-clean set.
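
For readers unfamiliar with the WER column, the sketch below shows how word error rate is commonly computed: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the model output, divided by the number of reference words. This is an illustrative sketch only, not the evaluation code used to produce the table. The function name `word_error_rate` and the lowercase-and-split normalization are assumptions made for the example; real evaluations usually apply a more careful text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the word error rate in percent.

    Assumption: text is normalized by lowercasing and splitting on
    whitespace only; this is a simplification for illustration.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    hypothesis = "the cat sat on a mat"
    print(f"WER: {word_error_rate(reference, hypothesis):.2f}%")
```

In this usage example, one of the six reference words is substituted, so the result is one error per six words. Lower WER values in the table indicate more accurate transcription.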