In December 2022, NVIDIA published Riva Speech Skills release 2.8.1. The full release notes can be found on their official website. The new version of Riva includes support for new ASR models for different languages, as well as improvements in stability, quality, and latency/throughput.
In several projects at Data Monsters, we use the Conformer-CTC ASR model in streaming mode, where performance is critical. We therefore ran a series of tests on it.
The figure above shows a performance comparison of the Conformer-CTC model in streaming mode between Riva 2.8.1 and Riva 2.1.0, measured on a single GPU in each case: an NVIDIA Tesla V100 and an NVIDIA T4.
Measurements were performed on a pretrained model built with the default riva-build options in the high-throughput configuration, which uses a chunk size of 800 milliseconds.
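To give a feel for what an 800 millisecond chunk means on the client side, here is a minimal sketch that splits raw audio into chunks of that length. The 16 kHz sample rate and 16-bit mono PCM format are our assumptions for illustration, and the helper names chunk_size_bytes and iter_chunks are hypothetical; none of this is part of the Riva API.

```python
from typing import Iterator

CHUNK_MS = 800          # chunk size named in the text
SAMPLE_RATE_HZ = 16000  # assumption: 16 kHz mono audio
BYTES_PER_SAMPLE = 2    # assumption: 16-bit PCM samples


def chunk_size_bytes(chunk_ms: int = CHUNK_MS,
                     sample_rate_hz: int = SAMPLE_RATE_HZ,
                     bytes_per_sample: int = BYTES_PER_SAMPLE) -> int:
    """Return the number of bytes of raw PCM audio in one chunk."""
    samples_per_chunk = sample_rate_hz * chunk_ms // 1000
    return samples_per_chunk * bytes_per_sample


def iter_chunks(pcm: bytes, size: int) -> Iterator[bytes]:
    """Yield consecutive chunks of `size` bytes; the last one may be shorter."""
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


if __name__ == "__main__":
    two_seconds = bytes(2 * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)  # silence
    sizes = [len(c) for c in iter_chunks(two_seconds, chunk_size_bytes())]
    print(sizes)
```

Under these assumptions, two seconds of audio become two full chunks and one shorter final chunk. Larger chunks generally favor throughput over latency, which is why this configuration is described as high throughput.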
The figure shows a performance increase on both accelerators at every number of streams. The latency reduction in Riva 2.8.1, at the same RTFX and number of streams, becomes more noticeable as the number of streams grows.
For example, with 384 effective streams on NVIDIA Tesla V100, Riva 2.8.1 shows almost 280 milliseconds lower latency than Riva 2.1.0.
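For readers unfamiliar with RTFX, it is the inverse real-time factor: the number of seconds of audio processed per second of wall-clock time. The sketch below shows the arithmetic only. The function name rtfx and the numbers in the usage example are hypothetical and are not taken from our measurements.

```python
def rtfx(total_audio_seconds: float, wall_clock_seconds: float) -> float:
    """Inverse real-time factor: audio seconds processed per wall-clock second."""
    if wall_clock_seconds <= 0:
        raise ValueError("wall_clock_seconds must be positive")
    return total_audio_seconds / wall_clock_seconds


if __name__ == "__main__":
    # Hypothetical run: 384 streams, each carrying 60 s of audio,
    # all finished within 60 s of wall-clock time.
    streams = 384
    audio_per_stream_s = 60.0
    elapsed_s = 60.0
    print(rtfx(streams * audio_per_stream_s, elapsed_s))
```

When every stream is served in real time, RTFX is roughly equal to the number of effective streams, so comparing latency at the same RTFX amounts to comparing latency under the same load.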
The table above shows a quality comparison of the Conformer-CTC ASR model in streaming mode between Riva 2.8.1 and Riva 2.1.0.
Word Error Rate (WER) was measured with the same version of the source ASR model, fine-tuned in the NVIDIA NeMo toolkit, and with the same versions of the datasets, both clean and augmented with noise. We simply deployed the same model in different versions of Riva in streaming mode.
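As a reminder of what the metric measures, WER is the word-level edit distance (substitutions, deletions, and insertions) between the reference transcript and the recognized text, divided by the number of words in the reference. Below is a minimal sketch of that definition. It is not the evaluation script we used: the function name word_error_rate is hypothetical, and text normalization is reduced to whitespace splitting.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level Levenshtein distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

In the usage example, one of the six reference words is missing from the hypothesis, so the result is one sixth. Because the model and datasets are held fixed in our comparison, any difference in this number comes from the Riva deployment itself.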
Our experiments show significant quality improvements. Simply by deploying the same ASR model in Riva 2.8.1, WER drops by about 0.3% compared to Riva 2.1.0.
Based on these results, you can expect improvements in both performance and quality. When deciding whether to upgrade, however, you should also consider the other changes in the release.