On April 29th, NVIDIA published Riva Speech Skills release 2.1.0 (full release notes can be found here). Among other features, it improves the latency and throughput of Conformer ASR.
This improvement caught our attention. On one of our current projects at Data Monsters, we use the Conformer-CTC model in streaming mode, and performance is a major concern. We therefore ran a series of tests on it.
The figure above compares the performance of the Conformer-CTC model in streaming mode between Riva 2.1.0 and Riva 1.10.0-beta, measured on a single Tesla V100 GPU.
Measurements were performed on a pretrained model with default riva-build options.
The model was built on Riva 1.10.0-beta and on Riva 2.1.0 in three configurations: ‘low_latency’, ‘intermediate’, and ‘high_throughput’, which correspond to chunk sizes of 160, 400, and 800 respectively.
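To give a feel for what the three chunk sizes mean for a streaming client, here is a minimal sketch. It assumes the chunk sizes are in milliseconds and that the audio is 16 kHz mono; neither is stated above, and the names `CHUNK_SIZES_MS`, `samples_per_chunk`, and `chunks_per_second` are ours for illustration, not part of Riva.

```python
# Assumptions (not stated in the article): chunk sizes are in milliseconds
# and the audio is sampled at 16 kHz, mono.
SAMPLE_RATE_HZ = 16_000

# Configuration name -> chunk size, as listed in the article.
CHUNK_SIZES_MS = {
    "low_latency": 160,
    "intermediate": 400,
    "high_throughput": 800,
}


def samples_per_chunk(chunk_ms: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of audio samples carried by one chunk."""
    return sample_rate_hz * chunk_ms // 1000


def chunks_per_second(chunk_ms: int) -> float:
    """How many chunks a single real-time stream produces per second."""
    return 1000 / chunk_ms


for name, chunk_ms in CHUNK_SIZES_MS.items():
    print(
        f"{name}: {samples_per_chunk(chunk_ms)} samples per chunk, "
        f"{chunks_per_second(chunk_ms):.2f} chunks per second per stream"
    )
```

Under these assumptions, a smaller chunk means the server is called more often per stream with less audio each time, which is why the same model trades throughput for latency across the three configurations.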
The figure shows a performance gain for each configuration, and the gain becomes more noticeable as the number of concurrent streams grows.
Up to about 20 streams, there is no significant difference. Beyond that, Riva 2.1.0 shows lower latency at the same RTFX and the same number of streams.
For example, the high_throughput model with 128 audio streams has about 100 milliseconds lower latency in Riva 2.1.0 than the same model with the same number of streams in Riva 1.10.0-beta.
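For readers who want to reproduce this kind of comparison from their own logs, the sketch below shows one way to compute the two metrics. It assumes RTFX is defined as the total duration of audio transcribed divided by the wall-clock time taken, and it uses a simple nearest-rank percentile; the function names `rtfx` and `latency_summary` are hypothetical helpers, not part of any Riva client.

```python
import math
import statistics
from typing import Dict, Sequence


def rtfx(audio_durations_s: Sequence[float], wall_clock_s: float) -> float:
    """Real-time factor: seconds of audio transcribed per second of wall-clock time.

    audio_durations_s holds the duration of every stream processed in the run.
    """
    if wall_clock_s <= 0:
        raise ValueError("wall_clock_s must be positive")
    return sum(audio_durations_s) / wall_clock_s


def latency_summary(latencies_ms: Sequence[float]) -> Dict[str, float]:
    """Average and nearest-rank percentiles of per-response latencies."""
    if not latencies_ms:
        raise ValueError("latencies_ms must not be empty")
    ordered = sorted(latencies_ms)

    def percentile(p: float) -> float:
        # Nearest-rank method: smallest value with at least p% of samples at or below it.
        rank = max(1, math.ceil(p / 100 * len(ordered)))
        return ordered[rank - 1]

    return {
        "avg": statistics.fmean(ordered),
        "p50": percentile(50),
        "p90": percentile(90),
        "p95": percentile(95),
        "p99": percentile(99),
    }
```

Running both helpers on logs collected from each Riva version at the same number of streams gives values that can be compared side by side, as in the figure.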
If you run more than 20 concurrent streams, you should see a clear improvement in performance. In any case, when deciding whether to upgrade, you should consider the other changes in the release as well.