Energy Use of Speech Algorithms on Intelligent Terminals


Abstract

Speech algorithms on intelligent terminals must operate under strict limits of power, memory, and heat. Many earlier studies focused on model size or FLOPs rather than direct energy use. In this study, we measured energy on smartphones, wearables, and embedded boards running automatic speech recognition (ASR) and keyword spotting (KWS). A total of 100 runs were carried out in laboratory and office conditions. The tested systems used quantization, pruning, and front-end subsampling, combined with DVFS and NPU scheduling. Results showed that per-inference energy dropped by 24.8% for ASR and 31.2% for KWS. Word error rate changed by no more than 0.2 absolute, and KWS F1 score declined by no more than 0.3 points. A regression model showed that model size and feature dimension explained most of the measured energy (R^2 = 0.80). These results show that direct energy measurement with hardware-aware design gives a clear and repeatable way to study speech processing on devices. Lightweight models linked with hardware security can support stable and low-power speech systems on the edge. The limits of this work are short test time, a small set of devices, and lack of far-field or multilingual speech, which future studies should address.
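The abstract does not describe the measurement pipeline, so the following is only a minimal sketch of the arithmetic behind the reported quantities. It assumes power is sampled over time during one inference and integrated with the trapezoidal rule. The function names and the sampling setup are hypothetical and are not taken from the study.

```python
from typing import Sequence


def energy_joules(timestamps_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Integrate sampled power (watts) over time (seconds) with the trapezoidal rule.

    Assumption: the samples cover exactly one inference run.
    """
    if len(timestamps_s) != len(power_w) or len(power_w) < 2:
        raise ValueError("need at least two matching (time, power) samples")
    total = 0.0
    for i in range(1, len(power_w)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


def relative_reduction_pct(baseline_j: float, optimized_j: float) -> float:
    """Percentage drop in per-inference energy relative to the baseline."""
    return 100.0 * (baseline_j - optimized_j) / baseline_j


def r_squared(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot
```

Under these assumptions, a figure such as the 24.8% ASR reduction corresponds to `relative_reduction_pct` applied to the baseline and optimized per-inference energies, and the reported R^2 corresponds to `r_squared` applied to the measured energies and the regression model's predictions.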
