Real-time assessment of listening effort using non-intrusive binaural prediction models
Abstract:
Prediction models of human speech perception can provide important insights into the factors affecting listening effort. In practical applications, such models can be used, e.g., for algorithm development and optimization, complementing time- and cost-intensive listening tests. Recent advances in machine learning have greatly extended the applicability of prediction models, especially with respect to their ability to derive predictions non-intrusively, i.e., without access to a reference signal. This contribution presents a model concept combining a binaural front-end, which mimics the effective auditory processing of spatially distributed sound sources, with a monaural back-end, which computes listening-effort predictions from the output of a phoneme classifier commonly employed in automatic speech recognition. The model was validated in a number of studies comparing model predictions to subjective evaluation data, covering applications such as monitoring dialogs in broadcast signals, evaluating synthetic speech engines and speech enhancement algorithms, and assessing different kinds of pathological speech. Furthermore, a real-time version of the model is presented which enables the investigation of highly dynamic acoustic scenes in which the user can interact with the environment. Current strengths and limitations of the model will be illustrated and discussed.