Contribution

DOAVINCI: Direction of Arrival based Videoconferencing Incorporating Neural Networks for Increased Conversational Intelligibility

* Presenting author
Day / Time: 20.03.2025, 11:00-11:40
Type: Poster
Information:

The posters will be exhibited in Hall E north from Tuesday to Thursday, sorted by thematic context in the poster island indicated in the session title. The poster session at the specified time offers the opportunity to enter into discussion with the authors.

Abstract: This paper introduces and evaluates DOAVINCI: direction-of-arrival-based videoconferencing that incorporates neural networks to enhance conversational intelligibility. It uses a spherical microphone array and a 360° camera to improve both audio and visual focus on active speakers. DOAVINCI employs deep-learning-based direction of arrival (DOA) estimation in the spherical harmonics domain, complemented by voice activity detection. The detected DOA steers a beamforming algorithm toward the active speaker, aiming to improve speech intelligibility by attenuating background noise. The DOA information also directs a zoomed and perspective-corrected view of the active speaker within the 360° video stream, aligning visual attention with auditory focus. The tool's effectiveness in enhancing speech intelligibility is evaluated using the Short-Time Objective Intelligibility (STOI) metric across different realistic scenarios, including varying signal-to-noise ratio (SNR) conditions.
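To illustrate how a DOA estimate could direct the view within the 360° video stream, the following minimal sketch maps an azimuth/elevation pair to the pixel at the center of the region to zoom into. It is not the authors' implementation. It assumes an equirectangular frame, azimuth 0° at the image center with positive azimuth toward the left, and elevation +90° at the top row; the function name `doa_to_equirect_pixel` and these conventions are hypothetical and would need to match the actual camera and array alignment.

```python
def doa_to_equirect_pixel(azimuth_deg, elevation_deg, width, height):
    """Map a DOA to (column, row) in an equirectangular frame.

    Assumed convention (not taken from the abstract): azimuth 0 deg is the
    image center, positive azimuth points left, elevation +90 deg is the
    top row and -90 deg is the bottom row.
    """
    # Wrap azimuth into [-180, 180) and clamp elevation to [-90, 90].
    az = (azimuth_deg + 180.0) % 360.0 - 180.0
    el = max(-90.0, min(90.0, elevation_deg))

    # Linear mapping from angles to continuous image coordinates.
    u = (0.5 - az / 360.0) * width
    v = (0.5 - el / 180.0) * height

    # Columns wrap around horizontally; rows are clamped to the frame.
    col = int(u) % width
    row = min(int(v), height - 1)
    return col, row


# Example: a speaker 90 deg to the left at ear level in a 3840x1920 frame.
center = doa_to_equirect_pixel(90.0, 0.0, 3840, 1920)
```

The returned pixel would serve as the center of the crop; the perspective correction itself (reprojecting the crop to a rectilinear view) is a separate step not shown here.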