Good speech intelligibility is fundamental in media production, broadcasting, and streaming services. However, everyone's hearing is different, so sound professionals often find it hard to assess objectively whether speech is sufficiently intelligible for listeners.
First analyze, then improve
Experts at the Fraunhofer Institute for Digital Media Technology IDMT in Oldenburg develop software solutions that, among other things, analyze and evaluate the intelligibility of speech in real time. On request, the algorithms either provide guidance for the audio mix or improve speech intelligibility automatically. This adds value in a range of industries, including radio and television, content provision, telecommunications, consumer electronics, and security.
Intelligent evaluation and separation of dialog and background ambience
Using machine learning methods, the Fraunhofer IDMT solutions automatically identify audio signals that contain speech. What sets them apart is that the subsequent quality assessment is not based on the signal-to-noise ratio (SNR), that is, the level ratio between speech and background, as is often the case, but on intelligibility itself. To this end, artificial intelligence estimates the listening effort required by the already mixed signal. If necessary, specially developed source separation algorithms are applied so that dialog can be emphasized even against complex backgrounds such as music and sound effects. Because this processing adds only minimal delay, the solutions can be used not only in pre-processing but also in live broadcasting operations or in end devices in the listener's home. Use directly on set or in sound reinforcement is also conceivable. In conferencing and telephony solutions, adaptive signal processing can automatically adjust the speech signal to the ambient noise. In addition, Fraunhofer IDMT is focusing further development on the reliable detection and separation of different speakers.
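For reference, the conventional level-based measure mentioned above can be sketched in a few lines. This is only an illustration of the standard SNR definition, not Fraunhofer IDMT's intelligibility model; the function names and the toy test signals are assumptions made for this example.

```python
import math

def power(samples):
    """Mean power of a signal: the average of the squared sample values."""
    return sum(s * s for s in samples) / len(samples)

def snr_db(speech, background):
    """Signal-to-noise ratio in decibels: 10 * log10(P_speech / P_background)."""
    return 10.0 * math.log10(power(speech) / power(background))

# Hypothetical toy signals: a 200 Hz tone stands in for speech,
# a 50 Hz hum at one tenth of the amplitude stands in for the background.
rate = 8000  # samples per second, one second of audio
speech = [0.5 * math.sin(2 * math.pi * 200 * n / rate) for n in range(rate)]
background = [0.05 * math.sin(2 * math.pi * 50 * n / rate) for n in range(rate)]

# A tenfold amplitude ratio corresponds to 20 dB.
print(round(snr_db(speech, background), 1))
```

Note that this calculation needs speech and background as separate signals, and it says nothing about how the content of the background (music, effects, other voices) affects understanding. Both points suggest why an intelligibility-based assessment of the already mixed signal can be more useful in practice.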