Speaker recognition

Speaker authentication is important in all situations where people speak and need to be clearly recognised. This can be the case in human-machine interaction as well as in conversations between several people.

Just a few seconds of audio are enough for intelligent algorithms to identify the person speaking. Newly recorded data is compared with data that is already known in order to confirm or rule out a match. This makes it possible, for example, to detect whether the same person is speaking in different audio recordings.
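As an illustration of how such a comparison can work in principle, here is a minimal sketch. It assumes that each recording has already been converted into a fixed-length numeric vector (a "voice embedding") by some model, and it compares two vectors with cosine similarity. The function names, the example vectors and the threshold of 0.7 are hypothetical and chosen only for illustration; they do not describe the Fraunhofer IDMT implementation.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must not be all zeros")
    return dot / (norm_a * norm_b)


def same_speaker(known, new, threshold=0.7):
    """Decide whether two embeddings likely belong to the same speaker.

    The threshold is an assumed value; a real system would tune it on data.
    """
    return cosine_similarity(known, new) >= threshold


# Hypothetical embeddings: a stored voice, a new recording, another person.
known_voice = [0.9, 0.1, 0.3]
new_voice = [0.8, 0.2, 0.35]
other_voice = [-0.2, 0.9, -0.4]

print(same_speaker(known_voice, new_voice))
print(same_speaker(known_voice, other_voice))
```

A higher similarity score means the two recordings are more likely to come from the same person; where to place the threshold is a trade-off between false accepts and false rejects.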

By distinguishing the individual speakers, we can determine not only who is speaking at a given time, but also where in the recording each person speaks and how many people can be heard overall. In addition, we can identify the language spoken in the audio file.

In production

When it is necessary to identify exactly who is speaking in a production environment at a given moment, the intelligent algorithms of Fraunhofer IDMT in Oldenburg come into play. Especially when certain machines may only be operated by authorised users, it is important to know who is giving the command. If the machine recognises that the operator is unauthorised, it will not be activated.

To give additional people access, our speaker recognition system can create a new SpeakerID within a few seconds. The new operator can then also execute voice commands on the machine. In our industrial working group “Audio Technology for Intelligent Production AiP”, we work with industrial partners on practical applications for this technology.
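The workflow described here, enrolling a new SpeakerID and then accepting voice commands only from known operators, could be sketched as follows. This is a simplified illustration under stated assumptions: voices are represented as hypothetical embedding vectors, and `SpeakerRegistry`, `handle_command` and the threshold are invented names and values, not part of any Fraunhofer IDMT interface.

```python
import math


def _cosine(a, b):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


class SpeakerRegistry:
    """Stores one reference embedding per enrolled SpeakerID (hypothetical)."""

    def __init__(self, threshold=0.7):
        self.threshold = threshold  # assumed value, would be tuned in practice
        self.profiles = {}

    def enrol(self, speaker_id, embedding):
        """Create or overwrite the profile for a SpeakerID."""
        self.profiles[speaker_id] = list(embedding)

    def identify(self, embedding):
        """Return the best-matching SpeakerID, or None if nobody matches."""
        best_id, best_score = None, self.threshold
        for speaker_id, profile in self.profiles.items():
            score = _cosine(profile, embedding)
            if score >= best_score:
                best_id, best_score = speaker_id, score
        return best_id


def handle_command(registry, embedding, command):
    """Execute a voice command only if the speaker is enrolled."""
    speaker_id = registry.identify(embedding)
    if speaker_id is None:
        return "rejected: unknown speaker"
    return f"{speaker_id}: executing '{command}'"


registry = SpeakerRegistry()
registry.enrol("operator-1", [0.9, 0.1, 0.3])

print(handle_command(registry, [0.8, 0.2, 0.35], "start"))   # known voice
print(handle_command(registry, [-0.2, 0.9, -0.4], "start"))  # unknown voice

# Enrolling the second person gives them access as well.
registry.enrol("operator-2", [-0.2, 0.9, -0.4])
print(handle_command(registry, [-0.2, 0.9, -0.4], "start"))
```

In this sketch, enrolment is a single dictionary entry, which mirrors the idea that a new operator can be added quickly without changing the rest of the system.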


Identifying individual speakers also makes it possible, for example, to search media archives systematically. Recordings in which the same person is speaking can be found and filtered. It is also possible to determine how long each person speaks and, in this way, to filter for a specific speaker. If only content in a particular foreign language is needed, the intelligent algorithms can extract this as well.
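To illustrate the kind of archive query this enables, the following sketch assumes that a recording has already been split into labelled segments of the form (speaker, start, end) in seconds. The segment data, speaker labels and function names are hypothetical; the sketch only shows how speaking time per person and a per-speaker filter can be derived from such segments.

```python
from collections import defaultdict

# Hypothetical output of a speaker separation step: (speaker, start_s, end_s).
segments = [
    ("speaker_A", 0.0, 4.5),
    ("speaker_B", 4.5, 7.0),
    ("speaker_A", 7.0, 12.0),
]


def speaking_time(segments):
    """Sum the duration of all segments per speaker, in seconds."""
    totals = defaultdict(float)
    for speaker, start, end in segments:
        totals[speaker] += end - start
    return dict(totals)


def segments_for(segments, speaker):
    """Return only the (start, end) ranges in which the given speaker talks."""
    return [(start, end) for name, start, end in segments if name == speaker]


print(speaking_time(segments))
print(segments_for(segments, "speaker_A"))
```

The same segment list could carry a language label per segment, which would allow filtering by language in the same way.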

In security-critical areas

Similar to a fingerprint, it is possible to identify an individual person via their voice. In combination with other biometric identifiers, such as facial recognition, speaker recognition can be used in security-critical areas. For example, it can be used in forensics to determine the identity of a speaker in sound recordings.

In healthcare

Across groups, we at Fraunhofer IDMT are looking at a wide range of potential applications for our technologies. Speaker authentication can also be used to monitor speech and voice disorders and, at the same time, to check how speech therapy measures are progressing.

Voice-based user management in production

With our voice authentication, authorised users can control machines securely by voice command. In our video, we show how to create profiles for new users and how voices can be reliably assigned even in noisy environments. Subtitles are available.

We were able to determine that the same two people are speaking in all three audio samples. This is reflected in the high score value.
Three languages were identified in the sample audio files: German, Dutch and Norwegian.
The detail view shows the separated speech segments of the two speakers in red and green.
