Audio and Visual Content Analysis

Research

Analyzing media content and making it accessible

Using and exploiting audiovisual content depends on the availability of meaningful metadata, that is, data describing data. Metadata provides the basis for locating, organizing, and classifying specific content, as well as for implementing recommendation systems. Technologies for the automatic extraction of metadata are therefore crucial to making media content truly accessible and usable.

Multimodal analysis and annotation of media data

Developing technologies for the automatic analysis and annotation of audiovisual data requires a solid understanding of signal processing and machine learning, along with a good grasp of the requirements of the application at hand.

Another challenge lies in multimodal analysis and orchestration: extracting metadata from audio, video, and image files involves a variety of processes ranging from preprocessing to feature extraction and classification. Different methods and technologies are employed, requiring flexible integration and orchestration. The integration of heterogeneous data from different sources and formats also requires the selection or development of suitable data models and metadata standards. Media archives often deal with large volumes of data, imposing specific requirements on system architecture, efficiency, and the optimization of the algorithms used.
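The chain from preprocessing to feature extraction and classification can be sketched as a list of interchangeable stages. This is a minimal illustration only, not the architecture used in our systems: the names `MediaItem`, `Stage`, and `run_pipeline`, the toy energy feature, and the 0.25 threshold are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MediaItem:
    """A media file plus the metadata accumulated by the analysis stages (hypothetical model)."""
    path: str
    samples: List[float]
    metadata: Dict[str, object] = field(default_factory=dict)


# A stage takes an item and returns it, possibly with new metadata attached.
Stage = Callable[[MediaItem], MediaItem]


def normalize(item: MediaItem) -> MediaItem:
    """Preprocessing: scale samples so that the peak absolute value is 1.0."""
    peak = max((abs(s) for s in item.samples), default=0.0)
    if peak > 0:
        item.samples = [s / peak for s in item.samples]
    return item


def extract_energy(item: MediaItem) -> MediaItem:
    """Feature extraction: mean squared amplitude as a toy feature."""
    n = len(item.samples)
    item.metadata["energy"] = sum(s * s for s in item.samples) / n if n else 0.0
    return item


def classify_loudness(item: MediaItem) -> MediaItem:
    """Classification: a toy threshold rule on the extracted feature (threshold is assumed)."""
    item.metadata["label"] = "loud" if item.metadata["energy"] >= 0.25 else "quiet"
    return item


def run_pipeline(item: MediaItem, stages: List[Stage]) -> MediaItem:
    """Orchestration: apply the configured stages in order."""
    for stage in stages:
        item = stage(item)
    return item


item = run_pipeline(
    MediaItem("clip.wav", [0.0, 0.5, -1.0, 0.5]),
    [normalize, extract_energy, classify_loudness],
)
print(item.metadata)
```

Because every stage shares one signature, stages can be reordered, replaced, or configured per media type without changing the orchestration code.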

Furthermore, we work on metadata standards and on the integration and orchestration of analysis components. We also address privacy concerns and other aspects of trustworthy AI, aiming to provide comprehensive solutions for specific application requirements.

Research areas "Audio and Visual Content Analysis"

Automatic Music Analysis

The focus is on recognizing musical features such as pitch, rhythm, timbre, and genre, extending to music transcription. These technologies enable music classification, similarity analysis between musical pieces, and the detection of specific sound events and acoustic environments.

Video Analysis

In video analysis, the focus is on faces: face recognition and tracking make it possible to analyze and identify human faces in videos. Additionally, image processing techniques and machine learning are used to detect and classify animals in videos.

Provenance Analysis and Matching

Detecting recurring patterns, reuse of media content, and the transformation steps between different content items provides insight into the origin and processing of that content.
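The idea of detecting reuse can be illustrated with a deliberately simplified sketch that looks for identical windows in two sequences of quantized feature values. It is an assumption-laden toy, not the matching method used in our work: the function names, the integer "features", and the exact-match criterion are hypothetical, and practical fingerprinting has to tolerate noise and transformations.

```python
from typing import Dict, List, Sequence, Tuple


def window_index(seq: Sequence[int], size: int) -> Dict[Tuple[int, ...], List[int]]:
    """Map every window of `size` consecutive values to the offsets where it occurs."""
    index: Dict[Tuple[int, ...], List[int]] = {}
    for i in range(len(seq) - size + 1):
        index.setdefault(tuple(seq[i:i + size]), []).append(i)
    return index


def find_reuse(source: Sequence[int], candidate: Sequence[int], size: int = 4) -> List[Tuple[int, int]]:
    """Return (source_offset, candidate_offset) pairs whose windows match exactly."""
    index = window_index(source, size)
    matches: List[Tuple[int, int]] = []
    for j in range(len(candidate) - size + 1):
        for i in index.get(tuple(candidate[j:j + size]), []):
            matches.append((i, j))
    return matches


# Hypothetical quantized feature sequences for two media items.
source = [3, 1, 4, 1, 5, 9, 2, 6]
candidate = [7, 7, 1, 5, 9, 2, 8]
matches = find_reuse(source, candidate)
print(matches)  # [(3, 2)]
```

The single match says that the material at offset 3 of the source reappears at offset 2 of the candidate, which is the kind of evidence that hints at a shared origin.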

Multimodal and Crossmodal Analysis

In many use cases, the methods described above can be combined with one another or complemented by other analysis methods, such as metadata analysis, to achieve the best results. This requires suitable interfaces, a common data model, and flexible orchestration and configuration of the analysis components used.
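What a common data model makes possible can be sketched with time-stamped annotations that share one structure regardless of the component that produced them. The `Annotation` fields and the `cross_modal_pairs` helper below are hypothetical and chosen only for illustration; they do not describe a specific metadata standard.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Annotation:
    """One analysis result on a shared timeline (hypothetical common data model)."""
    modality: str      # e.g. "audio" or "video"
    label: str
    start: float       # seconds
    end: float         # seconds, assumed to be greater than start
    confidence: float


def cross_modal_pairs(annotations: List[Annotation]) -> List[Tuple[Annotation, Annotation]]:
    """Return pairs of annotations from different modalities that overlap in time."""
    ordered = sorted(annotations, key=lambda a: a.start)
    pairs = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b.start >= a.end:
                break  # sorted by start, so no later annotation can overlap a
            if a.modality != b.modality:
                pairs.append((a, b))
    return pairs


annotations = [
    Annotation("audio", "music", 0.0, 10.0, 0.9),
    Annotation("video", "face", 5.0, 8.0, 0.8),
    Annotation("audio", "speech", 12.0, 15.0, 0.7),
    Annotation("video", "animal", 14.0, 20.0, 0.6),
]
pair_labels = [(a.label, b.label) for a, b in cross_modal_pairs(annotations)]
print(pair_labels)  # [('music', 'face'), ('speech', 'animal')]
```

Since all components write the same structure, a combination step like this needs no knowledge of how each individual result was produced.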

These technologies are applied particularly for tagging and indexing A/V archives, recommendation systems, program analysis, content tracking, and rights management. They are also used for audio-visual biodiversity measurement and to support disinformation detection.

Projects and activities

Research project

AI4Media

Center of excellence for AI in media – Our contributions: Audio forensics, audio provenance analysis, music analysis, privacy and recommendation systems

Research project

Construction-sAIt

Multi-modal AI-driven technologies for automatic construction site monitoring

Research project

SAISBECO

Biodiversity identification software to automatically search through single images, video and audio recordings for sequences involving great apes.

Research project

iMediaCities

Development of a digital platform to make the audio-visual cultural heritage of European cities accessible

Research project

CUbRIK

Framework for multimedia search that combines "human and social computation" and content analysis

Research project

MICO

Platform for multimodal and context-based analysis, into which a wide variety of analysis components for different media types can be integrated

Range of services

Services

Media Analytics: Services for the analysis and annotation of media content
Evaluation (Visual AI Assessment): Technical evaluation of methods, components, and systems in the field of audio and video analysis

Publications

Year	Title/Author	Publication Type
2022	Construction-sAIt: Multi-modal AI-driven technologies for construction site monitoring. Abeßer, Jakob; Loos, Alexander; Sharma, Prachi	Conference Paper
2016	A workflow for cross media recommendations based on linked data analysis. Aichroth, P.; Berndl, E.; Weißgerber, T.; Kosch, H.; Köllmer, T.	Conference Paper
2015	MICO - Media in Context. Aichroth, P.; Kurz, T.; Stadler, H.; Drewes, F.; Björklund, J.; Schlegel, K.; Berndl, E.; Perez, A.; Bowyer, A.; Volpini, A.; Weigel, C.	Conference Paper

This list has been generated from the publication platform Fraunhofer-Publica

Contact Press / Media

Dr.-Ing. Uwe Kühhirt

Contact Press / Media

Hanna Lukashevich