Deep emotion recognition based on audio-visual correlation

dc.contributor.author: Hajarolasvadi, Noushin
dc.contributor.author: Demirel, Hasan
dc.date.accessioned: 2026-02-06T18:43:43Z
dc.date.issued: 2020
dc.department: Doğu Akdeniz Üniversitesi
dc.description.abstract: Human emotion recognition has been studied mainly through unimodal channels over the last decade, yet questions remain about how different modalities can complement each other. This study proposes a multimodal approach using three-dimensional (3D) convolutional neural networks (CNNs) to model human emotion through a modality-referenced system while investigating such questions. The proposed modality-referenced system selects the input data based on one modality regarded as the reference, or master; the other modality, referred to as the slave, adjusts or attunes itself to the master in the temporal domain. In this context, the authors developed three multimodal emotion recognition systems, namely a video-referenced system, an audio-referenced system, and an audio-visual-referenced system, to explore the congruence impact of audio and video modalities on each other. Two 3D CNN pipelines are employed: k-means clustering is used in the master pipeline, and the slave pipeline adapts itself temporally. The outputs of the two pipelines are fused to improve recognition performance. In addition, canonical correlation analysis and t-distributed stochastic neighbour embedding are used to validate the experiments. Results show that temporal alignment of the data between the two modalities significantly improves recognition performance.
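A minimal sketch of the master/slave selection and fusion idea described in the abstract, for illustration only. It assumes k-means keyframe selection over per-frame features, nearest-onset alignment of audio segments to the selected frames, and weighted late fusion of class posteriors; the function names, k value, feature representation, and fusion rule are hypothetical and not taken from the paper.

```python
# Hypothetical sketch of the video-referenced (master/slave) scheme.
# Assumptions: per-frame feature vectors, k=9 keyframes, weighted-sum fusion.
import numpy as np
from sklearn.cluster import KMeans

def select_master_keyframes(frame_feats: np.ndarray, k: int = 9):
    """Cluster per-frame features with k-means and keep, from each cluster,
    the frame closest to its centroid (the master-modality selection)."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(frame_feats)
    keyframes = []
    for c in range(k):
        idx = np.where(km.labels_ == c)[0]
        d = np.linalg.norm(frame_feats[idx] - km.cluster_centers_[c], axis=1)
        keyframes.append(int(idx[np.argmin(d)]))
    return sorted(keyframes)

def align_slave_audio(frame_times: np.ndarray, keyframes, audio_segments):
    """The slave modality attunes to the master in time: pick the audio
    segment whose onset is nearest to each selected frame's timestamp."""
    onsets = np.array([start for start, _ in audio_segments])
    return [audio_segments[int(np.argmin(np.abs(onsets - frame_times[f])))]
            for f in keyframes]

def fuse_scores(video_probs: np.ndarray, audio_probs: np.ndarray, w: float = 0.5):
    """Late fusion of the two 3D-CNN pipelines' class posteriors
    (a weighted sum is an assumption; the paper's fusion may differ)."""
    return w * video_probs + (1.0 - w) * audio_probs

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(120, 64))      # 120 frames, 64-dim features
    times = np.linspace(0.0, 4.0, 120)      # frame timestamps in seconds
    segs = [(t, None) for t in np.arange(0.0, 4.0, 0.5)]  # (onset, data)
    kf = select_master_keyframes(feats, k=9)
    print(kf, [s for s, _ in align_slave_audio(times, kf, segs)])
```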
dc.description.sponsorship: BAP-C project of Eastern Mediterranean University [BAP-C-02-18-0001]
dc.description.sponsorship: This research was funded by the BAP-C project of Eastern Mediterranean University under grant no. BAP-C-02-18-0001.
dc.identifier.doi: 10.1049/iet-cvi.2020.0013
dc.identifier.endpage: 527
dc.identifier.issn: 1751-9632
dc.identifier.issn: 1751-9640
dc.identifier.issue: 7
dc.identifier.orcid: 0009-0008-5201-5817
dc.identifier.orcid: 0000-0002-3120-5370
dc.identifier.scopus: 2-s2.0-85096127993
dc.identifier.scopusquality: Q2
dc.identifier.startpage: 517
dc.identifier.uri: https://doi.org/10.1049/iet-cvi.2020.0013
dc.identifier.uri: https://hdl.handle.net/11129/13743
dc.identifier.volume: 14
dc.identifier.wos: WOS:000598689800012
dc.identifier.wosquality: Q4
dc.indekslendigikaynak: Web of Science
dc.indekslendigikaynak: Scopus
dc.language.iso: en
dc.publisher: Wiley
dc.relation.ispartof: IET Computer Vision
dc.relation.publicationcategory: Article - International Peer-Reviewed Journal - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/closedAccess
dc.snmz: KA_WoS_20260204
dc.subject: Facial Expression
dc.subject: Face
dc.subject: Voice
dc.title: Deep emotion recognition based on audio-visual correlation
dc.type: Article