3D CNN-Based Speech Emotion Recognition Using K-Means Clustering and Spectrograms

dc.contributor.author: Hajarolasvadi, Noushin
dc.contributor.author: Demirel, Hasan
dc.date.accessioned: 2026-02-06T18:24:02Z
dc.date.issued: 2019
dc.department: Doğu Akdeniz Üniversitesi
dc.description.abstract: Detecting human intentions and emotions helps improve human-robot interaction. Emotion recognition has been a challenging research direction over the past decade. This paper proposes an emotion recognition system based on the analysis of speech signals. First, we split each speech signal into overlapping frames of equal length. Next, we extract an 88-dimensional vector of audio features, including Mel-Frequency Cepstral Coefficients (MFCC), pitch, and intensity, for each frame. In parallel, the spectrogram of each frame is generated. In the final preprocessing step, we apply k-means clustering to the extracted features of all frames of each audio signal and select the k most discriminant frames, called keyframes, to summarize the speech signal. The sequence of spectrograms corresponding to the keyframes is then encapsulated in a 3D tensor. These tensors are used to train and test a 3D convolutional neural network (CNN) with 10-fold cross-validation. The proposed 3D CNN has two convolutional layers and one fully connected layer. Experiments are conducted on the Surrey Audio-Visual Expressed Emotion (SAVEE), Ryerson Multimedia Laboratory (RML), and eNTERFACE'05 databases. The results are superior to state-of-the-art methods reported in the literature.
dc.description.sponsorship: BAP-C project of Eastern Mediterranean University [BAP-C-02-18-0001]
dc.description.sponsorship: This research was funded by the BAP-C project of Eastern Mediterranean University under grant number BAP-C-02-18-0001.
dc.identifier.doi: 10.3390/e21050479
dc.identifier.issn: 1099-4300
dc.identifier.issue: 5
dc.identifier.orcid: 0000-0002-3120-5370
dc.identifier.orcid: 0009-0008-5201-5817
dc.identifier.pmid: 33267193
dc.identifier.scopus: 2-s2.0-85066604566
dc.identifier.scopusquality: Q1
dc.identifier.uri: https://doi.org/10.3390/e21050479
dc.identifier.uri: https://hdl.handle.net/11129/10021
dc.identifier.volume: 21
dc.identifier.wos: WOS:000472675900043
dc.identifier.wosquality: Q2
dc.indekslendigikaynak: Web of Science
dc.indekslendigikaynak: PubMed
dc.indekslendigikaynak: Scopus
dc.language.iso: en
dc.publisher: MDPI
dc.relation.ispartof: Entropy
dc.relation.publicationcategory: Article - International Peer-Reviewed Journal - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/openAccess
dc.snmz: KA_WoS_20260204
dc.subject: speech emotion recognition
dc.subject: 3D convolutional neural networks
dc.subject: deep learning
dc.subject: k-means clustering
dc.subject: spectrograms
dc.title: 3D CNN-Based Speech Emotion Recognition Using K-Means Clustering and Spectrograms
dc.type: Article
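The preprocessing pipeline described in the abstract (overlapping framing, per-frame features, k-means keyframe selection, and stacking keyframe spectrograms into a 3D tensor) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy mean/std/energy features stand in for the paper's 88-dimensional MFCC/pitch/intensity vector, and all function names, frame sizes, and FFT parameters here are hypothetical choices.

```python
import numpy as np

def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames of equal length."""
    n = 1 + max(0, (len(signal) - frame_len) // hop)
    return np.stack([signal[i * hop : i * hop + frame_len] for i in range(n)])

def toy_features(frames):
    """Placeholder per-frame features; the paper extracts an 88-D vector
    of MFCC, pitch, and intensity instead."""
    return np.stack([
        frames.mean(axis=1),           # crude level estimate
        frames.std(axis=1),            # crude spread estimate
        (frames ** 2).sum(axis=1),     # frame energy
    ], axis=1)

def kmeans(X, k, iters=50, seed=0):
    """Minimal Lloyd's k-means returning centroids and labels."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):    # skip empty clusters
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

def select_keyframes(features, k):
    """For each cluster, keep the frame nearest its centroid (a keyframe)."""
    centroids, _ = kmeans(features, k)
    d = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return sorted(set(d.argmin(axis=0)))   # de-duplicate, keep time order

def spectrogram(frame, nfft=64):
    """Log-magnitude spectrogram of one frame via short Hann windows."""
    wins = frame_signal(frame, nfft, nfft // 2)
    mag = np.abs(np.fft.rfft(wins * np.hanning(nfft), axis=1))
    return np.log(mag + 1e-8)

# Demo on a synthetic signal (1 s at a hypothetical 16 kHz rate).
rng = np.random.default_rng(1)
signal = rng.standard_normal(16000)
frames = frame_signal(signal, frame_len=1024, hop=512)
feats = toy_features(frames)
key_idx = select_keyframes(feats, k=9)
# 3D tensor of keyframe spectrograms: (n_keyframes, windows, freq_bins).
tensor = np.stack([spectrogram(frames[i]) for i in key_idx])
```

The resulting tensor plays the role of the paper's 3D CNN input; with these toy parameters each keyframe yields a 31x33 log spectrogram, and at most k = 9 keyframes survive de-duplication.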
