Exploring multicepstral features in a new classical machine learning-based framework for replay attack detection
Abstract
The integration of Internet of Things (IoT) technologies has accelerated the adoption of recognition and authentication systems, offering seamless access across devices from smart homes to workplace systems. Among biometric traits, voice stands out due to its simplicity, cleanliness, low capture cost, uniqueness, and the extensive computational resources supporting it in the scientific literature. Recently, however, spoofing risks have emerged as a serious challenge to the security of voice-based systems. To counteract these threats without additional hardware, techniques analyzing inherent voice signal features have been developed. This paper introduces a new soft computing framework based on classical machine learning classifiers such as Support Vector Machine (SVM), Random Forest (RF), and Logistic Regression (LR), comprising Gaussian-noise-based data augmentation, extraction and fusion of multiple cepstral and non-cepstral features, and dimensionality reduction through Singular Value Decomposition (SVD). In particular, we explore eight distinct cepstral extraction techniques, exemplified by popular approaches such as MFCC and CQCC, and sixteen additional non-cepstral metrics such as Zero Crossing Rate (ZCR) and Harmonic-to-Noise Ratio (HNR). Additionally, we generalize cepstral pattern representation by proposing cepstral multiprojection, a novel strategy designed to systematically reduce the dimensionality and redundancy of multicepstral matrices, thereby enhancing discriminative power and computational efficiency. Evaluated with the ASVSpoof 2017 v2.0 competition benchmark, our approach achieved competitive results, reaching 5.14% equal error rate (EER) on the Dev set and 10.58% on the Eval set,
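Three steps named in the abstract (Gaussian-noise-based augmentation, SVD-based dimensionality reduction, and EER scoring) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names are hypothetical, scaling the noise to a target signal-to-noise ratio is an assumption (the abstract only says the augmentation is Gaussian-noise-based), the SVD is applied without mean-centering, and the EER routine assumes that higher scores mean "more likely genuine".

```python
import numpy as np


def add_gaussian_noise(signal, snr_db, rng):
    """Return `signal` plus white Gaussian noise at roughly `snr_db` dB SNR.

    Assumption: the noise level is set from a target SNR; the abstract
    does not say how the noise is scaled.
    """
    signal = np.asarray(signal, dtype=float)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


def svd_reduce(features, n_components):
    """Project feature rows onto the top right-singular vectors.

    `features` has one row per utterance and one column per fused feature.
    No mean-centering is applied (plain truncated SVD, not PCA).
    """
    features = np.asarray(features, dtype=float)
    _, _, vt = np.linalg.svd(features, full_matrices=False)
    basis = vt[:n_components].T          # shape: (n_features, n_components)
    return features @ basis, basis


def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate the EER by scanning every observed score as a threshold.

    Assumption: higher scores mean 'more likely genuine'.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    thresholds = np.sort(np.concatenate([genuine, spoof]))
    # False acceptance: spoofed trials scoring at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])
    # False rejection: genuine trials scoring below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(16000) / 16000.0
    clean = np.sin(2 * np.pi * 220.0 * t)           # stand-in for a voice signal
    noisy = add_gaussian_noise(clean, snr_db=20.0, rng=rng)

    fused = rng.normal(size=(50, 12))               # stand-in for fused features
    reduced, _ = svd_reduce(fused, n_components=4)
    print(noisy.shape, reduced.shape)
```

In a full system the reduced features would feed a classifier such as SVM, RF, or LR, and the classifier's scores on genuine and replayed trials would be passed to `equal_error_rate`.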










