Comparative analysis of CNN, vision transformer, and hybrid architectures for white blood cancer classification
Abstract
This study presents a comparative analysis of 13 artificial intelligence-based classification architectures for white blood cell classification using microscopic images. The models comprise four convolutional neural networks (CNNs), five vision transformers (ViTs), and four hybrid CNN-transformer architectures. All architectures were trained and tested on a publicly available Kaggle dataset under similar experimental settings to ensure a fair comparison. Among all models, the hybrid MobileViT-XS achieved the highest F1-score of 98.76%, indicating exceptional classification performance across all white blood cell categories. The CNN-based DenseNet121 followed closely with an F1-score of 98.65%, though it required significantly more training time. In contrast, vision transformers such as ViT-Base underperformed, with an F1-score of only 87.36%, despite higher parameter complexity. These results underscore that vision transformers often require architectural optimization to perform well in medical imaging tasks. Overall, the findings demonstrate that hybrid architecture variants deliver more accurate predictions while requiring less computational power. Their lightweight design makes them promising candidates for deployment in clinical and mobile healthcare settings.
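To illustrate the kind of comparison the abstract describes, the sketch below shows how representative members of the three architecture families could be instantiated with a shared classification head and evaluated with a macro-averaged F1-score. This is a minimal, hypothetical example, not the authors' code: the timm model identifiers, the assumed number of white blood cell classes, and the test data loader are all assumptions introduced here for illustration.

```python
# Illustrative sketch only; the paper's actual training pipeline is not reproduced here.
import timm
import torch
from sklearn.metrics import f1_score

NUM_CLASSES = 5  # assumed number of white blood cell categories in the Kaggle dataset

def build_model(name: str) -> torch.nn.Module:
    """Create a backbone with a classification head sized for the assumed WBC classes."""
    return timm.create_model(name, pretrained=True, num_classes=NUM_CLASSES)

# One representative per architecture family compared in the study (names are timm identifiers).
models = {
    "cnn_densenet121": build_model("densenet121"),
    "vit_base": build_model("vit_base_patch16_224"),
    "hybrid_mobilevit_xs": build_model("mobilevit_xs"),
}

def macro_f1(model: torch.nn.Module, test_loader) -> float:
    """Compute the macro-averaged F1-score on a held-out test loader."""
    model.eval()
    preds, labels = [], []
    with torch.no_grad():
        for images, targets in test_loader:
            preds.extend(model(images).argmax(dim=1).tolist())
            labels.extend(targets.tolist())
    return f1_score(labels, preds, average="macro")
```

Evaluating each entry of `models` with the same `test_loader` and the same metric mirrors the controlled setup the abstract describes, where all architectures are compared under similar experimental settings.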










