Patch Token Fusion in Vision Transformers for Brain Cancer Classification
Abstract
Accurate and robust image classification plays a critical role in medical diagnostics, particularly in detecting complex conditions such as brain cancer. This study investigates the integration of multiple Vision Transformer (ViT) models for patch-token-based image classification, with the aim of improving diagnostic accuracy. The approach uses three pre-trained ViT architectures (TinyViT, SmallViT, and BaseViT): features are dynamically extracted from each model, aligned, and combined into a unified representation for classification. The proposed approach showed significant improvements in accuracy, AUC, and F1-score across the model combinations and configurations evaluated. Among the combinations tested, the ViT-Tiny-based classifier performed best, reaching an accuracy of 95.96%, an AUC of 99.58%, and an F1-score of 95.95%.
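The extract, align, and combine steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Random arrays stand in for the patch tokens that the pre-trained backbones would produce. The embedding widths (192, 384, 768) are those of the standard ViT-Tiny, ViT-Small, and ViT-Base variants. The patch count, the shared width `COMMON_DIM`, the class count `NUM_CLASSES`, the choice of linear projection for alignment, and concatenation followed by mean pooling for fusion are all assumptions made for illustration.

```python
import numpy as np

# Embedding widths of the standard ViT-Tiny/Small/Base variants.
EMBED_DIMS = {"tiny": 192, "small": 384, "base": 768}
NUM_PATCHES = 196   # assumed: 224x224 input split into 16x16 patches
COMMON_DIM = 256    # hypothetical shared width used to align the models
NUM_CLASSES = 4     # hypothetical number of output classes

rng = np.random.default_rng(0)

def init_params(model_names):
    """Random weights standing in for trained projection and classifier layers."""
    proj = {m: rng.normal(0.0, 0.02, (EMBED_DIMS[m], COMMON_DIM)) for m in model_names}
    head = rng.normal(0.0, 0.02, (COMMON_DIM * len(model_names), NUM_CLASSES))
    return proj, head

def fuse_patch_tokens(tokens, proj):
    """Project each model's patch tokens to COMMON_DIM, then concatenate per patch."""
    aligned = [tokens[m] @ proj[m] for m in proj]  # each: (batch, patches, COMMON_DIM)
    return np.concatenate(aligned, axis=-1)        # (batch, patches, COMMON_DIM * n_models)

def classify(tokens, proj, head):
    """Mean-pool the fused tokens over patches and apply a softmax classifier."""
    pooled = fuse_patch_tokens(tokens, proj).mean(axis=1)
    logits = pooled @ head
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=-1, keepdims=True)

models = ["tiny", "small", "base"]
proj, head = init_params(models)
# Stand-in for backbone outputs: a batch of 2 images per model.
tokens = {m: rng.normal(size=(2, NUM_PATCHES, EMBED_DIMS[m])) for m in models}
fused = fuse_patch_tokens(tokens, proj)
probs = classify(tokens, proj, head)
```

Passing a subset such as `["tiny", "base"]` to `init_params` would give a two-model combination, which is one way the different model combinations mentioned in the abstract could be compared under this sketch.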










