Detecting Hate Speech using Machine Learning and Sampling Techniques

Chikova, Angela Yeukai

dc.contributor.advisor	Dimililer, Nazife
dc.contributor.author	Chikova, Angela Yeukai
dc.date.accessioned	2025-11-14T13:24:12Z
dc.date.available	2025-11-14T13:24:12Z
dc.date.issued	2021
dc.date.submitted	2021-09
dc.department	Eastern Mediterranean University, School of Computing and Technology	en_US
dc.description	Master of Technology in Information Technology. Institute of Graduate Studies and Research. Thesis (M.Tech.) - Eastern Mediterranean University, School of Computing and Technology, 2021. Supervisor: Assoc. Prof. Dr. Nazife Dimililer.	en_US
dc.description.abstract	The spread of hate speech on social media platforms is a problem that grows more pressing as access to the related technologies becomes easier. This study focuses on detecting hate speech in an imbalanced multiclass Twitter dataset using Machine Learning (ML) algorithms. The most commonly used ML algorithms, namely Logistic Regression and Support Vector Machines (SVM), and the deep learning systems Gated Recurrent Unit (GRU), Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Bi-directional Long Short-Term Memory (BiLSTM) and a hybrid CNNBiLSTM model were used for hate speech detection. To overcome the problems that arise from an imbalanced dataset, several techniques were used to balance it: Synthetic Minority Oversampling Technique (SMOTE), SMOTETomek, SMOTEENN, Adaptive Synthetic (ADASYN), class weights and the proposed method. Each classifier was trained with every data balancing technique, and their performances were compared to find the best classifier for hate speech in the dataset. The best classifier was CNN with the proposed method, which achieved an F1-score of 0.96, a Cohen's Kappa score of 0.94, and overall Recall and Precision scores of 0.96. For this best system, the recall and precision for the hate class were 1.00 and 0.94, respectively.	en_US
dc.description.abstract	ÖZ: The spread of hate speech on social media platforms is a problem that keeps growing as access to the related technologies becomes easier. This study focuses on detecting hate speech in an imbalanced multiclass Twitter dataset using Machine Learning (ML) algorithms. The most commonly used ML algorithms, Logistic Regression and Support Vector Machines (SVM), and deep learning systems such as Gated Recurrent Unit (GRU), Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Bi-directional Long Short-Term Memory (BiLSTM) and a hybrid model, CNNBiLSTM, were used to detect hate speech. To overcome the problems caused by using an imbalanced dataset, several techniques were used to balance it: Synthetic Minority Oversampling Technique (SMOTE), SMOTETomek, SMOTEENN, Adaptive Synthetic (ADASYN), class weights and the proposed method. Each classifier was trained with each data balancing technique, and their performances were compared to find the best classifier for classifying hate speech in the dataset. The CNN algorithm using the proposed method was identified as the best classifier, with an F1-score of 0.96, a Cohen Kappa score of 0.94 and an overall Recall and Precision score of 0.96. The recall and precision scores of the best classifier for the hate class are 1.00 and 0.94, respectively.	en_US
dc.identifier.citation	Chikova, Angela Yeukai. (2021). Detecting Hate Speech using Machine Learning and Sampling Techniques. Thesis (M.Tech.), Eastern Mediterranean University, Institute of Graduate Studies and Research, Sch. of Computing and Technology, Famagusta: North Cyprus.	en_US
dc.identifier.uri	https://hdl.handle.net/11129/6516
dc.language.iso	en
dc.publisher	Eastern Mediterranean University (EMU) - Doğu Akdeniz Üniversitesi (DAÜ)	en_US
dc.relation.publicationcategory	Thesis
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.subject	Thesis Tez	en_US
dc.subject	School of Computing and Technology	en_US
dc.subject	Information Technology	en_US
dc.subject	Machine learning--Artificial Intelligence--Speech Detecting	en_US
dc.subject	Hate speech--Detecting	en_US
dc.subject	Hate speech	en_US
dc.subject	multiclass imbalanced dataset	en_US
dc.subject	SMOTE	en_US
dc.subject	SMOTETomek	en_US
dc.subject	SMOTEENN	en_US
dc.subject	ADASYN	en_US
dc.subject	class weights	en_US
dc.subject	proposed method	en_US
dc.subject	machine learning	en_US
dc.subject	neural networks	en_US
dc.title	Detecting Hate Speech using Machine Learning and Sampling Techniques	en_US
dc.type	Master Thesis
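
The abstract names class weights as one balancing option and reports Cohen's Kappa alongside per-class recall and precision. The following is a minimal sketch of how those quantities are commonly computed, not the thesis's own implementation. The weighting formula (n_samples / (n_classes * class_count)) is the usual "balanced" heuristic and is an assumption here, as are the label names other than "hate", the toy data, and all function names.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def cohen_kappa(y_true, y_pred):
    """Agreement between true and predicted labels, corrected for chance."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: product of the marginal frequencies, summed over classes.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (observed - expected) / (1 - expected)


def precision_recall(y_true, y_pred, cls):
    """Precision and recall for a single class."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    predicted = sum(p == cls for p in y_pred)
    actual = sum(t == cls for t in y_true)
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall


# Hypothetical toy labels for an imbalanced three-class problem.
y_true = ["hate"] * 2 + ["offensive"] * 6 + ["neither"] * 2
# One "offensive" item is misclassified as "hate".
y_pred = ["hate"] * 3 + ["offensive"] * 5 + ["neither"] * 2

weights = balanced_class_weights(y_true)
kappa = cohen_kappa(y_true, y_pred)
hate_precision, hate_recall = precision_recall(y_true, y_pred, "hate")
```

In this toy case the minority classes receive larger weights than the majority class, and the hate class shows perfect recall with lower precision, the same pattern the abstract reports for its best system (recall 1.00, precision 0.94).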

Files

Original bundle

Name: Chikovaangela.pdf
Size: 974.99 KB
Format: Adobe Portable Document Format
Description: Thesis, Master

Collections

Theses (Master's and Ph.D) – SCT