Hate Speech Detection in Social Media

dc.contributor.advisor	Dimililer, Nazife
dc.contributor.author	Aljero, Mona Khalifa A.
dc.date.accessioned	2024-05-29T12:39:56Z
dc.date.available	2024-05-29T12:39:56Z
dc.date.issued	2022-01
dc.date.submitted	2022-01
dc.identifier.citation	Aljero, Mona Khalifa A. (2022). Hate Speech Detection in Social Media. Thesis (Ph.D.), Eastern Mediterranean University, Institute of Graduate Studies and Research, Dept. of Mathematics, Famagusta: North Cyprus.	en_US
dc.identifier.uri	http://hdl.handle.net/11129/5887
dc.description	Doctor of Philosophy in Applied Mathematics and Computer Science. Institute of Graduate Studies and Research. Thesis (Ph.D.) - Eastern Mediterranean University, Faculty of Arts and Sciences, Dept. of Mathematics, 2022. Supervisor: Assoc. Prof. Dr. Nazife Dimililer.	en_US
dc.description.abstract	Hate speech is a significant problem for social media platforms, and its volume has recently increased rapidly across them. The aim of this thesis is to improve on the current state of the art in binary text classification for hate speech on social media platforms. The popularity of social media has grown dramatically in recent years. Because of the ease of use and the anonymity of user identity, this growth has coincided with a growth in hate speech on these platforms. Given the increasing propagation of hate speech, these platforms must implement automatic hate speech identification systems. Hate speech recognition is a difficult text mining task because of the use of colloquial language and of intentional or unintentional spelling variations. The limit on message size on social media platforms further complicates the task, since the context of a message is not readily available. Various approaches have been applied to text classification, using supervised machine learning models, unsupervised machine learning models, and ensemble approaches. Nevertheless, these approaches have not achieved sufficient confidence to be deployed on social media platforms for hate speech classification. In this thesis, we propose two models for detecting hate speech on social media platforms. In the first approach, we develop a model using a novel stacking approach, in which two levels of classifiers are used to improve hate speech detection performance. The second approach is based on genetic programming (GP), an optimization technique. In the GP approach, a novel mutation technique that combines the standard one-point mutation with a novel feature mutation is employed. Both proposed methods were tested on four publicly available datasets of varying sizes. The experimental results show an improvement in performance over the other approaches used in this thesis.
The results show that, in terms of F1-score, the GP approach improves on the state of the art on all four datasets. The stacking approach improves on the state of the art on three of the four datasets. Keywords: hate speech, text classification, classifier, classifier ensembles, stacking ensemble, text mining, genetic programming, pattern classification.	en_US
dc.description.abstract	ÖZ (Turkish abstract): The aim of this thesis is to advance the state of the art by investigating the use of machine learning approaches for detecting hate speech on social media platforms. In parallel with the sharp rise in the widespread use of social media in people's daily lives, the amount of abusive and hateful content produced is also increasing, owing to the relatively uncontrolled nature of social media and the ability of users to conceal their identity. Because the spread of hate speech can have serious consequences for individuals and society, social media platforms must implement automatic hate speech identification systems to detect and prevent hate speech. However, detecting hate speech on social media is a difficult task because of the use of colloquial language and of intentional or unintentional spelling variations. The inability to determine the context of a message, owing to the limited message size on social media platforms, also complicates the task. Although various classification approaches using supervised machine learning models, unsupervised machine learning models, and ensemble approaches have been proposed, the success achieved in hate speech detection is still insufficient. In this thesis, two models are proposed for detecting hate speech on social media platforms. The first approach is a two-level stacking architecture in which both the base classifiers and the meta-classifier use the same feature set. The second approach is based on genetic programming (GP), an optimization technique. In the GP approach, a novel mutation technique that combines the standard one-point mutation with a novel feature mutation is used. Both proposed methods were tested on four publicly available datasets of varying sizes, and the experimental results demonstrated an improvement in performance over the other approaches used in this thesis.
The stacking approach improved on the state of the art on three of the four datasets used. In addition, the results show that the performance of the GP approach exceeds the state of the art on all datasets. Keywords: hate speech, text classification, classifier, classifier ensembles, stacking ensemble, text mining, genetic programming.	en_US
dc.language.iso	eng	en_US
dc.publisher	Eastern Mediterranean University (EMU) - Doğu Akdeniz Üniversitesi (DAÜ)	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.subject	Mathematics	en_US
dc.subject	Applied Mathematics and Computer Science	en_US
dc.subject	Computer security--Speech Detection	en_US
dc.subject	Social Media--Hate Speech Detection--Computer Security	en_US
dc.subject	Internet--Social aspects--Cyberbullying--Harassment	en_US
dc.subject	Natural language processing (Computer science)	en_US
dc.subject	Computational intelligence--Language Detection--Speech Detection	en_US
dc.subject	Hate speech	en_US
dc.subject	text classification	en_US
dc.subject	classifier	en_US
dc.subject	classifier ensembles	en_US
dc.subject	stacking ensemble	en_US
dc.subject	text mining	en_US
dc.subject	genetic programming	en_US
dc.subject	pattern classification	en_US
dc.title	Hate Speech Detection in Social Media	en_US
dc.type	doctoralThesis	en_US
dc.contributor.department	Eastern Mediterranean University, Faculty of Arts and Sciences, Dept. of Mathematics	en_US