TY - JOUR
T1 - AI-Based Hate Speech Detection System Using Video URLs for Effective Content Moderation
AU - Khan, Zohaib Ahmad
AU - Xia, Yuanqing
AU - Khaliq, Fiza
AU - Jiang, Weiwei
AU - Anwar, Muhammad Shahid
N1 - Publisher Copyright: © 2001-2011 IEEE.
PY - 2025
Y1 - 2025
N2 - Countering online hate speech is essential for creating a safer digital space where positive interactions can thrive. As central hubs of global communication, social media platforms require effective moderation through explainable and affective computing approaches. This study introduces a novel artificial intelligence-driven system for detecting misogynistic discourse. We collected 11,245 YouTube video uniform resource locators using specific keywords, then extracted audio to create Urdu transcripts and transliterated them into Roman Urdu, resulting in two distinct datasets. Various feature sets were explored using classical machine learning and deep learning algorithms. The results showed that classical models achieved 0.90 accuracy on the Urdu dataset, while deep learning models reached 0.96 accuracy on Roman Urdu. The corpus is publicly available to promote transparency and further research. Comparative evaluations against an existing English hate speech dataset demonstrate the effectiveness of the proposed approach. This work lays the foundation for more ethical and transparent content moderation systems.
AB - Countering online hate speech is essential for creating a safer digital space where positive interactions can thrive. As central hubs of global communication, social media platforms require effective moderation through explainable and affective computing approaches. This study introduces a novel artificial intelligence-driven system for detecting misogynistic discourse. We collected 11,245 YouTube video uniform resource locators using specific keywords, then extracted audio to create Urdu transcripts and transliterated them into Roman Urdu, resulting in two distinct datasets. Various feature sets were explored using classical machine learning and deep learning algorithms. The results showed that classical models achieved 0.90 accuracy on the Urdu dataset, while deep learning models reached 0.96 accuracy on Roman Urdu. The corpus is publicly available to promote transparency and further research. Comparative evaluations against an existing English hate speech dataset demonstrate the effectiveness of the proposed approach. This work lays the foundation for more ethical and transparent content moderation systems.
UR - https://www.scopus.com/pages/publications/105013050139
U2 - 10.1109/MIS.2025.3594849
DO - 10.1109/MIS.2025.3594849
M3 - Article
AN - SCOPUS:105013050139
SN - 1541-1672
VL - 40
SP - 29
EP - 40
JO - IEEE Intelligent Systems
JF - IEEE Intelligent Systems
IS - 6
ER -