Unintended Bias Evaluation: An Analysis of Hate Speech Detection and Gender Bias Mitigation on Social Media Using Ensemble Learning

Abstract

Hate speech on online social media platforms has reached a level that governments, media outlets, and scientists consider a serious concern, especially because it spreads easily, harms individuals and society, and is virtually impossible to tackle with human analysis alone. Automatic approaches based on machine learning and natural language processing can help with detection. For such applications, among the many possible approaches, it is essential to investigate a system's robustness to biases towards identity terms (for example, gender, race, and religion). In this work, we analyse gender bias in different datasets and propose an ensemble learning approach for hate speech detection based on different feature spaces, with the aim that the model can learn from different abstractions of the problem, assessed using unintended bias evaluation metrics. We used nine different feature spaces to train the pool of classifiers and evaluated our approach on a publicly available corpus; our results demonstrate its effectiveness compared to state-of-the-art solutions.
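To make the ensemble idea concrete, the following is a minimal sketch, not the paper's actual pipeline: each base classifier is trained on a different feature space (here, word-level and character-level TF-IDF as illustrative stand-ins for the paper's nine feature spaces), and their predictions are combined by majority vote. The toy corpus, labels, and classifier choices are assumptions for illustration only.

```python
# Hypothetical sketch of an ensemble over different feature spaces
# (illustrative placeholders, not the authors' exact configuration).
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import VotingClassifier

# Toy labelled corpus: 1 = hateful, 0 = not hateful (placeholder examples).
texts = [
    "you are awful people",
    "have a nice day",
    "awful hateful words here",
    "what a lovely post",
]
labels = [1, 0, 1, 0]

# Each pipeline pairs one feature space with a base classifier.
word_clf = make_pipeline(
    TfidfVectorizer(analyzer="word"),            # word-level feature space
    LogisticRegression(),
)
char_clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),  # character-level feature space
    LogisticRegression(),
)

# Majority vote over the pool of classifiers, one per feature space.
ensemble = VotingClassifier(
    estimators=[("word", word_clf), ("char", char_clf)],
    voting="hard",
)
ensemble.fit(texts, labels)
print(ensemble.predict(["have a lovely day"]))
```

In the paper's setting, the pool would contain one classifier per feature space (nine in total), so that each member captures a different abstraction of the text before the votes are combined.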

Publication
Expert Systems with Applications
Márjory Da Costa Abreu
Associate Professor in Ethical Artificial Intelligence and Transforming Lives Fellow

Feminist, Anti-Racist and Anti-Fascist.