Working paper
Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate
- Abstract:
- Detecting online hate is a complex task, and low-performing detection models have harmful consequences when used for sensitive applications such as content moderation. Emoji-based hate is a key emerging challenge for online hate detection. We present HatemojiCheck, a test suite of 3,930 short-form statements that allows us to evaluate how detection models perform on hateful language expressed with emoji. Using the test suite, we expose weaknesses in existing hate detection models. To address these weaknesses, we create the HatemojiTrain dataset using an innovative human-and-model-in-the-loop approach. Models trained on these 5,912 adversarial examples perform substantially better at detecting emoji-based hate, while retaining strong performance on text-only hate. Both HatemojiCheck and HatemojiTrain are made publicly available.
- Publication date:
- 2021-08-12
- Language:
- English
- Pubs id:
- 1190679
- Local pid:
- pubs:1190679
- Deposit date:
- 2021-08-12
Terms of use
- Copyright holder:
- Kirk et al.
- Copyright date:
- 2021