- A large unsupervised corpus for coreference resolution.
- University of Oxford
- Publication date:
- Data location:
- Contact: firstname.lastname@example.org
- MaskedWiki is a large-scale dataset for coreference resolution. It contains 130M Wikipedia passages in which a noun occurs at least twice. The second occurrence is masked, and the task is to predict the masked noun. Because the answer already appears earlier in the passage, the task resembles coreference resolution, and the dataset can serve as a large pre-training corpus.
- Copyright date:
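
The masking step described above can be illustrated with a minimal sketch. This is not the dataset's actual construction pipeline: the function name `mask_second_occurrence`, the `[MASK]` placeholder, and the assumption that the target noun is already known (rather than found by a part-of-speech tagger) are all hypothetical choices made for illustration.

```python
import re
from typing import Optional, Tuple

# Assumed placeholder; the mask symbol used in MaskedWiki is not stated in this record.
MASK_TOKEN = "[MASK]"


def mask_second_occurrence(passage: str, noun: str) -> Optional[Tuple[str, str]]:
    """Mask the second whole-word occurrence of `noun` in `passage`.

    Returns (masked_passage, answer), or None if the noun occurs
    fewer than two times and the passage therefore does not qualify.
    """
    # Word boundaries avoid matching the noun inside a longer word.
    pattern = re.compile(r"\b" + re.escape(noun) + r"\b")
    matches = list(pattern.finditer(passage))
    if len(matches) < 2:
        return None
    second = matches[1]
    masked = passage[:second.start()] + MASK_TOKEN + passage[second.end():]
    return masked, noun


example = mask_second_occurrence(
    "Alice met Bob at the station. Later, Alice went home.", "Alice"
)
print(example)
```

A model trained on such pairs has to recover the masked noun from context, and the correct answer is the earlier mention in the same passage.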