Dataset

MaskedWiki

Subtitle:
A large unsupervised corpus for coreference resolution.

Authors


Department:
Computer Science
Role:
Creator
Publisher:
University of Oxford
Publication date:
2019
Data location:
contact vid.kocijan@cs.ox.ac.uk
Format:
Digital
Documentation:
MaskedWiki is a large-scale dataset for coreference resolution. It contains 130M passages from Wikipedia in which a noun occurs at least twice. The second occurrence is masked, and the task is to predict the masked noun. This task is thus similar to coreference resolution, so the corpus can serve as a large pre-training dataset.
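
The masking step described above can be illustrated with a minimal Python sketch. It is not the creator's actual pipeline: the function name `mask_second_occurrence`, the `[MASK]` token, and the whole-word regular-expression matching are assumptions made for illustration, and the sketch takes the candidate noun as an input rather than detecting nouns itself.

```python
import re
from typing import Optional, Tuple

# Hypothetical placeholder: the record does not say which mask symbol is used.
MASK_TOKEN = "[MASK]"


def mask_second_occurrence(
    passage: str, noun: str, mask_token: str = MASK_TOKEN
) -> Optional[Tuple[str, str]]:
    """Mask the second whole-word occurrence of `noun` in `passage`.

    Returns (masked_passage, target), or None when the noun occurs
    fewer than two times and the passage therefore does not qualify.
    """
    # Whole-word, case-sensitive match; a simplification for this sketch.
    pattern = re.compile(r"\b" + re.escape(noun) + r"\b")
    matches = list(pattern.finditer(passage))
    if len(matches) < 2:
        return None
    second = matches[1]
    masked = passage[: second.start()] + mask_token + passage[second.end():]
    return masked, noun


example = mask_second_occurrence(
    "Alice went to the market because Alice needed bread.", "Alice"
)
print(example)
```

Each resulting pair holds the masked passage and the noun a model would be asked to recover, which is what makes the task resemble coreference resolution: the first mention stays in the context and must be linked to the masked position.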
