Dataset

MaskedWiki

Alternative title:
A large unsupervised corpus for coreference resolution.
Documentation:
MaskedWiki is a large-scale dataset for coreference resolution. It contains 130M passages from Wikipedia in which a noun occurs at least twice. The second occurrence of the noun is masked, and the task is to predict the masked noun correctly. Because recovering the masked noun requires linking it to its earlier mention, the task resembles coreference resolution, and the corpus can serve as a large pre-training dataset.
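
The masking step described above can be illustrated with a minimal sketch. It is not the creators' pipeline: the function name `mask_second_occurrence` and the `[MASK]` placeholder are hypothetical, and the noun is passed in by hand rather than detected with a part-of-speech tagger, which a real pipeline would presumably need.

```python
import re

# Hypothetical placeholder token; this record does not say which token
# the released files actually use.
MASK = "[MASK]"


def mask_second_occurrence(passage, noun, mask=MASK):
    """Mask the second whole-word occurrence of `noun` in `passage`.

    Returns (masked_passage, answer), or None when the noun occurs
    fewer than twice and the passage therefore does not qualify.
    """
    # Whole-word, case-sensitive matching; re.escape guards against
    # nouns that contain regex metacharacters.
    pattern = re.compile(r"\b" + re.escape(noun) + r"\b")
    matches = list(pattern.finditer(passage))
    if len(matches) < 2:
        return None
    second = matches[1]
    masked = passage[:second.start()] + mask + passage[second.end():]
    return masked, noun


# Illustrative passage (made up for this sketch, not taken from the dataset).
example = mask_second_occurrence(
    "Alice went to the park. Later, Alice met Bob.", "Alice"
)
print(example)
```

A model pre-trained on such pairs would receive the masked passage as input and be scored on recovering the answer.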

Authors/Creators

Department:
Computer Science
Role:
Creator


Publisher:
University of Oxford
Publication date:
2019
File format:
.txt and binary
Digital storage location:
contact [email protected]


Pubs id:
1480492
UUID:
uuid:9b34602b-c982-4b49-b4f4-6555b5a82c3d
Local pid:
pubs:1480492
Deposit date:
2019-06-06