Dataset
MaskedWiki
- Alternative title:
- A large unsupervised corpus for coreference resolution.
- Documentation:
- MaskedWiki is a large-scale dataset for coreference resolution. It contains 130M passages from Wikipedia in which a noun occurs at least twice. The second occurrence is masked, and the goal is to predict the masked noun. The task is thus similar to coreference resolution, and the corpus can serve as a large pre-training dataset.
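The masking scheme described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the `[MASK]` token, the function name `mask_second_occurrence`, and the idea of passing the candidate noun in by hand (rather than finding it with a part-of-speech tagger) are all assumptions made for illustration, and the actual file format of the dataset may differ.

```python
import re
from typing import Optional, Tuple

MASK = "[MASK]"  # hypothetical mask token; the dataset's real token may differ


def mask_second_occurrence(passage: str, noun: str) -> Optional[Tuple[str, str]]:
    """Return (masked_passage, answer), or None if `noun` occurs fewer than twice.

    The candidate noun is supplied by the caller; a real pipeline would
    presumably detect nouns automatically.
    """
    # Whole-word, case-sensitive matches of the candidate noun.
    matches = list(re.finditer(r"\b" + re.escape(noun) + r"\b", passage))
    if len(matches) < 2:
        return None
    second = matches[1]
    # Replace only the second occurrence with the mask token.
    masked = passage[: second.start()] + MASK + passage[second.end():]
    return masked, noun


example = mask_second_occurrence(
    "Alice met Bob in Paris. Later, Alice flew home.", "Alice"
)
print(example)
```

A model trained on such pairs has to recover the masked noun from the earlier mention in the passage, which is what makes the task resemble coreference resolution.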
Access Document
- Files:
- (zip, 222.4MB, Terms of use)
- (zip, 2.5GB, Terms of use)
- (plain, 918.0B, Terms of use)
- (zip, 3.5GB, Terms of use)
Authors/Creators
- Publisher:
- University of Oxford
- Publication date:
- 2019
- File format:
- .txt and binary
- Digital storage location:
- contact [email protected]
- Pubs id:
- 1480492
- UUID:
- uuid:9b34602b-c982-4b49-b4f4-6555b5a82c3d
- Local pid:
- pubs:1480492
- Deposit date:
- 2019-06-06
- ARK identifier:
Terms of use
- Copyright date:
- 2019
- Licence:
- ORA Terms and conditions