The WikiCREM dataset and the attached models are described in the publication "WikiCREM: A Large Unsupervised Corpus for Coreference Resolution", presented at EMNLP 2019.

The code (together with instructions) to run the models and use the data can be found here: https://github.com/vid-koci/bert-commonsense

The format of the dataset follows that of the Definite Pronoun Resolution dataset (Rahman and Ng, 2012). Each example is given in 5 lines:

1. The sentence, with one noun replaced with [MASK].
2. The token [MASK] itself (the word that has to be replaced).
3. Both candidates, separated with a comma. Note that the order of the candidates is NOT guaranteed to be random.
4. The correct candidate.
5. An empty line.
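
The five-line layout described above can be read with a short parser. The following is a minimal Python sketch, not the loader from the linked repository: the names `Example` and `parse_examples` are hypothetical, the sample sentences are made up for illustration, and the comma split assumes that a candidate does not itself contain a comma.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Example:
    sentence: str          # line 1: sentence with one noun replaced by [MASK]
    target: str            # line 2: the token to be replaced, i.e. [MASK]
    candidates: List[str]  # line 3: the comma-separated candidates
    answer: str            # line 4: the correct candidate


def _build(block: List[str]) -> Example:
    """Turn the four content lines of one example into an Example."""
    if len(block) != 4:
        raise ValueError(f"expected 4 content lines, got {len(block)}: {block!r}")
    sentence, target, candidate_line, answer = block
    # Assumption: candidates contain no commas of their own.
    candidates = [c.strip() for c in candidate_line.split(",")]
    return Example(sentence, target, candidates, answer)


def parse_examples(lines: Iterable[str]) -> Iterator[Example]:
    """Yield examples from lines laid out as 4 content lines plus 1 empty line."""
    block: List[str] = []
    for raw in lines:
        line = raw.strip()
        if line:
            block.append(line)
        elif block:
            # An empty line closes the current example.
            yield _build(block)
            block = []
    if block:
        # Tolerate a file whose last example lacks the trailing empty line.
        yield _build(block)


if __name__ == "__main__":
    # Made-up sample data in the described layout, for illustration only.
    sample = [
        "[MASK] thanked Bob for the help.\n",
        "[MASK]\n",
        "Alice,Bob\n",
        "Alice\n",
        "\n",
    ]
    for ex in parse_examples(sample):
        print(ex.sentence, ex.candidates, ex.answer)
```

Because the order of the candidates is not guaranteed to be random, a consumer of these examples should compare `answer` against `candidates` rather than rely on the position of the correct candidate.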