The WikiCREM dataset and the attached models are described in the publication "WikiCREM: A Large Unsupervised Corpus for Coreference Resolution", presented at EMNLP 2019.
The code for running the models and using the data, together with instructions, can be found here: https://github.com/vid-koci/bert-commonsense

The dataset follows the format of the Definite Pronoun Resolution dataset (Rahman and Ng, 2012).
Each example spans 5 lines:
The first line is the sentence, with one noun replaced with [MASK].
The second line is [MASK] (the token that has to be replaced).
The third line contains both candidates, separated by a comma. Note that the order of the candidates is NOT guaranteed to be random.
The fourth line contains the correct candidate.
The fifth line is empty.
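
The 5-line layout described above could be read with a short Python sketch like the one below. This is not the loader from the linked repository; the names Example and parse_examples are hypothetical, and the sketch assumes that neither candidate contains a comma and that the file holds nothing but consecutive 5-line examples.

```python
from typing import List, NamedTuple


class Example(NamedTuple):
    """One record of the 5-line format (hypothetical container)."""
    sentence: str          # line 1: sentence containing [MASK]
    target: str            # line 2: the token to be replaced, i.e. [MASK]
    candidates: List[str]  # line 3: both candidates
    answer: str            # line 4: the correct candidate


def parse_examples(text: str) -> List[Example]:
    """Parse text made of consecutive 5-line examples."""
    lines = text.splitlines()
    examples = []
    for start in range(0, len(lines), 5):
        block = lines[start:start + 5]
        # Stop at an incomplete or empty trailing block.
        if len(block) < 4 or not block[0].strip():
            break
        sentence, target, candidates, answer = (s.strip() for s in block[:4])
        # Assumption: candidates themselves contain no commas.
        parsed = [c.strip() for c in candidates.split(",")]
        examples.append(Example(sentence, target, parsed, answer))
    return examples
```

Because the candidate order is not guaranteed to be random, it may be worth shuffling the two candidates per example before measuring accuracy, so that a model cannot benefit from a positional bias.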