Dataset

Early Slavic word embeddings [Data set]

Documentation:
Word embeddings trained on the lemmatised TOROT Treebank using Word2Vec with the following parameters:

sg = True
min_count = <1,3,5>
window = <3,5>
vector_size = <100,200,300>
epochs = 5

One model was trained for each combination of the parameters enclosed in angle brackets (< >), giving 3 × 2 × 3 = 18 models. The release contains both the full models (.model) and the plain vector files (_vectors.txt). Each model is named after the parameters it was trained with. Note that these are the results of very preliminary experiments and no systematic evaluation of their quality was carried out, so use them with caution.
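A minimal sketch of working with the release using only the Python standard library. It assumes that the _vectors.txt files follow the word2vec text format (an optional header line giving vocabulary size and dimensionality, then one lemma followed by its values per line). The record does not state this format, so treat it as an assumption. The code enumerates the 18 parameter combinations and loads one vector file for a simple nearest-neighbour lookup. The function names (parameter_grid, load_vectors, cosine, most_similar) are hypothetical and not part of the release.

```python
import itertools
import math

# Parameter grid from the documentation; sg=True and epochs=5 were fixed for every model.
MIN_COUNTS = (1, 3, 5)
WINDOWS = (3, 5)
VECTOR_SIZES = (100, 200, 300)


def parameter_grid():
    """Return every (min_count, window, vector_size) combination, one per released model."""
    return list(itertools.product(MIN_COUNTS, WINDOWS, VECTOR_SIZES))


def load_vectors(path):
    """Parse a plain-text vector file.

    ASSUMPTION: word2vec text format, i.e. an optional "<vocab_size> <dim>" header
    followed by one "<lemma> <v1> ... <vn>" line per lemma.
    """
    vectors = {}
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle):
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            if line_number == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
                continue  # header line: vocabulary size and dimensionality
            lemma, values = parts[0], parts[1:]
            vectors[lemma] = [float(v) for v in values]
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(vectors, lemma, topn=5):
    """Rank all other lemmas by cosine similarity to `lemma`, highest first."""
    target = vectors[lemma]
    scored = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != lemma]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]
```

For example, `most_similar(load_vectors(path), lemma)` lists the nearest neighbours of a lemma, where `path` points to one of the downloaded _vectors.txt files. The parameter names in the documentation match the arguments of gensim 4's `Word2Vec` class, which suggests (but does not confirm) that gensim was used. If so, the full models could be opened with `gensim.models.Word2Vec.load(path)` and the vector files with `gensim.models.KeyedVectors.load_word2vec_format(path, binary=False)`. Because the models were not systematically evaluated, it may be worth comparing neighbours across several of the 18 models before relying on any one of them.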

Files:
Publisher copy:
10.5281/zenodo.8414137
Publication website:
https://doi.org/10.5281/zenodo.8414137

Authors/Creators

Institution:
University of Oxford
Division:
Humanities (HUMS)
Department:
Linguistics, Philology & Phonetics
Oxford college:
St Hugh's College
Role:
Creator
ORCID:
0000-0003-3757-2961


Funder identifier:
https://ror.org/03n0ht308
Grant:
2266900


Publisher:
Zenodo
Publication date:
2023
Digital storage location:
https://doi.org/10.5281/zenodo.8414137
DOI:


