Conference item

Exploiting cross-dialectal gold syntax for low-resource historical languages: towards a generic parser for pre-modern Slavic

Abstract:
This paper explores the possibility of improving the performance of specialized parsers for pre-modern Slavic by training them on data from different related varieties. Because of their linguistic heterogeneity, pre-modern Slavic varieties are treated as low-resource historical languages, for which cross-dialectal treebank data may be exploited to overcome data scarcity and to attempt the training of a variety-agnostic parser. Previous experiments on early Slavic dependency parsing are discussed, particularly with regard to their ability to tackle different orthographic, regional and stylistic features. A generic pre-modern Slavic parser and two specialized parsers (one for East Slavic and one for South Slavic) are trained using jPTDP [8], a neural network model for joint part-of-speech (POS) tagging and dependency parsing which has shown promising results on a number of Universal Dependencies (UD) treebanks, including Old Church Slavonic (OCS). With these experiments, a new state of the art is obtained for both OCS (83.79% unlabelled attachment score (UAS) and 78.43% labelled attachment score (LAS)) and Old East Slavic (OES) (85.7% UAS and 80.16% LAS).
Publication status:
Published
Peer review status:
Peer reviewed

Publication website:
http://ceur-ws.org/Vol-2723/

Authors


Institution:
University of Oxford
Division:
HUMS
Department:
Linguistics Philology and Phonetics Faculty
Role:
Author

Contributors

Role:
Editor
Role:
Editor
Role:
Editor
Role:
Editor


Funding agency for:
Pedrazzini, N
Grant:
ES/P000649/1


Publisher:
CEUR Workshop Proceedings
Host title:
Proceedings of the Workshop on Computational Humanities Research (CHR 2020)
Volume:
2723
Pages:
237-247
Publication date:
2020-11-02
Acceptance date:
2020-08-28
Event title:
CHR 2020: Workshop on Computational Humanities Research
Event location:
Amsterdam, The Netherlands
Event start date:
2020-11-18
Event end date:
2020-11-20
ISSN:
1613-0073


Language:
English
Keywords:
Pubs id:
1141364
Local pid:
pubs:1141364
Deposit date:
2020-11-09
