Conference item
Exploiting cross-dialectal gold syntax for low-resource historical languages: towards a generic parser for pre-modern Slavic
- Abstract:
- This paper explores whether specialized parsers for pre-modern Slavic can be improved by training them on data from different related varieties. Because of their linguistic heterogeneity, pre-modern Slavic varieties are treated as low-resource historical languages, for which cross-dialectal treebank data may be exploited to overcome data scarcity and to attempt the training of a variety-agnostic parser. Previous experiments on early Slavic dependency parsing are discussed, particularly with regard to their ability to handle different orthographic, regional and stylistic features. A generic pre-modern Slavic parser and two specialized parsers, one for East Slavic and one for South Slavic, are trained using jPTDP [8], a neural network model for joint part-of-speech (POS) tagging and dependency parsing which has shown promising results on a number of Universal Dependencies (UD) treebanks, including Old Church Slavonic (OCS). With these experiments, a new state of the art is obtained for both OCS (83.79% unlabelled attachment score (UAS) and 78.43% labelled attachment score (LAS)) and Old East Slavic (OES) (85.7% UAS and 80.16% LAS).
- Publication status:
- Published
- Peer review status:
- Peer reviewed
Access Document
- Files:
- Version of record, 252.3KB
- Publication website:
- http://ceur-ws.org/Vol-2723/
Authors
+ Pedrazzini, N
Contributors
+ Karsdorp, F
- Role:
- Editor
+ McGillivray, B
- Role:
- Editor
+ Nerghes, A
- Role:
- Editor
+ Wevers, M
- Role:
- Editor
Funding
+ Economic and Social Research Council
- Funding agency for:
- Pedrazzini, N
- Grant:
- ES/P000649/1
- Publisher:
- CEUR Workshop Proceedings
- Host title:
- Proceedings of the Workshop on Computational Humanities Research (CHR 2020)
- Volume:
- 2723
- Pages:
- 237-247
- Publication date:
- 2020-11-02
- Acceptance date:
- 2020-08-28
- Event title:
- CHR 2020: Workshop on Computational Humanities Research
- Event location:
- Amsterdam, The Netherlands
- Event start date:
- 2020-11-18
- Event end date:
- 2020-11-20
- ISSN:
- 1613-0073
- Language:
- English
- Pubs id:
- 1141364
- Local pid:
- pubs:1141364
- Deposit date:
- 2020-11-09
Terms of use
- Copyright holder:
- Pedrazzini, N
- Copyright date:
- 2020
- Rights statement:
- © 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
- Notes:
- This paper has been accepted for CHR 2020: Workshop on Computational Humanities Research, November 18–20, 2020, Amsterdam, The Netherlands.
- Licence:
- CC Attribution (CC BY)