Conference item
On hapax legomena and morphological productivity
- Abstract:
- Quantifying and predicting morphological productivity is a long-standing challenge in corpus linguistics and psycholinguistics. The same challenge reappears in natural language processing in the context of handling words that were not seen in the training set (out-of-vocabulary, or OOV, words). Prior research showed that a good indicator of the productivity of a morpheme is the number of words involving it that occur exactly once (the hapax legomena). A technical connection was adduced between this result and Good-Turing smoothing, which assigns probability mass to unseen events on the basis of the simplifying assumption that word frequencies are stationary. In a large-scale study of 133 affixes in Wikipedia, we develop evidence that success in fact depends on tapping the frequency range in which the assumptions of Good-Turing are violated.
- Publication status:
- Published
- Peer review status:
- Peer reviewed
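The link the abstract draws between hapax legomena and Good-Turing smoothing can be illustrated with a minimal sketch. The Good-Turing estimate of the total probability of unseen events is N1/N, the number of items seen exactly once divided by the total number of tokens. The function name `hapax_statistics`, the toy word list, and the use of plain string-suffix matching to find words containing an affix are all assumptions made for illustration; they are not the procedure used in the paper.

```python
from collections import Counter


def hapax_statistics(tokens, suffix):
    """Summarise the words in `tokens` that end in `suffix`.

    Suffix matching by string ending is a crude stand-in for real
    morphological analysis (an assumption of this sketch).
    """
    # Keep only words that are strictly longer than the suffix itself.
    counts = Counter(
        t for t in tokens if t.endswith(suffix) and len(t) > len(suffix)
    )
    n_tokens = sum(counts.values())                       # N: tokens with the affix
    n_hapax = sum(1 for c in counts.values() if c == 1)   # N1: types seen exactly once
    # Good-Turing estimate of the probability mass of unseen types: N1 / N.
    unseen_mass = n_hapax / n_tokens if n_tokens else 0.0
    return {
        "types": len(counts),
        "tokens": n_tokens,
        "hapaxes": n_hapax,
        "unseen_mass": unseen_mass,
    }


# Toy corpus, invented for illustration only.
TOY_TOKENS = (
    "kindness kindness darkness sadness happiness happiness bluntness table"
).split()

stats = hapax_statistics(TOY_TOKENS, "ness")
print(stats)
```

In this toy list three of the seven "-ness" tokens are hapaxes, so the estimated unseen mass is 3/7. An affix with a larger share of hapaxes would, on this measure, be judged more likely to yield words not yet seen.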
Access Document
- Files:
- Version of record (pdf, 294.2KB)
- Publisher copy:
- 10.18653/v1/P17
Authors
- Publisher:
- Association for Computational Linguistics
- Host title:
- Proceedings of the Fifteenth Workshop on Computational Research in Phonetics, Phonology, and Morphology
- Journal:
- SIGMORPHON 2018
- Pages:
- 125–130
- Publication date:
- 2018-10-01
- Acceptance date:
- 2018-08-19
- DOI:
- ISBN:
- 9781948087766
- Pubs id:
- pubs:935230
- UUID:
- uuid:f1c03435-fff1-45bb-8df9-0b9be2d4831b
- Local pid:
- pubs:935230
- Source identifiers:
- 935230
- Deposit date:
- 2018-10-29
Terms of use
- Copyright holder:
- Special Interest Group on Computational Morphology and Phonology
- Copyright date:
- 2018
- Notes:
- Copyright 2018 The Special Interest Group on Computational Morphology and Phonology. This paper is licensed on a Creative Commons Attribution 4.0 International License.
- Licence:
- CC Attribution (CC BY)