Conference item

On hapax legomena and morphological productivity

Abstract:
Quantifying and predicting morphological productivity is a long-standing challenge in corpus linguistics and psycholinguistics. The same challenge reappears in natural language processing in the context of handling words that were not seen in the training set (out-of-vocabulary, or OOV, words). Prior research showed that a good indicator of the productivity of a morpheme is the number of words involving it that occur exactly once (the hapax legomena). A technical connection was adduced between this result and Good-Turing smoothing, which assigns probability mass to unseen events on the basis of the simplifying assumption that word frequencies are stationary. In a large-scale study of 133 affixes in Wikipedia, we develop evidence that success in fact depends on tapping the frequency range in which the assumptions of Good-Turing are violated.
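As a rough illustration of the quantities the abstract refers to, the sketch below counts the hapax legomena among tokens carrying a given affix and computes the ratio of hapaxes to tokens, which is the Good-Turing estimate of the probability mass of unseen types. It is not the authors' method: the function name `hapax_stats`, the toy token list, and the use of plain string-suffix matching as a stand-in for morphological analysis are all assumptions made for the example.

```python
from collections import Counter


def hapax_stats(tokens, suffix):
    """Return (n1, n, p) for tokens ending in `suffix`.

    n1: number of word types that occur exactly once (hapax legomena)
    n:  total number of tokens carrying the suffix
    p:  n1 / n, the Good-Turing estimate of unseen probability mass

    Assumption: string-suffix matching is a crude proxy for a real
    morphological parse (it would also match words like "witness").
    """
    counts = Counter(t for t in tokens if t.endswith(suffix))
    n = sum(counts.values())
    n1 = sum(1 for c in counts.values() if c == 1)
    p = n1 / n if n else 0.0
    return n1, n, p


# Hypothetical toy corpus, for illustration only.
tokens = (
    "kindness darkness kindness sadness "
    "happiness illness kindness darkness"
).split()

n1, n, p = hapax_stats(tokens, "ness")
print(n1, n, p)
```

On this toy input, three of the five types occur once among eight tokens. A real study would need a large corpus and proper affix segmentation.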
Publication status:
Published
Peer review status:
Peer reviewed


Files:
Publisher copy:
10.18653/v1/P17

Authors


Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Sub department:
Oxford e-Research Centre
Role:
Author
ORCID:
0000-0002-5989-3574

Institution:
University of Oxford
Division:
MPLS Division
Department:
Engineering Science
Role:
Author


Publisher:
Association for Computational Linguistics
Host title:
Proceedings of the Fifteenth Workshop on Computational Research in Phonetics, Phonology, and Morphology
Journal:
SIGMORPHON 2018
Pages:
125–130
Publication date:
2018-10-01
Acceptance date:
2018-08-19
DOI:
ISBN:
9781948087766


Pubs id:
pubs:935230
UUID:
uuid:f1c03435-fff1-45bb-8df9-0b9be2d4831b
Local pid:
pubs:935230
Source identifiers:
935230
Deposit date:
2018-10-29
