Journal article

Inducing Tree-Substitution Grammars.

Abstract:
Inducing a grammar from text has proven to be a notoriously challenging learning task despite decades of research. The primary reason for its difficulty is that in order to induce plausible grammars, the underlying model must be capable of representing the intricacies of language while also ensuring that it can be readily learned from data. The majority of existing work on grammar induction has favoured model simplicity (and thus learnability) over representational capacity by using context-free grammars and first-order dependency grammars, which are not sufficiently expressive to model many common linguistic constructions. We propose a novel compromise by inferring a probabilistic tree-substitution grammar, a formalism which allows for arbitrarily large tree fragments and thereby better represents complex linguistic structures. To limit the model's complexity we employ a Bayesian non-parametric prior which biases the model towards a sparse grammar with shallow productions. We demonstrate the model's efficacy on supervised phrase-structure parsing, where we induce a latent segmentation of the training treebank, and on unsupervised dependency grammar induction. In both cases the model uncovers interesting latent linguistic structures while producing competitive results. © 2010 Trevor Cohn, Phil Blunsom and Sharon Goldwater.
Publication status:
Published

Authors


Institution:
University of Oxford
Division:
MPLS
Department:
Computer Science
Role:
Author


Journal:
Journal of Machine Learning Research
Volume:
11
Pages:
3053-3096
Publication date:
2010-01-01
EISSN:
1533-7928
ISSN:
1532-4435


Language:
English
Pubs id:
pubs:328099
UUID:
uuid:b21b97ac-ed95-44d9-a401-7d943bc91197
Local pid:
pubs:328099
Source identifiers:
328099
Deposit date:
2012-12-19
