TY - GEN
T1 - Dependency syntax analysis using grammar induction and a lexical categories precedence system
AU - Calvo, Hiram
AU - Gambino, Omar J.
AU - Gelbukh, Alexander
AU - Inui, Kentaro
N1 - Publisher Copyright: © Springer-Verlag Berlin Heidelberg 2011.
PY - 2011
Y1 - 2011
N2 - The unsupervised approach to syntactic analysis tries to discover the structure of a text using only raw text. In this paper we explore this approach using grammar inference algorithms. Although there is still room for improvement, our approach tries to minimize the effect of the current limitations of some grammar inductors by adding morphological information before the grammar induction process and by introducing a novel system for converting a shallow parse to dependencies, which reconstructs information about the heads that the inductor did not discover by means of a lexical categories precedence system. The performance of our parser, which needs no syntactically tagged resources or rules and is trained on a small corpus, is 10% below that of commercial semi-supervised dependency analyzers for Spanish and comparable to the state of the art for English.
AB - The unsupervised approach to syntactic analysis tries to discover the structure of a text using only raw text. In this paper we explore this approach using grammar inference algorithms. Although there is still room for improvement, our approach tries to minimize the effect of the current limitations of some grammar inductors by adding morphological information before the grammar induction process and by introducing a novel system for converting a shallow parse to dependencies, which reconstructs information about the heads that the inductor did not discover by means of a lexical categories precedence system. The performance of our parser, which needs no syntactically tagged resources or rules and is trained on a small corpus, is 10% below that of commercial semi-supervised dependency analyzers for Spanish and comparable to the state of the art for English.
UR - http://www.scopus.com/inward/record.url?scp=79952274639&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-19400-9_9
DO - 10.1007/978-3-642-19400-9_9
M3 - Conference contribution
SN - 9783642193996
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 109
EP - 120
BT - Computational Linguistics and Intelligent Text Processing - 12th International Conference, CICLing 2011, Proceedings
T2 - 12th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2011
Y2 - 20 February 2011 through 26 February 2011
ER -