The unsupervised approach: Grammar induction

Omar J. Gambino; Hiram Calvo

doi:10.1007/978-3-319-74054-6_8

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

Abstract

There are mainly two approaches for creating syntactic dependency analyzers: supervised and unsupervised. The main goal of the first approach is to attain the best possible performance for a single language. For this purpose, a large collection of resources is gathered (using manually annotated corpora with part-of-speech annotations and syntactic and structure tags), which requires a significant amount of work and time. The state of the art in this approach attains syntactic annotation in about 85% of all full sentences (Rooth in Proceedings of the symposium on representation and acquisition of lexical knowledge. AAAI, 1995 [172]); in English, it attains over 90%. On the other hand, the unsupervised approach tries to discover the structure of a text using only raw text, which allows the creation of a dependency analyzer for virtually any language. Here, we explore this second approach. We present the model of an unsupervised dependency analyzer, named DILUCT-GI (GI short for grammar inference).

Original language	English
Title of host publication	Studies in Computational Intelligence
Publisher	Springer Verlag
Pages	111-124
Number of pages	14
DOIs	https://doi.org/10.1007/978-3-319-74054-6_8
State	Published - 2018

Publication series

Name	Studies in Computational Intelligence
Volume	765
ISSN (Print)	1860-949X

Cite this

@inbook{3fc23ba4eccf4388aadd288ad2f38ebe,

title = "The unsupervised approach: Grammar induction",

abstract = "There are mainly two approaches for creating syntactic dependency analyzers: supervised and unsupervised. The main goal of the first approach is to attain the best possible performance for a single language. For this purpose, a large collection of resources is gathered (using manually annotated corpora with part-of-speech annotations and syntactic and structure tags), which requires a significant amount of work and time. The state of the art in this approach attains syntactic annotation in about 85% of all full sentences (Rooth in Proceedings of the symposium on representation and acquisition of lexical knowledge. AAAI, 1995 [172]); in English, it attains over 90%. On the other hand, the unsupervised approach tries to discover the structure of a text using only raw text, which allows the creation of a dependency analyzer for virtually any language. Here, we explore this second approach. We present the model of an unsupervised dependency analyzer, named DILUCT-GI (GI short for grammar inference).",

author = "Gambino, {Omar J.} and Hiram Calvo",

note = "Publisher Copyright: {\textcopyright} Springer International Publishing AG 2018.",

year = "2018",

doi = "10.1007/978-3-319-74054-6_8",

language = "English",

series = "Studies in Computational Intelligence",

publisher = "Springer Verlag",

pages = "111--124",

booktitle = "Studies in Computational Intelligence",

address = "Germany",

}

TY - CHAP

T1 - The unsupervised approach

T2 - Grammar induction

AU - Gambino, Omar J.

AU - Calvo, Hiram

N1 - Publisher Copyright: © Springer International Publishing AG 2018.

PY - 2018

Y1 - 2018

N2 - There are mainly two approaches for creating syntactic dependency analyzers: supervised and unsupervised. The main goal of the first approach is to attain the best possible performance for a single language. For this purpose, a large collection of resources is gathered (using manually annotated corpora with part-of-speech annotations and syntactic and structure tags), which requires a significant amount of work and time. The state of the art in this approach attains syntactic annotation in about 85% of all full sentences (Rooth in Proceedings of the symposium on representation and acquisition of lexical knowledge. AAAI, 1995 [172]); in English, it attains over 90%. On the other hand, the unsupervised approach tries to discover the structure of a text using only raw text, which allows the creation of a dependency analyzer for virtually any language. Here, we explore this second approach. We present the model of an unsupervised dependency analyzer, named DILUCT-GI (GI short for grammar inference).

AB - There are mainly two approaches for creating syntactic dependency analyzers: supervised and unsupervised. The main goal of the first approach is to attain the best possible performance for a single language. For this purpose, a large collection of resources is gathered (using manually annotated corpora with part-of-speech annotations and syntactic and structure tags), which requires a significant amount of work and time. The state of the art in this approach attains syntactic annotation in about 85% of all full sentences (Rooth in Proceedings of the symposium on representation and acquisition of lexical knowledge. AAAI, 1995 [172]); in English, it attains over 90%. On the other hand, the unsupervised approach tries to discover the structure of a text using only raw text, which allows the creation of a dependency analyzer for virtually any language. Here, we explore this second approach. We present the model of an unsupervised dependency analyzer, named DILUCT-GI (GI short for grammar inference).

UR - http://www.scopus.com/inward/record.url?scp=85042874836&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-74054-6_8

DO - 10.1007/978-3-319-74054-6_8

M3 - Chapter

AN - SCOPUS:85042874836

T3 - Studies in Computational Intelligence

SP - 111

EP - 124

BT - Studies in Computational Intelligence

PB - Springer Verlag

ER -
