Unsupervised learning for syntactic disambiguation

Alexander Gelbukh

doi:10.13053/CyS-18-2-2014-035

Centro de Investigación en Computación (CIC)

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

We present a methodological framework for syntactic disambiguation of natural language texts. The method takes an existing manually compiled grammar that is neither probabilistic nor lexicalized and turns it into a probabilistic lexicalized grammar by automatically learning a kind of subcategorization frames, or selectional preferences, for all words observed in the training corpus. The resulting dictionary of subcategorization frames or selectional preferences can then be used for syntactic disambiguation of new, unseen texts. The learning process is unsupervised and requires no manual markup. The proposed learning algorithm can incorporate any existing disambiguation method, including linguistically motivated methods for filtering or weighting competing parse trees or syntactic relations, which allows linguistic knowledge to be integrated with unsupervised machine learning.
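
The abstract does not spell out the learning procedure, so the following is only a minimal sketch of one way such unsupervised learning could work, not the author's algorithm. It assumes that a non-probabilistic grammar has already produced every candidate parse of each training sentence, that a parse can be represented as a tuple of (head, relation, dependent) combinations, and that weights are re-estimated iteratively in an EM-like fashion. The names `score`, `learn_weights`, `disambiguate`, the `smoothing` parameter, and the toy corpus are all hypothetical.

```python
from collections import defaultdict
from math import prod


def score(parse, weights, unseen):
    """Score a parse as the product of the weights of its combinations."""
    return prod(weights.get(c, unseen) for c in parse)


def learn_weights(corpus, iterations=10, smoothing=0.1):
    """corpus: list of sentences, each given as a list of candidate parses."""
    weights = {}  # empty at first, so every combination scores 1.0
    for _ in range(iterations):
        counts = defaultdict(float)
        for variants in corpus:
            scores = [score(p, weights, 1.0) for p in variants]
            total = sum(scores)
            # Each variant contributes in proportion to its current score,
            # so no manual markup of the correct parse is needed.
            for parse, s in zip(variants, scores):
                for c in parse:
                    counts[c] += s / total
        # Assumed additive smoothing keeps every weight strictly positive.
        weights = {c: n + smoothing for c, n in counts.items()}
    return weights


def disambiguate(variants, weights, smoothing=0.1):
    """Pick the candidate parse with the highest score."""
    return max(variants, key=lambda p: score(p, weights, smoothing))


# Toy data: prepositional-phrase attachment ambiguity.
verb_fork = (("eat", "obj", "pizza"), ("eat", "with", "fork"))
noun_fork = (("eat", "obj", "pizza"), ("pizza", "with", "fork"))
verb_cheese = (("eat", "obj", "pizza"), ("eat", "with", "cheese"))
noun_cheese = (("eat", "obj", "pizza"), ("pizza", "with", "cheese"))

corpus = [
    [(("eat", "with", "fork"),)],      # unambiguous sentence
    [(("pizza", "with", "cheese"),)],  # unambiguous sentence
    [verb_fork, noun_fork],            # ambiguous sentence
    [verb_cheese, noun_cheese],        # ambiguous sentence
]

weights = learn_weights(corpus)
best_fork = disambiguate([verb_fork, noun_fork], weights)
best_cheese = disambiguate([verb_cheese, noun_cheese], weights)
```

In this toy run the unambiguous sentences pull the weights apart, so `best_fork` attaches "with fork" to the verb and `best_cheese` attaches "with cheese" to the noun. A linguistically motivated filter or weighting method, as the abstract mentions, could be plugged in by multiplying its output into `score`; that extension is likewise an assumption of this sketch.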

Original language	English
Pages (from-to)	329-344
Number of pages	16
Journal	Computacion y Sistemas
Volume	18
Issue number	2
DOIs	https://doi.org/10.13053/CyS-18-2-2014-035
State	Published - 2014

Keywords

Natural language processing
Syntactic disambiguation
Syntactic parsing
Unsupervised machine learning

Cite this

@article{63b512858f684e2994f09ac0d47722a2,

title = "Unsupervised learning for syntactic disambiguation",

abstract = "We present a methodology framework for syntactic disambiguation in natural language texts. The method takes advantage of an existing manually compiled non-probabilistic and non-lexicalized grammar, and turns it into a probabilistic lexicalized grammar by automatically learning a kind of subcategorization frames or selectional preferences for all words observed in the training corpus. The dictionary of subcategorization frames or selectional preferences obtained in the training process can be subsequently used for syntactic disambiguation of new unseen texts. The learning process is unsupervised and requires no manual markup. The learning algorithm proposed in this paper can take advantage of any existing disambiguation method, including linguistically motivated methods of filtering or weighting competing alternative parse trees or syntactic relations, thus allowing for integration of linguistic knowledge and unsupervised machine learning.",

keywords = "Natural language processing, Syntactic disambiguation, Syntactic parsing, Unsupervised machine learning",

author = "Alexander Gelbukh",

year = "2014",

doi = "10.13053/CyS-18-2-2014-035",

language = "English",

volume = "18",

pages = "329--344",

journal = "Computacion y Sistemas",

issn = "1405-5546",

number = "2",

}

TY - JOUR

T1 - Unsupervised learning for syntactic disambiguation

AU - Gelbukh, Alexander

PY - 2014

Y1 - 2014

N2 - We present a methodology framework for syntactic disambiguation in natural language texts. The method takes advantage of an existing manually compiled non-probabilistic and non-lexicalized grammar, and turns it into a probabilistic lexicalized grammar by automatically learning a kind of subcategorization frames or selectional preferences for all words observed in the training corpus. The dictionary of subcategorization frames or selectional preferences obtained in the training process can be subsequently used for syntactic disambiguation of new unseen texts. The learning process is unsupervised and requires no manual markup. The learning algorithm proposed in this paper can take advantage of any existing disambiguation method, including linguistically motivated methods of filtering or weighting competing alternative parse trees or syntactic relations, thus allowing for integration of linguistic knowledge and unsupervised machine learning.

KW - Natural language processing

KW - Syntactic disambiguation

KW - Syntactic parsing

KW - Unsupervised machine learning

UR - http://www.scopus.com/inward/record.url?scp=84904020350&partnerID=8YFLogxK

U2 - 10.13053/CyS-18-2-2014-035

DO - 10.13053/CyS-18-2-2014-035

M3 - Article

SN - 1405-5546

VL - 18

SP - 329

EP - 344

JO - Computacion y Sistemas

JF - Computacion y Sistemas

IS - 2

ER -
