Detecting inflection patterns in natural language by minimization of morphological model

Alexander Gelbukh, Mikhail Alexandrov, Sang Yong Han

Research output: Chapter in book/report/conference proceeding, Chapter, peer-reviewed

20 Citations (Scopus)

Abstract

One of the most important steps in text processing and information retrieval is stemming: reducing words to stems that express their base meaning, e.g., bake, baked, bakes, baking → bak-. We suggest an unsupervised method for recognizing such inflection patterns automatically, with no a priori information on the given language, based exclusively on a list of words extracted from a large text. For a given word list V we construct two sets of strings, stems S and endings E, such that each word from V is a concatenation of a stem from S and an ending from E. To select an optimal model, we minimize the total number of elements in S and E. Though such a simplistic model does not reflect many phenomena of real natural language morphology, it shows surprisingly promising results on different European languages. In addition to its practical value, we believe that this can also shed light on the nature of human language.
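
The minimization criterion described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not say how the optimum is found, so the code below simply tries every possible split of every word, which is only feasible for a toy word list. The function name best_segmentation, the extra words (hope, hoped, hopes, hoping), and the restriction to non-empty stems are assumptions made for illustration.

```python
from itertools import product


def best_segmentation(words):
    """Exhaustively pick one stem/ending split per word so that
    |S| + |E| (distinct stems plus distinct endings) is minimal.

    Assumption: stems must be non-empty, endings may be empty.
    The search is exponential in the number of words, so this is
    only a toy illustration of the objective, not a practical method.
    """
    # For each word, the allowed cut positions (stem = w[:c], ending = w[c:]).
    choices = [range(1, len(w) + 1) for w in words]
    best = None
    for cuts in product(*choices):
        stems = {w[:c] for w, c in zip(words, cuts)}
        endings = {w[c:] for w, c in zip(words, cuts)}
        cost = len(stems) + len(endings)
        if best is None or cost < best[0]:
            best = (cost, stems, endings)
    return best


# Word list V: the abstract's example plus a second, hypothetical paradigm.
words = ["bake", "baked", "bakes", "baking",
         "hope", "hoped", "hopes", "hoping"]
cost, stems, endings = best_segmentation(words)
print(cost, sorted(stems), sorted(endings))
# → 6 ['bak', 'hop'] ['e', 'ed', 'es', 'ing']
```

With only the four forms of bake, several segmentations tie at the same cost; the second paradigm is what makes the shared endings e, ed, es, ing cheaper than any alternative, which hints at why the method needs a word list drawn from a large text.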

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Editors: Alberto Sanfeliu, Jose Francisco Martinez-Trinidad, Jesus Ariel Carrasco-Ochoa
Publisher: Springer Verlag
Pages: 432-438
Number of pages: 7
ISBN (print): 3540235272
DOI
Status: Published - 2004

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3287
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349
