Text segmentation into paragraphs based on local text cohesion

Igor A. Bolshakov, Alexander Gelbukh

Research output: Chapter in book/report/conference proceedings, Conference contribution, peer-reviewed

17 Citations (Scopus)

Abstract

The problem of automatic text segmentation divides into two distinct problems: thematic segmentation into rather large, topically self-contained sections, and splitting into paragraphs, i.e., lower-level lexico-grammatical segmentation. In this paper we consider the latter problem. We propose a method of reasonably splitting text into paragraphs based on a text cohesion measure. Specifically, we propose a method for the quantitative evaluation of text cohesion based on a large linguistic resource, a collocation network. At each step, our algorithm compares word occurrences in the text against a large database of collocations and semantic links between words in the given natural language. The procedure consists of evaluating the cohesion function, smoothing and normalizing it, and comparing the result with a specially constructed threshold.
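The four stages named in the abstract (cohesion evaluation, smoothing, normalization, threshold comparison) can be illustrated with a minimal sketch. The abstract does not give the actual formulas, so everything concrete here is an assumption made for illustration: the toy `LINKS` set stands in for the large collocation database, cohesion is counted as linked or repeated word pairs across adjacent sentences, smoothing is a 1-2-1 weighted average, normalization divides by the maximum, and a fixed constant replaces the paper's specially constructed threshold.

```python
from typing import FrozenSet, List, Set

# Hypothetical toy resource: unordered word pairs treated as linked.
# The paper relies on a large collocation network instead.
LINKS: Set[FrozenSet[str]] = {
    frozenset(pair)
    for pair in [("cat", "purr"), ("cat", "milk"),
                 ("stock", "market"), ("market", "price")]
}


def cohesion(sentences: List[List[str]], links: Set[FrozenSet[str]]) -> List[float]:
    """Assumed measure: for each gap between adjacent sentences, count word
    pairs (one word from each side) that are identical or linked."""
    scores = []
    for left, right in zip(sentences, sentences[1:]):
        count = sum(
            1
            for a in set(left)
            for b in set(right)
            if a == b or frozenset((a, b)) in links
        )
        scores.append(float(count))
    return scores


def smooth(values: List[float]) -> List[float]:
    """Assumed smoothing: 1-2-1 weighted average, truncated at the edges."""
    out = []
    for i, v in enumerate(values):
        total, weight = 2.0 * v, 2.0
        if i > 0:
            total, weight = total + values[i - 1], weight + 1.0
        if i + 1 < len(values):
            total, weight = total + values[i + 1], weight + 1.0
        out.append(total / weight)
    return out


def normalize(values: List[float]) -> List[float]:
    """Assumed normalization: scale so that the maximum becomes 1."""
    peak = max(values, default=0.0)
    return [v / peak if peak > 0 else 0.0 for v in values]


def segment(sentences: List[List[str]], links: Set[FrozenSet[str]],
            threshold: float = 0.8) -> List[List[List[str]]]:
    """Start a new paragraph at every gap whose normalized, smoothed cohesion
    falls below the threshold (a fixed stand-in for the paper's threshold)."""
    if not sentences:
        return []
    curve = normalize(smooth(cohesion(sentences, links)))
    paragraphs = [[sentences[0]]]
    for gap, value in enumerate(curve):
        if value < threshold:
            paragraphs.append([])
        paragraphs[-1].append(sentences[gap + 1])
    return paragraphs


if __name__ == "__main__":
    text = [["cat", "purr"], ["cat", "milk"],
            ["stock", "market"], ["market", "price"]]
    for paragraph in segment(text, LINKS):
        print(paragraph)
```

In this toy input the only gap with no linked or repeated words lies between the second and third sentences, so that is where the sketch places the paragraph break. A real system would need tokenization, lemmatization, and a far larger link resource.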

Original language: English
Title of host publication: Text, Speech and Dialogue - 4th International Conference, TSD 2001, Proceedings
Editors: Vaclav Matousek, Pavel Mautner, Roman Moucek, Karel Tauser
Publisher: Springer Verlag
Pages: 158-166
Number of pages: 9
ISBN (print): 9783540425571
DOI
Status: Published - 2001
Event: 4th International Conference on Text, Speech and Dialogue, TSD 2001 - Zelezna Ruda, Czech Republic
Duration: 11 Sep 2001 to 13 Sep 2001

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 2166
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349

Conference

Conference: 4th International Conference on Text, Speech and Dialogue, TSD 2001
Country/Territory: Czech Republic
City: Zelezna Ruda
Period: 11 Sep 2001 to 13 Sep 2001
