TY - GEN
T1 - Text segmentation into paragraphs based on local text cohesion
AU - Bolshakov, Igor A.
AU - Gelbukh, Alexander
N1 - Publisher Copyright: © Springer-Verlag Berlin Heidelberg 2001.
PY - 2001
Y1 - 2001
AB - The problem of automatic text segmentation divides into two distinct problems: thematic segmentation into relatively large, topically self-contained sections, and splitting into paragraphs, i.e., lower-level lexico-grammatical segmentation. In this paper we consider the latter problem. We propose a method for reasonable splitting of text into paragraphs based on a text cohesion measure. Specifically, we propose a method for quantitatively evaluating text cohesion based on a large linguistic resource, a collocation network. At each step, our algorithm compares word occurrences in the text against a large database of collocations and semantic links between words in the given natural language. The procedure consists of evaluating the cohesion function, smoothing and normalizing it, and comparing it with a specially constructed threshold.
UR - http://www.scopus.com/inward/record.url?scp=84942587674&partnerID=8YFLogxK
U2 - 10.1007/3-540-44805-5_20
DO - 10.1007/3-540-44805-5_20
M3 - Conference contribution
SN - 9783540425571
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 158
EP - 166
BT - Text, Speech and Dialogue - 4th International Conference, TSD 2001, Proceedings
A2 - Matousek, Vaclav
A2 - Mautner, Pavel
A2 - Moucek, Roman
A2 - Tauser, Karel
PB - Springer Verlag
T2 - 4th International Conference on Text, Speech and Dialogue, TSD 2001
Y2 - 11 September 2001 through 13 September 2001
ER -