Clustering abstracts instead of full texts

Pavel Makagonov, Mikhail Alexandrov, Alexander Gelbukh

Research output: Contribution to journal, conference article, peer-reviewed

25 Citations (Scopus)

Abstract

The accessibility of digital libraries and other web-based repositories has created the illusion that the full texts of scientific papers are accessible. However, in the majority of cases such access (at least free access) is limited to abstracts of no more than 50-100 words. The traditional keyword-based approach to clustering this type of document gives unstable and imprecise results. We show that these results can be easily improved with more adequate keyword selection and document similarity evaluation. We suggest simple procedures for this. We evaluate our approach on data from two international conferences. One of our conclusions is a suggestion that digital libraries and other repositories provide document images of the full texts of papers, along with their abstracts, for open access via the Internet.

Original language: English
Pages (from-to): 129-135
Number of pages: 7
Journal: Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science)
Volume: 3206
Status: Published - 2004
Event: 7th International Conference TSD 2004: Text, Speech and Dialogue - Brno, Czech Republic
Duration: 8 Sep 2004 to 11 Sep 2004
