Clustering abstracts instead of full texts

Pavel Makagonov; Mikhail Alexandrov; Alexander Gelbukh

doi:10.1007/978-3-540-30120-2_17

Centro de Investigación en Computación (CIC)

Research output: Contribution to journal › Conference article › peer-review

25 Citations (Scopus)

Abstract

The accessibility of digital libraries and other web-based repositories has created the illusion that the full texts of scientific papers are accessible. However, in the majority of cases such access (at least free access) is limited to abstracts of no more than 50-100 words. The traditional keyword-based approach to clustering this type of document gives unstable and imprecise results. We show that these results can easily be improved with more adequate keyword selection and document similarity evaluation, and we suggest simple procedures for this. We evaluate our approach on data from two international conferences. One of our conclusions is a suggestion that digital libraries and other repositories provide document images of the full texts of papers, along with their abstracts, for open access via the Internet.
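For readers unfamiliar with the baseline the abstract criticizes, the following is a minimal sketch of a generic keyword-based approach, assuming the simplest possible pipeline: each abstract becomes a bag of keyword counts after stop-word removal, pairs of abstracts are compared by cosine similarity, and documents whose similarity reaches a threshold are grouped by single link. It is not the authors' improved keyword selection or similarity procedure, which this record does not describe. The stop-word list, the threshold values, the function names, and the example abstracts are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to",
              "we", "is", "are", "with", "this", "by"}


def keyword_vector(text: str) -> Counter:
    """Bag-of-keywords vector: lowercase word counts with stop words removed."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse keyword-count vectors."""
    # Counter returns 0 for missing keys, so absent keywords add nothing.
    dot = sum(count * v[word] for word, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def threshold_clusters(texts: list[str], threshold: float) -> list[list[int]]:
    """Single-link grouping: any pair with similarity >= threshold shares a cluster."""
    vectors = [keyword_vector(t) for t in texts]
    parent = list(range(len(texts)))  # union-find forest over document indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(texts)):
        for j in range(i + 1, len(texts)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(len(texts)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical toy "abstracts", not data from the paper.
EXAMPLE_ABSTRACTS = [
    "speech recognition with hidden markov models",
    "acoustic models for speech recognition",
    "clustering of short documents by keywords",
    "keyword selection for document clustering",
]

if __name__ == "__main__":
    vecs = [keyword_vector(t) for t in EXAMPLE_ABSTRACTS]
    print(round(cosine(vecs[0], vecs[1]), 3))  # 0.671
    print(cosine(vecs[2], vecs[3]))  # 0.25
    print(threshold_clusters(EXAMPLE_ABSTRACTS, 0.3))  # [[0, 1], [2], [3]]
    print(threshold_clusters(EXAMPLE_ABSTRACTS, 0.2))  # [[0, 1], [2, 3]]
```

In this toy example the two clustering-related abstracts share only the exact word "clustering" ("keyword" and "keywords", "document" and "documents" do not match without stemming), so their similarity is 0.25 and they are merged or kept apart depending on whether the threshold is 0.2 or 0.3. This sensitivity is one illustration of how exact keyword matching on texts of 50-100 words can become unstable.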

Original language	English
Pages (from-to)	129-135
Number of pages	7
Journal	Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science)
Volume	3206
DOI	https://doi.org/10.1007/978-3-540-30120-2_17
Status	Published - 2004
Event	7th International Conference TSD 2004: Text, Speech and Dialogue - Brno, Czech Republic. Duration: 8 Sep 2004 → 11 Sep 2004

Cite this

@article{dc6478d76fc04ebab254533b8615cc0f,
  title = "Clustering abstracts instead of full texts",
  abstract = "The accessibility of digital libraries and other web-based repositories has created the illusion that the full texts of scientific papers are accessible. However, in the majority of cases such access (at least free access) is limited to abstracts of no more than 50-100 words. The traditional keyword-based approach to clustering this type of document gives unstable and imprecise results. We show that these results can easily be improved with more adequate keyword selection and document similarity evaluation, and we suggest simple procedures for this. We evaluate our approach on data from two international conferences. One of our conclusions is a suggestion that digital libraries and other repositories provide document images of the full texts of papers, along with their abstracts, for open access via the Internet.",
  author = "Pavel Makagonov and Mikhail Alexandrov and Alexander Gelbukh",
  year = "2004",
  doi = "10.1007/978-3-540-30120-2_17",
  language = "English",
  volume = "3206",
  pages = "129--135",
  journal = "Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science)",
  issn = "0302-9743",
  publisher = "Springer Verlag",
  note = "7th International Conference TSD 2004: Text, Speech and Dialogue; Conference date: 08-09-2004 through 11-09-2004",
}

TY  - JOUR
T1  - Clustering abstracts instead of full texts
AU  - Makagonov, Pavel
AU  - Alexandrov, Mikhail
AU  - Gelbukh, Alexander
PY  - 2004
Y1  - 2004
N2  - The accessibility of digital libraries and other web-based repositories has created the illusion that the full texts of scientific papers are accessible. However, in the majority of cases such access (at least free access) is limited to abstracts of no more than 50-100 words. The traditional keyword-based approach to clustering this type of document gives unstable and imprecise results. We show that these results can easily be improved with more adequate keyword selection and document similarity evaluation, and we suggest simple procedures for this. We evaluate our approach on data from two international conferences. One of our conclusions is a suggestion that digital libraries and other repositories provide document images of the full texts of papers, along with their abstracts, for open access via the Internet.
AB  - The accessibility of digital libraries and other web-based repositories has created the illusion that the full texts of scientific papers are accessible. However, in the majority of cases such access (at least free access) is limited to abstracts of no more than 50-100 words. The traditional keyword-based approach to clustering this type of document gives unstable and imprecise results. We show that these results can easily be improved with more adequate keyword selection and document similarity evaluation, and we suggest simple procedures for this. We evaluate our approach on data from two international conferences. One of our conclusions is a suggestion that digital libraries and other repositories provide document images of the full texts of papers, along with their abstracts, for open access via the Internet.
UR  - http://www.scopus.com/inward/record.url?scp=22944482209&partnerID=8YFLogxK
U2  - 10.1007/978-3-540-30120-2_17
DO  - 10.1007/978-3-540-30120-2_17
M3  - Conference article
AN  - SCOPUS:22944482209
SN  - 0302-9743
VL  - 3206
SP  - 129
EP  - 135
JO  - Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science)
JF  - Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science)
T2  - 7th International Conference TSD 2004: Text, Speech and Dialogue
Y2  - 8 September 2004 through 11 September 2004
ER  - 