Clustering abstracts instead of full texts

Pavel Makagonov; Mikhail Alexandrov; Alexander Gelbukh

doi:10.1007/978-3-540-30120-2_17

Centro de Investigación en Computación (CIC)

Research output: Contribution to journal › Conference article › peer-review

25 Scopus citations

Abstract

Accessibility of digital libraries and other web-based repositories has caused the illusion of accessibility of the full texts of scientific papers. However, in the majority of cases such access (at least free access) is limited only to abstracts having no more than 50-100 words. The traditional keyword-based approach to clustering documents of this type gives unstable and imprecise results. We show that these results can be easily improved with more adequate keyword selection and document similarity evaluation. We suggest simple procedures for this. We evaluate our approach on the data from two international conferences. One of our conclusions is the suggestion for the digital libraries and other repositories to provide document images of full texts of the papers along with their abstracts for open access via the Internet.

Original language	English
Pages (from-to)	129-135
Number of pages	7
Journal	Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science)
Volume	3206
DOIs	https://doi.org/10.1007/978-3-540-30120-2_17
State	Published - 2004
Event	7th International Conference TSD 2004: Text, Speech and Dialogue - Brno, Czech Republic
Duration	8 Sep 2004 → 11 Sep 2004

Cite this

@article{dc6478d76fc04ebab254533b8615cc0f,

title = "Clustering abstracts instead of full texts",

abstract = "Accessibility of digital libraries and other web-based repositories has caused the illusion of accessibility of the full texts of scientific papers. However, in the majority of cases such access (at least free access) is limited only to abstracts having no more than 50-100 words. The traditional keyword-based approach to clustering documents of this type gives unstable and imprecise results. We show that these results can be easily improved with more adequate keyword selection and document similarity evaluation. We suggest simple procedures for this. We evaluate our approach on the data from two international conferences. One of our conclusions is the suggestion for the digital libraries and other repositories to provide document images of full texts of the papers along with their abstracts for open access via the Internet.",

author = "Pavel Makagonov and Mikhail Alexandrov and Alexander Gelbukh",

year = "2004",

doi = "10.1007/978-3-540-30120-2_17",

language = "English",

volume = "3206",

pages = "129--135",

journal = "Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science)",

issn = "0302-9743",

publisher = "Springer Verlag",

note = "7th International Conference TSD 2004: Text, Speech and Dialogue; Conference date: 8 September 2004 through 11 September 2004",

}

TY - JOUR

T1 - Clustering abstracts instead of full texts

AU - Makagonov, Pavel

AU - Alexandrov, Mikhail

AU - Gelbukh, Alexander

PY - 2004

Y1 - 2004

N2 - Accessibility of digital libraries and other web-based repositories has caused the illusion of accessibility of the full texts of scientific papers. However, in the majority of cases such access (at least free access) is limited only to abstracts having no more than 50-100 words. The traditional keyword-based approach to clustering documents of this type gives unstable and imprecise results. We show that these results can be easily improved with more adequate keyword selection and document similarity evaluation. We suggest simple procedures for this. We evaluate our approach on the data from two international conferences. One of our conclusions is the suggestion for the digital libraries and other repositories to provide document images of full texts of the papers along with their abstracts for open access via the Internet.

AB - Accessibility of digital libraries and other web-based repositories has caused the illusion of accessibility of the full texts of scientific papers. However, in the majority of cases such access (at least free access) is limited only to abstracts having no more than 50-100 words. The traditional keyword-based approach to clustering documents of this type gives unstable and imprecise results. We show that these results can be easily improved with more adequate keyword selection and document similarity evaluation. We suggest simple procedures for this. We evaluate our approach on the data from two international conferences. One of our conclusions is the suggestion for the digital libraries and other repositories to provide document images of full texts of the papers along with their abstracts for open access via the Internet.

UR - http://www.scopus.com/inward/record.url?scp=22944482209&partnerID=8YFLogxK

U2 - 10.1007/978-3-540-30120-2_17

DO - 10.1007/978-3-540-30120-2_17

M3 - Conference article

AN - SCOPUS:22944482209

SN - 0302-9743

VL - 3206

SP - 129

EP - 135

JO - Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science)

JF - Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science)

T2 - 7th International Conference TSD 2004: Text, Speech and Dialogue

Y2 - 8 September 2004 through 11 September 2004

ER -
