TY - JOUR
T1 - Clustering abstracts instead of full texts
AU - Makagonov, Pavel
AU - Alexandrov, Mikhail
AU - Gelbukh, Alexander
PY - 2004
Y1 - 2004
N2 - Accessibility of digital libraries and other web-based repositories has created the illusion that the full texts of scientific papers are accessible. However, in the majority of cases such access (at least free access) is limited to abstracts of no more than 50-100 words. The traditional keyword-based approach to clustering this type of document gives unstable and imprecise results. We show that these results can be easily improved with more adequate keyword selection and document similarity evaluation. We suggest simple procedures for this. We evaluate our approach on data from two international conferences. One of our conclusions is the suggestion that digital libraries and other repositories provide document images of the full texts of papers along with their abstracts for open access via the Internet.
AB - Accessibility of digital libraries and other web-based repositories has created the illusion that the full texts of scientific papers are accessible. However, in the majority of cases such access (at least free access) is limited to abstracts of no more than 50-100 words. The traditional keyword-based approach to clustering this type of document gives unstable and imprecise results. We show that these results can be easily improved with more adequate keyword selection and document similarity evaluation. We suggest simple procedures for this. We evaluate our approach on data from two international conferences. One of our conclusions is the suggestion that digital libraries and other repositories provide document images of the full texts of papers along with their abstracts for open access via the Internet.
UR - http://www.scopus.com/inward/record.url?scp=22944482209&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-30120-2_17
DO - 10.1007/978-3-540-30120-2_17
M3 - Conference article
AN - SCOPUS:22944482209
SN - 0302-9743
VL - 3206
SP - 129
EP - 135
JO - Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science)
JF - Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science)
T2 - 7th International Conference TSD 2004: Text, Speech and Dialogue
Y2 - 8 September 2004 through 11 September 2004
ER -