TY - JOUR
T1 - Central embeddings for extractive summarization based on similarity
AU - Gutiérrez-Hinojosa, Sandra J.
AU - Calvo, Hiram
AU - Moreno-Armendáriz, Marco A.
N1 - Publisher Copyright:
© 2019 Instituto Politecnico Nacional. All rights reserved.
PY - 2019
Y1 - 2019
N2 - In this work we propose using word embeddings combined with unsupervised methods such as clustering for the multi-document summarization task of DUC (Document Understanding Conference) 2002. We aim to find evidence that semantic information is preserved in word embeddings and that this representation can be grouped by similarity, so that the main ideas in a set of documents can be identified. We experiment with different clustering methods to extract candidates for the multi-document summarization task. Our experiments show that our method is able to find the prevalent ideas. The ROUGE scores of our experiments are similar to the state of the art, even though not all of the main ideas are found; since our method does not require annotated resources, it provides a domain- and language-independent way to create a summary.
AB - In this work we propose using word embeddings combined with unsupervised methods such as clustering for the multi-document summarization task of DUC (Document Understanding Conference) 2002. We aim to find evidence that semantic information is preserved in word embeddings and that this representation can be grouped by similarity, so that the main ideas in a set of documents can be identified. We experiment with different clustering methods to extract candidates for the multi-document summarization task. Our experiments show that our method is able to find the prevalent ideas. The ROUGE scores of our experiments are similar to the state of the art, even though not all of the main ideas are found; since our method does not require annotated resources, it provides a domain- and language-independent way to create a summary.
KW - Central embeddings
KW - Concept similarity
KW - DUC 2002
KW - Extractive summarization
KW - Prevalent ideas extraction
UR - http://www.scopus.com/inward/record.url?scp=85076632723&partnerID=8YFLogxK
U2 - 10.13053/CyS-23-3-3256
DO - 10.13053/CyS-23-3-3256
M3 - Article
AN - SCOPUS:85076632723
SN - 1405-5546
VL - 23
SP - 649
EP - 663
JO - Computacion y Sistemas
JF - Computacion y Sistemas
IS - 3
ER -