Central embeddings for extractive summarization based on similarity

Sandra J. Gutiérrez-Hinojosa, Hiram Calvo, Marco A. Moreno-Armendáriz

Research output: Contribution to journal, Article, peer-review

1 Scopus citation

Abstract

In this work, we propose combining word embeddings with unsupervised methods such as clustering for the multi-document summarization task of DUC (Document Understanding Conference) 2002. We aim to find evidence that word embeddings retain semantic information and that these representations can be grouped by similarity, so that the main ideas in a set of documents can be identified. We experiment with different clustering methods to extract candidates for the multi-document summarization task. Our experiments show that our method is able to find the prevalent ideas. The ROUGE scores of our experiments are similar to the state of the art, even though not all the main ideas are found. Because our method does not require annotated resources, it provides a domain- and language-independent way to create a summary.
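
The general idea in the abstract (represent text with word embeddings, cluster the representations by similarity, and pick the most central item of each cluster as a summary candidate) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the clustering algorithm or how sentences are represented, so the averaged word vectors, the cosine-based k-means, the farthest-point seeding, the toy two-dimensional vectors, and all function names below are assumptions made only for illustration.

```python
import math

# Toy 2-d "embeddings", invented for this sketch. A real system would
# load pretrained word vectors instead (assumption, not from the paper).
TOY_VECTORS = {
    "flood": (1.0, 0.1), "river": (0.9, 0.2), "rain": (0.8, 0.0),
    "vote": (0.1, 1.0), "election": (0.0, 0.9), "ballot": (0.2, 0.8),
}

SENTENCES = [
    "rain flood river",
    "the river flood",
    "election vote ballot",
    "a ballot vote",
]


def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return tuple(sum(v[i] for v in vectors) / len(vectors) for i in range(dim))


def sentence_vector(sentence):
    """Average the vectors of the known words; None if no word is known."""
    known = [TOY_VECTORS[w] for w in sentence.lower().split() if w in TOY_VECTORS]
    return mean(known) if known else None


def cosine(a, b):
    """Cosine similarity; 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def kmeans(vectors, k, iterations=10):
    """k-means using cosine similarity, with deterministic seeding:
    start from the first vector, then repeatedly add the vector that is
    least similar to the seeds chosen so far."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        )
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = max(range(k), key=lambda i: cosine(v, centroids[i]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids


def summarize(sentences, k=2):
    """Return, for each cluster, the sentence closest to its centroid."""
    pairs = [(s, sentence_vector(s)) for s in sentences]
    pairs = [(s, v) for s, v in pairs if v is not None]
    centroids = kmeans([v for _, v in pairs], k)
    chosen = []
    for c in centroids:
        best = max(pairs, key=lambda p: cosine(p[1], c))[0]
        if best not in chosen:
            chosen.append(best)
    return chosen


summary = summarize(SENTENCES, k=2)
print(summary)
```

With the toy data, the two clusters correspond to the two topics, so the sketch returns one sentence about the flood and one about the election. Nothing in it depends on labeled data, which reflects the abstract's point that the approach needs no annotated resources.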

Original language: English
Pages (from-to): 649-663
Number of pages: 15
Journal: Computacion y Sistemas
Volume: 23
Issue number: 3
State: Published - 2019

Keywords

  • Central embeddings
  • Concept similarity
  • DUC 2002
  • Extractive summarization
  • Prevalent ideas extraction
