Central embeddings for extractive summarization based on similarity

Sandra J. Gutiérrez-Hinojosa; Hiram Calvo; Marco A. Moreno-Armendáriz

doi:10.13053/CyS-23-3-3256

Centro de Investigación en Computación (CIC)

Research output: Contribution to journal › Article › Peer-review

1 Citation (Scopus)

Abstract

In this work we propose using word embeddings combined with unsupervised methods, such as clustering, for the multi-document summarization task of DUC (Document Understanding Conference) 2002. We aim to find evidence that word embeddings preserve semantic information and that this representation can be grouped by similarity, so that the main ideas in a set of documents can be identified. We experiment with different clustering methods to extract candidates for the multi-document summarization task. Our experiments show that our method is able to find the prevalent ideas. The ROUGE scores of our experiments are similar to the state of the art, even though not all of the main ideas are found; since our method does not require annotated resources, it provides a domain- and language-independent way to create a summary.
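The general idea in the abstract (group embeddings by similarity, then take the most central item of each group as a summary candidate) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that a sentence vector is the average of its word vectors, uses plain k-means as one possible clustering method, and picks the sentence with the highest cosine similarity to each centroid. The function names and the toy 2-dimensional "embeddings" are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence

Vector = List[float]


def sentence_embedding(sentence: str, word_vectors: Dict[str, Vector]) -> Vector:
    """Average the vectors of known words (assumption); unknown words are skipped."""
    vecs = [word_vectors[w] for w in sentence.lower().split() if w in word_vectors]
    if not vecs:
        raise ValueError(f"no known words in: {sentence!r}")
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def kmeans(points: List[Vector], k: int, iters: int = 20, seed: int = 0) -> List[Vector]:
    """Plain k-means with Euclidean distance; returns the k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def central_sentences(sentences: List[str], word_vectors: Dict[str, Vector], k: int) -> List[str]:
    """Return, for each cluster, the sentence whose embedding is closest to its centroid."""
    embs = [sentence_embedding(s, word_vectors) for s in sentences]
    chosen: List[str] = []
    for centroid in kmeans(embs, k):
        best = max(range(len(sentences)), key=lambda i: cosine(embs[i], centroid))
        if sentences[best] not in chosen:
            chosen.append(sentences[best])
    return chosen


# Hypothetical toy data: 2-D "word embeddings" for two topics.
word_vectors = {
    "storm": [1.0, 0.0], "hurricane": [0.9, 0.1], "flood": [0.8, 0.2],
    "election": [0.0, 1.0], "vote": [0.1, 0.9], "candidate": [0.2, 0.8],
}
sentences = [
    "hurricane storm hits coast",
    "flood follows the storm",
    "candidate wins election",
    "vote count for election",
]
print(central_sentences(sentences, word_vectors, k=2))
```

With two clusters, the sketch returns one sentence per topic. Choosing the number of clusters, the clustering algorithm, and the similarity measure are the main design decisions; the abstract notes that the authors compared several clustering methods.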

Original language	English
Pages (from-to)	649-663
Number of pages	15
Journal	Computacion y Sistemas
Volume	23
Issue number	3
DOI	https://doi.org/10.13053/CyS-23-3-3256
Status	Published - 2019

Access to Document

10.13053/CyS-23-3-3256

Other files and links

Link to publication in Scopus

Cite this

@article{cc67ccb3a2ae4bdab43e25749659b782,

title = "Central embeddings for extractive summarization based on similarity",

abstract = "In this work we propose using word embeddings combined with unsupervised methods such as clustering for the multi-document summarization task of DUC (Document Understanding Conference) 2002. We aim to find evidence that semantic information is kept in word embeddings and this representation is subject to be grouped based on their similarity, so that main ideas can be identified in sets of documents. We experiment with different clustering methods to extract candidates for the multi-document summarization task. Our experiments show that our method is able to find the prevalent ideas. ROUGE measures of our experiments are similar to the state of the art, despite the fact that not all the main ideas are found; as our method does not require annotated resources, it provides a domain and language independent way to create a summary.",

keywords = "Central embeddings, Concept similarity, DUC 2002, Extractive summarization, Prevalent ideas extraction",

author = "Guti{\'e}rrez-Hinojosa, {Sandra J.} and Hiram Calvo and Moreno-Armend{\'a}riz, {Marco A.}",

year = "2019",

doi = "10.13053/CyS-23-3-3256",

language = "English",

volume = "23",

pages = "649--663",

journal = "Computacion y Sistemas",

issn = "1405-5546",

number = "3",

}

TY - JOUR

T1 - Central embeddings for extractive summarization based on similarity

AU - Gutiérrez-Hinojosa, Sandra J.

AU - Calvo, Hiram

AU - Moreno-Armendáriz, Marco A.

PY - 2019

Y1 - 2019

N2 - In this work we propose using word embeddings combined with unsupervised methods such as clustering for the multi-document summarization task of DUC (Document Understanding Conference) 2002. We aim to find evidence that semantic information is kept in word embeddings and this representation is subject to be grouped based on their similarity, so that main ideas can be identified in sets of documents. We experiment with different clustering methods to extract candidates for the multi-document summarization task. Our experiments show that our method is able to find the prevalent ideas. ROUGE measures of our experiments are similar to the state of the art, despite the fact that not all the main ideas are found; as our method does not require annotated resources, it provides a domain and language independent way to create a summary.

AB - In this work we propose using word embeddings combined with unsupervised methods such as clustering for the multi-document summarization task of DUC (Document Understanding Conference) 2002. We aim to find evidence that semantic information is kept in word embeddings and this representation is subject to be grouped based on their similarity, so that main ideas can be identified in sets of documents. We experiment with different clustering methods to extract candidates for the multi-document summarization task. Our experiments show that our method is able to find the prevalent ideas. ROUGE measures of our experiments are similar to the state of the art, despite the fact that not all the main ideas are found; as our method does not require annotated resources, it provides a domain and language independent way to create a summary.

KW - Central embeddings

KW - Concept similarity

KW - DUC 2002

KW - Extractive summarization

KW - Prevalent ideas extraction

UR - http://www.scopus.com/inward/record.url?scp=85076632723&partnerID=8YFLogxK

U2 - 10.13053/CyS-23-3-3256

DO - 10.13053/CyS-23-3-3256

M3 - Article

AN - SCOPUS:85076632723

SN - 1405-5546

VL - 23

SP - 649

EP - 663

JO - Computacion y Sistemas

JF - Computacion y Sistemas

IS - 3

ER -