Computing text similarity using Tree Edit Distance

Grigori Sidorov; Helena Gomez-Adorno; Ilia Markov; David Pinto; Nahun Loya

doi:10.1109/NAFIPS-WConSC.2015.7284129

Centro de Investigación en Computación (CIC)

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

25 Citations (Scopus)

Abstract

In this paper, we propose the application of the Tree Edit Distance (TED) for calculation of similarity between syntactic n-grams for further detection of soft similarity between texts. The computation of text similarity is the basic task for many natural language processing problems, and it is an open research field. Syntactic n-grams are text features for Vector Space Model construction extracted from dependency trees. Soft similarity is application of Vector Space Model taking into account similarity of features. First, we discuss the advantages of the application of the TED to syntactic n-grams. Then, we present a procedure based on the TED and syntactic n-grams for calculating soft similarity between texts.
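The notion of soft similarity mentioned in the abstract can be illustrated with a minimal sketch of a soft cosine measure. This is not the authors' implementation: the feature-similarity matrix `sim` below is made up for illustration, whereas in the paper's setting its entries would be derived from the Tree Edit Distance between syntactic n-grams. The function name `soft_cosine` and the toy vectors are likewise assumptions.

```python
import math


def soft_cosine(a, b, sim):
    """Soft cosine of feature vectors a and b.

    sim[i][j] is the similarity between feature i and feature j
    (1.0 on the diagonal). With an identity matrix this reduces
    to the ordinary cosine similarity.
    """
    def bilinear(x, y):
        # Computes x^T * sim * y.
        return sum(
            sim[i][j] * x[i] * y[j]
            for i in range(len(x))
            for j in range(len(y))
        )

    denom = math.sqrt(bilinear(a, a)) * math.sqrt(bilinear(b, b))
    return bilinear(a, b) / denom if denom else 0.0


# Hypothetical similarity matrix over three features: features 0 and 1
# are assumed to be half-similar, feature 2 is unrelated to both.
sim = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

text_a = [1, 0, 1]  # toy feature counts for text A
text_b = [0, 1, 1]  # toy feature counts for text B

hard = soft_cosine(text_a, text_b, identity)  # ordinary cosine
soft = soft_cosine(text_a, text_b, sim)       # accounts for similar features
print(hard, soft)
```

With these toy values the soft measure is higher than the ordinary cosine, because the two texts use different but similar features.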

Original language	English
Title of host publication	2015 Annual Meeting of the North American Fuzzy Information Processing Society, NAFIPS 2015
Publisher	Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic)	9781467372473
DOI	https://doi.org/10.1109/NAFIPS-WConSC.2015.7284129
Status	Published - 29 Sep 2015
Event	Annual Meeting of the North American Fuzzy Information Processing Society, NAFIPS 2015 - Redmond, United States. Duration: 17 Aug 2015 → 19 Aug 2015

Publication series

Name	Annual Conference of the North American Fuzzy Information Processing Society - NAFIPS
Volume	2015-September

Conference

Conference	Annual Meeting of the North American Fuzzy Information Processing Society, NAFIPS 2015
Country/Territory	United States
City	Redmond
Period	17/08/15 → 19/08/15

Access to document

10.1109/NAFIPS-WConSC.2015.7284129

Cite this

Sidorov, G., Gomez-Adorno, H., Markov, I., Pinto, D., & Loya, N. (2015). Computing text similarity using Tree Edit Distance. In 2015 Annual Meeting of the North American Fuzzy Information Processing Society, NAFIPS 2015, Article 7284129 (Annual Conference of the North American Fuzzy Information Processing Society - NAFIPS; Vol. 2015-September). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/NAFIPS-WConSC.2015.7284129

@inproceedings{472829f97ce34a16a67ade47d151ee69,

title = "Computing text similarity using Tree Edit Distance",

abstract = "In this paper, we propose the application of the Tree Edit Distance (TED) for calculation of similarity between syntactic n-grams for further detection of soft similarity between texts. The computation of text similarity is the basic task for many natural language processing problems, and it is an open research field. Syntactic n-grams are text features for Vector Space Model construction extracted from dependency trees. Soft similarity is application of Vector Space Model taking into account similarity of features. First, we discuss the advantages of the application of the TED to syntactic n-grams. Then, we present a procedure based on the TED and syntactic n-grams for calculating soft similarity between texts.",

keywords = "Computational modeling, Cost function, Heuristic algorithms, Information retrieval, Natural language processing, Semantics, Syntactics",

author = "Grigori Sidorov and Helena Gomez-Adorno and Ilia Markov and David Pinto and Nahun Loya",

note = "Publisher Copyright: {\textcopyright} 2015 IEEE.; Annual Meeting of the North American Fuzzy Information Processing Society, NAFIPS 2015 ; Conference date: 17-08-2015 Through 19-08-2015",

year = "2015",

month = sep,

day = "29",

doi = "10.1109/NAFIPS-WConSC.2015.7284129",

language = "English",

series = "Annual Conference of the North American Fuzzy Information Processing Society - NAFIPS",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

booktitle = "2015 Annual Meeting of the North American Fuzzy Information Processing Society, NAFIPS 2015",

address = "United States",

}

Sidorov, G, Gomez-Adorno, H, Markov, I, Pinto, D & Loya, N 2015, Computing text similarity using Tree Edit Distance. In 2015 Annual Meeting of the North American Fuzzy Information Processing Society, NAFIPS 2015, 7284129, Annual Conference of the North American Fuzzy Information Processing Society - NAFIPS, vol. 2015-September, Institute of Electrical and Electronics Engineers Inc., Annual Meeting of the North American Fuzzy Information Processing Society, NAFIPS 2015, Redmond, United States, 17/08/15. https://doi.org/10.1109/NAFIPS-WConSC.2015.7284129

Computing text similarity using Tree Edit Distance. / Sidorov, Grigori; Gomez-Adorno, Helena; Markov, Ilia et al.
2015 Annual Meeting of the North American Fuzzy Information Processing Society, NAFIPS 2015. Institute of Electrical and Electronics Engineers Inc., 2015. 7284129 (Annual Conference of the North American Fuzzy Information Processing Society - NAFIPS; Vol. 2015-September).

TY - GEN

T1 - Computing text similarity using Tree Edit Distance

AU - Sidorov, Grigori

AU - Gomez-Adorno, Helena

AU - Markov, Ilia

AU - Pinto, David

AU - Loya, Nahun

PY - 2015/9/29

Y1 - 2015/9/29

N2 - In this paper, we propose the application of the Tree Edit Distance (TED) for calculation of similarity between syntactic n-grams for further detection of soft similarity between texts. The computation of text similarity is the basic task for many natural language processing problems, and it is an open research field. Syntactic n-grams are text features for Vector Space Model construction extracted from dependency trees. Soft similarity is application of Vector Space Model taking into account similarity of features. First, we discuss the advantages of the application of the TED to syntactic n-grams. Then, we present a procedure based on the TED and syntactic n-grams for calculating soft similarity between texts.

AB - In this paper, we propose the application of the Tree Edit Distance (TED) for calculation of similarity between syntactic n-grams for further detection of soft similarity between texts. The computation of text similarity is the basic task for many natural language processing problems, and it is an open research field. Syntactic n-grams are text features for Vector Space Model construction extracted from dependency trees. Soft similarity is application of Vector Space Model taking into account similarity of features. First, we discuss the advantages of the application of the TED to syntactic n-grams. Then, we present a procedure based on the TED and syntactic n-grams for calculating soft similarity between texts.

KW - Computational modeling

KW - Cost function

KW - Heuristic algorithms

KW - Information retrieval

KW - Natural language processing

KW - Semantics

KW - Syntactics

UR - http://www.scopus.com/inward/record.url?scp=84961888075&partnerID=8YFLogxK

U2 - 10.1109/NAFIPS-WConSC.2015.7284129

DO - 10.1109/NAFIPS-WConSC.2015.7284129

M3 - Conference contribution

AN - SCOPUS:84961888075

T3 - Annual Conference of the North American Fuzzy Information Processing Society - NAFIPS

BT - 2015 Annual Meeting of the North American Fuzzy Information Processing Society, NAFIPS 2015

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - Annual Meeting of the North American Fuzzy Information Processing Society, NAFIPS 2015

Y2 - 17 August 2015 through 19 August 2015

ER -

Sidorov G, Gomez-Adorno H, Markov I, Pinto D, Loya N. Computing text similarity using Tree Edit Distance. In 2015 Annual Meeting of the North American Fuzzy Information Processing Society, NAFIPS 2015. Institute of Electrical and Electronics Engineers Inc. 2015. 7284129. (Annual Conference of the North American Fuzzy Information Processing Society - NAFIPS). doi: 10.1109/NAFIPS-WConSC.2015.7284129
