Computing text similarity using Tree Edit Distance

Grigori Sidorov, Helena Gomez-Adorno, Ilia Markov, David Pinto, Nahun Loya

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution (peer-reviewed)

25 Scopus citations

Abstract

In this paper, we propose applying the Tree Edit Distance (TED) to calculate the similarity between syntactic n-grams, as a basis for detecting soft similarity between texts. Computing text similarity is a basic task in many natural language processing problems, and it remains an open research field. Syntactic n-grams are text features, extracted from dependency trees, that are used to construct a Vector Space Model. Soft similarity is an application of the Vector Space Model that takes into account the similarity between features. First, we discuss the advantages of applying the TED to syntactic n-grams. Then, we present a procedure based on the TED and syntactic n-grams for calculating soft similarity between texts.
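The abstract names two steps: a tree edit distance between syntactic n-grams, and a soft similarity between texts that uses those feature similarities. The Python sketch below illustrates both under assumptions the abstract does not confirm: syntactic n-grams are modeled as ordered labeled trees of nested tuples, insert/delete/relabel each cost 1, TED is turned into a feature similarity by the hypothetical normalization 1 - TED / (|t1| + |t2|), and the soft cosine formula stands in for the soft similarity measure. All function names are hypothetical; this is not the authors' implementation.

```python
from functools import lru_cache
from math import sqrt

# Hypothetical representation: a tree is (label, children), where children
# is a tuple of trees; a forest is a tuple of trees. Tuples keep everything
# hashable so the recursion below can be memoized.

@lru_cache(maxsize=None)
def forest_dist(f, g):
    """Unit-cost edit distance between two ordered forests (recursive form)."""
    if not f and not g:
        return 0
    if not g:
        # Only deletions remain: delete the rightmost root of f,
        # promoting its children into the forest.
        _, kids = f[-1]
        return forest_dist(f[:-1] + kids, g) + 1
    if not f:
        # Only insertions remain: insert the rightmost root of g.
        _, kids = g[-1]
        return forest_dist(f, g[:-1] + kids) + 1
    (lv, kv), (lw, kw) = f[-1], g[-1]
    return min(
        forest_dist(f[:-1] + kv, g) + 1,  # delete rightmost root of f
        forest_dist(f, g[:-1] + kw) + 1,  # insert rightmost root of g
        # map the two rightmost roots onto each other (relabel if they differ)
        forest_dist(f[:-1], g[:-1]) + forest_dist(kv, kw) + int(lv != lw),
    )


def tree_edit_distance(t1, t2):
    return forest_dist((t1,), (t2,))


def size(t):
    return 1 + sum(size(child) for child in t[1])


def feature_similarity(t1, t2):
    # Hypothetical normalization: TED <= |t1| + |t2|, so this lies in [0, 1].
    return 1 - tree_edit_distance(t1, t2) / (size(t1) + size(t2))


def soft_cosine(a, b, s):
    """Soft cosine of count vectors a, b given feature-similarity matrix s."""
    def inner(x, y):
        return sum(s[i][j] * x[i] * y[j]
                   for i in range(len(x)) for j in range(len(y)))
    return inner(a, b) / (sqrt(inner(a, a)) * sqrt(inner(b, b)))


# Usage: two syntactic bigrams that differ only in the dependent word.
eat_apple = ("eat", (("apple", ()),))
eat_pear = ("eat", (("pear", ()),))
features = [eat_apple, eat_pear]
s = [[feature_similarity(f1, f2) for f2 in features] for f1 in features]

text_a = [1, 0]  # contains only eat -> apple
text_b = [0, 1]  # contains only eat -> pear
print(tree_edit_distance(eat_apple, eat_pear))  # 1 (one relabel)
print(soft_cosine(text_a, text_b, s))           # 0.75; plain cosine would be 0
```

The two texts share no identical feature, so an ordinary cosine gives 0, while the TED-based feature similarity lets the soft measure credit the near-match. The plain memoized recursion is chosen for readability and is only practical for small trees such as syntactic n-grams; for larger trees a dedicated algorithm such as Zhang-Shasha would be the usual choice.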

Original language: English
Title of host publication: 2015 Annual Meeting of the North American Fuzzy Information Processing Society, NAFIPS 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781467372473
State: Published - 29 Sep 2015
Event: Annual Meeting of the North American Fuzzy Information Processing Society, NAFIPS 2015 - Redmond, United States
Duration: 17 Aug 2015 to 19 Aug 2015

Publication series

Name: Annual Conference of the North American Fuzzy Information Processing Society - NAFIPS
Volume: 2015-September

Conference

Conference: Annual Meeting of the North American Fuzzy Information Processing Society, NAFIPS 2015
Country/Territory: United States
City: Redmond
Period: 17/08/15 to 19/08/15

Keywords

  • Computational modeling
  • Cost function
  • Heuristic algorithms
  • Information retrieval
  • Natural language processing
  • Semantics
  • Syntactics

