Generalized Mongue-Elkan method for approximate text string comparison

Sergio Jimenez, Claudia Becerra, Alexander Gelbukh, Fabio Gonzalez

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

26 Scopus citations

Abstract

The Monge-Elkan method is a general text string comparison method that combines an internal character-based similarity measure (e.g., edit distance) with a token-level (i.e., word-level) similarity measure. We propose a generalization of this method that replaces the simple average used in the Monge-Elkan expression with the generalized arithmetic mean. Experiments carried out with 12 well-known name-matching data sets show that the proposed approach outperforms the original Monge-Elkan method when character-based measures are used to compare tokens.
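The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the exponent parameter `m`, whitespace tokenization, and the normalization of edit distance into a similarity in [0, 1] are all assumptions made here for illustration. For each token of the first string, the sketch takes the best internal similarity against the tokens of the second string, then combines those best scores with a generalized (power) mean; with `m = 1` this reduces to the simple average.

```python
def edit_similarity(a: str, b: str) -> float:
    """Levenshtein distance normalized to a similarity in [0, 1].

    The normalization by the longer length is an assumption of this sketch.
    """
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))  # distances for the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return 1.0 - prev[-1] / max(len(a), len(b))


def generalized_monge_elkan(x: str, y: str, m: float = 2.0,
                            sim=edit_similarity) -> float:
    """Token-level similarity using a power mean with exponent m.

    m = 1 gives the simple average; this sketch only accepts m > 0.
    """
    if m <= 0:
        raise ValueError("this sketch only supports m > 0")
    tokens_x, tokens_y = x.split(), y.split()
    if not tokens_x or not tokens_y:
        return 0.0
    # Best match in y for every token of x, under the internal measure.
    best = [max(sim(a, b) for b in tokens_y) for a in tokens_x]
    # Generalized mean of the best-match scores.
    return (sum(s ** m for s in best) / len(best)) ** (1.0 / m)


if __name__ == "__main__":
    print(generalized_monge_elkan("jon smith", "john smith", m=1))
    print(generalized_monge_elkan("jon smith", "john smith", m=2))
```

Under these assumptions, a larger `m` gives more weight to tokens that match well, so the score is never lower than the plain average. Note that the measure as sketched is asymmetric: swapping `x` and `y` can change the result.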

Original language: English
Title of host publication: Computational Linguistics and Intelligent Text Processing - 10th International Conference, CICLing 2009, Proceedings
Pages: 559-570
Number of pages: 12
DOIs
State: Published - 2009
Event: 10th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2009 - Mexico City, Mexico
Duration: 1 Mar 2009 → 7 Mar 2009

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5449 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 10th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2009
Country/Territory: Mexico
City: Mexico City
Period: 1/03/09 → 7/03/09
