Soft cardinality: A parameterized similarity function for text comparison

Sergio Jimenez, Claudia Becerra, Alexander Gelbukh

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

49 Scopus citations


We present an approach for constructing text similarity functions that combines a parameterized resemblance coefficient with a softened cardinality function called soft cardinality. The approach yields a consistent, recursive model that applies at varying levels of granularity, from sentences down to characters. Accordingly, we used the model to compare sentences divided into words and, in turn, words divided into character q-grams. Experimentally, we observed that the performance (correlation) function over the space defined by all parameters was relatively smooth and had a single maximum reachable by hill climbing. Our approach used only surface text information, a stop-word remover, and a stemmer to tackle the semantic textual similarity task (Task 6) at SemEval 2012. The proposed method ranked 3rd (average), 5th (normalized correlation), and 15th (aggregated correlation) among 89 systems submitted by 31 teams.
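The abstract does not state the formulas, so the following is only a minimal sketch of the recursive idea, under stated assumptions. It assumes the soft cardinality formulation from the authors' related work, in which each element contributes the inverse of its summed similarity (raised to a softness exponent p) to every element in the collection. Plain Jaccard over padded character q-grams for words, and a Jaccard-style ratio for sentences, stand in for the paper's parameterized resemblance coefficient. The names `qgrams`, `word_sim`, `soft_cardinality`, and `sentence_sim` are hypothetical.

```python
def qgrams(word, q=2):
    """Set of character q-grams of a non-empty word, padded with '#' (an assumed convention)."""
    padded = "#" * (q - 1) + word + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def word_sim(a, b, q=2):
    """Word-level similarity: Jaccard overlap of q-gram sets (stand-in coefficient)."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    return len(ga & gb) / len(ga | gb)


def soft_cardinality(words, p=1.0, q=2):
    """Assumed formulation: sum over elements of 1 / sum_j sim(a_i, a_j)^p.

    Identical or near-identical words count as roughly one element;
    fully dissimilar words each count as one.
    """
    return sum(
        1.0 / sum(word_sim(a, b, q) ** p for b in words)
        for a in words
    )


def sentence_sim(x, y, p=1.0, q=2):
    """Sentence-level similarity: soft intersection over soft union (Jaccard-style stand-in)."""
    x, y = list(x), list(y)
    union = soft_cardinality(x + y, p, q)
    if union == 0:
        return 0.0  # both sentences empty
    # Soft intersection inferred by inclusion-exclusion on soft cardinalities.
    intersection = soft_cardinality(x, p, q) + soft_cardinality(y, p, q) - union
    return intersection / union


if __name__ == "__main__":
    print(soft_cardinality(["cat", "cat"]))
    print(sentence_sim(["cats", "sleep"], ["cat", "sleeps"]))
```

In this sketch, p and q play the role of tunable parameters; the search over the parameter space described in the abstract would wrap calls like these in an optimization loop.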

Original language: English
Title of host publication: Proceedings of the 6th International Workshop on Semantic Evaluation, SemEval 2012
Publisher: Association for Computational Linguistics (ACL)
Number of pages: 5
ISBN (Electronic): 9781937284220
State: Published - 2012
Event: 1st Joint Conference on Lexical and Computational Semantics, *SEM 2012 - Montreal, Canada
Duration: 7 Jun 2012 to 8 Jun 2012

Publication series

Name: *SEM 2012 - 1st Joint Conference on Lexical and Computational Semantics


Conference: 1st Joint Conference on Lexical and Computational Semantics, *SEM 2012

