TY - GEN
T1 - Text comparison using soft cardinality
AU - Jimenez, Sergio
AU - Gonzalez, Fabio
AU - Gelbukh, Alexander
PY - 2010
Y1 - 2010
N2 - Classical set theory provides a method for comparing objects using cardinality and intersection, in combination with well-known resemblance coefficients such as Dice, Jaccard, and cosine. However, set operations are intrinsically crisp: they do not take into account similarities between elements. We propose a new general-purpose method for comparing objects using a soft cardinality function that takes similarities between elements into account via an auxiliary affinity (similarity) measure. Our experiments with 12 text matching datasets suggest that the soft cardinality method is superior to known approximate string comparison methods in text comparison tasks.
AB - Classical set theory provides a method for comparing objects using cardinality and intersection, in combination with well-known resemblance coefficients such as Dice, Jaccard, and cosine. However, set operations are intrinsically crisp: they do not take into account similarities between elements. We propose a new general-purpose method for comparing objects using a soft cardinality function that takes similarities between elements into account via an auxiliary affinity (similarity) measure. Our experiments with 12 text matching datasets suggest that the soft cardinality method is superior to known approximate string comparison methods in text comparison tasks.
UR - http://www.scopus.com/inward/record.url?scp=78449297542&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-16321-0_31
DO - 10.1007/978-3-642-16321-0_31
M3 - Conference contribution
SN - 3642163203
SN - 9783642163203
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 297
EP - 302
BT - String Processing and Information Retrieval - 17th International Symposium, SPIRE 2010, Proceedings
T2 - 17th International Symposium on String Processing and Information Retrieval, SPIRE 2010
Y2 - 11 October 2010 through 13 October 2010
ER -