TY - GEN
T1 - SC spectra
T2 - 10th Mexican International Conference on Artificial Intelligence, MICAI 2011
AU - Jiménez Vargas, Sergio
AU - Gelbukh, Alexander
PY - 2011
Y1 - 2011
N2 - Soft cardinality (SC) is a softened version of the classical cardinality of set theory. Because computing it exactly is prohibitively expensive (exponential order), an approximation that is quadratic in the number of terms in the text was proposed previously. SC spectra is a new linear-time approximation method for text strings, which divides each string into consecutive substrings (i.e., q-grams) of different sizes. SC, in combination with resemblance coefficients, allows the construction of a family of similarity functions for text comparison. These similarity measures have previously been applied to an entity resolution (name matching) problem, where they outperformed the SoftTFIDF measure. The SC spectra method improves on those results, taking less time and achieving better performance. This allows the new method to be used with relatively large documents, such as those in classic information retrieval collections. The SC spectra method outperformed the SoftTFIDF and cosine tf-idf baselines with an approach that requires no term weighting.
AB - Soft cardinality (SC) is a softened version of the classical cardinality of set theory. Because computing it exactly is prohibitively expensive (exponential order), an approximation that is quadratic in the number of terms in the text was proposed previously. SC spectra is a new linear-time approximation method for text strings, which divides each string into consecutive substrings (i.e., q-grams) of different sizes. SC, in combination with resemblance coefficients, allows the construction of a family of similarity functions for text comparison. These similarity measures have previously been applied to an entity resolution (name matching) problem, where they outperformed the SoftTFIDF measure. The SC spectra method improves on those results, taking less time and achieving better performance. This allows the new method to be used with relatively large documents, such as those in classic information retrieval collections. The SC spectra method outperformed the SoftTFIDF and cosine tf-idf baselines with an approach that requires no term weighting.
KW - approximate text comparison
KW - ngrams
KW - q-grams
KW - soft cardinality
KW - soft cardinality spectra
UR - http://www.scopus.com/inward/record.url?scp=82555180517&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-25330-0_19
DO - 10.1007/978-3-642-25330-0_19
M3 - Conference contribution
SN - 9783642253294
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 213
EP - 224
BT - Advances in Soft Computing - 10th Mexican International Conference on Artificial Intelligence, MICAI 2011, Proceedings
Y2 - 26 November 2011 through 4 December 2011
ER -