TY - JOUR
T1 - Mathematical properties of soft cardinality
T2 - Enhancing Jaccard, Dice and cosine similarity measures with element-wise distance
AU - Jimenez, Sergio
AU - Gonzalez, Fabio A.
AU - Gelbukh, Alexander
N1 - Publisher Copyright: © 2016 Elsevier Inc.
PY - 2016/11/1
Y1 - 2016/11/1
AB - The soft cardinality function generalizes the counting measure that underlies the classic cardinality of sets. This function provides an intuitive measure of the number of elements in a collection (i.e., a set or a bag) by exploiting the similarities among them. Although soft cardinality was first proposed in an ad hoc way, it has been used successfully in various natural language processing tasks. In this paper, a formal definition of soft cardinality is proposed, together with an analysis of its boundaries, its monotonicity property, and a method for constructing similarity functions. Additionally, an empirical evaluation of the model was carried out using synthetic data.
KW - Cardinality-based similarity measures
KW - Cosine similarity
KW - Dice's index
KW - Diversity-based similarity functions
KW - Jaccard's index
KW - Soft cardinality
UR - http://www.scopus.com/inward/record.url?scp=84976292399&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2016.06.012
DO - 10.1016/j.ins.2016.06.012
M3 - Article
SN - 0020-0255
VL - 367-368
SP - 373
EP - 389
JO - Information Sciences
JF - Information Sciences
ER -