TY - JOUR
T1 - Author clustering using hierarchical clustering analysis
T2 - 18th Working Notes of CLEF Conference and Labs of the Evaluation Forum, CLEF 2017
AU - Gómez-Adorno, Helena
AU - Aleman, Yuridiana
AU - Vilariño, Darnes
AU - Sanchez-Perez, Miguel A.
AU - Pinto, David
AU - Sidorov, Grigori
N1 - Funding Information:
This work was partially supported by the Mexican Government (CONACYT projects 240844, SNI, COFAA-IPN, SIP-IPN 20171813, 20171344, and 20172008).
PY - 2017
Y1 - 2017
N2 - This paper presents our approach to the Author Clustering task at PAN 2017. We performed a hierarchical clustering analysis of different document features: typed and untyped character n-grams, and word n-grams. We experimented with two feature representation methods, the log-entropy model and tf-idf, while tuning minimum frequency threshold values to reduce dimensionality. Our system ranked 1st in both subtasks: author clustering and authorship-link ranking.
AB - This paper presents our approach to the Author Clustering task at PAN 2017. We performed a hierarchical clustering analysis of different document features: typed and untyped character n-grams, and word n-grams. We experimented with two feature representation methods, the log-entropy model and tf-idf, while tuning minimum frequency threshold values to reduce dimensionality. Our system ranked 1st in both subtasks: author clustering and authorship-link ranking.
UR - http://www.scopus.com/inward/record.url?scp=85034772425&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85034772425
SN - 1613-0073
VL - 1866
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
Y2 - 11 September 2017 through 14 September 2017
ER -