TY - JOUR
T1 - Language- and subtask-dependent feature selection and classifier parameter tuning for author profiling
T2 - 18th Working Notes of CLEF Conference and Labs of the Evaluation Forum, CLEF 2017
AU - Markov, Ilia
AU - Gómez-Adorno, Helena
AU - Sidorov, Grigori
N1 - Funding Information:
This work was partially supported by the Mexican Government (CONACYT projects 240844, SNI, COFAA-IPN, SIP-IPN 20162204, 20162064, 20171813, 20171344, and 20172008).
PY - 2017
Y1 - 2017
N2 - We present the CIC's approach to the Author Profiling (AP) task at PAN 2017. This year's task consists of two subtasks: gender and language variety identification in English, Spanish, Portuguese, and Arabic. We use typed and untyped character n-grams, word n-grams, and non-textual features (domain names). We experimented with various feature representations (binary, raw frequency, normalized frequency, log-entropy weighting, tf-idf), machine-learning algorithms (liblinear and libSVM implementations of Support Vector Machines (SVM), multinomial naive Bayes, an ensemble classifier, and meta-classifiers), and frequency threshold values. We adjusted the system configuration for each language and subtask.
AB - We present the CIC's approach to the Author Profiling (AP) task at PAN 2017. This year's task consists of two subtasks: gender and language variety identification in English, Spanish, Portuguese, and Arabic. We use typed and untyped character n-grams, word n-grams, and non-textual features (domain names). We experimented with various feature representations (binary, raw frequency, normalized frequency, log-entropy weighting, tf-idf), machine-learning algorithms (liblinear and libSVM implementations of Support Vector Machines (SVM), multinomial naive Bayes, an ensemble classifier, and meta-classifiers), and frequency threshold values. We adjusted the system configuration for each language and subtask.
UR - http://www.scopus.com/inward/record.url?scp=85034760880&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85034760880
SN - 1613-0073
VL - 1866
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
Y2 - 11 September 2017 through 14 September 2017
ER -