Language- and subtask-dependent feature selection and classifier parameter tuning for Author Profiling: Notebook for PAN at CLEF 2017

Ilia Markov, Helena Gómez-Adorno, Grigori Sidorov

Research output: Contribution to journal, conference article, peer-reviewed


Abstract

We present the CIC's approach to the Author Profiling (AP) task at PAN 2017. This year's task consists of two subtasks, gender identification and language variety identification, in English, Spanish, Portuguese, and Arabic. We use typed and untyped character n-grams, word n-grams, and non-textual features (domain names). We experimented with various feature representations (binary, raw frequency, normalized frequency, log-entropy weighting, tf-idf), machine-learning algorithms (the liblinear and libSVM implementations of Support Vector Machines (SVM), multinomial naive Bayes, an ensemble classifier, and meta-classifiers), and frequency threshold values. We adjusted the system configuration separately for each language and subtask.
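The feature representations listed above differ only in how a raw n-gram count is turned into a feature value. The following is a minimal sketch in Python of those five schemes applied to untyped, overlapping character n-grams. It is not the authors' implementation: the function names `char_ngrams` and `weight`, the interpretation of the frequency threshold (`min_freq`, a hypothetical parameter that drops features whose corpus-wide count is below it), the unsmoothed tf-idf variant `tf * log(N / df)`, and the log-entropy variant `log(1 + tf) * (1 + sum_j p_ij log p_ij / log N)` with `p_ij = tf_ij / gf_i` are all assumptions chosen for illustration.

```python
import math
from collections import Counter

SCHEMES = {"binary", "raw", "normalized", "tfidf", "log-entropy"}


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count untyped, overlapping character n-grams in one document."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def weight(docs: list[Counter], scheme: str, min_freq: int = 1) -> list[dict]:
    """Turn per-document n-gram counts into weighted feature dicts.

    min_freq is a hypothetical frequency threshold: features whose total
    count across the whole corpus is below it are discarded.
    """
    if scheme not in SCHEMES:
        raise ValueError(f"unknown scheme: {scheme}")
    n_docs = len(docs)
    gf = Counter()  # global (corpus-wide) frequency of each feature
    df = Counter()  # number of documents containing each feature
    for d in docs:
        gf.update(d)
        df.update(d.keys())
    vocab = {f for f, c in gf.items() if c >= min_freq}

    # Global entropy weight: 1 + sum_j p_ij * log(p_ij) / log(N).
    # It is 1 for a feature occurring in a single document and 0 for a
    # feature spread evenly over all documents.
    entropy = {}
    if scheme == "log-entropy":
        for f in vocab:
            s = 0.0
            for d in docs:
                tf = d[f]  # Counter returns 0 for absent features
                if tf:
                    p = tf / gf[f]
                    s += p * math.log(p)
            entropy[f] = 1.0 + s / math.log(n_docs) if n_docs > 1 else 1.0

    rows = []
    for d in docs:
        kept = {f: tf for f, tf in d.items() if f in vocab}
        total = sum(kept.values())  # length over retained features only
        row = {}
        for f, tf in kept.items():
            if scheme == "binary":
                row[f] = 1.0
            elif scheme == "raw":
                row[f] = float(tf)
            elif scheme == "normalized":
                row[f] = tf / total
            elif scheme == "tfidf":
                row[f] = tf * math.log(n_docs / df[f])
            else:  # "log-entropy"
                row[f] = math.log(1 + tf) * entropy[f]
        rows.append(row)
    return rows


if __name__ == "__main__":
    docs = [char_ngrams("abcabc"), char_ngrams("abcxyz")]
    for scheme in sorted(SCHEMES):
        print(scheme, weight(docs, scheme)[1])
```

In a full system, the resulting vectors would be fed to a classifier such as a liblinear-backed linear SVM; the abstract reports that the choice of representation and threshold was tuned per language and subtask, which in this sketch corresponds to varying `scheme` and `min_freq`.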

Original language: English
Journal: CEUR Workshop Proceedings
Volume: 1866
Status: Published - 2017
Event: 18th Working Notes of CLEF Conference and Labs of the Evaluation Forum, CLEF 2017 - Dublin, Ireland
Duration: 11 Sep 2017 to 14 Sep 2017
