CIC-GIL approach to author profiling in Spanish tweets: Location and occupation

Ilia Markov; Helena Gómez-Adorno; Mónica Jasso-Rosales; Grigori Sidorov

Centro de Investigación en Computación (CIC)

Research output: Contribution to journal › Conference article › peer-review

2 Scopus citations

Abstract

We present the CIC-GIL approach to the author profiling (AP) task at MEX-A3T 2018. The task consists of two subtasks: identification of authors’ location (6-way) and occupation (8-way) in a corpus of Mexican Spanish tweets. We used the logistic regression algorithm trained on typed character n-grams, function-word n-grams, and regionalisms for location identification, and typed character n-grams with several modifications for occupation identification. Our best run achieved an F1-macro score of 73.63% for location identification and 48.94% for occupation identification. The results are competitive with those of other participating teams; in particular, our best run was ranked fourth in the shared task.

Original language	English
Pages (from-to)	97-101
Number of pages	5
Journal	CEUR Workshop Proceedings
Volume	2150
State	Published - 2018
Event	3rd Workshop on Evaluation of Human Language Technologies for Iberian Languages, IberEval 2018 - Sevilla, Spain; Duration: 18 Sep 2018 → …

Keywords

Author profiling
Location identification
Machine learning
N-grams
Occupation identification
Social media
Spanish

Cite this

@article{285669556dcf4cb1ae7ce9369bdb30e6,
title = "CIC-GIL approach to author profiling in Spanish tweets: Location and occupation",
abstract = "We present the CIC-GIL approach to the author profiling (AP) task at MEX-A3T 2018. The task consists of two subtasks: identification of authors{\textquoteright} location (6-way) and occupation (8-way) in a corpus of Mexican Spanish tweets. We used the logistic regression algorithm trained on typed character n-grams, function-word n-grams, and regionalisms for location identification, and typed character n-grams with several modifications for occupation identification. Our best run showed F1-macro score of 73.63\% for location and 48.94\% for occupation identification. The results are competitive with other participating teams; in particular, our best run was ranked fourth in the shared task.",
keywords = "Author profiling, Location identification, Machine learning, N-grams, Occupation identification, Social media, Spanish",
author = "Ilia Markov and Helena G{\'o}mez-Adorno and M{\'o}nica Jasso-Rosales and Grigori Sidorov",
note = "Publisher Copyright: {\textcopyright} 2018 CEUR-WS. All Rights Reserved.; 3rd Workshop on Evaluation of Human Language Technologies for Iberian Languages, IberEval 2018 ; Conference date: 18-09-2018",
year = "2018",
language = "English",
volume = "2150",
pages = "97--101",
journal = "CEUR Workshop Proceedings",
issn = "1613-0073",
publisher = "CEUR-WS",
}

TY - JOUR

T1 - CIC-GIL approach to author profiling in Spanish tweets

T2 - 3rd Workshop on Evaluation of Human Language Technologies for Iberian Languages, IberEval 2018

AU - Markov, Ilia

AU - Gómez-Adorno, Helena

AU - Jasso-Rosales, Mónica

AU - Sidorov, Grigori

PY - 2018

Y1 - 2018

N2 - We present the CIC-GIL approach to the author profiling (AP) task at MEX-A3T 2018. The task consists of two subtasks: identification of authors’ location (6-way) and occupation (8-way) in a corpus of Mexican Spanish tweets. We used the logistic regression algorithm trained on typed character n-grams, function-word n-grams, and regionalisms for location identification, and typed character n-grams with several modifications for occupation identification. Our best run showed F1-macro score of 73.63% for location and 48.94% for occupation identification. The results are competitive with other participating teams; in particular, our best run was ranked fourth in the shared task.

AB - We present the CIC-GIL approach to the author profiling (AP) task at MEX-A3T 2018. The task consists of two subtasks: identification of authors’ location (6-way) and occupation (8-way) in a corpus of Mexican Spanish tweets. We used the logistic regression algorithm trained on typed character n-grams, function-word n-grams, and regionalisms for location identification, and typed character n-grams with several modifications for occupation identification. Our best run showed F1-macro score of 73.63% for location and 48.94% for occupation identification. The results are competitive with other participating teams; in particular, our best run was ranked fourth in the shared task.

KW - Author profiling

KW - Location identification

KW - Machine learning

KW - N-grams

KW - Occupation identification

KW - Social media

KW - Spanish

UR - http://www.scopus.com/inward/record.url?scp=85051335691&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85051335691

SN - 1613-0073

VL - 2150

SP - 97

EP - 101

JO - CEUR Workshop Proceedings

JF - CEUR Workshop Proceedings

Y2 - 18 September 2018

ER -
