Abstract
We present the CIC-GIL approach to the author profiling (AP) task at MEX-A3T 2018. The task consists of two subtasks: identifying an author's location (6-way) and occupation (8-way) in a corpus of Mexican Spanish tweets. We trained a logistic regression classifier on typed character n-grams, function-word n-grams, and regionalisms for location identification, and on typed character n-grams with several modifications for occupation identification. Our best run achieved a macro-averaged F1 score of 73.63% for location and 48.94% for occupation identification. These results are competitive with those of the other participating teams; in particular, our best run ranked fourth in the shared task.
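To illustrate the kind of feature the abstract refers to, the following is a minimal sketch of typed character n-gram extraction. The category scheme here (`punct`, `multi-word`, `whole-word`, `prefix`, `suffix`, `mid-word`) is a simplified assumption for illustration and is not necessarily the set of types or modifications used by the authors; the function name `typed_char_ngrams` is hypothetical.

```python
from collections import Counter


def typed_char_ngrams(text, n=3):
    """Count character n-grams, each tagged with a coarse type.

    Assumed, simplified types (not the authors' exact scheme):
      punct      - the n-gram contains a punctuation character
      multi-word - the n-gram contains whitespace (spans a word boundary)
      whole-word - the n-gram is an entire word
      prefix     - the n-gram starts a longer word
      suffix     - the n-gram ends a longer word
      mid-word   - the n-gram lies strictly inside a word
    """
    counts = Counter()
    for i in range(len(text) - n + 1):
        gram = text[i:i + n]
        # Treat the text boundaries as if they were whitespace.
        before = text[i - 1] if i > 0 else " "
        after = text[i + n] if i + n < len(text) else " "

        if any(not ch.isalnum() and not ch.isspace() for ch in gram):
            kind = "punct"
        elif any(ch.isspace() for ch in gram):
            kind = "multi-word"
        else:
            starts_word = not before.isalnum()
            ends_word = not after.isalnum()
            if starts_word and ends_word:
                kind = "whole-word"
            elif starts_word:
                kind = "prefix"
            elif ends_word:
                kind = "suffix"
            else:
                kind = "mid-word"
        counts[(kind, gram)] += 1
    return counts


if __name__ == "__main__":
    for feature, count in typed_char_ngrams("que onda", n=3).items():
        print(feature, count)
```

Tagging each n-gram with its type lets the same character sequence count as a different feature depending on where it occurs in a word. Counts like these could then be turned into feature vectors for a classifier such as logistic regression, which is not shown here.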
Original language | English |
---|---|
Pages (from-to) | 97-101 |
Number of pages | 5 |
Journal | CEUR Workshop Proceedings |
Volume | 2150 |
State | Published - 2018 |
Event | 3rd Workshop on Evaluation of Human Language Technologies for Iberian Languages, IberEval 2018 - Sevilla, Spain. Duration: 18 Sep 2018 → … |
Keywords
- Author profiling
- Location identification
- Machine learning
- N-grams
- Occupation identification
- Social media
- Spanish