THANGCIC at PoliticEs 2022: Term-based BERT for Extracting Political Ideology from Spanish Author Profiling

Hoang Thang Ta; Abu Bakar Siddiqur Rahman; Lotfollah Najjar; Alexander Gelbukh

Centro de Investigación en Computación (CIC)

Research output: Contribution to journal › Conference article › peer-review

1 Scopus citation

Abstract

This paper presents our participation in the task of detecting gender, profession, and political ideology in tweets of Spanish users, in binary and multi-class settings. The task plays an important role in identifying the political ideology of parties and politicians, especially newly emerging ones. This may support related tasks such as making election predictions, or influencing citizens' decisions through propagation systems. For each user, we extracted the most popular terms from a set of their tweets as features, then used them as input data for training, which applied a transfer learning setup on pre-trained BERT models. We suggest our quick method as a baseline for the task; its highest average macro F1 is 72.72%. In detail, we obtained F1 Gender of 69.14%, F1 Profession of 81.47%, F1 Ideology Binary of 75.76%, and F1 Ideology Multiclass of 64.51%.
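The term-based idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the stopword list, the tokenization regex, the value of `k`, and the function names `top_terms` and `average_f1` are all assumptions made for illustration. The sketch picks a user's most frequent terms, joins them into one string that could serve as text input to a pre-trained BERT classifier, and checks that the reported average equals the mean of the four per-task F1 scores.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny Spanish stopword list (an assumption).
STOPWORDS = {"de", "la", "el", "que", "y", "en", "a", "los", "las",
             "un", "una", "es", "por", "con", "para", "no"}


def top_terms(tweets, k=5):
    """Return the k most frequent non-stopword terms across a user's tweets."""
    counts = Counter()
    for tweet in tweets:
        # Lowercase and keep alphabetic tokens, including Spanish accents.
        tokens = re.findall(r"[a-záéíóúüñ]+", tweet.lower())
        counts.update(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return [term for term, _ in counts.most_common(k)]


def average_f1(scores):
    """Arithmetic mean of the per-task F1 scores, rounded to two decimals."""
    return round(sum(scores.values()) / len(scores), 2)


# Toy tweets for one user (invented for this example).
tweets = [
    "La sanidad pública es de todos",
    "Defender la sanidad pública",
    "Sanidad y educación",
]

# The joined terms would be the text passed to a BERT tokenizer and classifier.
bert_input = " ".join(top_terms(tweets, k=2))

# Per-task scores as reported in the abstract.
reported = {
    "gender": 69.14,
    "profession": 81.47,
    "ideology_binary": 75.76,
    "ideology_multiclass": 64.51,
}
print(bert_input, average_f1(reported))
```

The mean of the four reported scores is (69.14 + 81.47 + 75.76 + 64.51) / 4 = 72.72, which matches the average stated in the abstract.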

Original language	English
Journal	CEUR Workshop Proceedings
Volume	3202
State	Published - 2022
Event	2022 Iberian Languages Evaluation Forum, IberLEF 2022 - A Coruna, Spain Duration: 20 Sep 2022 → …

Keywords

Author Profiling
BERT
IberLEF
Political Ideology
SEPLN
Text Classification

Cite this

@article{2e63adcc564049e2b88506a4cf47e052,

title = "THANGCIC at PoliticEs 2022: Term-based BERT for Extracting Political Ideology from Spanish Author Profiling",

abstract = "This paper presents our participation in the task of detecting gender, profession, and political ideology in tweets of Spanish users, in binary and multi-class settings. The task plays an important role in identifying the political ideology of parties and politicians, especially newly emerging ones. This may support related tasks such as making election predictions, or influencing citizens' decisions through propagation systems. For each user, we extracted the most popular terms from a set of their tweets as features, then used them as input data for training, which applied a transfer learning setup on pre-trained BERT models. We suggest our quick method as a baseline for the task; its highest average macro F1 is 72.72%. In detail, we obtained F1 Gender of 69.14%, F1 Profession of 81.47%, F1 Ideology Binary of 75.76%, and F1 Ideology Multiclass of 64.51%.",

keywords = "Author Profiling, BERT, IberLEF, Political Ideology, SEPLN, Text Classification",

author = "Ta, {Hoang Thang} and Rahman, {Abu Bakar Siddiqur} and Najjar, Lotfollah and Gelbukh, Alexander",

note = "Publisher Copyright: {\textcopyright} 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).; 2022 Iberian Languages Evaluation Forum, IberLEF 2022 ; Conference date: 20-09-2022",

year = "2022",

language = "English",

volume = "3202",

journal = "CEUR Workshop Proceedings",

issn = "1613-0073",

publisher = "CEUR-WS",

}

TY - JOUR

T1 - THANGCIC at PoliticEs 2022

T2 - 2022 Iberian Languages Evaluation Forum, IberLEF 2022

AU - Ta, Hoang Thang

AU - Rahman, Abu Bakar Siddiqur

AU - Najjar, Lotfollah

AU - Gelbukh, Alexander

PY - 2022

Y1 - 2022

N2 - This paper presents our participation in the task of detecting gender, profession, and political ideology in tweets of Spanish users, in binary and multi-class settings. The task plays an important role in identifying the political ideology of parties and politicians, especially newly emerging ones. This may support related tasks such as making election predictions, or influencing citizens' decisions through propagation systems. For each user, we extracted the most popular terms from a set of their tweets as features, then used them as input data for training, which applied a transfer learning setup on pre-trained BERT models. We suggest our quick method as a baseline for the task; its highest average macro F1 is 72.72%. In detail, we obtained F1 Gender of 69.14%, F1 Profession of 81.47%, F1 Ideology Binary of 75.76%, and F1 Ideology Multiclass of 64.51%.

AB - This paper presents our participation in the task of detecting gender, profession, and political ideology in tweets of Spanish users, in binary and multi-class settings. The task plays an important role in identifying the political ideology of parties and politicians, especially newly emerging ones. This may support related tasks such as making election predictions, or influencing citizens' decisions through propagation systems. For each user, we extracted the most popular terms from a set of their tweets as features, then used them as input data for training, which applied a transfer learning setup on pre-trained BERT models. We suggest our quick method as a baseline for the task; its highest average macro F1 is 72.72%. In detail, we obtained F1 Gender of 69.14%, F1 Profession of 81.47%, F1 Ideology Binary of 75.76%, and F1 Ideology Multiclass of 64.51%.

KW - Author Profiling

KW - BERT

KW - IberLEF

KW - Political Ideology

KW - SEPLN

KW - Text Classification

UR - http://www.scopus.com/inward/record.url?scp=85137336482&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85137336482

SN - 1613-0073

VL - 3202

JO - CEUR Workshop Proceedings

JF - CEUR Workshop Proceedings

Y2 - 20 September 2022

ER -
