Assessing Wordplay-Pun classification from JOKER dataset with pretrained BERT humorous models

Victor Manuel Palma Preciado; Grigori Sidorov; Carolina Palma Preciado

Centro de Investigación en Computación (CIC)

Research output: Contribution to journal › Conference article › peer-review

1 Scopus citation

Abstract

Humor is one of the most subjective aspects of human behavior, since written humor draws on a wide range of variables: sentiment, wordplay, and structural or phonetic double meanings. It is important to assess humor from different points of view, because this variability can provide insight into its underlying structure; the range of humor is so diverse that even the simplest tasks pose a highly demanding problem. Pre-trained BERT-base and DistilBERT models, trained on a dataset of humorous one-liners, were tested on a dataset merged from the JOKER Task 1 and Task 3 data. The collected data were stripped of duplicate records and special characters, yielding a final dataset of 3,601 humorous sentences. The experiment examines whether the models can detect a type of humor different from the one on which they were trained, and both models classified this other type of humor successfully. Although the models were expected to classify at least a portion of the humor in the dataset, the results were much better than anticipated: 95.64% for BERT and 92.58% for DistilBERT. An analysis of the best and worst cases is also included.
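The data preparation and scoring described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the cleaning rule, the case-insensitive duplicate check, the helper names (`clean_sentence`, `merge_and_deduplicate`, `humor_detection_rate`), the stand-in classifier `toy_predict`, and the sample sentences are all hypothetical. The abstract also does not name the metric behind the reported percentages; the sketch assumes it is the share of sentences labeled as humorous.

```python
import re
from typing import Callable, Iterable, List


def clean_sentence(text: str) -> str:
    """Strip special characters and collapse whitespace.

    Assumption: the abstract does not give the exact cleaning rule, so this
    keeps word characters, whitespace, and basic punctuation only.
    """
    text = re.sub(r"[^\w\s.,;:!?'\"-]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def merge_and_deduplicate(*sources: Iterable[str]) -> List[str]:
    """Merge several sentence collections and drop duplicate records.

    Assumption: duplicates are compared case-insensitively after cleaning.
    """
    seen = set()
    merged = []
    for source in sources:
        for raw in source:
            sentence = clean_sentence(raw)
            key = sentence.lower()
            if sentence and key not in seen:
                seen.add(key)
                merged.append(sentence)
    return merged


def humor_detection_rate(sentences: List[str],
                         predict: Callable[[str], int]) -> float:
    """Percentage of sentences that the classifier labels as humorous (1).

    Every sentence in the evaluation set is humorous, so this is the share
    of positives the classifier recovers.
    """
    if not sentences:
        raise ValueError("no sentences to evaluate")
    hits = sum(1 for s in sentences if predict(s) == 1)
    return 100.0 * hits / len(sentences)


def toy_predict(sentence: str) -> int:
    """Hypothetical stand-in for a trained BERT or DistilBERT classifier."""
    lowered = sentence.lower()
    return 1 if ("flies" in lowered or "interest" in lowered) else 0


# Made-up placeholder sentences, not records from the JOKER corpus.
task1_sentences = [
    "I used to be a banker, but I lost interest.",
    "Time flies like an arrow; fruit flies like a banana.",
]
task3_sentences = [
    "I used to be a banker, but I lost interest. ",
    "★ A boiled egg is hard to beat! ★",
]

merged = merge_and_deduplicate(task1_sentences, task3_sentences)
rate = humor_detection_rate(merged, toy_predict)
print(len(merged), "sentences;", round(rate, 2), "% labeled humorous")
```

In practice `toy_predict` would be replaced by a call to the trained model. Keeping the classifier behind a plain callable lets the same scoring code be reused for both models.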

Original language	English
Pages (from-to)	1828-1833
Number of pages	6
Journal	CEUR Workshop Proceedings
Volume	3180
State	Published - 2022
Event	2022 Conference and Labs of the Evaluation Forum, CLEF 2022 - Bologna, Italy
Duration	5 Sep 2022 → 8 Sep 2022

Keywords

Classifiers
Humor identification
Humourism
Transformers

Cite this

@article{b81bc5bc791645a89c47d017305ba171,

title = "Assessing Wordplay-Pun classification from JOKER dataset with pretrained BERT humorous models",

keywords = "Classifiers, Humor identification, Humourism, Transformers",

author = "{Palma Preciado}, {Victor Manuel} and Grigori Sidorov and {Palma Preciado}, Carolina",

note = "Publisher Copyright: {\textcopyright} 2022 Copyright for this paper by its authors.; 2022 Conference and Labs of the Evaluation Forum, CLEF 2022 ; Conference date: 05-09-2022 Through 08-09-2022",

year = "2022",

language = "English",

volume = "3180",

pages = "1828--1833",

journal = "CEUR Workshop Proceedings",

issn = "1613-0073",

publisher = "CEUR-WS",

}

TY - JOUR

T1 - Assessing Wordplay-Pun classification from JOKER dataset with pretrained BERT humorous models

AU - Palma Preciado, Victor Manuel

AU - Sidorov, Grigori

AU - Palma Preciado, Carolina

PY - 2022

Y1 - 2022

KW - Classifiers

KW - Humor identification

KW - Humourism

KW - Transformers

UR - http://www.scopus.com/inward/record.url?scp=85136920312&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85136920312

SN - 1613-0073

VL - 3180

SP - 1828

EP - 1833

JO - CEUR Workshop Proceedings

JF - CEUR Workshop Proceedings

T2 - 2022 Conference and Labs of the Evaluation Forum, CLEF 2022

Y2 - 5 September 2022 through 8 September 2022

ER -
