The Combination of BERT and Data Oversampling for Relation Set Prediction

Thang Ta Hoang, Sabur Butt, Jason Angel, Grigori Sidorov, Alexander Gelbukh

Research output: Contribution to journal › Conference article › peer-review

1 Scopus citation

Abstract

In this paper, we address Task 2 of the SMART Task 2021 challenge: predicting the relations used to identify the correct answer to a given question. This is a subtask of Knowledge Base Question Answering (KBQA) and offers valuable insights for the development of KBQA systems. We introduce a method that combines BERT with data oversampling, in which terms linked to Wikidata and dependent noun phrases are replaced in the question text, to predict answer relations on two datasets. On the DBpedia dataset, we obtain an F1 of 83.15%, a precision of 83.68%, and a recall of 82.95%; on the Wikidata dataset, we achieve an F1 of 60.70%, a precision of 61.63%, and a recall of 61.10%.
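For readers unfamiliar with the setup, the sketch below illustrates how a BERT-based multi-label relation classifier with simple oversampling of rare relations might look. It is a minimal, assumed implementation using the Hugging Face transformers library; the toy questions, relation labels, and duplication-based oversampling are illustrative only and do not reproduce the authors' exact pipeline (which additionally replaces Wikidata-linked terms and dependent noun phrases in the question text).

```python
# Minimal sketch (not the paper's exact pipeline): fine-tune BERT for
# multi-label relation prediction, oversampling questions whose relations
# are under-represented in the training data.
from collections import Counter

import torch
from torch.utils.data import DataLoader, Dataset
from transformers import BertForSequenceClassification, BertTokenizerFast

# Hypothetical toy data: each question is labelled with a set of relations.
train = [
    ("Who is the author of Don Quixote?", {"dbo:author"}),
    ("Where was Nikola Tesla born?", {"dbo:birthPlace"}),
    ("Which river flows through Paris?", {"dbo:city", "dbo:country"}),
]
relations = sorted({r for _, rels in train for r in rels})
rel2id = {r: i for i, r in enumerate(relations)}

# Naive oversampling: duplicate examples containing rare relations until
# their rarest relation roughly matches the most frequent one.
counts = Counter(r for _, rels in train for r in rels)
max_count = max(counts.values())
oversampled = list(train)
for text, rels in train:
    rarest = min(counts[r] for r in rels)
    oversampled += [(text, rels)] * (max_count - rarest)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=len(relations),
    problem_type="multi_label_classification",  # sigmoid outputs + BCE loss
)

class QuestionDataset(Dataset):
    def __init__(self, examples):
        self.examples = examples

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, idx):
        text, rels = self.examples[idx]
        enc = tokenizer(text, truncation=True, padding="max_length",
                        max_length=64, return_tensors="pt")
        labels = torch.zeros(len(relations))  # multi-hot relation vector
        for r in rels:
            labels[rel2id[r]] = 1.0
        return {"input_ids": enc["input_ids"].squeeze(0),
                "attention_mask": enc["attention_mask"].squeeze(0),
                "labels": labels}

loader = DataLoader(QuestionDataset(oversampled), batch_size=8, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for batch in loader:
    out = model(**batch)  # loss is BCEWithLogitsLoss via problem_type
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

At inference time, one would threshold the sigmoid scores (e.g. at 0.5) to obtain the predicted relation set for each question; the threshold value here is an assumption, not taken from the paper.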

Original language: English
Journal: CEUR Workshop Proceedings
Volume: 3119
State: Published - 2022
Event: 2nd SeMantic Answer Type and Relation Prediction Task at ISWC Semantic Web Challenge, SMART 2021 - Virtual, Online
Duration: 26 Oct 2021 → …

Keywords

  • ISWC
  • Knowledge Base Question Answering
  • Relation Linking
  • Relation Prediction
  • Semantic Web Challenge
