The Combination of BERT and Data Oversampling for Answer Type Prediction

Thang Ta Hoang; Olumide Ebenezer Ojo; Olaronke Oluwayemisi Adebanji; Hiram Calvo; Alexander Gelbukh

Centro de Investigación en Computación (CIC)

Research output: Contribution to journal › Conference article › peer-review

3 Scopus citations

Abstract

In this paper, we address Task 1 of the SMART Task 2021: predicting answer categories and types based on target ontologies, which could be useful in knowledge-based Question Answering (QA) systems. Our method combines BERT architectures with data oversampling, performed by replacing terms linked to Wikidata and dependent noun phrases, to attain state-of-the-art performance. On the DBpedia dataset, accuracy is 98.5%, whereas NDCG@5 and NDCG@10 are 72.7% and 66.4%, respectively. On the Wikidata dataset, our model has the best performance among the participating teams, with an accuracy of 98% and a Mean Reciprocal Rank (MRR) of 70%.
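The ranking metrics named in the abstract (MRR and NDCG@k) can be illustrated with a minimal sketch. It uses the common textbook definitions with binary relevance, which is an assumption: the official SMART evaluation may score types differently (for example, by taking the ontology hierarchy into account). The function names and toy inputs below are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence, Set


def reciprocal_rank(ranked: Sequence[str], gold: Set[str]) -> float:
    """Return 1/rank of the first correct type in the ranked list, or 0.0 if none."""
    for position, predicted_type in enumerate(ranked, start=1):
        if predicted_type in gold:
            return 1.0 / position
    return 0.0


def ndcg_at_k(ranked: Sequence[str], gold: Set[str], k: int) -> float:
    """Binary-relevance NDCG@k: DCG of the top-k predictions divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(position + 1)
        for position, predicted_type in enumerate(ranked[:k], start=1)
        if predicted_type in gold
    )
    # The ideal ranking places all gold types (up to k of them) at the top.
    ideal_hits = min(len(gold), k)
    idcg = sum(1.0 / math.log2(position + 1) for position in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


def mean(values: Sequence[float]) -> float:
    """Average per-question scores; MRR is the mean of reciprocal ranks."""
    return sum(values) / len(values) if values else 0.0


if __name__ == "__main__":
    # Toy predictions for two questions (hypothetical type labels).
    predictions = [["dbo:City", "dbo:Place"], ["dbo:Person", "dbo:Agent"]]
    gold_types = [{"dbo:Place"}, {"dbo:Person", "dbo:Agent"}]
    mrr = mean([reciprocal_rank(p, g) for p, g in zip(predictions, gold_types)])
    ndcg5 = mean([ndcg_at_k(p, g, 5) for p, g in zip(predictions, gold_types)])
    print(f"MRR={mrr:.3f} NDCG@5={ndcg5:.3f}")
```

Both metrics are computed per question and then averaged over the dataset, which is how percentage figures such as those quoted in the abstract are usually obtained.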

Original language	English
Journal	CEUR Workshop Proceedings
Volume	3119
State	Published - 2022
Event	2nd SeMantic Answer Type and Relation Prediction Task at ISWC Semantic Web Challenge, SMART 2021 - Virtual, Online Duration: 26 Oct 2021 → …

Keywords

Answer Type Prediction
ISWC
Question Answering
Semantic Web Challenge

Cite this

@article{484bff3d01ff4cfba4957cddc49a4bda,

title = "The Combination of BERT and Data Oversampling for Answer Type Prediction",

abstract = "In this paper, we address Task 1 of the SMART Task 2021: predicting answer categories and types based on target ontologies, which could be useful in knowledge-based Question Answering (QA) systems. Our method combines BERT architectures with data oversampling, performed by replacing terms linked to Wikidata and dependent noun phrases, to attain state-of-the-art performance. On the DBpedia dataset, accuracy is 98.5%, whereas NDCG@5 and NDCG@10 are 72.7% and 66.4%, respectively. On the Wikidata dataset, our model has the best performance among the participating teams, with an accuracy of 98% and a Mean Reciprocal Rank (MRR) of 70%.",

keywords = "Answer Type Prediction, ISWC, Question Answering, Semantic Web Challenge",

author = "Hoang, {Thang Ta} and Ojo, {Olumide Ebenezer} and Adebanji, {Olaronke Oluwayemisi} and Hiram Calvo and Alexander Gelbukh",

note = "Publisher Copyright: {\textcopyright} 2022 CEUR-WS. All rights reserved.; 2nd SeMantic Answer Type and Relation Prediction Task at ISWC Semantic Web Challenge, SMART 2021 ; Conference date: 26-10-2021",

year = "2022",

language = "English",

volume = "3119",

journal = "CEUR Workshop Proceedings",

issn = "1613-0073",

publisher = "CEUR-WS",

}

TY - JOUR

T1 - The Combination of BERT and Data Oversampling for Answer Type Prediction

AU - Hoang, Thang Ta

AU - Ojo, Olumide Ebenezer

AU - Adebanji, Olaronke Oluwayemisi

AU - Calvo, Hiram

AU - Gelbukh, Alexander

PY - 2022

Y1 - 2022

N2 - In this paper, we address Task 1 of the SMART Task 2021: predicting answer categories and types based on target ontologies, which could be useful in knowledge-based Question Answering (QA) systems. Our method combines BERT architectures with data oversampling, performed by replacing terms linked to Wikidata and dependent noun phrases, to attain state-of-the-art performance. On the DBpedia dataset, accuracy is 98.5%, whereas NDCG@5 and NDCG@10 are 72.7% and 66.4%, respectively. On the Wikidata dataset, our model has the best performance among the participating teams, with an accuracy of 98% and a Mean Reciprocal Rank (MRR) of 70%.

KW - Answer Type Prediction

KW - ISWC

KW - Question Answering

KW - Semantic Web Challenge

UR - http://www.scopus.com/inward/record.url?scp=85129142915&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85129142915

SN - 1613-0073

VL - 3119

JO - CEUR Workshop Proceedings

JF - CEUR Workshop Proceedings

T2 - 2nd SeMantic Answer Type and Relation Prediction Task at ISWC Semantic Web Challenge, SMART 2021

Y2 - 26 October 2021

ER -
