Urdu Named Entity Recognition with Attention Bi-LSTM-CRF Model

Fida Ullah; Ihsan Ullah; Olga Kolesnikova

doi:10.1007/978-3-031-19496-2_1

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

Named entity recognition (NER) is a challenging problem in natural language processing (NLP), especially for languages with very few annotated corpora, such as Urdu. In this paper, we propose an Attention-Bi-LSTM-CRF method and apply it to the MK-PUCIT corpus, which is the latest NER dataset available for the Urdu language. In addition to word-level embeddings, we use an embedding-level focus mechanism. The output of the embedding layer is fed into a bidirectional LSTM encoder unit, accompanied by an additional self-attention layer to boost the system’s accuracy. Our Attention-Bi-LSTM-CRF model achieves an F1-score of 92%. The experimental results show that our approach outperforms existing methods, establishing a new state of the art for Urdu named entity recognition (UNER).

Original language	English
Title of host publication	Advances in Computational Intelligence - 21st Mexican International Conference on Artificial Intelligence, MICAI 2022, Proceedings
Editors	Obdulia Pichardo Lagunas, Bella Martínez Seis, Juan Martínez-Miranda
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	3-17
Number of pages	15
ISBN (Print)	9783031194955
DOIs	https://doi.org/10.1007/978-3-031-19496-2_1
State	Published - 2022
Event	21st Mexican International Conference on Artificial Intelligence, MICAI 2022 - Monterrey, Mexico Duration: 24 Oct 2022 → 29 Oct 2022

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	13613 LNAI
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	21st Mexican International Conference on Artificial Intelligence, MICAI 2022
Country/Territory	Mexico
City	Monterrey
Period	24/10/22 → 29/10/22

Keywords

Attention mechanism
Deep learning
Named entity recognition
Natural language processing
Word embedding

Access to Document

10.1007/978-3-031-19496-2_1

Cite this

Ullah, F., Ullah, I., & Kolesnikova, O. (2022). Urdu Named Entity Recognition with Attention Bi-LSTM-CRF Model. In O. Pichardo Lagunas, B. Martínez Seis, & J. Martínez-Miranda (Eds.), Advances in Computational Intelligence - 21st Mexican International Conference on Artificial Intelligence, MICAI 2022, Proceedings (pp. 3-17). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13613 LNAI). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-19496-2_1

Ullah, Fida ; Ullah, Ihsan ; Kolesnikova, Olga. / Urdu Named Entity Recognition with Attention Bi-LSTM-CRF Model. Advances in Computational Intelligence - 21st Mexican International Conference on Artificial Intelligence, MICAI 2022, Proceedings. editor / Obdulia Pichardo Lagunas ; Bella Martínez Seis ; Juan Martínez-Miranda. Springer Science and Business Media Deutschland GmbH, 2022. pp. 3-17 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{be1a412f07c84bdbb05f7d1103dd660a,

title = "Urdu Named Entity Recognition with Attention Bi-LSTM-CRF Model",

abstract = "The named entity recognition (NER) task is a challenging problem in natural language processing (NLP), especially for languages with very few annotated corpora such as Urdu. In this paper we proposed an Attention-Bi-LSTM-CRF method and applied it to the MK-PUCIT Corpus which is the latest NER dataset available for the Urdu language. In addition to word-level embedding, we used an embedding-level focus mechanism. The output of the embedding layer was fed into a bidirectional-LSTM encoder unit, accompanied by another self-attention layer to boost the system{\textquoteright}s accuracy. Our Attention-Bi-LSTM-CRF model demonstrated an F1-score of 92\%. The cumulative findings of the experiments show that our approach outperforms existing methods, thus yielding a new UNER (Urdu Named Entity Recognition) state-of-the-art performance.",

keywords = "Attention mechanism, Deep learning, Named entity recognition, Natural language processing, Word embedding",

author = "Fida Ullah and Ihsan Ullah and Olga Kolesnikova",

note = "Publisher Copyright: {\textcopyright} 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.; 21st Mexican International Conference on Artificial Intelligence, MICAI 2022 ; Conference date: 24-10-2022 Through 29-10-2022",

year = "2022",

doi = "10.1007/978-3-031-19496-2_1",

language = "English",

isbn = "9783031194955",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Science and Business Media Deutschland GmbH",

pages = "3--17",

editor = "{Pichardo Lagunas}, Obdulia and {Mart{\'i}nez Seis}, Bella and Mart{\'i}nez-Miranda, Juan",

booktitle = "Advances in Computational Intelligence - 21st Mexican International Conference on Artificial Intelligence, MICAI 2022, Proceedings",

address = "Germany",

}

Ullah, F, Ullah, I & Kolesnikova, O 2022, Urdu Named Entity Recognition with Attention Bi-LSTM-CRF Model. in O Pichardo Lagunas, B Martínez Seis & J Martínez-Miranda (eds), Advances in Computational Intelligence - 21st Mexican International Conference on Artificial Intelligence, MICAI 2022, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 13613 LNAI, Springer Science and Business Media Deutschland GmbH, pp. 3-17, 21st Mexican International Conference on Artificial Intelligence, MICAI 2022, Monterrey, Mexico, 24/10/22. https://doi.org/10.1007/978-3-031-19496-2_1

Urdu Named Entity Recognition with Attention Bi-LSTM-CRF Model. / Ullah, Fida; Ullah, Ihsan; Kolesnikova, Olga.
Advances in Computational Intelligence - 21st Mexican International Conference on Artificial Intelligence, MICAI 2022, Proceedings. ed. / Obdulia Pichardo Lagunas; Bella Martínez Seis; Juan Martínez-Miranda. Springer Science and Business Media Deutschland GmbH, 2022. p. 3-17 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13613 LNAI).

TY - GEN

T1 - Urdu Named Entity Recognition with Attention Bi-LSTM-CRF Model

AU - Ullah, Fida

AU - Ullah, Ihsan

AU - Kolesnikova, Olga

PY - 2022

Y1 - 2022

N2 - The named entity recognition (NER) task is a challenging problem in natural language processing (NLP), especially for languages with very few annotated corpora such as Urdu. In this paper we proposed an Attention-Bi-LSTM-CRF method and applied it to the MK-PUCIT Corpus which is the latest NER dataset available for the Urdu language. In addition to word-level embedding, we used an embedding-level focus mechanism. The output of the embedding layer was fed into a bidirectional-LSTM encoder unit, accompanied by another self-attention layer to boost the system’s accuracy. Our Attention-Bi-LSTM-CRF model demonstrated an F1-score of 92%. The cumulative findings of the experiments show that our approach outperforms existing methods, thus yielding a new UNER (Urdu Named Entity Recognition) state-of-the-art performance.

AB - The named entity recognition (NER) task is a challenging problem in natural language processing (NLP), especially for languages with very few annotated corpora such as Urdu. In this paper we proposed an Attention-Bi-LSTM-CRF method and applied it to the MK-PUCIT Corpus which is the latest NER dataset available for the Urdu language. In addition to word-level embedding, we used an embedding-level focus mechanism. The output of the embedding layer was fed into a bidirectional-LSTM encoder unit, accompanied by another self-attention layer to boost the system’s accuracy. Our Attention-Bi-LSTM-CRF model demonstrated an F1-score of 92%. The cumulative findings of the experiments show that our approach outperforms existing methods, thus yielding a new UNER (Urdu Named Entity Recognition) state-of-the-art performance.

KW - Attention mechanism

KW - Deep learning

KW - Named entity recognition

KW - Natural language processing

KW - Word embedding

UR - http://www.scopus.com/inward/record.url?scp=85142829581&partnerID=8YFLogxK

U2 - 10.1007/978-3-031-19496-2_1

DO - 10.1007/978-3-031-19496-2_1

M3 - Conference contribution

AN - SCOPUS:85142829581

SN - 9783031194955

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 3

EP - 17

BT - Advances in Computational Intelligence - 21st Mexican International Conference on Artificial Intelligence, MICAI 2022, Proceedings

A2 - Pichardo Lagunas, Obdulia

A2 - Martínez Seis, Bella

A2 - Martínez-Miranda, Juan

PB - Springer Science and Business Media Deutschland GmbH

T2 - 21st Mexican International Conference on Artificial Intelligence, MICAI 2022

Y2 - 24 October 2022 through 29 October 2022

ER -

Ullah F, Ullah I, Kolesnikova O. Urdu Named Entity Recognition with Attention Bi-LSTM-CRF Model. In Pichardo Lagunas O, Martínez Seis B, Martínez-Miranda J, editors, Advances in Computational Intelligence - 21st Mexican International Conference on Artificial Intelligence, MICAI 2022, Proceedings. Springer Science and Business Media Deutschland GmbH. 2022. p. 3-17. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-031-19496-2_1
