Bert Based Named Entity Recognition for the Albanian Language

Authors

  • Labehat Kryeziu, Ph.D. Candidate, Faculty of Contemporary Sciences and Technologies, South East European University, Tetovo, North Macedonia, lk29054@seeu.edu.mk
  • Visar Shehu, Full Professor, Faculty of Contemporary Sciences and Technologies, South East European University, Tetovo, North Macedonia, v.shehu@seeu.edu.mk

DOI:

https://doi.org/10.56345/ijrdv10n302

Keywords:

NER, BERT, Transfer Learning, Low-resource Language, NLP

Abstract

In this paper, we explore the application of the Bidirectional Encoder Representations from Transformers (BERT) model to Named Entity Recognition (NER) in the Albanian language. Despite the success of BERT in various NLP tasks across numerous languages, Albanian remains under-represented. Our approach uses a transfer learning strategy in which multilingual BERT is fine-tuned on a newly created, comprehensive Albanian language corpus. The corpus includes texts from various domains, which facilitates the identification of a broad range of named entities. We report the challenges encountered during corpus creation, model fine-tuning, and testing, such as handling dialectal variation and the lack of existing resources for the Albanian language. This research contributes to the development of NER systems for low-resource languages and provides a foundation for further work in Albanian language processing.

Received: 21 September 2023 / Accepted: 29 October 2023 / Published: 23 November 2023

Published

2023-11-23

How to Cite

Kryeziu, L., & Shehu, V. (2023). Bert Based Named Entity Recognition for the Albanian Language. Interdisciplinary Journal of Research and Development, 10(3), 7. https://doi.org/10.56345/ijrdv10n302