BERT-Based Named Entity Recognition for the Albanian Language
DOI: https://doi.org/10.56345/ijrdv10n302

Keywords: NER, BERT, Transfer Learning, Low-resource Language, NLP

Abstract
In this paper, we explore the application of the Bidirectional Encoder Representations from Transformers (BERT) model to Named Entity Recognition (NER) in the Albanian language. Despite the success of BERT in various NLP tasks across numerous languages, Albanian remains under-represented. Our approach uses a transfer learning strategy with multilingual BERT, fine-tuned on a newly created, comprehensive Albanian language corpus. The corpus includes texts from various domains, facilitating the identification of a broad range of named entities. We report on the challenges faced during the corpus creation, model fine-tuning, and testing phases, such as dealing with dialectal variation and the lack of existing resources for the Albanian language. This research not only contributes to the advancement of NER systems for low-resource languages but also provides a robust foundation for further work in Albanian language processing.
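The fine-tuning approach described above treats NER as token classification over BERT's subword tokens. As an illustrative sketch only (not the authors' released code), the following shows a standard preprocessing step in such a pipeline: aligning word-level BIO labels to subword tokens, as produced by a subword tokenizer's per-token word indices. The label set and function name here are assumptions, not the paper's actual annotation scheme.

```python
# Hypothetical BIO label set; the paper's actual entity types may differ.
LABELS = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC"]
LABEL2ID = {label: i for i, label in enumerate(LABELS)}

def align_labels_to_subwords(word_ids, word_labels, ignore_index=-100):
    """Map word-level BIO labels onto subword tokens.

    word_ids: one word index per subword token, with None for special
    tokens (e.g. [CLS], [SEP]), as a fast tokenizer's word_ids() returns.
    The first subword of a word keeps the word's label; continuation
    subwords of a B-X word switch to I-X; special tokens get ignore_index
    so the loss function skips them.
    """
    aligned = []
    previous = None
    for wid in word_ids:
        if wid is None:                      # special token or padding
            aligned.append(ignore_index)
        elif wid != previous:                # first subword of a new word
            aligned.append(LABEL2ID[word_labels[wid]])
        else:                                # continuation subword
            label = word_labels[wid]
            if label.startswith("B-"):
                label = "I-" + label[2:]
            aligned.append(LABEL2ID[label])
        previous = wid
    return aligned

# "Tirana është kryeqyteti": suppose "Tirana" splits into two subwords.
word_ids = [None, 0, 0, 1, 2, None]
word_labels = ["B-LOC", "O", "O"]
print(align_labels_to_subwords(word_ids, word_labels))
# -> [-100, 5, 6, 0, 0, -100]
```

The `-100` sentinel is the conventional ignore index for cross-entropy loss in PyTorch, so special tokens and padding do not contribute to the fine-tuning objective.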
Received: 21 September 2023 / Accepted: 29 October 2023 / Published: 23 November 2023
License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.