A Model for Albanian Speech Recognition Using End-to-End Deep Learning Techniques

Authors

  • Amarildo Rista, South East European University, Faculty of Contemporary Sciences and Technologies, Tetovo, North Macedonia
  • Arbana Kadriu, South East European University, Faculty of Contemporary Sciences and Technologies, Tetovo, North Macedonia

DOI:

https://doi.org/10.56345/ijrdv9n301

Keywords:

Deep learning, Albanian language, End-to-end ASR, Speech Recognition, Corpus

Abstract

An end-to-end automatic speech recognition (ASR) system folds the acoustic model (AM), language model (LM), and pronunciation model (PM) into a single neural network. Optimizing these components jointly, rather than separately, improves the performance of the model. In this paper, we introduce a model for Albanian speech recognition (SR) built with end-to-end deep learning techniques. The model consists of two main modules: Residual Convolutional Neural Networks (ResCNN), which learn the relevant audio features, and Bidirectional Recurrent Neural Networks (BiRNN), which leverage the features learned by the ResCNN. To train and evaluate the model, we built a Corpus for Albanian Speech Recognition (CASR), which contains 100 hours of audio data along with the corresponding transcripts. In designing the corpus, we took into account speaker attributes such as age, gender, accent, speed of utterance, and dialect, so that the corpus is as heterogeneous as possible. The model is evaluated using the word error rate (WER) and character error rate (CER) metrics. It achieves 5% WER and 1% CER.
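For readers unfamiliar with the two metrics, the following is a minimal sketch of how WER and CER are conventionally computed: the Levenshtein edit distance between reference and hypothesis, counted over words for WER and over characters for CER, divided by the reference length. The abstract does not describe the authors' scoring code, so the function names (edit_distance, wer, cer) and the choices here (whitespace tokenization, no text normalization) are assumptions made for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Counts the minimum number of substitutions, insertions, and
    deletions needed to turn `ref` into `hyp`.
    """
    # prev[j] holds the distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference word count."""
    ref_words = reference.split()  # assumption: whitespace tokenization
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Made-up transcript pair, not taken from the CASR corpus.
    reference = "mirë se vini"
    hypothesis = "mirë se vin"
    print(f"WER = {wer(reference, hypothesis):.3f}")
    print(f"CER = {cer(reference, hypothesis):.3f}")
```

Because a single wrong character makes the whole word count as an error, WER is usually higher than CER on the same output, which is consistent with the 5% WER and 1% CER reported above.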

Published

2022-07-01

How to Cite

Rista, A., & Kadriu, A. (2022). A Model for Albanian Speech Recognition Using End-to-End Deep Learning Techniques. Interdisciplinary Journal of Research and Development, 9(3), 1. https://doi.org/10.56345/ijrdv9n301

Section

Articles