Diacritics Restoration Using Deep Neural Networks

Abstract
Diacritics restoration problem is defined as correct insertion of diacritic marks into a sentence without changing the context of a particular sentence. Changing a letter in a text can cause the meaning of the word to change, and therefore the meaning of the whole sentence. This kind of problem is common among languages that use Latin alphabet such as the Slovak language. Our goal is to develop an artificial neural network that can restore diacritic errors made by a human or a computer. For this purpose, we chose state of the art architecture of recurrent neural network. Our results prove that neural network is able to restore diacritics with 97 % accuracy on given Slovak corpus. The accuracy test on a text from a different genre, however, has shown a lower accuracy of 76%.

This publication has 12 references indexed in Scilit: