Update: The trained model and instructions for using it can now be downloaded from Hugging Face here.
In this post, I summarize how I used Hugging Face’s Transformers library to re-solve an NLP problem for the Vietnamese language.
The problem
After learning about hidden Markov models more than 10 years ago, I decided to apply them to build a small but practical toy that automatically inserts accent marks into Vietnamese text.
In a nutshell, Vietnamese has some letters that carry additional marks. For example, in addition to the letter ‘a’, the Vietnamese alphabet also contains these “marked versions”: ă, â.
Each of these 3 versions (a, ă, â) can then take any of the 5 tone marks. For example, ‘ă’ becomes: ắ (acute), ằ (grave), ẳ (hook), ẵ (tilde), ặ (dot).
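To make the combinatorics concrete, here is a minimal Python sketch that builds the toned variants from Unicode combining marks using only the standard library. The helper names (`add_tone`, `strip_marks`) are hypothetical and are not part of the model or code described in this post. The `strip_marks` helper is only my assumption about what unaccented input looks like, not a description of how the training data was prepared.

```python
import unicodedata

# Unicode combining characters for the five tone marks listed above.
TONE_MARKS = {
    "acute": "\u0301",
    "grave": "\u0300",
    "hook": "\u0309",
    "tilde": "\u0303",
    "dot": "\u0323",
}

# The three versions of the letter 'a' (plain, breve, circumflex).
BASE_LETTERS = ["a", "\u0103", "\u00e2"]


def add_tone(letter: str, tone: str) -> str:
    """Attach a tone mark to a letter and compose it into one code point (NFC)."""
    return unicodedata.normalize("NFC", letter + TONE_MARKS[tone])


def strip_marks(text: str) -> str:
    """Illustrative only: remove every accent mark to get unaccented text."""
    decomposed = unicodedata.normalize("NFD", text)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # 'đ'/'Đ' are separate letters with no Unicode decomposition, so map them by hand.
    return no_marks.replace("\u0111", "d").replace("\u0110", "D")


if __name__ == "__main__":
    for base in BASE_LETTERS:
        variants = [add_tone(base, tone) for tone in TONE_MARKS]
        print(base, "->", " ".join(variants))
    print(strip_marks("Vi\u1ec7t Nam"))
```

So the single unaccented letter ‘a’ can stand for the plain letter, its 2 marked versions, or any of their 15 toned variants, which is why restoring the marks requires looking at the surrounding context.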