Hugging Face’s Transformers library gives engineers and developers access to the latest developments in AI research. Kudos to them.
Below, I summarize how I made use of their library to re-solve an NLP problem related to the Vietnamese language.
After learning about hidden Markov models more than 10 years ago, I decided to apply them to build a small but practical toy that automatically inserts accent marks into Vietnamese text.
In a nutshell, Vietnamese has some letters that carry additional marks. For example, in addition to the letter ‘a’, the Vietnamese alphabet also contains these “marked versions”: ă, â.
Each of these 3 versions (a, ă, â) can then take one of the 5 tone marks. For ‘ă’, for example, that gives: ắ (acute), ằ (grave), ẳ (hook), ẵ (tilde), ặ (dot).
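To make the letter-plus-tone structure concrete, here is a minimal Python sketch that builds these characters from a base vowel and a Unicode combining mark, using only the standard unicodedata module. The names TONE_MARKS, add_tone, and strip_marks are my own illustrations, not part of any library, and this is not the code used in the original toy.

```python
import unicodedata

# Unicode combining code points for the five tone marks listed above.
TONE_MARKS = {
    "acute": "\u0301",
    "grave": "\u0300",
    "hook": "\u0309",
    "tilde": "\u0303",
    "dot": "\u0323",
}


def add_tone(base: str, tone: str) -> str:
    """Attach a tone mark to a base vowel and return the precomposed (NFC) form."""
    return unicodedata.normalize("NFC", base + TONE_MARKS[tone])


def strip_marks(text: str) -> str:
    """Drop every combining mark, turning accented text into unaccented text.

    Caveat: letters with no canonical decomposition (such as 'đ') pass
    through unchanged, so a real pipeline would need extra handling.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    # Print each base vowel followed by its five toned variants.
    for base in ("a", "ă", "â"):
        print(base, [add_tone(base, tone) for tone in TONE_MARKS])
```

For instance, add_tone("ă", "acute") yields ắ, while strip_marks goes the other way and reduces ắ to a plain a. Restoring the marks from that stripped text is exactly the problem this post is about.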