Suramya's Blog : Welcome to my crazy life…

July 17, 2019

Using Machine Learning To Automatically Translate Long-Lost Languages

Filed under: Computer Software,Interesting Sites,My Thoughts — Suramya @ 1:25 PM

Machine learning has become such a buzzword that almost every new product or research announcement mentions ML somewhere, even when it has nothing to do with it. This particular use case, though, is genuinely interesting, and I am looking forward to more advances on this front. Researchers Jiaming Luo and Regina Barzilay from MIT and Yuan Cao from Google’s AI lab in Mountain View, California have created a machine-learning system capable of deciphering lost languages.

Machine translation programs normally work by mapping out how the words in a given language relate to each other. They process large amounts of text and build vector maps of how often each word appears next to every other word, for both the source and the target language. Unfortunately, this requires a large body of text in each language, which is not available for lost languages, and that’s where the brilliance of the new technique comes in. Languages can only change in certain ways as they evolve (e.g. related words keep their characters in the same order), so the researchers came up with a ruleset for deciphering a language when its parent or child language is known.
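To make the "how often each word appears next to every other word" idea concrete, here is a minimal Python sketch of co-occurrence counting. It is my own toy illustration and not the researchers' code; the function name `cooccurrence_vectors`, the `window` parameter and the two-sentence corpus are all made up for the example.

```python
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=1):
    """Count how often each word appears within `window` positions of every other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            lo = max(0, i - window)
            hi = min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a word is not its own neighbour
                    counts[word][words[j]] += 1
    return counts


# Tiny made-up corpus, purely for illustration.
corpus = [
    "the king rules the city",
    "the queen rules the city",
]
vectors = cooccurrence_vectors(corpus)
print(vectors["king"])
print(vectors["queen"])
```

In this toy corpus "king" and "queen" end up with identical neighbour counts, which is the kind of signal a translation system can use to relate words. Real systems need far more text than this, which is exactly the problem with lost languages.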

To test their approach they tried it on two lost languages, Linear B and Ugaritic. Linguists know that Linear B encodes an early version of ancient Greek and that Ugaritic, which was discovered in 1929, is a Semitic language closely related to Hebrew. The system was able to correctly translate 67.3% of Linear B cognates into their Greek equivalents, which is a remarkable achievement and marks a first in the field.

The new algorithm still has a restriction: it doesn’t work if the progenitor language is not known. But work on the system is ongoing, and who knows, a new breakthrough might be just around the corner. There is also always the brute-force approach, where the system tries to translate a given language using every possible language as the progenitor. It would require a lot of compute and time, but it is an option worth looking at.
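Here is a rough Python sketch of what such a brute-force search could look like. This is my own toy illustration, not the method from the paper: it assumes the lost words have already been transliterated into the same alphabet as the candidate languages (the real system has to learn that mapping), it scores similarity with a simple longest-common-subsequence measure that rewards characters appearing in the same order, and every word and language name in it is invented.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence: characters shared in the same order."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(word, candidate):
    """Order-preserving overlap, scaled to the range 0..1."""
    return lcs_length(word, candidate) / max(len(word), len(candidate))


def rank_candidate_languages(lost_words, candidates):
    """Try every candidate language as the progenitor and rank them by average best match."""
    scores = {}
    for name, vocabulary in candidates.items():
        best = [max(similarity(w, v) for v in vocabulary) for w in lost_words]
        scores[name] = sum(best) / len(best)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented words and language names, purely for illustration.
lost_words = ["kato", "mira", "sul"]
candidates = {
    "language_a": ["katos", "mire", "sulu"],
    "language_b": ["bend", "frop", "wix"],
}
ranking = rank_candidate_languages(lost_words, candidates)
print(ranking)
```

With this made-up data "language_a" comes out on top. The loop itself is cheap here, but with the real model each candidate language would mean a full decipherment run, which is where the compute cost comes from.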

Well, this is all for now. Will write more later.

– Suramya

Source: Machine learning has been used to automatically translate long-lost languages
