What is an unwritten language? Simply put, it is a language that is spoken but has no written form in regular use among its native speakers. Many unwritten languages are obscure and spoken by very few people. Spain, for example, has some regional languages and dialects with only a small number of speakers, which can be considered largely unwritten. Leonese is one such language. Although written Leonese texts exist from about 1,000 years ago, the language has been mostly replaced by Castilian Spanish. Yet it is still spoken by some people in parts of Spain.
As already mentioned, unwritten languages are often obscure, and many of them are dying. So why would one of the world’s largest tech companies be concerned with developing machine translation for unwritten languages? Good question, and a timely one: Facebook’s parent company, Meta, just announced that it has developed translation software for Hokkien, a primarily spoken variety of Chinese that has no widely used standard written form.
“The ability to communicate with anyone in any language: that’s a superpower people have dreamed of forever, and AI is going to deliver that within our lifetimes.” (Mark Zuckerberg)
Facebook is at the forefront of AI-based translation technology, and with good reason. People from all over the world post on Facebook and can have their posts translated into many other languages. Being able to connect people regardless of the languages they speak has helped turn Facebook into the world’s largest water cooler.
Is there any merit in developing translation software for obscure, unwritten languages? On the face of it, probably not. What would it be for? If it is for research purposes, linguists and historians already have the tools they need. If it is to combat illiteracy, how can people who don’t know how to use a computer interact with machine translation software? And besides, many unwritten languages are dialects of mainstream languages which exist in written form.
The solution for unwritten languages is really just a gateway. Facebook is developing AI-based training tools that allow it to add new languages to its machine translation (MT) systems even when a substantial bilingual corpus is not available. If it can do that, then Mr. Zuckerberg’s vision of communicating with people regardless of language may yet become a reality in our lifetime.