The idea of machine translation has been around for centuries. It was entertained by such figures as Descartes and Leibniz, and it was the subject of patents for “translating machines” as early as the 1930s. Developed through the efforts of innumerable scholars from the whole spectrum of science, it has been gaining momentum ever since. Warren Weaver formulated the goals and methods of machine translation in the earliest days of digital computing. The Georgetown-IBM experiment was carried out in 1954. Then the ALPAC (Automatic Language Processing Advisory Committee) was founded in the 1960s. So on and so forth.
The point is that for a long time, we’ve been dreaming of a Babel fish of one sort or another. Now, it’s almost here. What do we do with it?
Why would you use machine translation?
Both ordinary customers and business buyers prefer reading content in their own language, which makes sense. We laugh at the inadequate output of free online machine translation tools, but more than half of non-native English speakers use them every day when browsing the Internet. That’s not a problem; it’s just human behavior. The problem appears when the amount of content is too large to translate. That is, when we don’t have enough time, people, or money to do it. And this is exactly what the situation looks like right now.
Former Google CEO Eric Schmidt’s famous quote about the 5 exabytes of data we create every two days may get the numbers wrong, but it makes a point. Facebook says its users share 30 billion pieces of content each month (and that was back in 2011). Since the first website ever made in 1990, we have reached numbers like 1,816,416,499 websites (as of April 2017), and the symbolic counter is running faster than ever. This is just playing with numbers, but we can all see it happening around us. There’s too much to read, let alone translate.
So if the Lennon-like dream of interlingual communication is not enough to justify using machine translation, there’s the real, present-day need: we’re producing far too much content to translate it with human brains only. There are not enough brains and they’re not fast enough.
How do we use machine translation today?
We’ve been using it for fun, but the joke is slowly wearing thin. Check out this New York Times article to get a glimpse of what goes on with Google Translate behind the scenes: there’s a lot of commotion and work that goes unnoticed. Additionally, both Google and Facebook use crowdsourcing to improve their translation systems. And they are improving quite rapidly.
In business, the most common model is to combine both human and machine translation. How does it work? In different ways, depending on the needs, type of content or language pairs.
Two of the most popular methods are pre-editing and post-editing. This is how pre-editing works: you take the source text and spend some time preparing it. This may mean simplifying the syntax, or applying so-called controlled natural language rules to reduce ambiguity and make things less complicated. The goal is to make the source text easier for machine translation applications to process.
The pre-edited text goes through machine translation and then it’s time for post-editing, which is pretty self-explanatory. Thanks to post-editing, the text can reach a level of quality required in a given situation.
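To make the pre-editing step concrete, here is a minimal sketch of a controlled-language pre-editor. The contraction list, the sentence-length threshold, and the warning format are all invented for illustration; real controlled-language rule sets are far more extensive.

```python
import re

# Toy controlled-language rules: expand contractions (which MT
# systems can mishandle) and flag overly long sentences.
CONTRACTIONS = {"don't": "do not", "it's": "it is", "we're": "we are"}

def pre_edit(text, max_words=20):
    """Apply simple rules; return the edited text and any warnings."""
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    warnings = []
    # Split on sentence-ending punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if len(sentence.split()) > max_words:
            warnings.append("too long: " + sentence[:30])
    return text, warnings
```

A sentence like “We don’t simplify syntax.” would come out as “We do not simplify syntax.”, while a 25-word run-on would be flagged for the editor to shorten before the text is sent to the MT engine.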
What are the machine translation applications out there? There are quite a few. IBM uses its famous Watson supercomputer as part of its language services. Microsoft, apart from developing the widely known Bing Translator, launched a product called Microsoft Translator. There are plenty of other, more or less specialized tools. To name but a few: Omniscien Technologies, Language Weaver, Moses, or Anusaaraka, which focuses specifically on Indian languages.
How does machine translation work?
Well, as you can imagine, that’s a pretty broad subject. What may look like a simple web application on the outside combines a wide range of cutting-edge technologies (artificial intelligence, big data, cloud computing, web APIs) with linguistic research.
The most classic solutions rely on linguistic information. These are called rule-based machine translation systems. They generate output based on the semantic, syntactic, and morphological regularities of the source and target languages. You basically “feed” them information stored in dictionaries and grammars of the languages you operate on, and they use it to perform the translation.
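A rule-based system can be sketched in miniature: a bilingual lexicon plus one hand-written grammar rule. The four-word lexicon and the adjective-noun reordering rule (mimicking, say, English-to-Spanish word order) are invented for illustration; real systems encode thousands of such rules.

```python
# Toy rule-based translator: dictionary lookup plus one
# reordering rule (English adjective-noun -> noun-adjective).
LEXICON = {"the": "el", "red": "rojo", "car": "coche", "runs": "corre"}
ADJECTIVES = {"red"}

def translate(sentence):
    words = sentence.lower().split()
    reordered = []
    i = 0
    while i < len(words):
        # Grammar rule: swap an adjective with the noun that follows it.
        if i + 1 < len(words) and words[i] in ADJECTIVES:
            reordered += [words[i + 1], words[i]]
            i += 2
        else:
            reordered.append(words[i])
            i += 1
    # Word-for-word lexicon lookup on the reordered sentence.
    return " ".join(LEXICON.get(w, w) for w in reordered)
```

Calling `translate("the red car runs")` yields “el coche rojo corre”: the rule moves the adjective after the noun, and the lexicon supplies each word.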
Another popular family of machine translation methods is based on statistics. Translations are generated from large sets of texts (corpora). Statistical machine translation programs go through hundreds of millions of records processed by countless human translators throughout history. From this data they detect patterns that let them find the translation with the highest probability of being correct. As you can guess, the larger the linguistic corpus, the better the translation.
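The core of that statistical idea can be shown with a toy “phrase table”: candidate translations paired with probabilities estimated from a parallel corpus, from which the system picks the most likely one. The phrases and probability values below are invented for illustration.

```python
# Toy phrase table: each source phrase maps to candidate
# translations with probabilities (as if estimated from corpora).
PHRASE_TABLE = {
    "good morning": [
        ("buenos días", 0.85),
        ("buena mañana", 0.10),
        ("bien mañana", 0.05),
    ],
}

def best_translation(phrase):
    """Pick the candidate with the highest estimated probability."""
    candidates = PHRASE_TABLE.get(phrase)
    if not candidates:
        return phrase  # No data: pass the phrase through untranslated.
    return max(candidates, key=lambda c: c[1])[0]
```

With more corpus data the probability estimates sharpen, which is exactly why larger corpora yield better statistical translations.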
There are a few other methods, but the most thrilling one is neural machine translation, which seems to point the way forward. In fact, Google’s and Microsoft’s translation programs already use it. This approach relies on very large artificial neural networks, loosely inspired by how the human brain works. They use deep learning and are constantly being improved, or rather, they improve themselves.
Artificial neural networks may seem immensely complicated, but they’re a fascinating subject and well worth a closer look.
Content ready for machine translation
To really benefit from machine translation, you need well-prepared content. There’s nothing new here; the same technical writing rules still apply, such as:
- keep your sentences short and to the point
- double-check your spelling
- use simple grammatical structures
- use active voice
So on and so forth. Neural networks may grow and improve every second, but the old Elements of Style still comes in handy.
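The checklist above can even be sketched as a simple “MT-readiness” linter. The word-count threshold and the passive-voice heuristic below are rough invented assumptions, not established rules; a real style checker would be far more sophisticated.

```python
import re

def check_sentence(sentence, max_words=25):
    """Flag writing patterns that tend to hurt machine translation."""
    issues = []
    if len(sentence.split()) > max_words:
        issues.append("sentence too long")
    # Crude passive-voice heuristic: a form of "to be" followed by
    # a word ending in -ed (misses irregular participles).
    if re.search(r"\b(is|are|was|were|been)\s+\w+ed\b", sentence):
        issues.append("possible passive voice")
    return issues
```

Run over a draft sentence by sentence, it would flag “The report was finished by the team.” as possible passive voice, while “We finished the report.” passes clean.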
As you can see, it’s not just about pre-editing. It’s about well-made and future-proof writing.
Will translators starve?
They may lose some weight, but they won’t die out, just as horses didn’t disappear when the Ford Model T came along (sorry if the comparison offends anyone). We’re positive about the changes technological progress brings. We’re not attacking Uber drivers, and we’re not going to harass machine translation researchers either. Not to mention that there’s a lot of work left to be done, and we most probably won’t start understanding Qing-dynasty poetry at the push of a button anytime soon.
Or will we?