Nearly 80 years into its long history, Machine Translation is having a moment.
In recent years, as all things AI have cemented themselves firmly into the zeitgeist, “Neural MT” has emerged as one of the many buzzwords that capture attention across industries. No matter how you’re connected to the concept of Machine Translation (MT), you need to know how to talk about it.
As applications of artificial intelligence become increasingly accessible to companies and consumers, a lexicon of closely related terms has emerged. If you’re an outsider looking in, how do you parse the difference among terms that are sometimes used interchangeably? How do you translate machine translation?
We’re here to help. Here at Lionbridge, some of the most experienced MT experts in the world are part of our pride. We’ve worked with them to develop this cheat sheet to help you determine the subtle and not-so-subtle differences in the terms that keep the industry moving.
To understand recent trends in MT, you first need to familiarize yourself with the backdrop against which they have been happening: heady, hefty Artificial Intelligence. AI is “intelligence” that machines demonstrate when they perform tasks usually considered to require inherently human types of thinking, such as learning and problem solving. In recent years, AI has benefited from increasing computing power. More powerful computers yield not just more intensive processing during the task at hand, but also more advanced machine learning, which is how computers gain the knowledge that’s required for AI applications.
Machine learning is a branch of computer science that uses massive amounts of data to teach computers how to perform a task. Machine learning examines data related to a particular task, finds patterns in those data and makes associations among those patterns, then uses those new learnings to shape how the computer performs the task. If, after this analysis, the computer gets better at performing the task, then we say machine learning has occurred.
Because we have data on just about everything you can imagine, people are using machine learning to improve computer performance in everything from weather forecasting to automatic stock selection to machine translation.
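The learn-from-examples loop described above can be sketched in a few lines of code. This is a deliberately toy illustration, not a real MT or ML system: the “task” is guessing the next word, the “data” is a handful of example sentences, and the “patterns” are simple word-pair counts. All names and the tiny corpus here are invented for the example.

```python
from collections import Counter, defaultdict

def train(sentences):
    """Find patterns in the data: count which word tends to follow which."""
    model = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for current, nxt in zip(words, words[1:]):
            model[current][nxt] += 1
    return model

def predict_next(model, word):
    """Perform the task using what was learned: pick the most common follower."""
    followers = model.get(word)
    return followers.most_common(1)[0][0] if followers else None

# The "training material": a tiny corpus of example sentences.
corpus = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "the dog sat on the rug",
]
model = train(corpus)
print(predict_next(model, "the"))  # prints "cat" (seen twice after "the")
```

The more sentences the model sees, the better its guesses become; that improvement with data is exactly what “machine learning has occurred” means in the paragraph above.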
Put simply, Machine Translation is automated translation: you present source material to a computer in one language, and it gives it back to you in another language. It’s not perfect, but it’s one of the most powerful tools we have for producing high-quality translations more efficiently.
MT has been getting better and better, in terms of the quality of the output and the breadth of languages it supports, over the last several decades. From simple word replacement systems in the very early days of MT, to the explicitly coded grammar and lexicons of rules-based MT, to the number-crunching paradigm of Statistical MT, to the Deep Learning and neural networks of Neural MT, the development of machine translation has mirrored our increasingly sophisticated use of computers.
Statistical Machine Translation (SMT) leverages machine learning to generate a massive number of translation candidates for a given source sentence, then selects the best one, based on the likelihood of words and phrases appearing together in the target language. SMT learns about translation through the lens of “n-grams”—small groupings of words that appear together in the source and target language. During the machine learning phase, an SMT system is given training material: that is, many, many examples of sentences in the source language and their translations into the target language. The learning algorithm divides source sentences and target sentences into n-grams and determines which target language n-grams are likely to appear in a translation when a certain source language n-gram appears in a sentence.
The learning algorithm then builds a language model that calculates the likelihood that given words and phrases appear next to one another in the target language. When the learning is done and it’s time to translate new material, the SMT system breaks the new source sentence down into n-grams, finds the highly associated target language n-grams, and starts generating candidate sentences. The final translation is that sentence whose target language n-grams correlate most highly with the source sentence’s n-grams, and whose target language words are most likely to appear together in the target language.
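The n-gram machinery described above can be made concrete with a toy sketch. Production SMT systems use far larger models and more sophisticated scoring; the miniature German “corpus,” the candidate sentences, and the simple count-based score here are all invented for illustration.

```python
from collections import Counter

def ngrams(sentence, n=2):
    """Break a sentence into overlapping n-grams (here, word pairs)."""
    words = sentence.split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

# A tiny "language model": counts of word pairs seen in target-language text.
target_text = [
    "das haus ist gross",
    "das auto ist gross",
    "das haus ist alt",
]
lm_counts = Counter(ng for s in target_text for ng in ngrams(s))

def score(candidate):
    """Score a candidate translation by how familiar its word pairs look."""
    return sum(lm_counts[ng] for ng in ngrams(candidate))

# Two candidate translations: same words, different word order.
candidates = ["das haus ist gross", "haus das gross ist"]
best = max(candidates, key=score)
print(best)  # prints "das haus ist gross" — the fluent order wins
```

Even this tiny model prefers the fluent word order, because its n-grams have been seen before in target-language text; that is the core intuition behind SMT’s language model.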
SMT works surprisingly well, particularly when you consider that there is nothing linguistic about an SMT system; indeed, the system only considers n-grams, never a comprehensive sentence. This differs from an emerging approach to MT: Neural Machine Translation.
Neural Machine Translation (NMT) overcomes the greatest shortcoming of SMT: its reliance on n-gram analysis. NMT empowers the machine—the system receives the training material, just as it would with SMT, but there’s a key difference. Once the system receives the material, it decides itself how to learn everything it can about that data.
NMT systems build vectors of information for each source sentence, gathering information about each word from the words that surround it. Some systems track hundreds of pieces of information per word, giving the model a richly detailed representation of each sentence. Through deep learning, NMT systems capture a massive amount of information about each word and source sentence, then use what’s called an attention model to home in on the features that, through analysis of these massive data streams, the system has learned are important for translation. The result is translations that show marked improvements in fluency, which means that computer-generated translations are starting to sound more and more natural.
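The attention idea can be sketched in miniature: given a query vector (roughly, what the system is about to translate next) and a vector for each source word, attention scores each source word and normalizes the scores into weights. The three-dimensional vectors below are made up for illustration; real NMT systems learn vectors with hundreds of dimensions, as described above.

```python
import math

def dot(a, b):
    """Similarity between two vectors: their dot product."""
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, source_vectors):
    """Score each source vector against the query, then normalize."""
    return softmax([dot(query, v) for v in source_vectors])

# Invented word vectors for a three-word source sentence.
source = {
    "the":    [0.1, 0.0, 0.2],
    "cat":    [0.9, 0.8, 0.1],
    "sleeps": [0.2, 0.7, 0.9],
}
query = [1.0, 0.9, 0.0]  # a query "looking for" the subject of the sentence
weights = attention_weights(query, list(source.values()))
for word, w in zip(source, weights):
    print(f"{word}: {w:.2f}")  # "cat" gets the largest weight
```

The weights tell the system which source words matter most for the word it is currently producing, which is how attention “hones in” on the critical features.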
While NMT is still a very young MT paradigm, it’s a game-changer in our industry. As available toolsets mature and more improvements come, we at Lionbridge will continue to increase our use of MT to accelerate our production processes.
At Lionbridge, we speak the MT language fluently. We’ve been offering MT and post-editing at scale since 2002, and every year we do more—and it just keeps getting better. Learn more about how we use our fluency in MT to help our clients here.