A new star of machine translation is rising

Editor’s note: This article comes from the WeChat public account “Big Data Digest” (ID: BigDataDigest), written by Liu Junhuan.

Recently, an online machine translation tool has become popular in Japan.

Hardcore evaluation, Google translation is crushed: the world's first translation engine has evolved, and

The tool is called DeepL. It caught on because it is so thorough and its translations are so accurate, which has sparked heated discussion in Japan.

According to informal tests by Japanese netizens, it not only handles Japanese dialects very well but can even cope with classical texts. That is something even Google Translate cannot do.


But how accurate is it, exactly? For a serious piece of technology, the data should do the talking. DeepL has also officially published blind-test results for Japanese-English and Chinese-English translation, and in those results DeepL wins by a crushing margin.


In a blind test, professional translators review the translated texts without knowing which service produced which version. This has long been one of DeepL’s testing methods.

DeepL’s accuracy has also sparked discussion on Reddit. Some users pointed out that, unlike Google Translate, DeepL does not translate word for word. The settings in Textractor also show that DeepL can use previous translations as context to improve its results.
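The blind-test procedure described above can be sketched in a few lines of Python. This is only an illustration, not DeepL's actual evaluation pipeline: the system names, the placeholder texts, and the `stand_in_reviewer` function are all assumptions made up for the example. The point is simply that reviewers see shuffled, unlabeled texts, and their choices are mapped back to systems only afterwards.

```python
import random
from collections import Counter


def make_blind_item(outputs, rng):
    """Shuffle one item's translations and hide which system wrote each.

    outputs: dict mapping system name -> translated text.
    Returns (texts, key); key[i] is the system behind texts[i].
    Reviewers are shown only `texts`.
    """
    systems = list(outputs)
    rng.shuffle(systems)
    return [outputs[name] for name in systems], systems


def tally_votes(keys, choices):
    """Map each chosen index back to its system and count the wins."""
    return Counter(key[choice] for key, choice in zip(keys, choices))


# Made-up placeholder data; real tests would use actual translations.
items = [
    {"system_a": "placeholder translation, longer", "system_b": "placeholder"},
    {"system_a": "another longer placeholder", "system_b": "short one"},
]

rng = random.Random(0)  # fixed seed so the shuffle is reproducible
blinded = [make_blind_item(item, rng) for item in items]
keys = [key for _, key in blinded]


def stand_in_reviewer(texts):
    """Hypothetical reviewer: picks the longest text, seeing no labels."""
    return max(range(len(texts)), key=lambda i: len(texts[i]))


choices = [stand_in_reviewer(texts) for texts, _ in blinded]
wins = tally_votes(keys, choices)
print(wins)  # Counter({'system_a': 2})
```

Because the key is kept apart from the texts, a reviewer's preference cannot be influenced by brand names, whatever order the shuffle produces.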


Many netizens are cheering that “DeepL is awesome”!


When DeepL first came to public attention three years ago, it attracted a lot of interest. DeepL CEO Gereon Frahling once said that DeepL’s goal goes beyond translation tasks: neural networks that start from understanding text will open up more possibilities.

As for how good it really is, Big Data Digest ran a small evaluation; after that, we will look at how DeepL got its start. The stools are set out, so take a seat ~

Dialects, classical Chinese, academic papers: a clash of machine translation titans!

Both the informal tests and DeepL’s official blind-test results suggest that DeepL may be the most accurate machine translation service at present. But to see how it really performs, you have to try it yourself.

Since this update also adds simplified Chinese, Big Data Digest, with a little doubt and a little curiosity, ran a simple evaluation comparing DeepL with the current mainstream services: Google Translate, Microsoft Translator, Baidu Translate, and Youdao Translate.

The test has three rounds: dialect first, classical Chinese second, and academic papers third. OK, our five contestants are now on the field.

In the first round, let’s look at dialect.

Everyone knows that Chinese dialects are vast and profound. If a service cannot translate dialects correctly, its accuracy deserves a question mark.

We chose a level-ten question in Northeastern dialect: “I’m going to go, you’re too stingy.” It has two scoring points: one is the exclamation “I’m going to go” and the other is “磕碜”. Let’s see how the five contestants perform.

On the first scoring point, Google translated it as “I’ll go there”, while Microsoft and Baidu took it to mean “I’ll go”. Youdao answered “I don’t know”. DeepL did well, correctly rendering it as “oh my god” with a tone of surprise.

On the second scoring point, all five players gave different answers: Google “shy”, Microsoft “snobful”, Baidu “shabby”, Youdao “bad”, and DeepL “ugly”.

In terms of scoring, Baidu’s answer on the second point is acceptable and barely passes, while Google and Microsoft miss completely. DeepL earns full marks.


That was only the first question, so there is still a chance for a comeback. Next, let’s look at classical Chinese. Since DeepL can translate classical Japanese, it would be hard to justify if it could not translate classical Chinese.

Second round, classical Chinese.

For the classical round, we take as the test question a famous line from “Wang Yue Huai Yuan” (望月怀远) by the Tang poet Zhang Jiuling: “海上生明月，天涯共此时”. The line means that a bright moon rises over the boundless sea, bringing to mind relatives and friends far away at the ends of the earth, who at this moment should be looking at the same moon.

The scoring point here is whether the contestants can convey the mood of the whole line in English. The Chinese reference interpretation has been given, so how do the five contestants perform?

First, in terms of meaning, Google, Microsoft, and Baidu simply gave up on the second half of the line, while Youdao rendered it as “Tianya at this time”. In the first half, both Microsoft and Baidu used the word “born”, but Microsoft’s translation is “The sea is born”???

Now for DeepL. Its first half is exactly the same as Google’s answer, but Big Data Digest is not sure whether its translation of the second half reaches the standard of faithfulness, expressiveness, and elegance.


For the third and final question, we examine how the contestants translate academic papers between Chinese and English.

Beyond fluent sentences, the key to translating academic papers is accurate technical vocabulary, which is the focus of this round.

For Chinese-to-English translation, we selected an article published last year in “International Press”, in which the researchers investigated the impact of social media trust on privacy risk perception and self-disclosure.

Original text: Empirical results show: 1. There is no significant correlation between privacy risk perception and self-disclosure; 2. Social media trust negatively affects users’ privacy risk perception, and Internet interpersonal trust plays a mediating role; 3. Social media trust positively affects users’ self-disclosure, and Internet interpersonal trust plays a mediating role in it.

Judging by the results, all five contestants give satisfactory answers, with no problems in sentence structure or grammar; the differences lie in a few specific terms. For “self-disclosure”, DeepL and Microsoft use “self-expression”, while the other three use “self-disclosure”. For “Internet interpersonal trust”, Youdao, Baidu, and Microsoft translate it as “network interpersonal trust”, Google gives “online interpersonal trust”, and DeepL translates it as “cyber-interpersonal trust”.

As usual, let’s take a look at DeepL’s answer.


For English-to-Chinese translation, we chose the introduction of the Imperial College paper covered in an article Big Data Digest published last week. A quick note on user experience: when switching from Chinese to English, only Baidu, Youdao, and DeepL detected the language automatically; Google and Microsoft still required selecting the language manually.

Original: The global impact of COVID-19 has been profound, and the public health threat it represents is the most serious seen in a respiratory virus since the 1918 H1N1 influenza pandemic. Here we present the results of epidemiological modelling which has informed policymaking in the UK and other countries in recent weeks. In the absence of a COVID-19 vaccine, we assess the potential role of a number of public health measures – so-called non-pharmaceutical interventions (NPIs) – aimed at reducing contact rates in the population and thereby reducing transmission of the virus. In the results presented here, we apply a previously published microsimulation model to two countries: the UK (Great Britain specifically) and the US. We conclude that the effectiveness of any one intervention in isolation is likely to be limited, requiring multiple interventions to be combined to have a substantial impact on transmission.

Judging by the results, all five contestants show a fairly high academic standard, with little difference in how formally they use academic language. The details, however, tell the story. Only Youdao keeps the paired dashes, although that usage is not common in Chinese. Apart from Youdao, the other three contestants did not render “non-pharmaceutical interventions (NPIs)” perfectly.

Still, DeepL has the last laugh. Although various small problems keep it from full marks, it is a solidly high-scoring answer.


Those are all the questions in this evaluation. DeepL is indeed the top seed: whether in dialect, classical Chinese, or academic writing, it performs well. It seems DeepL’s claims were honest after all.

Evolved from Linguee: machine learning powers DeepL

Having seen DeepL’s standout performance, we now turn to the top seed itself, which performed best in this evaluation.

Don’t know DeepL? Then you have probably heard of Linguee, the online foreign-language dictionary that has been operating for more than ten years. DeepL’s predecessor was Linguee, a translation tool that has been around for many years. Although it is widely used and has a group of loyal users, its translation quality was still no match for Google Translate, especially given the latter’s huge advantages in brand and standing.

But what really matters is Linguee’s technical foundation. Linguee co-founder Gereon Frahling previously worked at Google Research. In 2007, he left to start a new venture. The team worked on machine translation for several years, and only in 2016 did they begin developing a new system and building a new company, DeepL.

Linguee’s core competitive advantage lies in its crawlers and machine learning systems. The crawlers capture a large database of more than one billion translated sentences and queries from the Internet, and the machine learning system searches for and evaluates how similar fragments are actually translated on web pages. The combination made Linguee “the world’s first translation search engine” at the time.
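To make the idea of a "translation search engine" concrete, here is a minimal Python sketch. It assumes a tiny in-memory list of sentence pairs standing in for the crawled database, and it ranks hits with a simple coverage ratio instead of a machine learning model. The `PAIRS` data and the `search` function are hypothetical and do not describe Linguee's actual system.

```python
# Stand-in for crawled bilingual sentence pairs: (source, translation).
PAIRS = [
    ("good morning", "guten Morgen"),
    ("good evening", "guten Abend"),
    ("good morning everyone", "guten Morgen zusammen"),
]


def search(query, pairs, limit=5):
    """Return pairs whose source contains the query as a whole phrase.

    Each hit is (score, source, translation). The score is the share of
    the source sentence covered by the query, a crude heuristic that
    stands in for a learned quality model.
    """
    q = query.lower().split()
    if not q:
        return []
    hits = []
    for source, translation in pairs:
        words = source.lower().split()
        # Slide a window over the source to find the phrase in order.
        found = any(
            words[i:i + len(q)] == q
            for i in range(len(words) - len(q) + 1)
        )
        if found:
            hits.append((len(q) / len(words), source, translation))
    # Best-covered sentences first; ties keep their crawl order.
    hits.sort(key=lambda hit: -hit[0])
    return hits[:limit]


results = search("good morning", PAIRS)
for score, source, translation in results:
    print(f"{score:.2f}  {source}  ->  {translation}")
```

The user searches for a phrase and gets back real sentences that contain it, each shown next to its existing translation, instead of a freshly generated translation.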

Ten