Produced by | Huxiu Technology Team

Author | Utada

Cover | Visual China

In April 2020, Mellanox, Israel's top data-center networking company, was acquired by NVIDIA for nearly US$7 billion. After more than half a year of relative "silence", it has now produced two events significant enough to stir the entire industry.

First, Eyal Waldman, Mellanox's founder and CEO of 21 years, announced his resignation, leaving behind a rather dashing parting line: "When this deal was concluded, I knew that I would be leaving. You know that I spent decades building a company and making all the decisions. It was not to take second place."

Second, just today, Mellanox's technology, now part of the NVIDIA family, has been integrated into a new network interconnect product: NVIDIA Mellanox 400G InfiniBand. From now on, as a "GPU companion", it will lead NVIDIA's charge into the most elite segment of the data-center market, the supercomputer market.

The NVIDIA Mellanox 400G InfiniBand product architecture. Simply put, it is the component that connects servers and other NVIDIA products

At today's global supercomputing conference, NVIDIA once again set a new performance record for its most powerful GPU product line, the enterprise-grade accelerator A100.

The new generation of the A100 GPU increases its high-bandwidth memory to 80GB, twice that of the previous generation. Its more than 2TB/s of memory bandwidth lets data move more quickly between memory and the GPU, so as to "withstand" the pressure from researchers building ever-larger artificial intelligence models and datasets.
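To put the headline figures above in perspective, a back-of-the-envelope calculation shows how quickly the GPU could, in principle, read through its entire memory. The numbers below are the rounded specs quoted in this article (80GB, >2TB/s); exact values vary by SKU, so this is a sketch, not a benchmark.

```python
# Back-of-the-envelope: how fast can an A100 80GB sweep its own memory?
# Uses the rounded headline specs quoted above; real sustained bandwidth
# is workload-dependent and somewhat lower than peak.

memory_gb = 80          # HBM capacity, in GB
bandwidth_gbs = 2000    # memory bandwidth, in GB/s (the ">2TB/s" figure)

sweep_time_s = memory_gb / bandwidth_gbs
print(f"Time to read all of memory once: {sweep_time_s * 1000:.0f} ms")
```

In other words, at peak bandwidth the entire 80GB could be read in roughly 40 milliseconds, which is why doubling capacity without raising bandwidth would have left large models starved for data.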

"To keep pushing the frontier in AI and high-performance computing (HPC) research, scientists must build larger and more complex models, which in turn require larger memory capacity and higher bandwidth," said Bryan Catanzaro, vice president of applied deep learning research at NVIDIA.

NVIDIA deployed a DGX system built from multiple A100 80GB GPUs at its supercomputing center in Cambridge, UK

In fact, besides redesigning the chips themselves, "connecting" hundreds of thousands of chips in the most efficient way is another answer to scientists' enormous exascale computing needs, and that is precisely why NVIDIA bought Mellanox.

If you have seen the Chinese supercomputer Sunway TaihuLight in Wuxi, you will know that "this computer" is actually a cluster of hundreds of black cabinets filling an entire floor. It can be called a "high-performance computing cluster", or viewed as a large-scale data center.

Of course, the performance of these black cabinets far exceeds that of ordinary servers. They contain more than 40,000 domestically produced chips of different types, and integrating them all is an extremely difficult task.

Sunway TaihuLight at the Wuxi Supercomputing Center

The role of NVIDIA Mellanox 400G InfiniBand is to "connect" the tens of thousands of CPUs, GPUs and other chips in a supercomputer, maximizing overall performance while keeping each chip's loss of data-transfer efficiency to a minimum.

"Previously, interconnection between CPUs and GPUs went through NVIDIA's NVLink (a bus and its communication protocol), but its interconnect efficiency is not especially good, and it cannot simply be extended to scenarios interconnecting thousands of chips," Yang Gongyifan, co-founder of Xinying Technology who previously worked on TPU development, told Huxiu. Mellanox was good at precisely what had been one of NVIDIA's biggest shortcomings.

"For a supercomputer, efficient cooperation between chips from different vendors is extremely important. Previously, NVIDIA's NVLink could only interconnect with IBM's CPUs. By acquiring Mellanox, NVIDIA improved the scalability of building supercomputer systems with its chips, allowing its GPUs to connect to chips of other brands and types."

In other words, NVIDIA has found a way to easily plug its products into other server platforms, such as Intel x86.

The supercomputer market: a chip gold mine that cannot be ignored

NVIDIA's full product lineup this time is, without exception, aimed at that money-burning pit, the supercomputer market.

For example, on the Global Supercomputer Top500 list released in June 2020, Summit, the Oak Ridge National Laboratory machine ranked second, equips each node with 2 IBM Power9 CPUs and 6 NVIDIA Tesla V100 GPUs. There are 4,356 such nodes, at a total cost of US$200 million.
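The node counts above make Summit's scale easy to estimate. The sketch below multiplies them out, assuming a peak of roughly 7.8 TFLOPS of double-precision compute per Tesla V100 (a commonly cited spec figure, not stated in this article); it ignores the Power9 CPUs' contribution, so it is a GPU-only approximation.

```python
# Rough GPU-only peak-FLOPS estimate for Summit from the node counts above.
# v100_fp64_tflops is an assumed per-GPU spec figure; the Power9 CPUs also
# contribute compute, so the real system peak is slightly higher.

nodes = 4356
gpus_per_node = 6
v100_fp64_tflops = 7.8   # assumed FP64 peak per Tesla V100

total_gpus = nodes * gpus_per_node
gpu_peak_pflops = total_gpus * v100_fp64_tflops / 1000
print(f"{total_gpus} GPUs, ~{gpu_peak_pflops:.0f} PFLOPS GPU-only peak")
```

That works out to roughly 26,000 GPUs and around 200 petaflops of peak double-precision performance, which is consistent with Summit's published figures and shows why the GPUs, not the CPUs, dominate the machine's compute.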

In March 2019, the U.S. Department of Energy's Argonne National Laboratory publicly announced that it would spend US$500 million to build the next-generation supercomputer Aurora, to be delivered in 2021. Such supercomputers do not blindly pursue raw speed; they need new design approaches for new technologies such as artificial intelligence.

So who are the main beneficiaries of this enormous government contract? Outside speculation points to Intel, the largest U.S. CPU maker, and Cray, the famous supercomputer system integrator.

But it cannot be ignored that as supercomputing systems are increasingly used in artificial intelligence research, the GPU, as an add-on accelerator, has gradually become a must-have for building supercomputers. NVIDIA has therefore profited considerably from supercomputing projects worldwide; Oak Ridge and Argonne, in particular, are among the first adopters of NVIDIA's best-performing products.

Summit, the most powerful supercomputer in the United States to date, ranks 2nd on the latest Top500 list.

Competition between supercomputers has always been regarded as a contest of technological strength between nations.

Although this is a narrow measure, these supercomputing clusters do play an extremely important role in many military and scientific tasks, such as weapons design, code breaking, climate-change simulation, and research on and diagnosis of the novel coronavirus.

Many unprecedented materials and chemistry experiments are unlikely to be performed in the cloud, so deploying high-performance servers on-site is extremely important.

"No one wants to burn huge sums of money on research into new technologies that may not pay off for decades, but that research is essential, so the computation behind it needs supercomputers to support it," an industry insider told Huxiu.

In this competition, the United States and China are of course the strongest contestants, and also the two technological powers most willing to spend money on supercomputing systems.

In June 2020, the Top500's top spot was taken by Fugaku, a supercomputing system costing US$1 billion at the RIKEN Center for Computational Science in Kobe, Japan. Still, of the 500 supercomputers on the list, China has 226 systems, while the United States has 114.

The top 10 of the latest list (June 2020). Sunway TaihuLight, the Chinese supercomputer that won four consecutive championships from 2016 to 2017, is fourth; Tianhe-2 is fifth.

With each supercomputer priced at hundreds of millions of dollars, this is a coveted market, and governments' record-setting orders, placed again and again, make it ever more attractive to commercial companies.

Moreover, there is no doubt that because supercomputers run the most difficult technology R&D workloads, they will be the source from which future technologies trickle down into the industrial and consumer markets.

The battle for supercomputers between nations began decades ago. Initially, most supercomputer microprocessors evolved from Intel and AMD PC chips. But over the past five years, data volumes have exploded and new applications keep emerging, so the most powerful supercomputers have begun using more specialized chips, and NVIDIA is one of the biggest beneficiaries.

As the figure shows, the amount of data generated from 2020 to 2025 will be three times that generated from 2010 to 2020. Image: NVIDIA

Interestingly, the "source of power" with which Japan's Fugaku defeated the strongest machines of the United States and China turned out to be Fujitsu's 48-core A64FX SoC, making Fugaku the first ARM-powered system to top the list (one hesitates to link this to NVIDIA's acquisition of ARM, but it is a beginning).

The current consensus on the future supercomputing market is HPC+AI. In other words, AI will be a typical supercomputing application, and at a very large scale. So if NVIDIA wants to be the future overlord, it must maintain its lead in this market.

Of course, in the high-performance computing market, nothing is ever that easy.