This article is from the WeChat public account Silkians (guixingren123). Author: Doutzen; editor: Vicky Xiao. Header image: Visual China.

On Tuesday, November 12, US time, Intel officially announced its first neural network processor for complex deep learning: the Nervana NNP.

Its full name is the Nervana Neural Network Processor, and it is Intel’s first dedicated ASIC for data-center customers running complex deep learning workloads. (Put more plainly: the Nervana NNP is Intel’s answer to the TPU.)

The chip takes its name from Nervana, a neural computing company that Intel acquired in 2016.

At the Intel Artificial Intelligence Summit on Tuesday, Naveen Rao, Intel vice president and general manager of the Artificial Intelligence Products Group (and formerly founder and CEO of Nervana), said: “As artificial intelligence develops further, computing hardware and memory will both reach a breaking point. If we want to keep making great progress, purpose-built hardware is essential.”

He further pointed out that with the Nervana NNP, artificial intelligence will advance at the system level, driving the next revolution of the information technology era: from “converting data into information” to “converting information into knowledge.”

Strictly speaking, the Nervana NNP is not a single chip but a chip family, a new architecture. For extremely complex deep neural networks, from training to inference, the Nervana NNP family offers different products for data-center users with different needs.

The Nervana NNP-T1000 (hereinafter NNP-T) is a neural network training processor, while the Nervana NNP-I1000 (hereinafter NNP-I) is heavily optimized for inference.

The Nervana NNP family has three main highlights: higher compute density, better energy efficiency, and an open-source full-stack software ecosystem built on the Intel architecture.

Intel claims the compute density of the Nervana NNP chip family is 3.7 times that of competing products. As for energy efficiency, the NNP-I inference chip in particular draws only about 15 W per chip.

Neural network technology is currently advancing by leaps and bounds: models are getting deeper and node counts keep growing, and it is difficult for a single processor to complete training with acceptable efficiency. Demands on processors’ compute density, scalability, and power consumption are therefore rising ever higher.

On the training side, Intel showed that the NNP-T achieves 95% scaling efficiency on ResNet-50 and BERT benchmark training.

With Intel’s advanced memory-management and communication technology, the NNP-T can scale to cluster architectures of 32 or more chips while keeping per-chip data-transfer efficiency and power consumption consistent with standalone use. The NNP-T strikes a balance between compute, communication, and memory, whether in small clusters or in the largest supercomputers.
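As a back-of-the-envelope reading of those numbers (a minimal sketch; the 0.95 figure is the scaling efficiency Intel quotes, and the simple linear model is my assumption):

```python
# Scaling efficiency = achieved speedup / chip count, so under an assumed
# constant 95% efficiency, aggregate throughput grows almost linearly.
def effective_speedup(chips: int, efficiency: float = 0.95) -> float:
    """Approximate speedup over a single chip at a given scaling efficiency."""
    return chips * efficiency

for n in (8, 16, 32):
    print(f"{n} chips -> ~{effective_speedup(n):.1f}x single-chip throughput")
# 32 chips -> ~30.4x, i.e. near-linear rather than perfectly linear scaling
```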

NNP-T chip:

A Mezzanine computing card carrying the NNP-T:

Like Google’s large-scale Cloud TPU cluster architecture, Intel has also built the Nervana POD: 480 NNP-T neural network processors across 10 racks. Thanks to the NNP-T’s design, chips communicate and coordinate better within and even across racks, and near-linear scaling brings a significant boost in computing power.
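The article does not show the NNP-T’s software stack, but the pattern such a cluster runs is data-parallel training: each chip holds a replica of the model, and gradients are averaged across chips every step. A minimal, hypothetical sketch of that pattern using PyTorch’s DistributedDataParallel (single CPU process standing in for a rack of accelerators; not Intel’s actual toolchain):

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # Single-process demo; a real launcher (e.g. torchrun) would set
    # rank/world_size across many hosts, one rank per accelerator.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=0, world_size=1)

    model = DDP(nn.Linear(512, 10))   # gradients are all-reduced across ranks
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()

    for step in range(3):
        inputs = torch.randn(32, 512)          # stand-in for a real batch
        targets = torch.randint(0, 10, (32,))
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()    # gradient all-reduce happens during backward
        optimizer.step()   # every replica applies the same averaged update
        print(f"step {step}: loss {loss.item():.3f}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```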

On the inference side, the NNP-I’s biggest advantages are its high energy efficiency, low cost, and flexible form factor, which make it well suited to the diverse, flexibly scaled inference workloads that run in the real world.

The NNP-I draws around 15 W in operation. Combined with data-center customers’ own technology stacks, it enables faster, more efficient inference deployments.

Intel is targeting leading artificial intelligence customers such as Baidu and Facebook, and has done custom development for their AI processing needs. For example, pairing the NNP-I with Facebook’s Glow compiler yields significant improvements on workloads such as computer vision, delivering higher performance while consuming less energy.
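Graph compilers like Glow consume framework-neutral model graphs such as ONNX and lower them to accelerator-specific code. A minimal, hypothetical sketch of the export step by which a vision model might reach such a compiler (the output path and opset version are arbitrary choices of mine, and this is not Intel’s actual NNP-I toolchain):

```python
# Export a ResNet-50 to ONNX, a graph format that compilers like Glow
# can import and lower to accelerator code.
import torch
import torchvision

model = torchvision.models.resnet50().eval()   # untrained weights; demo only
dummy_input = torch.randn(1, 3, 224, 224)      # standard ImageNet input shape

torch.onnx.export(
    model,
    dummy_input,
    "resnet50.onnx",                           # hypothetical output path
    input_names=["input"],
    output_names=["logits"],
    opset_version=11,
)
```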

NNP-I chip:

An M.2 computing card with the NNP-I chip:

To date, besides Intel, the world’s leading computing companies, including NVIDIA and Qualcomm, have launched neural network processors comparable to the Nervana NNP, such as the Tesla line and the Cloud AI 100. And long before them, Google developed its own TPU to meet its artificial intelligence training needs. In such a competitive environment, what makes the Nervana NNP stand out?

Intel says demand for artificial intelligence computing has grown dramatically in recent years, with enterprise customers’ data-center compute requirements doubling every three and a half months.
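Taken at face value, that cadence compounds to roughly an order of magnitude per year, as a quick calculation shows:

```python
# Doubling every 3.5 months compounds to 2 ** (12 / 3.5) per year.
months_per_doubling = 3.5
annual_growth = 2 ** (12 / months_per_doubling)
print(f"~{annual_growth:.1f}x growth in compute demand per year")  # ~10.8x
```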

One fact that cannot be ignored: at least in industry, most companies’ and organizations’ data centers and cloud computing services run on Intel Xeon processors. Advancing deep learning inference and applications involves extremely complex data, models, and techniques, so architecture choices require different considerations. For customers already invested in the Intel architecture, the Nervana NNP’s advantages in architectural compatibility and performance optimization are unquestionably significant.

As Naveen Rao said, the development of artificial intelligence has created unprecedented demand for dedicated chips. Beyond the world’s leading technology companies, many smaller companies also work with artificial intelligence, but their models place differentiated demands on data-center computing.

The most straightforward example: not every company is willing to pay Google or Amazon hefty cloud-computing fees. They may need to stand up their own cloud in their own data centers, with capacity needs that may be small or large and are constantly changing. For these customers, with their requirements for accessibility, compatibility, and scalability, the Nervana NNP looks like one of the best solutions.

In addition to the Nervana NNP chip family, Intel also introduced its third-generation vision processing unit for edge computing, the Movidius Myriad VPU.

Compared with the second generation, the third-generation VPU once again sets a benchmark for edge-computing performance, with more than 10 times the inference performance on certain computer vision tasks and better energy efficiency than competing products. It is aimed mainly at embedded environments on end devices, such as small robots, cameras, and smart home devices.