As things stand today, most AI chip start-ups focus on the benchmark performance of a single chip.

Editor’s note: This article is from InfoQ (WeChat public account ID: infoqchina). Author: Nicole Hemsoth; translator: Nuclear Cola; planning: Liu Yan.

For AI chip start-ups, the obvious core task is to deliver higher peak benchmark performance. Impressive numbers naturally attract more users looking for hardware to run deep learning training or inference.

But as many have learned, what looks like a simple adoption decision often turns into weeks or even months of development work, and the resulting project still fails to run efficiently, let alone support developers’ custom environments on new devices. When we dig into each chip architecture, we often find that the software stack is the part we don’t really understand; all we can do is fume helplessly or grope around in the fog.

This gap does not stem from a lack of tools, time, or resources. It comes from how AI chips are designed (and who decides), how developers evaluate new architectures, and how the broader market of developers can explore what those architectures are practically good for. Deep flaws in each of these links have produced today’s core problem: why do the output of chip companies and the demands of developers never seem to line up?

To be fair, the blame cannot rest entirely with the hardware design side. Codeplay’s Andrew Richards has compiled a long list of reasons why AI accelerators are so hard for developers to reach.

Let’s start with the most important factors, ones widespread across the best-known AI accelerator companies.

Almost every AI chip startup has founders with backgrounds in DSPs and embedded devices. From a workload and market perspective (especially for inference), that expertise genuinely matters; but an AI chip plays the role of an accelerator, an engine shared with a host system, and that demands a different way of thinking from its designers. In a DSP, almost all of the signal processing happens on the device itself, and the entire design and development effort is premised on that. Put bluntly, in that world the CPU does little more than flip the switch on and off, with no other complicated management duties; an AI chip is not so simple.

Richards said, “In terms of expertise, AI hardware vendors would do better to hire people who are good at offload techniques.” He pointed to the HPC community as the main source of such talent: its practitioners have spent years learning how large-scale supercomputers feed GPU resources inside complex parallel applications, under the constraints of a specific software environment. In short, the HPC community understands how developers view offloading and what it means for them.

Moving the right workload onto the right device is a daunting task, one that often takes a year or more of exploration and tuning. Game developers know this too: they were the first to offload the heavy work of early games onto the GPU.

This raises the second problem: a year can pass between a developer’s first experiments and real testing on a production-grade AI architecture. Chip start-ups need a business model robust enough to survive such a long cycle. And even setting aside production availability, experimental availability alone is headache enough: to evaluate a new architecture, developers need at least a devkit and demo hardware. Clearly, in-house designers with only an embedded-development background can hardly meet the device-testability demands facing an AI accelerator startup.

Richards said the GPU got off to a fast start because NVIDIA spared no effort promoting its chips and coaching developers to write code for them. Over time, NVIDIA built a rich ecosystem around GPU-accelerated workloads, making it ever easier for developers to migrate their code. But NVIDIA’s success does not mean the model can simply be replicated. For NVIDIA, every developer buying one or two chips is enough to sustain that side of the business; a start-up’s accelerators are far too expensive for that, so start-ups have no way to follow the same model. It is a chicken-and-egg problem, and a hard one to break out of.

“As an AI software developer, your job is to keep experimenting, and in the course of experimenting everyone ends up buying a few development chips. But AI chip startups can’t go down that path, because selling one or two chips at a time is far too little volume to support a complete distribution chain.”

If developers can’t get their own devkits and test hardware, they have no way to deliver an application that actually runs, so even the most open-minded architects cannot blindly commit to such projects. And without hardware to test on, the full development cycle can stretch to a year or even longer.

Think that’s the end of it? No, the problems don’t stop there.

All of this means it is extremely difficult for developers to use any acceleration hardware other than GPUs and CPUs. Even data centers face the same problem; the only exceptions may be edge computing and autonomous devices.

Richards explains, “This field has been growing rapidly, and developers need to be able to write their own software and run existing frameworks on it. So developers usually start from TensorFlow or PyTorch and modify things from the top down. From a developer’s point of view, if PyTorch or TensorFlow runs out of the box on NVIDIA GPUs or Intel CPUs, what reason is there to try another AI chip maker’s product?”
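To see what “out of the box” means in practice, here is a minimal PyTorch sketch (an illustration under stated assumptions, not anything Richards showed): assuming a CUDA build of PyTorch is installed, the same few lines run unmodified on an NVIDIA GPU or fall back to the CPU. That is the bar any new accelerator’s software stack has to clear before developers will even look at it.

```python
import torch

# Assuming a CUDA build of PyTorch: pick the GPU if present, else the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# The same code runs on either device, with no vendor-specific changes.
model = torch.nn.Linear(1024, 256).to(device)
batch = torch.randn(32, 1024, device=device)
output = model(batch)
print(output.shape, "on", device)
```

A new accelerator typically needs its own framework backend before even this trivial program works, which is exactly the adoption barrier the quote describes.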

“In most cases, the software people at AI companies assume that API functions run on the host CPU and asynchronously dispatch work to the accelerator. The typical hardware vendor’s assumption is completely different: by default, its software API functions run on the device’s own processor. Obviously, the two mental models simply don’t match.” He added that these deep, fundamental differences are precisely the conclusion he has drawn from working with AI chip suppliers and talking with AI software developers.
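The host-dispatch model is easy to observe in PyTorch itself. In the sketch below (assuming an NVIDIA GPU and a CUDA build of PyTorch), the Python call returns almost immediately because it only enqueues the matrix multiply on the device; the host does not wait for the accelerator until it is explicitly told to synchronize.

```python
import time
import torch

# Assumes an NVIDIA GPU is available; this is the model AI frameworks
# expect: host code merely *enqueues* kernels, the device runs them
# asynchronously.
device = torch.device("cuda")
a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)

start = time.perf_counter()
c = a @ b                      # returns quickly: the matmul is queued on the GPU
queued = time.perf_counter() - start

torch.cuda.synchronize()       # only here does the host block for the device
done = time.perf_counter() - start
print(f"enqueue: {queued * 1e3:.2f} ms, compute finished: {done * 1e3:.2f} ms")
```

A vendor whose API assumes the code itself runs on the device, as in the embedded world, breaks this contract and with it most existing framework code.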

“One thing AI chip developers have to accept is that even if they see their product as a processor, they still need to make sure the chip fits properly into a host CPU system. What really wins users over is not the benchmark numbers of a single chip, nor even the particular application scenario the user happens to face; what matters most is treating the processor as an integral part of a larger system.”

With all of this in mind, Richards drew up the following chart, which lays out the minimum requirements for today’s AI software integration scenarios. This short, clear list deserves the attention of anyone still thinking about AI accelerators with the hardware at the center.

[Chart: minimum requirements for AI software integration]

Richards pointed out that for practitioners in the AI accelerator field, each entry in the chart represents a tricky puzzle. On top of the long development cycle, AI vendors must put working chips in developers’ hands and then wait a year to learn whether they have succeeded. “AI processor vendors have to recognize that the first version they ship is really just a devkit, and only the next version is likely to go into actual production. Obviously that is a very difficult business model, even for them.”

Richards also shared a second chart, showing that while the AI chip market keeps growing in size and diversity, it still obeys certain core rules. If AI chip startups can build those rules into their business models, the results can be quite different.

[Chart: the AI chip market and its core rules]

Richards said that across the three major market segments, many developers can buy demo prototypes. What varies is how the hardware is actually used after purchase, and that ultimately decides whether an AI accelerator succeeds. It bears emphasizing that AI developers need to be able to write their own software and combine it with their existing tools and architecture. That combination only gets harder as large data centers push toward both high performance and high programmability: these deep-pocketed customers are willing to spend, but they also demand that new hardware interface smoothly with their custom code, or with the legacy code they have relied on for years.

“Most AI software developers come out of industries like HPC or gaming, so they carry a settled picture of the whole system: latency-sensitive tasks run on the CPU, compute-intensive tasks go to the accelerator, and so on. A whole-system view is the essential premise. Put another way, a vendor that lacks that global view and cannot examine problems from a holistic system perspective will not be able to solve users’ practical problems, let alone succeed in the market.”
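That division of labor can be sketched in a few lines of PyTorch. The handle_request function below is purely illustrative (its name and the toy model are assumptions, not anything from the article): latency-sensitive parsing stays on the host CPU, while the compute-intensive forward pass is handed to the accelerator.

```python
import torch

# A toy inference service illustrating the expected split of work.
accel = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Sequential(
    torch.nn.Linear(512, 512), torch.nn.ReLU(), torch.nn.Linear(512, 10)
).to(accel)

def handle_request(raw: list[float]) -> int:
    # Host CPU: cheap, latency-sensitive parsing and validation.
    x = torch.tensor(raw).clamp(-1.0, 1.0).unsqueeze(0)
    # Accelerator: the compute-intensive forward pass.
    with torch.no_grad():
        logits = model(x.to(accel))
    # Back on the host to form the response.
    return int(logits.argmax())

print(handle_request([0.1] * 512))
```

A vendor whose stack cannot slot into this kind of whole-system flow forces developers to restructure everything around the chip, which is precisely what they will not do.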

Unfortunately, as things stand, most AI chip start-ups stay focused on benchmarking a single chip. They forget that the final destination of every AI device is to become part of a data center or some other larger system. Only when the conversation truly starts from the perspective of code development and production can we hope to raise the quality of the dialogue and solve the problem at its root.

Original link:

https://www.nextplatform.com/2019/10/29/deep-divides-between-ai-chip-startups-developers/

The cover image is from Pexels.