Jiwei Interview with Fabrizio Del Maffeo: How Many Steps Does It Take an AI Chip Supplier to Go from Small and Beautiful to Big and Strong?

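In a previous installment of the Jiwei Interview column, ijiwei (爱集微) had the privilege of interviewing Fabrizio Del Maffeo, CEO and co-founder of Axelera AI. Jiwei Interview asked a series of questions about RISC-V open-source technology, AI chip development, in-memory computing, startup business models, data access rights, and more, and received highly insightful answers.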

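Q: My first question is: compared with the von Neumann architecture, what are the characteristics of in-memory computing? What advantages does it have over traditional architectures?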






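A: Yes, you use in-memory computing. You use it to do only one thing: multiplication between vectors and matrices. And if you look closely at neural networks, whether recurrent neural networks, convolutional neural networks, LSTM networks, or Transformer networks, 70% to 90% of the computation is just vector-matrix multiplication.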



Q: Do you think in-memory computing is a solution for breaking through the memory wall?


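That way, you can use thousands of small CPUs and thousands of small memories instead of a huge CPU and a huge memory. I think this is the best solution to the memory wall, but it is not really in-memory computing; it is near-memory computing. The difference is that in near-memory computing you still have a memory array and a computing element, whereas in in-memory computing you break up the memory array and put computing elements inside the array. In-memory computing can only be used for multiplication and accumulation; it has no other use.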




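For example, take the solution we designed: it delivers more than 200 TOPS, and we have priced it at $149 on a card, because we want people to use it. We want to give people access to this powerful solution. There is always time to make money. But first, people should be able to create great things with our technology. If they succeed, we succeed. So we think it is important to have something that is easy to use, high performance, and low cost, that you can buy online and get anywhere in the world. You can build great products anywhere. We want to unleash innovation.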

Q: For AI accelerator chips, what are the advantages of adopting RISC-V?


Q: Do you think AI applications will become an important driving force for the RISC-V ecosystem?

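A: I think it is easier to use RISC-V in an application-specific chip than in a general-purpose one, because in an application-specific chip you can use RISC-V and optimize it for what you want to do.

Then you have to verify it against your requirements. But if you want to use RISC-V as a general-purpose processor and compete with the most advanced CPUs from Intel or AMD, that is a different story. It is much harder with RISC-V and takes more resources and more time, because it is a new architecture and has not yet been widely recognized by everyone. In a sense, once a chip reaches that level of complexity, you need the support of a whole ecosystem: you need drivers, and you need support from the community, from Microsoft, Ubuntu, and Linux. Overall, it becomes more difficult. So I think RISC-V will grow now, thanks to AI. But becoming a true general-purpose alternative to today's products will take another 5 to 10 years. We need to give RISC-V some time before we see it used in something like a mobile phone.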








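So you need to give them the whole software stack and all the tools, so that they can use your solution efficiently in an easy, simple way. That is why ease of use matters. Today at the edge, for example, Nvidia's AI hardware is powerful in terms of performance and platform, but it is too expensive, which limits its adoption. You can't put a $1,000 chip into a robot you plan to sell for $500, right? That is simply unrealistic.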


Q: Can you tell us more about maintainability? And how important is it to keep edge AI chips and edge AI solutions easy to use?


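The question then is how to use these capabilities at the edge. At Axelera AI, we have to give customers a simple software stack that lets what they did in the cloud also run at the edge. You have to assume the customer does not know what quantization is. Cloud customers may know, but at the edge 90% to 95% of customers neither know nor care about the difference between single-precision floating point and integers. So that gap is ours to fill. They should be able to do whatever they want in the cloud, and we have to give them the tools to use the same application or the same network at the edge. An edge provider therefore needs to build a more flexible stack that lets customers keep using what they use today, but deployed at the edge. We should be responsible for deployment rather than reinventing the wheel, because customers do not want to learn something new.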


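Q: Do the varied operating conditions that edge AI faces mean that the chips it uses have a different focus from those used in data centers? Might edge AI lean more toward purpose-built chips?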




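Q: Does the fragmented product demand of edge AI mean that it is harder for large companies to monopolize, and that small companies will have more opportunities?

A: Yes, absolutely. Traditionally, that has been the case. If we think about cloud computing, for the past 20 to 30 years there have always been Intel and AMD, joined more recently by Nvidia, and in practice two or three companies hold 98% of the cloud market, while startups and other companies have only a very small share.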

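But when we turn to edge computing, there have historically been many players, such as Intel, AMD, Nvidia, Qualcomm, NXP, Texas Instruments, Renesas, STMicroelectronics, Infineon, MediaTek, Cirrus Logic, Umbrella Silicone, and others. The edge market is more specialized, with a great many different application areas. This makes the market fragmented, and large companies do not like that. Customers often need specific application processors to meet their needs on edge devices. That is why the edge market has room for more players. I expect consolidation at the edge as well, but not like in the cloud. I expect to see far more companies at the edge: the semiconductor companies there are smaller but more numerous, while the cloud has relatively few players.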

Q: What is your view of CUDA?




Q: Do you think edge AI and centralized AI in data centers will converge and work together in the future? Where are we now?








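Q: Axelera AI is headquartered in Eindhoven, a small city with strong high-tech capabilities. What alchemy has Eindhoven used to build such a great semiconductor industry cluster?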



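So the Eindhoven area is definitely the ideal place. At Axelera AI, as you know, we have professionals from Intel, who come from Eindhoven, as well as people from ETH Zurich. We have a large office in Switzerland, and we also have people from the IBM Zurich lab. In addition, we have an in-memory computing team made up of talent from five manufacturers. Axelera AI has employees across Europe. We currently have 140 employees, more than 50 of whom hold a PhD, but they are spread across different parts of Europe. We are committed to hiring the best talent we can find in Europe.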







Q: What are the characteristics of in-memory computing compared to the von Neumann architecture? And what advantages does it have over traditional architectures?

A: Yes, thanks for asking. In-memory computing has the advantage that you can parallelize computations in a unique way, because essentially you are transforming a memory array, which is typically large, it can be 260,000 elements or a million elements, and using it as a computational engine.

Then the advantage is that you have high parallelization, which means high throughput; low data movement, because you do the calculation in the memory, which means low power consumption; and low cost, because you merge the memory area with the computing element.

The area is then smaller, which means low cost for the chip. There are two kinds of in-memory computing. One is analog in-memory computing. The other is digital in-memory computing. In analog in-memory computing, you use the relationship between the current and the voltage in the transistors of the memory cell to do the computations, to do the vector-matrix multiplication that you have in all neural networks.

And this is one way, right? But when you work in the analog domain, it means that your data comes in as digital data, you convert it to analog, you do the computation, and then you convert it back to digital. The problem with analog in-memory computing is that there is noise in the analog domain, and the noise changes the result of the calculations. So analog in-memory computing chips typically don't have high accuracy and high precision. You have to fine-tune the network and fine-tune the silicon to get back a decent accuracy. At Axelera, we have this technology, but we don't use it. We use digital in-memory computing, which is different, because we don't convert. We don't do the calculation in analog. We just take the SRAM cell, and close to each cell we embed a computing element to do the multiplication. And then we have an adder tree that does the accumulation. This allows us to do the calculations in the digital domain and to put the memory and the computation together in a small area. It also allows us to parallelize the computation.

And then we have very high throughput; low cost, because the chip is small; low data movement, which means low power consumption; and high precision, because we stay in digital.
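
The digital scheme described in this answer (a multiplier next to each memory cell, with an adder tree doing the accumulation) can be illustrated with a small software sketch. This is only a toy model under stated assumptions, not Axelera's actual design: the function names `adder_tree` and `imc_matvec`, the one-tree-per-column layout, and the integer weights are all hypothetical, and the loops here run sequentially, whereas in hardware every cell would multiply in parallel.

```python
from typing import List


def adder_tree(values: List[int]) -> int:
    """Sum values pairwise, level by level, the way a hardware adder tree would."""
    if not values:
        return 0
    level = list(values)
    while len(level) > 1:
        # Pad odd-length levels with a zero so every adder has two inputs.
        if len(level) % 2:
            level.append(0)
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0]


def imc_matvec(weights: List[List[int]], inputs: List[int]) -> List[int]:
    """Vector-matrix product where weights[r][c] is the value stored in cell (r, c).

    Each cell multiplies its stored weight by the input applied to its row;
    one adder tree per column (an assumed layout) accumulates the products.
    """
    n_rows, n_cols = len(weights), len(weights[0])
    assert len(inputs) == n_rows, "one input element per memory row"
    outputs = []
    for c in range(n_cols):
        # Per-cell multiplications; parallel in hardware, a loop in this model.
        products = [weights[r][c] * inputs[r] for r in range(n_rows)]
        outputs.append(adder_tree(products))
    return outputs


weights = [[1, 2], [3, 4], [5, 6]]  # 3x2 array of stored weights
inputs = [1, 0, -1]
result = imc_matvec(weights, inputs)
print(result)
```

The point of the sketch is that the weights never leave the array: only the input vector moves in and the accumulated sums move out, which is where the low data movement comes from.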

Q: Is in-memory computing more suitable for special-purpose computing for specific algorithms rather than general-purpose computing?

A: Definitely. You use in-memory computing to do only one thing: multiplication between a vector and a matrix. And if you look inside neural networks, recurrent neural networks, convolutional neural networks, LSTM networks, transformer networks, 70% to 90% of the calculations are just vector-matrix multiplication.

And you use in-memory computing to do only this. In-memory computing can do only this. You cannot do activation functions; you don't do those with in-memory computing. You just do the multiplications and the accumulations, the sums when you have to add up the numbers. That's it. But these calculations represent 70% to 90% of what you have in any neural network. And this is the reason why it's important to use it in AI, in machine learning, in deep learning.

But you don't use in-memory computing in any other domain, unless you have to do vector-matrix multiplication.

Q: Do you think in-memory computing is a solution to break through the memory wall?

A: In-memory computing is the solution for vector-matrix multiplication, not more than this.

To break the memory wall, there are other approaches, such as near-memory computing, which is slightly different: you have a more generic computing element, very small, and you put the memory close by.

Then instead of having a large CPU and a large memory, you have thousands of small CPUs with thousands of small memories close by. I think this is the best solution to the memory wall, but it's not really in-memory computing; it is near-memory computing. The difference is that in near-memory computing, you still have an array of memory and a computing element, while in in-memory computing, you break down the array of memory and put the computing elements inside the array. You can do that only if you do multiplication and accumulation. Otherwise, it's useless.

Q: Can RISC-V be part of the vision of the “democratization of artificial intelligence”?

A: Yes, it is. RISC-V is one element. In general, at Axelera, we try to keep our software stack as open as we can. We are using open-source code. We are using TVM in the back end of the compiler, and we are using Zephyr in the firmware, which is open source and supported by Intel. We are also trying to use oneAPI, and we are trying to use open source as much as possible and also to give back to the community.

At Axelera, I and many of the others are very active in the RISC-V community. And we want to give back to the community. We want to develop things, create our own architecture and our own product, but still based on open source. But when I say that we want to democratize AI, it also means that we want to have a product which is powerful, usable, and low cost.

For example, take the solution that we designed, which is a chip of more than 200 TOPS: we are already positioning it in a card at $149, because we want people to use it. We want to give people access to a powerful solution. There is always time to make money. But the first thing is to have people create great things using our technology. If they succeed, we succeed. So we think it's important to have something that's easy to use, high performance, and low cost, that you can buy online and get anywhere in the world. You can just build great products around it. We want to unleash innovation.

Q: As for AI accelerators, what are the advantages of using RISC-V?

A: Well, the advantage is that we can control it. Because it's open source, we can design it, we can control it. We don't have to go back to anyone and ask permission or ask for the source code of the compiler. If you use whatever IP from Cadence, from Synopsys, it doesn't matter, you cannot access everything, and you start to rely on them. And this can be a problem in the long run. With RISC-V, you can completely control your architecture. And it's a platform which is tested by a large community, which is good. And you can extend and develop it. For example, we are developing a vector instruction, a specific vector instruction set unit, which will be integrated in the next generation. And we can do it by ourselves because we have the knowledge. And it's an open-source platform, so we don't have to negotiate with a supplier to solve the problem.

Q: Do you think AI applications will become an important driving force for the RISC-V ecosystem?

A: I think it's easier to use RISC-V in an application-specific chip than in a general-purpose one, because in an application-specific chip you can use RISC-V and optimize it for what you want to do.

And then you have to verify it only for what you want to do. But if you want to use RISC-V as a general-purpose processor, and you want to use it to compete with a cutting-edge Intel CPU or a cutting-edge AMD CPU, then it's a different story. It's way more difficult, and it requires way more resources and way more time, because it's a new architecture, and it is not so highly verified by everybody. In the sense that when you go to complex things, you need an ecosystem around it: you need the drivers, you need support from the community, from Microsoft, from Ubuntu, from Linux. In general, then, it becomes more difficult. So I think that RISC-V will grow now, thanks to AI. But it will still take 5 to 10 years to become a real general-purpose alternative to what you have today. It will take time to have RISC-V running a mobile phone.

Q: In terms of computing efficiency, data centers have better infrastructure and more constant computing power. So why do we need edge AI?

A: You don't need edge AI for efficiency; what you said is correct. The data center is way better because you concentrate everything, especially for utilization, more than for the efficiency itself. Utilization is way higher in the data center, right? But you need edge AI because of privacy, security of data, safety, and economics. Think about your car: you cannot have a car that is asking the cloud, should I turn right or left? Your car needs to have the computing power to react on time, almost without latency, to whatever is happening, without checking with the cloud. Also because in some areas you don't even have coverage.

Second, it doesn't even make economic sense to send everything to the cloud. Think about surveillance, a sophisticated surveillance system with plenty of high-resolution cameras. It's extremely expensive to take all this data and send it to the cloud, because 95% or 98% of it is useless. You only want to identify things like, I don't know, a bag that someone dropped in a railway station, or a specific person who is running and whom the police are looking for. For these things, why should you send all the data to the cloud? You can extract the right information at the edge, and it's even cheaper to do it that way. And still, in many areas there is not even coverage; you don't even have a good connection. So there is still an infrastructure problem, and you can't solve everything by sending data to the cloud. That is where edge computing makes sense. It's necessary for many different applications: drones, robotics, cars, automotive, and even surveillance, actually.

Q: What problems do we need to overcome for edge AI solutions? You have already mentioned power consumption, since the platform may not have much power available. And there are the operating conditions, the lighting, the latency, the cost, and the maintainability.

A: Yeah, I think the obstacle, for me, is different. In the cloud you have few players. In China, for example, you have two, three, four cloud providers, and the same in the United States and in Europe. The Chinese and the Americans are leading the cloud in terms of providers. There are a few companies building up large data centers and providing services.

Therefore, it's easy to design a technology and provide it to them, because you have one big customer with one list of features and requirements that they need, and so on. But when it comes to the edge, you have a thousand or several thousand customers, each of them asking for different things. And many of these customers don't have the background to understand your technology and adapt it the way they need.

Then the problem of the edge is different. If you want the edge to succeed, you clearly need cost-effective hardware, a cost-effective solution, because the edge customer is more price-sensitive than a cloud customer. You also need power efficiency, because you have constraints. In the data center there is no such constraint; you have a power plant close by. But at the edge you have constraints, so you have to be efficient. And you also need usability. You need something that is plug and play. 90% of the customers at the edge cannot have the engineers that Baidu can have. It's different, right? Because they are small and medium-sized companies.

And then you need to give them the whole software stack, all the instruments to use your solution very efficiently, but in an easy, simple way. Today, for example, at the edge you have Nvidia, which has great hardware in terms of performance and platform, but it's too expensive to scale. You can't put $1,000 hardware in a small robot that you want to sell at $500, right? You simply can't.

So I think there are solutions that are good but expensive, and there are solutions that are cheap but difficult to use. It's important to find a good compromise.

Q: So can you tell us more about maintainability? And how important is it to make edge AI chips and edge AI solutions easy to use?

A: I can tell you, first of all, that customers use the cloud to do everything, even to train the algorithm. If you are a small or medium enterprise and you want to do something in AI, you have to connect to Amazon or Baidu or whatever. It doesn't matter which player; you have to go back to the cloud system and use the typical tools that you have in the cloud. What do you get out of it? A network, a trained network, and an application.

Then the problem is how to use this at the edge. We, as Axelera, have to give customers a simple software stack which allows them to take what they did in the cloud and run it at the edge. You have to assume that the customer doesn't know what quantization is. In the cloud they may know, but at the edge, 90% to 95% of the customers don't know the difference between floating point 32 and INT8. They don't care. We have to solve that problem. They should do whatever they want in the cloud, and then we have to give them the tools to use the same application or the same network at the edge. So an edge provider needs to build up a software stack which allows customers to use what they are using today, but deployed at the edge. A company like ours should be responsible for the deployment, not the development, because customers don't want to learn new things.
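
To make the "floating point 32 versus INT8" gap concrete, here is a minimal sketch of symmetric per-tensor quantization, one common way to map floating-point weights to 8-bit integers. It is an illustration only, not the method of any particular toolchain: the function names `quantize_int8` and `dequantize` are hypothetical, and Python floats (which are 64-bit) stand in for FP32 values.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization of floats to the range [-127, 127]."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Map the integers back to approximate floating-point values."""
    return [q * scale for q in quantized]


weights_fp32 = [0.4, -1.0, 0.25, 0.0]  # Python floats standing in for FP32
weights_int8, scale = quantize_int8(weights_fp32)
restored = dequantize(weights_int8, scale)
print(weights_int8, restored)
```

Each restored value differs from the original by at most about one quantization step (`scale`). Keeping that rounding error from hurting network accuracy is the kind of work the speaker argues the edge provider's software stack should take off the customer's hands.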

If you go to a customer and say, listen, I have great hardware, but you have to learn my software, they will say, no, I don't have time, I don't want to, why should I do it? You have to go to them and say, listen, I have great hardware and a great software stack. What you have to do is just take what you have, push a button, and it runs. Or take what you have, do this A-B-C, and it runs. It should be very simple. And this is the key aspect that I think a lot of companies don't think about. They think that it's important to be efficient. Yes, efficiency is important, but it's not only that. You need a mix of things: efficiency, throughput, cost, and the software stack, also because the customer cares about the total cost of ownership. If you go to a customer and say, listen, with my chip you save, I don't know, 300,000 euros per year, but to change the software the customer needs to spend 1 million, then they simply will not do it. So you have to think about the implications at the level of the big picture.
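
The total-cost-of-ownership point can be worked through with the speaker's own figures (300,000 euros saved per year against a 1 million euro software migration). The sketch below is a simplified model that assumes constant yearly savings and ignores discounting; the function names are hypothetical.

```python
def payback_years(annual_savings: float, migration_cost: float) -> float:
    """Years of savings needed to recover a one-off software migration cost."""
    if annual_savings <= 0:
        raise ValueError("annual savings must be positive")
    return migration_cost / annual_savings


def net_benefit(annual_savings: float, migration_cost: float, years: float) -> float:
    """Cumulative savings minus the migration cost after a number of years."""
    return annual_savings * years - migration_cost


years_to_break_even = payback_years(300_000, 1_000_000)
after_three_years = net_benefit(300_000, 1_000_000, 3)
print(round(years_to_break_even, 2), after_three_years)
```

With these numbers the customer is still 100,000 euros behind after three years and only breaks even after a little more than three years, which is why a cheaper chip alone does not close the sale.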

Q: So are data centers more inclined to use general-purpose AI computing power, while edge AI chips go the other way and are designed for specific use cases, with customization and so on?

A: If you go to the consumer edge, it's super customized. Your television is edge. You have super resolution, and you have a lot of features that are AI-generated, and so it's super customized. The SoC has to do only a few things, in a very specific way. It has to have low power consumption, because the television must have low power consumption. It cannot have a fan or a computer running inside. So it's highly customized. In the phone it's the same. The phone is super customized and battery powered, so you probably don't run floating-point networks; you run binary networks, and that's good enough, because the customers are not really sensitive to it. When you go to automation, you have to find a good compromise, because you still have power limitations sometimes, but you can't just compromise and say, I use a binary network, because in automation your network should probably have high accuracy.

And then you have to find a good compromise between efficiency, throughput, and accuracy, accepting some limitations of the edge but still looking for the precision that you have in cloud computing. So it's still customized, but it's different; it's a more programmable solution.

Then when you go to the cloud, as you said, in the cloud you have everything. But in the cloud, if you look, there is more and more specialization. In the data center you start to have more and more specialized machines for specialized workloads, because even there efficiency is needed; not like at the edge, but it's still necessary. At the edge, you try to get 15 TOPS per watt, 20, 30, whatever. In the cloud today, the workloads are running at 0.1 TOPS per watt or even less, because if you take a general-purpose computing platform, its efficiency is very low. And so even in the data center you see the trend toward tensor processing units, GPUs, CPUs, etc., and based on the workload, they start to allocate it to the different hardware. So I see this trend in the data center too.
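
The efficiency figures quoted in this answer can be turned into energy per operation. Since 1 TOPS per watt is 10^12 operations per second per watt, it is also 10^12 operations per joule. The short sketch below applies that conversion to the two numbers mentioned (15 and 0.1 TOPS per watt); the function name is hypothetical, and the figures are the speaker's rough estimates rather than measurements.

```python
def joules_per_operation(tops_per_watt: float) -> float:
    """Energy per operation, given efficiency in TOPS per watt."""
    # 1 TOPS/W = 1e12 operations per second per watt = 1e12 operations per joule.
    return 1.0 / (tops_per_watt * 1e12)


edge_energy = joules_per_operation(15.0)   # edge accelerator target quoted above
cloud_energy = joules_per_operation(0.1)   # general-purpose cloud figure quoted above
ratio = cloud_energy / edge_energy

print(f"edge:  {edge_energy * 1e12:.3f} pJ per operation")
print(f"cloud: {cloud_energy * 1e12:.3f} pJ per operation")
print(f"ratio: {ratio:.0f}x")
```

On these figures a general-purpose platform spends roughly 150 times more energy per operation than the edge target, which is the gap driving specialization even inside the data center.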

Q: Does the fragmented product demand of edge AI indicate that it's less likely to be monopolized by large companies such as Nvidia and AMD, and that small companies might have more opportunities in this area?

A: Yes, absolutely. Traditionally, it's like this. If you think about cloud computing, in the last 20 to 30 years you always had Intel, AMD, and more recently Nvidia, and you had, and actually still have, two or three players dominating 98% of the cloud, with a very small portion left for new startups or other players.

But if you go to the edge, historically you have plenty of players, because you still have Intel, AMD, Nvidia, Qualcomm, NXP, Texas Instruments, Renesas, STMicroelectronics, Infineon, MediaTek, Cirrus Logic, Umbrella Silicone. I can go on, right? You have a lot of players, because as you said, the edge is more specialized, and you have plenty of applications. It's very fragmented, and the big players, they don

