China Electronics: Huawei Cloud CloudMatrix 384: Hyper-Node Breakthrough and the Independent Rise of Domestic Computing Power
Haitong International · 2025-04-14 14:23

Investment Rating
- The report does not explicitly state an investment rating for the industry or the specific companies involved.

Core Insights
- Huawei Cloud has introduced the CloudMatrix 384 hyper-node cluster, which consists of 384 Ascend chips and is built on a new high-speed interconnect architecture, achieving significant advances in computing power, interconnect bandwidth, and memory bandwidth [1][8]
- The cluster increases resource interconnect bandwidth by more than 10x and leads the industry in computing power density and memory bandwidth, supporting efficient inference of large-scale MoE models [2][9]
- CloudMatrix 384 matches NVIDIA's NVL72 in scale and inference performance; practical tests show a throughput of 1,920 tokens/second, above the industry benchmark of 1,850 tokens/second [3][10]
- The AI infrastructure provided by CloudMatrix 384 is critical to deploying large models, significantly improving MoE inference efficiency and lowering development barriers [4][11]
- Being fully self-developed, CloudMatrix 384 strengthens the domestic AI industry's capabilities, reduces reliance on overseas chips, and supports more than 160 third-party large models [4][12]

Summary by Sections

Event Overview
- The Huawei Cloud Ecosystem Conference 2025 was held on April 10, 2025, where significant advances in AI infrastructure were announced, including the CloudMatrix 384 hyper-node cluster [1][8]

Technological Breakthroughs
- The new CloudMatrix 384 architecture uses a fully peer-to-peer interconnection bus and shared Ethernet technology, yielding a tenfold increase in resource interconnect bandwidth [2][9]
- The architecture enables efficient inference of large-scale MoE models, simplifying development complexity and reducing computing costs [2][9]

Performance Comparison
- CloudMatrix 384 is the world's largest commercial AI computing cluster, with a single-node scale of 384 cards, versus a maximum of 72 cards in NVIDIA's NVL72 system [3][10]
- In specific tests, CloudMatrix 384 achieved 1,920 tokens/second, while NVIDIA's system reached 3,872 tokens/second under different test conditions; the report emphasizes CloudMatrix 384's advantages in large-scale deployments [3][10]

Implications for AI Infrastructure
- The infrastructure provided by CloudMatrix 384 is essential for the commercial viability of AI applications, enabling efficient, low-cost computing for large models [4][11]
- The development of CloudMatrix 384 marks a shift in China's AI industry toward systematic leadership and technological self-sufficiency [4][12]
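The headline comparisons above can be sanity-checked with simple arithmetic. The sketch below uses only the figures quoted in the report; note that the NVIDIA throughput numbers were measured under different test conditions, so the ratios are illustrative, not an apples-to-apples benchmark.

```python
# Arithmetic check of the report's headline figures.
# All constants are taken from the report text above.

CLOUDMATRIX_CARDS = 384       # Ascend chips per CloudMatrix 384 hyper-node
NVL72_CARDS = 72              # maximum GPUs in an NVIDIA NVL72 system

CLOUDMATRIX_TPS = 1920        # tokens/second, CloudMatrix 384 practical test
INDUSTRY_BENCHMARK_TPS = 1850 # tokens/second, industry benchmark cited in the report

# Single-node scale: how many times larger is the CloudMatrix node?
scale_ratio = CLOUDMATRIX_CARDS / NVL72_CARDS

# Throughput margin over the cited industry benchmark, as a fraction.
throughput_gain = (CLOUDMATRIX_TPS - INDUSTRY_BENCHMARK_TPS) / INDUSTRY_BENCHMARK_TPS

print(f"Single-node scale: {scale_ratio:.1f}x the NVL72 card count")  # 5.3x
print(f"Throughput vs industry benchmark: +{throughput_gain:.1%}")    # +3.8%
```

This makes the report's framing concrete: the scale advantage (roughly 5.3x more cards per node) is far larger than the single-test throughput margin (under 4%), which is consistent with the report locating CloudMatrix 384's edge in large-scale deployment rather than raw per-test throughput.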