DeepSeek Makes a Splash for the New Year! Paper Co-Signed by Liang Wenfeng Released
Di Yi Cai Jing· 2026-01-01 13:44
Core Viewpoint
- DeepSeek has introduced a new network architecture called mHC (Manifold-Constrained Hyper-Connections), aimed at fixing instability in large-scale model training and potentially guiding the evolution of next-generation infrastructure [1][3][4].
Group 1: Technical Innovations
- The mHC architecture improves on traditional hyper-connection frameworks by balancing performance and efficiency. It is likened to adding "traffic rules" to information channels so that information flows stably during model training [4].
- The research indicates that mHC improves the stability and scalability of large models, making them easier to deploy in complex scenarios such as multimodal models and industrial decision-making systems [5].
Group 2: Industry Implications
- mHC may reduce the hardware investment and training time needed to develop larger foundation models, lowering the barrier for small and medium-sized AI companies to build more complex models [5].
- The innovation is seen as a fundamental advance on core issues in the Transformer architecture, and significant updates are expected in DeepSeek's upcoming V4 version [5].
Group 3: Recent Developments
- Although DeepSeek did not launch a major version such as R2 or V4 in 2025, it continued to innovate, releasing DeepSeek-V3.2 and DeepSeek-Math-V2; the latter is described as the first math model to reach international Olympiad gold-medal standard [6].
AI Evolution Express | DeepSeek Proposes New mHC Architecture
Di Yi Cai Jing· 2026-01-01 13:05
Core Insights
- DeepSeek has released a new paper proposing the mHC (Manifold-Constrained Hyper-Connections) architecture [1].
Group 1
- Zhiyuan has launched GenieReasoner, an integrated embodied "large brain" system [1].
- Moonshot AI (月之暗面) introduced a new multimodal model earlier this year [1].
- DeepSeek's new paper focuses on the mHC architecture, which aims to improve on existing hyper-connections [1].
DeepSeek Opens the Year With a New Paper: Proposes the All-New mHC Architecture, With Liang Wenfeng on the Author List
Xin Lang Cai Jing· 2026-01-01 12:24
Core Insights
- DeepSeek has introduced a new architecture called mHC (Manifold-Constrained Hyper-Connections), aimed at the instability of traditional hyper-connections during large-scale model training while keeping their significant performance gains [1][6].
Group 1: Research and Development
- The paper presents mHC as a universal framework that projects the residual connection space of hyper-connections onto a specific manifold to restore the identity mapping property [6].
- The authors of the paper include Zhenda Xie, Yixuan Wei, Huanqi Cao, and Liang Wenfeng, the founder and CEO of DeepSeek [1].
Group 2: Performance and Scalability
- Empirical experiments indicate that mHC is effective for large-scale training, providing tangible performance improvements and excellent scalability [6].
- The proposed architecture is expected to contribute to a deeper understanding of topological architecture design and to offer promising directions for the evolution of foundation models [6].
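The identity mapping property mentioned in this summary can be written out explicitly. The following is a sketch in assumed notation: the symbols x_l, F, W_l, X_l and H^res_l are introduced here for illustration and are not taken from the summary, and the hyper-connection line leaves out the mappings that read from and write to the streams.

```latex
% Standard residual connection: unrolling from layer l to layer L
% leaves an unmodified identity path for x_l.
x_{l+1} = x_l + F(x_l; W_l)
\quad\Longrightarrow\quad
x_L = x_l + \sum_{i=l}^{L-1} F(x_i; W_i)

% Hyper-connections: the n parallel streams X_l are mixed by a learnable
% n x n matrix at every layer, so the path from l to L carries a matrix
% product instead of the identity.
X_{l+1} = H^{\mathrm{res}}_l X_l + (\text{layer contribution})
\quad\Longrightarrow\quad
X_L = H^{\mathrm{res}}_{L-1} \cdots H^{\mathrm{res}}_{l}\, X_l + \dots

% One constraint that keeps this product well behaved: require every
% H to be doubly stochastic (non-negative, rows and columns sum to 1).
H \ge 0,\qquad H\mathbf{1} = \mathbf{1},\qquad \mathbf{1}^{\top} H = \mathbf{1}^{\top}
\quad\Longrightarrow\quad
(H_a H_b)\mathbf{1} = \mathbf{1},\qquad \mathbf{1}^{\top}(H_a H_b) = \mathbf{1}^{\top}
```

The last line is why the constraint composes across depth: a product of doubly stochastic matrices is again doubly stochastic, so no layer-to-layer path can scale the stream average up or down.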
DeepSeek Overhauls Kaiming He's Residual Connection! Liang Wenfeng Personally Signs On, the First Major Upgrade in a Decade
Xin Lang Cai Jing· 2026-01-01 11:45
Core Insights
- DeepSeek has introduced an upgraded version of the residual connection, a fundamental component of deep learning proposed by Kaiming He in 2016, marking a significant evolution in the field [1][27].
Group 1: Residual Connections and Hyper-Connections
- Residual connections have remained essentially unchanged for a decade, serving as the cornerstone of deep learning architectures by letting signals pass directly from shallow to deep layers without modification [5][31].
- Hyper-Connections (HC) widen the residual stream from C dimensions to n×C dimensions and introduce three learnable mapping matrices to manage information flow [7][32].
- Experiments by the DeepSeek team indicate that the Hres matrix, responsible for information exchange within the residual stream, contributes significantly to the performance improvements [7][32].
Group 2: Challenges with Hyper-Connections
- When HC is stacked across multiple layers, the composite mapping no longer retains the identity property, leading to sudden loss spikes and gradient fluctuations during training [9][34].
- The research team calculated that the amplification factor of the composite mapping in HC peaked at about 3000, meaning signals can be drastically amplified or attenuated as they propagate between layers [10][35].
Group 3: Doubly Stochastic Matrix Constraints
- The core idea of the DeepSeek paper is to constrain the residual mapping matrix to a specific manifold formed by doubly stochastic matrices (non-negative matrices whose rows and columns each sum to 1), known as the Birkhoff polytope [11][36].
- This constraint provides three key theoretical properties: norm preservation, closure under composition, and a geometric interpretation that makes feature fusion more stable [14][39][40].
- The Sinkhorn-Knopp algorithm is used to project any matrix onto this manifold, reducing signal gain from 3000 in HC to approximately 1.6 in mHC [16][41].
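To make the constraint concrete, here is a minimal pure-Python sketch of a Sinkhorn-Knopp style normalization, which pushes a matrix towards the set of non-negative matrices whose rows and columns each sum to 1 (the Birkhoff polytope). It is a toy illustration under stated assumptions, not the paper's implementation: the function names, the use of `exp` to make entries positive, the iteration count, and the "largest absolute row sum" gain proxy are all choices made here, and the numbers it prints are unrelated to the 3000 and 1.6 figures reported above.

```python
import math
import random


def sinkhorn_knopp(logits, iters=100):
    """Map a square matrix to an approximately doubly stochastic one.

    Entries are exponentiated to make them positive (an assumption of this
    sketch), then rows and columns are normalized alternately.
    """
    m = [[math.exp(v) for v in row] for row in logits]
    n = len(m)
    for _ in range(iters):
        # Row step: scale every row so that it sums to 1.
        rows = []
        for row in m:
            s = sum(row)
            rows.append([v / s for v in row])
        # Column step: scale every column so that it sums to 1.
        col_sums = [sum(rows[i][j] for i in range(n)) for j in range(n)]
        m = [[rows[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return m


def matmul(a, b):
    """Plain square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def composite_gain(matrices):
    """Largest absolute row sum of the product of all per-layer matrices."""
    n = len(matrices[0])
    prod = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for m in matrices:
        prod = matmul(m, prod)
    return max(sum(abs(v) for v in row) for row in prod)


random.seed(0)
n, depth = 4, 30  # 4 streams, 30 stacked layers (hypothetical sizes)
logits = [[[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
          for _ in range(depth)]

# Same positive matrices, with and without the normalization.
unnormalized = [[[math.exp(v) for v in row] for row in layer] for layer in logits]
constrained = [sinkhorn_knopp(layer) for layer in logits]

unconstrained_gain = composite_gain(unnormalized)
constrained_gain = composite_gain(constrained)
print("gain without constraint:", unconstrained_gain)
print("gain with constraint:   ", constrained_gain)
```

In this toy setting the constrained product stays doubly stochastic, so its gain remains close to 1 regardless of depth, while the unnormalized product grows by many orders of magnitude. Unconstrained matrices can equally well attenuate a signal; the sketch only shows the amplifying case.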
Group 4: Engineering Optimizations
- Widening the residual stream incurs additional memory access costs. A detailed analysis shows that a standard residual connection reads 2C elements and writes C elements, while HC requires significantly more [19][44].
- The DeepSeek team developed infrastructure optimizations, including kernel fusion and specialized kernels for the Sinkhorn-Knopp algorithm, to reduce memory access and improve computational efficiency [19][43].
- The paper presents an optimization formula for recomputation strategies, aligning recomputation boundaries with pipeline stage boundaries for better performance [20][45].
Group 5: Experimental Validation
- The paper validates the proposed methods on MoE models of 3B, 9B, and 27B parameters with the expansion rate n set to 4, showing stable training curves and a loss reduction of 0.021 compared to the baseline [22][47].
- In downstream task evaluations, mHC outperformed HC by 2.1% on the BBH reasoning task and by 2.3% on the DROP reading comprehension task, with superior performance across most tasks [22][48].
- Internal large-scale training experiments confirmed these findings, with mHC introducing only a 6.7% additional time overhead when n = 4 [25][50].
DeepSeek: Latest Release!
Zheng Quan Shi Bao· 2026-01-01 10:56
Group 1
- DeepSeek has introduced a new architecture called mHC (Manifold-Constrained Hyper-Connections) to address the instability of traditional hyper-connections during large-scale model training while keeping their significant performance gains [1][3].
- The research notes that hyper-connections improved performance by diversifying connection patterns, but also weakened the identity mapping property inherent in residual connections, leading to training instability and limited scalability [3].
- Empirical results indicate that mHC effectively supports large-scale training with only a 6.7% additional time overhead when the expansion rate is set to 4 [3][5].
Group 2
- DeepSeek recently launched two official model versions, DeepSeek-V3.2 and DeepSeek-V3.2-Speciale. V3.2 achieves performance comparable to GPT-5 on reasoning benchmarks and is suited to everyday tasks [6][7].
- The V3.2-Speciale model strengthens long-form reasoning and incorporates theorem-proving abilities, performing similarly to Gemini-3.0-Pro on mainstream reasoning benchmarks [7].
- DeepSeek has also reduced API costs by over 50%, making its models more accessible to developers [7].
Group 3
- DeepSeek's research paper on the R1 reasoning model was featured on the cover of the journal Nature, a significant achievement for Chinese AI technology in the international scientific community [8].
- The publication is notable as the first mainstream large language model research to undergo complete peer review and be published in a leading journal, filling a gap in the field [8].
Just In: Co-Signed by Liang Wenfeng, DeepSeek's New Year's Day Paper Aims to Open a New Chapter in Architecture
Xin Lang Cai Jing· 2026-01-01 10:34
Core Insights
- DeepSeek has introduced a new architecture called Manifold-Constrained Hyper-Connections (mHC), aimed at the instability of traditional hyper-connections during large-scale model training while keeping their significant performance gains [1][27][28].
Group 1: Architecture and Methodology
- The mHC architecture expands the traditional single residual stream of Transformers into a multi-stream parallel structure, using the Sinkhorn-Knopp algorithm to constrain the connection matrix to the manifold of doubly stochastic matrices [1][28].
- The core objective of mHC is to retain the performance improvements from widening the residual stream while resolving training instability and excessive memory consumption [4][34].
- The research team has implemented infrastructure optimizations such as kernel fusion, selective recomputation, and an extended DualPipe communication strategy to offset the overhead caused by the wider channels [31][34].
Group 2: Performance and Stability
- Empirical evidence shows that mHC not only resolves the stability issues but also scales well in large-scale training. With a 27-billion-parameter model, it increased training time by only 6.7% while achieving significant performance improvements [34][49].
- Training stability was evaluated against a baseline model: mHC reduced final loss by 0.021 and maintained a stable gradient norm profile, indicating better stability than traditional hyper-connections [49][50].
Group 3: Benchmarking and Results
- In various downstream benchmark tests, mHC consistently outperformed the baseline model and surpassed traditional hyper-connections on most tasks, with gains of 2.1% and 2.3% on specific tasks [51][52].
- The scalability experiments indicated that mHC keeps its performance advantage under higher computational budgets, demonstrating robust effectiveness in large-scale scenarios [52][53].
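The multi-stream structure described in this summary can be sketched as a single layer update. This is a hypothetical illustration, not DeepSeek's code: the function name and the `h_pre`, `h_res`, `h_post` parameters are labels assumed here for the mappings that read from the streams, exchange information between them, and write the layer output back.

```python
def hyper_connection_step(streams, h_pre, h_res, h_post, layer_fn):
    """One residual update over n parallel streams, each of width C.

    streams : list of n vectors (lists of floats)
    h_pre   : n weights that mix the streams into one layer input
    h_res   : n x n matrix that exchanges information between streams
    h_post  : n weights that write the layer output back to each stream
    layer_fn: the layer itself (attention or MLP in a real model)
    """
    n = len(streams)
    width = len(streams[0])
    # Read: collapse the n streams into a single input of width C.
    layer_in = [sum(h_pre[j] * streams[j][c] for j in range(n))
                for c in range(width)]
    layer_out = layer_fn(layer_in)
    new_streams = []
    for i in range(n):
        # Exchange: stream i becomes a weighted mix of all streams.
        mixed = [sum(h_res[i][j] * streams[j][c] for j in range(n))
                 for c in range(width)]
        # Write: add the layer output, scaled for this stream.
        new_streams.append([mixed[c] + h_post[i] * layer_out[c]
                            for c in range(width)])
    return new_streams


def double(v):
    """Stand-in for a real layer."""
    return [2.0 * t for t in v]


streams = [[1.0, 2.0], [3.0, 4.0]]

# With an identity h_res and one-hot h_pre / h_post, stream 0 follows the
# ordinary residual rule x + F(x) and stream 1 passes through unchanged.
plain = hyper_connection_step(streams, [1.0, 0.0],
                              [[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0], double)

# With a doubly stochastic h_res and no write-back, the streams are blended
# but their sum is preserved, because every column of h_res sums to 1.
mixed_only = hyper_connection_step(streams, [0.5, 0.5],
                                   [[0.75, 0.25], [0.25, 0.75]],
                                   [0.0, 0.0], double)
print(plain)
print(mixed_only)
```

The second call shows why the doubly stochastic constraint is a natural fit: mixing redistributes content between streams without creating or destroying it, so only the layer output changes the total.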
DeepSeek Releases New Paper on New Year's Day, Opening a New Chapter in Architecture
Xin Lang Cai Jing· 2026-01-01 09:28
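Gelonghui, January 1 | DeepSeek released a new paper on New Year's Day proposing a new architecture called mHC (Manifold-Constrained Hyper-Connections). The research aims to resolve the instability of traditional hyper-connections in large-scale model training while preserving their significant performance gains. The paper has three first authors: Zhenda Xie (解振达), Yixuan Wei (韦毅轩), and Huanqi Cao. Notably, DeepSeek founder and CEO Liang Wenfeng is also on the author list. ...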
2025 in Review: DeepSeek Leads AI Evolution, National Subsidies Spur Consumer Spending, Industry Reshaping Brings More Possibilities
Xin Lang Cai Jing· 2025-12-31 16:07
Core Insights
- 2025 was a pivotal year for the digital 3C industry, marked by significant advances in AI technology, policy support, and market dynamics that set the stage for 2026 [1][15].
Group 1: AI Developments
- DeepSeek-R1, launched on January 20, 2025, proved competitive with top closed-source models at a training cost of approximately $6 million, challenging Silicon Valley's computational dominance [1][16].
- DeepSeek's V3.2-Exp, released in September, introduced a sparse attention mechanism that halved API prices, while the December V3.2 version integrated logical reasoning with agent tool use and achieved gold-medal performances in international competitions [2][16].
- DeepSeek's contributions to the 3C industry include promoting "open-source equity", enabling low-cost smart experiences on budget devices through cloud APIs, and leading a global shift towards efficiency in AI [2][16].
Group 2: Policy Impact on Market
- 2025 is described as the "Year of National Subsidies" for the 3C market. A policy introduced on January 8 offered subsidies of up to 500 yuan for mobile phones, tablets, and smartwatches, significantly boosting daily active users on e-commerce platforms [3][18].
- The subsidy policy expanded in the second half of the year, with 14 provinces raising the maximum subsidy to 700 yuan, resulting in a total retail sales increase of over 120 billion yuan [3][18].
- The subsidy policy is expected to continue into 2026 and to cover emerging categories such as smart glasses, widening consumer access to mid-to-high-end products and shifting competition from spec-sheet comparisons to value for money [5][18].
Group 3: Industry Challenges
- The Romoss (罗马仕) incident in June 2025 involved the recall of nearly 500,000 defective power banks over safety concerns, prompting significant regulatory responses and stricter safety standards for the power bank industry [19][21].
- Following the incident, new regulations required all power banks to carry 3C certification (China Compulsory Certification), marking a shift away from low-cost models and towards consumer safety [21][22].
Group 4: Growth of AI Glasses
- 2025 was a breakthrough year for the AI glasses industry, driven by policy support and market demand. Global shipments are expected to reach 12.05 million units, with the Chinese market alone surpassing 2.75 million units, a 107% year-on-year increase [8][22].
- Numerous brands have emerged, including major players such as Huawei and Xiaomi, with nearly 70 companies entering the market [10][24].
Group 5: AI Assistant Developments
- The "Doubao Phone", launched by ByteDance and ZTE on December 1, 2025, introduced an AI assistant capable of executing complex tasks across applications, a significant advance in mobile technology [10][24].
- The AI assistant sparked a debate over app permissions and user data security, highlighting the tension between innovation and established app ecosystems [12][27].
Science Roundtable · Talking 2025 | Pharmacologist: This Year, China's Homegrown Innovative Drugs Are Having Their "DeepSeek Moment"
Xin Hua She· 2025-12-31 05:04
Core Insights
- The Chinese pharmaceutical industry is experiencing a significant breakthrough in innovative drug development, marking a decade of progress since 2015, which was recognized as the "Year of Chinese Innovative Drugs" [2].
Group 1: Innovative Drug Development
- More than 110 domestically developed innovative drugs have been approved since the start of the 14th Five-Year Plan, with the market reaching 100 billion yuan [8].
- Innovative drugs under development in China account for over 20% of the global total, making China the second-largest country in new drug research and development [8].
- Since 2018, a total of 265 innovative drugs have been approved in China, including 68 approved in 2025 through November alone, more than six times the total for 2018 [9].
Group 2: CAR-T Therapy and Chronic Kidney Disease
- A new CAR-T therapy targeting kidney fibrosis has gained significant attention, highlighting the potential of innovative treatments for chronic kidney disease (CKD), which affects approximately 788 million adults globally [3][4].
- Traditional treatment options for kidney disease are limited, with high costs and reliance on dialysis for patients with end-stage renal disease [4].
- The CAR-T therapy aims to halt the progression of kidney disease by precisely targeting pathogenic cells, a shift away from conventional treatment methods [5].
Group 3: Policy Support and Market Expansion
- Recent policies from the National Healthcare Security Administration and the National Health Commission have established a comprehensive support system for innovative drug development, improving market access and reimbursement mechanisms [7].
- The dual-directory mechanism for including innovative drugs in both basic medical insurance and commercial health insurance is expected to expand market opportunities for pharmaceutical companies [7].
- The industry is entering a "window of opportunity" as major multinational pharmaceutical companies face patent expirations, creating demand for new products in oncology and other therapeutic areas [9].
Group 4: International Market and Collaboration
- For 2025, the total value of overseas licensing deals for Chinese innovative drugs is projected to exceed 100 billion USD, double the 2024 figure [10].
- The collaboration model for overseas markets has evolved from simple licensing to joint development and commercialization, moving Chinese pharmaceutical companies from a "seller" role to a "partner" role [10].
- This transformation marks a transition from "following generics" to "source innovation", raising the global standing of Chinese innovative drugs [10].
Pharmacologist: This Year, China's Homegrown Innovative Drugs Are Having Their "DeepSeek Moment"
Xin Hua She· 2025-12-31 05:02
Core Insights
- The Chinese pharmaceutical industry is experiencing a significant breakthrough in innovative drug development, marking a decade of progress since 2015, which was recognized as the "Year of Chinese Innovative Drugs" [1][5].
- Chronic kidney disease (CKD) is emerging as a global public health challenge, with approximately 788 million adults affected worldwide in 2023, highlighting the urgent need for new treatment options [2].
- CAR-T therapy shows promise against kidney disease by targeting pathogenic cells, although initial trials faced challenges [3].
Industry Developments
- The Chinese government has implemented supportive policies for innovative drug development, including a comprehensive support system covering the entire drug development chain [3][4].
- Since the start of the 14th Five-Year Plan, over 110 innovative drugs have been approved in China, with the market reaching 100 billion yuan, and the country now ranks second globally in new drug research and development [5][6].
- The approval of 265 innovative drugs since 2018, with a significant increase in approvals in 2025, reflects growing optimism in the industry about the drug development cycle [6].
Market Opportunities
- By 2030, many major multinational pharmaceutical companies will face patent expirations, creating opportunities for Chinese innovative drugs in tumor immunotherapy and other therapeutic fields [6].
- Overseas licensing of Chinese innovative drugs is projected to exceed 100 billion USD in 2025, alongside a shift from simple licensing to collaborative development models [7].
- The transformation of Chinese pharmaceutical companies from "sellers" to "partners" in global markets reflects a strategic evolution towards original innovation [7].
Future Challenges
- The industry must move from "fast following" to "best in class" and ultimately to "first in class" innovation, which requires collaboration among research, policy, and capital [8].
- Continued patience, courage, and wisdom are required to navigate the complexities of China's evolving pharmaceutical landscape [8].