DeepSeek Launches mHC: Is R2 Far Behind?
Tai Mei Ti APP· 2026-01-04 06:05
Core Insights
- DeepSeek has introduced mHC (Manifold-Constrained Hyper-Connections), an optimization of neural network architecture that is expected to significantly affect the AI industry, including large models and chips [1][5][9]

Group 1: mHC Architecture
- mHC builds on the Hyper-Connections (HC) framework released by ByteDance's Doubao team in November 2024, which aims to replace the nearly decade-old ResNet architecture [5]
- mHC adds a manifold constraint, enforced with the Sinkhorn-Knopp algorithm, that stabilizes signal propagation during training and addresses the signal explosion and instability seen in large-model training [5][6]
- In training runs on a 27-billion-parameter model, mHC kept signal amplification to only 1.6x, while HC failed catastrophically with 3,000x amplification [6][8]

Group 2: Performance and Efficiency
- mHC significantly reduces training loss and improves performance on challenging tasks, with gains of more than 2% on reasoning and reading comprehension benchmarks over traditional architectures [6][8]
- Even with a fourfold expansion of the residual channels, mHC adds only 6.7% to training time, reflecting a focus on cost-effectiveness and efficiency [8]

Group 3: Industry Impact and Reactions
- The release of mHC has drawn intense discussion among researchers and industry professionals, with expectations of a paradigm shift in large-model architectures by 2026 [9][10]
- Competitors are already responding: new architectures such as Deep Delta Learning appeared shortly after the mHC announcement, suggesting a possible chain reaction in AI architecture development [9][10]
- Analysts predict that DeepSeek may make significant announcements around the Lunar New Year, potentially unveiling the long-awaited R2 model or a faster general-purpose model, V4 [10]

Group 4: Compatibility and Market Dynamics
- mHC is designed primarily for NVIDIA's supernode links, raising concerns about compatibility with domestic chips, which may require additional adaptation work [11]
- As U.S. AI chip manufacturers gradually exit the Chinese market for geopolitical reasons, domestic chipmakers are accelerating development and ecosystem building to adapt to DeepSeek's models [12]
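Group 1 of this summary names the Sinkhorn-Knopp algorithm without showing what it does. The sketch below is a minimal, standard-library illustration of the textbook procedure: alternately rescale the rows and columns of a positive matrix until both sum to 1. The function name `sinkhorn_knopp`, the 4x4 `logits` values, and the iteration count are illustrative assumptions, not details from DeepSeek's paper or code.

```python
import math

def sinkhorn_knopp(logits, iterations=50):
    """Alternately normalize rows and columns of exp(logits).

    Illustrative helper (an assumption, not DeepSeek's implementation).
    """
    # Exponentiate so that every entry is strictly positive.
    m = [[math.exp(v) for v in row] for row in logits]
    n = len(m)
    for _ in range(iterations):
        # Row step: scale each row so it sums to 1.
        m = [[v / sum(row) for v in row] for row in m]
        # Column step: scale each column so it sums to 1.
        col_totals = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_totals[j] for j in range(n)] for i in range(n)]
    return m

# Hypothetical unconstrained weights for 4 residual streams.
logits = [
    [2.0, -1.0, 0.5, 0.0],
    [0.3, 1.5, -0.7, 0.2],
    [-0.4, 0.1, 1.8, 0.6],
    [0.9, -0.2, 0.4, 1.1],
]
W = sinkhorn_knopp(logits)
row_sums = [sum(row) for row in W]
col_sums = [sum(W[i][j] for i in range(4)) for j in range(4)]
print("row sums:", [round(s, 6) for s in row_sums])
print("col sums:", [round(s, 6) for s in col_sums])
```

Because every step is a plain division, the whole procedure is differentiable, which is one plausible reason such a projection can sit inside a training loop.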
Manus's Ji Yichao on His Favorite Large-Model Companies: Qwen Is Solid, Kimi Has Taste, DeepSeek Goes Way Back
Xin Lang Cai Jing· 2026-01-04 05:45
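Recently, in an interview, Manus co-founder and chief scientist Ji Yichao said that when it comes to large-model companies, there are three he particularly likes: Qwen, DeepSeek, and Kimi.

He also mentioned that he knows many members of the Qwen team fairly well. In his view, Qwen's work is very solid, and it was truly the first model to be released under a very permissive open-source license, which he respects a great deal. Qwen also showed him what a group of young people inside a big company can accomplish, which is rare.

He explained that he came into contact with DeepSeek very early. "One of my earlier startups was building knowledge graphs, and at the same time we were building the entire search-engine infra from scratch, so we had a very large pretraining dataset. That is how I got to know friends at DeepSeek very early on. In the end I did not sell the data to them separately; I simply open-sourced the dataset, which leaves a bit of a legacy for everyone."

He added that Kimi is also backed by ZhenFund, and because he is at ZhenFund himself, they talked a lot at the time. He also considers Kimi a company with taste, which he said is very important.

Sina statement: All meeting transcripts are compiled from on-site shorthand notes and have not been reviewed by the speakers. Sina publishes this article to convey more information, which does not mean it endorses the views or confirms the descriptions.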
AI Weekly | Meta Spends Billions of Dollars to Acquire Manus; Liang Wenfeng Co-authors New DeepSeek Paper
Di Yi Cai Jing· 2026-01-04 02:26
Group 1: Meta's Acquisition of Manus
- Meta has acquired the AI startup Manus for a reported price in the billions of dollars, making it Meta's third-largest acquisition after WhatsApp and Scale AI [1]
- Manus will continue to operate from Singapore and keep offering its products through its app and website, with no change to its decision-making processes [1]
- The acquisition reflects Meta's urgency to strengthen its AI capabilities, especially given competition from Google's Gemini 3 [1]

Group 2: SoftBank's Investment in OpenAI
- SoftBank has completed its $40 billion investment commitment to OpenAI, one of the largest private financings in history [2]
- The final tranche, amounting to $22 billion to $22.5 billion, was sent recently [2]
- SoftBank's sale of Nvidia shares for $5.83 billion indicates a strategic shift toward funding AI projects, including the partnership with OpenAI [2]

Group 3: DeepSeek's New Research
- DeepSeek has introduced a new network architecture called mHC (manifold-constrained hyper-connections), aimed at improving the stability and efficiency of model training [3]
- The research addresses the scalability and memory-access costs of existing hyper-connection models [3]
- Industry experts view the work as a foundational advance that could lead to significant updates in future versions of DeepSeek's technology [3]

Group 4: Moonshot AI's Financing and Market Position
- Moonshot AI, a large-model unicorn, has completed a $500 million Series C financing that significantly exceeded its target, and it currently holds more than 10 billion yuan in cash [4]
- The funds will be used to aggressively expand GPU resources and accelerate training and development of its K3 model [4]
- Moonshot AI aims to surpass competitors such as Anthropic and become a leading AGI company [4]

Group 5: Upcoming IPOs in the AI Sector
- Companies including OpenAI, Anthropic, and SpaceX are preparing for potential IPOs this year, with total fundraising expected to reach hundreds of billions [6]
- OpenAI is negotiating a new valuation of $750 billion, while Anthropic's valuation may exceed $300 billion [6]
- The combined valuation of these companies could approach 13 trillion yuan, indicating a significant market impact [6]

Group 6: MiniMax's IPO Plans
- MiniMax has begun its IPO process, aiming to raise up to 4.19 billion HKD (approximately $538 million) at a share price range of 151 to 165 HKD [7]
- The company is set to list on the Hong Kong Stock Exchange on January 9, 2026, shortly after its competitor Zhipu AI [7]
- MiniMax's cornerstone investors include major financial institutions and investment funds, highlighting strong market interest [7]

Group 7: Baidu's Kunlun Chip IPO
- Baidu has confidentially filed for its AI chip subsidiary Kunlun to list independently on the Hong Kong Stock Exchange [8]
- The move follows Baidu's earlier evaluation of a possible spin-off, indicating a strategic shift in its business model [8]
- Kunlun's competitors include major players such as Nvidia and AMD, as well as domestic rivals [8]

Group 8: Wall Street's Response to the IPOs
- Wall Street analysts predict that if any of the aforementioned companies successfully goes public, it could overshadow the total fundraising of approximately 200 companies in the U.S. in 2025 [6]
- The anticipated IPOs are expected to generate significant returns for the venture capitalists and investment bankers involved in the transactions [6]

Group 9: xAI's Expansion
- xAI, led by Elon Musk, has purchased a third building to expand its training capacity, aiming for nearly 2 gigawatts of computing power [15]
- The new facility is set to be converted into a data center by 2026 to support xAI's growth and operational needs [15]
- xAI's previous investments in data centers indicate a strong commitment to expanding its AI infrastructure [15]
Heytea Falls Behind and DeepSeek Is Beaten: Who Won the 2025 Good-Brand Contest?
36 Ke· 2026-01-04 02:24
Group 1
- The brand index serves as a public measurement standard: it is calculated from reader votes, with the top-voted brand in each category set to 100 [2]
- The top five brands in each category have been identified, with ranking changes noted, including new entries and shifts in position compared with the previous year [4][9]
- Overall consumer sentiment points to cautious spending, with a significant share of respondents putting product quality and reliability ahead of brand loyalty [123][124]

Group 2
- Heytea has fallen behind, with Guming Tea replacing it in the top five; Guming Tea's store count has reached 11,179, and its net profit of 1.625 billion yuan surpasses its total profit for the previous year [9]
- Haidilao remains the top brand in the hot pot category, while KFC and McDonald's have swapped positions, with KFC slightly ahead [12]
- In beverages, Nongfu Spring has returned to the top ranks, while management turmoil at Wahaha has hurt its brand perception [15][17]

Group 3
- In beauty and personal care, Estee Lauder and L'Oreal dominate, with significant ranking changes and no local brands in the top positions [41]
- Anta and Li Ning lead the sportswear category, with Li Ning rising to first place from fourth last year, while Adidas has returned to the rankings [45]
- Douyin has surpassed Bilibili in short video, with Douyin's daily active users reaching 600 million, while Bilibili has improved its profitability [55]

Group 4
- The e-commerce landscape is evolving, with traditional platforms such as JD, Meituan, and Taobao entering the instant-retail competition and investing heavily in subsidies [73]
- The AI app market is growing explosively, with ByteDance's products leading in active users, indicating a shift toward AI-driven applications [80]
- The adult product market is quietly rising, with brands like Durex and Okamoto leading the category [82]

Group 5
- The home appliance market is intensely competitive, with Midea pursuing diversified business strategies while Haier emphasizes high-end and localized operations [92]
- Huawei continues to focus on the high-end market, with significant developments in its HarmonyOS ecosystem, while Apple faces challenges with its latest iPhone series [94][95]
- The hotel industry is shifting toward new chain hotels, with traditional five-star hotels losing appeal as consumers seek more modern accommodations [116]
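Group 1 describes the brand index only in words. A minimal sketch of one way to compute it follows, assuming simple linear scaling of vote counts against the category leader. The article does not state the exact formula or publish raw votes, so `brand_index`, the brand names, and the numbers are all hypothetical.

```python
def brand_index(votes):
    """Scale vote counts so the top-voted brand in a category scores 100.

    Assumes linear scaling; the article does not spell out the formula.
    """
    top = max(votes.values())
    if top <= 0:
        raise ValueError("at least one brand needs a positive vote count")
    return {brand: round(100 * count / top, 1) for brand, count in votes.items()}

# Hypothetical vote counts for one category.
category_votes = {"Brand A": 1200, "Brand B": 900, "Brand C": 300}
index = brand_index(category_votes)
print(index)
```

Under this assumption an index of 75 means a brand received three quarters of the leader's votes, so scores are comparable within a category but not across categories.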
US Media Say There Are Lessons to Learn from DeepSeek
Xin Lang Cai Jing· 2026-01-03 00:40
Group 1
- The article highlights China's rising global appeal and cultural influence, driven in particular by its innovative technology and creative industries; "cool China" became a frequent descriptor in foreign media in 2025 [1]
- DeepSeek, a Chinese startup, launched its AI model R1 on January 20, 2025, matching the performance of leading global AI models with significantly less computing power and challenging the perception of U.S. dominance in AI [1]
- China has surpassed other countries in the number of AI patents obtained, and Chinese scientists publish more research papers on quantum computing each year than their counterparts in other nations [1]

Group 2
- Chinese micro-short dramas have gained global popularity, reaching more than 200 countries and regions and catering to the fragmented entertainment needs of internet users worldwide [1]
- Southeast Asia has emerged as a key fan base for Chinese micro-short dramas, which serve as a natural medium for spreading Chinese culture [1]
- An Indonesian fan said that watching these dramas sparked an interest in the Tang Dynasty, illustrating their cultural impact [1]
DeepSeek Publishes Latest Paper, Tackling the Congestion Problem in Large-Model Training
Bei Ke Cai Jing· 2026-01-02 12:44
Core Viewpoint
- The DeepSeek team has introduced a new framework called "mHC" (Manifold-Constrained Hyper-Connections) that significantly improves the training performance of large-scale models by addressing problems in the earlier "HC" (Hyper-Connections) paradigm [1][4].

Group 1: Paper Overview
- The paper focuses on a foundational aspect of large-model training, the residual connection paradigm, and proposes the mHC framework as a theoretical innovation that improves training stability [4][5].
- The mHC framework is likened to a smart traffic management system that regulates data flow across multi-lane connections, increasing training stability and performance [5][6].

Group 2: Theoretical Innovation
- The mHC framework builds on the work of AI pioneers such as He Kaiming and ByteDance, who introduced the residual connection and HC paradigms, respectively [7][8].
- DeepSeek's contribution is positioned as an optimization of existing frameworks that aims to reignite interest in macro-architecture design within the AI community [9].

Group 3: Company Strategy
- Amid a trend toward commercialization in the large-model sector, DeepSeek's focus on foundational research underscores its strategic commitment to advancing basic model theory rather than immediate commercial applications [9].
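The traffic-management analogy can be made concrete with a toy model. The sketch below repeatedly applies one 4x4 mixing matrix to four "lanes" of signal, once with a doubly stochastic matrix and once with the same matrix scaled up by 10%. It is an illustrative assumption, not the paper's experiment: the matrices, the 60-layer depth, and the helper names `mat_vec` and `peak_after` are hypothetical, and the layer computations themselves are left out.

```python
def mat_vec(m, x):
    """Multiply matrix m (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in m]

def peak_after(m, x, layers):
    """Apply the same mixing matrix `layers` times and return the peak magnitude."""
    for _ in range(layers):
        x = mat_vec(m, x)
    return max(abs(v) for v in x)

# Doubly stochastic mixing: the average of the identity and a cyclic shift,
# so every row and every column sums to 1.
constrained = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.5, 0.0, 0.0, 0.5],
]
# Unconstrained stand-in: the same pattern with every weight 10% larger,
# so each row sums to 1.1 instead of 1.
unconstrained = [[1.1 * w for w in row] for row in constrained]

x0 = [1.0, -2.0, 0.5, 3.0]  # hypothetical signal on four residual streams
layers = 60
peak_constrained = peak_after(constrained, x0, layers)
peak_unconstrained = peak_after(unconstrained, x0, layers)
print("constrained peak:  ", peak_constrained)
print("unconstrained peak:", peak_unconstrained)
```

In this toy setting the constrained lanes settle toward the average of the inputs and never exceed the largest input, while a 10% excess per layer compounds to roughly 300x over 60 layers. The figures are unrelated to the measurements reported for the actual models.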
DeepSeek Makes Another Big Move! New Paper Co-authored by Liang Wenfeng Draws Attention
Core Insights
- DeepSeek has introduced a new framework called "Manifold-Constrained Hyper-Connections" (mHC), aimed at improving scalability while reducing the computing power and energy needed to train advanced AI systems [1][14][19]
- The next flagship system, R2, is expected to launch around the Chinese New Year in February [1][14]

Summary of Key Points

Introduction of mHC Framework
- DeepSeek published a paper detailing the mHC framework, which addresses the instability of traditional hyper-connections in large-scale model training while retaining their significant performance gains [1][15][16]
- The paper lists three primary authors, including DeepSeek's founder Liang Wenfeng [1][17]

Performance and Scalability
- The mHC framework projects the residual connection space of hyper-connections onto a specific manifold, restoring the identity mapping property, and adds rigorous infrastructure optimizations for operational efficiency [3][19]
- Empirical experiments indicate that mHC effectively supports large-scale training, delivering notable performance improvements with better scalability; with the expansion rate set to 4, it incurs only 6.7% additional time overhead [3][19][21]

Future Research Directions
- The paper presents mHC as a flexible and practical extension of the hyper-connection paradigm that could deepen the understanding of topological architecture design and guide the evolution of foundation models [3][21]
- It opens several research directions, including compatibility with manifold constraints tailored to specific learning objectives and the exploration of differentiated geometric constraints that better balance plasticity and stability [3][21]
DeepSeek Publishes New Paper Proposing a More Efficient Approach to AI Development
Xin Lang Cai Jing· 2026-01-02 10:13
Core Viewpoint
- In a paper co-authored by founder Liang Wenfeng, DeepSeek has proposed a more efficient approach to AI development: a framework called "Manifold-Constrained Hyper-Connections" (mHC), aimed at improving scalability while reducing the computing power and energy needed to train advanced AI systems [1]

Group 1
- The mHC framework is designed to improve scalability in AI development [1]
- DeepSeek's new flagship system, R2, is expected to launch around the Chinese New Year in February [1]
Liang Wenfeng's New DeepSeek Paper! Picking Up the Baton from Kaiming He and ByteDance to Steady AI's "Foundation" Once More
Xin Lang Cai Jing· 2026-01-02 05:27
Core Insights
- DeepSeek has introduced a new architecture called mHC (Manifold-Constrained Hyper-Connections), which significantly improves the residual connection component of the Transformer architecture, a foundational element that has changed little since its introduction in 2015 [1][3]

Group 1: Historical Context
- The evolution of these architectures began with ResNet, introduced by Kaiming He in 2015, which addressed the vanishing gradient problem and made very deep networks trainable [3]
- The Transformer model, released in 2017, adopted residual connections as a standard feature and forms the basis of many leading models today [3]

Group 2: Technical Comparisons
- Hyper-Connections, proposed by ByteDance in 2024, expanded the single residual stream into multiple parallel streams, improving model performance but introducing stability problems during training [5][10]
- mHC aims to resolve those stability problems by constraining the connection weight matrix to a specific mathematical space so that the signal is not amplified [10][12]

Group 3: Mathematical Innovation
- The core innovation of mHC is to constrain the connection weights to a doubly stochastic matrix (non-negative entries, with every row and every column summing to 1), which guarantees that no output exceeds the largest input and so conserves the signal's energy [10][12]
- The implementation uses the Sinkhorn-Knopp algorithm to obtain the required matrix properties efficiently, allowing end-to-end training without new hyperparameters [11][12]

Group 4: Engineering Excellence
- DeepSeek's implementation of mHC demonstrates significant engineering capability, including custom CUDA kernels and operator fusion techniques that minimize computational delays [16]
- The ability to bring innovative mathematical solutions into practical training environments highlights DeepSeek's competitive advantage in AI research [16]
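The guarantee stated in Group 3 follows from the definition of a doubly stochastic matrix in two short steps. The derivation below is a sketch for a single n-by-n mixing matrix applied to n residual streams. The symbols W, x, and y are chosen here for illustration and are not necessarily the paper's notation.

```latex
% Assumption: W is doubly stochastic, i.e. W_{ij} \ge 0,
% \sum_{j} W_{ij} = 1 for every row i, and \sum_{i} W_{ij} = 1 for every column j.
\[
  y_i = \sum_{j=1}^{n} W_{ij}\, x_j
  \quad\Longrightarrow\quad
  |y_i| \;\le\; \sum_{j=1}^{n} W_{ij}\, |x_j| \;\le\; \max_{j} |x_j| ,
\]
\[
  \sum_{i=1}^{n} y_i
  \;=\; \sum_{j=1}^{n} \Bigl( \sum_{i=1}^{n} W_{ij} \Bigr) x_j
  \;=\; \sum_{j=1}^{n} x_j .
\]
```

The first line uses the row sums: each output is a weighted average of the inputs, so it cannot exceed the largest input in magnitude. The second uses the column sums: the total across streams is preserved. Since the product of two doubly stochastic matrices is again doubly stochastic, both properties carry over when many such mixing steps are stacked.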
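Four Hot Topics at Once: Moutai's Direct-Sales Strategy Lands, Buffett Retires, the "Four Little Dragons" of GPUs Line Up to List, DeepSeek Sends Another Signal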
Jin Rong Jie· 2026-01-02 00:17
Group 1: Moutai's Direct Sales Strategy
- Moutai officially launched its direct sales strategy by selling Feitian Moutai on the "i Moutai" platform at 1,499 yuan per bottle, with a purchase limit of 12 bottles per user per day [2]
- The move cuts out intermediaries and could convert some dealer profits into direct company revenue, which is expected to support mid- to long-term performance [2]
- The market response was extremely enthusiastic: all six rounds of product releases sold out quickly, indicating strong demand for reasonably priced Feitian Moutai [2]

Group 2: Warren Buffett's Retirement
- Warren Buffett, the legendary investor, announced his retirement at the age of 95, marking the end of a nearly century-long investment career [3]
- His career showed that investing can be a lifelong endeavor and has prompted a fresh look at long-term investment philosophies [3]
- Buffett emphasized focusing on quality assets and holding them for the long term, a principle that remains relevant despite the rise of high-frequency trading and quantitative strategies [3]

Group 3: Domestic GPU Companies Accelerating Capitalization
- The four leading domestic GPU companies, including Suiyuan Technology (Enflame), have started their IPO processes, with Suiyuan recently completing its IPO counseling [4]
- The domestic GPU sector is moving toward the capital markets at unprecedented speed, with multiple companies heading for public offerings [4]
- The coming wave of tech IPOs is expected to inject capital into the economy and support the goal of self-sufficiency in the industrial chain [4]

Group 4: DeepSeek's Research Publication
- DeepSeek recently published an important research paper on a preprint platform, with founder Liang Wenfeng listed as one of the authors, highlighting the company's strategic focus on technological advancement [5]
- The paper follows strong market interest in the company's DeepSeek-R1 model [5]
- Despite mixed opinions on the pace of AI technology iteration, DeepSeek's steady output of significant research points to robust technical strength [5]