DeepSeek's new model "MODEL1" revealed
Di Yi Cai Jing Zi Xun· 2026-01-21 09:05
Core Insights
- The article reports that a new DeepSeek model named "MODEL1" has surfaced, coinciding with the first anniversary of the DeepSeek-R1 release and pointing to possible advances in AI model architecture [2][6].
Group 1: Model Development
- "MODEL1" is referenced in the updated FlashMLA code on GitHub, suggesting it may be a new model distinct from the existing "V32" architecture [2][3].
- Opinions in the industry differ on whether "MODEL1" is a V4 model or an advanced inference model; some developers speculate it could be the final version of the V3 series [2][5].
- Key technical differences between "MODEL1" and "V32" include the key-value (KV) cache layout, sparsity handling, and support for FP8 decoding, indicating a design targeted at memory optimization and computational efficiency [5].
Group 2: Anticipated Release and Features
- The structure of the model files suggests that "MODEL1" is nearly complete or close to inference deployment, pending final weight freezing and test validation, which implies an imminent launch [5].
- DeepSeek is expected to release its next flagship model, DeepSeek V4, in February; preliminary tests reportedly indicate it may surpass other top models in programming [6].
- Recent DeepSeek technical papers introduce a new training method and an AI memory module, hinting that these innovations may be integrated into the upcoming model [6].
Group 3: Industry Impact
- DeepSeek-R1 became the most-liked model on Hugging Face, significantly lowering the barriers to inference technology and production deployment and influencing the open-source strategies of major Chinese companies [9].
- Over the past year, downloads of Chinese AI models on Hugging Face have surpassed those of U.S. models, indicating growing reliance on Chinese-developed open-source models in the global supply chain [9].
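The memory argument behind the KV cache layout and FP8 decoding changes can be made concrete with a back-of-the-envelope calculation. The Python sketch below sizes an MLA-style cache in which each token stores a compressed latent plus a small RoPE component. The dimensions (512 + 64 values per token, 61 layers) are assumptions based on the published DeepSeek-V3 configuration, and the FP8 layout (one float32 scale per 128 values) is a hypothetical choice for illustration, not the layout used in the MODEL1 code.

```python
# Back-of-the-envelope KV cache sizing for an MLA-style cache.
# All dimensions below are illustrative assumptions, not MODEL1's real config.

LATENT_DIM = 512   # assumed size of the compressed KV latent per token
ROPE_DIM = 64      # assumed size of the decoupled RoPE key part per token
NUM_LAYERS = 61    # assumed number of transformer layers
TILE = 128         # assumed FP8 quantization tile (one float32 scale per tile)


def bytes_per_token_bf16(latent: int = LATENT_DIM, rope: int = ROPE_DIM) -> int:
    """Latent and RoPE parts both stored as 2-byte bfloat16 values (one layer)."""
    return (latent + rope) * 2


def bytes_per_token_fp8(latent: int = LATENT_DIM, rope: int = ROPE_DIM,
                        tile: int = TILE) -> int:
    """Latent as 1-byte FP8 plus a 4-byte float32 scale per tile; RoPE kept in bfloat16."""
    num_scales = -(-latent // tile)  # ceiling division
    return latent * 1 + num_scales * 4 + rope * 2


def cache_gib(per_token_bytes: int, seq_len: int, layers: int = NUM_LAYERS) -> float:
    """Total cache for one sequence across all layers, in GiB."""
    return per_token_bytes * seq_len * layers / 2**30


if __name__ == "__main__":
    bf16 = bytes_per_token_bf16()
    fp8 = bytes_per_token_fp8()
    print(f"per token per layer: bf16={bf16} B, fp8={fp8} B ({fp8 / bf16:.0%})")
    for seq_len in (16_384, 131_072):
        print(f"{seq_len:>7} tokens: bf16={cache_gib(bf16, seq_len):.2f} GiB, "
              f"fp8={cache_gib(fp8, seq_len):.2f} GiB")
```

Under these assumptions, FP8 storage shrinks the per-token, per-layer footprint from 1152 to 656 bytes, about 57% of the bfloat16 size. Because the cache grows linearly with sequence length, the saving matters most for long contexts.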
DeepSeek reportedly reveals a new model: is Liang Wenfeng about to play another "trump card"?
Xin Lang Cai Jing· 2026-01-21 07:55
Core Insights
- DeepSeek has generated significant buzz in the AI community after a new model named Model1 unexpectedly surfaced in a code update, suggesting a new technical path distinct from the existing V3 series [1][6][8].
- Speculation is rife that DeepSeek will launch its next-generation AI model, V4, around mid-February, following a year of iterative improvements to V3 [3][8].
Model Development Timeline
- March 25, 2025: DeepSeek released V3-0324, improving the usability of generated code and reportedly surpassing GPT-4.5 in math and coding [4].
- May 29, 2025: R1 received a minor upgrade that improved performance in math, programming, and general logic and cut hallucination rates by 45-50% [4].
- August 21, 2025: DeepSeek V3.1 launched with faster responses, stronger agent capabilities, and support for the Anthropic API format [4].
- September 22, 2025: V3.1-Terminus was released, fixing issues with mixed-language output and improving the Code and Search Agents [4].
- September 29, 2025: V3.2-Exp introduced a new sparse attention mechanism (DeepSeek Sparse Attention) along with updated API pricing [4].
- December 1, 2025: the official V3.2 was released, with reasoning performance comparable to GPT-5 and thinking mode integrated into tool use [4][9].
Research Contributions
- Two papers co-authored by Liang Wenfeng were published between late December 2025 and early January 2026, addressing training stability and knowledge-retrieval efficiency in large-model architectures [5][10].
- The first proposed a manifold-constrained hyper-connections framework that improves training stability by constraining residual connections to a specific manifold [10][11].
- The second introduced a conditional memory module that improves reasoning and knowledge-task performance by decoupling knowledge storage from neural computation [10][11].
Market Expectations
- The AI community is eagerly watching whether DeepSeek will unveil Model1 or V4 around the upcoming Spring Festival, which many expect to have a significant impact on the global AI landscape [6][8].
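As we understand the manifold-constrained hyper-connections paper, the "manifold" is the set of doubly stochastic matrices: the matrix that mixes several parallel residual streams is projected so every row and column sums to 1. Each output stream is then a weighted average of the inputs, and the total across streams is preserved, so stacking layers cannot blow up the signal. The sketch below shows that idea with a plain Sinkhorn-Knopp iteration in pure Python; the stream count, iteration count, and the `mix_streams` helper are illustrative assumptions, not DeepSeek's code.

```python
import math


def sinkhorn_knopp(logits, iters=100):
    """Project a square matrix of unconstrained scores onto (approximately)
    doubly stochastic matrices by alternating row and column normalization."""
    m = [[math.exp(x) for x in row] for row in logits]  # make all entries positive
    for _ in range(iters):
        row_sums = [sum(row) for row in m]
        m = [[x / s for x in row] for row, s in zip(m, row_sums)]   # rows sum to 1
        col_sums = [sum(col) for col in zip(*m)]
        m = [[x / c for x, c in zip(row, col_sums)] for row in m]   # columns sum to 1
    return m


def mix_streams(mix, streams):
    """Hypothetical helper: combine n residual streams (lists of floats)
    with an n x n mixing matrix, producing one output stream per row."""
    dim = len(streams[0])
    return [[sum(mix[i][k] * streams[k][d] for k in range(len(streams)))
             for d in range(dim)] for i in range(len(mix))]


if __name__ == "__main__":
    raw = [[0.5, -0.2, 0.1], [0.0, 0.8, -0.4], [0.3, 0.3, 0.3]]
    h = sinkhorn_knopp(raw)
    print("row sums:", [round(sum(r), 6) for r in h])
    print("col sums:", [round(sum(c), 6) for c in zip(*h)])
    streams = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    mixed = mix_streams(h, streams)
    # Column sums of 1 mean the per-dimension total across streams is preserved.
    print("total before:", [sum(s[d] for s in streams) for d in range(2)])
    print("total after: ", [round(sum(s[d] for s in mixed), 6) for d in range(2)])
```

The projection is what distinguishes this from unconstrained hyper-connections, where the learned mixing matrix can drift and amplify activations layer after layer.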
Is DeepSeek's new model really coming? "MODEL1" revealed
Di Yi Cai Jing Zi Xun· 2026-01-21 07:00
Core Insights
- The article discusses the emergence of a new DeepSeek model named "MODEL1", coinciding with the first anniversary of the DeepSeek-R1 release and pointing to possible advances in AI technology [1][4].
Group 1: Model Development
- "MODEL1" is referenced in the updated FlashMLA code on GitHub, suggesting it is a new model distinct from the existing "V32" architecture [1][2].
- Opinions in the industry differ on whether "MODEL1" is a V4 model or an advanced version of the V3 series [2][3].
- The new model appears close to completion, pending final weight freezing and test validation, which suggests a launch is near [3].
Group 2: Technical Innovations
- FlashMLA is DeepSeek's in-house, open-source library of MLA (Multi-head Latent Attention) decoding kernels optimized for NVIDIA Hopper GPUs, and it is central to low-cost, high-performance model serving [3].
- Key technical differences between "MODEL1" and "V32" include the key-value (KV) cache layout, sparse processing methods, and support for FP8 decoding, suggesting a design targeted at memory optimization and computational efficiency [3].
Group 3: Market Impact and Expectations
- Anticipation for DeepSeek's next flagship model is high, with expectations that it will integrate recent research findings, including a new training method and an AI memory module [4].
- The release of DeepSeek-R1 significantly influenced the open-source community, prompting more open-source contributions from major Chinese companies and a shift in global reliance toward Chinese-developed open-source models [5][7].
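The sparse processing methods mentioned above most likely relate to sparse attention, in which each query attends to a selected subset of cached tokens rather than the whole context; DeepSeek introduced DeepSeek Sparse Attention in V3.2-Exp. The pure-Python sketch below shows only the generic top-k pattern: `index_scores` stands in for a hypothetical cheap selection step, and how MODEL1 actually selects or stores tokens is not known from the reports.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def topk_sparse_attention(query, keys, values, index_scores, k):
    """Attend only to the k cached tokens with the highest index_scores.

    index_scores is a stand-in for a cheap, hypothetical selection step;
    scaled dot-product attention is then computed over the selected tokens only.
    Returns the attention output and the indices of the tokens that were kept.
    """
    k = min(k, len(keys))
    selected = sorted(range(len(keys)), key=lambda t: index_scores[t], reverse=True)[:k]
    scale = 1.0 / math.sqrt(len(query))
    logits = [scale * sum(q * kv for q, kv in zip(query, keys[t])) for t in selected]
    weights = softmax(logits)
    dim = len(values[0])
    out = [sum(w * values[t][d] for w, t in zip(weights, selected)) for d in range(dim)]
    return out, selected


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
    values = [[10.0], [20.0], [30.0], [40.0]]
    scores = [0.9, 0.1, 0.8, 0.05]  # hypothetical relevance scores per cached token
    out, kept = topk_sparse_attention([1.0, 0.5], keys, values, scores, k=2)
    print("kept tokens:", kept, "output:", out)
```

The cost of the attention step now scales with `k` rather than with the full context length, which is why sparsity and KV cache layout tend to be designed together.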
DeepSeek's new model "Model 1" revealed, suspected to be an "efficient inference model"
Xin Lang Cai Jing· 2026-01-21 06:58
Core Insights
- DeepSeek has pushed a series of FlashMLA code updates to its official GitHub repository, drawing attention to a model named "Model 1" [1][2].
- Model 1 is speculated to be the code for the new model DeepSeek is expected to release around the Chinese New Year [2].
Model Specifications
- Model 1 is one of the two main model architectures supported in DeepSeek FlashMLA, alongside DeepSeek-V3.2 [2].
- It is likely an efficient inference model with lower memory usage than V3.2, making it suitable for edge devices or cost-sensitive scenarios [2].
- Model 1 may also be a long-sequence specialist optimized for sequences longer than 16K tokens, suited to tasks such as document understanding and code analysis [2].
AI video has reached its DeepSeek moment
Jing Ji Guan Cha Wang· 2026-01-21 06:39
Core Insights
- PixVerse R1, launched by Aishi Technology, is a significant advance in AI video generation: it lets users create videos in real time without needing prompts, marking a transformative moment for the AI video industry [1][2][4].
Group 1: Product Features
- PixVerse R1 generates video instantly and responds quickly to user commands, creating an immersive digital world in which user input directly shapes the narrative [1][3].
- The model uses an Omni native multimodal architecture that unifies text, images, audio, and video in a single processing framework, strengthening its generative capabilities [3][4].
- It uses an autoregressive flow-generation method that conditions on previous inputs, giving it a form of "long-term memory" that sets it apart from conventional video generation methods [4][7].
Group 2: Market Impact
- Aishi Technology secured a $14.2 million strategic investment from the Chinese company Ruyi, which will support collaboration in film, streaming, and gaming and signals strong market interest in PixVerse R1 [5][6].
- The partnership aims to explore new applications of AI in the film industry, highlighting the potential to transform content creation [6][7].
- The product has already attracted attention from several game companies, suggesting it could reshape interactive media and gaming [8][9].
Group 3: Competitive Landscape
- Aishi Technology is positioned as a leader in real-time video generation, as no other company has yet launched a similar product [7][9].
- The company has gained traction rapidly, with over 100 million users worldwide and more than 16 million monthly active users [9][10].
- PixVerse R1 is described as the first general-purpose real-time world model supporting up to 1080p resolution, setting a new standard in the industry [9][10].
Group 4: Future Prospects
- PixVerse R1 is expected to blur the line between video production and consumption by letting users generate and edit content in real time, redefining how users engage with media [7][11].
- The technology is expected to enable new forms of interactive storytelling and AI-native games in which narratives evolve with user interaction, creating a dynamic digital ecosystem [7][8].
- Aishi Technology's founder says PixVerse R1 represents a new media form in which AI continuously builds an evolving world from user intent, marking the start of a new era of real-time content generation [11].
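The "long-term memory" attributed to PixVerse R1 rests on autoregression: each new chunk is generated conditioned on what came before plus the latest user command, instead of rendering a whole clip in one shot. The toy loop below illustrates only that control flow. `SessionContext` and `generate_next_chunk` are hypothetical stand-ins for the model, and the "flow" part of the method is not modeled at all; nothing here reflects PixVerse R1's actual architecture.

```python
from dataclasses import dataclass, field


@dataclass
class SessionContext:
    """Hypothetical running context: everything generated so far."""
    chunks: list = field(default_factory=list)
    commands: list = field(default_factory=list)


def generate_next_chunk(ctx: SessionContext, command: str) -> str:
    """Placeholder for a real video model. It receives the accumulated context,
    not just the new command; this toy version only reads the most recent
    command, whereas a real model would condition on the whole history."""
    previous = ctx.commands[-1] if ctx.commands else "<start>"
    return f"chunk {len(ctx.chunks)}: '{command}' continuing from '{previous}'"


def run_session(commands):
    """Autoregressive loop: each output is appended to the context that
    conditions the next step."""
    ctx = SessionContext()
    for command in commands:
        chunk = generate_next_chunk(ctx, command)
        ctx.chunks.append(chunk)
        ctx.commands.append(command)
    return ctx


if __name__ == "__main__":
    session = run_session(["walk into the forest", "start raining", "turn around"])
    for chunk in session.chunks:
        print(chunk)
```

Because every step feeds its output back into the context, a user command can redirect the scene mid-stream, which is what makes real-time interaction possible in this design.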
New DeepSeek AI model revealed: built on the all-new MODEL1 architecture, launching as early as February
Huan Qiu Wang Zi Xun· 2026-01-21 06:37
Core Insights
- DeepSeek plans to launch its next-generation flagship AI model, DeepSeek V4, around the Lunar New Year in mid-February; the model is expected to significantly improve coding capabilities and has drawn wide industry attention [1][2].
Group 1: Model Development
- The news comes around the first anniversary of DeepSeek-R1. Developers found FlashMLA updates spanning 114 files, with 28 references to a previously unknown "MODEL1" identifier, likely indicating a new AI model with a different architecture [1][2].
- The new architecture changes key technical aspects such as the key-value (KV) cache layout, sparsity handling, and FP8 decoding support, targeting memory usage and computational efficiency and laying the groundwork for performance gains [3].
Group 2: Research Innovations
- DeepSeek's research team previously published two technical papers introducing manifold-constrained hyper-connections (mHC), a training method that refines residual connections, and a biologically inspired AI memory module called Engram, suggesting DeepSeek V4 may integrate these findings to handle complex tasks better [3].
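As we understand DeepSeek's paper, the Engram memory module treats static knowledge as a lookup: embeddings keyed by recent token sequences (N-grams) live in a large table, so retrieving them costs a table read rather than extra neural computation. The sketch below is a heavily simplified, hypothetical version of that idea. It hashes the last `n` token IDs into a fixed-size table and adds the retrieved vector to a hidden state through a fixed scalar gate; the table size, the CRC32 hash, the random (not learned) entries, and the scalar gate are all assumptions.

```python
import random
import zlib


class HashedNgramMemory:
    """Toy conditional-memory table (hypothetical sizes, hashing, and gating)."""

    def __init__(self, num_slots=1024, dim=4, n=2, seed=0):
        rng = random.Random(seed)
        self.n = n
        self.num_slots = num_slots
        # In a real model these vectors would be learned; here they are random.
        self.table = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_slots)]

    def slot(self, token_ids):
        """Map the last n token IDs to a table slot with a deterministic hash."""
        key = ",".join(str(t) for t in token_ids[-self.n:]).encode()
        return zlib.crc32(key) % self.num_slots

    def read(self, token_ids, hidden, gate=0.5):
        """Constant-time lookup (independent of table size), then blend the
        retrieved vector into the hidden state with a fixed gate."""
        mem = self.table[self.slot(token_ids)]
        return [h + gate * m for h, m in zip(hidden, mem)]


if __name__ == "__main__":
    mem = HashedNgramMemory()
    hidden = [0.0, 0.0, 0.0, 0.0]
    print(mem.read([101, 7, 42], hidden))  # depends only on the last 2 IDs (7, 42)
    print(mem.read([999, 7, 42], hidden))  # same slot, so the same result
```

The point of the design is the separation: growing the table adds stored knowledge without adding per-token compute, which is how the papers frame "decoupling knowledge storage from neural computation."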
One year after the R1 release, DeepSeek's new model "MODEL1" revealed
Xin Lang Cai Jing· 2026-01-21 04:05
Core Insights
- DeepSeek has revealed a new model architecture named "MODEL1" in its FlashMLA software, a library designed to optimize large-model inference on NVIDIA GPUs [1][2].
- MODEL1 is expected to be a highly efficient inference model with lower memory usage than the existing V3.2, making it suitable for edge devices and cost-sensitive applications [2].
- The company is expected to launch its next flagship AI model, DeepSeek V4, in mid-February 2026, with improved coding capabilities [3].
Group 1
- Across the 114 code files in the FlashMLA update, MODEL1 is mentioned 31 times [1].
- MODEL1 has dedicated implementations for several GPU architectures, including NVIDIA H100/H200 and B200, indicating optimization tailored to the latest GPUs [2].
- DeepSeek's existing models follow two technical routes: the V series, focused on overall performance, and the R series, targeting complex reasoning tasks [2].
Group 2
- The V3 model, launched in December 2024, set a strong performance baseline with its efficient mixture-of-experts (MoE) architecture, and rapid iterations followed, leading to V3.2 [3].
- The R1 model, released in January 2025, excels at complex reasoning through reinforcement learning and introduced a "deep thinking" mode [3].
- Recent DeepSeek technical papers suggest that new models under development may integrate new training methods and AI memory modules [3].
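The efficient MoE architecture credited to V3 refers to mixture-of-experts layers, where a router sends each token to only a few expert sub-networks, so per-token compute stays small while the total parameter count grows. The pure-Python sketch below shows generic top-k softmax routing. The toy experts, router weights, and `k` are assumptions, and DeepSeek-V3's actual design (shared experts, sigmoid-based affinity scores, auxiliary-loss-free load balancing) is more involved.

```python
import math


def moe_forward(x, router_weights, experts, k=2):
    """Route input vector x to the top-k experts and mix their outputs.

    router_weights: one weight vector per expert (router logit = dot product with x).
    experts: list of callables, each mapping a vector to a vector of the same size.
    Returns the gated output and the indices of the experts that ran.
    """
    logits = [sum(w * xi for w, xi in zip(wv, x)) for wv in router_weights]
    top = sorted(range(len(experts)), key=lambda e: logits[e], reverse=True)[:k]
    # Softmax over the selected experts only, so the gates sum to 1.
    m = max(logits[e] for e in top)
    exps = {e: math.exp(logits[e] - m) for e in top}
    total = sum(exps.values())
    out = [0.0] * len(x)
    for e in top:
        y = experts[e](x)  # only the selected experts are evaluated
        g = exps[e] / total
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, top


if __name__ == "__main__":
    # Toy experts that just scale their input by 1, 2, 3, or 4.
    experts = [lambda v, s=s: [s * vi for vi in v] for s in (1.0, 2.0, 3.0, 4.0)]
    router = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]
    out, top = moe_forward([1.0, 0.5], router, experts, k=2)
    print("experts used:", top, "output:", out)
```

With four experts and `k=2`, half of the expert computation is skipped for each token; production MoE models apply the same idea with hundreds of experts and a small `k`.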
Hugging Face revisits the "DeepSeek moment": how has Chinese AI reshaped the global open-source landscape over the past year?
Hua Er Jie Jian Wen· 2026-01-21 02:41
Core Insights
- The article examines the impact of DeepSeek-R1's release on the global open-source AI ecosystem, calling it a pivotal moment for China's AI development and its worldwide influence [1][3].
Group 1: Transformation of the AI Landscape
- The January 2025 release of DeepSeek-R1 is described as a watershed that lowered technical and application barriers and shifted Chinese AI from closed-source toward open-source models [1][5].
- Major Chinese tech companies such as Baidu, Alibaba, and Tencent, along with startups such as Moonshot AI, sharply increased their open-source investment, and Chinese models have overtaken U.S. models in downloads on Hugging Face [1][6].
Group 2: Breaking Down Barriers
- DeepSeek-R1 dismantled three critical barriers (technical, adoption, and psychological), turning open source from a tactical choice into a long-term strategy for Chinese tech companies [3][5].
- Competition has shifted from individual model performance to ecosystem building, with companies now prioritizing engineering systems and application scenarios [6][10].
Group 3: Market Dynamics and Global Response
- The rise of Chinese AI models is not simply the product of collaboration; it is driven by shared technological, economic, and regulatory pressures that have aligned companies' competitive strategies [8][11].
- Globally, many startups and researchers now default to Chinese-developed models, highlighting the growing influence of Chinese AI in international markets [11].
DeepSeek's new model MODEL1 revealed; Pictet continues to invest in tech stocks
Mei Ri Jing Ji Xin Wen· 2026-01-21 01:21
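[Market Recap]
[Popular ETFs]
The Robot ETF (562500) is the only robotics-themed ETF in the market with assets above 20 billion yuan, the best liquidity, and the most complete coverage of China's robotics industry chain, letting investors build a position across China's robotics industry in a single trade.
2. Huang Siyuan, head of multi-asset for Hong Kong at Pictet, said the firm will keep investing in tech stocks: although Apple and Microsoft have somewhat underperformed the broader market, many tech companies are doing well. In technology, the U.S. market is currently focused on "delivering now," whereas China is somewhat different, since spending on items such as robots is a longer-term purchase. So far this market has shown no excessive or irrational exuberance.
3. Deloitte's "2026 Technology, Media and Telecommunications Predictions" report states that AI is redefining the foundations of the hardware, software, telecom, and media industries. The global installed base of industrial robots is expected to reach 5.5 million units in 2026, with relatively moderate annual growth; annual installations are not expected to pass the key milestone of 1 million units until after 2030.
[Institutional Views]
China Merchants Securities believes that Zhenyu Technology (300953)'s niche mold business is growing steadily, and new products in its iron core segment are starting to ramp up, which should return the segment to healthy growth. Its structural components business, the largest by revenue, has turned around as expected and should sustain accelerating growth. The robotics segment the company is actively building is progressing well in the domestic market, and a breakthrough with major overseas customers may follow.
On Tuesday (January 20), the ChinaAMC STAR Market AI ETF (58 ...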
DeepSeek's new model MODEL1 revealed
Jin Rong Jie· 2026-01-20 23:59
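On the first anniversary of DeepSeek-R1's release, a new model, "MODEL1", has surfaced. DeepSeek updated the FlashMLA code on GitHub, and across 114 files there are 28 mentions of MODEL1, which appears as a model separate from V32. V32 is known to be DeepSeek-V3.2, so MODEL1 is likely a new architecture. The specific differences in the code lie in the KV cache layout, sparsity handling, and FP8 decoding, with several differences in memory optimization. Earlier reports said DeepSeek would release its next-generation flagship model around the Spring Festival in mid-February. ...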