qzmjef posted on 2024-7-30 20:10:33

Eight Takeaways from Google Gemini


    <div style="color: black; text-align: left; margin-bottom: 10px;">
      <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-axegupay5k/bb8edb175acb413b8d4621bd75df1ed2~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1722922607&amp;x-signature=APMtc7Z%2B2wcJzWJxuSzsDuwPei8%3D" style="width: 50%; margin-bottom: 20px;"><strong style="color: blue;">Authors | Gao Jia (高佳), Li Wei (李维)</strong><span style="color: black;"><strong style="color: blue;">Concept | Li Zhifei (李志飞)</strong></span>
<span style="color: black;">In 1948 the English physician Ross Ashby, inspired by psychiatric patients, invented an odd machine, the "Homeostat", and declared that this device, which cost about £50 to build, was "the closest thing to an artificial brain ever designed by man."</span>
<span style="color: black;">The Homeostat was built on four bomb-control switch-gear units that the Royal Air Force had used in World War II, topped by four cubic aluminium boxes. Its only visibly moving parts were four small magnetic needles, one on top of each box, swinging like compass needles in small troughs of water.</span>
<span style="color: black;">When the machine was switched on, electric currents from the boxes moved the needles, which were always held in a sensitive and fragile equilibrium. The Homeostat's sole function was to keep all four needles centred, the state in which the machine was "comfortable".</span>
<span style="color: black;">Ashby tried various ways of making the machine "uncomfortable", such as reversing the polarity of the wiring or the direction of the needles, but it always found a way to adapt to the new condition and swing the needles back to centre. In Ashby's words, the machine "actively" resisted, through its synapses, any attempt to disturb its balance, carrying out "coordinated activity" to regain equilibrium.</span>
<span style="color: black;">Ashby believed that one day this "crude device" would evolve into an artificial brain "more powerful than any human", able to solve all the world's complex and intractable problems.</span>
<span style="color: black;">Ashby knew nothing of today's progress toward AGI, and four small needles are laughable as sensors when measured against what intelligence requires. Yet at the level of first principles the machine challenged everyone's understanding of "intelligence": is intelligence not precisely the ability to take in information of several modalities from the environment, adjust behaviour according to feedback, and carry out tasks?</span>
<strong style="color: blue;"><span style="color: black;">From the odd Homeostat to today, 75 years later: Gemini, billed as the first model whose multimodal task performance surpasses humans, is fed natively multimodal big data and is accelerating toward what took carbon-based intelligence billions of years to evolve.</span></strong>
<span style="color: black;">Machine intelligence is now evolving far faster than we imagined.</span>
<span style="color: black;">A year ago, OpenAI overturned the AI chess game that Google had been setting up for years, building a Tower of Babel of human language through "brute-force aesthetics".</span>
<span style="color: black;">A year later, Google unveiled Gemini, "fighting force with force" to build a unified cross-modal model, another node that accelerates the evolution toward AGI.</span>
<span style="color: black;">Although Gemini was dogged from day one by doubts that its demo video was exaggerated, it is undeniable that unified multimodality has shown its first gleam.</span>
<span style="color: black;">What abilities has Gemini, the "twins" whose name suggests perceptiveness and keen curiosity, actually demonstrated? How will the gears of Google's fate turn? Is time on OpenAI's side or on Google's? What does multimodality mean for Agents and embodied intelligence? Is the foundation for the emergence of an AGI with autonomous consciousness already in place? How should we read Gemini's lessons for the future?</span>
            <h1 style="color: black; text-align: left; margin-bottom: 10px;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;"><span style="color: black;"><strong style="color: blue;"><strong style="color: blue;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;"><span style="color: black;"><strong style="color: blue;"><strong style="color: blue;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;">01.</span></strong></span></strong></span></strong></strong></span></span></strong></span></strong></span></strong></strong></span></span></strong></span></h1>
            <h2 style="color: black; text-align: left; margin-bottom: 10px;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;">Large models' cross-modal knowledge transfer is demonstrated once again</span></strong></span></h2>
<span style="color: black;">For humans, the ability to transfer knowledge matters more than learning any single skill: it carries across fields and across time and place. If machines learn cross-modal knowledge transfer, "general" intelligence comes within easier reach.</span>
<span style="color: black;">In July 2023 Google released RT-2, a robot system built on a large model, which gave people a glimpse of general-purpose robots.</span>
<span style="color: black;">Drawing on the language model's "common sense", the robotic arm could pick up "the extinct animal" from a table. Going from common-sense reasoning to robot execution, it demonstrated cross-modal knowledge transfer.</span>
<span style="color: black;">In December 2023 Gemini, a heavyweight effort from a giant, confirmed this ability of large models again:</span>
<strong style="color: blue;"><span style="color: black;">the "common sense" of a language model can transfer to the training of other, non-language modalities that are added later.</span></strong>
<strong style="color: blue;"><span style="color: black;">The language model is the foundation of cognitive intelligence, and the most basic cognitive intelligence is "common sense".</span></strong>
<span style="color: black;">Without common sense, many real-world deployments of multimodal large models would be hard to achieve.</span>
<span style="color: black;">Gemini smoothly transfers the "common sense" it learned from the internet to downstream multimodal tasks. Like RT-2, it achieves cross-modal integration by transferring knowledge from internet text: Gemini can connect abstract language concepts to its understanding of auditory and visual objects, and even to Action, becoming a system in which intelligence is put to work.</span>
<span style="color: black;">From a training perspective, the language model is trained on massive internet data, whereas its downstream models (such as robot models) can be trained on small amounts of data thanks to knowledge transfer. This staged training addresses the scarcity of downstream data that has troubled academia for years.</span>
<span style="color: black;">For example, to achieve the effect shown in the video (the demo raised doubts about Gemini's video understanding, but that does not affect the discussion of cross-modal knowledge transfer), Gemini first needs some ontological knowledge: it knows the concept of a duck as a kind, what colour ducks usually are, and what blue is. Only then, on seeing a "blue duck", does it react much as a human would and express the "common sense" that blue ducks are not common.</span>
<img src="https://p3-sign.toutiaoimg.com/tos-cn-i-tjoges91tu/34fa237ea1007cd192f70b1a7d075484~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1722922607&amp;x-signature=vIpauamddr9UH%2Bje6MPjGSKTHrI%3D" style="width: 50%; margin-bottom: 20px;">
<span style="color: black;">Through sound and sight, Gemini perceives that the blue duck is made of rubber, and it knows that rubber is less dense than water. Based on this common sense and reasoning, on hearing the squeak it can predict that "the blue duck can float on water".</span>
<img src="https://p3-sign.toutiaoimg.com/tos-cn-i-tjoges91tu/e63a554ef5b401a63f65d04c2a92a4ff~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1722922607&amp;x-signature=bMUkMs9kcgdyGJ%2B5h0NJGR6xzjM%3D" style="width: 50%; margin-bottom: 20px;">
<span style="color: black;">From RT-2 to Gemini, from single-modality capability to the "fusion" of multimodal perceptual and cognitive intelligence, from separate "five senses" modules for eye, ear, mouth, nose and body to a complete, integrated digital "person":</span>
<span style="color: black;">does this not suggest that, on the road to simulating human intelligent behaviour, the "grand unification" of models is the right path?</span>
            <h1 style="color: black; text-align: left; margin-bottom: 10px;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;"><span style="color: black;"><strong style="color: blue;"><strong style="color: blue;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;"><span style="color: black;"><strong style="color: blue;"><strong style="color: blue;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;">02.</span></strong></span></strong></span></strong></strong></span></span></strong></span></strong></span></strong></strong></span></span></strong></span></h1>
            <h3 style="color: black; text-align: left; margin-bottom: 10px;"><strong style="color: blue;"><span style="color: black;">A unified multimodal model at last outperforms single-modality models tuned for one task</span></strong></h3>
            <h3 style="color: black; text-align: left; margin-bottom: 10px;"><span style="color: black;">Humans perceive, cognize, and develop emotion and consciousness by integrating multiple senses. Gemini likewise takes in multiple modalities, processes them together in one "brain", and emits multiple modalities as output. This kind of model's comprehensive "simulation" of human intelligence is evolving ever faster.</span></h3>
<span style="color: black;">Earlier multimodal training produced something more like a combined system of separate eyes, ears, arms and brain, with weak overall coordination.</span>
<span style="color: black;"><strong style="color: blue;">The direction Gemini represents clearly feels like a large model becoming a complete digital person: a silicon-based whole whose hands, eyes, brain and mouth work in concert.</strong></span>
<strong style="color: blue;"><span style="color: black;">Gemini is the first truly end-to-end multimodal model.</span></strong>
<span style="color: black;">Previously, a model optimized for a single modality usually outperformed a model handling several modalities at once, so single-modality training was the norm. Even GPT-4 "stitches" different modalities into the whole rather than being one unified multimodal model.</span>
<span style="color: black;">What is especially exciting about Gemini is that it was designed from the outset as a natively multimodal architecture, with data of all modalities interleaved throughout training from the start. If earlier large models attached senses or a robotic arm to the outside of a brain, this one grows its own eyes, ears and arms directly from its body and uses them freely.</span>
<strong style="color: blue;"><span style="color: black;">In model architecture, training process and final presentation alike, Gemini makes the fusion of modalities genuinely seamless.</span></strong>
<span style="color: black;">Gemini shows us for the first time that one unified model can handle every modality, and can even outperform models dedicated to a single modality!</span>
<span style="color: black;">For example, compared with Whisper, a model optimized specifically for speech recognition, Gemini is markedly more accurate.</span>
<strong style="color: blue;"><span style="color: black;">This signals the dawn of the unified multimodal era.</span></strong>
<img src="https://p3-sign.toutiaoimg.com/tos-cn-i-tjoges91tu/143042d6577e3a89d8464476a5febcd7~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1722922607&amp;x-signature=W7sLLPsRBui64PYFe%2FztrrSlJF8%3D" style="width: 50%; margin-bottom: 20px;">
<span style="color: black;">In fact, Gemini is not the first model to verify that modalities can help each other improve. PaLM-E showed this too: "PaLM-E trained across different domains, including internet-scale general vision-language tasks, performs significantly better than robot models that perform a single task."</span>
<strong style="color: blue;"><span style="color: black;">Another example of mutual reinforcement across modalities is the multilingual ability of large language models.</span></strong>
<span style="color: black;">If we treat the world's languages as fine-grained "modalities", the practice of large language models shows that processing the raw data of all languages uniformly (tokenization and embedding) is what built the Tower of Babel of human language.</span>
<span style="color: black;">Training on an overwhelming volume of English data also benefits the model's understanding and generation of languages with far fewer samples; this transfer of linguistic knowledge has been confirmed repeatedly.</span>
<span style="color: black;">It is like a skilled tennis player who, by analogy, also gets better at squash or golf.</span>
<span style="color: black;">Since large models took off in February 2023, many people have come to believe that "unified multimodal models will surpass single-modality models", but this belief had never been confirmed at scale. Google's Gemini now shows that it can be realized, and has led more people to reshape and strengthen that belief.</span>
<span style="color: black;">In future, standalone specialized recognition models for speech recognition, machine translation and the like may no longer make much sense, and many generative tasks such as TTS and image generation will also be absorbed into large models. Some may complain that large models are too expensive and too slow to suit every application, but cost and speed are largely engineering problems; in practice we can distill the unified multimodal model down to a specific modality or scenario.</span>
<strong style="color: blue;"><span style="color: black;">We firmly believe that unified cross-modal large models will become the main road to AGI.</span></strong>
<span style="color: black;">Extending further, "modalities" are not only sound, images and video. Sensors for smell, taste, touch, temperature, humidity and so on are also different modalities for acquiring information about the environment, and a unified model will take them all in.</span>
<span style="color: black;">In essence, every modality is merely a carrier of "information": a rendering, a form of presentation, a means by which an agent interacts with the physical world. In the eyes of a unified model, all modalities can be represented internally by the same multi-dimensional vectors, which enables cross-modal knowledge transfer as well as the crossing, alignment, fusion and reasoning of information.</span>
<strong style="color: blue;"><span style="color: black;">When the barriers between modalities are broken through and the renderings are peeled away, we see the starting point of cognition: language.</span></strong>
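<span style="color: black;">The idea in this section that every modality can be carried by the same multi-dimensional vectors, with modalities interleaved in one training sequence, can be pictured with a toy sketch. This is an illustration under stated assumptions, not Gemini's actual architecture: <code>EMBED_DIM</code>, <code>Segment</code>, <code>project</code> and <code>interleave</code> are hypothetical names, and <code>project</code> is a fixed stand-in for what would be a learned projection in a real model.</span>

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

EMBED_DIM = 4  # hypothetical width of the shared embedding space


@dataclass
class Segment:
    """One run of same-modality units (tokens, image patches, audio frames)."""
    modality: str
    features: Sequence[Sequence[float]]  # one raw feature vector per unit


def project(vec: Sequence[float], dim: int = EMBED_DIM) -> List[float]:
    """Toy stand-in for a learned projection: fold a raw vector into `dim` slots."""
    out = [0.0] * dim
    for i, value in enumerate(vec):
        out[i % dim] += value
    return out


def interleave(segments: Sequence[Segment]) -> List[Tuple[str, List[float]]]:
    """Flatten segments, in document order, into one sequence of same-width vectors."""
    sequence = []
    for segment in segments:
        for vec in segment.features:
            sequence.append((segment.modality, project(vec)))
    return sequence


if __name__ == "__main__":
    doc = [
        Segment("text", [[1.0, 2.0]]),
        Segment("image", [[1, 1, 1, 1, 1, 1]]),
        Segment("text", [[0.5]]),
    ]
    for modality, vector in interleave(doc):
        print(modality, vector)
```

<span style="color: black;">After projection, a downstream sequence model sees text and image units as vectors of the same width in a single stream; the modality tag is kept here only so the example is easy to inspect.</span>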
            <h1 style="color: black; text-align: left; margin-bottom: 10px;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;"><span style="color: black;"><strong style="color: blue;"><strong style="color: blue;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;"><span style="color: black;"><strong style="color: blue;"><strong style="color: blue;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;">03.</span></strong></span></strong></span></strong></strong></span></span></strong></span></strong></span></strong></strong></span></span></strong></span></h1>
            <h1 style="color: black; text-align: left; margin-bottom: 10px;"><strong style="color: blue;"><span style="color: black;">Language is the core and the main thread of the unified model</span></strong></h1>
<span style="color: black;">In the AGI system we imagine, is the core and main thread vision or language? Some say vision, but we believe it is language.</span>
<span style="color: black;">In his writings on linguistics, Stalin is said to have remarked: "Every lower organism has its own language."</span>
<span style="color: black;">But however many layers of variation those have, they are not true language. True language is unique to humans: it includes invented scripts, symbols and subjectively assigned meanings, which combine into countless expressions and carry the cognitive evolution and accumulated knowledge of humanity over thousands of years.</span>
<span style="color: black;">Language is the starting point and wellspring of cognition. Human language encodes humanity's highly abstract cognitive ability, whereas audio, images and video are more sensory: they express human emotion and concrete imagery and lean toward capturing perception.</span>
<span style="color: black;">Once humans acquired cognition and added the more sensory, perceptual expression of audio, images and video, moving from perception to cognition and from emotion to logic, that is the state of the human brain. Unified multimodality is the same: when the gap is filled in during information processing and reasoning, integration follows naturally.</span>
<strong style="color: blue;"><span style="color: black;">In both RT-2 and Gemini, language is the main thread.</span></strong>
<span style="color: black;">In RT-2, for example, the parameter count and data volume devoted to the language modality far exceed those of the downstream image and action modalities.</span>
<span style="color: black;">We predict that any future AI system, whether or not its task is linguistic, will take a language model as its foundation model and the starting point of training, then continue training on data from other modalities or tasks, and will inherit to some degree the language model's strong cognitive ability.</span>
<strong style="color: blue;"><span style="color: black;">If that really comes to pass, it may be the language model's greatest contribution to AI, because it will have truly fulfilled what researchers intended it to be: a Foundation Model.</span></strong>
            <h1 style="color: black; text-align: left; margin-bottom: 10px;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;"><span style="color: black;"><strong style="color: blue;"><strong style="color: blue;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;"><span style="color: black;"><strong style="color: blue;"><strong style="color: blue;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;">04.</span></strong></span></strong></span></strong></strong></span></span></strong></span></strong></span></strong></strong></span></span></strong></span></h1>
            <h3 style="color: black; text-align: left; margin-bottom: 10px;"><strong style="color: blue;"><span style="color: black;">The "brute-force aesthetics" methodology for large models is now consensus</span></strong></h3>
<span style="color: black;">Looking back, OpenAI's initial victory was not mainly an algorithmic innovation but a victory of "brute-force aesthetics".</span>
<span style="color: black;">Today "brute-force aesthetics" has become a methodology for doing AI in industry. Specifically, it shows up in two areas: technology and organization.</span>
<strong style="color: blue;"><span style="color: black;">On the technical side, the basic methodology of large models, with GPT as the representative, is to keep the model architecture very simple and pour the effort into data and compute.</span></strong>
<span style="color: black;">It looks simple, but before OpenAI succeeded with GPT-3, many people found it hard to believe that a plain Decoder-only architecture, plus an objective function that optimizes Next-token prediction, self-trained on massive unsupervised internet data, could handle all kinds of AI tasks and so move toward artificial general intelligence. Only OpenAI held to this belief and made it work in engineering terms.</span>
<strong style="color: blue;"><span style="color: black;">On the organizational side, OpenAI's approach is for everyone to work on one general model rather than letting a hundred flowers bloom.</span></strong>
<span style="color: black;">Before large models appeared, much AI research was workshop-style: a few researchers with a few interns building one system for one specific task. Research topics were also very concrete, such as TTS, ASR, machine translation or vision, rather than general models like today's large models.</span>
<span style="color: black;">This workshop-style organization used to be typical of the research labs at Google and Microsoft, where a research team of several hundred ran dozens of projects on different topics in parallel. OpenAI, by contrast, truly believed in "brute-force aesthetics" and was also resource-constrained, so it made the counterintuitive choice of putting several hundred people all in on a single GPT model.</span>
<strong style="color: blue;"><span style="color: black;">The essence of "brute-force aesthetics" is extreme simplicity and focus, then repetition and amplification through scale.</span></strong>
<span style="color: black;">Scale covers model parameters, data, compute and people. As parameter counts and training data keep growing, performance shows the "emergence" everyone now knows.</span>
<span style="color: black;">Google invented most of the key underlying technologies that today's large models rely on, such as the Transformer architecture, Instruction Tuning, CoT and Mixture of Experts, yet it was OpenAI that used them to practise the "brute-force aesthetics" of the large-model era and left Google unable to respond.</span>
<span style="color: black;">Further reading:</span><span style="color: black;"> <span style="color: black;">How did OpenAI overturn the AI chess game Google spent years setting up?</span></span>
<span style="color: black;">With the release of Gemini, people realized that Google, too, may have reached an internal consensus on the "brute-force aesthetics" methodology.</span>
<span style="color: black;">Now that Google, the sleeping lion with far greater resources, has woken, embraced and mastered the methodology and focused its strength on one point, might greater resources produce a greater miracle?</span>
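<span style="color: black;">The Next-token prediction objective mentioned in this section is, at its simplest, the average negative log-likelihood of each token given the tokens before it. The sketch below computes that loss from raw scores using only the Python standard library. It is an illustration under stated assumptions: <code>log_softmax</code> and <code>next_token_loss</code> are hypothetical helper names, the scores are supplied by hand rather than produced by a Decoder-only network, and no claim is made about how OpenAI implements its training.</span>

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log of the softmax probabilities."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_norm for x in logits]


def next_token_loss(
    logits_per_position: Sequence[Sequence[float]],
    token_ids: Sequence[int],
) -> float:
    """Average negative log-likelihood of each token given its prefix.

    logits_per_position[t] holds the scores over the vocabulary for the
    token at position t + 1, as predicted from token_ids[: t + 1].
    """
    targets = token_ids[1:]
    if not targets:
        raise ValueError("need at least two tokens to form a prediction target")
    if len(logits_per_position) != len(targets):
        raise ValueError("expected one row of logits per predicted token")
    total = 0.0
    for logits, target in zip(logits_per_position, targets):
        total -= log_softmax(logits)[target]
    return total / len(targets)


if __name__ == "__main__":
    uniform_scores = [[0.0] * 4, [0.0] * 4]
    print(next_token_loss(uniform_scores, [0, 1, 2]))
```

<span style="color: black;">With uniform scores over a 4-token vocabulary the loss is ln 4 ≈ 1.386; training lowers it by raising the score of whichever token actually comes next. The appeal the section describes is that this one objective needs no task-specific labels, only more text.</span>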
            <h1 style="color: black; text-align: left; margin-bottom: 10px;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;"><span style="color: black;"><strong style="color: blue;"><strong style="color: blue;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;"><span style="color: black;"><strong style="color: blue;"><strong style="color: blue;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;">05.</span></strong></span></strong></span></strong></strong></span></span></strong></span></strong></span></strong></strong></span></span></strong></span></h1>
            <h3 style="color: black; text-align: left; margin-bottom: 10px;"><strong style="color: blue;"><span style="color: black;">Google, the sleeping lion, has woken, and the gears of the brute-force machine begin to turn</span></strong></h3>
<span style="color: black;">With Gemini's arrival it is clear that Google has caught up in this top-level contest.</span>
<span style="color: black;">With a clear consensus on "brute-force aesthetics", Google, that earnest engineering machine, is a competitor not to be underestimated once it turns "brute-force".</span>
<span style="color: black;">First, Google has finally learned to "work miracles with brute force" at the organizational level. The author list of the Gemini technical report runs a full nine pages at more than 90 names per page, over eight hundred people in all, already more than OpenAI's entire headcount.</span>
<img src="https://p3-sign.toutiaoimg.com/tos-cn-i-tjoges91tu/7e9fe4c59445593e04bb2c46edec4bff~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1722922607&amp;x-signature=sHZZ03%2FENh2binQQXKXbUJNbKds%3D" style="width: 50%; margin-bottom: 20px;">
<span style="color: black;">For Google, which has ten times as many researchers as OpenAI, moving from its habitual bottom up style to top down is as hard as one would expect. The organization had to ignite a strongly unified sense of mission and then quickly adjust strategy and structure, including merging its two major AI labs, Google Brain and DeepMind, into the new unit Google DeepMind and staging its own Avengers assembly.</span>
<span style="color: black;">The organizational engineering behind "brute-force aesthetics" resembles the Manhattan Project and needs a leader who is its soul. It has to settle the central organizational questions: how to coordinate multiple teams, where to put the emphasis, and whether two teams should attack the problem separately or merge and collaborate. Even a company as large as Google, faced with enormous resource demands, must choose carefully where to invest.</span>
<span style="color: black;">How to allocate resources effectively, concentrate on achieving one defined goal after another, and do so at scale is a challenge for every leader. Hassabis, as a strong leader, has shown both his own leadership ability and the organizational depth of a company as large as Google.</span>
<span style="color: black;">Beyond strong organization and a high density of top talent, Google has a unique lead in data scale and user scale, and it is the undisputed king of distributed computing.</span>
<span style="color: black;">Alongside Gemini, Google also announced Cloud TPU v5p, its most efficient and most scalable TPU system to date, to support training frontier AI models. The new TPU generation will speed up Gemini's development and help developers and enterprise customers train large-scale generative AI models faster, and so ship new products and features sooner.</span>
<span style="color: black;">The full-stack ecosystem Google has built up over years, and its product lines with hundreds of millions of users, provide fertile ground for deploying a unified model. This gives Google the most confidence in facing the complementary alliance of Microsoft and OpenAI.</span>
<span style="color: black;">This time Gemini comes in three versions: (1) Gemini Ultra, for highly complex tasks; (2) Gemini Pro, the best model for a wide range of tasks; (3) Gemini Nano, for on-device use (such as phones).</span>
<img src="https://p3-sign.toutiaoimg.com/tos-cn-i-tjoges91tu/3bd095716227f6377f4fe80684ec1c90~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1722922607&amp;x-signature=7WHRUAa8wMXCBbKDTBzVFQOpxdM%3D" style="width: 50%; margin-bottom: 20px;">
            <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">So, given Google's strength in talent, data, compute, users and the other essentials of "brute-force aesthetics", as long as it keeps pace, once the gears of the brute-force machine start turning it may well take the script of the AI arena somewhere entirely new.</span></p>
            <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The days of OpenAI racing far ahead with no rival in sight are starting to change.</span></p><span style="color: black;">Further reading:</span><span style="color: black;"> <span style="color: black;">How does a besieged Google take on large models?</span></span>
            <h1 style="color: black; text-align: left; margin-bottom: 10px;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;"><span style="color: black;"><strong style="color: blue;"><strong style="color: blue;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;"><span style="color: black;"><strong style="color: blue;"><strong style="color: blue;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;">06.</span></strong></span></strong></span></strong></strong></span></span></strong></span></strong></span></strong></strong></span></span></strong></span></h1>
            <h3 style="color: black; text-align: left; margin-bottom: 10px;"><strong style="color: blue;"><span style="color: black;">Time will ultimately be on AGI's side</span></strong></h3>
<strong style="color: blue;"><span style="color: black;">In the competition ahead, whose side is time really on: OpenAI's or Google's?</span></strong>
<span style="color: black;">So far, OpenAI has enjoyed the huge momentum of being first. But undeniably, while pursuing AGI it must also face growth bottlenecks, pressure to commercialize and hard questions from investors (Microsoft is rumoured to require that OpenAI always stay six months ahead of Google). Under such pressure, missteps are hard to avoid.</span>
<span style="color: black;">The recent boardroom drama at OpenAI badly damaged the company. Sam said it delayed OpenAI's AGI dream by only 5 days, but in the AI race standing still means falling behind, and in the contest with Google it cost at least several months.</span>
<span style="color: black;">Now that the Google lion is awake, OpenAI will face greater competitive pressure. More importantly, the conflict between OpenAI's non-profit mission and its need to raise vast sums still cannot be resolved at the root. It is like a time bomb, and the mix of competition and cooperation with Microsoft is unusually delicate too.</span>
<span style="color: black;">Distorted by pressure, the internal dispute over direction at OpenAI (effective accelerationism vs. superalignment) is even more likely to intensify. Other black-swan events may follow; that is not rare in capital-intensive technology startups, as the stories of many autonomous-driving companies show.</span>
<span style="color: black;">Google, by contrast, is a mature and stable giant, without OpenAI's fragile board structure and the underlying tension between non-profit and capital, and without delicate entanglements with investors.</span>
<span style="color: black;">With its deep resources, it has a crushing advantage over OpenAI in developers, data, compute and user scale. Once it embraces and masters the "brute-force aesthetics" methodology, it is like a giant machine whose late-mover advantage may show more and more over time. So, from a competitive standpoint, perhaps time is more on Google's side?</span>
<span style="color: black;">Google's risk, of course, lies in big-company disease, and in the possibility that a full turn to "brute-force aesthetics" brings excessive top-down control and over-concentration of resources on developing a single model, washing away the bottom-up, hundred-flowers culture of innovation on which Google's past success rested.</span>
<span style="color: black;">OpenAI will certainly fight back with everything it has to keep its leadership in AGI. Gemini will force out an even more astonishing GPT-5, and Google, with the gears of fate turning, will go on to unveil Gemini 2.0... In this arms race AGI will advance ever faster; Google and OpenAI alike are, each in its own way, driving AGI forward in a spiral of fierce competition.</span>
<span style="color: black;">The wheel of AGI history is rolling forward, and time will ultimately be on AGI's side.</span>
            <h1 style="color: black; text-align: left; margin-bottom: 10px;"><span style="color: black;"><strong style="color: blue;">07.</strong></span></h1>
            <h3 style="color: black; text-align: left; margin-bottom: 10px;"><strong style="color: blue;"><span style="color: black;">Multimodality is the foundation of agents and embodied intelligence</span></strong></h3>
            <h3 style="color: black; text-align: left; margin-bottom: 10px;"><span style="color: black;">Norbert Wiener, the father of cybernetics, looked to the future in <em>Cybernetics</em>: "Human capabilities have now been greatly extended by machines. Radar extends the human eye, the jet engine or the tire extends the human limbs, and the autopilot is the nervous system that connects them."</span></h3>
            <h3 style="color: black; text-align: left; margin-bottom: 10px;"><span style="color: black;">Today's large language models can encode rich semantic knowledge about the world. Their obvious weakness is a lack of grounding, which makes "hallucination" unavoidable.</span></h3>
            <h3 style="color: black; text-align: left; margin-bottom: 10px;"><span style="color: black;">Multimodality itself provides the basis for grounding. Only with that basis can an agent interact with a multimodal environment and receive the feedback it needs, which in turn makes autonomous planning more reliable.</span></h3>
            <h3 style="color: black; text-align: left; margin-bottom: 10px;"><span style="color: black;">Robots and other embodied intelligent systems are also agents. The difference is that they are not virtual: they are entities with a physical body, with "hands and eyes," that can carry out concrete tasks in the physical world. Multimodality is therefore the foundation of agents and embodied intelligence, and also a necessary condition for reducing hallucination.</span></h3>
            <span style="color: black;">Hassabis has revealed that Google DeepMind is already studying how to combine Gemini with robotics so that it can interact physically with the world. After all, being truly multimodal also requires touch and tactile feedback.</span>
            <span style="color: black;">This untrodden path may lead to major breakthroughs in robotics. A unified multimodal model like Gemini can become a foundation for rapid AGI innovation, advancing agents along with their planning and reasoning, as well as the interaction of physical robots with their environment.</span>
            <span style="color: black;">Agent = cognition (the brain) + perception + action. Agents and embodied intelligence need both perception and cognition. They need a brain, and they need external support as well.</span>
            <span style="color: black;">Today the picture is clear: large language models handle high-level cognition, multimodality provides the basis for grounding, agents handle autonomous planning, and embodied intelligence performs the final actions and interactions in the physical world. With this combination, all the elements of a general-purpose agent or robot appear to be in place.</span>
            <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;"><span style="color: black;">A unified cross-modal model looks like the road that must be taken. One small step for Gemini may be one giant leap for general-purpose agents and robots.</span></strong></p>
            <h1 style="color: black; text-align: left; margin-bottom: 10px;"><span style="color: black;"><strong style="color: blue;">08.</strong></span></h1>
            <h3 style="color: black; text-align: left; margin-bottom: 10px;"><strong style="color: blue;"><span style="color: black;">Are the foundations in place for the emergence of an AGI with autonomous consciousness?</span></strong></h3>
            <span style="color: black;">Around the time large models took off, AGI went from an abstract concept that most professional researchers either disdained or did not dare to be associated with, to a sudden mainstream consensus. Discussion of how AGI will arrive has been constant ever since.</span>
            <span style="color: black;">In February this year, when large models swept the world, many people believed that the "brute force" path would suffice: keep scaling up the language model and AGI would appear. That now looks unworkable.</span>
            <strong style="color: blue;"><span style="color: black;">The language model is indeed the basis of cognition and the core of intelligence, but it is only one cornerstone of AGI.</span></strong>
            <span style="color: black;">Achieving AGI will only be possible with the cooperation of many surrounding modules.</span>
            <span style="color: black;">Since April, many people have started patching around the language model, setting off a wave of enthusiasm for agents, but for now that too looks like a castle in the air. Without grounding supplied by multimodality, an agent's reasoning and planning are highly unreliable, and in many scenarios they amount to little more than a gimmick.</span>
            <span style="color: black;">The arrival of Gemini shows us the next cornerstone that the emergence of AGI requires: multimodality.</span>
            <span style="color: black;">Without multimodality, a language model is a "brain in a vat." Moreover, the emergence of AGI necessarily requires native multimodality rather than several independent models stitched together, because stitching is probably not enough to support deep, complex reasoning and seamless knowledge transfer within a unified multimodal space.</span>
            <span style="color: black;">Gemini's strong performance on multimodal tasks this time is also a powerful endorsement of unified multimodality.</span>
            <span style="color: black;">With multimodality built around a language model at its core, deploying virtual and physical agents is no longer a castle in the air. The modules added to agents, such as memory, tool use, and environment feedback, are also necessary conditions for the emergence of AGI.</span>
            <span style="color: black;">In an interview with Lex Fridman, Hassabis said that "consciousness is the way information feels when it is being processed." When the modalities of a large model blend as smoothly as human perception, and when an agent's modules together adapt freely to all kinds of environments, can we infer that the foundations for the "emergence" of autonomous machine consciousness are already in place?</span>
            <strong style="color: blue;"><span style="color: black;">Over a longer horizon, the trend may already be clear. The road to AGI is a trilogy: large language models lay the cognitive foundation; multimodality, agents, and embodied intelligence solve grounding; and an AGI with some form of autonomous consciousness will eventually "emerge."</span></strong>
            <h3 style="color: black; text-align: left; margin-bottom: 10px;"><strong style="color: blue;"><span style="color: black;">Conclusion</span></strong></h3>
            <span style="color: black;">The English writer Samuel Butler wrote a novel called <em>Erewhon</em>. One section of it, "The Book of the Machines," voices through a fictional thinker a worry about the evolution of autonomous machine consciousness: "We have no security against the ultimate development of machine consciousness. Who can say that the steam engine is a species without consciousness?"</span>
            <span style="color: black;">Clearly, machines already stand in definite relationships of inheritance, development, and evolution to one another, as in the progression from the music-box cylinder to punched paper tape, or from GPT-1 to GPT-4V.</span>
            <span style="color: black;">Can machines, then, be regarded as a "species"?</span>
            <span style="color: black;">Their evolution does require human participation, but who can say that human creation and participation are not the distinctive evolutionary strategy of this machine "species"?</span>
            <span style="color: black;">In Darwin's theory of evolution, we take for granted that the essence of "evolution" is genetic evolution at the level of protein coding, whose function is to optimize an organism's survival. But if machines can be created by humans to extend or alter all kinds of multimodal natural organs, could we say that machines are a new form of human evolution, one that replaces traditional genetic evolution as a more efficient way of changing human "traits"?</span>
            <span style="color: black;">And on the day autonomous machine consciousness evolves beyond its dependence on humans, when humanity has completed its mission of evolving AGI as a new species, can humans then leave the stage of history, as the ancient apes did?</span>
            <strong style="color: blue;"><span style="color: black;">As the last purely carbon-based generation, if we can spend the rest of our lives at the forefront of this mission, how tragic, and how fortunate!</span></strong>
            <span style="color: black;">When the towers humans built lie in ruins, and when the inscriptions on ancient monuments have weathered away until no one can read their meaning, they will be nothing more than traces left behind by some species. Millions of years of history are merely that species' continual reproduction, survival, and persistence, in essence no different from the evolution of GPT, RT-2, and Gemini today, continuing until new species are created in turn.</span>
      </div>
    </div>



