9q13nh posted on 2024-10-3 13:14:13

Microsoft's open-source GraphRAG takes off: is generative AI entering the knowledge graph era?


    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Report from Jiqizhixin (机器之心)</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;"><span style="color: black;">Editor: Panda W</span></strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Knowledge graphs never go out of style!</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">LLMs are powerful, but they also have notable shortcomings, such as hallucination, poor interpretability, missing the point of a question, and privacy and security issues. Retrieval-augmented generation (RAG) can substantially improve the quality and usefulness of LLM output.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Earlier this month, Microsoft released GraphRAG, an open-source RAG knowledge-base solution billed as the most powerful to date. The project took off as soon as it launched and has now reached 10.5k stars.</span></p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-6w9my0ksvp/992e6a3c9b10414eade51acd6ed797e2~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1728100731&amp;x-signature=VAKgg6mKykCrtoXwIF0oNsY%2BMSg%3D" style="width: 50%; margin-bottom: 20px;"></div>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Project: https://github.com/microsoft/graphrag</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Official documentation: https://microsoft.github.io/graphrag/</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Some say it is more powerful than ordinary RAG:</span></p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-6w9my0ksvp/6875882b680147bda3ce20dd70cbe5c4~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1728100731&amp;x-signature=aTli8a1DgmNh%2FkaDi%2BIHh5fnOi4%3D" style="width: 50%; margin-bottom: 20px;"></div>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">GraphRAG uses an LLM to generate a knowledge graph, which can markedly improve question-answering performance when analyzing documents that contain complex information, especially private data.</span></p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-6w9my0ksvp/444147a5e0a54abaa045bf990d394c15~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1728100731&amp;x-signature=d1n9Cq10%2F%2FB7Kkrm1qiL79flmAk%3D" style="width: 50%; margin-bottom: 20px;"></div>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Comparison of results from GraphRAG and traditional RAG</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Today, RAG is a technique for improving LLM output with real-world information, and it is an important component of most LLM-based tools. In general, RAG uses vector similarity for search, which is referred to as Baseline RAG. But Baseline RAG does not perform perfectly in some situations. For example:</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Baseline RAG struggles to connect the dots. This happens when answering a question requires traversing disparate pieces of information through their shared attributes in order to provide new, synthesized insights;</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Baseline RAG performs poorly when asked to holistically understand summarized semantic concepts over a large data collection or even a single large document.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Microsoft's GraphRAG uses an LLM to create a knowledge graph from an input corpus. This graph, together with community summaries and the output of graph machine learning, is used to augment the prompt at query time. GraphRAG shows a marked improvement on the two classes of questions above, with performance on private datasets that surpasses previous approaches.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">However, as people dug deeper into GraphRAG, they found its principles and content genuinely hard to understand.</span></p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-6w9my0ksvp/86bd99574f00409ebdc29fadd696a72e~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1728100731&amp;x-signature=Hkq46etxN7mMSeoPhNhaNSbMRYY%3D" style="width: 50%; margin-bottom: 20px;"></div>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Recently, Neo4j CTO Philip Rathle published a blog post titled "The GraphRAG Manifesto: Adding Knowledge to GenAI". In plain language, Rathle explains in detail how GraphRAG works, how it differs from traditional RAG, and what its advantages are.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">He says: "Your next generative AI application will very likely use a knowledge graph."</span></p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-6w9my0ksvp/2ae72809db4b4d02ac24b2e0f639b874~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1728100731&amp;x-signature=exx8mUVbcsVR%2BwgRnlnLYfO1OAQ%3D" style="width: 50%; margin-bottom: 20px;"></div>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Neo4j CTO Philip Rathle</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Let's look at the article.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">We are gradually coming to realize this: to do anything truly meaningful with generative AI, you cannot rely solely on an autoregressive LLM to make decisions for you.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">I know what you're thinking: "Use RAG." Or fine-tune, or wait for GPT-5.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Yes. Techniques such as vector-based retrieval-augmented generation (RAG) and fine-tuning can help, and they do handle some use cases well enough. But there is a class of use cases on which all of these techniques founder.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">For many problems, what vector-based RAG (and fine-tuning) does is essentially increase the probability of a correct answer. But neither technique can tell you how certain the answer is. They typically lack context and struggle to connect with what you already know. These tools also give you no clues for understanding the reasons behind a particular decision.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Let's turn back to 2012, when Google launched its second-generation search engine and published a landmark blog post, "Introducing the Knowledge Graph: things, not strings". They found that if, in addition to all the string processing, you used a knowledge graph to organize the things represented by the strings across all web pages, search could improve by leaps and bounds.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">A similar pattern is now emerging in generative AI. Many generative AI projects have hit a bottleneck, with the quality of their results limited by one fact: the solution is dealing in strings, not things.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Fast-forward to today, and leading AI engineers and academic researchers have rediscovered what Google found back then: the secret to breaking through this bottleneck is the knowledge graph. In other words, bring knowledge about things into statistics-based text techniques. It works like any other RAG, except that it calls a knowledge graph in addition to a vector index. That is GraphRAG! (GraphRAG = knowledge graph + RAG)</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The goal of this article is to give a comprehensive and accessible introduction to GraphRAG. Research shows that building your data into a knowledge graph and using it through RAG brings several strong advantages. A substantial body of research shows that, compared with RAG that uses only plain vectors, GraphRAG can better answer most or even all of the questions you put to an LLM.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">This advantage alone is enough to strongly drive adoption of GraphRAG.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">But there is more: because the data is visible while you build the application, development is also simpler.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">GraphRAG's third advantage is that both humans and machines can readily understand a graph and reason over it. As a result, building applications with GraphRAG is easier and yields better results, while also being easier to explain and audit (which is critical in many industries).</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">I believe GraphRAG will replace vector-only RAG as the default RAG architecture for most use cases. This article explains why.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;"><span style="color: black;">What is a graph?</span></strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">First, we need to be clear about what a graph is.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">(Note from the Chinese edition: "graph" is also often translated as 「图」, which is easily confused with concepts such as image and picture. To keep the distinction clear, the Chinese text uses only the rendering 「图谱」.)</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">A graph looks roughly like this:</span></p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-6w9my0ksvp/99d82fd630be480f9f8700c99395f603~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1728100731&amp;x-signature=SfJT8tGIeVzpbghZzlrFJVg%2Fsb8%3D" style="width: 50%; margin-bottom: 20px;"></div>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Graph example</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Although this image is often used as an example of a knowledge graph, its source and author can no longer be traced.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Or this:</span></p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-6w9my0ksvp/28244fb4cd1a419883d684f34a35628a~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1728100731&amp;x-signature=22sNgkewGk45OzIVdMl9w6uAX1c%3D" style="width: 50%; margin-bottom: 20px;"></div>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Game of Thrones character relationship graph, by William Lyon</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Or this:</span></p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-6w9my0ksvp/4e57a367157b49459ce323ab1419043a~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1728100731&amp;x-signature=t7iaGFDLwlsoR6JTV%2Biwj5tlLhc%3D" style="width: 50%; margin-bottom: 20px;"></div>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Map of the London Underground. Fun fact: Transport for London recently deployed a graph-based digital twin application to improve incident response and reduce congestion.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">In other words, a graph is not a chart.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">We won't dwell on definitions here and will simply assume you already understand what a graph is.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">If you understand the pictures above, you can probably see how the underlying knowledge graph data (stored in a graph database) could be queried and used as part of a RAG workflow. That is GraphRAG.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;"><span style="color: black;">Two ways of representing knowledge: vectors and graphs</span></strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">At the core of typical RAG is vector search: given an input chunk of text, it finds and returns conceptually similar text from the candidate written material. This automation works well, and even basic search is very useful.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">But each time you run a search, you have probably not thought about what a vector is or how the similarity calculation is implemented. Let's look at an apple. It takes different forms from the human, vector, and graph perspectives:</span></p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-6w9my0ksvp/22971455a6d84c77909538a07e5aa2e1~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1728100731&amp;x-signature=R13crBJ9CsB44zOVkMgADzJUzh4%3D" style="width: 50%; margin-bottom: 20px;"></div>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">An apple from the human, vector, and graph perspectives</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">To a human, the representation of an apple is complex and multidimensional, and its characteristics cannot be fully captured on paper. Here we can imagine, poetically, that this bright red photo represents an apple both perceptually and conceptually.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The vector representation of this apple is an array of numbers. The magic of vectors is that each one captures, in encoded form, the essence of its corresponding text. In the RAG context, though, you need vectors only when you have to determine how similar one piece of text is to another. For that, you simply run a similarity calculation and check how closely they match. But if you want to understand what is inside a vector, know which things the text represents, or see how it relates to a larger context, the vector representation cannot help.</span></p>
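    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">To make the "similarity calculation" concrete, here is a minimal sketch using cosine similarity, one common choice (the article does not say which measure is used). The 4-dimensional embeddings and the names <code>EMBEDDINGS</code> and <code>most_similar</code> are invented for illustration; real embeddings come from a model and have hundreds or thousands of dimensions.</span></p>

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings, invented for this example.
EMBEDDINGS = {
    "apple": [0.9, 0.1, 0.3, 0.0],
    "pear": [0.8, 0.2, 0.4, 0.1],
    "laptop": [0.1, 0.9, 0.0, 0.4],
}


def most_similar(query, embeddings):
    """Return the other key whose vector is closest to the query's vector."""
    q = embeddings[query]
    scored = [
        (cosine_similarity(q, vec), key)
        for key, vec in embeddings.items()
        if key != query
    ]
    return max(scored)[1]
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Note what the numbers do not tell you: the score says that two texts are close, but nothing in the arrays says why, which is the limitation described above.</span></p>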
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">By contrast, a knowledge graph represents the world declaratively, or, in AI terms, symbolically. As a result, both humans and machines can understand a knowledge graph and reason over it. This matters, and we will come back to it later.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">You can also query, visualize, annotate, modify, and extend a knowledge graph. A knowledge graph is a world model: it represents the world of the domain you are working in.</span></p>
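    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">As a rough illustration of what "declarative" means here, the sketch below stores a few facts as (subject, predicate, object) triples and queries them by pattern. The facts, the relation names, and the helpers <code>match</code> and <code>add_fact</code> are all hypothetical; a real system would keep this in a graph database and query it with a language such as Cypher.</span></p>

```python
# Hypothetical facts about an apple, invented for illustration.
TRIPLES = [
    ("Apple", "IS_A", "Fruit"),
    ("Apple", "HAS_COLOR", "Red"),
    ("Apple", "GROWS_ON", "Apple tree"),
    ("Apple tree", "IS_A", "Plant"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that match every field that is not None."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


def add_fact(triples, subject, predicate, obj):
    """Extend the graph with a new fact, skipping exact duplicates."""
    fact = (subject, predicate, obj)
    if fact not in triples:
        triples.append(fact)
    return triples
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Every fact is readable by a person and checkable by a program, so calling <code>match(TRIPLES, subject="Apple")</code> returns statements you can inspect, correct, or extend directly.</span></p>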
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;"><span style="color: black;">GraphRAG and RAG</span></strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The two are not in competition. For RAG, both vector queries and graph queries are useful. As LlamaIndex founder Jerry Liu points out, it helps to think of GraphRAG as including vectors. This is different from "vector-only RAG", which relies entirely on similarity between text embeddings.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Fundamentally, GraphRAG is RAG whose retrieval path includes a knowledge graph. As you will see below, the core GraphRAG pattern is very simple. Its architecture is the same as that of vector-based RAG, with a knowledge graph layer added.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;"><span style="color: black;">GraphRAG patterns</span></strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">A common GraphRAG pattern</span></p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-6w9my0ksvp/0fc5bd579f4549dd96425f819f461703~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1728100731&amp;x-signature=wITCpRf81eDO2cuYKmfSELWRGSU%3D" style="width: 50%; margin-bottom: 20px;"></div>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">As the figure above shows, a graph query is triggered. It can optionally include a vector similarity component. You can store the graph and the vectors separately in two different databases, or use a graph database that supports vector search, such as Neo4j.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">One common pattern for using GraphRAG is as follows:</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">1. Run a vector search or keyword search to find an initial set of nodes;</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">2. Traverse the graph to bring back information about related nodes;</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">3. (Optional) Re-rank the documents with a graph-based ranking algorithm such as PageRank.</span></p>
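    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The three steps above can be sketched in plain Python. This is a toy under stated assumptions, not how Neo4j or Microsoft's GraphRAG implements retrieval: the node names, 2-dimensional embeddings, edges, and every function name are hypothetical, and the PageRank here is a simple power iteration over the whole toy graph.</span></p>

```python
import math

# Hypothetical toy graph: node -> (embedding, text). Invented for illustration.
NODES = {
    "doc_a": ([1.0, 0.0], "Text of document A."),
    "doc_b": ([0.9, 0.1], "Text of document B."),
    "doc_c": ([0.0, 1.0], "Text of document C."),
    "entity_x": ([0.5, 0.5], "Description of entity X."),
}
# Directed edges; every target must also be a key of this dict.
EDGES = {
    "doc_a": ["entity_x"],
    "doc_b": ["entity_x", "doc_c"],
    "doc_c": [],
    "entity_x": ["doc_a"],
}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def vector_search(query_vec, nodes, k=1):
    """Step 1: the k nodes whose embeddings are most similar to the query."""
    ranked = sorted(nodes, key=lambda n: cosine(query_vec, nodes[n][0]), reverse=True)
    return ranked[:k]


def expand(seeds, edges, hops=1):
    """Step 2: follow outgoing edges for `hops` rounds, keeping discovery order."""
    seen = list(seeds)
    frontier = list(seeds)
    for _ in range(hops):
        next_frontier = []
        for node in frontier:
            for neighbor in edges.get(node, []):
                if neighbor not in seen:
                    seen.append(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return seen


def pagerank(edges, damping=0.85, iterations=50):
    """Step 3 helper: power-iteration PageRank; dangling nodes spread evenly."""
    nodes = list(edges)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = edges[node] or nodes
            share = damping * rank[node] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


def graph_rag_retrieve(query_vec, nodes, edges, k=1, hops=1):
    """Vector search, then graph traversal, then PageRank re-ranking."""
    candidates = expand(vector_search(query_vec, nodes, k), edges, hops)
    scores = pagerank(edges)
    ranked = sorted(candidates, key=lambda n: scores[n], reverse=True)
    return [nodes[n][1] for n in ranked]
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Calling <code>graph_rag_retrieve([1.0, 0.0], NODES, EDGES)</code> returns the text of the best vector match plus the text of the node it links to; those strings are what would be placed in the LLM's context window.</span></p>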
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Patterns vary by use case. Like every research direction in AI today, GraphRAG is a research-rich field, with new findings emerging every week.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;"><span style="color: black;">The GraphRAG lifecycle</span></strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">A generative AI application that uses GraphRAG follows the same pattern as any other RAG application, starting with a "create graph" step:</span></p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-6w9my0ksvp/6049a2b3881244fcbaa825e8537495a4~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1728100731&amp;x-signature=sfq3XQ%2Bo4VjKiekkIdW5pGah%2BRY%3D" style="width: 50%; margin-bottom: 20px;"></div>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The GraphRAG lifecycle</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Creating a graph is analogous to chunking documents and loading them into a vector database. Advances in tooling have made graph creation fairly simple. There are three pieces of good news:</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">1. Graphs are highly iterative: you can start with a "minimum viable graph" and extend it from there.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">2. Once your data is in a knowledge graph, it is easy to evolve. You can add more kinds of data to gain and exploit data network effects. You can also improve the quality of the data to raise the value of your application.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">3. The field is moving fast, which means graph creation will only get easier as the tooling becomes more sophisticated.</span></p>
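    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">A "minimum viable graph" that grows over time might look like the sketch below: documents are chunked, and each chunk is linked to the entities it mentions. Everything here is an assumption made for illustration. In particular, the substring match against a hand-written entity list stands in for the LLM-based extraction that real GraphRAG tooling performs, and the names <code>new_graph</code> and <code>add_document</code> are hypothetical.</span></p>

```python
from collections import defaultdict


def chunk(text, size=8):
    """Split text into chunks of at most `size` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def new_graph():
    """An empty graph: chunk texts plus entity -> set of mentioning chunk ids."""
    return {"chunks": {}, "mentions": defaultdict(set)}


def add_document(graph, doc_id, text, known_entities, size=8):
    """Add one document: store its chunks and link them to entities they name.

    Substring matching is a stand-in for real entity extraction.
    """
    for i, piece in enumerate(chunk(text, size)):
        chunk_id = f"{doc_id}#{i}"
        graph["chunks"][chunk_id] = piece
        for entity in known_entities:
            if entity.lower() in piece.lower():
                graph["mentions"][entity].add(chunk_id)
    return graph
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Because <code>add_document</code> only adds to the existing structure, you can load one document today and more documents or entity types later without rebuilding anything, which is the iterative property the list above describes.</span></p>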
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Adding the graph creation step to the earlier figure gives the following workflow:</span></p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-6w9my0ksvp/5c57d1e3a82d409392e7ef0fb922b32b~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1728100731&amp;x-signature=gb7fCWILdLHQwmDUAmxdTMRweVM%3D" style="width: 50%; margin-bottom: 20px;"></div>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Adding the graph creation step</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Now let's look at the benefits GraphRAG brings.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;"><span style="color: black;">Why use GraphRAG?</span></strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Compared with vector-only RAG, GraphRAG's advantages fall into three main categories:</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">1. Higher accuracy and more complete answers (runtime / production advantage)</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">2. Once the knowledge graph is created, building and maintaining RAG applications are both easier (development-time advantage)</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">3. Better explainability, traceability, and access control (governance advantage)</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Each of these is described in more detail below.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;"><span style="color: black;">1. Higher accuracy and more useful answers</span></strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">GraphRAG's first advantage (and the most immediately visible one) is higher response quality. In both academia and industry, there is ample evidence supporting this observation.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">One example comes from the data catalog company Data.world. In late 2023, they published a research report showing that, across 43 business questions, GraphRAG improved the accuracy of LLM responses by 3x on average. This benchmark study provides evidence that a knowledge graph can substantially improve response accuracy.</span></p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-6w9my0ksvp/63db0e9d8cae41dfa0e7e6783091bd89~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1728100731&amp;x-signature=bg%2BtsDEQ9%2Bg2BtSROJUi3EBdL%2FM%3D" style="width: 50%; margin-bottom: 20px;"></div>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">知识图谱将 LLM 响应的准确度<span style="color: black;">提高</span>了 54.2 个百分点,<span style="color: black;">亦</span><span style="color: black;">便是</span>大约<span style="color: black;">提高</span>了 3 倍</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">微软<span style="color: black;">亦</span>给出了一系列证据,<span style="color: black;">包含</span> 2024 年 2 月的一篇<span style="color: black;">科研</span>博客《GraphRAG: Unlocking LLM discovery on narrative private data》以及<span style="color: black;">关联</span>的<span style="color: black;">科研</span>论文《From Local to Global: A Graph RAG Approach to Query-Focused Summarization》和软件:https://github.com/microsoft/graphrag(即上文开篇<span style="color: black;">说到</span>的 GraphRAG)。</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">其中,<span style="color: black;">她们</span>观察到<span style="color: black;">运用</span>向量的基线 RAG 存在以下两个问题:</span></p><span style="color: black;">基线 RAG 难以将点连接起来。为了综合<span style="color: black;">区别</span>的信息来<span style="color: black;">得到</span>新见解,需要<span style="color: black;">经过</span>共享属性遍历<span style="color: black;">区别</span>的信息片段,<span style="color: black;">此时</span>候,基线 RAG 就难以将<span style="color: black;">区别</span>的信息片段连接起来。</span><span style="color: black;">当被<span style="color: black;">需求</span>全面理解在大型数据集合<span style="color: black;">乃至</span>单个大型文档上归纳总结的语义概念时,基线 RAG 表现<span style="color: black;">不良</span>。</span>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Microsoft found: "By using the LLM-generated knowledge graph, GraphRAG vastly improves the 'retrieval' portion of RAG, populating the context window with higher relevance content, resulting in better answers and capturing evidence provenance." They also found that, compared with alternative approaches, GraphRAG required 26% to 97% fewer tokens, so it not only gives better answers but is also cheaper and scales better.</span></p>
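    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">To make the "connecting the dots" idea concrete, here is a minimal Python sketch. It is not Microsoft's implementation. It assumes a toy corpus in which every chunk already lists the entities it mentions, and it replaces real vector search with a simple word-overlap stand-in. The names CHUNKS, seed_chunks, and expand are hypothetical.</span></p>

```python
from collections import defaultdict

# Toy corpus (hypothetical data): each chunk lists the entities it mentions.
CHUNKS = {
    "c1": {"text": "Lithium prices are rising sharply.",
           "entities": {"Lithium"}},
    "c2": {"text": "Acme Batteries buys lithium for its cells.",
           "entities": {"Lithium", "Acme Batteries"}},
    "c3": {"text": "Volt Motors sources cells from Acme Batteries.",
           "entities": {"Acme Batteries", "Volt Motors"}},
    "c4": {"text": "Wheat harvests were strong this year.",
           "entities": {"Wheat"}},
}


def build_entity_index(chunks):
    """Map each entity to the set of chunk ids that mention it."""
    index = defaultdict(set)
    for chunk_id, chunk in chunks.items():
        for entity in chunk["entities"]:
            index[entity].add(chunk_id)
    return index


def seed_chunks(question, chunks):
    """Stand-in for vector search: chunks sharing a word with the question."""
    words = set(question.lower().replace("?", "").split())
    return {
        chunk_id
        for chunk_id, chunk in chunks.items()
        if words & set(chunk["text"].lower().replace(".", "").split())
    }


def expand(seeds, chunks, index, hops=1):
    """Follow shared entities outward from the seed chunks for `hops` steps."""
    found = set(seeds)
    frontier = set(seeds)
    for _ in range(hops):
        neighbors = set()
        for chunk_id in frontier:
            for entity in chunks[chunk_id]["entities"]:
                neighbors |= index[entity]
        frontier = neighbors - found
        found |= frontier
    return found


index = build_entity_index(CHUNKS)
seeds = seed_chunks("Who is exposed to lithium?", CHUNKS)
context = expand(seeds, CHUNKS, index, hops=2)
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">In this toy data, the seed step finds only c1 and c2, because c3 never uses the word "lithium." The expansion step reaches c3 through the shared entity "Acme Batteries," which is the kind of link a similarity-only retriever can miss.</span></p>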
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Digging further into accuracy: a correct answer matters, but an answer also has to be useful. People have found that GraphRAG makes answers not only more accurate but also richer, more complete, and more useful.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">LinkedIn's recent paper, "Retrieval-Augmented Generation with Knowledge Graphs for Customer Service Question Answering," is an excellent example. It describes the impact of GraphRAG on LinkedIn's customer service application. GraphRAG improved the correctness and richness of its customer service answers, which made them more useful, and it reduced the customer service team's median per-issue resolution time by 28.6%.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Neo4j's generative AI workshop has a similar example. Shown below are the answers produced by the "vector + GraphRAG" and "vector-only" approaches over a set of SEC filings:</span></p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-6w9my0ksvp/d7877eb33e6543879bbdbb8f2a577e76~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1728100731&amp;x-signature=F%2BgTHOZzLbaBevIuoCLzpg7RfTE%3D" style="width: 50%; margin-bottom: 20px;"></div>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Comparison of the "vector-only" and "vector + GraphRAG" approaches</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Note the difference between "describing the characteristics of companies likely to be affected by a lithium shortage" and "listing the specific companies likely to be affected." If you are an investor who wants to rebalance a portfolio in response to market changes, or a company that wants to rework its supply chain in response to a natural disaster, the information on the right of the figure above is surely far more important than that on the left. Both answers are accurate, but the one on the right is clearly more useful.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Episode 23 of Jesus Barrasa's "Going Meta" series offers another great example: working with legal documents, starting from a lexical graph.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">New examples from academia and industry also appear regularly. For instance, Charles Borderie of Lettria gave a comparison of the "vector-only" and "vector + GraphRAG" approaches, in which GraphRAG relies on an LLM-based text-to-graph pipeline that organized 10,000 financial articles into a knowledge graph:</span></p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-6w9my0ksvp/f090aec8e850485ca799a7d1c6dbf83c~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1728100731&amp;x-signature=860%2FuqAOJULtxVxYFWG0%2FjPOEMg%3D" style="width: 50%; margin-bottom: 20px;"></div>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Comparison of the retriever-only approach and the graph retriever approach</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">As you can see, compared with plain RAG, GraphRAG not only improved the quality of the answer but also used one third fewer tokens in it.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Another example comes from Writer. They recently published a RAG benchmark report based on the RobustQA framework, comparing their GraphRAG-based approach with other comparable tools. GraphRAG scored 86%, clearly ahead of the other approaches (which ranged from 33% to 76%), with similar or better latency.</span></p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-6w9my0ksvp/4547d66cb5ac4ff78ef53826dfcd53ca~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1728100731&amp;x-signature=HjpcStFkzRTpL0ILaRo6u0rofmk%3D" style="width: 50%; margin-bottom: 20px;"></div>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Accuracy and response time evaluation results for RAG approaches</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">GraphRAG is benefiting a wide variety of generative AI applications. Knowledge graphs open the way to making generative AI results more accurate and more useful.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;"><span style="color: black;">2. Better data understanding and faster iteration</span></strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Knowledge graphs are intuitive, both conceptually and visually. Exploring a knowledge graph often yields new insights.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Many knowledge graph users report the same unexpected benefit: once they have invested in building their knowledge graph, it helps them build and debug their generative AI applications in ways they did not anticipate. Part of the reason is that viewing the data as a graph gives a vivid picture of the data underlying those applications.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">A graph lets you trace an answer back to the data, and then trace that data along its causal chain.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Let's look at the lithium shortage example above. If you visualize its vectors, you get something like the figure below, only with many more rows and columns.</span></p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-6w9my0ksvp/b5aef96ee97040d384fada163e032789~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1728100731&amp;x-signature=az0GDw32WuS4PscrI3D0W%2FMi5Bk%3D" style="width: 50%; margin-bottom: 20px;"></div>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Vector visualization</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">If you convert the data into a graph, however, you can understand it in a way that the vector representation cannot offer.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Below is an example from a recent LlamaIndex webinar. It shows their ability to extract a graph of vectorized chunks (the lexical graph) and a graph of LLM-extracted entities (the domain graph), and to tie the two together with "MENTIONS" relationships:</span></p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p26-sign.toutiaoimg.com/tos-cn-i-6w9my0ksvp/673a1ea46b184391a8279d46f7ac2bdc~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1728100731&amp;x-signature=GA12MKU8IHWnGr6rJaQ09JChGfw%3D" style="width: 50%; margin-bottom: 20px;"></div>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Extracting the lexical graph and the domain graph</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">(There are also many examples using tools such as Langchain, Haystack, and SpringAI.)</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">You can see the rich structure of the data in this figure, and imagine the new development and debugging possibilities it opens up. Each piece of data has its own value, while the structure itself stores and conveys additional meaning that you can use to make your application more intelligent.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">This is not just visualization. It is also about having your data structure convey and store meaning. Below is the reaction of a developer at a well-known fintech company, one week after they introduced a knowledge graph into their RAG workflow:</span></p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-6w9my0ksvp/bad277bf178e4844b29b5f2967dc4e47~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1728100731&amp;x-signature=1a66VpD3%2FMIDIJ1q4k7dXtnwKUA%3D" style="width: 50%; margin-bottom: 20px;"></div>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">A developer's reaction to GraphRAG</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">This developer's reaction fits the "test-driven development" premise well: verify (rather than trust) that answers are correct. Speaking personally, handing 100% of my autonomy to an AI whose decisions are completely opaque would give me the creeps. More concretely, even if you are not an AI doomer, you would agree that it is very valuable to be able to avoid mapping chunks or documents about "Apple, Inc." to "Apple Corps" (two completely different companies). Since it is ultimately data that drives generative AI decisions, assessing and ensuring the correctness of that data is arguably what matters most.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;"><span style="color: black;">3. Governance: explainability, security, and more</span></strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The greater the impact of a generative AI decision, the more you need to convince the person who will ultimately be held accountable if the decision goes wrong. This usually involves auditing every decision, which requires a reliable, repeatable record of good decisions. But that is not enough. You also need to explain the reasoning behind a decision when it is adopted or rejected.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">LLMs by themselves cannot do this well. Yes, you can refer to the documents used to reach the decision. But those documents do not explain the decision itself, not to mention that LLMs are known to make up references. Knowledge graphs are on another level entirely: they make the reasoning logic of generative AI much clearer and make the inputs easier to explain.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Returning to one of the examples above: Charles at Lettria loaded the entities extracted from 10,000 financial articles into a knowledge graph and paired it with an LLM to perform GraphRAG. We saw that this does provide better answers. Let's look at the data:</span></p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-6w9my0ksvp/2e5073944b9a4231a889e3e55387b711~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1728100731&amp;x-signature=E%2F8wZ0dnKCC0JJRt4aves3V533g%3D" style="width: 50%; margin-bottom: 20px;"></div>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Loading the entities extracted from 10,000 financial articles into a knowledge graph</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">First, you can view the data as a graph. Second, you can navigate and query the data, and correct and update it at any time. The governance advantage is that viewing and auditing the "world model" of this data becomes much simpler. Compared with using a vector version of the same data, using a graph makes it more likely that the person ultimately accountable will understand the reasoning behind a decision.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">On quality assurance, if the data is in a knowledge graph, it is much easier to find errors and surprises in it and trace them back to their source. You can also capture provenance and confidence information in the graph and use it in your computation as well as in your explanation. This is essentially impossible with a vector-only version of the same data. As discussed earlier, vectorized data is hard for the average person (and even the not-so-average person) to understand.</span></p>
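    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">As a rough illustration of carrying provenance and confidence in the graph, the Python sketch below attaches a source and a confidence score to every relationship, then uses them both to filter facts and to explain them. The Fact class, its field names, the sample data, and the threshold are all assumptions made for this example. They do not come from the article or from any particular graph database.</span></p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One relationship in the graph, with where it came from."""
    subject: str
    relation: str
    obj: str
    source: str        # document or record the fact was extracted from
    confidence: float  # extractor's confidence, between 0 and 1


# Hypothetical sample data.
FACTS = [
    Fact("Acme Batteries", "BUYS", "Lithium", "filing-2023.pdf#p4", 0.92),
    Fact("Volt Motors", "SUPPLIED_BY", "Acme Batteries", "news-0117.html", 0.81),
    Fact("Volt Motors", "BUYS", "Wheat", "forum-post-88", 0.20),
]


def trusted_facts(facts, min_confidence):
    """Keep only the facts at or above the confidence threshold."""
    return [fact for fact in facts if fact.confidence >= min_confidence]


def explain(facts):
    """Render each fact with its source so a reviewer can audit it."""
    return [
        f"{fact.subject} -{fact.relation}-> {fact.obj} "
        f"(source: {fact.source}, confidence: {fact.confidence:.2f})"
        for fact in facts
    ]


kept = trusted_facts(FACTS, min_confidence=0.5)
report = explain(kept)
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Here the low-confidence fact from "forum-post-88" is dropped before it can reach the LLM, and every fact that remains can be traced to its source.</span></p>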
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Knowledge graphs can also significantly improve security and privacy.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Security and privacy are usually not a top priority when building a prototype, but they become critical once you turn it into a product. In regulated industries such as banking or healthcare, any employee's access to data depends on their role.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Neither LLMs nor vector databases offer a good way to restrict the scope of data access. Knowledge graphs provide a good solution: permission controls govern which parts of the database a participant can access and keep them from seeing data they are not allowed to see. Below is a simple security policy that implements fine-grained access control in a knowledge graph:</span></p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-6w9my0ksvp/db4bc5234abe4f8da1631ad8940ab8e0~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1728100731&amp;x-signature=eMav0pll0PZkkz%2FzQR3nk8Vzj7g%3D" style="width: 50%; margin-bottom: 20px;"></div>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">A simple security policy that can be implemented in a knowledge graph</span></p>
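    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The policy in the figure is not reproduced here. As a loosely related sketch of the idea, the Python below filters a small in-memory graph by role before anything is handed to retrieval. The node labels, roles, and the ROLE_CAN_READ table are hypothetical, and this is plain Python rather than the access-control syntax of any real graph database.</span></p>

```python
# Hypothetical graph: every node carries a label that the policy refers to.
NODES = {
    "n1": {"label": "Customer", "name": "Alice"},
    "n2": {"label": "Account", "balance": 1200},
    "n3": {"label": "MedicalRecord", "note": "confidential"},
}
EDGES = [
    ("n1", "OWNS", "n2"),
    ("n1", "HAS_RECORD", "n3"),
]

# Hypothetical policy: which node labels each role may read.
ROLE_CAN_READ = {
    "teller": {"Customer", "Account"},
    "support": {"Customer"},
}


def visible_subgraph(role, nodes, edges, policy):
    """Return only the nodes a role may read, and edges between those nodes."""
    allowed_labels = policy.get(role, set())  # unknown roles see nothing
    visible_nodes = {
        node_id: node
        for node_id, node in nodes.items()
        if node["label"] in allowed_labels
    }
    visible_edges = [
        (source, relation, target)
        for source, relation, target in edges
        if source in visible_nodes and target in visible_nodes
    ]
    return visible_nodes, visible_edges


teller_nodes, teller_edges = visible_subgraph("teller", NODES, EDGES, ROLE_CAN_READ)
support_nodes, support_edges = visible_subgraph("support", NODES, EDGES, ROLE_CAN_READ)
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Because the filter runs on labels and relationships rather than on opaque vectors, a retrieval step built on the filtered subgraph cannot surface a node the role is not allowed to read.</span></p>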
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;"><span style="color: black;">Creating a knowledge graph</span></strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">What does it take to build a knowledge graph? The first step is to understand the two kinds of graphs most relevant to generative AI applications.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">A domain graph represents the world model relevant to the application at hand. Here is a simple example:</span></p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-6w9my0ksvp/4848c3bc19ad4a8185435ac749290ae6~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1728100731&amp;x-signature=F4DZtqVW%2BitamgXxIVQVNgoAk4c%3D" style="width: 50%; margin-bottom: 20px;"></div>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Domain graph</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">A lexical graph is a graph of document structure. The most basic lexical graph has one node for each chunk of text:</span></p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-6w9my0ksvp/ad31b385ad42426cb483387e2b9729b5~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1728100731&amp;x-signature=3s22KCnBKI3h73RszTvdLJvZv5M%3D" style="width: 50%; margin-bottom: 20px;"></div>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Lexical graph</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">People often extend it to include relationships between chunks, document objects (such as tables), chapters, paragraphs, page numbers, document names or numbers, collections, sources, and so on. You can also combine the domain graph and the lexical graph, as shown below:</span></p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-6w9my0ksvp/5def24208f72415f881773bd8d69cae8~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1728100731&amp;x-signature=QaNvtnNCUBti9AmWUtNeqDd1coI%3D" style="width: 50%; margin-bottom: 20px;"></div>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Combining the domain layer and the lexical layer</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Creating a lexical graph is easy: it is mostly a matter of simple parsing and chunking. For the domain graph, there are different creation paths depending on whether the data comes from a structured source, an unstructured source, or both. Fortunately, tools for creating knowledge graphs from unstructured data sources are improving rapidly.</span></p>
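    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The following Python sketch shows one way the lexical and domain layers could be built and joined. It assumes fixed-size word chunks, PART_OF and NEXT relationships for the lexical layer, and plain substring matching as a stand-in for LLM entity extraction. The function names, the relationship names other than MENTIONS, and the sample text are hypothetical.</span></p>

```python
def chunk_text(text, max_words):
    """Split text into consecutive chunks of at most `max_words` words."""
    words = text.split()
    return [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]


def build_graph(doc_id, text, known_entities, max_words):
    """Build a lexical graph for one document and link it to domain entities."""
    nodes = {doc_id: {"type": "Document"}}
    edges = []
    previous_id = None
    for position, chunk in enumerate(chunk_text(text, max_words)):
        chunk_id = f"{doc_id}:chunk{position}"
        nodes[chunk_id] = {"type": "Chunk", "text": chunk}
        edges.append((chunk_id, "PART_OF", doc_id))
        if previous_id is not None:
            edges.append((previous_id, "NEXT", chunk_id))
        previous_id = chunk_id
        # Stand-in for LLM entity extraction: case-insensitive substring match.
        for entity in known_entities:
            if entity.lower() in chunk.lower():
                nodes.setdefault(entity, {"type": "Entity"})
                edges.append((chunk_id, "MENTIONS", entity))
    return nodes, edges


TEXT = (
    "Acme Batteries buys lithium from several mines. "
    "Volt Motors depends on Acme Batteries for its cells."
)
nodes, edges = build_graph(
    "doc1", TEXT, ["Acme Batteries", "Volt Motors", "Lithium"], max_words=7
)
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">With this input the document splits into three chunks. Two of them mention "Acme Batteries," so that single entity node ties the two chunks together, which is the kind of link between the lexical and domain layers that the figure above depicts.</span></p>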
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">For example, the new Neo4j Knowledge Graph Builder can automatically create a knowledge graph from PDF documents, web pages, YouTube videos, and Wikipedia articles. The whole process is as simple as clicking a few buttons, after which you can visualize and query the domain and lexical graphs of your input text. The tool is powerful and fun, and it greatly lowers the barrier to creating a knowledge graph.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Structured data (such as the data your company stores about customers, products, geographic locations, and so on) can be mapped directly into a knowledge graph. For the most common case, structured data stored in a relational database, standard tools can map relations to a graph based on proven, reliable rules.</span></p>
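    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">A common rule of thumb for this mapping is "one row becomes one node, one foreign key becomes one relationship." The Python sketch below applies that rule to two small in-memory tables. The table contents, column names, and the PLACED_BY relationship are hypothetical, and real import tools handle many more cases (join tables, composite keys, type conversion).</span></p>

```python
# Hypothetical relational data: two tables as lists of rows.
CUSTOMERS = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
]
ORDERS = [
    {"id": 10, "customer_id": 1, "product": "Battery"},
    {"id": 11, "customer_id": 2, "product": "Charger"},
    {"id": 12, "customer_id": 1, "product": "Cable"},
]


def table_to_nodes(label, rows, foreign_keys=()):
    """One node per row, keyed by (label, primary key).

    Foreign-key columns are left out of the node properties because they
    are represented as relationships instead.
    """
    return {
        (label, row["id"]): {
            column: value
            for column, value in row.items()
            if column not in foreign_keys
        }
        for row in rows
    }


def foreign_key_edges(rows, label, fk_column, target_label, relation):
    """One relationship per row, from the row's node to the referenced node."""
    return [
        ((label, row["id"]), relation, (target_label, row[fk_column]))
        for row in rows
    ]


nodes = {}
nodes.update(table_to_nodes("Customer", CUSTOMERS))
nodes.update(table_to_nodes("Order", ORDERS, foreign_keys=("customer_id",)))
edges = foreign_key_edges(ORDERS, "Order", "customer_id", "Customer", "PLACED_BY")
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Keying nodes by label and primary key keeps ids from different tables from colliding, and every relationship produced here points at a node that exists as long as the foreign keys are valid.</span></p>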
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;"><span style="color: black;">Working with knowledge graphs</span></strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Once you have a knowledge graph, you can do GraphRAG, and there are many frameworks to choose from, such as the LlamaIndex Property Graph Index, Langchain's Neo4j integration, and Haystack's integration. The field is moving fast, but the programming approach is now becoming very simple.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The same is true of graph creation, with tools such as Neo4j Importer (which imports and maps tabular data to a graph through a graphical interface) and the Neo4j Knowledge Graph Builder mentioned above. The figure below summarizes the steps for building a knowledge graph.</span></p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-6w9my0ksvp/c02da27bb5b14502a73618c455a5c2ef~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1728100731&amp;x-signature=NnkzUIWPddwzN75%2B8uLVEZM%2F8Yw%3D" style="width: 50%; margin-bottom: 20px;"></div>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Automatically building a knowledge graph for generative AI</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Knowledge graphs can also be used to map human-language questions to graph database queries. Neo4j has released an open-source tool, NeoConverse, that helps query knowledge graphs in natural language: https://neo4j.com/labs/genai-ecosystem/neoconverse/</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Getting started with graphs does take some effort to learn, but the good news is that it is getting easier as the tools improve.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;"><span style="color: black;">Conclusion: GraphRAG is the inevitable future of RAG</span></strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The word-based computation and language skills inherent in LLMs, combined with vector-based RAG, deliver very good results. To get good results consistently, however, you have to go beyond the string level and build a world model on top of the word model. In the same way, Google found that to master search it had to go beyond pure text analysis and map out the relationships between the things that strings represent. We are starting to see the same pattern emerge in the AI world. That pattern is GraphRAG.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Technology progresses along S-curves: as one technology reaches its peak, another drives progress and overtakes it. As generative AI advances, the requirements on its applications will rise as well, from high-quality answers to explainability to fine-grained control over data access, privacy, and security, and the value of knowledge graphs will become ever more apparent.</span></p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-6w9my0ksvp/be131bf130934339ac77878e830c8f2c~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1728100731&amp;x-signature=hj7nsQXFKBjRrzqOCu8ItCk4kOw%3D" style="width: 50%; margin-bottom: 20px;"></div>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The evolution of generative AI</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Your next generative AI application will very likely use a knowledge graph.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Reference: https://neo4j.com/blog/graphrag-manifesto/</span></p>



