5ep9lzv posted on 2024-8-17 14:38:59

The first high-level language for GPUs: massively parallel programming that feels like writing Python, already at 8,500 stars


    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Machine Heart (机器之心) report</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">Editors: Zenan, Xiaozhou</strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">It can support more than 10,000 concurrent threads.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">After nearly 10 years of persistent effort and deep research into the core of computer science, a long-held dream has finally been realized: running a high-level language on GPUs.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Last weekend, a programming language called Bend sparked lively discussion in the open-source community, and its GitHub star count has already passed 8,500.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><img src="//q9.itc.cn/images01/20240520/1aeaf212afa14a20a40e770a2a8ea957.jpeg" style="width: 50%; margin-bottom: 20px;"></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">GitHub: https://github.com/HigherOrderCO/Bend</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">As a massively parallel high-level programming language, Bend is still at the research stage, but the ideas behind it have already surprised many people. With Bend, you can write parallel code for multi-core CPUs and GPUs without being a C/CUDA expert with 10 years of experience, and it feels just like Python!</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><img src="//q3.itc.cn/images01/20240520/bcc7ca7344434587b38ac03da474bd68.gif" style="width: 50%; margin-bottom: 20px;"></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Yes, Bend uses a Python-like syntax.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Unlike low-level alternatives such as CUDA and Metal, Bend has the features of expressive languages like Python and Haskell, including fast object allocation, higher-order functions with full closure support, unrestricted recursion, and even continuations. Bend runs on massively parallel hardware, with near-linear speedup based on core count. Bend is powered by the HVM2 runtime.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Victor Taelin, the project's main contributor, is from Brazil. He shared Bend's main features and design ideas on X.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">First, Bend is not suited to modern machine learning algorithms, because those are highly regular (matrix multiplication), use pre-allocated memory, and usually already have hand-written CUDA kernels.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Bend's big advantage shows up in real-world applications, because "real applications" usually have no budget for building dedicated GPU kernels. Who has ever built a website in CUDA? And even if someone tried, it would not be feasible, because:</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">1. Real applications need to import functions from many different libraries, and there is no way to write CUDA kernels for all of them;</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">2. Real applications have dynamic functions and closures;</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">3. Real applications allocate large amounts of memory dynamically and unpredictably.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Bend tries something new and can be quite fast in some cases, but writing a large language model in it today is definitely out of the question.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The author compared the old approach with the new one using the same algorithm, a bitonic sort on trees that involves JSON allocation and manipulation. Node.js took 3.5 seconds (Apple M3 Max); Bend took 0.5 seconds (NVIDIA RTX 4090).</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Yes, at present Bend needs an entire GPU to beat Node.js running on a single core. On the other hand, this compares a newborn approach with a JIT compiler that a large company (Google) has been optimizing for 16 years. There is still plenty of room to grow.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">How to use it</strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">On GitHub, the author gives a brief walkthrough of how to use Bend.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">First, install Rust. If you want to use the C runtime, install a C compiler (for example GCC or Clang); if you want to use the CUDA runtime, install the CUDA toolkit (CUDA and nvcc) version 12.x. Bend currently supports only Nvidia GPUs.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Then install HVM2 and Bend:</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">cargo +nightly install hvm</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">cargo +nightly install bend-lang</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Finally, write a Bend file and run it with one of the following commands:</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">bend run &lt;file.bend&gt; # uses the Rust interpreter (sequential)</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">bend run-c &lt;file.bend&gt; # uses the C interpreter (parallel)</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">bend run-cu &lt;file.bend&gt; # uses the CUDA interpreter (massively parallel)</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">You can also use gen-c and gen-cu to compile Bend into standalone C/CUDA files for the best performance. However, gen-c and gen-cu are still in their infancy and are nowhere near as mature as SOTA compilers like GCC and GHC.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">Parallel programming in Bend</strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Here are examples of programs that can and cannot run in parallel in Bend. For example, the expression:</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">(((1 + 2) + 3) + 4)</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">cannot run in parallel, because + 4 depends on + 3, which in turn depends on (1+2). In contrast, the expression:</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">((1 + 2) + (3 + 4))</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">can run in parallel, because (1+2) and (3+4) are independent. Bend's only requirement for parallel execution is that the computation has this kind of independent structure.</p>
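    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The difference between the two expressions can be made concrete with a small Python sketch. It is not Bend code and runs sequentially; the tuple encoding and the helper names <code>evaluate</code> and <code>critical_path</code> are assumptions made only for this illustration. The critical path is the number of additions that must happen one after another, even with unlimited workers:</p>

```python
# An expression is either an int or a pair (left, right) meaning left + right.
# This encoding is an illustrative assumption, not how Bend represents programs.

def evaluate(expr):
    """Compute the value of the expression."""
    if isinstance(expr, int):
        return expr
    left, right = expr
    return evaluate(left) + evaluate(right)

def critical_path(expr):
    """Count the additions that cannot overlap, assuming unlimited workers."""
    if isinstance(expr, int):
        return 0
    left, right = expr
    # Both sides could be computed at the same time, so only the longer one counts.
    return 1 + max(critical_path(left), critical_path(right))

chained = (((1, 2), 3), 4)   # (((1 + 2) + 3) + 4)
balanced = ((1, 2), (3, 4))  # ((1 + 2) + (3 + 4))

print(evaluate(chained), critical_path(chained))    # same value, 3 dependent steps
print(evaluate(balanced), critical_path(balanced))  # same value, 2 dependent steps
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Both expressions evaluate to 10, but the chained form needs three dependent steps while the balanced form needs two. For n operands the gap grows from n-1 steps to about log2(n), which is the room a parallel runtime has to work with.</p>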
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Here is a more complete code example:</p>
    <pre style="font-size: 16px; color: black; text-align: left; margin-bottom: 15px;"># Sorting Network = just rotate trees!
def sort(d, s, tree):
  switch d:
    case 0:
      return tree
    case _:
      (x,y) = tree
      lft = sort(d-1, 0, x)
      rgt = sort(d-1, 1, y)
      # rots takes a single tree argument, so the two halves are paired up
      return rots(d, s, (lft, rgt))

# Rotates sub-trees (Blue/Green Box)
# (down and warp are not shown in this excerpt)
def rots(d, s, tree):
  switch d:
    case 0:
      return tree
    case _:
      (x,y) = tree
      return down(d, s, warp(d-1, s, x, y))
</pre>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">This file implements a bitonic sorter using immutable tree rotations. It is not the kind of algorithm most people would expect to run fast on a GPU. However, because it uses a divide-and-conquer approach, which is inherently parallel, Bend runs it across multiple threads. Some speed benchmarks:</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">CPU, Apple M3 Max, 1 thread: 12.15 seconds</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">CPU, Apple M3 Max, 16 threads: 0.96 seconds</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">GPU, NVIDIA RTX 4090, 16k threads: 0.21 seconds</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">That is a 57x speedup with no extra work: no thread spawning and no explicit management of locks or mutexes. We simply asked Bend to run our program on the RTX, and that was it.</p>
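    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">As a quick check of the arithmetic, the speedups can be recomputed from the three timings quoted above. This Python snippet is only a sketch: the dictionary keys are labels chosen for this illustration, and the efficiency figure is simply the 16-thread speedup divided by 16.</p>

```python
# Timings in seconds, copied from the benchmark figures quoted in the article.
timings = {
    "M3 Max, 1 thread": 12.15,
    "M3 Max, 16 threads": 0.96,
    "RTX 4090, 16k threads": 0.21,
}

baseline = timings["M3 Max, 1 thread"]
speedups = {name: baseline / seconds for name, seconds in timings.items()}

# Fraction of ideal linear scaling reached on the 16 CPU threads.
cpu_efficiency = speedups["M3 Max, 16 threads"] / 16

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x")
print(f"16-thread efficiency: {cpu_efficiency:.0%}")
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The GPU run comes out at roughly 57.9 times the single-thread time, which matches the quoted 57x once truncated, and the 16-thread CPU run at roughly 12.7 times, or about 79% of ideal linear scaling.</p>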
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Bend is not limited to a specific paradigm, such as tensors or matrices. Any concurrent system, from shaders to Erlang-like actor models, can be emulated on Bend. For example, to render images in real time, we could simply allocate an immutable tree on each frame:</p>
    <pre style="font-size: 16px; color: black; text-align: left; margin-bottom: 15px;"># given a shader, returns a square image
def render(depth, shader):
  bend d = 0, i = 0:
    when d &lt; depth:
      color = (fork(d+1, i*2+0), fork(d+1, i*2+1))
    else:
      width = depth / 2
      color = shader(i % width, i / width)
  return color

# given a position, returns a color
# for this demo, it just busy loops
def demo_shader(x, y):
  bend i = 0:
    when i &lt; 5000:
      color = fork(i + 1)
    else:
      color = 0x000001
  return color

# renders a 256x256 image using demo_shader
def main:
  return render(16, demo_shader)
</pre>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">And it would actually work. Even complex algorithms parallelize well on Bend. Long-distance communication is performed by global beta-reduction (as per the Interaction Calculus), and is synchronized correctly and efficiently by HVM2's atomic linker.</p>
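    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">For readers unfamiliar with the <code>bend</code> and <code>fork</code> keywords used in the render example, the following Python sketch mimics their shape with an ordinary recursive helper. It is an approximation under stated assumptions: Python has no <code>bend</code> construct, the nested function <code>fork</code> and the helper <code>count_leaves</code> are names chosen here, the leaf branch mirrors the source's <code>width = depth / 2</code> with integer division, and the code runs sequentially rather than in parallel.</p>

```python
def render(depth, shader):
    """Build a perfect binary tree `depth` levels deep with shader values at the leaves."""
    def fork(d, i):
        # Leaf: mirrors the `else` branch of the Bend example.
        if d == depth:
            width = depth // 2
            return shader(i % width, i // width)
        # Inner node: the two recursive calls do not depend on each other,
        # which is the independence a parallel runtime could exploit.
        return (fork(d + 1, i * 2 + 0), fork(d + 1, i * 2 + 1))
    return fork(0, 0)

def count_leaves(tree):
    """Count leaf values in a tree made of nested pairs."""
    if isinstance(tree, tuple):
        return count_leaves(tree[0]) + count_leaves(tree[1])
    return 1

# A constant stand-in shader; the article's demo shader busy-loops instead.
image = render(4, lambda x, y: 0x000001)
print(count_leaves(image))  # prints 16
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">A tree of depth 4 has 2^4 = 16 leaves; at depth 16, as in the article's <code>main</code>, it has 65,536 leaves, which is the pixel count of a 256x256 image.</p>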
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Finally, the author noted that Bend is only a first version and that little effort has yet gone into a proper compiler. Raw performance can be expected to improve substantially with each future release. For now, the interpreters already give a glimpse of what massively parallel programming looks like from the perspective of a Python-like high-level language.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">References:</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">https://news.ycombinator.com/item?id=40390287</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">https://x.com/VictorTaelin?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">https://x.com/DrJimFan/status/1791514371086250291</p>

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Editor in charge: reader submission</p>




nykek5i posted on 2024-10-4 18:28:13

Looking forward to the original poster's next post!

qzmjef posted on 2024-10-11 02:38:25

"BS" (short for 鄙视, "contempt")