38.7 FPS! EdgeSAM = RepViT + SAM: an ultra-fast mobile variant, now open source!
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">The endpoint of lightweight SAM turns out to be RepViT + SAM</strong>, reaching 38.7 FPS on mobile devices.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">For computer vision in 2023, the Segment Anything Model (SAM) was one of the most closely watched research developments. Despite its many strengths, SAM has one unavoidable weakness: it is slow, and it is essentially unusable on edge devices. Researchers have proposed several remedies: <strong style="color: blue;">distilling the knowledge of the default ViT-H image encoder into a tiny ViT image encoder, or using a real-time CNN-based architecture to reduce the computational cost of the Segment Anything task</strong>.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Today, two lightweight SAM methods appeared on arXiv at the same time: <strong style="color: blue;">EdgeSAM</strong> and <strong style="color: blue;">RepViT-SAM</strong>. Coincidentally, both use <strong style="color: blue;">exactly the same image encoder: RepViT</strong>. Both also reach very fast processing speeds on phones; notably, <strong style="color: blue;">EdgeSAM reaches 38.7 FPS on an iPhone 14</strong>.</p><img src="https://mmbiz.qpic.cn/sz_mmbiz_png/VvkhdVVVIDgBoia6M27K1axSicTqbU8tGcPblcGNVP1467OVUNZKDBgxscN8KIz5Gc2TZEIAS8pGYicPby28GUPJQ/640?wx_fmt=png&from=appmsg&tp=webp&wxfrom=5&wx_lazy=1&wx_co=1" style="width: 50%; margin-bottom: 20px;">
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">https://arxiv.org/abs/2312.05760</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">https://github.com/THU-MIG/RepViT</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">This method follows the approach of <a style="color: black;">MobileSAM</a>: the ViT encoder of the original SAM serves as the teacher, and its knowledge is distilled into the replacement encoder.</p>In terms of implementation, RepViT-SAM adopts RepViT-M2.3, from the recent mobile backbone <a style="color: black;">RepViT</a>, as the image encoder that extracts image features. For the teacher model, it uses SAM-ViT-H. In terms of applications, the method is adapted to several tasks, such as mask prediction and edge detection.<img src="https://mmbiz.qpic.cn/sz_mmbiz_png/VvkhdVVVIDgBoia6M27K1axSicTqbU8tGcvQWXiaPyxdbNntDfY7guBf3yfxQwn65F78wDiaC4xAjfojVhelzjyjbg/640?wx_fmt=png&from=appmsg&tp=webp&wxfrom=5&wx_lazy=1&wx_co=1" style="width: 50%; margin-bottom: 20px;"><img src="https://mmbiz.qpic.cn/sz_mmbiz_png/VvkhdVVVIDgBoia6M27K1axSicTqbU8tGcaR0ztF8ERxJcFgbyOHSIjJyaecMgSxvYrwfwxWoxFjhrXyaBGNDzTQ/640?wx_fmt=png&from=appmsg&tp=webp&wxfrom=5&wx_lazy=1&wx_co=1" style="width: 50%; margin-bottom: 20px;"><img src="https://mmbiz.qpic.cn/sz_mmbiz_png/VvkhdVVVIDgBoia6M27K1axSicTqbU8tGcBtc1OuicZf0Bjt7Fu1X8qFd33kVjKQBXqEVAF3vkibyerfDjKv13WQiaA/640?wx_fmt=png&from=appmsg&tp=webp&wxfrom=5&wx_lazy=1&wx_co=1" style="width: 50%; margin-bottom: 20px;">
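<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The encoder-only distillation described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: both "encoders" are plain linear maps stored as lists of rows, the loss is a mean squared error between teacher and student embeddings, and the names <code>distill_encoder</code> and <code>average_loss</code> are hypothetical. It is not the RepViT-SAM training code; the real student is a RepViT-M2.3 network and the real teacher is the SAM ViT-H encoder.</p>

```python
import random


def linear(weights, x):
    """Apply a weight matrix (list of rows) to an input vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def mse(pred, target):
    """Mean squared error between two equal-length embedding vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def average_loss(student_w, teacher_w, images):
    """Average embedding MSE of the student against the teacher."""
    total = sum(mse(linear(student_w, x), linear(teacher_w, x)) for x in images)
    return total / len(images)


def distill_encoder(teacher_w, images, steps=200, lr=0.5, seed=0):
    """Train a toy student so its embeddings match a frozen toy teacher.

    Assumption: teacher and student are linear maps of the same shape.
    Only the student weights are updated; the teacher is never modified.
    """
    rng = random.Random(seed)
    d_out, d_in = len(teacher_w), len(teacher_w[0])
    student_w = [[rng.uniform(-0.1, 0.1) for _ in range(d_in)]
                 for _ in range(d_out)]
    for step in range(steps):
        x = images[step % len(images)]      # cycle through the toy dataset
        target = linear(teacher_w, x)       # frozen teacher embedding
        pred = linear(student_w, x)         # current student embedding
        for i in range(d_out):
            # d(MSE)/dW[i][j] = 2 * (pred_i - target_i) * x_j / d_out
            grad_scale = 2.0 * (pred[i] - target[i]) / d_out
            for j in range(d_in):
                student_w[i][j] -= lr * grad_scale * x[j]
    return student_w
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Note that no prompt and no mask appears anywhere in this loss: the student only imitates image embeddings, which is why this style of distillation is called task-agnostic.</p>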
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">https://arxiv.org/abs/2312.06660</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">https://github.com/chongzhou96/EdgeSAM</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><img src="https://mmbiz.qpic.cn/sz_mmbiz_png/VvkhdVVVIDgBoia6M27K1axSicTqbU8tGcqV7RZO3TI3fEktDwqCgn15IFn38rniahc48wiaAwNNXdcGYTUgmg9DfA/640?wx_fmt=png&from=appmsg&tp=webp&wxfrom=5&wx_lazy=1&wx_co=1" style="width: 50%; margin-bottom: 20px;">By comparison, EdgeSAM is the stronger of the two in terms of method. It does not merely distill the image encoder in the manner of MobileSAM; it also analyzes different distillation strategies in detail and shows that <strong style="color: blue;">task-agnostic encoder distillation fails to capture the full knowledge embodied in SAM</strong>.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">In view of this, the authors propose: <strong style="color: blue;">using box and point prompts in a loop while distilling the prompt encoder and the mask decoder as well, so that the distilled model accurately learns the complex relationship between prompts and masks</strong>.</p><img src="https://mmbiz.qpic.cn/sz_mmbiz_png/VvkhdVVVIDgBoia6M27K1axSicTqbU8tGcormkNAv9DWfyUKlT5H6VKuQ5Xb66ebl7AgEmLBGukiaElP3z0E3ucSQ/640?wx_fmt=png&from=appmsg&tp=webp&wxfrom=5&wx_lazy=1&wx_co=1" style="width: 50%; margin-bottom: 20px;">On a 2080 Ti, EdgeSAM runs 40 times faster than the original SAM. On an iPhone 14, it runs 14 times faster than MobileSAM, reaching 38.7 FPS.<img src="https://mmbiz.qpic.cn/sz_mmbiz_png/VvkhdVVVIDgBoia6M27K1axSicTqbU8tGc5WWYSa2OfKKKny4Ty1Gu44q33iahnPzKVORWxwpicYibNlE9Zicz7kEMUA/640?wx_fmt=png&from=appmsg&tp=webp&wxfrom=5&wx_lazy=1&wx_co=1" style="width: 50%; margin-bottom: 20px;">
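<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">One plausible reading of "box and point prompts in a loop" is sketched below. The sampling rule is an assumption, not something this article specifies: starting from a box prompt, each round picks a point where the student mask disagrees with the teacher mask and feeds it back as a positive or negative point prompt. The student here (<code>toy_student</code>) is a hypothetical stand-in that fills the box and then obeys every point; a real setup would run the student's prompt encoder and mask decoder and backpropagate a mask loss at each round.</p>

```python
import random


def box_mask(box, height, width):
    """Binary mask that is 1 inside the inclusive box (r0, c0, r1, c1)."""
    r0, c0, r1, c1 = box
    return [[1 if r0 <= r <= r1 and c0 <= c <= c1 else 0
             for c in range(width)] for r in range(height)]


def toy_student(box, points, height, width):
    """Hypothetical student: fill the box, then obey each point prompt."""
    mask = box_mask(box, height, width)
    for r, c, label in points:
        mask[r][c] = label
    return mask


def disagreement(mask_a, mask_b):
    """Number of pixels where two binary masks differ."""
    return sum(a != b for row_a, row_b in zip(mask_a, mask_b)
               for a, b in zip(row_a, row_b))


def sample_correction_point(teacher_mask, student_mask, rng):
    """Pick one pixel where the masks disagree, or None if they agree.

    The label equals the teacher value: 1 is a positive point (the student
    missed foreground), 0 is a negative point (the student over-segmented).
    """
    candidates = [(r, c, t)
                  for r, (t_row, s_row) in enumerate(zip(teacher_mask, student_mask))
                  for c, (t, s) in enumerate(zip(t_row, s_row)) if t != s]
    return rng.choice(candidates) if candidates else None


def prompt_in_the_loop(teacher_mask, box, rounds, seed=0):
    """Run the box prompt, then up to `rounds` corrective point prompts."""
    rng = random.Random(seed)
    height, width = len(teacher_mask), len(teacher_mask[0])
    points, history = [], []
    for _ in range(rounds + 1):
        student_mask = toy_student(box, points, height, width)
        # A real training loop would compute a mask loss and backprop here.
        history.append(disagreement(teacher_mask, student_mask))
        point = sample_correction_point(teacher_mask, student_mask, rng)
        if point is None:
            break
        points.append(point)
    return points, history
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Unlike encoder-only distillation, the supervision signal in this sketch depends on the prompts and on the predicted mask, so the student is trained on the prompt-to-mask mapping itself.</p>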
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><img src="https://mmbiz.qpic.cn/sz_mmbiz_png/VvkhdVVVIDgBoia6M27K1axSicTqbU8tGc77zh9MjEhba0OrHD4ScHoyQB36FeqPKO43PZ2ficAh4sOib8aYvGkPLg/640?wx_fmt=png&from=appmsg&tp=webp&wxfrom=5&wx_lazy=1&wx_co=1" style="width: 50%; margin-bottom: 20px;"></p>
<h2 style="color: black; text-align: left; margin-bottom: 10px;"><span style="color: black;">Recommended reading</span></h2><a style="color: black;">RepViT: Revisiting lightweight mobile CNN architectures from a ViT perspective</a><br><a style="color: black;">EfficientSAM | With MIM, Meta AI makes SAM more efficient!</a><br><a style="color: black;">FastSAM: a CNN-based solution for the SAM task, 50x faster!</a><br><a style="color: black;">MobileSAM | Making SAM a bit faster: only 10 ms per image</a><br><a style="color: black;">NanoSAM: real-time segmentation on Jetson Orin</a><br><a style="color: black;">Hands-on tutorial | An in-depth walkthrough of SAM TensorRT model conversion</a>