A near-storage framework for boosted data preprocessing of mass spectrum clustering

Weihong Xu, Jaeyoung Kang, T. Simunic
{"title":"A near-storage framework for boosted data preprocessing of mass spectrum clustering","authors":"Weihong Xu, Jaeyoung Kang, T. Simunic","doi":"10.1145/3489517.3530449","DOIUrl":null,"url":null,"abstract":"Mass spectrometry (MS) has been a key to proteomics and metabolomics due to its unique ability to identify and analyze protein structures. Modern MS equipment generates massive amount of tandem mass spectra with high redundancy, making spectral analysis the major bottleneck in design of new medicines. Mass spectrum clustering is one promising solution as it greatly reduces data redundancy and boosts protein identification. However, state-of-the-art MS tools take many hours to run spectrum clustering. Spectra loading and preprocessing consumes average 82% execution time and energy during clustering. We propose a near-storage framework, MSAS, to speed up spectrum preprocessing. Instead of loading data into host memory and CPU, MSAS processes spectra near storage, thus reducing the expensive cost of data movement. We present two types of accelerators that leverage internal bandwidth at two storage levels: SSD and channel. The accelerators are optimized to match the data rate at each storage level with negligible overhead. Our results demonstrate that the channel-level design yields the best performance improvement for preprocessing - it is up to 187X and 1.8X faster than the CPU and the state-of-the-art in-storage computing solution, INSIDER, respectively. After integrating channel-level MSAS into existing MS clustering tools, we measure system level improvements in speed of 3.5X to 9.8X with 2.8X to 11.9X better energy efficiency.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"104 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 59th ACM/IEEE Design Automation Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3489517.3530449","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

Mass spectrometry (MS) has been a key to proteomics and metabolomics due to its unique ability to identify and analyze protein structures. Modern MS equipment generates massive amount of tandem mass spectra with high redundancy, making spectral analysis the major bottleneck in design of new medicines. Mass spectrum clustering is one promising solution as it greatly reduces data redundancy and boosts protein identification. However, state-of-the-art MS tools take many hours to run spectrum clustering. Spectra loading and preprocessing consumes average 82% execution time and energy during clustering. We propose a near-storage framework, MSAS, to speed up spectrum preprocessing. Instead of loading data into host memory and CPU, MSAS processes spectra near storage, thus reducing the expensive cost of data movement. We present two types of accelerators that leverage internal bandwidth at two storage levels: SSD and channel. The accelerators are optimized to match the data rate at each storage level with negligible overhead. Our results demonstrate that the channel-level design yields the best performance improvement for preprocessing - it is up to 187X and 1.8X faster than the CPU and the state-of-the-art in-storage computing solution, INSIDER, respectively. After integrating channel-level MSAS into existing MS clustering tools, we measure system level improvements in speed of 3.5X to 9.8X with 2.8X to 11.9X better energy efficiency.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
质谱聚类增强数据预处理的近存储框架
质谱(MS)由于其独特的鉴定和分析蛋白质结构的能力,已成为蛋白质组学和代谢组学的关键。现代质谱分析设备产生大量高冗余的串联质谱,使谱分析成为新药设计的主要瓶颈。质谱聚类是一种很有前途的解决方案,因为它大大减少了数据冗余并提高了蛋白质识别。然而,最先进的MS工具需要花费许多小时来运行频谱聚类。在集群过程中,谱加载和预处理平均消耗82%的执行时间和能量。我们提出了一个近存储框架,MSAS,以加快频谱预处理。MSAS不是将数据加载到主机内存和CPU中,而是在存储附近处理频谱,从而降低了昂贵的数据移动成本。我们提出了两种利用两个存储级别的内部带宽的加速器:SSD和channel。加速器经过优化以匹配每个存储级别的数据速率,而开销可以忽略不计。我们的结果表明,通道级设计在预处理方面产生了最佳的性能改进-它分别比CPU和最先进的存储计算解决方案INSIDER快187X和1.8X。在将通道级MSAS集成到现有的MS集群工具中后,我们测量了系统级速度的3.5到9.8倍的提高,以及2.8到11.9倍的能效提高。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Timing macro modeling with graph neural networks Thermal-aware optical-electrical routing codesign for on-chip signal communications PHANES ScaleHLS Terminator on SkyNet: a practical DVFS attack on DNN hardware IP for UAV object detection
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1