
IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum: Latest Publications

Parallel Maximum Cardinality Matching for General Graphs on GPUs.
Gregory Schwing, Daniel Grosu, Loren Schwiebert

The matching problem formulated as Maximum Cardinality Matching in General Graphs (MCMGG) finds the largest matching on graphs without restrictions. The Micali-Vazirani algorithm has the best asymptotic complexity for solving MCMGG when the graphs are sparse. Parallelizing matching in general graphs on the GPU is difficult for multiple reasons. First, the augmenting-path procedure is highly recursive, and NVIDIA GPUs use registers to store kernel arguments, which eventually spill into cached device memory at a performance penalty. Second, extracting parallelism from the matching process requires partitioning the graph to avoid overlapping augmenting paths. We propose an implementation of the Micali-Vazirani algorithm that identifies bridge edges using thread-parallel breadth-first search, followed by block-parallel path augmentation and blossom contraction. The augmenting-path and union-find methods were implemented as stack-based iterative methods, with the stack allocated in shared memory. Our experiments show that, compared to the serial implementation, our approach yields up to a 15-fold speed-up for very sparse regular graphs, up to a 5-fold slowdown for denser regular graphs, and a 50-fold slowdown for power-law-distributed Kronecker graphs. The implementation has been open-sourced to support further research on combinatorial graph algorithms for GPUs.
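The stack-based iterative rewrite of recursive routines is what avoids register spills on the GPU. As a minimal CPU-side illustration (not the authors' CUDA code), here is a union-find whose find is fully iterative, the style of transformation the abstract describes:

```python
class UnionFind:
    """Union-find with an iterative find (explicit loops, no recursion)."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        # First loop locates the root; second loop compresses the path.
        # No recursion, so no call-stack growth -- the property the
        # paper's shared-memory stack rewrite is after.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False          # already in the same component
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra      # attach shorter tree under taller one
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return True
```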

DOI: 10.1109/ipdpsw63119.2024.00157 · Vol. 2024, pp. 880–889 · Published 2024-05-01 · Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11308434/pdf/
Citations: 0
Shared-Memory Parallel Edmonds Blossom Algorithm for Maximum Cardinality Matching in General Graphs.
Gregory Schwing, Daniel Grosu, Loren Schwiebert

The Edmonds Blossom algorithm is implemented here using depth-first search, which is intrinsically serial. By streamlining the code, our serial implementation is consistently three to five times faster than the previously fastest general graph matching code. By extracting parallelism across iterations of the algorithm, with coarse-grain locking, we further reduce the run time on random regular graphs four-fold and obtain a two-fold reduction on real-world graphs with similar topology. Solving very sparse graphs (average degree less than four) that exhibit community structure with eight threads led to a three-fold slowdown, but this slowdown gives way to a marginal speed-up once the average degree exceeds four. We conclude that our parallel coarse-grain locking implementation performs well when extracting parallelism from this augmenting-path-based algorithm and may work well for similar algorithms.
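As a rough illustration of coarse-grain locking, the sketch below runs a greedy maximal matching with several threads that each serialize their matching update through a single lock. This is a hypothetical stand-in: the paper parallelizes augmenting-path search inside the Edmonds Blossom algorithm, which is considerably more involved than a greedy matcher.

```python
import threading
from queue import Queue, Empty

def parallel_greedy_matching(adj, num_threads=4):
    """Greedy maximal matching with one coarse-grain lock.

    `adj` maps each vertex to a list of neighbours (undirected graph).
    Threads pull candidate edges from a shared queue; the entire
    check-and-match update is one critical section, mirroring the
    coarse-grain locking style described in the abstract.
    """
    match = {}                    # vertex -> matched partner
    lock = threading.Lock()      # the single coarse-grain lock
    work = Queue()
    for u, nbrs in adj.items():
        for v in nbrs:
            if u < v:            # enqueue each undirected edge once
                work.put((u, v))

    def worker():
        while True:
            try:
                u, v = work.get_nowait()
            except Empty:
                return
            with lock:           # whole update serialized, cheap to reason about
                if u not in match and v not in match:
                    match[u] = v
                    match[v] = u

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return match
```

The trade-off is exactly the one the abstract reports: a single lock keeps the algorithm correct with minimal code change, but contention grows with density and thread count.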

DOI: 10.1109/ipdpsw63119.2024.00107 · Vol. 2024, pp. 530–539 · Published 2024-05-01 · Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11308447/pdf/
Citations: 0
Sequre: a high-performance framework for rapid development of secure bioinformatics pipelines.
Haris Smajlović, Ariya Shajii, Bonnie Berger, Hyunghoon Cho, Ibrahim Numanagić
Vol. 2022, pp. 164–165 · Published 2022-05-01 · Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9364365/pdf/nihms-1817937.pdf
Citations: 0
Application of Distributed Agent-based Modeling to Investigate Opioid Use Outcomes in Justice Involved Populations.
Eric Tatara, John Schneider, Madeline Quasebarth, Nicholson Collier, Harold Pollack, Basmattee Boodram, Sam Friedman, Elizabeth Salisbury-Afshar, Mary Ellen Mackesy-Amiti, Jonathan Ozik

Criminal justice involved (CJI) individuals with a history of opioid use disorder (OUD) are at high risk of overdose and death in the weeks following release from jail. We developed the Justice-Community Circulation Model (JCCM) to investigate OUD/CJI dynamics post-release and the effects of interventions on overdose deaths. The JCCM uses a synthetic agent-based model population of approximately 150,000 unique individuals that is generated using demographic information collected from multiple Chicago-area studies and data sets. We use a high-performance computing (HPC) workflow to implement a sequential approximate Bayesian computation algorithm for calibrating the JCCM. The calibration results in the simulated joint posterior distribution of the JCCM input parameters. The calibrated model is used to investigate the effects of a naloxone intervention for a mass jail release. The simulation results show the degree to which a targeted intervention focusing on recently released jail inmates can help reduce the risk of death from opioid overdose.
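The sequential approximate Bayesian computation used to calibrate the JCCM can be pictured from its simplest building block, a single rejection round. The toy sampler below (hypothetical, not the paper's HPC workflow) calibrates a Bernoulli rate; sequential ABC repeats such rounds with a shrinking tolerance, using the accepted draws to form the next proposal distribution.

```python
import random

def abc_rejection(observed_rate, n_trials, n_draws=20000, tol=0.02, seed=1):
    """One round of ABC rejection sampling.

    Draw a parameter from the prior, run the simulator, and keep the draw
    only if the simulated summary statistic lies within `tol` of the
    observed one. The accepted draws approximate the posterior.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.random()  # Uniform(0, 1) prior on the rate
        # Forward simulation: n_trials Bernoulli(theta) outcomes.
        sim = sum(rng.random() < theta for _ in range(n_trials)) / n_trials
        if abs(sim - observed_rate) <= tol:
            accepted.append(theta)
    return accepted
```

In the paper's setting the forward simulation is a full JCCM run, which is why a high-performance computing workflow is needed to make the many simulator calls tractable.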

DOI: 10.1109/ipdpsw52791.2021.00157 · pp. 989–997 · Published 2021-06-01 · Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9297575/pdf/nihms-1820884.pdf
Citations: 0
Optimizing High-Performance Computing Systems for Biomedical Workloads.
Patricia Kovatch, Lili Gai, Hyung Min Cho, Eugene Fluder, Dansha Jiang

The productivity of computational biologists is limited by the speed of their workflows and subsequent overall job throughput. Because most biomedical researchers are focused on better understanding scientific phenomena rather than developing and optimizing code, a computing and data system implemented in an adventitious and/or non-optimized manner can impede the progress of scientific discovery. In our experience, most computational life-science applications do not generally leverage the full capabilities of high-performance computing, so tuning a system for these applications is especially critical. To optimize a system effectively, systems staff must understand the effects of the applications on the system. Effective stewardship of the system includes an analysis of the impact of the applications on the compute cores, file system, resource manager and queuing policies. The resulting improved system design, and enactment of a sustainability plan, help to enable a long-term resource for productive computational and data science. We present a case study of a typical biomedical computational workload at a leading academic medical center supporting over $100 million per year in computational biology research. Over the past eight years, our high-performance computing system has enabled over 900 biomedical publications in four major areas: genetics and population analysis, gene expression, machine learning, and structural and chemical biology. We have upgraded the system several times in response to trends, actual usage, and user feedback. Major components crucial to this evolution include scheduling structure and policies, memory size, compute type and speed, parallel file system capabilities, and deployment of cloud technologies. We evolved a 70 teraflop machine to a 1.4 petaflop machine in seven years and grew our user base nearly 10-fold. For long-term stability and sustainability, we established a chargeback fee structure. Our overarching guiding principle for each progression has been to increase scientific throughput and enable enhanced scientific fidelity with minimal impact to existing user workflows or code. This highly-constrained system optimization has presented unique challenges, leading us to adopt new approaches to provide constructive pathways forward. We share our practical strategies resulting from our ongoing growth and assessments.

DOI: 10.1109/ipdpsw50202.2020.00040 · Vol. 2020, pp. 183–192 · Published 2020-05-01 · Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7575271/pdf/nihms-1635815.pdf
Citations: 0
atSNPInfrastructure, a case study for searching billions of records while providing significant cost savings over cloud providers.
Christopher Harrison, Sündüz Keleş, Rebecca Hudson, Sunyoung Shin, Inês Dutra

We explore the feasibility of a database storage engine housing up to 307 billion genetic Single Nucleotide Polymorphisms (SNPs) for online access. We evaluate database storage engines and implement a solution weighing factors such as dataset size, information gain, cost, and hardware constraints. Our solution provides a full-featured functional model for scalable storage and queryability for researchers exploring SNPs in the human genome. We address the scalability problem by building physical infrastructure and comparing final costs to those of a major cloud provider.
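The core query pattern such a storage engine must serve is an indexed range lookup over genomic position. A toy sketch of that pattern (a hypothetical schema, not atSNP's actual one, and sqlite3 rather than the server-class engine a 307-billion-row table would require):

```python
import sqlite3

# Hypothetical, minimal SNP table -- for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE snp (
        chrom TEXT    NOT NULL,   -- chromosome name, e.g. 'chr1'
        pos   INTEGER NOT NULL,   -- 1-based genomic position
        ref   TEXT    NOT NULL,   -- reference allele
        alt   TEXT    NOT NULL,   -- alternate allele
        motif TEXT,               -- associated motif id (hypothetical)
        score REAL                -- impact score (hypothetical)
    )""")
# Composite index so (chrom, pos) range scans avoid full-table scans.
conn.execute("CREATE INDEX idx_snp_locus ON snp (chrom, pos)")

rows = [("chr1", 1005, "A", "G", "MA0003", 1.7),
        ("chr1", 2050, "C", "T", "MA0007", -0.4),
        ("chr2",  830, "G", "A", "MA0003", 2.2)]
conn.executemany("INSERT INTO snp VALUES (?,?,?,?,?,?)", rows)

# Range query over a locus window on chr1.
hits = conn.execute(
    "SELECT pos, ref, alt FROM snp "
    "WHERE chrom = ? AND pos BETWEEN ? AND ? ORDER BY pos",
    ("chr1", 1000, 2100)).fetchall()
```

At the paper's scale the same idea plays out through partitioning and physical layout choices rather than a single index, which is where the engine evaluation and hardware constraints come in.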

DOI: 10.1109/IPDPSW.2018.00086 · Vol. 2018, pp. 497–506 · Published 2018-05-01 · Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6195815/pdf/nihms-989639.pdf
Citations: 0
Evaluation of Emerging Energy-Efficient Heterogeneous Computing Platforms for Biomolecular and Cellular Simulation Workloads.
John E Stone, Michael J Hallock, James C Phillips, Joseph R Peterson, Zaida Luthey-Schulten, Klaus Schulten

Many of the continuing scientific advances achieved through computational biology are predicated on the availability of ongoing increases in computational power required for detailed simulation and analysis of cellular processes on biologically-relevant timescales. A critical challenge facing the development of future exascale supercomputer systems is the development of new computing hardware and associated scientific applications that dramatically improve upon the energy efficiency of existing solutions, while providing increased simulation, analysis, and visualization performance. Mobile computing platforms have recently become powerful enough to support interactive molecular visualization tasks that were previously only possible on laptops and workstations, creating future opportunities for their convenient use for meetings, remote collaboration, and as head mounted displays for immersive stereoscopic viewing. We describe early experiences adapting several biomolecular simulation and analysis applications for emerging heterogeneous computing platforms that combine power-efficient system-on-chip multi-core CPUs with high-performance massively parallel GPUs. We present low-cost power monitoring instrumentation that provides sufficient temporal resolution to evaluate the power consumption of individual CPU algorithms and GPU kernels. We compare the performance and energy efficiency of scientific applications running on emerging platforms with results obtained on traditional platforms, identify hardware and algorithmic performance bottlenecks that affect the usability of these platforms, and describe avenues for improving both the hardware and applications in pursuit of the needs of molecular modeling tasks on mobile devices and future exascale computers.
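The accounting behind per-kernel energy attribution is numerical integration: sample instantaneous power at high enough temporal resolution during a kernel's execution, then integrate over time. A generic sketch with a synthetic trace (this is not the paper's instrumentation; real samples would come from a power meter or an on-device sensor API):

```python
def energy_joules(samples):
    """Integrate (time_s, power_w) samples with the trapezoidal rule.

    `samples` is a chronologically ordered list of (seconds, watts)
    readings; the return value is the energy consumed in joules.
    """
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total

# Synthetic trace: 100 W idle, ramping to 180 W during a 2 s kernel run.
trace = [(0.0, 100.0), (0.5, 180.0), (1.5, 180.0), (2.0, 100.0)]
```

Dividing the integrated joules by work completed (e.g. simulated nanoseconds) gives the energy-efficiency figure used to compare platforms.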

DOI: 10.1109/IPDPSW.2016.130 · Vol. 2016, pp. 89–100 · Published 2016-05-01
Citations: 20
Real-Time Agent-Based Modeling Simulation with in-situ Visualization of Complex Biological Systems: A Case Study on Vocal Fold Inflammation and Healing.
Nuttiiya Seekhao, Caroline Shung, Joseph JaJa, Luc Mongeau, Nicole Y K Li-Jessen

We present an efficient and scalable scheme for implementing agent-based modeling (ABM) simulation with in situ visualization of large complex systems on heterogeneous computing platforms. The scheme is designed to make optimal use of the resources available on a heterogeneous platform consisting of a multicore CPU and a GPU, resulting in minimal to no resource idle time. Furthermore, the scheme was implemented under a client-server paradigm that enables remote users to visualize and analyze simulation data as it is generated at each time step of the model. Performance on a simulation case study of vocal fold inflammation and wound healing with 3.8 million agents shows 35× and 7× speedups in execution time over single-core and multi-core CPU implementations, respectively. Each iteration of the model took less than 200 ms to simulate, visualize, and send to the client, enabling users to monitor the simulation in real time and modify its course as needed.

DOI: 10.1109/IPDPSW.2016.20 · Vol. 2016, pp. 463–472 · Published 2016-05-01
Citations: 6
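The abstract above describes overlapping each simulation step with in-situ visualization and transfer to a remote client, so users see results with no resource idle time. The sketch below illustrates that producer/consumer overlap with a minimal double-buffered pipeline; the thread-based structure, `simulate_step`, and `send` are illustrative stand-ins, not the paper's CPU/GPU implementation.

```python
import threading
import queue

def simulate_step(state):
    # Hypothetical stand-in for the GPU simulation kernel:
    # each agent just increments its value once per time step.
    return [v + 1 for v in state]

def run_pipeline(initial_state, steps, send):
    """Overlap simulation of step t+1 with sending/visualizing step t,
    mimicking the producer/consumer split between simulator and server."""
    buf = queue.Queue(maxsize=1)  # double buffer: at most one step in flight

    def consumer():
        while True:
            item = buf.get()
            if item is None:  # sentinel: simulation finished
                break
            send(item)  # in-situ visualization / transfer to the remote client

    t = threading.Thread(target=consumer)
    t.start()
    state = initial_state
    for _ in range(steps):
        state = simulate_step(state)  # compute the next step
        buf.put(state)                # hand the completed step to the consumer
    buf.put(None)
    t.join()
    return state

sent = []
final = run_pipeline([0, 0, 0], steps=5, send=sent.append)
```

Because `simulate_step` returns a fresh list each step, the consumer receives every intermediate state in order while the producer is already computing the next one; the bounded queue is what keeps neither side idle for long.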
Do Firms Hedge Optimally? Evidence from an Exogenous Governance Change
Sterling Huang, U. Peyer, Benjamin Segal
We ask whether firms hedge optimally by analyzing the impact on financial risk management of the NYSE/NASDAQ listing rule changes, which exogenously imposed board composition changes on a subset of firms. Using new proxies for the extent of financial risk management in non-financial firms, we find, in a difference-in-differences framework, that treated firms reduce their financial hedging. The reduction is concentrated in firms with greater conflicts of interest, such as a high level of CEO equity ownership, which exposes CEOs to more idiosyncratic risk, and a higher incidence of option backdating. We reject the hypothesis that newly majority-independent boards reduce financial hedging due to a lack of knowledge. First, we find no difference in financial hedging between firms where SOX mandated the addition of a financial expert and those that already had such expertise. Second, shareholder value increases more during the listing rule deliberations for treated firms that hedged prior to the treatment. We conclude that some firms hedge too much, reducing shareholder value, potentially to the benefit of under-diversified CEOs. We also show that board independence reinforces monitoring, which allows boards to cut back on excessive financial hedging.
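The difference-in-differences estimator the abstract relies on compares the change in hedging for treated firms against the change for untreated controls. A minimal sketch, with made-up numbers (the paper's data are not reproduced here):

```python
def diff_in_diff(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Change for treated firms minus change for control firms."""
    return (treat_post - treat_pre) - (ctrl_post - ctrl_pre)

# Hypothetical mean hedging proxies before/after the listing rule change:
effect = diff_in_diff(treat_pre=0.60, treat_post=0.45,
                      ctrl_pre=0.58, ctrl_post=0.55)
# effect < 0: treated firms reduced hedging relative to controls
```

Subtracting the control-group change is what removes market-wide trends, so only the effect attributable to the exogenous board composition change remains.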
{"title":"Do Firms Hedge Optimally? Evidence from an Exogenous Governance Change","authors":"Sterling Huang, U. Peyer, Benjamin Segal","doi":"10.2139/ssrn.2312263","DOIUrl":"https://doi.org/10.2139/ssrn.2312263","url":null,"abstract":"We ask whether firms hedge optimally by analyzing the impact the NYSE/NASDAQ listing rule changes have had, which exogenously imposed board composition changes on a subset of firms, on financial risk management. Using new proxies for the extent of financial risk management in non-financial firms we find that treated firms reduce their financial hedging, in a difference-in-difference framework. The reduction is concentrated in firms with higher conflicts of interests, such as a high CEO equity ownership level, which exposes them to more idiosyncratic risk, and a higher occurrence of option backdating. We reject the hypothesis that newly majority-independent boards reduce financial hedging due to a lack of knowledge. First, we find no difference in financial hedging for firms where SOX mandated the addition of a financial expert relative to those that already had such expertise. Second, shareholder value increases more during the period of time of the listing rule deliberations for treated firms that hedge prior to the treatment. We conclude that some firms hedge too much reducing shareholder value potentially to the benefit of under-diversified CEOs. We also show that board independence serves to reinforce monitoring which allows boards to cut back on excessive financial hedging.","PeriodicalId":90848,"journal":{"name":"IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum : [proceedings]. 
IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"201 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76981332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Secondary Structure Predictions for Long RNA Sequences Based on Inversion Excursions and MapReduce.
Daniel T Yehdego, Boyu Zhang, Vikram K R Kodimala, Kyle L Johnson, Michela Taufer, Ming-Ying Leung

Secondary structures of ribonucleic acid (RNA) molecules play important roles in many biological processes including gene expression and regulation. Experimental observations and computing limitations suggest that we can approach the secondary structure prediction problem for long RNA sequences by segmenting them into shorter chunks, predicting the secondary structures of each chunk individually using existing prediction programs, and then assembling the results to give the structure of the original sequence. The selection of cutting points is a crucial component of the segmenting step. Noting that stem-loops and pseudoknots always contain an inversion, i.e., a stretch of nucleotides followed closely by its inverse complementary sequence, we developed two cutting methods for segmenting long RNA sequences based on inversion excursions: the centered and the optimized methods. Each step of searching for inversions, chunking, and prediction can be performed in parallel. In this paper we use a MapReduce framework, i.e., Hadoop, to extensively explore meaningful inversion stem lengths and gap sizes for the segmentation and identify correlations between chunking methods and prediction accuracy. We show that for a set of long RNA sequences in the RFAM database, whose secondary structures are known to contain pseudoknots, our approach predicts secondary structures more accurately than methods that do not segment the sequence, when the latter predictions are possible computationally. We also show that, as sequences exceed certain lengths, some programs cannot computationally predict pseudoknots while our chunking methods can. Overall, our predicted structures still retain the accuracy level of the original prediction programs when compared with known experimental secondary structures.
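The cutting methods hinge on locating inversions: a stretch of nucleotides followed closely by its inverse (reverse) complement, the signature of a candidate stem. A minimal sketch of such a scan, assuming illustrative `stem_len` and `max_gap` parameters (the paper tunes stem length and gap size via Hadoop, and its centered/optimized methods are not reproduced here):

```python
def reverse_complement(seq):
    """Reverse complement of an RNA sequence (A-U, G-C pairing)."""
    comp = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(comp[b] for b in reversed(seq))

def find_inversions(seq, stem_len=4, max_gap=8):
    """Yield (start, match_start, gap) triples where seq[start:start+stem_len]
    is followed, within max_gap bases, by its reverse complement --
    a candidate stem of a stem-loop or pseudoknot."""
    n = len(seq)
    for i in range(n - 2 * stem_len + 1):
        rc = reverse_complement(seq[i : i + stem_len])
        for gap in range(max_gap + 1):
            j = i + stem_len + gap
            if j + stem_len > n:
                break
            if seq[j : j + stem_len] == rc:
                yield (i, j, gap)

# GGCA + AAA loop + UGCC, where UGCC is the reverse complement of GGCA
hits = list(find_inversions("GGCAAAAUGCC", stem_len=4, max_gap=8))
```

Each hit marks a region a cutting method would avoid splitting, so that the stem's two halves end up in the same chunk.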

{"title":"Secondary Structure Predictions for Long RNA Sequences Based on Inversion Excursions and MapReduce.","authors":"Daniel T Yehdego,&nbsp;Boyu Zhang,&nbsp;Vikram K R Kodimala,&nbsp;Kyle L Johnson,&nbsp;Michela Taufer,&nbsp;Ming-Ying Leung","doi":"10.1109/IPDPSW.2013.109","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.109","url":null,"abstract":"<p><p>Secondary structures of ribonucleic acid (RNA) molecules play important roles in many biological processes including gene expression and regulation. Experimental observations and computing limitations suggest that we can approach the secondary structure prediction problem for long RNA sequences by segmenting them into shorter chunks, predicting the secondary structures of each chunk individually using existing prediction programs, and then assembling the results to give the structure of the original sequence. The selection of cutting points is a crucial component of the segmenting step. Noting that stem-loops and pseudoknots always contain an inversion, i.e., a stretch of nucleotides followed closely by its inverse complementary sequence, we developed two cutting methods for segmenting long RNA sequences based on inversion excursions: the centered and optimized method. Each step of searching for inversions, chunking, and predictions can be performed in parallel. In this paper we use a MapReduce framework, i.e., Hadoop, to extensively explore meaningful inversion stem lengths and gap sizes for the segmentation and identify correlations between chunking methods and prediction accuracy. We show that for a set of long RNA sequences in the RFAM database, whose secondary structures are known to contain pseudoknots, our approach predicts secondary structures more accurately than methods that do not segment the sequence, when the latter predictions are possible computationally. We also show that, as sequences exceed certain lengths, some programs cannot computationally predict pseudoknots while our chunking methods can. 
Overall, our predicted structures still retain the accuracy level of the original prediction programs when compared with known experimental secondary structure.</p>","PeriodicalId":90848,"journal":{"name":"IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum : [proceedings]. IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"2013 ","pages":"520-529"},"PeriodicalIF":0.0,"publicationDate":"2013-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/IPDPSW.2013.109","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33223654","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8