
Proceedings. The Sixth IEEE International Symposium on High Performance Distributed Computing (Cat. No.97TB100183): Latest Publications

Collective buffering: Improving parallel I/O performance
B. Nitzberg, V. Lo
"Parallel I/O" is the support of a single parallel application run on many nodes; application data is distributed among the nodes, and is read or written to a single logical file, itself spread across nodes and disks. Parallel I/O is a mapping problem from the data layout in node memory to the file layout on disks. Since the mapping can be quite complicated and involve significant data movement, optimizing the mapping is critical for performance. We discuss our general model of the problem, describe four Collective Buffering algorithms we designed, and report experiments testing their performance on an Intel Paragon and an IBM SP2 both housed at NASA Ames Research Center. Our experiments show improvements of up to two order of magnitude over standard techniques and the potential to deliver peak performance with minimal hardware support.
{"title":"Collective buffering: Improving parallel I/O performance","authors":"B. Nitzberg, V. Lo","doi":"10.1109/HPDC.1997.622371","DOIUrl":"https://doi.org/10.1109/HPDC.1997.622371","url":null,"abstract":"\"Parallel I/O\" is the support of a single parallel application run on many nodes; application data is distributed among the nodes, and is read or written to a single logical file, itself spread across nodes and disks. Parallel I/O is a mapping problem from the data layout in node memory to the file layout on disks. Since the mapping can be quite complicated and involve significant data movement, optimizing the mapping is critical for performance. We discuss our general model of the problem, describe four Collective Buffering algorithms we designed, and report experiments testing their performance on an Intel Paragon and an IBM SP2 both housed at NASA Ames Research Center. Our experiments show improvements of up to two order of magnitude over standard techniques and the potential to deliver peak performance with minimal hardware support.","PeriodicalId":243171,"journal":{"name":"Proceedings. The Sixth IEEE International Symposium on High Performance Distributed Computing (Cat. No.97TB100183)","volume":"26 9","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133882489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 42
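The mapping problem in the abstract above is the one that collective I/O interfaces later standardized. As a hedged illustration only (not the paper's own four algorithms, which are independent of MPI-IO), the sketch below shows the usage model: each process owns a block-cyclic slice of one logical file, and a single collective write lets the I/O layer reorganize the many small strided pieces into a few large contiguous disk requests, the same idea as collective buffering. The file name, block size, and counts are illustrative.

```c
/* Hedged sketch, not the paper's implementation: it only illustrates the access
 * pattern that motivates collective buffering. Each process owns a block-cyclic
 * slice of one logical file, so its data is scattered in many small strided
 * pieces on disk. A collective write lets the I/O layer permute the data first
 * and issue a few large contiguous requests instead (the common MPI-IO/ROMIO
 * implementation of this call uses a two-phase buffering scheme of this flavor). */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int block   = 128;                 /* doubles per block (illustrative) */
    const int blocks  = 1024;                /* blocks owned by each process     */
    const int local_n = block * blocks;

    double *local = malloc(local_n * sizeof(double));
    for (int i = 0; i < local_n; i++)
        local[i] = rank + i * 1e-6;

    /* Describe this rank's block-cyclic footprint in the shared file. */
    MPI_Datatype filetype;
    MPI_Type_vector(blocks, block, block * nprocs, MPI_DOUBLE, &filetype);
    MPI_Type_commit(&filetype);

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "global_array.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_set_view(fh, (MPI_Offset)rank * block * sizeof(double),
                      MPI_DOUBLE, filetype, "native", MPI_INFO_NULL);

    /* Collective write: all ranks participate, so small strided pieces can be
     * buffered and coalesced into large contiguous disk requests. */
    MPI_File_write_all(fh, local, local_n, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Type_free(&filetype);
    free(local);
    MPI_Finalize();
    return 0;
}
```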
Parallel hierarchical global illumination
Q. Snell, J. Gustafson
This paper presents an algorithm that solves the Rendering Equation to any desired accuracy, and can be run in parallel on distributed-memory or shared-memory computer systems with excellent scaling properties. It appears superior in both speed and physical correctness to recently published methods involving bidirectional ray tracing or hybrid treatments of diffuse and specular surfaces. Like "progressive radiosity" methods, it dynamically refines the geometry decomposition where required, but does so without the excessive storage requirements of "ray histories".
{"title":"Parallel hierarchical global illumination","authors":"Q. Snell, J. Gustafson","doi":"10.1109/HPDC.1997.622358","DOIUrl":"https://doi.org/10.1109/HPDC.1997.622358","url":null,"abstract":"This paper presents an algorithm that solves the Rendering Equation to any desired accuracy, and can be run in parallel on distributed memory or shared memory computer systems with excellent scaling properties. It appears superior in both speed and physical correctness to recent published methods involving bidirectional ray tracing or hybrid treatments of diffuse and specular surfaces. Like \"progressive radiosity\" methods, it dynamically refines the geometry decomposition where required, but does so without the excessive storage requirements for \"ray histories.\".","PeriodicalId":243171,"journal":{"name":"Proceedings. The Sixth IEEE International Symposium on High Performance Distributed Computing (Cat. No.97TB100183)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124224832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
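For reference, the Rendering Equation that the abstract refers to is conventionally written as below (Kajiya's form; the notation is the standard one, not necessarily the paper's): outgoing radiance is emitted radiance plus incoming radiance weighted by the surface's BRDF and the incidence angle.

```latex
% Rendering equation (standard form): outgoing radiance at point x in direction
% omega_o = emitted radiance + BRDF-weighted integral of incoming radiance.
L_o(x,\omega_o) \;=\; L_e(x,\omega_o)
  \;+\; \int_{\Omega} f_r(x,\omega_i,\omega_o)\, L_i(x,\omega_i)\,
        (\omega_i \cdot n)\, \mathrm{d}\omega_i
```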
Parallel programming in multi-paradigm clusters
J. Leichtl, Phyllis E. Crandall, M. Clement
An important development in cluster computing is the availability of multiprocessor workstations. These are able to provide additional computational power to the cluster without increasing network overhead and allow multiparadigm parallelism, which we define to be the simultaneous application of both distributed and shared memory parallel processing techniques to a single problem. In this paper we compare execution times and speedup of parallel programs written in a pure message-passing paradigm with those that combine message passing and shared-memory primitives in the same application. We consider three basic applications that are common building blocks for many scientific and engineering problems: numerical integration, matrix multiplication and Jacobi iteration. Our results indicate that the added complexity of combining shared- and distributed-memory programming methods in the same program does not contribute sufficiently to performance to justify the added programming complexity.
{"title":"Parallel programming in multi-paradigm clusters","authors":"J. Leichtl, Phyllis E. Crandall, M. Clement","doi":"10.1109/HPDC.1997.626438","DOIUrl":"https://doi.org/10.1109/HPDC.1997.626438","url":null,"abstract":"An important development in cluster computing is the availability of multiprocessor workstations. These are able to provide additional computational power to the cluster without increasing network overhead and allow multiparadigm parallelism, which we define to be the simultaneous application of both distributed and shared memory parallel processing techniques to a single problem. In this paper we compare execution times and speedup of parallel programs written in a pure message-passing paradigm with those that combine message passing and shared-memory primitives in the same application. We consider three basic applications that are common building blocks for many scientific and engineering problems: numerical integration, matrix multiplication and Jacobi iteration. Our results indicate that the added complexity of combining shared- and distributed-memory programming methods in the same program does not contribute sufficiently to performance to justify the added programming complexity.","PeriodicalId":243171,"journal":{"name":"Proceedings. The Sixth IEEE International Symposium on High Performance Distributed Computing (Cat. No.97TB100183)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132003640","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
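As a hedged sketch of the multi-paradigm style this paper evaluates (the paper's own 1997 codes may well have used different message-passing and threading primitives), the fragment below combines MPI between nodes with OpenMP threads within a node on the first kernel the paper lists, numerical integration; the interval count is illustrative.

```c
/* Hedged sketch of the multi-paradigm style studied in the paper: message
 * passing between nodes plus shared-memory parallelism within a node. MPI and
 * OpenMP are used here purely to illustrate the combination (compile with
 * something like `mpicc -fopenmp`); the kernel estimates pi as the midpoint-rule
 * integral of 4/(1+x^2) on [0,1]. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const long n = 100000000L;          /* total number of intervals (illustrative) */
    const double h = 1.0 / (double)n;
    double local_sum = 0.0;

    /* Shared-memory parallelism inside the node: threads split this rank's strip. */
    #pragma omp parallel for reduction(+:local_sum)
    for (long i = rank; i < n; i += nprocs) {
        double x = (i + 0.5) * h;
        local_sum += 4.0 / (1.0 + x * x);
    }

    /* Distributed-memory combination of the per-node partial sums. */
    double pi = 0.0;
    MPI_Reduce(&local_sum, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("pi ~= %.12f\n", pi * h);

    MPI_Finalize();
    return 0;
}
```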
A bus-efficient low-latency network interface for the PDSS multicomputer
C. Steele, J. Draper, J. Koller, C. LaCour
The Packaging-Driven Scalable Systems multicomputer (PDSS) project uses several innovative interconnect and routing techniques to construct a low-latency, high-bandwidth (1.3 GB/s) multicomputer network. The PDSS network interface provides a low-latency interface between the network and the processing nodes that allows unprivileged code to initiate network operations while maintaining a high level of protection. The interface design exploits processor-bus cache coherence protocols to deliver very-low-latency cache-to-cache communications between processing nodes. Network operations include a variety of transfers of cache-line-sized packets, including remote read and write, and a distributed barrier-synchronization mechanism. Despite performance-limiting flaws, the initial single-chip implementation of the network router and interface achieves gigabit/s bandwidth and microsecond cache-to-cache latencies between nodes using commodity processor and memory components.
{"title":"A bus-efficient low-latency network interface for the PDSS multicomputer","authors":"C. Steele, J. Draper, J. Koller, C. LaCour","doi":"10.1109/HPDC.1997.626407","DOIUrl":"https://doi.org/10.1109/HPDC.1997.626407","url":null,"abstract":"The Packaging-Driven Scalable Systems multicomputer (PDSS) project uses several innovative interconnect and routing techniques to construct a low-latency, high-bandwidth (1.3 GB/s) multicomputer network. The PDSS network interface provides a low-latency interface between the network and the processing nodes that allows unprivileged code to initiate network operations while maintaining a high level of protection. The interface design exploits processor-bus cache coherence protocols to deliver very-low-latency cache-to-cache communications between processing nodes. Network operations include a variety of transfers of cache-line-sized packets, including remote read and write, and a distributed barrier-synchronization mechanism. Despite performance-limiting flaws, the initial single-chip implementation of the network router and interface achieves gigabit/s bandwidth and microsecond cache-to-cache latencies between nodes using commodity processor and memory components.","PeriodicalId":243171,"journal":{"name":"Proceedings. The Sixth IEEE International Symposium on High Performance Distributed Computing (Cat. No.97TB100183)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126924571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
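The remote read/write and barrier operations described above are hardware features of the PDSS interface, not a software API. Purely as an analogy for the usage model (one node reading or writing another node's memory without a matching receive, bracketed by a barrier-like synchronization), the hedged sketch below uses MPI one-sided RMA; nothing in it is the PDSS interface itself.

```c
/* Hedged analogy only: the PDSS remote read/write operations are hardware-level
 * and cache-coherence based, not an MPI feature. This sketch uses MPI one-sided
 * RMA (MPI_Get inside a fence epoch) to show a similar usage model: one node
 * accessing another node's exposed memory directly. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Each rank exposes a small "cache line" of 8 doubles for remote access. */
    double line[8];
    for (int i = 0; i < 8; i++)
        line[i] = rank * 100.0 + i;

    MPI_Win win;
    MPI_Win_create(line, sizeof(line), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    double remote[8] = {0};
    MPI_Win_fence(0, win);              /* open an access epoch (barrier-like) */
    if (nprocs > 1 && rank == 0) {
        /* "Remote read": fetch rank 1's line into local memory. */
        MPI_Get(remote, 8, MPI_DOUBLE, 1, 0, 8, MPI_DOUBLE, win);
    }
    MPI_Win_fence(0, win);              /* close the epoch; data is now valid  */

    if (nprocs > 1 && rank == 0)
        printf("rank 0 read %.1f ... %.1f from rank 1\n", remote[0], remote[7]);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```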