International Workshop on Storage Network Architecture and Parallel I/Os: Latest Papers
A stochastic approach to file access prediction
Pub Date : 2003-09-28 DOI: 10.1145/1162618.1162623
Jehan-Francois Pâris, A. Amer, D. Long
Most existing studies of file access prediction are experimental in nature and rely on trace-driven simulation to predict the performance of the schemes being investigated. We present a first-order Markov analysis of file access prediction, discuss its limitations and show how it can be used to estimate the performance of file access predictors, such as First Successor, Last Successor, Stable Successor and Best-k-out-of-n. We compare these analytical results with experimental measurements performed on several file traces and find that specific workloads, and indeed individual files, can exhibit very different levels of non-stationarity. Overall, at least 60 percent of access requests appear to remain stable over at least a month.
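To make the kind of predictor being analyzed concrete, here is a minimal sketch of a Last Successor predictor evaluated over a trace. It assumes the usual definition (predict that a file will be followed by whatever followed it the last time it was accessed); the function name and the toy trace are illustrative and are not taken from the paper.

```python
def last_successor_hit_rate(trace):
    """Return the fraction of predictions a Last Successor predictor gets right.

    trace is a sequence of file identifiers in access order. A prediction is
    only made for a file that has already been seen with a successor.
    """
    last_succ = {}      # file -> successor observed after its most recent access
    hits = 0
    predictions = 0
    for prev, cur in zip(trace, trace[1:]):
        if prev in last_succ:
            predictions += 1
            if last_succ[prev] == cur:
                hits += 1
        last_succ[prev] = cur   # always remember the latest successor
    return hits / predictions if predictions else 0.0


# Hypothetical toy trace: "a" is usually followed by "b", except at the end.
rate = last_successor_hit_rate(["a", "b", "a", "b", "a", "c"])
```

On a stationary workload this hit rate stays roughly constant over time; a drifting rate is one simple symptom of the non-stationarity the abstract mentions.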
Citations: 17
Design, implementation and performance evaluation of a cost-effective, fault-tolerant parallel virtual file system
Pub Date : 2003-09-28 DOI: 10.1145/1162618.1162625
Yifeng Zhu, Hong Jiang, X. Qin, D. Feng, D. Swanson
Fault tolerance is one of the most important issues for parallel file systems. This paper presents the design, implementation and performance evaluation of a cost-effective, fault-tolerant parallel virtual file system (CEFT-PVFS) that provides parallel I/O service without requiring any additional hardware by utilizing existing commodity disks on cluster nodes, and that incorporates fault tolerance in the form of disk mirroring. While mirroring is a straightforward idea, we have implemented this open source system and conducted extensive experiments to evaluate the feasibility, efficiency and scalability of this fault-tolerant approach on one of the largest current clusters, where the issues of data consistency and recovery are also investigated. Four mirroring protocols are proposed, reflecting whether the fault-tolerant operations are client driven or server driven, and synchronous or asynchronous. Their relative merits are assessed by comparing their write performance, measured on the real systems, and their reliability and availability measures, obtained through analytical modeling. The results indicate that, in cluster environments, mirroring can improve reliability by a factor of over 40 (4000%) while sacrificing 33-58% of the peak write performance when both systems are of identical sizes (i.e., counting the 50% mirroring disks in the mirrored system). In addition, protocols with higher peak write performance are less reliable than those with lower peak write performance, with the latter achieving higher reliability and availability at the expense of some write bandwidth. A hybrid protocol is proposed to optimize this tradeoff.
Citations: 22
Performance evaluation of distributed iSCSI RAID
Pub Date : 2003-09-28 DOI: 10.1145/1162618.1162620
Xubin He, Praveen Beedanagari, Dan Zhou
iSCSI is a newly emerging protocol with the goal of implementing storage area network (SAN) technology over TCP/IP, which brings economy and convenience but also raises performance and reliability issues. This paper identifies the performance bottleneck of iSCSI, and then proposes a distributed iSCSI RAID that improves performance by striping data among iSCSI targets (S-iRAID) and improves reliability by using rotated parity for data blocks (P-iRAID). Numerical results using a popular benchmark show a dramatic performance gain. S-iRAID improves the average throughput from 11.7 MB/s to 46.1 MB/s by striping data among only three iSCSI targets. S-iRAID and P-iRAID can speed up iSCSI performance by a factor of up to 6.6 and 2.17, respectively.
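As a rough illustration of the two placement ideas named in the abstract, the sketch below maps logical blocks to targets round-robin (striping) and picks a parity target that rotates from stripe to stripe. The exact layouts used by S-iRAID and P-iRAID are not given in the abstract, so both mappings and the function names are assumptions, modeled on conventional RAID 0 and RAID 5 layouts.

```python
def stripe_target(block, n_targets):
    """Round-robin striping: return (target index, offset on that target)."""
    return block % n_targets, block // n_targets


def parity_target(stripe, n_targets):
    """Rotated parity: the parity block moves one target to the left per stripe."""
    return (n_targets - 1 - stripe) % n_targets


# With three targets, logical block 4 lands on target 1 at offset 1.
placement = stripe_target(4, 3)

# Parity locations for the first four stripes on three targets.
parity_locations = [parity_target(s, 3) for s in range(4)]

# Average throughput gain reported in the abstract: 46.1 / 11.7 MB/s.
average_speedup = 46.1 / 11.7
```

The average gain computed from the two quoted throughput figures is a little under 4x, which is consistent with the abstract describing 6.6 as an upper bound ("up to") rather than an average.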
Citations: 21
Data consistent up- and downstreaming in a distributed storage system
Pub Date : 2003-09-28 DOI: 10.1145/1162618.1162621
P. Sobe
Distribution of large data objects among several storage servers is a common technique to speed up access rates. In combination with parity schemes, failures of single server nodes can be tolerated, so that such systems reach a certain degree of fault tolerance. In this paper such a distributed server system is analyzed. Data objects are stored in a data layout according to RAID level 3 among disk subsystems of different computers. An access control mechanism provides concurrent up- and down-streaming of data objects to and from the distributed storage system with ensured data consistency. This consistency control is described in combination with the handling of faulty server nodes and faulty clients. Furthermore, performance is measured with several access patterns. One application of this technique is a distributed video server that allows continuous updates without interrupting access.
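The single-node fault tolerance that parity provides can be shown with a small sketch. It assumes plain bytewise XOR parity over equal-sized chunks, as in a conventional RAID level 3 layout; the function name and sample data are illustrative and do not come from the paper.

```python
from functools import reduce


def xor_chunks(chunks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


# Three data chunks, each imagined to live on a different storage server.
data = [b"\x01\x02", b"\x04\x08", b"\x10\x20"]
parity = xor_chunks(data)

# If the server holding data[1] fails, XOR of the survivors and the parity
# chunk yields the lost chunk.
recovered = xor_chunks([data[0], data[2], parity])
```

Because XOR is its own inverse, the same routine serves for both computing parity and rebuilding a missing chunk, but only one missing chunk per stripe can be recovered this way.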
Citations: 15
Meta-data snapshotting: a simple mechanism for file system consistency
Pub Date : 2003-09-28 DOI: 10.1145/1162618.1162624
Livio Baldini Soares, O. Krieger, Dilma Da Silva
File system consistency frequently involves a choice between raw performance and integrity guarantees. A few software-based solutions for this problem have appeared and are currently being used on some commercial operating systems; these include log-structured file systems, journaling file systems, and soft updates. In this paper, we propose meta-data snapshotting as a low-cost, scalable, and simple mechanism that provides file system integrity. It allows the safe use of write-back caching by making successive snapshots of the meta-data using copy-on-write, and atomically committing the snapshot to stable storage without interrupting file system availability. In the presence of system failures, no file system checker or any other operation is necessary to mount the file system; it therefore greatly improves system availability. This paper describes meta-data snapshotting, and its incorporation into a file system available for the Linux and K42 operating systems. We show that meta-data snapshotting has low overhead: for a microbenchmark and two macrobenchmarks, the measured overhead is at most 4% when compared to a completely asynchronous file system with no consistency guarantees. Our experiments also show that it induces less overhead than a write-ahead journaling file system, and it scales much better when the number of clients and file system operations grows. Furthermore, this new technique can be easily extended to provide file system snapshotting (versioning) and transaction support for a collection of selected files or directories.
Citations: 12
Performance of optimized software implementation of the iSCSI protocol
Pub Date : 2003-09-28 DOI: 10.1145/1162618.1162619
Fujita Tomonori, Ogawara Masanori
The advent of IP-based storage networking has brought to market specialized network adapters that directly support TCP/IP and storage protocols, with the aim of achieving performance comparable to that of specialized high-performance storage networking architectures. This paper describes an efficient software implementation of the iSCSI protocol with a commodity networking infrastructure. Though several studies have compared the performance of specialized network adapters and commodity network adapters, our iSCSI implementation eliminates data copying overhead, unlike the straightforward iSCSI implementations used in previous studies. To achieve this, we modified a general-purpose operating system by using techniques studied in the literature for improving TCP performance and features that commodity Gigabit Ethernet adapters support. We also quantified their effects. Our microbenchmarks show that, compared with a straightforward iSCSI driver that does not use these techniques, the iSCSI driver with these optimizations reduces CPU utilization from 39.4% to 30.8% when writing with an I/O size of 64 KB. However, when reading, any performance gain is negated due to the high cost of operations on the virtual memory system.
Citations: 8
Source level transformations to improve I/O data partitioning
Pub Date : 2003-09-28 DOI: 10.1145/1162618.1162622
Yijian Wang, D. Kaeli
The main goal for parallel I/O is to increase I/O parallelism by providing multiple, independent data channels between processors and disks. To realize this goal, I/O streams need to be parallelized and partitioned at multiple system layers. Contention at any level can dramatically decrease performance and limit scalability. To address this disk contention bottleneck, it is important to carefully study disk access patterns. From our previous work on I/O profiling, we found that I/O access patterns of parallel scientific applications are usually very regular and highly predictable. Thus it is possible to detect I/O access patterns statically at compile time. Large datasets are logically linearized in file space on disk, and these intensive data accesses follow a linear space traversal. In this paper, we present our recent work on compiler-directed I/O partitioning, based on Linear Disk Access Descriptors (LDAD). We use the SUIF compiler infrastructure to perform data-flow analysis and recognize LDADs. We then use these LDADs to guide our I/O data partitioning, which utilizes multiple disks to significantly increase I/O throughput.
Citations: 10
The Mercury system: exploiting truly fast hardware for data search
Pub Date : 2003-09-28 DOI: 10.1145/1162618.1162626
R. Chamberlain, R. Cytron, M. Franklin, R. Indeck
In many data mining applications, the size of the database is not only extremely large, it is also growing rapidly. Even for relatively simple searches, the time required to move the data off magnetic media, cross the system bus into main memory, copy into processor cache, and then execute code to perform a search is prohibitive. We are building a system in which a significant portion of the data mining task (i.e., the portion that examines the bulk of the raw data) is implemented in fast hardware, close to the magnetic media on which it is stored. Furthermore, this hardware can be replicated allowing mining tasks to be performed in parallel, thus providing further speedup for the overall mining application. In this paper, we describe a general framework under which this can be accomplished and provide initial performance results for a set of applications.
Citations: 48