Towards a heterogeneous common/shared storage system architecture
Michael F. Shields
Pub Date: 1995-09-11. DOI: 10.1109/MASS.1995.528220
Proceedings of IEEE 14th Symposium on Mass Storage Systems

Storage management for our scientific, general, and signals processing complexes has in the past been addressed on a system-by-system, complex-by-complex basis. Over the years, the time-phased introduction of a wide variety of heterogeneous computer systems, each of which was selected to solve a unique operational problem, has created a most challenging environment from the storage management perspective. Each processor had its own dedicated storage subsystems and, in the main, its own custom-software solution to its storage management. A walking tour of our major complexes would expose one of almost every type of computer and almost every imaginable data-processing application type, each with its own robotic tape and disk subsystems. Our current challenge is to design and implement a coherent mass storage architecture, across all of the processing complexes, that is flexible, repartitionable, and easier to manage. We are in the midst of that challenge now. From the 1970s to the present, our approach to storage has changed considerably, shaped in large measure by the products that have arisen in the commercial marketplace. This paper discusses our corporate adaptation to change and presents our current strategy in this vital area of high-performance/high-capacity computing.
Virtual storage architecture guide (VSAG)
R. Baird
Pub Date: 1995-09-11. DOI: 10.1109/MASS.1995.528241

The VSAG expands on the virtual storage concepts of the IEEE Mass Storage Systems Reference Model (MSSRM), version 5. The intended outcome of the VSAG is to provide a framework for the VSS standard that will enable software vendors to develop and deploy interoperable virtual storage management products. The VSAG provides a model for: a single storage image of diverse device technologies in a multivendor network of storage devices; interoperability with other OSSI components and relevant IEEE, ISO, and ANSI standards; and adaptation to various existing and anticipated data management clientele. The VSAG also promotes the automation of routine storage management tasks, permits reconfiguration of network storage as needs change, and supports common points of administration for accounting, availability, configuration, performance, recovery, and security. The scope of the VSAG includes: the definition of virtual objects that reside in persistent storage; a framework for software interfaces and protocols; and a model for servers of various types interacting within a single network. The scope of the VSAG excludes: detailed specification of interfaces; specification of specific functional requirements and dependencies; and discussion of algorithms, protocols, and physical media formats.
Design and implementation of a network-wide concurrent file system in a workstation cluster
Jang Si-Woong, Kidong Chung, S. Coleman
Pub Date: 1995-09-11. DOI: 10.1109/MASS.1995.528233

We estimate the performance of a network-wide concurrent file system implemented using conventional disks as disk arrays. Tests were carried out in both single-system and network-wide environments. On single systems, a file was split across several disks to test the performance of file I/O operations. We concluded that performance was proportional to the number of disks, up to four, on a system with high computing power. Performance of a system with low computing power, however, did not increase, even with more than two disks. When we split a file across disks in a network-wide system called the Network-wide Concurrent File System (N-CFS), we found performance similar to or slightly higher than that of disk arrays on single systems. Since file access through N-CFS is transparent, this system enables traditional disks on single and networked systems to be used as disk arrays for I/O intensive jobs.
Client/server data serving for high-performance computing
C. Wood
Pub Date: 1995-09-11. DOI: 10.1109/MASS.1995.528221

This paper examines the industry requirements for shared network data storage and sustained high-speed (tens to thousands of megabytes per second) network data serving via the NFS and FTP protocol suite. It discusses the current structural and architectural impediments to achieving these sorts of data rates cost-effectively on many general-purpose servers, and describes an architecture and resulting product family that addresses these problems. We show the sustained-performance levels that were achieved in the lab and discuss early customer experiences utilizing both the HIPPI-IP and ATM OC3-IP network interfaces.
Distributed access system for uniform and scalable data and service access
Anuradha Mahadevan Sastri, D. Agrawal, A. E. Abbadi, Terence R. Smith
Pub Date: 1995-09-11. DOI: 10.1109/MASS.1995.528238

Computational modeling systems (CMS) are designed to resolve many of the shortcomings associated with systems currently employed in providing support for a wide range of scientific modeling applications. We identify the requirements of a "reasonable" CMS and of Amazonia, a CMS intended to support modeling in large-scale earth science research. Amazonia has been implemented as an open and layered architecture. In this paper we discuss the design and implementation of the distributed access system, a key component of the Amazonia Kernel that supports the organization of and access to data and services in a distributed environment.
Using distributed OLTP technology in a high performance storage system
T. Tyler, D. Fisher
Pub Date: 1995-09-11. DOI: 10.1109/MASS.1995.528215

Distributed on-line transaction processing (OLTP) technology can be applied to distributed mass storage systems as the mechanism for managing the consistency of distributed metadata. OLTP concepts are familiar to many industries, such as banking and financial services, but are less well known and understood in others, such as scientific and technical computing. However, as mass storage systems and other products are designed using distributed processing and data-management strategies for performance, scalability, and/or availability reasons, distributed OLTP technology can be applied to solve the inherent challenges raised by such environments. This paper briefly discusses the general benefits of using distributed transaction processing products. Design and implementation experiences using the Encina OLTP product from Transarc in the high performance storage system are presented in more detail as a case study for how this technology can be applied to mass storage systems designed for distributed environments.
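The metadata-consistency role that a distributed OLTP product plays can be illustrated with a bare-bones two-phase commit. This is a toy model, not Encina's actual API; the class and method names below are invented for illustration.

```python
class Participant:
    """A server holding part of the distributed metadata, e.g. a name
    server or bitfile server (names are illustrative)."""

    def __init__(self, name: str):
        self.name = name
        self.committed = {}   # durable metadata
        self.pending = None   # staged update awaiting the commit decision

    def prepare(self, update: dict) -> bool:
        # Phase 1: stage the update and vote; a real participant would
        # validate and log the update durably before voting yes.
        self.pending = dict(update)
        return True

    def commit(self) -> None:
        # Phase 2 (all voted yes): make the staged update durable.
        self.committed.update(self.pending or {})
        self.pending = None

    def abort(self) -> None:
        # Phase 2 (any vote was no): discard the staged update.
        self.pending = None

def transact(participants, update: dict) -> bool:
    """Two-phase commit: every participant prepares, then all commit;
    if any participant refuses, all abort. Either way, the distributed
    metadata is never left half-updated."""
    if all(p.prepare(update) for p in participants):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False
```

The point of the paper is that this coordination logic, plus durable logging and recovery, is exactly what a distributed OLTP product supplies off the shelf, so the storage system does not have to reimplement it.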
A supercomputer system interconnect and scalable IOS
Steve Johnson, Steve Scott
Pub Date: 1995-09-11. DOI: 10.1109/MASS.1995.528245

The evolution of system architectures and system configurations has created the need for a new supercomputer system interconnect. Attributes required of the new interconnect include commonality among system and subsystem types, scalability, low latency, high bandwidth, a high level of resiliency, and flexibility. Cray Research Inc. is developing a new system channel to meet these interconnect requirements in future systems. The channel has a ring-based architecture, but can also function as a point-to-point link. It integrates control and data on a single physical path while providing low latency and variance for control messages. Extensive features for client isolation, diagnostic capabilities, and fault tolerance have been incorporated into the design. The attributes and features of this channel are discussed along with implementation and protocol specifics.
Data retrieval from climate model archives
M. Lautenschlager
Pub Date: 1995-09-11. DOI: 10.1109/MASS.1995.528235

The archive held an accumulated 7 TByte of climate model data at the end of 1994, and on the order of 60 TByte is expected by the end of 1996. Storing the data on available sequential mass storage devices poses no real physical problem; the problem is organizing the data mining.
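One common way to organize retrieval from a large sequential archive is a catalog that maps logical keys to storage volumes, so that a query mounts only the tapes it actually needs. The toy catalog below is purely illustrative; the (variable, year) key scheme and all names are assumptions, not the archive's actual design.

```python
from collections import defaultdict

class ClimateCatalog:
    """Toy catalog mapping (variable, year) keys to archive volumes,
    so a retrieval request can be planned before any tape is mounted."""

    def __init__(self):
        self._by_key = {}                    # (variable, year) -> volume id
        self._by_volume = defaultdict(list)  # volume id -> keys stored on it

    def register(self, variable: str, year: int, volume: str) -> None:
        self._by_key[(variable, year)] = volume
        self._by_volume[volume].append((variable, year))

    def volumes_for(self, variable: str, years) -> set:
        """Return the minimal set of volumes needed to satisfy a query
        for one variable over the given years."""
        return {self._by_key[(variable, y)]
                for y in years if (variable, y) in self._by_key}
```

With sequential media, grouping data on volumes by likely access pattern (e.g. one variable's time series on one tape) is what keeps the number of mounts per query small.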
Multikey index support for tuple sets on parallel mass storage systems
Thomas A. Mück, J. Witzmann
Pub Date: 1995-09-11. DOI: 10.1109/MASS.1995.528224

The development and evaluation of a tuple set manager (TSM) based on multikey index data structures is a main part of the PARABASE project at the University of Vienna. The TSM provides access to parallel mass storage systems using tuple sets instead of conventional files as the central data structure for application programs. A proof-of-concept prototype TSM is already implemented and operational on an iPSC/2. It supports tuple insert and delete operations as well as exact-match, partial-match, and range queries at the system call level. Results are available both from this prototype and from various performance evaluations. The evaluation results demonstrate the performance gain achieved by the implementation of the tuple set management concept on a parallel mass storage system.
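The tuple-set interface described above can be approximated with one inverted index per attribute position: exact-match and partial-match queries intersect the per-attribute candidate sets. This is a toy sketch, not the PARABASE TSM (names and structure are assumptions); range queries would additionally require ordered indexes such as B-trees.

```python
from collections import defaultdict

class TupleSet:
    """Toy tuple-set store with one index per attribute position,
    supporting insert, delete, exact-match, and partial-match queries."""

    def __init__(self, arity: int):
        self.arity = arity
        self.tuples = set()
        # One inverted index per attribute position: value -> tuples.
        self.index = [defaultdict(set) for _ in range(arity)]

    def insert(self, tup: tuple) -> None:
        assert len(tup) == self.arity
        self.tuples.add(tup)
        for pos, val in enumerate(tup):
            self.index[pos][val].add(tup)

    def delete(self, tup: tuple) -> None:
        self.tuples.discard(tup)
        for pos, val in enumerate(tup):
            self.index[pos][val].discard(tup)

    def match(self, pattern: tuple) -> set:
        """Partial match: None is a wildcard; a pattern with no
        wildcards is an exact-match query."""
        candidate_sets = [self.index[pos][val]
                          for pos, val in enumerate(pattern)
                          if val is not None]
        if not candidate_sets:
            return set(self.tuples)
        return set.intersection(*candidate_sets)
```

On a parallel mass storage system the indexes and tuple partitions would be spread across I/O nodes, which is where the reported performance gain comes from; this sketch only shows the query semantics.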
Scientific data management in the Environmental Molecular Sciences Laboratory
P. Berard, Thomas L. Keller
Pub Date: 1995-09-11. DOI: 10.1109/MASS.1995.528226

The Environmental Molecular Sciences Laboratory (EMSL) is currently under construction at Pacific Northwest Laboratory (PNL) for the US Department of Energy (DOE). This laboratory will be used for molecular and environmental sciences research to identify comprehensive solutions to DOE's environmental problems. Major facilities within the EMSL include the Molecular Sciences Computing Facility (MSCF), a laser-surface dynamics laboratory, a high-field nuclear magnetic resonance (NMR) laboratory, and a mass spectrometry laboratory. The EMSL is scheduled to open early in 1997 and will house about 260 resident and visiting scientists. It is anticipated that at least six terabytes of data will be archived in the first year of operation. Both the size of individual datasets and the total amount of data each researcher will manage are expected to become unwieldy and overwhelming for researchers and archive administrators. An object-oriented database management system (OODBMS) and a mass storage system will be integrated to provide an intelligent, automated mechanism to manage data. The resulting system, called the DataBase Computer System (DBCS), will provide total scientific data management capabilities to EMSL users. This paper describes all efforts associated with DBCS-0 and DBCS-1, including software development, key lessons learned, and long-term goals.