An Architecture for Real Time Data Acquisition and Online Signal Processing for High Throughput Tandem Mass Spectrometry
Pub Date: 2009-12-09 | DOI: 10.1109/E-SCIENCE.2009.21
A. Shah, N. Jaitly, Nino Zuljevic, M. Monroe, A. Liyu, A. Polpitiya, J. Adkins, M. Belov, G. Anderson, Richard D. Smith, I. Gorton
Independent, greedy collection of data events using simple heuristics results in massive over-sampling of the prominent data features in large-scale studies, compared with what should be achievable through “intelligent,” online acquisition of such data. As a result, the data generated are more aptly described as a large collection of small experiments than as a true large-scale experiment. Achieving “intelligent,” online control, however, requires tight interplay between state-of-the-art, data-intensive computing infrastructure and analytical algorithms. In this paper, we propose a Software Architecture for Mass spectrometry-based Proteomics coupled with Liquid chromatography Experiments (SAMPLE) to develop an “intelligent” online control and analysis system that significantly enhances the information content obtained from each sensor (in this case, a mass spectrometer). By analyzing data events online as they are collected and applying decision theory, informed by pre-existing knowledge, to optimize the dynamic collection of events, we aim to maximize the information content generated during an experiment.
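The abstract does not spell out the selection policy; the sketch below is a minimal illustration of the general idea under our own assumptions (the names and the utility model are illustrative, not the SAMPLE implementation): score candidate precursor peaks by intensity while discounting features already sampled, so the instrument stops re-acquiring the same prominent peaks.

```python
# Hypothetical sketch of utility-based precursor selection for tandem MS
# acquisition. The scoring model and all names are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Peak:
    mz: float          # mass-to-charge ratio of the candidate precursor
    intensity: float   # observed signal intensity

@dataclass
class AcquisitionController:
    sampled: dict = field(default_factory=dict)  # mz bin -> times fragmented

    def utility(self, peak: Peak) -> float:
        # Favor intense peaks, but discount features already sampled so
        # prominent peaks are not endlessly re-acquired.
        repeats = self.sampled.get(round(peak.mz, 2), 0)
        return peak.intensity / (1 + repeats) ** 2

    def select(self, peaks: list[Peak], top_n: int = 3) -> list[Peak]:
        chosen = sorted(peaks, key=self.utility, reverse=True)[:top_n]
        for p in chosen:
            key = round(p.mz, 2)
            self.sampled[key] = self.sampled.get(key, 0) + 1
        return chosen
```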
{"title":"An Architecture for Real Time Data Acquisition and Online Signal Processing for High Throughput Tandem Mass Spectrometry","authors":"A. Shah, N. Jaitly, Nino Zuljevic, M. Monroe, A. Liyu, A. Polpitiya, J. Adkins, M. Belov, G. Anderson, Richard D. Smith, I. Gorton","doi":"10.1109/E-SCIENCE.2009.21","DOIUrl":"https://doi.org/10.1109/E-SCIENCE.2009.21","url":null,"abstract":"Independent, greedy collection of data events using simple heuristics results in massive over-sampling of the prominent data features in large-scale studies over what should be achievable through “intelligent,” online acquisition of such data. As a result, data generated are more aptly described as a collection of a large number of small experiments rather than a true large-scale experiment. Nevertheless, achieving “intelligent,” online control requires tight interplay between state-of-the-art, data-intensive computing infrastructure developments and analytical algorithms. In this paper, we propose a Software Architecture for Mass spectrometry-based Proteomics coupled with Liquid chromatography Experiments (SAMPLE) to develop an “intelligent” online control and analysis system to significantly enhance the information content from each sensor (in this case, a mass spectrometer). Using online analysis of data events as they are collected and decision theory to optimize the collection of events during an experiment, we aim to maximize the information content generated during an experiment by the use of pre-existing knowledge to optimize the dynamic collection of events.","PeriodicalId":325840,"journal":{"name":"2009 Fifth IEEE International Conference on e-Science","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133935809","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alfalab: Construction and Deconstruction of a Digital Humanities Experiment
Pub Date: 2009-12-09 | DOI: 10.1109/E-SCIENCE.2009.8
J. Zundert, D. Zeldenrust, A. Beaulieu
This paper presents the project 'Alfalab'. Alfalab is a collaborative framework project of the Royal Netherlands Academy of Arts and Sciences (KNAW). It explores the success and failure factors for virtual research collaboration and supporting digital infrastructure in the Humanities. It does so by delivering a virtual research environment engineered through a virtual R&D collaborative, and by drawing on use cases and feedback from Humanities researchers in two research fields: historical text research and historical GIS applications. The motivation for the project lies in a number of commonly cited factors that appear to inhibit the general adoption of virtualized research infrastructure in the Humanities. The paper outlines the project's motivation, key characteristics and implementation, and describes one of the pilot applications in greater detail.
{"title":"Alfalab: Construction and Deconstruction of a Digital Humanities Experiment","authors":"J. Zundert, D. Zeldenrust, A. Beaulieu","doi":"10.1109/E-SCIENCE.2009.8","DOIUrl":"https://doi.org/10.1109/E-SCIENCE.2009.8","url":null,"abstract":"This paper presents project 'Alfalab'. Alfalab is a collaborative frame work project of the Royal Netherlands Academy of Arts and Sciences (KNAW). It explores the success and fail factors for virtual research collaboration and supporting digital infrastructure in the Humanities. It does so by delivering a virtual research environment engineered through a virtual R&D collaborative and by drawing in use cases and feedback from Humanities researchers from two research fields: textual historical text research and historical GIS-application. The motivation for the project is found in a number of commonly stated factors that seem to be inhibiting general application of virtualized research infrastructure in the Humanities. The paper outlines the project's motivation, key characteristics and implementation. One of the pilot applications is described in greater detail.","PeriodicalId":325840,"journal":{"name":"2009 Fifth IEEE International Conference on e-Science","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121877632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An Ontology Based Framework for the Preservation of Interactive Multimedia Performances
Pub Date: 2009-12-09 | DOI: 10.1109/e-Science.2009.14
K. Ng, Eleni Mikroyannidi, B. Ong, D. Giaretta
Interactive multimedia and human-computer interaction technologies are shaping and contributing to a wide range of developments in many different subject areas, including the contemporary performing arts. These technologies have facilitated the development and advancement of augmented and virtual instruments for interactive music performance, interactive installations, many aspects of technology-enhanced learning (TEL), and more. Such systems typically involve several different digital objects, including the software and data necessary for the performance and/or the data captured or generated during the performance, which may be invaluable for understanding it. Consequently, preserving interactive multimedia systems and performances is an important step toward enabling future re-performances and safeguarding the artistic style and heritage of the art form. This paper presents the CASPAR framework (developed within the CASPAR EC IST project) for the preservation of Interactive Multimedia Performances (IMP) and introduces an IMP archival system built on the CASPAR framework and components. The paper also discusses the main functionalities and validation of the IMP archival system.
{"title":"An Ontology Based Framework for the Preservation of Interactive Multimedia Performances","authors":"K. Ng, Eleni Mikroyannidi, B. Ong, D. Giaretta","doi":"10.1109/e-Science.2009.14","DOIUrl":"https://doi.org/10.1109/e-Science.2009.14","url":null,"abstract":"Interactive multimedia and human-computer interaction technologies are effecting and contributing towards a wide range of developments in many different subject areas including contemporary performing arts. These technologies have facilitated the developments and advancements of augmented and virtual instruments for interactive music performance, interactive installation, many aspects of technology-enhanced learning (TEL) and others. These systems typically involve several different digital objects including software as well as data that are necessary for the performance and/or data captured/generated during the performance that may be invaluable to understand the performance. Consequently, the preservation of interactive multimedia systems and performances is an important step to ensure possible future re-performances as well as preserving the artistic style and heritage of the art form. This paper presents the CASPAR framework (developed within the CASPAR EC IST project) for the preservation of Interactive Multimedia Performances (IMP) and introduces an IMP archival system that has been developed based on the CASPAR framework and components. This paper also discusses the main functionalities and validation of the IMP archival system developed.","PeriodicalId":325840,"journal":{"name":"2009 Fifth IEEE International Conference on e-Science","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131157615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scheduling Multiple Parameter Sweep Workflow Instances on the Grid
Pub Date: 2009-12-09 | DOI: 10.1109/e-Science.2009.49
Sucha Smanchat, M. Indrawan, Sea Ling, C. Enticott, D. Abramson
Due to its ability to provide a high-performance computing environment, the grid has become an important infrastructure for supporting eScience. To utilise the grid for parameter sweep experiments, workflow technology combined with tools such as Nimrod/K is used to orchestrate and automate scientific services provided on the grid. As a parameter sweep over a workflow must be executed numerous times, it is more efficient to execute multiple instances of the workflow in parallel. However, this parallel execution can be delayed because every workflow instance requires the same set of resources, leading to resource competition. Although many algorithms exist for scheduling grid workflows, little effort has gone into considering multiple workflow instances and resource competition in the scheduling process. In this paper, we propose a scheduling algorithm for parameter sweep workflows based on resource competition. The proposed algorithm supports multiple workflow instances and avoids allocating highly contended resources, so as to minimise delays due to the blocking of tasks. The result is evaluated in simulation against an existing scheduling algorithm.
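The paper's algorithm is not given in the abstract; the sketch below illustrates the underlying idea under assumptions of our own (the competition metric and data structures are not the authors'): measure competition as the number of ready tasks that can use each resource, then steer tasks toward the least contended option.

```python
# Illustrative sketch of competition-aware scheduling for parallel
# parameter-sweep workflow instances. Metric and structure are assumptions.
from collections import defaultdict

def schedule(ready_tasks, resources):
    """ready_tasks: list of (task_id, set of usable resource ids).
    resources: dict resource_id -> current queue length.
    Returns a mapping task_id -> resource_id."""
    # Competition: how many ready tasks want each resource.
    competition = defaultdict(int)
    for _, usable in ready_tasks:
        for r in usable:
            competition[r] += 1
    assignment = {}
    # Place the most constrained tasks first, then pick the resource with
    # the lowest combined competition and queue length.
    for task, usable in sorted(ready_tasks, key=lambda t: len(t[1])):
        best = min(usable, key=lambda r: (competition[r], resources[r]))
        assignment[task] = best
        resources[best] += 1  # the chosen resource is now more loaded
    return assignment
```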
{"title":"Scheduling Multiple Parameter Sweep Workflow Instances on the Grid","authors":"Sucha Smanchat, M. Indrawan, Sea Ling, C. Enticott, D. Abramson","doi":"10.1109/e-Science.2009.49","DOIUrl":"https://doi.org/10.1109/e-Science.2009.49","url":null,"abstract":"Due to its ability to provide high-performance computing environment, the grid has become an important infrastructure to support eScience. To utilise the grid for parameter sweep experiments, workflow technology combined with tools such as Nimrod/K are used to orchestrate and automate scientific services provided on the grid. As parameter sweeping over a workflow needs to be executed numerous times, it is more efficient to execute multiple instances of the workflow in parallel. However, this parallel execution can be delayed as every workflow instance requires the same set of resources leading to resource competition problem. Although many algorithms exist for scheduling grid workflows, there is little effort in considering multiple workflow instances and resource competition in the scheduling process. In this paper, we proposed a scheduling algorithm for parameter sweep workflow based on resource competition. The proposed algorithm aims to support multiple workflow instances and avoid allocating resources with high resource competition to minimise delay due to the blocking of tasks. The result is evaluated using simulation to compare with an existing scheduling algorithm.","PeriodicalId":325840,"journal":{"name":"2009 Fifth IEEE International Conference on e-Science","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123724435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Integrating Full-Text Search and Linguistic Analyses on Disperse Data for Humanities and Social Sciences Research Projects
Pub Date: 2009-12-09 | DOI: 10.1109/E-SCIENCE.2009.12
Marta Villegas, Carla Parra
The research reported in this paper is part of the activities carried out within CLARIN (Common Language Resources and Technology Infrastructure), a large-scale pan-European project to create, coordinate and make Language Resources and Technologies (LRT) available and readily usable. CLARIN is devoted to the creation of a persistent and stable infrastructure serving the needs of the European Humanities and Social Sciences (HSS) research community. HSS researchers will be able to efficiently access distributed resources and apply analysis and exploitation tools relevant to their research. Here we present a real use case addressed as a CLARIN scenario, together with the implementation of a demonstrator that enables us to foresee potential problems and contributes to the planning of the implementation phase. The use case concerns supporting researchers interested in harvesting and analyzing data from historical press archives; we therefore address the integration and interoperability of distributed, heterogeneous research data and analysis tools.
{"title":"Integrating Full-Text Search and Linguistic Analyses on Disperse Data for Humanities and Social Sciences Research Projects","authors":"Marta Villegas, Carla Parra","doi":"10.1109/E-SCIENCE.2009.12","DOIUrl":"https://doi.org/10.1109/E-SCIENCE.2009.12","url":null,"abstract":"The research reported in this paper is part of the activities carried out within the CLARIN (Common Language Resources and Technology Infrastructure) project, a large-scale pan-European project to create, coordinate and make Language Resources and Technologies (LRT) available and readily useable. CLARIN is devoted to the creation of a persistent and stable infrastructure serving the needs of the European Humanities and Social Sciences (HSS) research community. HSS researchers will be able to efficiently access distributed resources and apply analysis and exploitation tools relevant for their research. Hereby we present a real use case addressed as a CLARIN scenario and the implementation of a demonstrator that enables us to foresee the potential problems and contributes to the planning of the implementation phase. It deals with how to support researchers interested in harvesting and analyzing data from historical press archives. Therefore, we address the integration and interoperability of distributed and heterogeneous research data and analysis tools.","PeriodicalId":325840,"journal":{"name":"2009 Fifth IEEE International Conference on e-Science","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126672931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CHIC - Converting Hamburgers into Cows
Pub Date: 2009-12-09 | DOI: 10.1109/e-Science.2009.54
J. Townsend, J. Downing, Peter Murray-Rust
We have developed a methodology and workflow (CHIC) for the automatic semantification and structuring of legacy textual scientific documents. CHIC imports common document formats (PDF, DOCX and (X)HTML) and uses a number of toolkits to extract components and convert them into SciXML. This is sectioned into text-rich and data-rich streams, and stand-off annotation (SAF) is created for each. Embedded domain-specific objects can be converted into XML (Chemical Markup Language). The different workflow streams can then be recombined and typically converted into RDF (Resource Description Framework).
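As an illustration of the stand-off annotation idea (annotations point at character offsets in the extracted text rather than embedding markup in it), here is a minimal sketch; the record fields and example text are illustrative, not the SAF schema.

```python
# Minimal sketch of stand-off annotation over extracted text: each
# annotation references character offsets, leaving the text untouched.
text = "The product was recrystallised from ethanol (mp 78-80 C)."

annotations = [
    {"start": 36, "end": 43, "type": "chemical", "value": "ethanol"},
    {"start": 48, "end": 55, "type": "melting_point", "value": "78-80 C"},
]

for ann in annotations:
    span = text[ann["start"]:ann["end"]]  # recover the annotated span
    print(f'{ann["type"]}: "{span}"')
```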
{"title":"CHIC - Converting Hamburgers into Cows","authors":"J. Townsend, J. Downing, Peter Murray-Rust","doi":"10.1109/e-Science.2009.54","DOIUrl":"https://doi.org/10.1109/e-Science.2009.54","url":null,"abstract":"We have developed a methodology and workflow (CHIC) for the automatic semantification and structuring of legacy textual scientific documents. CHIC imports common document formats (PDF, DOCX and (X)HTML) and uses a number of toolkits to extract components and convert them into SciXML. This is sectioned into text-rich and data-rich streams and stand-off annotation (SAF) is created for each. Embedded domain specific objects can be converted into XML (Chemical Markup Language). The different workflow streams can then be recombined and typically converted into RDF (Resource Description Format).","PeriodicalId":325840,"journal":{"name":"2009 Fifth IEEE International Conference on e-Science","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127471437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Methodology for File Relationship Discovery
Pub Date: 2009-12-09 | DOI: 10.1109/e-Science.2009.35
M. Ondrejcek, Jason Kastner, R. Kooper, P. Bajcsy
This paper addresses the problem of discovering temporal and contextual relationships across the document, data, and software categories of electronic records. We designed a methodology to discover unknown relationships by conducting file system and file content analyses. The work also investigates the automation of metadata extraction from engineering drawings and the storage requirements for metadata extraction. The methodology has been applied to extracting information from a test collection of electronic records about the Navy ship TWR 841 archived by the US National Archives and Records Administration (NARA). This test collection represents a problem of unknown relationships among files that include 784 2D image drawings and 22 CAD models.
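As a hedged illustration of the kind of file system analysis described (the names, threshold, and clustering rule below are our assumptions, not the paper's method), one might group files into temporal clusters by modification time and hash contents to detect byte-identical duplicates.

```python
# Hypothetical sketch: cluster files by modification time and hash their
# contents; files modified close together are candidate "same activity"
# groups, and shared digests flag duplicates. Window size is an assumption.
import hashlib, os

def scan(root, window_seconds=3600):
    records = []
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            records.append((os.path.getmtime(path), digest, path))
    records.sort()  # order by modification time
    clusters, current = [], []
    for mtime, digest, path in records:
        if current and mtime - current[-1][0] > window_seconds:
            clusters.append(current)  # gap too large: start a new cluster
            current = []
        current.append((mtime, digest, path))
    if current:
        clusters.append(current)
    return clusters
```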
{"title":"A Methodology for File Relationship Discovery","authors":"M. Ondrejcek, Jason Kastner, R. Kooper, P. Bajcsy","doi":"10.1109/e-Science.2009.35","DOIUrl":"https://doi.org/10.1109/e-Science.2009.35","url":null,"abstract":"This paper addresses the problem of discovering temporal and contextual relationships across document, data, and software categories of electronic records. We designed a methodology to discover unknown relationships by conducting file system and file content analyses. The work also investigates automation of metadata extraction from engineering drawings and storage requirements for metadata extraction. The methodology has been applied to extracting information from a test collection of electronic records about the NAVY ship (TWR 841) archived by the US National Archive (NARA). This test collection represents a problem of unknown relationships among files that include 784 2D image drawings and 22 CAD models.","PeriodicalId":325840,"journal":{"name":"2009 Fifth IEEE International Conference on e-Science","volume":"105 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114032552","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A High-Performance Hybrid Computing Approach to Massive Contingency Analysis in the Power Grid
Pub Date: 2009-12-09 | DOI: 10.1109/e-Science.2009.46
I. Gorton, Zhenyu Huang, Yousu Chen, Benson Kalahar, Shuangshuang Jin, D. Chavarría-Miranda, Douglas J. Baxter, J. Feo
Operating the electrical power grid to prevent blackouts is a complex task. An important aspect of this is contingency analysis, which involves understanding and mitigating potential failures in power grid elements such as transmission lines. When the potential for multiple simultaneous failures is taken into account (known as the N-x contingency problem), contingency analysis becomes a massive computational task. In this paper we describe a novel hybrid computational approach to contingency analysis. This approach exploits the unique graph-processing performance of the Cray XMT in conjunction with a conventional massively parallel compute cluster to identify likely simultaneous failures that could cause widespread cascading power failures with massive economic and social impact. The approach has the potential to provide the first practical and scalable solution to the N-x contingency problem. When deployed in power grid operations, it will increase the grid operator's ability to deal effectively with outages and failures of power grid components while preserving stable and safe operation of the grid. The paper describes the architecture of our solution and presents preliminary performance results that validate the efficacy of our approach.
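A quick calculation shows why N-x analysis demands this scale of computing: the number of x-way outage combinations grows combinatorially with the number of grid elements (the element count below is an illustrative assumption, not a figure from the paper).

```python
# Why N-x contingency analysis is massive: count x-way outage combinations.
from math import comb

lines = 10_000  # assumed number of transmission elements in a large system
for x in (1, 2, 3):
    print(f"N-{x}: {comb(lines, x):,} contingencies")
# N-1:             10,000
# N-2:         49,995,000
# N-3: ~1.7e11 -- hence the need for massive parallelism and screening
```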
{"title":"A High-Performance Hybrid Computing Approach to Massive Contingency Analysis in the Power Grid","authors":"I. Gorton, Zhenyu Huang, Yousu Chen, Benson Kalahar, Shuangshuang Jin, D. Chavarría-Miranda, Douglas J. Baxter, J. Feo","doi":"10.1109/e-Science.2009.46","DOIUrl":"https://doi.org/10.1109/e-Science.2009.46","url":null,"abstract":"Operating the electrical power grid to prevent power black-outs is a complex task. An important aspect of this is contingency analysis, which involves understanding and mitigating potential failures in power grid elements such as transmission lines. When taking into account the potential for multiple simultaneous failures (known as the N-x contingency problem), contingency analysis becomes a massively computational task. In this paper we describe a novel hybrid computational approach to contingency analysis. This approach exploits the unique graph processing performance of the Cray XMT in conjunction with a conventional massively parallel compute cluster to identify likely simultaneous failures that could cause widespread cascading power failures that have massive economic and social impact on society. The approach has the potential to provide the first practical and scalable solution to the N-x contingency problem. When deployed in power grid operations, it will increase the grid operator’s ability to deal effectively with outages and failures with power grid components while preserving stable and safe operation of the grid. The paper describes the architecture of our solution and presents preliminary performance results that validate the efficacy of our approach.","PeriodicalId":325840,"journal":{"name":"2009 Fifth IEEE International Conference on e-Science","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116720445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ICAT: Integrating Data Infrastructure for Facilities Based Science
Pub Date: 2009-12-09 | DOI: 10.1109/E-SCIENCE.2009.36
D. Flannery, B. Matthews, T. Griffin, J. Bicarregui, M. Gleaves, L. Lerusse, Roger Downing, A. Ashton, Shoaib Sufi, G. Drinkwater, K. K. Dam
Scientific facilities, in particular large-scale photon and neutron sources, have demanding requirements to manage the increasing quantities of experimental data they generate in a systematic and secure way. In this paper, we describe the ICAT infrastructure for cataloguing facility-generated experimental data, which has been in development within STFC and DLS for several years. We consider the factors that have influenced its design and describe its architecture and metadata model, a key tool in the management of data. We go on to outline its current implementation and use, along with plans for its future development.
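As a rough illustration of the kind of hierarchical metadata model such a catalogue rests on (the entity and field names below are illustrative assumptions, not the actual ICAT schema), facility data is commonly organized as investigations containing datasets containing datafiles.

```python
# Illustrative sketch of a facility metadata hierarchy; names are assumed.
from dataclasses import dataclass, field

@dataclass
class Datafile:
    name: str
    location: str      # where the file lives on facility storage

@dataclass
class Dataset:
    name: str
    datafiles: list[Datafile] = field(default_factory=list)

@dataclass
class Investigation:
    title: str
    instrument: str    # e.g. a beamline at a photon source
    datasets: list[Dataset] = field(default_factory=list)

# Example: catalogue one run of a hypothetical experiment.
inv = Investigation("Protein crystallography run", "I04")
run1 = Dataset("run_0001")
run1.datafiles.append(Datafile("image_0001.cbf", "/dls/i04/2009/run_0001/"))
inv.datasets.append(run1)
```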
{"title":"ICAT: Integrating Data Infrastructure for Facilities Based Science","authors":"D. Flannery, B. Matthews, T. Griffin, J. Bicarregui, M. Gleaves, L. Lerusse, Roger Downing, A. Ashton, Shoaib Sufi, G. Drinkwater, K. K. Dam","doi":"10.1109/E-SCIENCE.2009.36","DOIUrl":"https://doi.org/10.1109/E-SCIENCE.2009.36","url":null,"abstract":"Scientific facilities, in particular large-scale photon and neutron sources, have demanding requirements to manage the increasing quantities of experimental data they generate in a systematic and secure way. In this paper, we describe the ICAT infrastructure for cataloguing facility-generated experimental data which has been in development within STFC and DLS for several years. We consider the factors which have influenced its design and describe its architecture and metadata model, a key tool in the management of data. We go on to give an outline of its current implementation and use, with plans for its future development.","PeriodicalId":325840,"journal":{"name":"2009 Fifth IEEE International Conference on e-Science","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129090870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Extracting and Ingesting DDI Metadata and Digital Objects from a Data Archive into the iRODS Extension of the NARA TPAP Using the OAI-PMH
Pub Date: 2009-12-09 | DOI: 10.1109/e-Science.2009.34
J. Ward, A. D. Torcy, Mason Chua, Jon Crabtree
This prototype demonstrated that the migration of collections between digital libraries and preservation data archives is now possible using automated batch loading of both data and metadata. We used this capability to enable collection interoperability between the H.W. Odum Institute for Research in Social Science (Odum) Data Archive and the integrated Rule-Oriented Data System (iRODS) extension of the National Archives and Records Administration's (NARA) Transcontinental Persistent Archive Prototype (TPAP). We extracted data and metadata from a Dataverse data archive and ingested them into the iRODS server and metadata catalog using OAI-PMH, Java, XML/XSL, and iRODS rules and microservices. We validated the ingest of the files and retained the required Terms & Conditions for the social science data after ingest.
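OAI-PMH itself is a small, well-defined protocol; the harvesting side of such a pipeline amounts to paging through ListRecords responses and following resumption tokens. The sketch below (in Python rather than the Java the prototype used) assumes a placeholder endpoint URL; the metadata prefix would be the archive's DDI format name.

```python
# Minimal OAI-PMH ListRecords harvester with resumption-token paging.
# Endpoint and metadata prefix are placeholders, not the prototype's.
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import quote

OAI = "{http://www.openarchives.org/OAI/2.0/}"  # OAI-PMH XML namespace

def harvest(base_url, metadata_prefix="oai_dc"):
    params = f"verb=ListRecords&metadataPrefix={metadata_prefix}"
    while params:
        with urllib.request.urlopen(f"{base_url}?{params}") as resp:
            root = ET.fromstring(resp.read())
        for record in root.iter(f"{OAI}record"):
            yield record
        # A resumption token means more pages remain; resume with it alone.
        token = root.find(f"{OAI}ListRecords/{OAI}resumptionToken")
        params = (f"verb=ListRecords&resumptionToken={quote(token.text)}"
                  if token is not None and token.text else None)
```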
{"title":"Extracting and Ingesting DDI Metadata and Digital Objects from a Data Archive into the iRODS Extension of the NARA TPAP Using the OAI-PMH","authors":"J. Ward, A. D. Torcy, Mason Chua, Jon Crabtree","doi":"10.1109/e-Science.2009.34","DOIUrl":"https://doi.org/10.1109/e-Science.2009.34","url":null,"abstract":"This prototype demonstrated that the migration of collections between digital libraries and preservation data archives is now possible using automated batch load for both data and metadata. We used this capability to enable collection interoperability between the H.W. Odum Institute for Research in Social Science (Odum) Data Archive and the integrated Rule Oriented Data System (iRODS) extension of the National Archives and Record Administration's (NARA) Transcontinental Persistent Archive Prototype (TPAP). We extracted data and metadata from a Dataverse data archive and ingested it into the iRODS server and metadata catalog using the OAI-PMH, Java, XML/XSL and iRODS rules and microservices. We validated ingest of the files and retained the required Terms & Conditions for the social science data after ingest.","PeriodicalId":325840,"journal":{"name":"2009 Fifth IEEE International Conference on e-Science","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129639034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}