Towards semantically-enabled exploration and analysis of environmental ecosystems
Pub Date: 2012-10-08 | DOI: 10.1109/eScience.2012.6404436 | 2012 IEEE 8th International Conference on E-Science, pp. 1-8
Ping Wang, L. Fu, E. Patton, D. McGuinness, F. J. Dein, R. S. Bristol
We aim to inform the development of decision support tools for resource managers who need to examine large, complex ecosystems and make recommendations in the face of many tradeoffs and conflicting drivers. We take a semantic technology approach, leveraging background ontologies and the growing body of open linked data. In previous work, we designed and implemented a semantically-enabled environmental monitoring framework called SemantEco and used it to build a water quality portal named SemantAqua. In this work, we significantly extend SemantEco to include the knowledge required to support resource decisions concerning fish and wildlife species and their habitats. Our previous system included foundational ontologies to support environmental regulation violations and relevant human health effects. Our enhanced framework includes foundational ontologies to support modeling of wildlife observations and wildlife health impacts, thereby enabling deeper and broader support for examining the effects of environmental pollution on ecosystems more holistically. Our results include a refactored and expanded version of the SemantEco portal. Additionally, the updated system is now compatible with the emerging best-in-class Extensible Observation Ontology (OBOE). A wider range of relevant data has been integrated, focusing on additions concerning wildlife health related to contaminant exposure. The resulting system stores and exposes provenance concerning the source of the data, how it was used, and the rationale for choosing it. In this paper, we describe the system, highlight its research contributions, and describe current and envisioned usage.
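To make the ontology-based approach concrete, the following is a minimal sketch, not taken from the SemantEco codebase, of how a wildlife observation might be expressed following OBOE's Observation/Measurement pattern and queried with SPARQL using rdflib; the OBOE namespace URI, property names, species and characteristic identifiers, and the units are assumptions made for illustration.

```python
# Illustrative sketch (not the SemantEco codebase): modeling a wildlife
# observation in the spirit of OBOE's Observation/Measurement pattern with
# rdflib, then querying it with SPARQL. The OBOE namespace and property
# names are assumptions and may differ from the deployed portal.
from rdflib import Graph, Namespace, Literal, RDF

OBOE = Namespace("http://ecoinformatics.org/oboe/oboe.1.0/oboe-core.owl#")
EX = Namespace("http://example.org/semanteco/")   # hypothetical data namespace

g = Graph()
obs = EX["obs/fish-survey-001"]
meas = EX["meas/mercury-level-001"]

g.add((obs, RDF.type, OBOE.Observation))
g.add((obs, OBOE.ofEntity, EX["species/LargemouthBass"]))
g.add((obs, OBOE.hasMeasurement, meas))
g.add((meas, RDF.type, OBOE.Measurement))
g.add((meas, OBOE.ofCharacteristic, EX["characteristic/MercuryConcentration"]))
g.add((meas, OBOE.hasValue, Literal(0.42)))   # assumed units: mg/kg

# Rank observed species by measured contaminant level.
q = """
PREFIX oboe: <http://ecoinformatics.org/oboe/oboe.1.0/oboe-core.owl#>
SELECT ?species ?value WHERE {
  ?obs a oboe:Observation ;
       oboe:ofEntity ?species ;
       oboe:hasMeasurement ?m .
  ?m oboe:hasValue ?value .
} ORDER BY DESC(?value)
"""
for species, value in g.query(q):
    print(species, value)
```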
{"title":"Towards semantically-enabled exploration and analysis of environmental ecosystems","authors":"Ping Wang, L. Fu, E. Patton, D. McGuinness, F. J. Dein, R. S. Bristol","doi":"10.1109/eScience.2012.6404436","DOIUrl":"https://doi.org/10.1109/eScience.2012.6404436","url":null,"abstract":"We aim to inform the development of decision support tools for resource managers who need to examine large complex ecosystems and make recommendations in the face of many tradeoffs and conflicting drivers. We take a semantic technology approach, leveraging background ontologies and the growing body of open linked data. In previous work, we designed and implemented a semantically-enabled environmental monitoring framework called SemantEco and used it to build a water quality portal named SemantAqua. In this work, we significantly extend SemantEco to include knowledge required to support resource decisions concerning fish and wildlife species and their habitats. Our previous system included foundational ontologies to support environmental regulation violations and relevant human health effects. Our enhanced framework includes foundational ontologies to support modeling of wildlife observation and wildlife health impacts, thereby enabling deeper and broader support for more holistically examining the effects of environmental pollution on ecosystems. Our results include a refactored and expanded version of the SemantEco portal. Additionally the updated system is now compatible with the emerging best in class Extensible Observation Ontology (OBOE). A wider range of relevant data has been integrated, focusing on additions concerning wildlife health related to exposure to contaminants. The resulting system stores and exposes provenance concerning the source of the data, how it was used, and also the rationale for choosing the data. In this paper, we describe the system, highlight its research contributions, and describe current and envisioned usage.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"65 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91032126","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A simulator for social exchanges and collaborations — Architecture and case study
Pub Date: 2012-10-08 | DOI: 10.1109/eScience.2012.6404414 | 2012 IEEE 8th International Conference on E-Science, pp. 1-8
Christian Haas, Simon Caton, Daniel Trumpp, Christof Weinhardt
Social collaboration scenarios, such as sharing resources between friends, have become increasingly prevalent in recent years. An example of this new paradigm is Social Cloud Computing, which aims to leverage existing digital relationships within social networks for the exchange of resources among users and user communities. Due to their complexity, such platforms and systems have to be carefully designed and engineered to suit their purpose. In this paper, we propose a general-purpose simulation tool to aid the design and analysis of Social Collaboration Platforms, and we discuss potential use cases and the architecture of the simulator. To show the usefulness of the simulator, we present a simple use case in which we study the effects of an incentive scheme on the system and its user community.
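As an illustration of the kind of experiment such a simulator supports, here is a minimal agent-based sketch, with entirely hypothetical agent behavior and parameters rather than the authors' simulator, in which an incentive scheme rewards successful exchanges and gradually raises agents' willingness to share.

```python
# Minimal illustrative sketch (not the authors' simulator): agents decide
# whether to grant resource requests; an optional incentive scheme rewards
# successful exchanges and slowly increases future willingness to share.
import random

class Agent:
    def __init__(self, willingness):
        self.willingness = willingness   # probability of granting a request
        self.reward = 0.0

def run(num_agents=100, rounds=50, incentive=0.0, seed=1):
    rng = random.Random(seed)
    agents = [Agent(rng.uniform(0.2, 0.8)) for _ in range(num_agents)]
    exchanges = 0
    for _ in range(rounds):
        for requester in agents:
            provider = rng.choice(agents)
            if provider is requester:
                continue
            if rng.random() < provider.willingness:
                exchanges += 1
                provider.reward += incentive
                # incentives nudge willingness upward, capped at 1.0
                provider.willingness = min(1.0, provider.willingness + incentive * 0.01)
    return exchanges

print("no incentive:  ", run(incentive=0.0))
print("with incentive:", run(incentive=1.0))
```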
{"title":"A simulator for social exchanges and collaborations — Architecture and case study","authors":"Christian Haas, Simon Caton, Daniel Trumpp, Christof Weinhardt","doi":"10.1109/eScience.2012.6404414","DOIUrl":"https://doi.org/10.1109/eScience.2012.6404414","url":null,"abstract":"Social collaboration scenarios, such as sharing resources between friends, are becoming increasingly prevalent in recent years. An example of this new paradigm is Social Cloud Computing, which aims at leveraging existing digital relationships within social networks for the exchange of resources among users and user communities. Due to their complexity, such platforms and systems have to be carefully designed and engineered to suit their purpose. In this paper, we propose a general-purpose simulation tool to help in the design and analysis of Social Collaboration Platforms, and discuss potential use cases and the architecture of the simulator. To show the usefulness of the simulator, we present a simple use case in which we study the effects of an incentive scheme on the system and its user community.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"53 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78160970","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Velo and REXAN — Integrated data management and high speed analysis for experimental facilities
Pub Date: 2012-10-08 | DOI: 10.1109/ESCIENCE.2012.6404463 | 2012 IEEE 8th International Conference on E-Science, pp. 1-9
K. K. Dam, J. Carson, A. Corrigan, D. Einstein, Zoe Guillen, Brandi S. Heath, A. Kuprat, Ingela Lanekoff, C. Lansing, J. Laskin, Dongsheng Li, Y. Liu, M. Marshall, E. Miller, G. Orr, Paulo Pinheiro da Silva, Seun Ryu, C. Szymanski, Mathew Thomas
The Chemical Imaging Initiative at the Pacific Northwest National Laboratory (PNNL) is creating a 'Rapid Experimental Analysis' (REXAN) framework based on the concept of reusable component libraries. REXAN allows developers to quickly compose and customize high-throughput analysis pipelines for a range of experiments, and it supports the creation of multi-modal analysis pipelines. In addition, PNNL has coupled REXAN with its collaborative data management and analysis environment, Velo, to create easy-to-use data management and analysis environments for experimental facilities. This paper discusses the benefits of Velo and REXAN in the context of three examples:
· PNNL high-resolution mass spectrometry - reducing analysis times from hours to seconds and enabling the analysis of much larger data samples (100 KB to 40 GB) at the same time.
· ALS X-ray tomography - reducing analysis times of combined STXM and EM data collected at the ALS from weeks to minutes, decreasing manual work, and increasing the data volumes that can be analysed in a single step.
· Multi-modal nano-scale analysis of STXM and TEM data - providing a semi-automated process for particle detection.
The creation of REXAN has significantly shortened the development time for these analysis pipelines. The integration of Velo and REXAN has significantly increased the scientific productivity of the instruments and their users by creating easy-to-use data management and analysis environments with greatly reduced analysis times and improved analysis capabilities.
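The reusable-component idea can be sketched as follows; the component names, file format, and peak-picking heuristic are hypothetical stand-ins, not the REXAN API, but they show how a high-throughput pipeline might be composed from small, interchangeable steps.

```python
# Illustrative sketch of the reusable-component idea behind REXAN (all names
# here are hypothetical, not the PNNL API): a pipeline is composed from small,
# reusable steps and applied to each new data file as it arrives.
import numpy as np

def load_spectrum(path):
    # assumed two-column text format: m/z, intensity
    return np.loadtxt(path)

def subtract_baseline(data):
    data = data.copy()
    data[:, 1] -= data[:, 1].min()
    return data

def pick_peaks(data, threshold=0.1):
    intensities = data[:, 1]
    cutoff = threshold * intensities.max()
    # local maxima above a relative intensity cutoff
    mask = (intensities[1:-1] > cutoff) & \
           (intensities[1:-1] > intensities[:-2]) & \
           (intensities[1:-1] > intensities[2:])
    return data[1:-1][mask]

def compose(*steps):
    def pipeline(x):
        for step in steps:
            x = step(x)
        return x
    return pipeline

# A high-throughput analysis pipeline assembled from reusable components.
analyze = compose(load_spectrum, subtract_baseline, pick_peaks)
# peaks = analyze("scan_0001.txt")   # hypothetical input file
```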
{"title":"Velo and REXAN — Integrated data management and high speed analysis for experimental facilities","authors":"K. K. Dam, J. Carson, A. Corrigan, D. Einstein, Zoe Guillen, Brandi S. Heath, A. Kuprat, Ingela Lanekoff, C. Lansing, J. Laskin, Dongsheng Li, Y. Liu, M. Marshall, E. Miller, G. Orr, Paulo Pinheiro da Silva, Seun Ryu, C. Szymanski, Mathew Thomas","doi":"10.1109/ESCIENCE.2012.6404463","DOIUrl":"https://doi.org/10.1109/ESCIENCE.2012.6404463","url":null,"abstract":"The Chemical Imaging Initiative at the Pacific Northwest National Laboratory (PNNL) is creating a `Rapid Experimental Analysis' (REXAN) Framework, based on the concept of reusable component libraries. REXAN allows developers to quickly compose and customize high throughput analysis pipelines for a range of experiments, as well as supporting the creation of multi-modal analysis pipelines. In addition, PNNL has coupled REXAN with its collaborative data management and analysis environment Velo to create an easy to use data management and analysis environments for experimental facilities. This paper will discuss the benefits of Velo and REXAN in the context of three examples: PNNL High Resolution Mass Spectrometry - reducing analysis times from hours to seconds, and enabling the analysis of much larger data samples (100KB to 40GB) at the same time. · ALS X-Ray Tomography - reducing analysis times of combined STXM and EM data collected at the ALS from weeks to minutes, decreasing manual work and increasing data volumes that can be analysed in a single step. · Multi-modal nano-scale analysis of STXM and TEM data - providing a semi automated process for particle detection. The creation of REXAN has significantly shortened the development time for these analysis pipelines. The integration of Velo and REXAN has significantly increased the scientific productivity of the instruments and their users by creating easy to use data management and analysis environments with greatly reduced analysis times and improved analysis capabilities.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"26 1","pages":"1-9"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74224141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Incorporating circulation data in relevancy rankings for search algorithms in library collections
Pub Date: 2012-10-08 | DOI: 10.1109/eScience.2012.6404447 | 2012 IEEE 8th International Conference on E-Science, pp. 1-6
H. Green, Kirk Hess, Richard Hislop
This paper demonstrates a series of analyses to calculate new clusters of shared subject headings among items in a library collection. The paper establishes a method of reconstituting anonymous circulation data from a library catalog into separate user transactions. The transaction data is incorporated into subject analyses that use supercomputing resources to generate predictive network analyses and visualizations of subject areas searched by library users. The paper develops several methods for ranking these subject headings, and shows how the analyses will be extended on supercomputing resources for information retrieval research.
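A minimal sketch of the underlying idea, not the authors' code, might look like the following: circulation records are reconstituted into transactions, and subject headings are ranked by how often they co-occur within the same transaction; the record layout is an assumption.

```python
# Illustrative sketch (not the authors' code): group anonymous circulation
# records into transactions, then rank subject-heading pairs by how often
# they are checked out together. The record layout is assumed.
from collections import Counter
from itertools import combinations

# each record: (transaction_id, subject headings attached to one item)
records = [
    ("t1", ["Railroads", "Economic history"]),
    ("t1", ["Economic history", "United States"]),
    ("t2", ["Railroads", "Bridges"]),
]

# reconstitute transactions: all subject headings seen in one checkout session
transactions = {}
for tid, headings in records:
    transactions.setdefault(tid, set()).update(headings)

# count co-occurring heading pairs across transactions
pair_counts = Counter()
for headings in transactions.values():
    for a, b in combinations(sorted(headings), 2):
        pair_counts[(a, b)] += 1

# headings that co-occur most often form candidate clusters / ranking boosts
for pair, count in pair_counts.most_common(5):
    print(pair, count)
```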
{"title":"Incorporating circulation data in relevancy rankings for search algorithms in library collections","authors":"H. Green, Kirk Hess, Richard Hislop","doi":"10.1109/eScience.2012.6404447","DOIUrl":"https://doi.org/10.1109/eScience.2012.6404447","url":null,"abstract":"This paper demonstrates a series of analyses to calculate new clusters of shared subject headings among items in a library collection. The paper establishes a method of reconstituting anonymous circulation data from a library catalog into separate user transactions. The transaction data is incorporated into subject analyses that use supercomputing resources to generate predictive network analyses and visualizations of subject areas searched by library users. The paper develops several methods for ranking these subject headings, and shows how the analyses will be extended on supercomputing resources for information retrieval research.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"106 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79316278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data intensive science at synchrotron based 3D x-ray imaging facilities
Pub Date: 2012-10-08 | DOI: 10.1109/ESCIENCE.2012.6404468 | 2012 IEEE 8th International Conference on E-Science, pp. 1-3
F. Carlo, Xianghui Xiao, K. Fezzaa, Steve Wang, N. Schwarz, C. Jacobsen, N. Chawla, F. Fusseis
New developments in detector technology allow the acquisition of micrometer-resolution x-ray transmission images of specimens as large as a few millimeters at unprecedented frame rates. The high x-ray flux density generated by the Advanced Photon Source (APS) allows for detector exposure times ranging from hundreds of milliseconds down to 150 picoseconds. Synchronization of the camera with the rotation stage allows a full 3D dataset to be acquired in less than one second. The micro- and nano-tomography systems available at the x-ray imaging beamlines of the APS are routinely used in materials science and geoscience applications, where high-resolution and fast 3D imaging are instrumental in extracting in situ four-dimensional dynamic information. Here we describe the computational challenges associated with the x-ray imaging systems at the APS and discuss our current data model and data analysis processes.
{"title":"Data intensive science at synchrotron based 3D x-ray imaging facilities","authors":"F. Carlo, Xianghui Xiao, K. Fezzaa, Steve Wang, N. Schwarz, C. Jacobsen, N. Chawla, F. Fusseis","doi":"10.1109/ESCIENCE.2012.6404468","DOIUrl":"https://doi.org/10.1109/ESCIENCE.2012.6404468","url":null,"abstract":"New developments in detector technology allow the acquisition of micrometer-resolution x-ray transmission images of specimens as large as a few millimeters at unprecedented frame rates. The high x-ray flux density generated by the Advanced Photon Source (APS) allows for detector exposure times ranging from hundreds of milliseconds to 150 picoseconds. The synchronization of the camera with the rotation stage allows a full 3D dataset to be acquired in less than one second. The micro and nano tomography systems available at the x-ray imaging beamlines of the APS are routinely used in material science and geoscience applications where high-resolution and fast 3D imaging are instrumental in extracting in situ four-dimensional dynamic information. Here we will describe the computational challenges associated with the x-ray imaging systems at the APS and discuss our current data model and data analysis processes.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"9 1","pages":"1-3"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76357938","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Web applications for experimental control at CLS
Pub Date: 2012-10-08 | DOI: 10.1109/eScience.2012.6404470 | 2012 IEEE 8th International Conference on E-Science, pp. 1-4
Dong Liu, D. Maxwell, Elder Mathias
The advantages of web applications have already attracted attention at major physical research facilities such as the Canadian Light Source (CLS). It is the accessibility of web applications that makes them preferable to native desktop applications in some experimental control scenarios. This short paper presents two web applications that were developed mainly at CLS: Science Studio, for remote access to and collaboration around instruments and computation resources, and Logit, for beamline experiment information management. These two applications represent two typical kinds of web application. Science Studio is heavyweight: it provides a large spectrum of functionality and has been developed by distributed teams over several years. Logit is lightweight: it provides a very limited set of features and was delivered in a very short time. The architectural designs of both are presented, and the lessons learned from them are discussed.
{"title":"Web applications for experimental control at CLS","authors":"Dong Liu, D. Maxwell, Elder Mathias","doi":"10.1109/eScience.2012.6404470","DOIUrl":"https://doi.org/10.1109/eScience.2012.6404470","url":null,"abstract":"The advantages of web applications have already got attention in major physical research facilities like Canadian Light Source (CLS). It is the accessability of web applications that makes them preferred to native desktop application in some experimental control scenarios. This short paper presents two web applications that were mainly developed at CLS - Science Studio for remote access and collaboration of instruments and computation resources, and Logit for beamline experiment information management. These two applications represents two typical web applications. Science Studio is heavy-weight and provides a large spectrum of functionalities and has been developed by distributed teams for years. Logit is light-weight, and provides very limited set of features and was delivered in a very short time. The architectural designs are discussed for both sides, and the lessons learned from them are discussed.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"1 1","pages":"1-4"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89332598","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
IPOL: Reviewed publication and public testing of research software
Pub Date: 2012-10-08 | DOI: 10.1109/eScience.2012.6404449 | 2012 IEEE 8th International Conference on E-Science, pp. 1-8
Nicolas Limare, L. Oudre, Pascal Getreuer
With the journal Image Processing On Line (IPOL), we propose to promote software to the status of regular research material and subject it to the same treatment as research papers: it must be reviewed, it must be reusable and verifiable by the research community, and it must follow style and quality guidelines. In IPOL, algorithms are published with their implementations, the code is peer-reviewed, and a web-based test interface is attached to each article. This results in more software being released by researchers, better software quality achieved through the review process, and a large collection of test data gathered for each article. IPOL has been active since 2010 and has already published thirty articles.
{"title":"IPOL: Reviewed publication and public testing of research software","authors":"Nicolas Limare, L. Oudre, Pascal Getreuer","doi":"10.1109/eScience.2012.6404449","DOIUrl":"https://doi.org/10.1109/eScience.2012.6404449","url":null,"abstract":"With the journal Image Processing On Line (IPOL), we propose to promote software to the status of regular research material and subject it to the same treatment as research papers: it must be reviewed, it must be reusable and verifiable by the research community, it must follow style and quality guidelines. In IPOL, algorithms are published with their implementation, codes are peer-reviewed, and a web-based test interface is attached to each of these articles. This results in more software released by the researchers, a better software quality achieved with the review process, and a large collection of test data gathered for each article. IPOL has been active since 2010, and has already published thirty articles.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"75 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77418661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dynamic network provisioning for data intensive applications in the cloud
Pub Date: 2012-10-08 | DOI: 10.1109/eScience.2012.6404461 | 2012 IEEE 8th International Conference on E-Science, pp. 1-2
P. Ruth, A. Mandal, Yufeng Xin, I. Baldine, Chris Heermann, J. Chase
Advanced networks are an essential element of data-driven science enabled by next-generation cyberinfrastructure environments. Computational activities increasingly incorporate widely dispersed resources, with linkages among software components spanning multiple sites and administrative domains. Recent advances in on-demand network circuits in the national and international backbones, coupled with Software Defined Networking (SDN) advances such as OpenFlow and programmable edge technologies such as OpenStack, have created an unprecedented opportunity to run complex scientific applications on specially tailored, dynamic infrastructure that includes compute, storage, and network resources, combining the performance advantages of purpose-built infrastructures without the costs of a permanent infrastructure. This work presents our experience deploying scientific workflows on the ExoGENI national test bed, which dynamically allocates computational resources connected by high-speed circuits from backbone providers. Dynamically allocated, bandwidth-provisioned high-speed circuits increase the ability of scientific applications to access and stage large data sets from remote data repositories, or to move computation to remote sites and access data stored locally. The remainder of this extended abstract briefly describes the test bed and several scientific workflow applications that were deployed using bandwidth-provisioned high-speed circuits.
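Purely as an illustration of what a dynamically provisioned request might contain, the sketch below describes a slice with two compute sites joined by a bandwidth-provisioned circuit; this is not the ExoGENI/ORCA API, and every name in it is hypothetical.

```python
# Purely illustrative sketch of a slice request for dynamically provisioned
# infrastructure (NOT the ExoGENI/ORCA API; all names are hypothetical):
# two compute sites connected by a bandwidth-provisioned circuit.
from dataclasses import dataclass, field

@dataclass
class Node:
    site: str
    cores: int
    ram_gb: int

@dataclass
class Circuit:
    endpoints: tuple
    bandwidth_mbps: int

@dataclass
class SliceRequest:
    name: str
    nodes: list = field(default_factory=list)
    circuits: list = field(default_factory=list)

request = SliceRequest(
    name="workflow-staging",
    nodes=[Node("site-A", cores=8, ram_gb=32), Node("site-B", cores=16, ram_gb=64)],
    circuits=[Circuit(("site-A", "site-B"), bandwidth_mbps=1000)],
)
print(request)
```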
{"title":"Dynamic network provisioning for data intensive applications in the cloud","authors":"P. Ruth, A. Mandal, Yufeng Xin, I. Baldine, Chris Heermann, J. Chase","doi":"10.1109/eScience.2012.6404461","DOIUrl":"https://doi.org/10.1109/eScience.2012.6404461","url":null,"abstract":"Advanced networks are an essential element of data-driven science enabled by next generation cyberinfrastructure environments. Computational activities increasingly incorporate widely dispersed resources with linkages among software components spanning multiple sites and administrative domains. We have seen recent advances in enabling on-demand network circuits in the national and international backbones coupled with Software Defined Networking (SDN) advances like OpenFlow and programmable edge technologies like OpenStack. These advances have created an unprecedented opportunity to enable complex scientific applications to run on specially tailored, dynamic infrastructure that include compute, storage and network resources, combining the performance advantages of purpose-built infrastructures, but without the costs of a permanent infrastructure. This work presents an experience deploying scientific workflows on the ExoGENI national test bed that dynamically allocates computational resources with high-speed circuits from backbone providers. Dynamically allocated bandwidth-provisioned high-speed circuits increase the ability of scientific applications to access and stage large data sets from remote data repositories or to move computation to remote sites and access data stored locally. The remainder of this extended abstract is a brief description of the test bed and several scientific workflow applications that were deployed using bandwidth-provisioned high-speed circuits.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"28 1","pages":"1-2"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74739470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A framework to access handwritten information within large digitized paper collections
Pub Date: 2012-10-08 | DOI: 10.1109/eScience.2012.6404434 | 2012 IEEE 8th International Conference on E-Science, pp. 1-10
Liana Diesendruck, Luigi Marini, R. Kooper, M. Kejriwal, Kenton McHenry
We describe our efforts with the National Archives and Records Administration (NARA) to provide a form of automated search of handwritten content within large digitized document archives. With a growing push towards the digitization of paper archives, there is a pressing need for tools capable of searching the resulting unstructured image data, since such collections offer valuable historical records that can be mined for information pertinent to fields ranging from the geosciences to the humanities. To carry out the search, we use a Computer Vision technique called Word Spotting. A form of content-based image retrieval, it avoids the still difficult task of directly recognizing the text: a user searches with a query image containing handwritten text, and the database of images is ranked so that those containing similar-looking content appear first. In order to make this search capability available on an archive, three computationally expensive pre-processing steps are required. We describe these steps, the open source framework we have developed, and how it can be used not only on the recently released 1940 Census data, containing nearly 4 million high-resolution scanned forms, but also on other collections of forms. With a growing demand to digitize our wealth of paper archives, we see this type of automated search as a low-cost, scalable alternative to the costly manual transcription that would otherwise be required.
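A minimal sketch of the word-spotting idea, not the framework's actual implementation, is shown below: each segmented word image is described by a fixed-length column-wise ink profile, and database images are ranked by their distance to the query's profile; the thresholding rule and profile length are illustrative assumptions.

```python
# Illustrative sketch of word spotting (not the NARA framework's code):
# describe each segmented word image by its column-wise ink profile, then
# rank database images by distance to the query's profile.
import numpy as np

def profile(word_image, length=64):
    """Column-wise ink density, resampled to a fixed length."""
    binary = word_image < word_image.mean()          # assume dark ink on light paper
    cols = binary.mean(axis=0)                       # fraction of ink per column
    xs = np.linspace(0, len(cols) - 1, length)
    return np.interp(xs, np.arange(len(cols)), cols)

def rank(query_image, database_images):
    q = profile(query_image)
    dists = [np.linalg.norm(q - profile(img)) for img in database_images]
    return np.argsort(dists)                         # most similar first

# usage with synthetic stand-ins for scanned word images
rng = np.random.default_rng(0)
query = rng.random((40, 120))
database = [rng.random((40, rng.integers(80, 160))) for _ in range(10)]
print(rank(query, database))
```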
{"title":"A framework to access handwritten information within large digitized paper collections","authors":"Liana Diesendruck, Luigi Marini, R. Kooper, M. Kejriwal, Kenton McHenry","doi":"10.1109/eScience.2012.6404434","DOIUrl":"https://doi.org/10.1109/eScience.2012.6404434","url":null,"abstract":"We describe our efforts with the National Archives and Records Administration (NARA) to provide a form of automated search of handwritten content within large digitized document archives. With a growing push towards the digitization of paper archives there is an imminent need to develop tools capable of searching the resulting unstructured image data as data from such collections offer valuable historical records that can be mined for information pertinent to a number of fields from the geosciences to the humanities. To carry out the search, we use a Computer Vision technique called Word Spotting. A form of content based image retrieval, it avoids the still difficult task of directly recognizing the text by allowing a user to search using a query image containing handwritten text and ranking a database of images in terms of those that contain more similar looking content. In order to make this search capability available on an archive, three computationally expensive pre-processing steps are required. We describe these steps, the open source framework we have developed, and how it can be used not only on the recently released 1940 Census data containing nearly 4 million high resolution scanned forms, but also on other collections of forms. With a growing demand to digitize our wealth of paper archives we see this type of automated search as a low cost scalable alternative to the costly manual transcription that would otherwise be required.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"20 1","pages":"1-10"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89543468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Quality of data driven simulation workflows
Pub Date: 2012-10-08 | DOI: 10.1109/ESCIENCE.2012.6404417 | 2012 IEEE 8th International Conference on E-Science, pp. 1-8
M. Reiter, Uwe Breitenbücher, Oliver Kopp, D. Karastoyanova
Simulations are characterized by long-running calculations and complex data handling tasks accompanied by non-trivial data dependencies. Workflow technology helps to automate and steer such simulations. Quality of Data frameworks are used to determine the goodness of simulation data; for example, they analyze the accuracy of input data with regard to its usability within numerical solvers. In this paper, we present generic approaches that use evaluated Quality of Data to steer simulation workflows. This makes it possible to ensure that predefined requirements, such as a precise final result or a short execution time, will be met even after the execution of the simulation workflow has started. We discuss mechanisms for steering a simulation on all relevant levels - workflow, service, and algorithm - and define a unifying approach to control such workflows. To realize Quality of Data-driven workflows, we present an architecture implementing the presented approach and a WS-Policy-based language to describe Quality of Data requirements and capabilities.
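The steering idea can be sketched as follows; the solver, its error model, and the thresholds are hypothetical stand-ins rather than the authors' framework, but the loop shows how an evaluated Quality of Data metric can decide whether a workflow proceeds, refines its resolution, or accepts the current result to stay within a time budget.

```python
# Illustrative sketch (not the authors' framework) of Quality-of-Data-driven
# steering: after each stage, an evaluated QoD metric decides whether the
# workflow is done, must refine its resolution, or must stop to respect a
# time budget, so precision and deadline requirements can both be honored.
import time

def run_solver(resolution):
    # stand-in for a simulation step; finer resolution -> better accuracy, slower
    time.sleep(0.01 * resolution)
    estimated_error = 1.0 / resolution
    return {"resolution": resolution, "error": estimated_error}

def steer(max_error=0.05, deadline_s=2.0):
    start, resolution = time.time(), 5
    while True:
        result = run_solver(resolution)
        elapsed = time.time() - start
        if result["error"] <= max_error:          # QoD requirement satisfied
            return result
        if elapsed > deadline_s:                  # time budget exhausted: accept best so far
            return result
        resolution *= 2                           # refine and re-run

print(steer())
```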
{"title":"Quality of data driven simulation workflows","authors":"M. Reiter, Uwe Breitenbücher, Oliver Kopp, D. Karastoyanova","doi":"10.1109/ESCIENCE.2012.6404417","DOIUrl":"https://doi.org/10.1109/ESCIENCE.2012.6404417","url":null,"abstract":"Simulations are characterized by long running calculations and complex data handling tasks accompanied by non-trivial data dependencies. The workflow technology helps to automate and steer such simulations. Quality of Data frameworks are used to determine the goodness of simulation data, e.g., they analyze the accuracy of input data with regards to the usability within numerical solvers. In this paper, we present generic approaches using evaluated Quality of Data to steer simulation workflows. This allows for ensuring that the predefined requirements such as a precise final result or a short execution time will be met even after the execution of simulation workflow has been started. We discuss mechanisms for steering a simulation on all relevant levels - workflow, service, algorithms, and define a unifying approach to control such workflows. To realize Quality of Data-driven workflows, we present an architecture realizing the presented approach and a WS-Policy-based language to describe Quality of Data requirements and capabilities.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"27 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89534925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}