2012 IEEE 8th International Conference on E-Science: Latest Publications

Towards semantically-enabled exploration and analysis of environmental ecosystems
Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404436
Ping Wang, L. Fu, E. Patton, D. McGuinness, F. J. Dein, R. S. Bristol
We aim to inform the development of decision support tools for resource managers who need to examine large, complex ecosystems and make recommendations in the face of many tradeoffs and conflicting drivers. We take a semantic technology approach, leveraging background ontologies and the growing body of open linked data. In previous work, we designed and implemented a semantically-enabled environmental monitoring framework called SemantEco and used it to build a water quality portal named SemantAqua. In this work, we significantly extend SemantEco to include the knowledge required to support resource decisions concerning fish and wildlife species and their habitats. Our previous system included foundational ontologies to represent environmental regulation violations and relevant human health effects. Our enhanced framework adds foundational ontologies for modeling wildlife observations and wildlife health impacts, enabling deeper and broader support for examining the effects of environmental pollution on ecosystems more holistically. Our results include a refactored and expanded version of the SemantEco portal. Additionally, the updated system is now compatible with the emerging best-in-class Extensible Observation Ontology (OBOE). A wider range of relevant data has been integrated, with a focus on wildlife health related to contaminant exposure. The resulting system stores and exposes provenance concerning the source of the data, how it was used, and the rationale for choosing it. In this paper, we describe the system, highlight its research contributions, and describe current and envisioned usage.
Pages: 1-8
Citations: 6
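The SemantEco abstract above combines observation modeling (via OBOE), regulation-violation checks, and provenance about each data source. A minimal sketch of how those pieces could fit together follows. It assumes a simplified OBOE-like record (observed entity, characteristic, value, and unit standard) and a hypothetical limits table; the class names, fields, and thresholds are illustrative only and are not SemantEco's actual schema or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    # The three kinds of provenance the abstract mentions:
    # where the data came from, how it was used, and why it was chosen.
    source: str
    usage: str
    rationale: str


@dataclass(frozen=True)
class Measurement:
    # Simplified, OBOE-like measurement (hypothetical field names): a
    # characteristic of an observed entity, a value, and its unit standard.
    entity: str
    characteristic: str
    value: float
    standard: str
    provenance: Provenance


# Illustrative thresholds keyed by (characteristic, unit). In a semantic
# system these would come from regulation ontologies, not a literal dict.
LIMITS = {
    ("Mercury", "mg/L"): 0.002,
    ("Lead", "mg/L"): 0.015,
}


def violations(measurements):
    """Return measurements that exceed a known limit expressed in the same unit."""
    flagged = []
    for m in measurements:
        limit = LIMITS.get((m.characteristic, m.standard))
        if limit is not None and m.value > limit:
            flagged.append(m)
    return flagged


if __name__ == "__main__":
    src = Provenance(
        source="hypothetical monitoring-station feed",
        usage="compared against regulation limits",
        rationale="nearest station to the habitat of interest",
    )
    samples = [
        Measurement("Site1", "Mercury", 0.005, "mg/L", src),
        Measurement("Site1", "Lead", 0.010, "mg/L", src),
    ]
    print([m.characteristic for m in violations(samples)])  # ['Mercury']
```

Keeping provenance on every measurement means any flagged violation can be traced back to its source and to the reason that source was selected.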
A simulator for social exchanges and collaborations — Architecture and case study
Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404414
Christian Haas, Simon Caton, Daniel Trumpp, Christof Weinhardt
Social collaboration scenarios, such as sharing resources between friends, have become increasingly prevalent in recent years. An example of this new paradigm is Social Cloud Computing, which aims to leverage existing digital relationships within social networks for the exchange of resources among users and user communities. Because of their complexity, such platforms and systems have to be carefully designed and engineered to suit their purpose. In this paper, we propose a general-purpose simulation tool to help in the design and analysis of Social Collaboration Platforms, and discuss potential use cases and the architecture of the simulator. To show the usefulness of the simulator, we present a simple use case in which we study the effects of an incentive scheme on the system and its user community.
Pages: 1-8
Citations: 5
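The incentive-scheme study in the social collaboration abstract above can be illustrated with a toy agent-based simulation. The sketch below is not the authors' simulator. It assumes that each user shares a resource in a given round with some base probability, and that a hypothetical incentive adds a fixed bonus to that probability. Both runs reuse the same random draws, so any difference comes only from the incentive.

```python
import random


def simulate(n_users, n_rounds, base_p, incentive_bonus=0.0, seed=0):
    """Count resource-sharing events in a toy social-collaboration model.

    Each user shares in a round if a uniform draw falls below their sharing
    probability. The incentive simply raises that probability (capped at 1.0);
    this behavioral model is an illustrative assumption, not the paper's.
    """
    rng = random.Random(seed)
    p = min(1.0, base_p + incentive_bonus)
    shares = 0
    for _ in range(n_rounds):
        for _ in range(n_users):
            # Exactly one draw per user per round, so runs with the same seed
            # see identical draws and differ only in the threshold p.
            if rng.random() < p:
                shares += 1
    return shares


if __name__ == "__main__":
    without = simulate(100, 50, base_p=0.2)
    with_incentive = simulate(100, 50, base_p=0.2, incentive_bonus=0.1)
    print(without, with_incentive)
```

Because the draws are shared, the incentivized run can never produce fewer sharing events than the baseline. A real simulator would add richer effects, such as users reacting to one another or leaving the community.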
Velo and REXAN — Integrated data management and high speed analysis for experimental facilities
Pub Date : 2012-10-08 DOI: 10.1109/ESCIENCE.2012.6404463
K. K. Dam, J. Carson, A. Corrigan, D. Einstein, Zoe Guillen, Brandi S. Heath, A. Kuprat, Ingela Lanekoff, C. Lansing, J. Laskin, Dongsheng Li, Y. Liu, M. Marshall, E. Miller, G. Orr, Paulo Pinheiro da Silva, Seun Ryu, C. Szymanski, Mathew Thomas
The Chemical Imaging Initiative at the Pacific Northwest National Laboratory (PNNL) is creating a 'Rapid Experimental Analysis' (REXAN) Framework, based on the concept of reusable component libraries. REXAN allows developers to quickly compose and customize high-throughput analysis pipelines for a range of experiments, and it supports the creation of multi-modal analysis pipelines. In addition, PNNL has coupled REXAN with Velo, its collaborative data management and analysis environment, to create easy-to-use data management and analysis environments for experimental facilities. This paper discusses the benefits of Velo and REXAN in the context of three examples:
· PNNL High Resolution Mass Spectrometry: reducing analysis times from hours to seconds, while also enabling the analysis of much larger data samples (100 KB to 40 GB).
· ALS X-Ray Tomography: reducing analysis times of combined STXM and EM data collected at the ALS from weeks to minutes, decreasing manual work and increasing the data volumes that can be analysed in a single step.
· Multi-modal nano-scale analysis of STXM and TEM data: providing a semi-automated process for particle detection.
The creation of REXAN has significantly shortened the development time for these analysis pipelines. The integration of Velo and REXAN has significantly increased the scientific productivity of the instruments and their users by creating easy-to-use data management and analysis environments with greatly reduced analysis times and improved analysis capabilities.
Pages: 1-9
Citations: 5
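The Velo and REXAN abstract above describes building analysis pipelines by composing components from reusable libraries. The general pattern can be sketched in Python as function composition. The component names below (`baseline_subtract`, `normalize`, `threshold`) are hypothetical stand-ins, not REXAN's actual components or API.

```python
from functools import reduce
from typing import Callable, Sequence

Component = Callable[[list[float]], list[float]]


def compose(*components: Component) -> Component:
    """Chain reusable components left to right into a single pipeline."""
    return lambda data: reduce(lambda acc, step: step(acc), components, data)


# Hypothetical reusable components; a real library would hold
# instrument-specific processing steps.
def baseline_subtract(data: Sequence[float]) -> list[float]:
    low = min(data)
    return [x - low for x in data]


def normalize(data: Sequence[float]) -> list[float]:
    peak = max(data)
    return [x / peak for x in data] if peak else list(data)


def threshold(level: float) -> Component:
    """Build a configurable component that zeroes values below `level`."""
    return lambda data: [x if x >= level else 0.0 for x in data]


# Two pipelines for different experiments, assembled from the same parts.
survey_pipeline = compose(baseline_subtract, normalize)
detection_pipeline = compose(baseline_subtract, normalize, threshold(0.5))
```

For example, `detection_pipeline([2.0, 4.0, 6.0, 3.0])` subtracts the baseline, scales the result to the peak, and then suppresses weak values. Composition keeps each component independently testable, which is one plausible reason a component-library design can shorten pipeline development time.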
Incorporating circulation data in relevancy rankings for search algorithms in library collections
Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404447
H. Green, Kirk Hess, Richard Hislop
This paper demonstrates a series of analyses that calculate new clusters of shared subject headings among items in a library collection. It establishes a method for reconstituting anonymous circulation data from a library catalog into separate user transactions. The transaction data are incorporated into subject analyses that use supercomputing resources to generate predictive network analyses and visualizations of the subject areas searched by library users. The paper develops several methods for ranking these subject headings and shows how the analyses will be extended on supercomputing resources for information retrieval research.
Pages: 1-6
Citations: 3
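The circulation-data abstract above does not specify how anonymous records are split into user transactions or how subject headings are ranked. The sketch below shows one plausible approach, and it is purely an assumption. Checkouts that share an anonymized grouping key and fall within a time window are treated as one transaction. Subject headings are then ranked by how many transactions they share with a given heading. The `key` field and the 30-minute gap are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Checkout:
    key: str             # hypothetical anonymized grouping key (e.g. a hashed ID)
    time_min: int        # checkout time, in minutes
    subjects: frozenset  # subject headings of the borrowed item


def transactions(records, max_gap_min=30):
    """Split records into transactions: same key, and consecutive
    checkouts no more than `max_gap_min` minutes apart."""
    result = []
    ordered = sorted(records, key=lambda r: (r.key, r.time_min))
    for _, group in groupby(ordered, key=lambda r: r.key):
        current, last_time = [], None
        for r in group:
            if last_time is not None and r.time_min - last_time > max_gap_min:
                result.append(current)
                current = []
            current.append(r)
            last_time = r.time_min
        if current:
            result.append(current)
    return result


def related_subjects(txns, heading):
    """Rank other headings by the number of transactions they share with `heading`."""
    counts = Counter()
    for txn in txns:
        subjects = set().union(*(r.subjects for r in txn))
        if heading in subjects:
            counts.update(subjects - {heading})
    return counts.most_common()
```

Co-occurrence counts like these could serve as one extra relevance signal in a search ranking. The paper's own ranking methods may weight or normalize them differently.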
Data intensive science at synchrotron based 3D x-ray imaging facilities
Pub Date : 2012-10-08 DOI: 10.1109/ESCIENCE.2012.6404468
F. Carlo, Xianghui Xiao, K. Fezzaa, Steve Wang, N. Schwarz, C. Jacobsen, N. Chawla, F. Fusseis
New developments in detector technology allow the acquisition of micrometer-resolution x-ray transmission images of specimens as large as a few millimeters at unprecedented frame rates. The high x-ray flux density generated by the Advanced Photon Source (APS) allows detector exposure times ranging from hundreds of milliseconds down to 150 picoseconds. Synchronizing the camera with the rotation stage allows a full 3D dataset to be acquired in less than one second. The micro- and nano-tomography systems available at the x-ray imaging beamlines of the APS are routinely used in materials science and geoscience applications, where high-resolution, fast 3D imaging is instrumental in extracting in situ four-dimensional dynamic information. Here we describe the computational challenges associated with the x-ray imaging systems at the APS and discuss our current data model and data analysis processes.
Pages: 1-3
Citations: 4
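The synchrotron imaging abstract above describes sub-second 3D scans. To show why such scans are data-intensive, the sketch below estimates the scan time, projection count, angular step, and raw data volume of one continuous-rotation scan with the camera triggered in sync with the stage. Every parameter value is a hypothetical example, not an APS detector specification.

```python
from dataclasses import dataclass


@dataclass
class ScanBudget:
    scan_time_s: float
    n_projections: int
    angular_step_deg: float
    raw_bytes: int


def scan_budget(frame_rate_hz, rotation_deg_per_s, width_px, height_px,
                bytes_per_px=2, scan_range_deg=180.0):
    """Estimate one continuous-rotation scan at a fixed camera frame rate."""
    scan_time_s = scan_range_deg / rotation_deg_per_s
    n_projections = int(frame_rate_hz * scan_time_s)
    if n_projections == 0:
        raise ValueError("frame rate too low to record any projection")
    return ScanBudget(
        scan_time_s=scan_time_s,
        n_projections=n_projections,
        angular_step_deg=scan_range_deg / n_projections,
        raw_bytes=n_projections * width_px * height_px * bytes_per_px,
    )


if __name__ == "__main__":
    # Hypothetical: 2000 frames/s, stage at 360 deg/s, 1024 x 1024 16-bit frames.
    b = scan_budget(2000, 360, 1024, 1024)
    print(b.scan_time_s, b.n_projections, round(b.angular_step_deg, 2), b.raw_bytes / 1e9)
    # -> 0.5 1000 0.18 2.097152
```

Under these assumed settings, a single half-second scan produces about 2.1 GB of raw projections. Repeated scans for 4D (time-resolved) experiments multiply that volume accordingly.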
Web applications for experimental control at CLS
Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404470
Dong Liu, D. Maxwell, Elder Mathias
The advantages of web applications have already attracted attention at major physical research facilities such as the Canadian Light Source (CLS). In some experimental control scenarios, the accessibility of web applications makes them preferable to native desktop applications. This short paper presents two web applications developed mainly at CLS: Science Studio, for remote access to and collaboration on instruments and computation resources, and Logit, for beamline experiment information management. These two applications represent two typical kinds of web application. Science Studio is heavyweight, provides a broad range of functionality, and has been developed by distributed teams for years. Logit is lightweight, provides a very limited set of features, and was delivered in a very short time. The architectural designs of both applications are discussed, along with the lessons learned from them.
Pages: 1-4
Citations: 0
Image retrieval in the unstructured data management system AUDR
Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404474
Junwu Luo, B. Lang, Chao Tian, Danchen Zhang
The explosive growth of image data poses severe challenges to traditional image retrieval methods. To manage massive image collections more accurately and efficiently, this paper first proposes a scalable architecture for image retrieval based on a uniform data model and makes this function a sub-engine of AUDR, an advanced unstructured data management system that can simultaneously manage several kinds of unstructured data, including images, video, audio, and text. The paper then proposes a new image retrieval algorithm that incorporates rich visual features and two text models for multi-modal retrieval. Experiments on both the ImageNet dataset and the ImageCLEF medical dataset show that the proposed architecture and the new retrieval algorithm are appropriate for the efficient management of massive image collections.
Pages: 1-7
Citations: 6
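The AUDR abstract above combines rich visual features with two text models, but it does not give the fusion rule. A common baseline is weighted late fusion: compute a visual similarity score, take one score from each text model, and rank images by a weighted sum. The sketch below shows that baseline only as an assumed illustration. The weights, the use of cosine similarity, and the precomputed text scores are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def fused_ranking(query_vec, images, weights=(0.5, 0.25, 0.25)):
    """Rank image IDs by a weighted sum of one visual and two text scores.

    `images` maps an image ID to (feature_vector, text_score_1, text_score_2).
    The text scores are assumed to be precomputed and scaled to [0, 1];
    the weights are illustrative, not values from the paper.
    """
    w_vis, w_t1, w_t2 = weights
    scored = {
        image_id: w_vis * cosine(query_vec, vec) + w_t1 * t1 + w_t2 * t2
        for image_id, (vec, t1, t2) in images.items()
    }
    return sorted(scored, key=scored.get, reverse=True)
```

Late fusion keeps each modality's scorer independent, which fits an engine that already manages several kinds of unstructured data. The actual AUDR algorithm may combine the modalities differently.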
A framework to access handwritten information within large digitized paper collections
Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404434
Liana Diesendruck, Luigi Marini, R. Kooper, M. Kejriwal, Kenton McHenry
We describe our efforts with the National Archives and Records Administration (NARA) to provide a form of automated search of handwritten content within large digitized document archives. With a growing push towards the digitization of paper archives, there is a pressing need for tools capable of searching the resulting unstructured image data, since such collections hold valuable historical records that can be mined for information pertinent to fields ranging from the geosciences to the humanities. To carry out the search, we use a Computer Vision technique called Word Spotting. A form of content-based image retrieval, it avoids the still-difficult task of directly recognizing the text: the user searches with a query image containing handwritten text, and the images in a database are ranked by how similar their content looks to the query. To make this search capability available on an archive, three computationally expensive pre-processing steps are required. We describe these steps, the open source framework we have developed, and how it can be used not only on the recently released 1940 Census data, containing nearly 4 million high-resolution scanned forms, but also on other collections of forms. With a growing demand to digitize our wealth of paper archives, we see this type of automated search as a low-cost, scalable alternative to the costly manual transcription that would otherwise be required.
Pages: 1-10
Citations: 7
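The handwriting abstract above describes word spotting, which ranks word images by their visual similarity to a query image instead of recognizing the text. The abstract does not say which features or distance the framework uses. A classic choice in word-spotting work is dynamic time warping (DTW) over per-column profiles of a binarized word image, and the sketch below assumes that choice. Each word image is reduced to a list of column ink counts, and the image IDs are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = lowest alignment cost of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b[j-1] repeats
                                 cost[i][j - 1],      # b advances, a[i-1] repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def column_profile(image):
    """Count ink pixels per column of a binary image given as rows of 0/1."""
    return [sum(col) for col in zip(*image)]


def spot(query_profile, database):
    """Return word-image IDs ordered from most to least similar to the query."""
    return sorted(database, key=lambda k: dtw_distance(query_profile, database[k]))
```

DTW tolerates the uneven stretching of handwritten strokes, so two writings of the same word can align even when their widths differ. Its quadratic cost per comparison is one reason that precomputation matters at the scale of millions of scanned forms.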
ExSciTecH: Expanding volunteer computing to Explore Science, Technology, and Health
Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404451
Michael Matheny, S. Schlachter, L. M. Crouse, E. T. Kimmel, Trilce Estrada, Marcel Schumann, R. Armen, G. Zoppetti, M. Taufer
This paper presents ExSciTecH, an NSF-funded project deploying volunteer computing (VC) systems to Explore Science, Technology, and Health. ExSciTecH aims to radically transform VC systems and the volunteer's experience. To pursue this goal, ExSciTecH integrates gameplay environments into BOINC, a well-known VC middleware, so that volunteers not only donate idle cycles but also actively participate in scientific discovery, i.e., generate new simulations side by side with the scientists. More specifically, ExSciTecH plugs into the BOINC framework and extends it with two main gaming components: a learning component, which includes a suite of games for training users on relevant biochemical concepts, and an engaging component, which includes a suite of games that engage volunteers in drug design and scientific discovery. We assessed the impact of a first implementation of the learning game on a group of students at the University of Delaware. Our tests clearly show that ExSciTecH can generate higher levels of enthusiasm in our students than more traditional learning tools.
Pages: 1-8
Citations: 3
Quality of data driven simulation workflows
Pub Date : 2012-10-08 DOI: 10.1109/ESCIENCE.2012.6404417
M. Reiter, Uwe Breitenbücher, Oliver Kopp, D. Karastoyanova
Simulations are characterized by long-running calculations and complex data handling tasks accompanied by non-trivial data dependencies. Workflow technology helps to automate and steer such simulations. Quality of Data frameworks are used to determine the goodness of simulation data; e.g., they analyze the accuracy of input data with regard to its usability within numerical solvers. In this paper, we present generic approaches that use evaluated Quality of Data to steer simulation workflows. This helps ensure that predefined requirements, such as a precise final result or a short execution time, are met even after execution of the simulation workflow has started. We discuss mechanisms for steering a simulation at all relevant levels (workflow, service, and algorithm) and define a unifying approach to controlling such workflows. To realize Quality of Data-driven workflows, we present an architecture that implements this approach and a WS-Policy-based language for describing Quality of Data requirements and capabilities.
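The abstract describes steering a running simulation workflow by comparing evaluated Quality of Data against predefined requirements (such as result accuracy or execution time) at the workflow, service, and algorithm levels. The Python sketch below illustrates only that general idea under stated assumptions; it is not the authors' architecture or their WS-Policy-based language, and every name in it (`QoDRequirement`, `QoDMeasurement`, `steer`, the action names, and the order of the checks) is hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class SteeringAction(Enum):
    """Hypothetical steering actions, one per level named in the abstract."""
    CONTINUE = "continue"                  # requirements look satisfiable
    REFINE_ALGORITHM = "refine_algorithm"  # algorithm level: e.g. tighten a tolerance
    SWITCH_SERVICE = "switch_service"      # service level: e.g. rebind to a faster solver
    ADAPT_WORKFLOW = "adapt_workflow"      # workflow level: e.g. redo data preparation


@dataclass(frozen=True)
class QoDRequirement:
    """Hypothetical predefined Quality of Data requirements for one run."""
    max_relative_error: float  # accuracy target for the final result
    max_runtime_s: float       # total execution-time budget in seconds


@dataclass(frozen=True)
class QoDMeasurement:
    """Hypothetical Quality of Data evaluated at a checkpoint of a running workflow."""
    input_data_ok: bool        # did the input data pass its quality check?
    projected_error: float     # estimated final relative error with current settings
    elapsed_s: float           # wall-clock time used so far
    remaining_s: float         # estimated time still needed with current settings


def steer(req: QoDRequirement, m: QoDMeasurement) -> SteeringAction:
    """Choose a steering action by comparing evaluated QoD with the requirements.

    The checks run in a fixed, assumed priority order: input data quality,
    then the time budget, then the accuracy target.
    """
    if not m.input_data_ok:
        # Unusable input data: repair it at the workflow level first.
        return SteeringAction.ADAPT_WORKFLOW
    if m.elapsed_s + m.remaining_s > req.max_runtime_s:
        # Time requirement at risk: switch to a faster service.
        return SteeringAction.SWITCH_SERVICE
    if m.projected_error > req.max_relative_error:
        # Time is available but accuracy would fall short: refine the algorithm.
        return SteeringAction.REFINE_ALGORITHM
    return SteeringAction.CONTINUE


if __name__ == "__main__":
    req = QoDRequirement(max_relative_error=1e-3, max_runtime_s=3600.0)
    checkpoint = QoDMeasurement(input_data_ok=True, projected_error=5e-2,
                                elapsed_s=600.0, remaining_s=1200.0)
    print(steer(req, checkpoint).value)  # prints "refine_algorithm"
```

The priority order is an arbitrary choice for this sketch; when time and accuracy requirements conflict, a real system would need an explicit policy for resolving the trade-off.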
Pages: 1-8
Citations: 15