
2013 IEEE 9th International Conference on e-Science: Latest Papers

Consensus Sigma-70 Promoter Prediction Using Hadoop
Pub Date : 2013-10-22 DOI: 10.1109/eScience.2013.42
J. Hogan, W. Kelly, Felicity Newell
MapReduce frameworks such as Hadoop are well suited to handling large sets of data which can be processed separately and independently, with canonical applications in information retrieval and sales record analysis. Rapid advances in sequencing technology have ensured an explosion in the availability of genomic data, with a consequent rise in the importance of large scale comparative genomics, often involving operations and data relationships which deviate from the classical Map Reduce structure. This work examines the application of Hadoop to patterns of this nature, using as our focus a well established workflow for identifying promoters - binding sites for regulatory proteins - across multiple gene regions and organisms, coupled with the unifying step of assembling these results into a consensus sequence. Our approach demonstrates the utility of Hadoop for problems of this nature, showing how the tyranny of the "dominant decomposition" can be at least partially overcome. It also demonstrates how load balance and the granularity of parallelism can be optimized by pre-processing that splits and reorganizes input files, allowing a wide range of related problems to be brought under the same computational umbrella.
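The workflow described in this abstract (per-region promoter identification followed by a unifying consensus step) can be illustrated with a toy map/reduce sketch in plain Python. This is not the authors' Hadoop pipeline: the function names `map_region`, `reduce_consensus`, and `run`, the sample sequences, and the scoring rule (fewest mismatches against the textbook sigma-70 -10 box `TATAAT`) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Textbook sigma-70 -10 box, used here only as a toy reference motif (assumption).
MINUS_10_BOX = "TATAAT"


def map_region(record):
    """Map step: for one (organism, sequence) record, emit the window
    with the fewest mismatches against the reference motif."""
    _organism, seq = record
    k = len(MINUS_10_BOX)
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    best = min(windows, key=lambda w: sum(a != b for a, b in zip(w, MINUS_10_BOX)))
    return ("minus10", best)


def reduce_consensus(sites):
    """Reduce step: column-wise majority vote over equal-length sites."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*sites))


def run(records):
    """Group mapper output by key, then reduce each group (stands in for the shuffle)."""
    grouped = defaultdict(list)
    for key, site in map(map_region, records):
        grouped[key].append(site)
    return {key: reduce_consensus(sites) for key, sites in grouped.items()}


# Hypothetical input: one upstream region per organism.
records = [
    ("org_a", "GGCTATAATGC"),
    ("org_b", "ACGTATACTGG"),
    ("org_c", "TTTAAAATCCG"),
]
print(run(records))
```

The map step is independent per region, which is the part that fits the classical model; the consensus step needs all sites for a key at once, which is the kind of cross-record dependency the abstract says departs from it.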
Citations: 2
Cabinet: Managing Data Efficiently in the Global Federated File System
Pub Date : 2013-10-22 DOI: 10.1109/eScience.2013.36
Avinash Kalyanaraman, A. Grimshaw
With ever-expanding datasets, efficient data management in grids becomes important. This paper describes Cabinet, which employs two techniques for efficiently managing data in grids: a caching system and a new file staging approach called coordinated staging. The caching system is designed based on the characteristics of grid applications. Coordinated staging is based on the BitTorrent protocol model and is specifically designed for High Throughput Computing (HTC) applications, a common use case for grids. In coordinated staging, each site that is assigned to execute an individual job of the HTC application treats other execution sites as potential replica stores. In our evaluation, we show that coordinated staging lowered the download time of a file by 3.85x and increased the throughput of the download by 2.86x over the conventional approach of file transfer from a single source.
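The idea of treating other execution sites as potential replica stores can be sketched as a chunk-source planner. This is a toy illustration of the general peer-assisted pattern only, not Cabinet's algorithm: the function `plan_sources`, the chunk numbering, the site names, and the least-loaded selection rule are all hypothetical.

```python
def plan_sources(chunks_needed, peer_holdings, origin="origin"):
    """For each chunk, pick the least-loaded peer site that already holds it.

    Falls back to the single original source when no peer has the chunk.
    peer_holdings maps a site name to the set of chunk ids it holds.
    """
    load = {}  # number of chunks already assigned to each source
    plan = {}
    for chunk in chunks_needed:
        # Sort holders so ties are broken deterministically by site name.
        holders = sorted(site for site, held in peer_holdings.items() if chunk in held)
        source = min(holders, key=lambda s: load.get(s, 0)) if holders else origin
        load[source] = load.get(source, 0) + 1
        plan[chunk] = source
    return plan


# Hypothetical state: two other execution sites already hold parts of the file.
peers = {"site_a": {0, 1}, "site_b": {1, 2}}
print(plan_sources([0, 1, 2, 3], peers))
```

In this sketch only chunk 3 is fetched from the original source; the other three are spread across the peer sites, which is the effect that single-source transfer cannot achieve.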
Citations: 0
Bird-SDPS: A Migratory Birds' Spatial Distribution Prediction System
Pub Date : 2013-10-22 DOI: 10.1109/eScience.2013.12
Yuanchun Zhou, Jing Shao, Xuezhi Wang, Ze Luo, Jianhui Li, Baoping Yan
Species distribution modeling is an important ecological research task that has received a great deal of interest. Several single-model packages and applications are available for species distribution analysis. This paper introduces Bird-SDPS, a Prediction System for Migratory Birds' Spatial Distribution, which is an extensible system for predicting birds' spatial distribution. Bird-SDPS uses birds' GPS tracking data and remote sensing data as input to build multiple distribution models, which are implemented in different programming languages. The system also provides online access and visualization functions. To store large remote sensing datasets, we design a hybrid storage structure based on HBase. We extensively evaluate our system using a real-world GPS dataset collected from 90 wild birds over 3 years. We show that the system can predict birds' distribution based on multiple models, and that our hybrid data storage modes can outperform traditional file-based storage.
Citations: 0
Wildlife@Home: Combining Crowd Sourcing and Volunteer Computing to Analyze Avian Nesting Video
Pub Date : 2013-10-01 DOI: 10.1109/ESCIENCE.2013.50
Travis Desell, Robert Bergman, K. Goehner, R. Marsh, Rebecca VanderClute, Susan N. Ellis‐Felege
New camera technology is allowing avian ecologists to perform detailed studies of avian behavior, nesting strategies and predation in areas where it was previously impossible to gather data. Unfortunately, studies have shown mechanical triggers and a variety of sensors to be inadequate in capturing footage of small predators (e.g., snakes, rodents) or events in dense vegetation. Because of this, continuous camera recording is currently the most robust solution for avian monitoring, especially in ground nesting species. However, continuous video footage results in a data deluge, as monitoring enough nests to make biologically significant inferences results in massive amounts of data which is unclassifiable by humans alone. In the summer of 2012, Dr. Ellis-Felege gathered video footage from 63 sharp-tailed grouse (Tympanuchus phasianellus) nests, as well as preliminary interior least tern (Sternula antillarum) and piping plover (Charadrius melodus) nests, resulting in over 20,000 hours of video footage. In order to effectively analyze this video, a project combining both crowd sourcing and volunteer computing was developed, where volunteers can stream nesting video and report their observations, as well as have their computers download video for analysis by computer vision techniques. This provides a robust way to analyze the video, as user observations are validated by multiple views as well as the results of the computer vision techniques. This work provides initial results analyzing the effectiveness of the crowd sourced observations and computer vision techniques.
Citations: 10
Rapid Scanning of Spectrograms for Efficient Identification of Bioacoustic Events in Big Data
Pub Date : 2013-10-01 DOI: 10.1109/eScience.2013.25
A. Truskinger, Mark Cottman-Fields, Daniel M. Johnson, P. Roe
Acoustic sensing is a promising approach to scaling faunal biodiversity monitoring. Scaling the analysis of audio collected by acoustic sensors is a big data problem. Standard approaches for dealing with big acoustic data include automated recognition and crowd based analysis. Automatic methods are fast at processing but hard to rigorously design, whilst manual methods are accurate but slow at processing. In particular, manual methods of acoustic data analysis are constrained by a 1:1 time relationship between the data and its analysts. This constraint is the inherent need to listen to the audio data. This paper demonstrates how the efficiency of crowd sourced sound analysis can be increased by an order of magnitude through the visual inspection of audio visualized as spectrograms. Experimental data suggests that an analysis speedup of 12× is obtainable for suitable types of acoustic analysis, given that only spectrograms are shown.
Citations: 24