Big Data, Big Challenges

Wei Wang
{"title":"Big Data, Big Challenges","authors":"Wei Wang","doi":"10.1109/ICSC.2014.65","DOIUrl":null,"url":null,"abstract":"Summary form only given. Big data analytics is the process of examining large amounts of data of a variety of types (big data) to uncover hidden patterns, unknown correlations and other useful information. Its revolutionary potential is now universally recognized. Data complexity, heterogeneity, scale, and timeliness make data analysis a clear bottleneck in many biomedical applications, due to the complexity of the patterns and lack of scalability of the underlying algorithms. Advanced machine learning and data mining algorithms are being developed to address one or more challenges listed above. It is typical that the complexity of potential patterns may grow exponentially with respect to the data complexity, and so is the size of the pattern space. To avoid an exhaustive search through the pattern space, machine learning and data mining algorithms usually employ a greedy approach to search for a local optimum in the solution space, or use a branch-and-bound approach to seek optimal solutions, and consequently, are often implemented as iterative or recursive procedures. To improve efficiency, these algorithms often exploit the dependencies between potential patterns to maximize in-memory computation and/or leverage special hardware (such as GPU and FPGA) for acceleration. These lead to strong data dependency, operation dependency, and hardware dependency, and sometimes ad hoc solutions that cannot be generalized to a broader scope. In this talk, I will present some open challenges faced by data scientist in biomedical fields and the current approaches taken to tackle these challenges.","PeriodicalId":175352,"journal":{"name":"2014 IEEE International Conference on Semantic Computing","volume":"32 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"30","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 IEEE International Conference on Semantic Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSC.2014.65","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 30

Abstract

Summary form only given. Big data analytics is the process of examining large amounts of data of a variety of types (big data) to uncover hidden patterns, unknown correlations and other useful information. Its revolutionary potential is now universally recognized. Data complexity, heterogeneity, scale, and timeliness make data analysis a clear bottleneck in many biomedical applications, due to the complexity of the patterns and lack of scalability of the underlying algorithms. Advanced machine learning and data mining algorithms are being developed to address one or more challenges listed above. It is typical that the complexity of potential patterns may grow exponentially with respect to the data complexity, and so is the size of the pattern space. To avoid an exhaustive search through the pattern space, machine learning and data mining algorithms usually employ a greedy approach to search for a local optimum in the solution space, or use a branch-and-bound approach to seek optimal solutions, and consequently, are often implemented as iterative or recursive procedures. To improve efficiency, these algorithms often exploit the dependencies between potential patterns to maximize in-memory computation and/or leverage special hardware (such as GPU and FPGA) for acceleration. These lead to strong data dependency, operation dependency, and hardware dependency, and sometimes ad hoc solutions that cannot be generalized to a broader scope. In this talk, I will present some open challenges faced by data scientist in biomedical fields and the current approaches taken to tackle these challenges.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
大数据,大挑战
只提供摘要形式。大数据分析是检查大量各种类型的数据(大数据)以发现隐藏的模式,未知的相关性和其他有用信息的过程。它的革命潜力现在已得到普遍认可。数据的复杂性、异构性、规模和时效性使得数据分析在许多生物医学应用中成为一个明显的瓶颈,这是由于数据模式的复杂性和底层算法缺乏可扩展性。正在开发先进的机器学习和数据挖掘算法来解决上面列出的一个或多个挑战。典型的情况是,潜在模式的复杂性可能会随着数据复杂性呈指数级增长,模式空间的大小也是如此。为了避免在模式空间中进行穷举搜索,机器学习和数据挖掘算法通常采用贪心方法在解空间中搜索局部最优,或使用分支定界方法寻求最优解,因此,通常以迭代或递归的方式实现。为了提高效率,这些算法通常利用潜在模式之间的依赖关系来最大化内存中的计算和/或利用特殊硬件(如GPU和FPGA)来加速。这将导致强烈的数据依赖性、操作依赖性和硬件依赖性,有时还会导致无法推广到更广泛范围的临时解决方案。在这次演讲中,我将介绍生物医学领域数据科学家面临的一些公开挑战以及目前应对这些挑战的方法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Fulgeo -- Towards an Intuitive User Interface for a Semantics-Enabled Multimedia Search Engine Refinement of Ontology-Constrained Human Pose Classification "Units of Meaning" in Medical Documents: Natural Language Processing Perspective Enhancing Multimedia Semantic Concept Mining and Retrieval by Incorporating Negative Correlations Cloud Resource Auto-scaling System Based on Hidden Markov Model (HMM)
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1