
2014 IEEE International Conference on Semantic Computing: Latest Publications

A Semantic++ MapReduce: A Preliminary Report
Pub Date : 2014-06-16 DOI: 10.1109/ICSC.2014.63
Guigang Zhang, Jian Wang, Weixing Huang, C. Li, Yong Zhang, Chunxiao Xing
Big data processing is one of the pressing scientific issues in current social development, and MapReduce is an important foundation for it. In this paper, we propose a semantic++ MapReduce. This study includes four parts. (1) Semantic++ extraction and management for big data: we investigate methods for automatically extracting, labeling, and managing the semantic++ information of big data. (2) SMRPL (Semantic++ MapReduce Programming Language): a declarative programming language that is close to human thinking and is used to program big data applications. (3) Semantic++ MapReduce compilation methods. (4) Semantic++ MapReduce computing technology, which itself includes three parts: 1) analysis of the semantic++ index information of data blocks, a description of the semantic++ index structure, and a method for automatically loading semantic++ index information; 2) analysis of semantic++ operations such as semantic++ sorting, semantic++ grouping, semantic++ merging, and semantic++ query in the map and reduce phases; 3) a shuffle scheduling strategy based on semantic++ techniques. This research optimizes MapReduce and enhances its processing efficiency and capability, and provides theoretical and technological groundwork for the intelligent processing of big data.
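The computing technology in part (4) can be illustrated with a toy MapReduce pipeline in which the map phase emits semantic keys instead of raw words. This is only a minimal sketch: the synonym table below stands in for the paper's semantic++ index information, and all names are hypothetical.

```python
from collections import defaultdict

# Toy synonym table standing in for the paper's semantic++ index
# (hypothetical; the actual index format is not given in the abstract).
SEMANTIC_CLASS = {"car": "vehicle", "truck": "vehicle", "rose": "flower", "tulip": "flower"}

def map_phase(records):
    """Emit (semantic key, 1) pairs instead of (raw word, 1) pairs."""
    for word in records:
        yield SEMANTIC_CLASS.get(word, word), 1

def shuffle(pairs):
    """Group values by key, as the shuffle step does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Aggregate each semantic group, here by summation."""
    return {key: sum(values) for key, values in groups.items()}

counts = reduce_phase(shuffle(map_phase(["car", "truck", "rose", "car"])))
# counts == {"vehicle": 3, "flower": 1}
```

Records with distinct raw keys but the same semantic class end up in one reduce group, which is the effect the semantic++ grouping operation aims for.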
Citations: 2
Fulgeo -- Towards an Intuitive User Interface for a Semantics-Enabled Multimedia Search Engine
Pub Date : 2014-06-16 DOI: 10.1109/ICSC.2014.52
D. Schneider, Denny Stohr, J. Tingvold, A. B. Amundsen, Lydia Weiland, S. Kopf, W. Effelsberg, A. Scherp
Multimedia documents such as PowerPoint presentations or Flash documents are widely used on the Internet and cover many different topics. However, so far there is no user-friendly way to explore and search this content. This work addresses the issue by developing a new, easy-to-use user interface approach and a prototype search engine. Our system, called fulgeo, focuses specifically on a suitable multimedia interface for visualizing the query results of semantically enriched Flash documents.
Citations: 1
Computing Recursive SPARQL Queries 计算递归SPARQL查询
Pub Date : 2014-06-16 DOI: 10.1109/ICSC.2014.54
M. Atzori
We present a simple approach to handling recursive SPARQL queries, that is, nested queries that may contain references to the query itself. This powerful feature is obtained by implementing a custom SPARQL function that takes a SPARQL query as a parameter and executes it over a specified endpoint. The behaviour is similar to the SPARQL 1.1 SERVICE clause, with a few fundamental differences: (1) the query passed as an argument can be arbitrarily complex, (2) being a string, the query can be created at runtime in the calling (outer) query, and (3) it can reference itself, enabling recursion. These features make the SPARQL language Turing-equivalent without introducing special constructs or requiring another interpreter on the endpoint server engine. The feature is implemented using the standard Extensible Value Testing mechanism described in the recommendations since version 1.0; our proposal is therefore standards-compliant and also compatible with older endpoints that do not support the 1.1 specification, where it can also serve as a replacement for the missing SERVICE clause.
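The self-referencing behaviour described above can be mimicked in plain Python: a function receives a query as a string, assembles an inner query string at runtime, and calls itself, yielding a transitive-closure computation. This is a sketch of the idea only, over a toy acyclic triple store; the paper's actual custom SPARQL function and its name are not reproduced here.

```python
# Toy triple store: (subject, predicate, object).
TRIPLES = {("a", "knows", "b"), ("b", "knows", "c"), ("c", "knows", "d")}

def run_query(query):
    """Stand-in for a SPARQL-executing function: it receives the query
    as a *string* built at runtime and may recurse by calling itself
    (all names here are hypothetical). Assumes an acyclic graph."""
    kind, start = query.split(":")
    direct = {o for s, p, o in TRIPLES if s == start and p == "knows"}
    if kind == "reachable":
        result = set(direct)
        for node in direct:
            # Recursive call: the inner query string is assembled at
            # runtime, mirroring how the outer query embeds itself as text.
            result |= run_query(f"reachable:{node}")
        return result
    return direct

print(sorted(run_query("reachable:a")))  # ['b', 'c', 'd']
```

The key point mirrored here is difference (2) from the abstract: because the query is just a string, the recursive step can be constructed inside the outer query rather than requiring a dedicated recursion construct.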
Citations: 6
CrowdLink: Crowdsourcing for Large-Scale Linked Data Management
Pub Date : 2014-06-16 DOI: 10.1109/ICSC.2014.14
A. Basharat, I. Arpinar, Shima Dastgheib, Ugur Kursuncu, K. Kochut, Erdogan Dogdu
Crowdsourcing is an emerging paradigm that exploits human computation to solve computational problems which machine-based solutions alone cannot solve accurately. We use crowdsourcing for large-scale link management in the Semantic Web. More specifically, we develop CrowdLink, which employs crowd workers to verify and create triples in Linking Open Data (LOD). LOD incorporates the core data sets of the Semantic Web, yet does not fully conform to the guidelines for publishing high-quality linked data on the Web. Our approach can help enrich and improve the quality of mission-critical links in LOD. Scalable LOD link management requires a hybrid approach in which human-intelligence and machine-intelligence tasks interleave in a workflow execution. Likewise, many other crowdsourcing applications require sophisticated workflow specifications covering not only human-intelligence tasks but also machine-intelligence tasks that handle data and control flow, a capability largely missing from existing crowdsourcing platforms. Hence, we are strongly motivated to investigate the interplay of crowdsourcing and semantically enriched workflows for better human-machine cooperation in task completion. We demonstrate the usefulness of our approach through various link creation and verification tasks and workflows using Amazon Mechanical Turk. Experimental evaluation demonstrates promising results in terms of the accuracy of the links created and verified by the crowd workers.
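One concrete human-intelligence task in such a workflow is triple verification. A minimal sketch of aggregating worker judgments by majority vote follows; the aggregation policy, function names, and example triple are assumptions for illustration, not CrowdLink's actual implementation.

```python
from collections import Counter

def verify_triple(triple, worker_votes):
    """Aggregate crowd judgments on one RDF triple by majority vote
    (a simplified stand-in for a crowdsourced verification step;
    the paper's actual aggregation policy is not specified here)."""
    tally = Counter(worker_votes)
    decision, count = tally.most_common(1)[0]
    confidence = count / len(worker_votes)  # fraction of agreeing workers
    return decision, confidence

# Hypothetical LOD triple submitted to three workers.
triple = ("dbpedia:Rome", "dbo:country", "dbpedia:Italy")
decision, confidence = verify_triple(triple, ["correct", "correct", "incorrect"])
# decision == "correct", confidence == 2/3
```

A machine-intelligence task in the same workflow could then route low-confidence triples to more workers, which is the kind of interleaved control flow the abstract argues existing platforms lack.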
Citations: 5
A Pattern-Based Approach to Semantic Relation Extraction Using a Seed Ontology
Pub Date : 2014-06-16 DOI: 10.1109/ICSC.2014.42
M. Al-Yahya, L. Aldhubayi, Sawsan Al-Malak
This paper presents our experiment with the "Badea" system, a system designed for the automated extraction of semantic relations from text using a seed ontology and a pattern-based approach. We describe an experiment using a set of Arabic-language corpora to extract the antonym semantic relation. Antonyms from the seed ontology are used to extract patterns from the corpora; these patterns are then used to discover new antonym pairs, thus enriching the ontology. Evaluation results show that the system succeeded in enriching the ontology, increasing its size by over 400%. The results also showed that only 2.7% of the patterns were useful for extracting new antonyms, so recommendations for pattern scoring are presented in this paper.
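The seed-then-pattern loop described here can be sketched in a few lines: seed pairs yield lexical patterns (the tokens between the two antonyms), and those patterns are matched back against the corpus to propose new pairs. English toy data stands in for the paper's Arabic corpora, and the pattern representation is an assumption.

```python
import re

# Seed antonym pairs play the role of the seed ontology.
SEEDS = [("hot", "cold"), ("big", "small")]
CORPUS = [
    "the water was hot rather than cold today",
    "a big rather than small effort was needed",
    "the test was easy rather than hard",
]

def extract_patterns(corpus, seeds):
    """Turn each seed co-occurrence into a pattern: the tokens between the pair."""
    patterns = set()
    for sentence in corpus:
        toks = sentence.split()
        for a, b in seeds:
            if a in toks and b in toks and toks.index(a) < toks.index(b):
                patterns.add(" ".join(toks[toks.index(a) + 1:toks.index(b)]))
    return patterns

def apply_patterns(corpus, patterns):
    """Match patterns back against the corpus to propose candidate pairs."""
    pairs = set()
    for pattern in patterns:
        regex = r"(\w+) " + re.escape(pattern) + r" (\w+)"
        for sentence in corpus:
            for m in re.finditer(regex, sentence):
                pairs.add((m.group(1), m.group(2)))
    return pairs

patterns = extract_patterns(CORPUS, SEEDS)
new_pairs = apply_patterns(CORPUS, patterns) - set(SEEDS)
print(new_pairs)  # {('easy', 'hard')}
```

The 2.7% figure in the abstract is a reminder that most harvested patterns are noisy, which is why a scoring step over `patterns` would precede any real use.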
Citations: 13
Enabling 'Question Answering' in the MBAT Vector Symbolic Architecture by Exploiting Orthogonal Random Matrices
Pub Date : 2014-06-16 DOI: 10.1109/ICSC.2014.38
M. Tissera, M. McDonnell
Vector Symbolic Architectures (VSAs) are methods designed to enable distributed representation and manipulation of semantically structured information, such as natural languages. Recently, a new VSA based on multiplying distributed vectors by random matrices was proposed, known as Matrix Binding of Additive Terms (MBAT). We propose an enhancement that introduces an important additional feature to MBAT: the ability to 'unbind' symbols. We show that our method, which exploits the inherent properties of orthogonal matrices, imparts MBAT with the 'question answering' ability found in other VSAs. We compare our results with another popular VSA that was recently demonstrated to have high utility in brain-inspired machine learning applications.
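The key algebraic fact behind orthogonal-matrix unbinding is that an orthogonal matrix's inverse is its transpose, so a symbol bound by multiplication with Q can be recovered by multiplying with Q transpose. A small NumPy sketch of that fact follows; the QR-based construction of Q is one standard recipe, not necessarily the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # illustrative vector dimension

# Random orthogonal binding matrix: QR-decompose a Gaussian matrix
# (one standard construction; the paper's exact recipe may differ).
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))

filler = rng.normal(size=d)  # a distributed symbol to bind
bound = Q @ filler           # MBAT-style binding by matrix multiplication

# 'Unbinding' exploits orthogonality: Q^{-1} = Q^T, so no explicit
# matrix inverse is needed to answer "what was bound?".
recovered = Q.T @ bound

print(np.allclose(recovered, filler))  # True
```

With a different, independent orthogonal matrix the recovery fails, which is what lets distinct roles bound into one sum be queried separately.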
Citations: 3
Semantic Approach for Identifying Harmful Sites Using the Link Relations
Pub Date : 2014-06-16 DOI: 10.1109/ICSC.2014.53
Junghoon Shin, Sangjun Lee, Taehyung Wang
Internet technologies are improving continuously, and so are harmful websites such as pornography or illegal gambling sites. Moreover, a characteristic of websites is that changes to a web address or its contents take effect almost instantaneously. It is therefore not easy to distinguish harmful websites from harmless ones. There are two common ways to make such a decision: manual examination and automated searching for certain texts, videos, or sounds. Both methods require a great deal of time. In this paper, we propose a method for identifying harmful websites by analyzing the relationships between websites rather than their contents.
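One way a link-based decision could work is to propagate harm scores from known-bad seed sites across the link graph. The following is a hypothetical sketch of that general idea, with made-up site names and scoring constants; it is not the method actually detailed in the paper.

```python
def propagate_harm(links, seeds, rounds=2, threshold=0.5):
    """Toy label propagation over a site link graph: a site inherits a
    damped harm score from the sites it links to (hypothetical scoring;
    the paper's abstract does not specify its model)."""
    score = {site: 1.0 if site in seeds else 0.0 for site in links}
    for _ in range(rounds):
        score = {
            site: max(score[site],
                      0.8 * sum(score.get(t, 0.0) for t in targets)
                      / max(len(targets), 1))
            for site, targets in links.items()
        }
    return {site for site, s in score.items() if s >= threshold}

# Hypothetical link graph: site -> list of sites it links to.
links = {
    "gamble.example": [],
    "mirror.example": ["gamble.example"],
    "news.example": ["weather.example"],
    "weather.example": [],
}
flagged = propagate_harm(links, seeds={"gamble.example"})
print(sorted(flagged))  # ['gamble.example', 'mirror.example']
```

The appeal of a relation-based signal is visible even in this toy: the mirror site is flagged from its links alone, with no content inspection.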
Citations: 2
Big Data, Big Challenges
Pub Date : 2014-06-16 DOI: 10.1109/ICSC.2014.65
Wei Wang
Summary form only given. Big data analytics is the process of examining large amounts of data of various types (big data) to uncover hidden patterns, unknown correlations, and other useful information. Its revolutionary potential is now universally recognized. Data complexity, heterogeneity, scale, and timeliness make data analysis a clear bottleneck in many biomedical applications, owing to the complexity of the patterns and the lack of scalability of the underlying algorithms. Advanced machine learning and data mining algorithms are being developed to address one or more of these challenges. Typically, the complexity of potential patterns may grow exponentially with data complexity, and so may the size of the pattern space. To avoid an exhaustive search through the pattern space, machine learning and data mining algorithms usually employ a greedy approach to search for a local optimum in the solution space, or use a branch-and-bound approach to seek optimal solutions; consequently, they are often implemented as iterative or recursive procedures. To improve efficiency, these algorithms often exploit the dependencies between potential patterns to maximize in-memory computation and/or leverage special hardware (such as GPUs and FPGAs) for acceleration. This leads to strong data, operation, and hardware dependencies, and sometimes to ad hoc solutions that cannot be generalized to a broader scope. In this talk, I will present some open challenges faced by data scientists in biomedical fields and the current approaches taken to tackle them.
Citations: 30
A Scalable Approach to Learn Semantic Models of Structured Sources
Pub Date : 2014-06-16 DOI: 10.1109/ICSC.2014.13
M. Taheriyan, Craig A. Knoblock, Pedro A. Szekely, J. Ambite
Semantic models of data sources describe the meaning of the data in terms of the concepts and relationships defined by a domain ontology. Building such models is an important step toward integrating data from different sources, where we need to provide the user with a unified view of underlying sources. In this paper, we present a scalable approach to automatically learn semantic models of a structured data source by exploiting the knowledge of previously modeled sources. Our evaluation shows that the approach generates expressive semantic models with minimal user input, and it is scalable to large ontologies and data sources with many attributes.
Citations: 24
Automatic Unsupervised Polarity Detection on a Twitter Data Stream
Pub Date : 2014-06-16 DOI: 10.1109/ICSC.2014.17
D. Terrana, A. Augello, G. Pilato
In this paper we propose a simple, fully automatic methodology for analyzing the sentiment of users on Twitter. First, we build a Twitter corpus by grouping tweets into positive and negative polarity through a completely automatic procedure that uses only the emoticons appearing in the tweets. We then build a simple sentiment classifier in which an actual stream of tweets from Twitter is processed and its content classified as positive, negative, or neutral. The classification is made without any pre-defined polarity lexicon; the lexicon is inferred automatically from the tweet stream. Experimental results show that our method reduces human intervention and, consequently, the cost of the whole classification process. We observe that our simple system captures polarity distinctions that match reasonably well the classifications made by human judges.
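The two stages described above (emoticon-based corpus labeling, then lexicon induction and classification) can be sketched as follows. The emoticon sets and the additive scoring rule are illustrative assumptions, not the paper's exact lists or model.

```python
from collections import Counter

# Illustrative emoticon sets (assumptions, not the paper's lists).
POS_EMOTICONS = {":)", ":-)", ":D"}
NEG_EMOTICONS = {":(", ":-("}

def auto_label(tweet):
    """Label a tweet by emoticons alone, as in the corpus-building step."""
    toks = set(tweet.split())
    if toks & POS_EMOTICONS and not toks & NEG_EMOTICONS:
        return "positive"
    if toks & NEG_EMOTICONS and not toks & POS_EMOTICONS:
        return "negative"
    return None  # ambiguous or no emoticon: excluded from the corpus

def train(tweets):
    """Induce a polarity lexicon automatically from labeled tweets."""
    lexicon = Counter()
    for tweet in tweets:
        label = auto_label(tweet)
        if label is None:
            continue
        for word in tweet.split():
            if word not in POS_EMOTICONS | NEG_EMOTICONS:
                lexicon[word] += 1 if label == "positive" else -1
    return lexicon

def classify(tweet, lexicon):
    """Score a new tweet by summing lexicon weights of its words."""
    score = sum(lexicon.get(w, 0) for w in tweet.split())
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

corpus = ["great game :)", "awful service :(", "great day :)"]
lexicon = train(corpus)
print(classify("what a great morning", lexicon))  # positive
```

Note that no hand-built polarity lexicon appears anywhere: the only supervision is the emoticon convention itself, which is the point of the method.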
Citations: 30