The ability to reason over large-scale data and return responsive query results is widely seen as a critical step toward achieving the Semantic Web vision. We describe an approach for partitioning OWL Lite datasets and then propose a strategy for parallel reasoning about concept instances and role instances on each partition. The partitions are designed so that each can be reasoned over independently to answer a query subgoal; when the results are unioned together, a complete set of answers is obtained for that subgoal. Our partitioning approach has polynomial worst-case time complexity in the size of the knowledge base. In our current implementation, we partition Semantic Web datasets and execute reasoning tasks on the partitioned data in parallel on independent machines. We implement a master-slave architecture that distributes a given query to slave processes on different machines. All slaves run in parallel, each performing sound and complete reasoning to execute each subgoal of its query on its own set of partitions. As a final step, the master joins the results computed by the slaves. We study the impact of our parallel reasoning approach on query performance and show promising results on LUBM data.
{"title":"Partitioning OWL Knowledge Bases for Parallel Reasoning","authors":"S. Priya, Yuanbo Guo, Michael F. Spear, J. Heflin","doi":"10.1109/ICSC.2014.34","DOIUrl":"https://doi.org/10.1109/ICSC.2014.34","url":null,"abstract":"The ability to reason over large scale data and return responsive query results is widely seen as a critical step to achieving the Semantic Web vision. We describe an approach for partitioning OWL Lite datasets and then propose a strategy for parallel reasoning about concept instances and role instances on each partition. The partitions are designed such that each can be reasoned on independently to find answers to each query sub goal, and when the results are unioned together, a complete set of results are found for that sub goal. Our partitioning approach has a polynomial worst case time complexity in the size of the knowledge base. In our current implementation, we partition semantic web datasets and execute reasoning tasks on partitioned data in parallel on independent machines. We implement a master-slave architecture that distributes a given query to the slave processes on different machines. All slaves run in parallel, each performing sound and complete reasoning to execute each sub goal of its query on its own set of partitions. As a final step, master joins the results computed by the slaves. We study the impact of our parallel reasoning approach on query performance and show some promising results on LUBM data.","PeriodicalId":175352,"journal":{"name":"2014 IEEE International Conference on Semantic Computing","volume":"2016 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127495723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Twitter messages present challenges for feature identification in classification tasks. This paper explores filtering techniques for improved trend detection and information extraction. Starting with a pre-filtered source (Twitter), we examine both information-theoretic and Natural Language Processing (NLP) based techniques as preprocessing for classification. Results demonstrate that both approaches improve classification of highly idiosyncratic data (Twitter).
{"title":"Feature Selection for Twitter Classification","authors":"D. Ostrowski","doi":"10.1109/ICSC.2014.50","DOIUrl":"https://doi.org/10.1109/ICSC.2014.50","url":null,"abstract":"Twitter-based messages have presented challenges in the identification of features as applied to classification. This paper explores filtering techniques for improved trend detection and information extraction. Starting with a pre-filtered source (Twitter), we will examine the application of both information theory and Natural Language Processing (NLP) based techniques as a means of preprocessing for classification. Results demonstrate that both means allow for improved results in classification among highly idiosyncratic data (Twitter).","PeriodicalId":175352,"journal":{"name":"2014 IEEE International Conference on Semantic Computing","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125391775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The integration of the classical Web (of documents) with the emerging Web of Data is a challenging vision. In this paper we focus on an integration approach at search time that enriches the responses of non-semantic search systems (e.g. professional search systems, web search engines) with semantic information, i.e. Linked Open Data (LOD), and exploits the outcome to provide an overview of the search space and allow users not only to restrict it but also to explore the related LOD. We use named entities (e.g. persons, locations, etc.) as the "glue" for automatically connecting search hits with LOD. We consider a scenario where this entity-based integration is performed at query time with no human effort and no a-priori indexing, which is beneficial in terms of configurability and freshness. Realizing this scenario raises several challenges. One thorny issue is that the number of identified entities can be high, as can the amount of semantic information about these entities that can be fetched from the available LOD (i.e. their properties and associations with other entities). To this end, we propose a link-analysis method used for (a) ranking (and thus selecting to show) the more important semantic information related to the search results, and (b) deriving and showing top-K semantic graphs. We then report the results of a user survey in the marine domain, together with comparative results that illustrate the effectiveness of the proposed (PageRank-based) ranking scheme. Finally, we report experimental results on efficiency, showing that the proposed functionality can be offered even at query time.
{"title":"Post-analysis of Keyword-Based Search Results Using Entity Mining, Linked Data, and Link Analysis at Query Time","authors":"P. Fafalios, Yannis Tzitzikas","doi":"10.1109/ICSC.2014.11","DOIUrl":"https://doi.org/10.1109/ICSC.2014.11","url":null,"abstract":"The integration of the classical Web (of documents) with the emerging Web of Data is a challenging vision. In this paper we focus on an integration approach during searching which aims at enriching the responses of non-semantic search systems (e.g. professional search systems, web search engines) with semantic information, i.e. Linked Open Data (LOD), and exploiting the outcome for providing an overview of the search space and allowing the users (apart from restricting it) to explore the related LOD. We use named entities (e.g. persons, locations, etc.) as the \"glue\" for automatically connecting search hits with LOD. We consider a scenario where this entity-based integration is performed at query time with no human effort, and no a-priori indexing, which is beneficial in terms of configurability and freshness. To realize this scenario one has to tackle various challenges. One spiny issue is that the number of identified entities can be high, the same is true for the semantic information about these entities that can be fetched from the available LOD (i.e. their properties and associations with other entities). To this end, in this paper we propose a Link Analysis-based method which is used for (a) ranking (and thus selecting to show) the more important semantic information related to the search results, (b) deriving and showing top-K semantic graphs. In the sequel, we report the results of a survey regarding the marine domain with promising results, and comparative results that illustrate the effectiveness of the proposed (Page Rank-based) ranking scheme. Finally, we report experimental results regarding efficiency showing that the proposed functionality can be offered even at query time.","PeriodicalId":175352,"journal":{"name":"2014 IEEE International Conference on Semantic Computing","volume":"753 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122978493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We propose a statistical semantic analysis method for Chinese terms. We use words, part-of-speech (POS) tags, word distances, word contexts, and the first sememe of a word in HowNet as features to train a Support Vector Machine (SVM) model for analyzing term semantics. The model identifies dependencies embedded inside a term. A Conditional Random Field (CRF) model is then used to incorporate the dependencies; experimental results show the effectiveness and validity of our approach.
{"title":"A Statistical Approach to Semantic Analysis for Chinese Terms","authors":"Dongfeng Cai, Na Ye, Guiping Zhang, Yan Song","doi":"10.1109/ICSC.2014.47","DOIUrl":"https://doi.org/10.1109/ICSC.2014.47","url":null,"abstract":"We propose a statistical semantic analysis method for Chinese terms. We use words, part-of-speech (POS) tags, word distances, word contexts and the first sememe of a word in HowNet as features to train a Support Vector Machine (SVM) model for analyzing term semantics. The model is used to identify dependencies embedded inside a term. A Conditional Random Field (CRF) model is used afterwards to incorporate the dependencies and experimental results showed the effectiveness and validity of our approach.","PeriodicalId":175352,"journal":{"name":"2014 IEEE International Conference on Semantic Computing","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126903012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we propose an image classification method that recognizes several poses in idol photographs. The proposed method takes unannotated idol photos as input and classifies them by pose based on the spatial layout of the idol in each photo. Our method has two phases: the first estimates the spatial layout of ten body parts (head, torso, upper and lower arms, and legs) using Eichner's Stickman Pose Estimation; the second classifies the idols' poses using Bayesian Network classifiers. To improve classification accuracy, we introduce the Pose Guide Ontology (PGO), which contains useful background knowledge such as semantic hierarchies and constraints on the positional relationships between body parts. The location information of body parts is amended by PGO. We also propose iterative procedures for further refining PGO. Finally, we evaluated our method on a dataset of 400 images in 8 poses; the F-measure of the classification was 15% higher than the non-amended results.
{"title":"Refinement of Ontology-Constrained Human Pose Classification","authors":"Kazuhiro Tashiro, Takahiro Kawamura, Y. Sei, Hiroyuki Nakagawa, Yasuyuki Tahara, Akihiko Ohsuga","doi":"10.1109/ICSC.2014.20","DOIUrl":"https://doi.org/10.1109/ICSC.2014.20","url":null,"abstract":"In this paper, we propose an image classification method that recognizes several poses of idol photographs. The proposed method takes unannotated idol photos as input, and classifies them according to their poses based on spatial layouts of the idol in the photos. Our method has two phases, the first one is to estimate the spatial layout of ten body parts (head, torso, upper and lower arms and legs) using Eichner's Stickman Pose Estimation. The second one is to classify the poses of the idols using Bayesian Network classifiers. In order to improve accuracy of the classification, we introduce Pose Guide Ontology (PGO). PGO contains useful background knowledge, such as semantic hierarchies and constraints related to the positional relationship between the body parts. The location information of body parts is amended by PGO. We also propose iterative procedures for making further refinements of PGO. Finally, we evaluated our method on a dataset consisting of 400 images in 8 poses, and the final results indicated that F-measure of the classification has become 15% higher than non-amended results.","PeriodicalId":175352,"journal":{"name":"2014 IEEE International Conference on Semantic Computing","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114816816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The MapReduce paradigm has become ubiquitous within Big Data analytics. Within this field, social networks are an important application area, as their study relies on large-scale graph analysis. To enable scalable social network analysis, we consider the application of MapReduce design patterns to the computation of graph-based metrics. Specifically, we detail a MapReduce-based solution for betweenness centrality. The prevailing concept is the separation of the graph topology from the actual graph analysis. Here, we consider the chaining of MapReduce jobs for the estimation of shortest paths in a graph as well as post-processing statistics. Through our design pattern, we are able to leverage Big Data technologies to determine metrics over ever-expanding internet-based data resources.
{"title":"MapReduce Design Patterns for Social Networking Analysis","authors":"D. Ostrowski","doi":"10.1109/ICSC.2014.61","DOIUrl":"https://doi.org/10.1109/ICSC.2014.61","url":null,"abstract":"The MapReduce paradigm has become ubiquitous within Big Data Analytics. Within this field, Social Networks exist as an important area of applications as it relies on the large scale analysis of graphs. To enable the scalability of Social Networks, we consider the application of MapReduce design patterns for the determination of graph-based metrics. Specifically, we detail the application of a MapReduce-based solution for the metric of betweenness-centrality. The prevailing concept is separation of the graph topology from the actual graph analysis. Here, we consider the chaining of MapReduce jobs for the estimation of shortest paths in a graph as well as post processing statistics. Through our design pattern, we are able to leverage Big Data Technologies to determine metrics in the context of ever expanding internet-based data resources.","PeriodicalId":175352,"journal":{"name":"2014 IEEE International Conference on Semantic Computing","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123879939","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The paper proposes a system architecture for artificial creativity that enables a robot to paint portraits. The proposed cognitive architecture is inspired by the PSI model and requires that the robot's motivation in executing its tasks be influenced by urges. These parameters depend on both internal and external evaluation mechanisms. The system is a premise for the development of an artificial artist able to develop a personality and behavior that depend on its experience of successes and failures (competence) and on the availability of different painting techniques (certainty). Creative execution is driven by the motivation arising from the urges and by the perception of the work being executed. The external evaluation is obtained by analyzing opinions expressed in natural language by people viewing the finished portrait.
{"title":"Robotic Creativity Driven by Motivation and Semantic Analysis","authors":"A. Augello, Ignazio Infantino, G. Pilato, R. Rizzo, Filippo Vella","doi":"10.1109/ICSC.2014.58","DOIUrl":"https://doi.org/10.1109/ICSC.2014.58","url":null,"abstract":"The paper proposes a system architecture for artificial creativity that enables a robot to perform portraits. The proposed cognitive architecture is inspired by the PSI model, and it requires that the motivation of the robot in the execution of its tasks is influenced by urges. Such parameters depend on both internal and external evaluation mechanisms. The system is a premise for the development of an artificial artist able to develop a personality and a behavior that depends on its experience of successes and failures (competence), and the availability of different painting techniques (certainty). The creative execution is driven by the motivation arising from the urges, and the perception of the work being executed or performed. The external evaluation is obtained by analyzing the opinions expressed in natural language from people watching the realized portrait.","PeriodicalId":175352,"journal":{"name":"2014 IEEE International Conference on Semantic Computing","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126340905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Vector Symbolic Architectures (VSAs) are approaches to representing symbols, and structured combinations of symbols, as high-dimensional vectors. They have applications in machine learning and in understanding information processing in neurobiology. VSAs are typically described in an abstract mathematical form in terms of vectors and operations on vectors. In this work, we show that a machine learning approach known as hierarchical temporal memory, which is based on the anatomy and function of the mammalian neocortex, is inherently capable of supporting important VSA functionality. This follows because the approach learns sequences of semantics-preserving sparse distributed representations.
{"title":"A Neurobiologically Plausible Vector Symbolic Architecture","authors":"Daniel E. Padilla, M. McDonnell","doi":"10.1109/ICSC.2014.40","DOIUrl":"https://doi.org/10.1109/ICSC.2014.40","url":null,"abstract":"Vector Symbolic Architectures (VSA) are approaches to representing symbols and structured combinations of symbols as high-dimensional vectors. They have applications in machine learning and for understanding information processing in neurobiology. VSAs are typically described in an abstract mathematical form in terms of vectors and operations on vectors. In this work, we show that a machine learning approach known as hierarchical temporal memory, which is based on the anatomy and function of mammalian neocortex, is inherently capable of supporting important VSA functionality. This follows because the approach learns sequences of semantics-preserving sparse distributed representations.","PeriodicalId":175352,"journal":{"name":"2014 IEEE International Conference on Semantic Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123816045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Web has made possible many advanced text-mining applications, such as news summarization, essay grading, question answering, and semantic search. For many such applications, statistical text-mining techniques are ineffective because they do not utilize the morphological structure of the text. Thus, many approaches use NLP-based techniques that parse the text and use patterns to mine and analyze the resulting parse trees, which are often unnecessarily complex. We therefore propose a weighted-graph representation of text, called Text Graphs, which captures the grammatical and semantic relations between words and terms in the text. Text Graphs are generated using a new text-mining framework, which is the main focus of this paper. Our framework, SemScape, uses a statistical parser to generate a few of the most probable parse trees for each sentence and employs a novel two-step pattern-based technique to extract candidate terms and their grammatical relations from the parse trees. Moreover, SemScape resolves coreferences with a novel technique, generates domain-specific Text Graphs by consulting ontologies, and provides a SPARQL-like query language and an optimized engine for semantically querying and mining Text Graphs.
{"title":"Mining Semantic Structures from Syntactic Structures in Free Text Documents","authors":"Hamid Mousavi, Deirdre Kerr, Markus R Iseli, C. Zaniolo","doi":"10.1109/ICSC.2014.31","DOIUrl":"https://doi.org/10.1109/ICSC.2014.31","url":null,"abstract":"The Web has made possible many advanced text-mining applications, such as news summarization, essay grading, question answering, and semantic search. For many of such applications, statistical text-mining techniques are ineffective since they do not utilize the morphological structure of the text. Thus, many approaches use NLP-based techniques, that parse the text and use patterns to mine and analyze the parse trees which are often unnecessarily complex. Therefore, we propose a weighted-graph representation of text, called Text Graphs, which captures the grammatical and semantic relations between words and terms in the text. Text Graphs are generated using a new text mining framework which is the main focus of this paper. Our framework, SemScape, uses a statistical parser to generate few of the most probable parse trees for each sentence and employs a novel two-step pattern-based technique to extract from parse trees candidate terms and their grammatical relations. Moreover, SemScape resolves co references by a novel technique, generates domain-specific Text Graphs by consulting ontologies, and provides a SPARQL-like query language and an optimized engine for semantically querying and mining Text Graphs.","PeriodicalId":175352,"journal":{"name":"2014 IEEE International Conference on Semantic Computing","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127908836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Spatial SQL (Structured Query Language) is a powerful tool for systematically solving geographic problems; however, it has not been widely applied to geographic question answering. This paper introduces a parameterized approach to translating natural language geographic questions into spatial SQL queries. In particular, three types of complexity are identified, and initial solutions are proposed to deal with them. The entire parameterization process is implemented to generate spatial SQL templates for five types of geographic questions. The results suggest that our approach is useful for solving natural language geographic problems using spatial functions such as those in a GIS.
{"title":"Parameterized Spatial SQL Translation for Geographic Question Answering","authors":"Wei Chen","doi":"10.1109/ICSC.2014.44","DOIUrl":"https://doi.org/10.1109/ICSC.2014.44","url":null,"abstract":"Spatial SQL (structured query language) is a powerful tool for systematically solving geographic problems, however, it has not been widely applied to the problem of geographic question answering. This paper introduces a parameterized approach to translate natural language geographic questions into spatial SQLs. In particular, three types of complexity are introduced and initial solutions are proposed to deal with these complexities. The entire parameterization process is implemented to generate spatial SQL templates for five types of geographic questions. It is suggested that our approach is useful for solving natural geographic problems using spatial functions such as those in a GIS.","PeriodicalId":175352,"journal":{"name":"2014 IEEE International Conference on Semantic Computing","volume":"89 1-3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116916946","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}