Hamid Mousavi, Deirdre Kerr, Markus R Iseli, C. Zaniolo
Ontologies are a vital component of most knowledge-based applications, including semantic web search, intelligent information integration, and natural language processing. In particular, we need effective tools for generating in-depth ontologies that achieve comprehensive coverage of specific application domains of interest, while minimizing the time and cost of this process. Therefore, we cannot rely on the manual or highly supervised approaches often used in the past, since they do not scale well. We instead propose a new approach that automatically generates domain-specific ontologies from a small corpus of documents using deep NLP-based text mining. Starting from an initial small seed of domain concepts, our OntoHarvester system iteratively extracts ontological relations connecting existing concepts to other terms in the text, and adds strongly connected terms to the current ontology. As a result, OntoHarvester (i) remains focused on the application domain, (ii) is resistant to noise, and (iii) generates very comprehensive ontologies from modest-size document corpora. In fact, starting from a small seed, OntoHarvester produces ontologies that outperform both manually generated ontologies and ontologies generated by current techniques, even those that require very large, well-focused data sets.
{"title":"Harvesting Domain Specific Ontologies from Text","authors":"Hamid Mousavi, Deirdre Kerr, Markus R Iseli, C. Zaniolo","doi":"10.1109/ICSC.2014.12","DOIUrl":"https://doi.org/10.1109/ICSC.2014.12","url":null,"abstract":"Ontologies are a vital component of most knowledge-based applications, including semantic web search, intelligent information integration, and natural language processing. In particular, we need effective tools for generating in-depth ontologies that achieve comprehensive converge of specific application domains of interest, while minimizing the time and cost of this process. Therefore we cannot rely on the manual or highly supervised approaches often used in the past, since they do not scale well. We instead propose a new approach that automatically generates domain-specific ontologies from a small corpus of documents using deep NLP-based text-mining. Starting from an initial small seed of domain concepts, our Onto Harvester system iteratively extracts ontological relations connecting existing concepts to other terms in the text, and adds strongly connected terms to the current ontology. As a result, Onto Harvester (i) remains focused on the application domain, (ii) is resistant to noise, and (iii) generates very comprehensive ontologies from modest-size document corpora. In fact, starting from a small seed, Onto Harvester produces ontologies that outperform both manually generated ontologies and ontologies generated by current techniques, even those that require very large well-focused data sets.","PeriodicalId":175352,"journal":{"name":"2014 IEEE International Conference on Semantic Computing","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116915433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Summary form only given. Big data analytics is the process of examining large amounts of data of a variety of types (big data) to uncover hidden patterns, unknown correlations, and other useful information. Its revolutionary potential is now universally recognized. Data complexity, heterogeneity, scale, and timeliness make data analysis a clear bottleneck in many biomedical applications, due to the complexity of the patterns and the lack of scalability of the underlying algorithms. Advanced machine learning and data mining algorithms are being developed to address one or more of these challenges. Typically, the complexity of potential patterns may grow exponentially with the data complexity, and so may the size of the pattern space. To avoid an exhaustive search through the pattern space, machine learning and data mining algorithms usually employ a greedy approach to search for a local optimum in the solution space, or use a branch-and-bound approach to seek optimal solutions, and consequently are often implemented as iterative or recursive procedures. To improve efficiency, these algorithms often exploit the dependencies between potential patterns to maximize in-memory computation and/or leverage special hardware (such as GPUs and FPGAs) for acceleration. This leads to strong data dependency, operation dependency, and hardware dependency, and sometimes to ad hoc solutions that cannot be generalized to a broader scope. In this talk, I will present some open challenges faced by data scientists in biomedical fields and the current approaches taken to tackle these challenges.
{"title":"Big Data, Big Challenges","authors":"Wei Wang","doi":"10.1109/ICSC.2014.65","DOIUrl":"https://doi.org/10.1109/ICSC.2014.65","url":null,"abstract":"Summary form only given. Big data analytics is the process of examining large amounts of data of a variety of types (big data) to uncover hidden patterns, unknown correlations and other useful information. Its revolutionary potential is now universally recognized. Data complexity, heterogeneity, scale, and timeliness make data analysis a clear bottleneck in many biomedical applications, due to the complexity of the patterns and lack of scalability of the underlying algorithms. Advanced machine learning and data mining algorithms are being developed to address one or more challenges listed above. It is typical that the complexity of potential patterns may grow exponentially with respect to the data complexity, and so is the size of the pattern space. To avoid an exhaustive search through the pattern space, machine learning and data mining algorithms usually employ a greedy approach to search for a local optimum in the solution space, or use a branch-and-bound approach to seek optimal solutions, and consequently, are often implemented as iterative or recursive procedures. To improve efficiency, these algorithms often exploit the dependencies between potential patterns to maximize in-memory computation and/or leverage special hardware (such as GPU and FPGA) for acceleration. These lead to strong data dependency, operation dependency, and hardware dependency, and sometimes ad hoc solutions that cannot be generalized to a broader scope. In this talk, I will present some open challenges faced by data scientist in biomedical fields and the current approaches taken to tackle these challenges.","PeriodicalId":175352,"journal":{"name":"2014 IEEE International Conference on Semantic Computing","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128110484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Taheriyan, Craig A. Knoblock, Pedro A. Szekely, J. Ambite
Semantic models of data sources describe the meaning of the data in terms of the concepts and relationships defined by a domain ontology. Building such models is an important step toward integrating data from different sources, where we need to provide the user with a unified view of the underlying sources. In this paper, we present a scalable approach to automatically learn the semantic model of a structured data source by exploiting the knowledge of previously modeled sources. Our evaluation shows that the approach generates expressive semantic models with minimal user input, and that it is scalable to large ontologies and data sources with many attributes.
{"title":"A Scalable Approach to Learn Semantic Models of Structured Sources","authors":"M. Taheriyan, Craig A. Knoblock, Pedro A. Szekely, J. Ambite","doi":"10.1109/ICSC.2014.13","DOIUrl":"https://doi.org/10.1109/ICSC.2014.13","url":null,"abstract":"Semantic models of data sources describe the meaning of the data in terms of the concepts and relationships defined by a domain ontology. Building such models is an important step toward integrating data from different sources, where we need to provide the user with a unified view of underlying sources. In this paper, we present a scalable approach to automatically learn semantic models of a structured data source by exploiting the knowledge of previously modeled sources. Our evaluation shows that the approach generates expressive semantic models with minimal user input, and it is scalable to large ontologies and data sources with many attributes.","PeriodicalId":175352,"journal":{"name":"2014 IEEE International Conference on Semantic Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129095643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The ability to identify, process, and comprehend the essential elements of information associated with a given operational environment can be used to reason about how the actors within the environment can best respond. This is often referred to as "situation assessment," the end state of which is "situation awareness," which can be simply defined as "knowing what is going on around you." Taken together, these are important fields of study concerned with perception of the environment critical to decision-makers in many complex, dynamic domains, including aviation, military command and control, and emergency management. The primary goal of our research is to identify some of the main technical challenges associated with automated situation assessment in general, and to propose an information processing methodology that meets those challenges, which we call Find-to-Forecast (F2F). The F2F framework supports access to heterogeneous information (structured and unstructured), which is normalized into a standard RDF representation. Next, the F2F framework identifies mission-relevant information elements, filtering out irrelevant (or low-priority) information and fusing the remaining relevant information. The next steps in the F2F process involve focusing operator attention on essential elements of mission information, and reasoning over the fused, relevant information to forecast potential courses of action based on the evolving situation, changing data, and uncertain knowledge. This paper provides an overview of the overall F2F methodology, to provide context, followed by a more detailed consideration of the "focus" algorithm, which uses contextual semantics to evaluate the value of new information relative to an operator's situational understanding during evolving events.
{"title":"Find-to-Forecast Process: An Automated Methodology for Situation Assessment","authors":"K. Bimson, Ahmad Slim, G. Heileman","doi":"10.1109/ICSC.2014.60","DOIUrl":"https://doi.org/10.1109/ICSC.2014.60","url":null,"abstract":"The ability to identify, process, and comprehend the essential elements of information associated with a given operational environment can be used to reason about how the actors within the environment can best respond. This is often referred to as \"situation assessment,\" the end state of which is \"situation awareness,\" which can be simply defined as \"knowing what is going on around you.\" Taken together, these are important fields of study concerned with perception of the environment critical to decision-makers in many complex, dynamic domains, including aviation, military command and control, and emergency management. The primary goal of our research is to identify some of the main technical challenges associated with automated situation assessment, in general, and to propose an information processing methodology that meets those challenges, which we call Find-to-Forecast (F2F). The F2F framework supports accessing heterogeneous information (structured and unstructured), which is normalized into a standard RDF representation. Next, the F2F framework identifies mission-relevant information elements, filtering out irrelevant (or low priority) information, fusing the remaining relevant information. The next steps in the F2F process involve focusing operator attention on essential elements of mission information, and reasoning over fused, relevant information to forecast potential courses of action based on the evolving situation, changing data, and uncertain knowledge. This paper provides an overview of the overall F2F methodology, to provide context, followed by a more detailed consideration of the \"focus\" algorithm, which uses contextual semantics to evaluate the value of new information relative to an operator's situational understanding during evolving events.","PeriodicalId":175352,"journal":{"name":"2014 IEEE International Conference on Semantic Computing","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125073606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cultural heritage resources are huge and heterogeneous. They include highly structured, very unstructured, and semi-structured data or information obtained from both authorized and unauthorized sources, and involve multimedia content including text, audio, and video data. With the rapid development of the web, more and more cultural heritage organizations use digital methods to record, store, and represent their arts and events. However, searching for information after it is stored is still considered a challenging task. The use of semantic web techniques is proposed here to make the data more structured, so that the items in the cultural heritage domain can be fully represented and made as easily accessible to the public as possible. This paper proposes a method to convert a traditional cultural heritage website into one that is well designed and content-rich. The method includes an ontology model that can automatically incorporate new classes and instances as input through its asserted and inferred models. It can also align the local ontology with external online ontologies. Through the proposed method, this paper also discusses several urgent issues concerning automatic conversion of data, semantic search, and user involvement.
{"title":"Using Aligned Ontology Model to Convert Cultural Heritage Resources into Semantic Web","authors":"Li Bing, Keith C. C. Chan, L. Carr","doi":"10.1109/ICSC.2014.39","DOIUrl":"https://doi.org/10.1109/ICSC.2014.39","url":null,"abstract":"Cultural heritage resources are huge and heterogeneous. They include highly structured, very unstructured, and semi-structured data or information obtained from both authorized and unauthorized sources and involving multimedia data including text, audio and video data. With the rapid development of the web, more and more cultural heritage organizations use digital methods to record, store and represent their arts and events. However, searching for information after they are stored is still considered a challenging task. The use of semantic web techniques is proposed here to make the data more structured so that the items in the cultural heritage domain can be fully represented and made easily assessable to the public as much as possible. This paper proposes a method to convert a traditional cultural heritage website into one that is well-designed and content-rich. The method includes an ontology model which could automatically adopt new class and instance as input by asserted and inferred models. It could also align local ontology and external online ontologies. Through the proposed method, this paper also discusses several urgent issues about automatic conversion of data, semantic search and user involvement.","PeriodicalId":175352,"journal":{"name":"2014 IEEE International Conference on Semantic Computing","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125464513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
D. Popolov, Joseph R. Barr
This paper discusses principles for the design of natural language processing (NLP) systems that automatically extract data from doctors' notes, laboratory results, and other medical documents in free-form text. We argue that rather than searching for 'atomic units of meaning' in the text and then trying to generalize them to a broader set of documents through an increasingly complicated system of rules, an NLP practitioner should take whole concepts as the meaningful units of text. This simplifies the rules and makes the NLP system easier to maintain and adapt. The departure point is purely practical; however, a deeper investigation of typical problems with the implementation of such systems leads us to a discussion of broader theoretical principles underlying NLP practice.
{"title":"\"Units of Meaning\" in Medical Documents: Natural Language Processing Perspective","authors":"D. Popolov, Joseph R. Barr","doi":"10.1142/S1793351X14400078","DOIUrl":"https://doi.org/10.1142/S1793351X14400078","url":null,"abstract":"This paper discusses principles for the design of natural language processing (NLP) systems to automatically extract of data from doctor's notes, laboratory results and other medical documents in free-form text. We argue that rather than searching for 'atom units of meaning' in the text and then trying to generalize them into a broader set of documents through increasingly complicated system of rules, an NLP practitioner should take concepts as a whole as a meaningful unit of text. This simplifies the rules and makes NLP system easier to maintain and adapt. The departure point is purely practical, however a deeper investigation of typical problems with the implementation of such systems leads us to a discussion of broader theoretical principles underlying the NLP practices.","PeriodicalId":175352,"journal":{"name":"2014 IEEE International Conference on Semantic Computing","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115677365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we present a dynamic context-dependent weighting method for the vector space model. The meaning of an item is decided relative to a context, and that context changes dynamically. Vector space models, including latent semantic indexing (LSI), measure the correlations between target items, each of which is represented as a vector. However, in most vector space methods the vectors of the target items are static, so it is important to weight each element of each vector according to the context. Moreover, it is increasingly necessary to understand a given subject not by reading a single document but by summarizing massive amounts of data; therefore, the vectors of the vector space model should be created from the data set that represents that subject. In other words, the vectors should be created dynamically, according to both the context and the data distribution. The key feature of our method is the dynamic calculation of each element of the vectors in a vector space model according to the context. Our method also reduces the vector dimensionality according to the context through context-dependent weighting, so correlations can be measured at low computational cost thanks to this dimension reduction.
{"title":"Semantic Context-Dependent Weighting for Vector Space Model","authors":"T. Nakanishi","doi":"10.1109/ICSC.2014.49","DOIUrl":"https://doi.org/10.1109/ICSC.2014.49","url":null,"abstract":"In this paper, we represent a dynamic context-dependent weighting method for vector space model. A meaning is relatively decided by a context dynamically. A vector space model, including latent semantic indexing (LSI), etc. relatively measures correlations of each target thing that represents in each vector. However, the vectors of each target thing in almost method of the vector space models are static. It is important to weight each element of each vector by a context. Recently, it is necessary to understand a certain thing by not reading one data but summarizing massive data. Therefore, the vectors in the vector space model create from data set corresponding to represent a certain thing. That is, we should create vectors for the vector space model dynamically corresponding to a context and data distribution. The features of our method are a dynamic calculation of each element of vectors in a vector space model corresponding to a context. Our method reduces a vector dimension corresponding to context by context-depending weighting. Therefore, We can measure correlation with low calculation cost corresponding to context because of dimension deduction.","PeriodicalId":175352,"journal":{"name":"2014 IEEE International Conference on Semantic Computing","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126854725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In many Semantic Web applications, having RDF predicates sorted by significance is of primary importance to improve usability and performance. In this paper we focus on predicates available on DBpedia, the most important Semantic Web data source, counting 470 million English triples. Although there is plenty of work in the literature dealing with ranking entities or RDF query results, none of it seems to specifically address the problem of computing predicate rank. We address the problem by associating with each DBpedia property (also known as a predicate or attribute of RDF triples) a number of original features specifically designed to provide sort-by-importance quantitative measures, automatically computable from an online SPARQL endpoint or an RDF dataset. By computing those features for a number of entity properties, we created a learning set and tested the performance of a number of well-known learning-to-rank algorithms. Our first experimental results show that the approach is effective and fast.
{"title":"Computing On-the-Fly DBpedia Property Ranking","authors":"A. Dessì, M. Atzori","doi":"10.1109/ICSC.2014.55","DOIUrl":"https://doi.org/10.1109/ICSC.2014.55","url":null,"abstract":"In many Semantic Web applications, having RDF predicates sorted by significance is of primarily importance to improve usability and performance. In this paper we focus on predicates available on DBpedia, the most important Semantic Web source of data counting 470 million english triples. Although there is plenty of work in literature dealing with ranking entities or RDF query results, none of them seem to specifically address the problem of computing predicate rank. We address the problem by associating to each DBPedia property (also known as predicates or attributes of RDF triples) a number of original features specifically designed to provide sort-by-importance quantitative measures, automatically computable from an online SPARQL endpoint or a RDF dataset. By computing those features on a number of entity properties, we created a learning set and tested the performance of a number of well-known learning-to-rank algorithms. Our first experimental results show that the approach is effective and fast.","PeriodicalId":175352,"journal":{"name":"2014 IEEE International Conference on Semantic Computing","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128457770","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
To cope with the increasing number of cyber threats, cyber security information must be shared beyond organization borders. Assorted organizations have already started to provide publicly available repositories that store XML-based cyber security information on the Internet, but users are unaware of all of them. Cyber security information must be identified and located across such repositories by the parties who need it, and then transported to them to advance information sharing. This paper proposes a discovery mechanism that identifies and locates various types of cyber security information and exchanges the information over networks. The mechanism generates RDF-based metadata to manage the list of cyber security information, and the metadata structure is based on an ontology of cyber security information, which absorbs the differences among the assorted schemata of the information and incorporates them. The mechanism is also capable of propagating any information updates, so that entities with obsolete information do not suffer from emerging security threats. This paper also introduces a prototype of the mechanism to demonstrate its feasibility. It then analyzes the mechanism's extensibility, scalability, and information credibility. Through this work, we wish to expedite information sharing beyond organization borders and contribute to global cyber security.
{"title":"Mechanism for Linking and Discovering Structured Cybersecurity Information over Networks","authors":"Takeshi Takahashi, Y. Kadobayashi","doi":"10.1109/ICSC.2014.66","DOIUrl":"https://doi.org/10.1109/ICSC.2014.66","url":null,"abstract":"To cope with the increasing amount of cyber threats, cyber security information must be shared beyond organization borders. Assorted organizations have already started to provide publicly-available repositories that store XML-based cyber security information on the Internet, but users are unaware of all of them. Cyber security information must be identified and located across such repositories by the parties who need that, and then should be transported to them to advance information sharing. This paper proposes a discovery mechanism, which identifies and locates various types of cyber security information and exchanges the information over networks. The mechanism generates RDF-based metadata to manage the list of cyber security information, and the metadata structure is based on an ontology of cyber security information, which absorbs the differences of the assorted schemata of the information and incorporates them. The mechanism is also capable of propagating any information updates such that entities with obsolete information do not suffer from emerging security threats. This paper also introduces a prototype of the mechanism to demonstrate its feasibility. It then analyzes the mechanism's extensibility, scalability, and information credibility. Through this work, we wish to expedite information sharing beyond organization borders and contribute to global cyber security.","PeriodicalId":175352,"journal":{"name":"2014 IEEE International Conference on Semantic Computing","volume":"275 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133232846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In recent years, we have witnessed a deluge of multimedia data such as texts, images, and videos. However, research on managing and retrieving these data efficiently is still in the development stage. Conventional tag-based searching approaches suffer from noisy or incomplete tags. As a result, content-based multimedia data management frameworks have become increasingly popular. In this research direction, multimedia high-level semantic concept mining and retrieval is one of the fastest-developing research topics, requiring joint efforts from researchers in both the data mining and multimedia domains. One great challenge in this problem is to bridge the semantic gap, i.e., the gap between high-level concepts and low-level features. Recently, positive inter-concept correlations have been utilized to capture the context of a concept and bridge the gap. However, negative correlations have rarely been studied because of the difficulty of mining and utilizing them. In this paper, a concept mining and retrieval framework utilizing negative inter-concept correlations is proposed. Several research problems, such as negative correlation selection, weight estimation, and score integration, are addressed. Experimental results on the TRECVID 2010 benchmark data set demonstrate that the proposed framework gives promising performance.
{"title":"Enhancing Multimedia Semantic Concept Mining and Retrieval by Incorporating Negative Correlations","authors":"Tao Meng, Yang Liu, M. Shyu, Yilin Yan, C. Shu","doi":"10.1109/ICSC.2014.30","DOIUrl":"https://doi.org/10.1109/ICSC.2014.30","url":null,"abstract":"In recent years, we have witnessed a deluge of multimedia data such as texts, images, and videos. However, the research of managing and retrieving these data efficiently is still in the development stage. The conventional tag-based searching approaches suffer from noisy or incomplete tag issues. As a result, the content-based multimedia data management framework has become increasingly popular. In this research direction, multimedia high-level semantic concept mining and retrieval is one of the fastest developing research topics requesting joint efforts from researchers in both data mining and multimedia domains. To solve this problem, one great challenge is to bridge the semantic gap which is the gap between high-level concepts and low-level features. Recently, positive inter-concept correlations have been utilized to capture the context of a concept to bridge the gap. However, negative correlations have rarely been studied because of the difficulty to mine and utilize them. In this paper, a concept mining and retrieval framework utilizing negative inter-concept correlations is proposed. Several research problems such as negative correlation selection, weight estimation, and score integration are addressed. Experimental results on TRECVID 2010 benchmark data set demonstrate that the proposed framework gives promising performance.","PeriodicalId":175352,"journal":{"name":"2014 IEEE International Conference on Semantic Computing","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116634734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}