
Latest Publications: 2014 IEEE International Conference on Semantic Computing

Harvesting Domain Specific Ontologies from Text
Pub Date: 2014-06-16 DOI: 10.1109/ICSC.2014.12
Hamid Mousavi, Deirdre Kerr, Markus R Iseli, C. Zaniolo
Ontologies are a vital component of most knowledge-based applications, including semantic web search, intelligent information integration, and natural language processing. In particular, we need effective tools for generating in-depth ontologies that achieve comprehensive coverage of specific application domains of interest, while minimizing the time and cost of this process. Therefore, we cannot rely on the manual or highly supervised approaches often used in the past, since they do not scale well. We instead propose a new approach that automatically generates domain-specific ontologies from a small corpus of documents using deep NLP-based text mining. Starting from an initial small seed of domain concepts, our OntoHarvester system iteratively extracts ontological relations connecting existing concepts to other terms in the text, and adds strongly connected terms to the current ontology. As a result, OntoHarvester (i) remains focused on the application domain, (ii) is resistant to noise, and (iii) generates very comprehensive ontologies from modest-size document corpora. In fact, starting from a small seed, OntoHarvester produces ontologies that outperform both manually generated ontologies and ontologies generated by current techniques, even those that require very large, well-focused data sets.
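As a rough illustration of the iterative seed-expansion idea described in the abstract (not the authors' actual OntoHarvester implementation), the Python sketch below grows an ontology from a seed by repeatedly admitting terms strongly connected to existing concepts; the relation extractor, the naive co-occurrence "strength", and the threshold are all placeholders.

```python
# A minimal sketch of iterative seed expansion with a placeholder
# relation extractor; this is NOT the authors' OntoHarvester code.
from collections import defaultdict

def extract_relations(concept, corpus):
    """Hypothetical stand-in for deep-NLP relation extraction: returns
    (related_term, strength) pairs from sentences mentioning `concept`."""
    pairs = defaultdict(float)
    for sentence in corpus:
        words = sentence.lower().split()
        if concept in words:
            for w in words:
                if w != concept:
                    pairs[w] += 1.0  # naive co-occurrence as "strength"
    return pairs.items()

def harvest(seed_concepts, corpus, threshold=2.0, max_iters=5):
    ontology = set(seed_concepts)
    for _ in range(max_iters):
        added = False
        for concept in list(ontology):
            for term, strength in extract_relations(concept, corpus):
                # Only admit terms strongly connected to the current
                # ontology, which keeps the harvest domain-focused.
                if strength >= threshold and term not in ontology:
                    ontology.add(term)
                    added = True
        if not added:  # fixed point reached
            break
    return ontology

corpus = ["ontology learning uses text mining",
          "text mining extracts relations from text",
          "ontology learning extracts concepts from text"]
print(harvest({"ontology"}, corpus, threshold=2.0))
```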
Citations: 10
Big Data, Big Challenges
Pub Date: 2014-06-16 DOI: 10.1109/ICSC.2014.65
Wei Wang
Summary form only given. Big data analytics is the process of examining large amounts of data of a variety of types (big data) to uncover hidden patterns, unknown correlations, and other useful information. Its revolutionary potential is now universally recognized. Data complexity, heterogeneity, scale, and timeliness make data analysis a clear bottleneck in many biomedical applications, due to the complexity of the patterns and the lack of scalability of the underlying algorithms. Advanced machine learning and data mining algorithms are being developed to address one or more of these challenges. Typically, the complexity of potential patterns may grow exponentially with respect to the data complexity, and so may the size of the pattern space. To avoid an exhaustive search through the pattern space, machine learning and data mining algorithms usually employ a greedy approach to search for a local optimum in the solution space, or use a branch-and-bound approach to seek optimal solutions, and consequently are often implemented as iterative or recursive procedures. To improve efficiency, these algorithms often exploit the dependencies between potential patterns to maximize in-memory computation and/or leverage special hardware (such as GPUs and FPGAs) for acceleration. These lead to strong data dependency, operation dependency, and hardware dependency, and sometimes to ad hoc solutions that cannot be generalized to a broader scope. In this talk, I will present some open challenges faced by data scientists in biomedical fields and the current approaches taken to tackle these challenges.
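To make the greedy-search point concrete, here is a generic hill-climbing sketch (our own illustration, not an algorithm from the talk): it avoids exhaustively enumerating the pattern space by iteratively moving to the best-scoring neighbor, at the cost of possibly stopping in a local optimum.

```python
# Generic greedy hill-climbing over a solution space: an illustration of
# why such searches are iterative and may stop at a local optimum.
def greedy_search(start, neighbors, score):
    current = start
    while True:
        best = max(neighbors(current), key=score, default=None)
        if best is None or score(best) <= score(current):
            return current  # local optimum: no neighbor improves the score
        current = best

# Toy usage: maximize -(x - 3)^2 over the integers by stepping +/-1.
result = greedy_search(
    0,
    neighbors=lambda x: [x - 1, x + 1],
    score=lambda x: -(x - 3) ** 2,
)
print(result)  # 3
```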
Citations: 30
A Scalable Approach to Learn Semantic Models of Structured Sources
Pub Date: 2014-06-16 DOI: 10.1109/ICSC.2014.13
M. Taheriyan, Craig A. Knoblock, Pedro A. Szekely, J. Ambite
Semantic models of data sources describe the meaning of the data in terms of the concepts and relationships defined by a domain ontology. Building such models is an important step toward integrating data from different sources, where we need to provide the user with a unified view of underlying sources. In this paper, we present a scalable approach to automatically learn semantic models of a structured data source by exploiting the knowledge of previously modeled sources. Our evaluation shows that the approach generates expressive semantic models with minimal user input, and it is scalable to large ontologies and data sources with many attributes.
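As a loose illustration of reusing knowledge from previously modeled sources (a simplified stand-in, not the paper's algorithm), the sketch below labels a new source's attributes with the semantic types of previously labeled attributes whose values overlap most; the similarity measure, vocabulary, and data are invented for the example.

```python
# Sketch: assign semantic labels to a new source's attributes by value
# overlap with previously modeled sources. A toy stand-in only.
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Attributes from previously modeled sources, already mapped to
# ontology concepts (hypothetical data and concept names).
labeled = {
    "schema:addressLocality": ["los angeles", "new york", "chicago"],
    "schema:personName": ["alice smith", "bob jones"],
}

def label_attribute(values, labeled, min_sim=0.2):
    best, best_sim = None, min_sim
    for concept, known_values in labeled.items():
        sim = jaccard(values, known_values)
        if sim > best_sim:
            best, best_sim = concept, sim
    return best

print(label_attribute(["chicago", "boston", "new york"], labeled))
# -> schema:addressLocality
```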
Citations: 24
Find-to-Forecast Process: An Automated Methodology for Situation Assessment
Pub Date: 2014-06-16 DOI: 10.1109/ICSC.2014.60
K. Bimson, Ahmad Slim, G. Heileman
The ability to identify, process, and comprehend the essential elements of information associated with a given operational environment can be used to reason about how the actors within the environment can best respond. This is often referred to as "situation assessment," the end state of which is "situation awareness," which can be simply defined as "knowing what is going on around you." Taken together, these are important fields of study concerned with perception of the environment, critical to decision-makers in many complex, dynamic domains, including aviation, military command and control, and emergency management. The primary goal of our research is to identify some of the main technical challenges associated with automated situation assessment in general, and to propose an information processing methodology that meets those challenges, which we call Find-to-Forecast (F2F). The F2F framework supports access to heterogeneous information (structured and unstructured), which is normalized into a standard RDF representation. Next, the F2F framework identifies mission-relevant information elements, filtering out irrelevant (or low-priority) information and fusing the remaining relevant information. The next steps in the F2F process involve focusing operator attention on the essential elements of mission information, and reasoning over the fused, relevant information to forecast potential courses of action based on the evolving situation, changing data, and uncertain knowledge. This paper provides an overview of the overall F2F methodology, to provide context, followed by a more detailed consideration of the "focus" algorithm, which uses contextual semantics to evaluate the value of new information relative to an operator's situational understanding during evolving events.
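The stages named above suggest a simple pipeline shape. The skeleton below is our own schematic reading of the F2F flow (normalize, find/filter, fuse, focus, forecast), with every stage stubbed out; the paper's actual algorithms are not reproduced.

```python
# Schematic F2F-style pipeline: each stage is a stub standing in for the
# paper's actual processing; only the data flow is illustrated.
def normalize_to_rdf(raw_items):       # heterogeneous -> RDF-like records
    return [{"triple": item, "relevance": None} for item in raw_items]

def find_relevant(records, mission_terms):
    for r in records:
        r["relevance"] = sum(t in r["triple"] for t in mission_terms)
    return [r for r in records if r["relevance"] > 0]   # filter

def fuse(records):                     # merge duplicates, keep max relevance
    merged = {}
    for r in records:
        merged[r["triple"]] = max(merged.get(r["triple"], 0), r["relevance"])
    return merged

def focus(fused, top_k=3):             # direct attention to the essentials
    return sorted(fused.items(), key=lambda kv: -kv[1])[:top_k]

def forecast(focused):                 # placeholder for forecasting/reasoning
    return [f"monitor: {triple}" for triple, _ in focused]

raw = ["convoy spotted near bridge", "weather clear", "bridge closed"]
print(forecast(focus(fuse(find_relevant(normalize_to_rdf(raw),
                                        mission_terms=["bridge"])))))
```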
Citations: 0
Using Aligned Ontology Model to Convert Cultural Heritage Resources into Semantic Web
Pub Date: 2014-06-16 DOI: 10.1109/ICSC.2014.39
Li Bing, Keith C. C. Chan, L. Carr
Cultural heritage resources are huge and heterogeneous. They include highly structured, very unstructured, and semi-structured data or information obtained from both authorized and unauthorized sources, and involve multimedia data including text, audio, and video. With the rapid development of the web, more and more cultural heritage organizations use digital methods to record, store, and present their arts and events. However, searching this information once it is stored is still a challenging task. We propose the use of semantic web techniques to make the data more structured, so that items in the cultural heritage domain can be fully represented and made as accessible to the public as possible. This paper proposes a method to convert a traditional cultural heritage website into one that is well designed and content rich. The method includes an ontology model that can automatically incorporate new classes and instances as input through its asserted and inferred models. It can also align the local ontology with external online ontologies. Based on the proposed method, the paper also discusses several pressing issues concerning automatic data conversion, semantic search, and user involvement.
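A minimal rdflib sketch of the two ideas in the abstract, with all URIs invented for the example: an inferred model derived from the asserted one (here, just propagating rdf:type along rdfs:subClassOf) and an owl:sameAs link aligning a local concept with an external online ontology.

```python
# Minimal sketch with rdflib: asserted triples, a tiny hand-rolled RDFS
# inference step, and an owl:sameAs alignment to an external ontology.
# All URIs are invented for the example.
from rdflib import Graph, Namespace, RDF, RDFS, OWL

LOCAL = Namespace("http://example.org/heritage/")
g = Graph()

# Asserted model: a new class and a new instance are added as triples.
g.add((LOCAL.ShadowPlay, RDFS.subClassOf, LOCAL.PerformingArt))
g.add((LOCAL.play42, RDF.type, LOCAL.ShadowPlay))

# Inferred model: propagate rdf:type along rdfs:subClassOf.
inferred = []
for s, _, cls in g.triples((None, RDF.type, None)):
    for _, _, parent in g.triples((cls, RDFS.subClassOf, None)):
        inferred.append((s, RDF.type, parent))
for t in inferred:
    g.add(t)  # play42 is now also typed as a PerformingArt

# Alignment: link the local class to an external online ontology.
g.add((LOCAL.PerformingArt, OWL.sameAs,
       Namespace("http://dbpedia.org/ontology/")["PerformingArt"]))

print(g.serialize(format="turtle"))
```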
Citations: 7
"Units of Meaning" in Medical Documents: Natural Language Processing Perspective 医学文献中的“意义单位”:自然语言处理视角
Pub Date: 2014-06-16 DOI: 10.1142/S1793351X14400078
D. Popolov, Joseph R. Barr
This paper discusses principles for the design of natural language processing (NLP) systems that automatically extract data from doctors' notes, laboratory results, and other medical documents in free-form text. We argue that rather than searching for 'atom units of meaning' in the text and then trying to generalize them to a broader set of documents through an increasingly complicated system of rules, an NLP practitioner should take concepts as a whole as the meaningful unit of text. This simplifies the rules and makes NLP systems easier to maintain and adapt. The departure point is purely practical; however, a deeper investigation of typical problems with the implementation of such systems leads us to a discussion of broader theoretical principles underlying NLP practice.
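The contrast the authors draw can be illustrated with a toy matcher (ours, not theirs): treat a multi-word concept as a single unit of meaning via longest-match lookup in a phrase lexicon, instead of matching word-level "atoms" and recombining them with rules. The lexicon entries are invented, not a real medical vocabulary.

```python
# Toy illustration of matching whole concepts rather than word "atoms":
# longest-match lookup against a phrase lexicon of invented entries.
CONCEPTS = {("blood", "pressure"), ("heart", "rate"),
            ("fasting", "blood", "glucose")}
MAX_LEN = max(len(c) for c in CONCEPTS)

def extract_concepts(text):
    tokens = text.lower().split()
    found, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + n]) in CONCEPTS:
                found.append(" ".join(tokens[i:i + n]))
                i += n           # consume the whole concept as one unit
                break
        else:
            i += 1               # no concept starts here; move on
    return found

print(extract_concepts("Fasting blood glucose and heart rate were normal"))
# -> ['fasting blood glucose', 'heart rate']
```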
Citations: 1
Semantic Context-Dependent Weighting for Vector Space Model
Pub Date: 2014-06-16 DOI: 10.1109/ICSC.2014.49
T. Nakanishi
In this paper, we present a dynamic context-dependent weighting method for the vector space model. Meaning is decided dynamically, relative to a context. A vector space model, including variants such as latent semantic indexing (LSI), measures the relative correlations of the target items represented by its vectors. However, in most vector space methods, the vector of each target item is static. It is important to weight each element of each vector according to a context. Moreover, understanding a subject increasingly requires summarizing massive data rather than reading a single item, so the vectors in the model must be created from the data set that represents the subject. That is, vectors for the vector space model should be created dynamically, corresponding to a context and to the data distribution. The key feature of our method is the dynamic calculation of each vector element according to the context. Our method also reduces the vector dimensionality through context-dependent weighting, and this dimension reduction lets us measure correlations for a given context at low computational cost.
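A small numpy sketch of the general idea, under our own simplifying assumptions rather than the paper's exact formulation: each static document vector is re-weighted element-wise by a context vector, dimensions with negligible context weight are dropped, and correlation is then measured on the smaller vectors.

```python
import numpy as np

# Sketch of context-dependent weighting: re-weight static term vectors by
# a context vector, then drop dimensions with negligible context weight.
# The numbers are invented; the paper's actual weighting is not reproduced.
def contextualize(doc_vectors, context, eps=0.05):
    weights = context / context.sum()          # normalized context weights
    keep = weights > eps                       # context-driven dimension cut
    return doc_vectors[:, keep] * weights[keep]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

docs = np.array([[3.0, 0.0, 2.0, 1.0],         # term frequencies per doc
                 [0.0, 4.0, 2.0, 0.0]])
context = np.array([0.6, 0.01, 0.3, 0.09])     # a "sports" context, say

ctx_docs = contextualize(docs, context)
print(ctx_docs.shape)                          # (2, 3): one dim dropped
print(cosine(ctx_docs[0], ctx_docs[1]))        # context-dependent correlation
```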
Citations: 7
Computing On-the-Fly DBpedia Property Ranking
Pub Date: 2014-06-16 DOI: 10.1109/ICSC.2014.55
A. Dessì, M. Atzori
In many Semantic Web applications, having RDF predicates sorted by significance is of primary importance for improving usability and performance. In this paper we focus on the predicates available on DBpedia, the most important Semantic Web data source, counting 470 million English triples. Although there is plenty of work in the literature on ranking entities or RDF query results, none of it seems to specifically address the problem of computing predicate rank. We address the problem by associating with each DBpedia property (also known as a predicate or attribute of RDF triples) a number of original features specifically designed to provide sort-by-importance quantitative measures, automatically computable from an online SPARQL endpoint or an RDF dataset. By computing those features on a number of entity properties, we created a learning set and tested the performance of a number of well-known learning-to-rank algorithms. Our first experimental results show that the approach is effective and fast.
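One of the simplest features one could compute on the fly for a predicate is its usage count at the public DBpedia SPARQL endpoint. The snippet below (our illustration with an arbitrary choice of predicates, not the paper's feature set) shows such a query with SPARQLWrapper.

```python
# Counting how often a predicate is used on DBpedia: one plausible
# on-the-fly ranking feature. Illustrative only; the paper's actual
# features and learning-to-rank step are not reproduced here.
from SPARQLWrapper import SPARQLWrapper, JSON

def predicate_usage_count(predicate_uri):
    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setQuery(f"""
        SELECT (COUNT(*) AS ?uses)
        WHERE {{ ?s <{predicate_uri}> ?o }}
    """)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    return int(results["results"]["bindings"][0]["uses"]["value"])

for p in ["http://dbpedia.org/ontology/birthPlace",
          "http://dbpedia.org/ontology/populationTotal"]:
    print(p, predicate_usage_count(p))
```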
Citations: 1
Mechanism for Linking and Discovering Structured Cybersecurity Information over Networks
Pub Date: 2014-06-16 DOI: 10.1109/ICSC.2014.66
Takeshi Takahashi, Y. Kadobayashi
To cope with the increasing number of cyber threats, cyber security information must be shared beyond organization borders. Assorted organizations have already started to provide publicly available repositories that store XML-based cyber security information on the Internet, but users are unaware of all of them. Cyber security information must be identified and located across such repositories by the parties who need it, and then transported to them to advance information sharing. This paper proposes a discovery mechanism that identifies and locates various types of cyber security information and exchanges the information over networks. The mechanism generates RDF-based metadata to manage the list of cyber security information; the metadata structure is based on an ontology of cyber security information, which absorbs the differences among the assorted schemata of the information and incorporates them. The mechanism is also capable of propagating information updates, so that entities with obsolete information do not suffer from emerging security threats. This paper also introduces a prototype of the mechanism to demonstrate its feasibility, and then analyzes the mechanism's extensibility, scalability, and information credibility. Through this work, we wish to expedite information sharing beyond organization borders and contribute to global cyber security.
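To picture what RDF-based metadata for a repository entry could look like, here is a hedged rdflib sketch with an invented vocabulary (the paper's actual ontology and schema names are not reproduced): one record describing a repository item so that other parties can discover and fetch it.

```python
# Sketch of an RDF metadata record describing one piece of cybersecurity
# information in a repository. The ex:* vocabulary and URIs are invented
# for the example; the paper's actual ontology is not reproduced.
from rdflib import Graph, Literal, Namespace, RDF, URIRef

EX = Namespace("http://example.org/csinfo#")
g = Graph()

entry = URIRef("http://repo.example.org/advisories/2014-001")
g.add((entry, RDF.type, EX.VulnerabilityAdvisory))
g.add((entry, EX.format, Literal("CVRF")))       # an XML-based format name
g.add((entry, EX.coversProduct, Literal("ExampleOS 3.1")))
g.add((entry, EX.location,
       Literal("http://repo.example.org/advisories/2014-001.xml")))

# A consumer can now discover entries by type without knowing the repository:
for adv in g.subjects(RDF.type, EX.VulnerabilityAdvisory):
    print("found advisory metadata:", adv)
```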
Citations: 5
Enhancing Multimedia Semantic Concept Mining and Retrieval by Incorporating Negative Correlations
Pub Date: 2014-06-16 DOI: 10.1109/ICSC.2014.30
Tao Meng, Yang Liu, M. Shyu, Yilin Yan, C. Shu
In recent years, we have witnessed a deluge of multimedia data such as text, images, and video. However, research on managing and retrieving these data efficiently is still at an early stage. Conventional tag-based searching approaches suffer from noisy or incomplete tags. As a result, content-based multimedia data management frameworks have become increasingly popular. In this research direction, multimedia high-level semantic concept mining and retrieval is one of the fastest-developing research topics, requiring joint efforts from researchers in both the data mining and multimedia domains. One great challenge here is to bridge the semantic gap, the gap between high-level concepts and low-level features. Recently, positive inter-concept correlations have been utilized to capture the context of a concept and bridge this gap. However, negative correlations have rarely been studied because they are difficult to mine and utilize. In this paper, a concept mining and retrieval framework utilizing negative inter-concept correlations is proposed. Several research problems, such as negative correlation selection, weight estimation, and score integration, are addressed. Experimental results on the TRECVID 2010 benchmark data set demonstrate that the proposed framework gives promising performance.
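A small numpy sketch of the score-integration idea, with invented weights and correlations rather than the paper's learned values: a concept's raw detection score is boosted by positively correlated concepts and penalized by negatively correlated ones.

```python
import numpy as np

# Sketch of integrating inter-concept correlations into detection scores.
# Correlations and weights are invented; the paper's estimation and
# selection procedures are not reproduced.
def integrate_scores(raw, corr, alpha=0.3, beta=0.3):
    pos = np.clip(corr, 0, None)        # positive inter-concept correlations
    neg = np.clip(-corr, 0, None)       # magnitudes of negative correlations
    np.fill_diagonal(pos, 0.0)
    np.fill_diagonal(neg, 0.0)
    # Boost by positively correlated concepts, penalize by negative ones.
    return raw + alpha * pos @ raw - beta * neg @ raw

raw = np.array([0.7, 0.2, 0.6])         # detector scores: car, indoor, road
corr = np.array([[1.0, -0.8, 0.6],      # car vs. indoor is a negative pair
                 [-0.8, 1.0, -0.5],
                 [0.6, -0.5, 1.0]])
print(integrate_scores(raw, corr))      # "indoor" is pushed down
```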
Citations: 13