Int. J. Semantic Comput.: Latest Articles
A Study on Information-Preserving Schema Transformations
Pub Date : 2020-06-09 DOI: 10.1142/s1793351x20400024
Nonyelum Ndefo, Enrico Franconi
The problem of determining the relative information capacity between two knowledge bases or schemas, of the same or different models, is inherent when implementing schema transformations. When rest...
Citations: 0
Graph Theory and Classifying Security Events in Grid Security Gateways
Pub Date : 2020-06-09 DOI: 10.1142/s1793351x2040005x
James Obert, A. Chavez
In recent years, the use of security gateways (SG) located within the electrical grid distribution network has become pervasive. SGs in substations and renewable distributed energy resource aggrega...
Citations: 1
Unsupervised Translation Quality Estimation for Digital Entertainment Content Subtitles
Pub Date : 2020-06-09 DOI: 10.1142/S1793351X20500026
Prabhakar Gupta, Mayank Sharma
We demonstrate the potential for using aligned bilingual word embeddings in developing an unsupervised method to evaluate machine translations without a need for parallel translation corpus or refe...
Citations: 0
Click-Through Rate Prediction of Online Banners Featuring Multimodal Analysis
Pub Date : 2020-06-09 DOI: 10.1142/s1793351x20400048
Bohui Xia, Hiroyuki Seshime, Xueting Wang, T. Yamasaki
As the online advertisement industry continues to grow, it is predicted that online advertisement will account for about 45% of global advertisement spending by 2020. Thus, predicting the click-th...
Citations: 5
Towards Programming in Natural Language: Learning New Functions from Spoken Utterances
Pub Date : 2020-06-01 DOI: 10.1142/S1793351X20400097
Sebastian Weigelt, Vanessa Steurer, Tobias Hey, W. Tichy
Systems with conversational interfaces are rather popular nowadays. However, their full potential is not yet exploited. For the time being, users are restricted to calling predefined functions. Soon, users will expect to customize systems to their needs and create their own functions using nothing but spoken instructions. Thus, future systems must understand how laypersons teach new functionality to intelligent systems. Understanding natural language teaching sequences is a first step toward comprehensive end-user programming in natural language. We propose to analyze the semantics of spoken teaching sequences with a hierarchical classification approach. First, we classify whether an utterance constitutes an effort to teach a new function or not. Afterward, a second classifier locates the distinct semantic parts of teaching efforts: declaration of a new function, specification of intermediate steps, and superfluous information. For both tasks we implement a broad range of machine learning techniques: classical approaches, such as Naïve Bayes, and neural network configurations of various types and architectures, such as bidirectional LSTMs. Additionally, we introduce two heuristic-based adaptations that are tailored to the task of understanding teaching sequences. As a data basis, we use 3168 descriptions gathered in a user study. For the first task, convolutional neural networks obtain the best results (accuracy: 96.6%); bidirectional LSTMs excel in the second (accuracy: 98.8%). The adaptations improve the first-level classification considerably (plus 2.2 percentage points).
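The two-level structure described in this abstract can be illustrated with a minimal sketch. Everything below is a hypothetical stand-in: the keyword rules, the function names (`is_teaching_effort`, `label_clause`, `classify`), and the clause-level granularity are assumptions made for illustration, not the classifiers or features used in the paper, which relies on trained models such as CNNs and bidirectional LSTMs.

```python
from typing import List, Optional, Tuple

# Second-level labels, named after the three semantic parts in the abstract:
# declaration of a new function, specification of steps, superfluous information.
DECL, SPEC, MISC = "DECL", "SPEC", "MISC"


def is_teaching_effort(utterance: str) -> bool:
    """Level 1 (toy stand-in): does the utterance try to teach a new function?"""
    text = utterance.lower()
    return " means " in f" {text} " or "is how you" in text


def label_clause(clause: str) -> str:
    """Level 2 (toy stand-in): assign one semantic label to a clause."""
    text = clause.lower().strip()
    if " means " in f" {text} " or "is how you" in text:
        return DECL
    if text.startswith(("first", "then", "next", "finally")):
        return SPEC
    return MISC


def classify(utterance: str) -> Optional[List[Tuple[str, str]]]:
    """Run level 1; only if it fires, run level 2 on each comma-separated clause."""
    if not is_teaching_effort(utterance):
        return None  # not a teaching effort, so the second level is skipped
    clauses = [c.strip() for c in utterance.split(",") if c.strip()]
    return [(c, label_clause(c)) for c in clauses]


if __name__ == "__main__":
    print(classify("making coffee means the following, first take a cup, "
                   "then press the red button, thanks a lot"))
    print(classify("bring me the cup"))
```

The point of the hierarchy is that the second classifier only ever sees utterances the first one accepted, so each level can be trained and evaluated separately.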
Citations: 1
Predicting Domain Specific Personal Attitudes and Sentiment
Pub Date : 2020-06-01 DOI: 10.1142/S1793351X20400073
Md. Enamul Haque, Eddie C. Ling, Aminul Islam, M. E. Tozal
Microblog activity logs are useful for determining users' interest in and sentiment towards specific and broader categories of events, such as natural disasters and national elections. In this paper, we present a corpus model to show how personal attitudes can be predicted from social media or microblog activities for a specific domain of events such as natural disasters. More specifically, given a user's tweet and an event, the model is used to predict whether the user will be willing to help or show a positive attitude towards that event or similar events in the future. We present a new dataset related to a specific natural disaster event, i.e. Hurricane Harvey, that classifies users' tweets into positive and non-positive attitudes. We build Term Embeddings for Tweet (TEmT) to generate features to model personal attitudes for arbitrary users' tweets. In addition, we present sentiment analysis on the same disaster event dataset using enhanced feature learning on TEmT-generated features by applying a Convolutional Neural Network (CNN). Finally, we evaluate the effectiveness of our method by employing multiple classification techniques and comparative methods on the newly created dataset.
Citations: 1
Guest Editor's Introduction
Pub Date : 2020-06-01 DOI: 10.1142/S1793351X2002002X
D. D’Auria
Citations: 0
Transitive Topic Modeling with Conversational Structure Context: Discovering Topics that are Most Popular in Online Discussions
Pub Date : 2020-06-01 DOI: 10.1142/S1793351X20400103
Yingcheng Sun, R. Kolacinski, K. Loparo
With the explosive growth of online discussions published every day on social media platforms, comprehension and discovery of the most popular topics have become a challenging problem. Conventional topic models have had limited success in online discussions because the corpus is extremely sparse and noisy. To overcome their limitations, we use the discussion thread tree structure and propose a “popularity” metric to quantify the number of replies to a comment to extend the frequency of word occurrences, and the “transitivity” concept to characterize topic dependency among nodes in a nested discussion thread. We build a Conversational Structure Aware Topic Model (CSATM) based on popularity and transitivity to infer topics and their assignments to comments. Experiments on real forum datasets are used to demonstrate improved performance for topic extraction with six different measurements of coherence and impressive accuracy for topic assignments.
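As a rough illustration of the "popularity" idea, the sketch below walks a discussion thread tree and counts each word of a comment once, plus once more per direct reply to that comment. The weighting rule (1 + number of direct replies), the `Comment` class, and the function names are assumptions chosen for this example; the abstract does not give the exact formula used in CSATM, and the transitivity component is not modeled here.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class Comment:
    """A node in a nested discussion thread (hypothetical structure)."""
    text: str
    replies: List["Comment"] = field(default_factory=list)


def popularity(comment: Comment) -> int:
    """Assumed popularity score: the number of direct replies to a comment."""
    return len(comment.replies)


def weighted_word_counts(root: Comment) -> Counter:
    """Count words over the whole thread, boosting words of replied-to comments."""
    counts: Counter = Counter()
    stack = [root]
    while stack:
        node = stack.pop()
        weight = 1 + popularity(node)  # assumption: one extra count per reply
        for word in node.text.lower().split():
            counts[word] += weight
        stack.extend(node.replies)
    return counts


if __name__ == "__main__":
    thread = Comment("gpu prices", [
        Comment("prices dropped"),
        Comment("gpu shortage", [Comment("shortage over")]),
    ])
    print(weighted_word_counts(thread).most_common())
```

Boosting counts this way makes a short but heavily discussed comment contribute more evidence than an ignored one, which is one plausible way to counter the sparsity the abstract mentions.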
Citations: 0
Incorporating Verb Semantic Information in Visual Question Answering Through Multitask Learning Paradigm
Pub Date : 2020-06-01 DOI: 10.1142/S1793351X20400085
Mehrdad Alizadeh, Barbara Maria Di Eugenio
Visual Question Answering (VQA) concerns providing answers to Natural Language questions about images. Several deep neural network approaches have been proposed to model the task in an end-to-end fashion. Whereas the task is grounded in visual processing, if the question focuses on events described by verbs, the language understanding component becomes crucial. Our hypothesis is that models should be aware of verb semantics, as expressed via semantic role labels, argument types, and/or frame elements. Unfortunately, no VQA dataset exists that includes verb semantic information. Our first contribution is a new VQA dataset (imSituVQA) that we built by taking advantage of the imSitu annotations. The imSitu dataset consists of images manually labeled with semantic frame elements, mostly taken from FrameNet. Second, we propose a multi-task CNN-LSTM VQA model that learns to classify the answers as well as the semantic frame elements. Our experiments show that semantic frame element classification helps the VQA system avoid inconsistent responses and improves performance. Third, we employ an automatic semantic role labeler and annotate a subset of the VQA dataset (VQAsub). This way, the proposed multi-task CNN-LSTM VQA model can be trained with the VQAsub as well. The results show a slight improvement over the single-task CNN-LSTM model.
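A common way to write a multi-task objective of this kind is a weighted sum of the per-task classification losses. The formulation below is an assumption offered for orientation, not the loss stated by the authors; the symbols $\mathcal{L}_{\text{answer}}$, $\mathcal{L}_{\text{frame}}$, and the weight $\lambda$ are hypothetical names.

```latex
% Assumed joint objective: answer classification plus frame-element classification.
% \lambda \ge 0 is a hypothetical trade-off weight; \lambda = 0 recovers the
% single-task model that only predicts answers.
\mathcal{L}(\theta)
  = \mathcal{L}_{\text{answer}}(\theta)
  + \lambda \, \mathcal{L}_{\text{frame}}(\theta),
\qquad \lambda \ge 0
```

Under this reading, both heads share the CNN-LSTM encoder parameters, so gradients from the frame-element task shape the representation used for answering.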
Citations: 0
Ontology-based Document Spanning Systems for Information Extraction
Pub Date : 2020-03-01 DOI: 10.1142/s1793351x20400012
D. Lembo, Federico Maria Scafoglieri
Information Extraction (IE) is the task of automatically organizing, in a structured form, data extracted from free text documents. In several contexts, it is often desirable that the extracted data are then organized according to an ontology, which provides a formal and conceptual representation of the domain of interest. Ontologies allow for a better data interpretation, as well as for their semantic integration with other information, as in Ontology-based Data Access (OBDA), a popular declarative framework for data management where an ontology is connected to a data layer through mappings. However, the data layer considered so far in OBDA has consisted essentially of relational databases, and how to declaratively couple an ontology with unstructured data sources is still unexplored. By leveraging the recent study on document spanners for rule-based IE by Fagin et al., in this paper, we propose a new framework that allows mapping text documents to ontologies, in the spirit of OBDA. We investigate the problem of answering conjunctive queries in this framework. For ontologies specified in the Description Logics [Formula: see text] and [Formula: see text], we show that the problem is polynomial in the size of the underlying documents. We also provide algorithms to solve query answering by rewriting the input query on the basis of the ontology and its mapping toward the source documents. Through these techniques, we pursue a virtual approach, similar to that typically adopted in OBDA, which allows us to answer a query without having to first populate the entire ontology. Interestingly, for [Formula: see text], both the spanners used in the mapping and the one computed by the rewriting algorithm belong to the same class of expressiveness. This also holds for [Formula: see text], modulo some limitations on the form of the mapping. These results indicate that in these cases our framework can be easily implemented by decoupling ontology management and document access, which can be delegated to an external IE system able to process the extraction rules we use in the mapping.
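The idea of coupling a document spanner with an ontology mapping can be sketched as follows. This is a toy illustration under stated assumptions: the regular expression, the role name `worksFor`, and the functions `spanner` and `mapping` are all hypothetical, and the sketch materializes facts eagerly for readability, whereas the paper pursues a virtual approach based on query rewriting.

```python
import re
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets into the document

# Hypothetical extraction rule: a regex with two capture variables.
WORKS_FOR = re.compile(
    r"(?P<person>[A-Z][a-z]+ [A-Z][a-z]+) works for (?P<org>[A-Z][A-Za-z]+)"
)


def spanner(doc: str) -> List[Dict[str, Span]]:
    """Map a document to a relation over spans: one row per regex match."""
    return [
        {var: m.span(var) for var in ("person", "org")}
        for m in WORKS_FOR.finditer(doc)
    ]


def mapping(doc: str) -> List[Tuple[str, str, str]]:
    """Assumed mapping assertion: each spanner row yields worksFor(person, org)."""
    facts = []
    for row in spanner(doc):
        person = doc[slice(*row["person"])]
        org = doc[slice(*row["org"])]
        facts.append(("worksFor", person, org))
    return facts


if __name__ == "__main__":
    text = "Ada Lovelace works for Acme. Alan Turing works for Bletchley."
    print(spanner(text))
    print(mapping(text))
```

Keeping `spanner` separate from `mapping` mirrors the decoupling the abstract describes: document access could be delegated to an external IE system, while the ontology side only consumes the extracted relation.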
Citations: 4