
Latest Publications in the Journal of Web Semantics

Improved distant supervision relation extraction based on edge-reasoning hybrid graph model
IF 2.5, CAS Tier 3 (Computer Science), Q1 Computer Science, Pub Date: 2021-07-01, DOI: 10.1016/j.websem.2021.100656
Shirong Shen, Shangfu Duan, Huan Gao, Guilin Qi

Distant supervision relation extraction (DSRE) trains a classifier by automatically labeling data through aligning triples in a knowledge base (KB) with large-scale corpora. Training data generated by distant supervision may contain many mislabeled instances, which harms the training of the classifier. Some recent methods show that relevant background information in KBs, such as entity types (e.g., Organization and Book), can improve the performance of DSRE. However, there are three main problems with these methods. Firstly, they are tailored to a specific type of information, which only benefits a subset of instances and is not helpful in all cases. Secondly, different kinds of background information are embedded independently, with no meaningful interaction between them. Thirdly, previous methods do not consider the side effect of the noise introduced by the background information. To address these issues, we leverage five types of background information instead of the single type used in previous works, and propose a novel edge-reasoning hybrid graph (ER-HG) model to realize meaningful interaction between the different kinds of information. In addition, we employ an attention mechanism in the ER-HG model to alleviate the side effect of noise. The ER-HG model integrates all types of information efficiently and is very robust to noisy information. We conduct experiments on two widely used datasets. The experimental results demonstrate that our model significantly outperforms state-of-the-art methods in held-out metrics and robustness tests.
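The abstract credits an attention mechanism with suppressing the noise that distant supervision introduces. As a concrete illustration of that general idea, the sketch below implements plain selective attention over a bag of sentences mentioning the same entity pair; the embeddings, dimensions, and query vector are toy assumptions, not the authors' ER-HG implementation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def bag_representation(sentence_embs, relation_query):
    """Selective attention over a bag of sentences that mention the same
    entity pair: sentences that look unrelated to the queried relation
    receive low weights, which dampens mislabeled instances."""
    scores = sentence_embs @ relation_query        # (n_sentences,)
    weights = softmax(scores)                      # attention weights
    return weights, weights @ sentence_embs        # (dim,) bag vector

# Toy bag: two on-topic sentence embeddings and one off-topic (noisy) one.
rng = np.random.default_rng(0)
relation_query = rng.normal(size=8)
clean = relation_query + 0.1 * rng.normal(size=(2, 8))
noisy = -relation_query                            # contradicts the relation
bag = np.vstack([clean, noisy])

weights, bag_vec = bag_representation(bag, relation_query)
print(weights)   # the noisy sentence receives a near-zero weight
```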

Citations: 3
Supporting contextualized learning with linked open data
IF 2.5, CAS Tier 3 (Computer Science), Q1 Computer Science, Pub Date: 2021-07-01, DOI: 10.1016/j.websem.2021.100657
Adolfo Ruiz-Calleja, Guillermo Vega-Gorgojo, Miguel L. Bote-Lorenzo, Juan I. Asensio-Pérez, Yannis Dimitriadis, Eduardo Gómez-Sánchez

This paper proposes a template-based approach to semi-automatically create contextualized learning tasks from several sources in the Web of Data. The contextualization of learning tasks opens the possibility of bridging formal learning, which happens in a classroom, and informal learning, which happens in other physical spaces such as squares or historical buildings. The tasks created cover different cognitive levels and are contextualized by their location and the topics covered. We applied this approach to the domain of History of Art in the Spanish region of Castile and Leon. We gathered data from DBpedia, Wikidata and the Open Data published by the regional government, and applied 32 templates to obtain 16K learning tasks. An evaluation with 8 teachers shows that teachers would be willing to have their students carry out the generated tasks. Teachers also considered that 85% of the generated tasks are aligned with the content taught in the classroom and found them relevant for learning in other informal spaces. The tasks created are available at https://casuallearn.gsic.uva.es/sparql.
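The generated tasks are published behind the SPARQL endpoint cited above, so they can be retrieved programmatically. Below is a minimal sketch using SPARQLWrapper; the generic triple pattern is an assumption, since the abstract does not describe the graph's vocabulary.

```python
# pip install sparqlwrapper
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("https://casuallearn.gsic.uva.es/sparql")
endpoint.setQuery("""
    SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10
""")
endpoint.setReturnFormat(JSON)

# Print a small sample of triples to inspect the task vocabulary.
results = endpoint.query().convert()
for row in results["results"]["bindings"]:
    print(row["s"]["value"], row["p"]["value"], row["o"]["value"])
```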

Citations: 5
DTN: Deep triple network for topic specific fake news detection
IF 2.5, CAS Tier 3 (Computer Science), Q1 Computer Science, Pub Date: 2021-07-01, DOI: 10.1016/j.websem.2021.100646
Jinshuo Liu, Chenyang Wang, Chenxi Li, Ningxi Li, Juan Deng, Jeff Z. Pan

Detection of fake news has spurred widespread interest in areas such as healthcare and Internet societies, in order to prevent the propagation of misleading information for commercial and political purposes. However, efforts to develop a general framework that exploits knowledge to judge the trustworthiness of given news based on its content have been limited. Indeed, existing works rarely consider incorporating knowledge graphs (KGs), which could provide rich structured knowledge for better language understanding.

In this work, we propose a deep triple network (DTN) that leverages knowledge graphs to facilitate fake news detection with triple-enhanced explanations. In the DTN, background knowledge graphs, such as open knowledge graphs and extracted graphs from news bases, are applied for both low-level and high-level feature extraction to classify the input news article and provide explanations for the classification.

The performance of the proposed method is evaluated through extensive comparative experiments. The obtained results show that DTN outperforms conventional fake news detection methods in several respects, including the provision of factual evidence supporting its detection decisions.
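The abstract describes the DTN as fusing textual features with features drawn from background knowledge graphs before classification. The sketch below shows that generic fusion pattern, not the authors' network: concatenate a text embedding with a pooled embedding of the KG triples linked to the article and train an off-the-shelf classifier. All vectors and labels are synthetic placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fuse(text_vec, triple_vecs):
    """Concatenate a text embedding with a pooled embedding of the
    KG triples linked to the article (zero vector if none)."""
    kg_vec = np.mean(triple_vecs, axis=0) if len(triple_vecs) else np.zeros_like(text_vec)
    return np.concatenate([text_vec, kg_vec])

rng = np.random.default_rng(1)
X = np.stack([fuse(rng.normal(size=16), rng.normal(size=(3, 16))) for _ in range(200)])
y = rng.integers(0, 2, size=200)     # 1 = fake, 0 = real (toy labels)

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict(X[:5]))
```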

Citations: 15
Beware of the hierarchy — An analysis of ontology evolution and the materialisation impact for biomedical ontologies
IF 2.5, CAS Tier 3 (Computer Science), Q1 Computer Science, Pub Date: 2021-07-01, DOI: 10.1016/j.websem.2021.100658
Romana Pernisch, Daniele Dell’Aglio, Abraham Bernstein

Ontologies are becoming a key component of numerous applications and research fields. But the knowledge captured within ontologies is not static. Some ontology updates potentially have a wide-ranging impact; others only affect very localised parts of the ontology and its applications. Investigating the impact of the evolution gives us insight into editing behaviour, but also signals to ontology engineers and users how the ontology's evolution is affecting other applications. However, such research is in its infancy. Hence, we need to investigate the evolution itself and its impact on the simplest of applications: the materialisation.

In this work, we define impact measures that capture the effect of changes on the materialisation. In the future, the impact measures introduced in this work can be used to investigate how aware ontology editors are of the consequences of changes. By introducing five different measures, which focus either on the change in the size of the materialisation or on the number of changes applied, we are able to quantify the consequences of ontology changes. To see these measures in action, we investigate the evolution and its impact on materialisation for nine open biomedical ontologies, most of which adhere to the EL++ description logic.

Our results show that these ontologies evolve at varying paces but no statistically significant difference between the ontologies with respect to their evolution could be identified. We identify three types of ontologies based on the types of complex changes which are applied to them throughout their evolution. The impact on the materialisation is the same for the investigated ontologies, bringing us to the conclusion that the effect of changes on the materialisation can be generalised to other similar ontologies. Further, we found that the materialised concept inclusion axioms experience most of the impact induced by changes to the class inheritance of the ontology and other changes only marginally touch the materialisation.
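The abstract names five size- and change-based impact measures without defining them. The sketch below shows what such measures could look like when a materialisation is represented as a set of entailed axioms; the three quantities are illustrative stand-ins, not the paper's exact definitions.

```python
def materialisation_impact(old: set, new: set) -> dict:
    """Size- and change-based measures comparing two materialisations,
    each given as a set of entailed axioms (e.g. concept inclusions).
    Illustrative measures only, not the paper's definitions."""
    added, removed = new - old, old - new
    return {
        "relative_size_change": (len(new) - len(old)) / max(len(old), 1),
        "changed_fraction": (len(added) + len(removed)) / max(len(old | new), 1),
        "jaccard_stability": len(old & new) / max(len(old | new), 1),
    }

old = {("A", "subClassOf", "B"), ("B", "subClassOf", "C"), ("A", "subClassOf", "C")}
new = {("A", "subClassOf", "B"), ("B", "subClassOf", "D"), ("A", "subClassOf", "D")}
print(materialisation_impact(old, new))
```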

Citations: 6
Entity summarization: State of the art and future challenges
IF 2.5, CAS Tier 3 (Computer Science), Q1 Computer Science, Pub Date: 2021-05-01, DOI: 10.1016/j.websem.2021.100647
Qingxia Liu, Gong Cheng, Kalpa Gunaratna, Yuzhong Qu

The increasing availability of semantic data has substantially enhanced Web applications. Semantic data such as RDF data is commonly represented as entity-property-value triples. The magnitude of semantic data, in particular the large number of triples describing an entity, can overload users with excessive amounts of information. This has motivated fruitful research on the automated generation of summaries of entity descriptions to satisfy users' information needs efficiently and effectively. We focus on this prominent topic of entity summarization, and our research objective is to present the first comprehensive survey of entity summarization research. Rather than separately reviewing each method, our contributions include (1) identifying and classifying technical features of existing methods to form a high-level overview, (2) identifying and classifying frameworks for combining multiple technical features adopted by existing methods, (3) collecting known benchmarks for intrinsic evaluation and efforts towards extrinsic evaluation, and (4) suggesting research directions for future work. By investigating the literature, we synthesized two hierarchies of techniques. The first hierarchy categorizes generic technical features into several perspectives: frequency and centrality, informativeness, and diversity and coverage. In the second hierarchy we present domain-specific and task-specific technical features, including the use of domain knowledge, context awareness, and personalization. Our review demonstrates that existing methods are mainly unsupervised and combine multiple technical features using various frameworks: random surfer models, similarity-based grouping, MMR-like re-ranking, or combinatorial optimization. We also found a few deep learning based methods in recent research. Current evaluation results and our case study show that the problem of entity summarization is still far from solved. Based on the limitations of existing methods revealed in the review, we identify several future directions: the use of semantics, human factors, machine and deep learning, non-extractive methods, and interactive methods.
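One of the combination frameworks the survey identifies, MMR-like re-ranking, is easy to make concrete: greedily pick the triple that balances relevance against redundancy with the triples already selected. In the sketch below, the relevance and similarity functions are toy assumptions.

```python
def mmr_summary(triples, relevance, similarity, k=5, lam=0.7):
    """Greedy MMR-like selection: at each step pick the triple that
    maximizes lam * relevance - (1 - lam) * max similarity to the
    triples already chosen (diversity-aware re-ranking)."""
    selected = []
    candidates = list(triples)
    while candidates and len(selected) < k:
        best = max(
            candidates,
            key=lambda t: lam * relevance(t)
            - (1 - lam) * max((similarity(t, s) for s in selected), default=0.0),
        )
        selected.append(best)
        candidates.remove(best)
    return selected

triples = [("Tim", "born", "London"), ("Tim", "field", "WWW"),
           ("Tim", "award", "Turing"), ("Tim", "field", "Semantic Web")]
rel = lambda t: {"born": 0.9, "field": 0.8, "award": 0.7}[t[1]]
sim = lambda a, b: 1.0 if a[1] == b[1] else 0.0   # same property = redundant
print(mmr_summary(triples, rel, sim, k=3))        # avoids picking "field" twice
```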

Citations: 28
Handling redundant processing in OBDA query execution over relational sources
IF 2.5, CAS Tier 3 (Computer Science), Q1 Computer Science, Pub Date: 2021-04-01, DOI: 10.1016/j.websem.2021.100639
Dimitris Bilidas, Manolis Koubarakis

Redundant processing is a key problem in the translation of initial queries posed over an ontology into SQL queries through mappings, as performed by ontology-based data access (OBDA) systems. Examples of such processing are duplicate answers obtained during query evaluation, which must ultimately be discarded, or common expressions evaluated multiple times in different parts of the same complex query. Many optimizations that aim to minimize this problem have been proposed and implemented, mostly based on semantic query optimization techniques that exploit ontological axioms and constraints defined in the database schema. However, data operations that introduce redundant processing are still generated in many practical settings, and this is a factor that impacts query execution. In this work we propose a cost-based method for query translation, which starts from an initial result and uses information about redundant processing to arrive at an equivalent, more efficient translation. The method operates in a number of steps, relying on heuristics which indicate that each step yields a more efficient query. Through experimental evaluation using the Ontop system for ontology-based data access, we demonstrate the benefits of our method.
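A typical source of the duplicate answers mentioned above is a union of mapping-generated subqueries whose branches overlap; whether deduplicating early pays off is exactly the kind of cost-based decision the paper argues for. The sketch below encodes that trade-off with a deliberately crude cost model; the constants and the decision rule are assumptions, not Ontop's actual planner.

```python
def choose_union_translation(n_rows, dup_ratio,
                             downstream_cost=5e-6, dedup_cost=1e-6):
    """Pick between a plain UNION ALL translation (duplicates flow into
    downstream operators and are discarded at the end) and a deduplicating
    DISTINCT translation (pays a sort/hash cost per row up front).
    Purely illustrative cost model."""
    cost_keep = n_rows * dup_ratio * downstream_cost   # wasted work on duplicates
    cost_dedup = n_rows * dedup_cost                   # deduplication overhead
    return "UNION DISTINCT" if cost_dedup < cost_keep else "UNION ALL"

# Mappings that overlap heavily make early deduplication pay off.
print(choose_union_translation(5_000_000, dup_ratio=0.4))  # UNION DISTINCT
print(choose_union_translation(5_000_000, dup_ratio=0.1))  # UNION ALL
```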

Citations: 1
FarsBase-KBP: A knowledge base population system for the Persian Knowledge Graph
IF 2.5, CAS Tier 3 (Computer Science), Q1 Computer Science, Pub Date: 2021-04-01, DOI: 10.1016/j.websem.2021.100638
Majid Asgari-Bidhendi, Behrooz Janfada, Behrouz Minaei-Bidgoli

While most knowledge bases already support the English language, there is only one knowledge base for the Persian language, known as FarsBase, which is automatically created from semi-structured web information. Unlike English knowledge bases such as Wikidata, which have tremendous community support, the population of a knowledge base like FarsBase must rely on automatically extracted knowledge. Knowledge base population lets FarsBase keep growing in size as the system continues working. In this paper, we present a knowledge base population system for the Persian language, which extracts knowledge from unlabelled raw text crawled from the Web. The proposed system consists of a set of state-of-the-art modules, such as an entity linking module as well as information and relation extraction modules designed for FarsBase. Moreover, a canonicalization system is introduced to link extracted relations to FarsBase properties. The system then uses knowledge fusion techniques, with minimal intervention by human experts, to integrate and filter the proper knowledge instances extracted by each module. To evaluate the performance of the presented knowledge base population system, we present the first gold dataset for benchmarking knowledge base population in the Persian language, which consists of 22,015 FarsBase triples verified by human experts. The evaluation results demonstrate the efficiency of the proposed system.
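The canonicalization step links extracted relation phrases to FarsBase properties. A common baseline for this kind of linking is nearest-neighbour matching between a phrase and the property labels; the sketch below uses toy bag-of-words cosine similarity and invented property names, and is not the FarsBase-KBP module itself.

```python
from collections import Counter
import math

def cos(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def canonicalize(phrase, properties, threshold=0.3):
    """Link an extracted relation phrase to the closest KB property by
    token overlap; return None below the threshold (toy baseline)."""
    pv = Counter(phrase.lower().split())
    scored = [(cos(pv, Counter(label.lower().split())), prop)
              for prop, label in properties.items()]
    score, prop = max(scored)
    return prop if score >= threshold else None

props = {"fb:birthPlace": "place of birth", "fb:spouse": "married to"}
print(canonicalize("is married to", props))      # -> fb:spouse
print(canonicalize("plays guitar for", props))   # -> None (no close property)
```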

Citations: 3
Knowledge graph embeddings for dealing with concept drift in machine learning
IF 2.5, CAS Tier 3 (Computer Science), Q1 Computer Science, Pub Date: 2021-02-01, DOI: 10.1016/j.websem.2020.100625
Jiaoyan Chen, Freddy Lécué, Jeff Z. Pan, Shumin Deng, Huajun Chen

Data stream learning has been widely studied for extracting knowledge structures from continuous and rapid data records. As data evolves over time, its underlying knowledge is subject to many challenges. Concept drift, one of the core challenges in the stream learning community, is described as changes in the statistical properties of the data over time, causing most machine learning models to become less accurate because the changes occur in unforeseen ways. This is particularly problematic as the evolution of data can lead to dramatic changes in knowledge. We address this problem by studying the semantic representation of data streams in the Semantic Web, i.e., ontology streams. Such streams are ordered sequences of data annotated with ontological vocabulary. In particular, we exploit three levels of knowledge encoded in ontology streams to deal with concept drift: i) the existence of novel knowledge gained from stream dynamics, ii) the significance of knowledge change and evolution, and iii) the (in)consistency of knowledge evolution. Such knowledge is encoded as knowledge graph embeddings through a combination of novel representations: entailment vectors, entailment weights, and a consistency vector. We illustrate our approach on classification tasks of supervised learning. Key contributions of the study include: (i) an effective knowledge graph embedding approach for stream ontologies, and (ii) a generic consistent prediction framework with integrated knowledge graph embeddings for dealing with concept drift. The experiments have shown that our approach provides accurate predictions of air quality in Beijing and bus delays in Dublin from real-world ontology streams.
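One natural reading of the entailment vectors mentioned above is a boolean vector over a fixed set of candidate axioms, recording which ones a stream snapshot entails; drift then surfaces as change between consecutive vectors. The sketch below follows that reading with invented axioms; it is an illustration, not the paper's exact construction.

```python
import numpy as np

def entailment_vector(entailed: set, candidate_axioms: list) -> np.ndarray:
    """Boolean vector over a fixed axiom vocabulary: 1 if the snapshot
    entails the axiom, 0 otherwise (illustrative reading of the paper's
    entailment vectors)."""
    return np.array([1.0 if a in entailed else 0.0 for a in candidate_axioms])

axioms = ["AQI(high)", "Traffic(heavy)", "Wind(strong)"]   # hypothetical axioms
t0 = entailment_vector({"AQI(high)", "Traffic(heavy)"}, axioms)
t1 = entailment_vector({"Wind(strong)"}, axioms)

drift = np.abs(t1 - t0).mean()   # fraction of axioms whose status flipped
print(drift)                     # 1.0 -> every monitored axiom changed
```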

Citations: 9
On revealing shared conceptualization among open datasets
IF 2.5, CAS Tier 3 (Computer Science), Q1 Computer Science, Pub Date: 2021-01-01, DOI: 10.1016/j.websem.2020.100624
Miloš Bogdanović, Nataša Veljković, Milena Frtunić Gligorijević, Darko Puflović, Leonid Stoimenov

Openness and transparency initiatives are not only milestones of scientific progress but have also influenced various fields of organization and industry. Under this influence, government institutions worldwide have published a large number of datasets through open data portals. Government data covers diverse subjects, and the scale of available data grows every year. Published data is expected to be both accessible and discoverable. For these purposes, portals take advantage of the metadata accompanying datasets. However, part of the metadata is often missing, which decreases users' ability to obtain the desired information. As the scale of published datasets grows, this problem increases. The approach we describe in this paper aims to reduce this problem by implementing knowledge structures and algorithms capable of proposing the best-matching category for an uncategorized dataset. Our aim in doing so is twofold: to enrich dataset metadata by suggesting an appropriate category, and to increase dataset visibility and discoverability. Our approach relies on information about open datasets provided by users — dataset descriptions contained within dataset tags. Since dataset tags exhibit low consistency due to their origin, in this paper we present a method for optimizing their usage by means of semantic similarity measures based on natural language processing mechanisms. Optimization is performed by reducing the number of distinct tag values used for dataset description. Once optimized, dataset tags are used to reveal the shared conceptualization originating from their usage by means of Formal Concept Analysis. We demonstrate the advantage of our proposal by comparing concept lattices generated using Formal Concept Analysis before and after the optimization process, and use the generated structure as a knowledge base to categorize uncategorized open datasets. Finally, we present a categorization mechanism based on the generated knowledge base that takes advantage of semantic similarity measures to propose a suitable category for an uncategorized dataset.
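The pipeline has two mechanizable steps: merging near-synonymous tags, and building the formal context (datasets as objects, merged tags as attributes) that Formal Concept Analysis operates on. The sketch below uses difflib string similarity as a stand-in for the paper's NLP-based semantic measure, and a longest-spelling-wins heuristic for choosing the representative tag; both choices are assumptions.

```python
from difflib import SequenceMatcher

def merge_tags(tags, threshold=0.8):
    """Map each tag to a canonical representative when string similarity
    exceeds the threshold. Longer spellings are processed first so they
    become the representatives (a crude heuristic)."""
    canonical = {}
    for tag in sorted(tags, key=len, reverse=True):
        match = next((c for c in canonical.values()
                      if SequenceMatcher(None, tag, c).ratio() >= threshold), None)
        canonical[tag] = match or tag
    return canonical

datasets = {
    "air-quality-2020": {"environment", "enviroment", "air"},   # note the typo tag
    "school-census":    {"education", "schools"},
}
tag_map = merge_tags({t for tags in datasets.values() for t in tags})
context = {d: {tag_map[t] for t in tags} for d, tags in datasets.items()}
print(context)   # formal context: objects = datasets, attributes = merged tags
```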

Citations: 2
Less is more: Data-efficient complex question answering over knowledge bases
IF 2.5, CAS Tier 3 (Computer Science), Q1 Computer Science, Pub Date: 2020-12-01, DOI: 10.1016/j.websem.2020.100612
Yuncheng Hua, Yuan-Fang Li, Guilin Qi, Wei Wu, Jingyao Zhang, Daiqing Qi

Question answering is an effective method for obtaining information from knowledge bases (KBs). In this paper, we propose the Neural-Symbolic Complex Question Answering (NS-CQA) model, a data-efficient reinforcement learning framework for complex question answering that uses only a modest number of training samples. Our framework consists of a neural generator and a symbolic executor that, respectively, transform a natural-language question into a sequence of primitive actions and execute them over the knowledge base to compute the answer. We carefully formulate a set of primitive symbolic actions that allows us not only to simplify our neural network design but also to accelerate model convergence. To reduce the search space, we employ copy and masking mechanisms in our encoder–decoder architecture to drastically reduce the decoder's output vocabulary and improve model generalizability. We equip our model with a memory buffer that stores high-reward promising programs. In addition, we propose an adaptive reward function. By comparing the generated trial with the trials stored in the memory buffer, we derive a curriculum-guided reward bonus, i.e., proximity and novelty. To mitigate the sparse reward problem, we combine the adaptive reward and the reward bonus, reshaping the sparse reward into dense feedback. We also encourage the model to generate new trials, avoiding imitation of spurious trials while remembering past high-reward trials, to improve data efficiency. Our NS-CQA model is evaluated on two datasets: CQA, a recent large-scale complex question answering dataset, and WebQuestionsSP, a multi-hop question answering dataset. On both datasets, our model outperforms the state-of-the-art models. Notably, on CQA, NS-CQA performs well on questions with higher complexity while using only approximately 1% of the total training samples.
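The reward shaping described above can be made concrete: compare a generated action sequence against the high-reward programs in the memory buffer and add a proximity or novelty bonus to the sparse 0/1 answer reward. The sketch below is one plausible combination, with Jaccard similarity and the bonus weight as assumptions; the paper's exact shaping differs.

```python
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def shaped_reward(trial, answer_correct, memory, w=0.2):
    """Reshape the sparse 0/1 answer reward. A correct trial close to a
    remembered high-reward program earns a proximity bonus; a failed
    trial that differs from everything in the buffer earns a novelty
    bonus, discouraging imitation of spurious programs. One plausible
    combination, not the paper's exact formula."""
    if not memory:
        return float(answer_correct)
    closest = max(jaccard(trial, m) for m in memory)
    bonus = closest if answer_correct else (1.0 - closest)
    return float(answer_correct) + w * bonus

memory = [["select", "filter", "count"]]          # stored high-reward program
print(shaped_reward(["select", "count"], True, memory))    # 1 + proximity bonus
print(shaped_reward(["union", "argmax"], False, memory))   # 0 + novelty bonus
```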

Citations: 21