ABSTRACT Radiotherapy is one of the main treatment methods for cancer, and the delineation of the radiotherapy target area is the basis and premise of precise treatment. Artificial intelligence technology, represented by machine learning, has driven a large body of research in this area, improving the accuracy and efficiency of target delineation. This article reviews the applications and research of machine learning in medical image matching, normal organ delineation and treatment target delineation, following the procedures doctors use to delineate the target volume, and gives an outlook on the development prospects.
{"title":"A Survey on Automatic Delineation of Radiotherapy Target Volume based on Machine Learning","authors":"Zhenchao Tao, Shengfei Lyu","doi":"10.1162/dint_a_00204","DOIUrl":"https://doi.org/10.1162/dint_a_00204","url":null,"abstract":"ABSTRACT Radiotherapy is one of the main treatment methods for cancer, and the delineation of the radiotherapy target area is the basis and premise of precise treatment. Artificial intelligence technology, represented by machine learning, has driven a large body of research in this area, improving the accuracy and efficiency of target delineation. This article reviews the applications and research of machine learning in medical image matching, normal organ delineation and treatment target delineation, following the procedures doctors use to delineate the target volume, and gives an outlook on the development prospects.","PeriodicalId":34023,"journal":{"name":"Data Intelligence","volume":"5 1","pages":"841-856"},"PeriodicalIF":3.9,"publicationDate":"2023-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41788085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jiaxi Yang, Kui Chen, Kai Ding, Chongning Na, Meng Wang
ABSTRACT In recent years, feature engineering-based machine learning models have made significant progress in auto insurance fraud detection. However, most models or systems have focused only on structural data and have not utilized multi-modal data to improve fraud detection efficiency. To solve this problem, we adapt both natural language processing and computer vision techniques to our knowledge-based algorithm and construct an Auto Insurance Multi-modal Learning (AIML) framework. We then apply AIML to detect fraud behavior in auto insurance cases with data from real scenarios and conduct experiments to examine the improvement in model performance with multi-modal data compared to a baseline model with structural data only. A self-designed Semi-Auto Feature Engineer (SAFE) algorithm to process auto insurance data and a visual data processing framework are embedded within AIML. Results show that AIML substantially improves the model performance in detecting fraud behavior compared to models that only use structural data.
{"title":"Auto Insurance Fraud Detection with Multimodal Learning","authors":"Jiaxi Yang, Kui Chen, Kai Ding, Chongning Na, Meng Wang","doi":"10.1162/dint_a_00191","DOIUrl":"https://doi.org/10.1162/dint_a_00191","url":null,"abstract":"ABSTRACT In recent years, feature engineering-based machine learning models have made significant progress in auto insurance fraud detection. However, most models or systems have focused only on structural data and have not utilized multi-modal data to improve fraud detection efficiency. To solve this problem, we adapt both natural language processing and computer vision techniques to our knowledge-based algorithm and construct an Auto Insurance Multi-modal Learning (AIML) framework. We then apply AIML to detect fraud behavior in auto insurance cases with data from real scenarios and conduct experiments to examine the improvement in model performance with multi-modal data compared to a baseline model with structural data only. A self-designed Semi-Auto Feature Engineer (SAFE) algorithm to process auto insurance data and a visual data processing framework are embedded within AIML. Results show that AIML substantially improves the model performance in detecting fraud behavior compared to models that only use structural data.","PeriodicalId":34023,"journal":{"name":"Data Intelligence","volume":"5 1","pages":"388-412"},"PeriodicalIF":3.9,"publicationDate":"2023-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48167815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lili Zhang, Jianhui Li, P. Uhlir, Liangming Wen, Kaichao Wu, Ze Luo, Yude Liu
ABSTRACT This paper focuses on research e-infrastructures in the open science era. We analyze some of the challenges and opportunities of cloud-based science and introduce an example of a national solution in the China Science and Technology Cloud (CSTCloud). We select three CSTCloud use cases in deploying open science modules, including scalable engineering in astronomical data management, integrated Earth-science resources for SDG-13 decision making, and the coupling of citizen science and artificial intelligence (AI) techniques in biodiversity. We conclude with a forecast on the future development of research e-infrastructures and introduce the idea of the Global Open Science Cloud (GOSC). We hope this analysis can provide some insights into the future development of research e-infrastructures in support of open science.
{"title":"Research e-infrastructures for open science: The national example of CSTCloud in China","authors":"Lili Zhang, Jianhui Li, P. Uhlir, Liangming Wen, Kaichao Wu, Ze Luo, Yude Liu","doi":"10.1162/dint_a_00196","DOIUrl":"https://doi.org/10.1162/dint_a_00196","url":null,"abstract":"ABSTRACT This paper focuses on research e-infrastructures in the open science era. We analyze some of the challenges and opportunities of cloud-based science and introduce an example of a national solution in the China Science and Technology Cloud (CSTCloud). We select three CSTCloud use cases in deploying open science modules, including scalable engineering in astronomical data management, integrated Earth-science resources for SDG-13 decision making, and the coupling of citizen science and artificial intelligence (AI) techniques in biodiversity. We conclude with a forecast on the future development of research e-infrastructures and introduce the idea of the Global Open Science Cloud (GOSC). We hope this analysis can provide some insights into the future development of research e-infrastructures in support of open science.","PeriodicalId":34023,"journal":{"name":"Data Intelligence","volume":"5 1","pages":"355-369"},"PeriodicalIF":3.9,"publicationDate":"2023-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47975608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shuqin Li, Kaibin Zhou, Zeyang Zhuang, Haofen Wang, Jun Ma
ABSTRACT Text-to-SQL aims at translating textual questions into the corresponding SQL queries. Aggregate tables are widely created for high-frequency queries. Although text-to-SQL has emerged as an important task, recent studies have paid little attention to the task over aggregate tables. The increased number of aggregate tables brings two challenges: (1) the mapping between natural language questions and relational databases suffers from more ambiguity, and (2) modern models usually adopt a self-attention mechanism to encode the database schema and the question. This mechanism has quadratic time complexity, which makes inference more time-consuming as the input sequence length grows. In this paper, we introduce a novel approach named WAGG for text-to-SQL over aggregate tables. To effectively select among ambiguous items, we propose a relation selection mechanism for relation computing. To deal with high computation costs, we introduce a dynamic pruning strategy to discard unrelated items that are common for aggregate tables. We also construct a new large-scale dataset, SpiderwAGG, extended from the Spider dataset for validation, where extensive experiments show the effectiveness and efficiency of our proposed method, with a 4% increase in accuracy and a 15% decrease in inference time w.r.t. the strong baseline RAT-SQL.
{"title":"Towards Text-to-SQL over Aggregate Tables","authors":"Shuqin Li, Kaibin Zhou, Zeyang Zhuang, Haofen Wang, Jun Ma","doi":"10.1162/dint_a_00194","DOIUrl":"https://doi.org/10.1162/dint_a_00194","url":null,"abstract":"ABSTRACT Text-to-SQL aims at translating textual questions into the corresponding SQL queries. Aggregate tables are widely created for high-frequency queries. Although text-to-SQL has emerged as an important task, recent studies have paid little attention to the task over aggregate tables. The increased number of aggregate tables brings two challenges: (1) the mapping between natural language questions and relational databases suffers from more ambiguity, and (2) modern models usually adopt a self-attention mechanism to encode the database schema and the question. This mechanism has quadratic time complexity, which makes inference more time-consuming as the input sequence length grows. In this paper, we introduce a novel approach named WAGG for text-to-SQL over aggregate tables. To effectively select among ambiguous items, we propose a relation selection mechanism for relation computing. To deal with high computation costs, we introduce a dynamic pruning strategy to discard unrelated items that are common for aggregate tables. We also construct a new large-scale dataset, SpiderwAGG, extended from the Spider dataset for validation, where extensive experiments show the effectiveness and efficiency of our proposed method, with a 4% increase in accuracy and a 15% decrease in inference time w.r.t. the strong baseline RAT-SQL.","PeriodicalId":34023,"journal":{"name":"Data Intelligence","volume":"5 1","pages":"457-474"},"PeriodicalIF":3.9,"publicationDate":"2023-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41824177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
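The pruning idea in the WAGG abstract can be illustrated with a toy sketch (the scoring rule, names, and threshold here are hypothetical, not from the paper): discard schema items with little overlap with the question before encoding, so the quadratic self-attention encoder sees a shorter input.

```python
# Toy illustration of pruning unrelated schema items before self-attention
# encoding (hypothetical token-overlap scoring; the actual WAGG strategy
# is learned and dynamic).

def overlap_score(question_tokens, item_tokens):
    """Fraction of a schema item's tokens that also appear in the question."""
    q = set(question_tokens)
    return sum(t in q for t in item_tokens) / len(item_tokens)

def prune_schema(question, schema_items, threshold=0.5):
    """Keep only schema items whose overlap with the question is high enough."""
    q_tokens = question.lower().split()
    return [item for item in schema_items
            if overlap_score(q_tokens, item.lower().split("_")) >= threshold]

question = "total sales per region in 2020"
schema = ["total_sales", "region", "customer_email", "sales_2020"]
kept = prune_schema(question, schema)
# Self-attention cost grows as O(n^2) in sequence length, so dropping
# unrelated items shrinks the encoder's cost super-linearly.
```

Here `customer_email` shares no tokens with the question and is discarded before encoding.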
ABSTRACT Metadata is data about data, generated mainly for resource organization and description, facilitating finding, identifying, selecting and obtaining information. With the advancement of technologies, the acquisition of metadata has gradually become a critical step in data modeling and function operation, which leads to the formation of its methodological commons. A series of general operations has been developed to achieve structured description, semantic encoding and machine-understandable information, including entity definition, relation description, object analysis, attribute extraction, ontology modeling, data cleaning, disambiguation, alignment, mapping, relating, enriching, importing, exporting, service implementation, registry and discovery, monitoring, etc. These operations are not only necessary elements in semantic technologies (including linked data) and knowledge graph technology, but have also developed into the common operations and primary strategy for building independent and knowledge-based information systems. In this paper, this series of metadata-related methods is collectively referred to as the ‘metadata methodological commons’, whose best practices are reflected in the various standard specifications of the Semantic Web. In the future construction of a multi-modal metaverse based on Web 3.0, it shall play an important role, for example, in building digital twins through adopting knowledge models, or in supporting the modeling of the entire virtual world. Manual description and coding obviously cannot adapt to UGC (User Generated Content) and AIGC (AI Generated Content)-based content production in the metaverse era. The automatic processing of semantic formalization must be considered a sure way to adapt the metadata methodological commons to the future needs of the AI era.
{"title":"Metadata as a Methodological Commons: From Aboutness Description to Cognitive Modeling","authors":"Wei Liu, Yaming Fu, Qianqian Liu","doi":"10.1162/dint_a_00189","DOIUrl":"https://doi.org/10.1162/dint_a_00189","url":null,"abstract":"ABSTRACT Metadata is data about data, generated mainly for resource organization and description, facilitating finding, identifying, selecting and obtaining information. With the advancement of technologies, the acquisition of metadata has gradually become a critical step in data modeling and function operation, which leads to the formation of its methodological commons. A series of general operations has been developed to achieve structured description, semantic encoding and machine-understandable information, including entity definition, relation description, object analysis, attribute extraction, ontology modeling, data cleaning, disambiguation, alignment, mapping, relating, enriching, importing, exporting, service implementation, registry and discovery, monitoring, etc. These operations are not only necessary elements in semantic technologies (including linked data) and knowledge graph technology, but have also developed into the common operations and primary strategy for building independent and knowledge-based information systems. In this paper, this series of metadata-related methods is collectively referred to as the ‘metadata methodological commons’, whose best practices are reflected in the various standard specifications of the Semantic Web. In the future construction of a multi-modal metaverse based on Web 3.0, it shall play an important role, for example, in building digital twins through adopting knowledge models, or in supporting the modeling of the entire virtual world. Manual description and coding obviously cannot adapt to UGC (User Generated Content) and AIGC (AI Generated Content)-based content production in the metaverse era. The automatic processing of semantic formalization must be considered a sure way to adapt the metadata methodological commons to the future needs of the AI era.","PeriodicalId":34023,"journal":{"name":"Data Intelligence","volume":"5 1","pages":"289-302"},"PeriodicalIF":3.9,"publicationDate":"2023-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48210399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ABSTRACT Few-shot learning has rapidly emerged as a viable means for completing various tasks. Recently, few-shot models have been used for Named Entity Recognition (NER). Prototypical networks show high efficiency on few-shot NER. However, existing prototypical methods only consider the similarity of tokens in query sets and support sets and ignore the semantic similarity among the sentences which contain these entities. We present a novel model, Few-shot Named Entity Recognition with Joint Token and Sentence Awareness (JTSA), to address the issue. Sentence awareness is introduced to probe the semantic similarity among sentences. Token awareness is used to explore the similarity of tokens. To further improve the robustness and results of the model, we adopt a joint learning scheme for few-shot NER. Experimental results demonstrate that our model outperforms state-of-the-art models on two standard few-shot NER datasets.
{"title":"Few-shot Named Entity Recognition with Joint Token and Sentence Awareness","authors":"Wen Wen, Yongbin Liu, Qiang Lin, Chunping Ouyang","doi":"10.1162/dint_a_00195","DOIUrl":"https://doi.org/10.1162/dint_a_00195","url":null,"abstract":"ABSTRACT Few-shot learning has rapidly emerged as a viable means for completing various tasks. Recently, few-shot models have been used for Named Entity Recognition (NER). Prototypical networks show high efficiency on few-shot NER. However, existing prototypical methods only consider the similarity of tokens in query sets and support sets and ignore the semantic similarity among the sentences which contain these entities. We present a novel model, Few-shot Named Entity Recognition with Joint Token and Sentence Awareness (JTSA), to address the issue. Sentence awareness is introduced to probe the semantic similarity among sentences. Token awareness is used to explore the similarity of tokens. To further improve the robustness and results of the model, we adopt a joint learning scheme for few-shot NER. Experimental results demonstrate that our model outperforms state-of-the-art models on two standard few-shot NER datasets.","PeriodicalId":34023,"journal":{"name":"Data Intelligence","volume":"5 1","pages":"767-785"},"PeriodicalIF":3.9,"publicationDate":"2023-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42109023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
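The prototypical-network step that the JTSA abstract builds on can be sketched in a few lines (toy 2-D embeddings; a real few-shot NER model would produce token embeddings with a learned encoder): each class prototype is the mean of its support-token embeddings, and a query token takes the label of the nearest prototype.

```python
# Minimal prototypical-network classification step (toy vectors only;
# illustrates the mechanism, not the JTSA model itself).

def mean_vec(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify(query, support):
    """support: dict mapping label -> list of support-token embeddings."""
    prototypes = {label: mean_vec(vecs) for label, vecs in support.items()}
    return min(prototypes, key=lambda lab: sq_dist(query, prototypes[lab]))

support = {
    "PER": [[1.0, 0.0], [0.9, 0.1]],   # person-entity support tokens
    "LOC": [[0.0, 1.0], [0.1, 0.9]],   # location-entity support tokens
}
label = classify([0.8, 0.2], support)  # nearest prototype is PER's
```

JTSA's contribution, per the abstract, is to combine this token-level similarity with a sentence-level similarity signal under joint learning.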
Domagoj Vrgoč, Carlos Rojas, Renzo Angles, Marcelo Arenas, Diego Arroyuelo, Carlos Buil-Aranda, Aidan Hogan, Gonzalo Navarro, Cristian Riveros, Juan Romero
ABSTRACT In this systems paper, we present MillenniumDB: a novel graph database engine that is modular, persistent, and open source. MillenniumDB is based on a graph data model, which we call domain graphs, that provides a simple abstraction upon which a variety of popular graph models can be supported, thus providing a flexible data management engine for diverse types of knowledge graphs. The engine itself is founded on a combination of tried-and-tested techniques from relational data management, state-of-the-art algorithms for worst-case-optimal joins, as well as graph-specific algorithms for evaluating path queries. In this paper, we present the main design principles underlying MillenniumDB, describing the abstract graph model and query semantics supported, the concrete data model and query syntax implemented, as well as the storage, indexing, query planning and query evaluation techniques used. We evaluate MillenniumDB over real-world data and queries from the Wikidata knowledge graph, where we find that it outperforms other popular persistent graph database engines (including both enterprise and open source alternatives) that support similar query features.
{"title":"MillenniumDB: An Open-Source Graph Database System","authors":"Domagoj Vrgoč, Carlos Rojas, Renzo Angles, Marcelo Arenas, Diego Arroyuelo, Carlos Buil-Aranda, Aidan Hogan, Gonzalo Navarro, Cristian Riveros, Juan Romero","doi":"10.1162/dint_a_00229","DOIUrl":"https://doi.org/10.1162/dint_a_00229","url":null,"abstract":"ABSTRACT In this systems paper, we present MillenniumDB: a novel graph database engine that is modular, persistent, and open source. MillenniumDB is based on a graph data model, which we call domain graphs, that provides a simple abstraction upon which a variety of popular graph models can be supported, thus providing a flexible data management engine for diverse types of knowledge graphs. The engine itself is founded on a combination of tried-and-tested techniques from relational data management, state-of-the-art algorithms for worst-case-optimal joins, as well as graph-specific algorithms for evaluating path queries. In this paper, we present the main design principles underlying MillenniumDB, describing the abstract graph model and query semantics supported, the concrete data model and query syntax implemented, as well as the storage, indexing, query planning and query evaluation techniques used. We evaluate MillenniumDB over real-world data and queries from the Wikidata knowledge graph, where we find that it outperforms other popular persistent graph database engines (including both enterprise and open source alternatives) that support similar query features.","PeriodicalId":34023,"journal":{"name":"Data Intelligence","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135401943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
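What a path query computes can be sketched independently of any engine: reachability from a start node over edges carrying a given label. The snippet below is a generic BFS illustration of that semantics only (toy data; it says nothing about MillenniumDB's storage, indexing, or path-query algorithms).

```python
# Toy labeled path query: all nodes reachable from `start` via one or
# more edges with the given label. Illustrates path-query semantics,
# not MillenniumDB internals.
from collections import deque

edges = [
    ("alice", "follows", "bob"),
    ("bob", "follows", "carol"),
    ("alice", "likes", "dave"),
]

def reachable(start, label, edges):
    adj = {}
    for s, p, o in edges:
        if p == label:
            adj.setdefault(s, []).append(o)
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

result = reachable("alice", "follows", edges)  # one-or-more "follows" hops
```

In a graph query language this corresponds to a property-path pattern such as SPARQL's `?x :follows+ ?y`.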
ABSTRACT Sustainable development denotes the enhancement of living standards in the present without compromising future generations’ resources. Sustainable Development Goals (SDGs) quantify the accomplishment of sustainable development and pave the way for a world worth living in for future generations. Scholars can contribute to the achievement of the SDGs by guiding the actions of practitioners based on the analysis of SDG data, as intended by this work. We propose a framework of algorithms based on dimensionality reduction methods with the use of Hilbert Space Filling Curves (HSFCs) in order to semantically cluster new uncategorised SDG data and novel indicators, and efficiently place them in the environment of a distributed knowledge graph store. First, we describe a framework of algorithms for inserting new indicators and projecting them onto the HSFC curve based on transformer-based similarity assessment, for retrieving indicators and load-balancing, along with an approach for classifying entrant indicators. Then, a thorough case study in a distributed knowledge graph environment experimentally evaluates our framework. The results are presented and discussed in light of theory, along with the actual impact they can have for practitioners analysing SDG data, including intergovernmental organizations, government agencies and social welfare organizations. Our approach empowers SDG knowledge graphs for causal analysis, inference, and manifold interpretations of the societal implications of SDG-related actions, as data are accessed in reduced retrieval times. It facilitates quicker measurement of the influence of users and communities on specific goals and serves faster distributed knowledge matching, as the semantic cohesion of data is preserved.
{"title":"A Knowledge Graph-Based Deep Learning Framework for Efficient Content Similarity Search of Sustainable Development Goals Data","authors":"Irene Kilanioti, George A. Papadopoulos","doi":"10.1162/dint_a_00230","DOIUrl":"https://doi.org/10.1162/dint_a_00230","url":null,"abstract":"ABSTRACT Sustainable development denotes the enhancement of living standards in the present without compromising future generations’ resources. Sustainable Development Goals (SDGs) quantify the accomplishment of sustainable development and pave the way for a world worth living in for future generations. Scholars can contribute to the achievement of the SDGs by guiding the actions of practitioners based on the analysis of SDG data, as intended by this work. We propose a framework of algorithms based on dimensionality reduction methods with the use of Hilbert Space Filling Curves (HSFCs) in order to semantically cluster new uncategorised SDG data and novel indicators, and efficiently place them in the environment of a distributed knowledge graph store. First, we describe a framework of algorithms for inserting new indicators and projecting them onto the HSFC curve based on transformer-based similarity assessment, for retrieving indicators and load-balancing, along with an approach for classifying entrant indicators. Then, a thorough case study in a distributed knowledge graph environment experimentally evaluates our framework. The results are presented and discussed in light of theory, along with the actual impact they can have for practitioners analysing SDG data, including intergovernmental organizations, government agencies and social welfare organizations. Our approach empowers SDG knowledge graphs for causal analysis, inference, and manifold interpretations of the societal implications of SDG-related actions, as data are accessed in reduced retrieval times. It facilitates quicker measurement of the influence of users and communities on specific goals and serves faster distributed knowledge matching, as the semantic cohesion of data is preserved.","PeriodicalId":34023,"journal":{"name":"Data Intelligence","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135400885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
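The property the framework above relies on, that a Hilbert Space Filling Curve maps nearby multi-dimensional points to nearby 1-D indices, can be illustrated with the standard 2-D Hilbert index computation (the well-known textbook `xy2d` routine, not the authors' code):

```python
# Standard 2-D Hilbert curve index: maps a point on an n x n grid
# (n a power of two) to its distance along the curve. Consecutive curve
# positions are adjacent grid cells, which is the locality property
# HSFC-based clustering exploits.

def xy2d(n, x, y):
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate the quadrant so each level sees a canonical orientation.
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        s //= 2
    return d

# Walk an 8 x 8 grid in curve order: adjacent entries are adjacent cells.
order = sorted(((xy2d(8, x, y), (x, y)) for x in range(8) for y in range(8)))
```

Sorting items by their Hilbert index thus keeps spatial (here, semantic-embedding) neighbours close together in the 1-D placement.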
{"title":"Research on Intelligent Organization and Application of Multi-source Heterogeneous Knowledge Resources for Energy Internet","authors":"Yuxuan Wang, Liqun Luo, Guangjian Li","doi":"10.1162/dint_a_00158","DOIUrl":"https://doi.org/10.1162/dint_a_00158","url":null,"abstract":"","PeriodicalId":34023,"journal":{"name":"Data Intelligence","volume":"5 1","pages":"75-99"},"PeriodicalIF":3.9,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"64532029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ABSTRACT Machine reading comprehension (MRC) has been a research focus in natural language processing and intelligence engineering. However, there is a lack of models and datasets for MRC tasks in the anti-terrorism domain. Moreover, current research lacks the ability to embed accurate background knowledge and provide precise answers. To address these two problems, this paper first builds a text corpus and testbed that focus on the anti-terrorism domain in a semi-automatic manner. Then, it proposes a knowledge-based machine reading comprehension model that fuses domain-related triples from a large-scale encyclopedic knowledge base to enhance the semantics of the text. To eliminate knowledge noise that could lead to semantic deviation, this paper uses a mixed mutual attention mechanism among questions, passages, and knowledge triples to select the most relevant triples before embedding their semantics into the sentences. Experimental results indicate that the proposed approach can achieve a 70.70% EM value and an 87.91% F1 score, with a 4.23% and 3.35% improvement over existing methods, respectively.
{"title":"Knowledge Graph based Mutual Attention for Machine Reading Comprehension over Anti-Terrorism Corpus","authors":"Feng Gao, Jin Hou, Jinguang Gu, Lihua Zhang","doi":"10.1162/dint_a_00210","DOIUrl":"https://doi.org/10.1162/dint_a_00210","url":null,"abstract":"ABSTRACT Machine reading comprehension (MRC) has been a research focus in natural language processing and intelligence engineering. However, there is a lack of models and datasets for MRC tasks in the anti-terrorism domain. Moreover, current research lacks the ability to embed accurate background knowledge and provide precise answers. To address these two problems, this paper first builds a text corpus and testbed that focus on the anti-terrorism domain in a semi-automatic manner. Then, it proposes a knowledge-based machine reading comprehension model that fuses domain-related triples from a large-scale encyclopedic knowledge base to enhance the semantics of the text. To eliminate knowledge noise that could lead to semantic deviation, this paper uses a mixed mutual attention mechanism among questions, passages, and knowledge triples to select the most relevant triples before embedding their semantics into the sentences. Experimental results indicate that the proposed approach can achieve a 70.70% EM value and an 87.91% F1 score, with a 4.23% and 3.35% improvement over existing methods, respectively.","PeriodicalId":34023,"journal":{"name":"Data Intelligence","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135401223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
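The triple-selection step described in the abstract above can be sketched as plain dot-product attention between a question vector and candidate triple vectors, keeping only the top-scoring triples (toy 2-D vectors and invented triples; the paper's mixed mutual attention also attends over passages and uses learned representations).

```python
# Toy dot-product attention for selecting knowledge triples relevant to
# a question (illustrative only, not the paper's model).
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def select_triples(question_vec, triples, k=1):
    """triples: list of (triple_text, embedding); return top-k by attention weight."""
    scores = [sum(q * t for q, t in zip(question_vec, emb))
              for _, emb in triples]
    weights = softmax(scores)
    ranked = sorted(zip(weights, (text for text, _ in triples)), reverse=True)
    return [text for _, text in ranked[:k]]

triples = [
    ("(groupA, locatedIn, regionB)", [0.9, 0.1]),
    ("(regionB, population, 5M)",    [0.1, 0.9]),
]
top = select_triples([1.0, 0.0], triples, k=1)
```

Filtering to the top-k triples before fusing them into the sentence representation is how such models keep low-relevance triples from injecting knowledge noise.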