
Data Intelligence: Latest Articles

A Survey on Automatic Delineation of Radiotherapy Target Volume based on Machine Learning
IF 3.9; Zone 3 (Computer Science); Q3 (COMPUTER SCIENCE, INFORMATION SYSTEMS); Pub Date: 2023-02-11; DOI: 10.1162/dint_a_00204
Zhenchao Tao, Shengfei Lyu
ABSTRACT Radiotherapy is one of the main treatment methods for cancer, and the delineation of the radiotherapy target area is the basis and premise of precise treatment. Artificial intelligence technology, represented by machine learning, has been studied extensively in this area and has improved the accuracy and efficiency of target delineation. This article reviews the applications and research of machine learning in medical image matching, normal organ delineation and treatment target delineation, following the procedures doctors use to delineate the target volume, and gives an outlook on development prospects.
Journal: Data Intelligence; Volume: 5 1; Pages: 841-856; Type: Journal Article
Citations: 0
Auto Insurance Fraud Detection with Multimodal Learning
IF 3.9; Zone 3 (Computer Science); Q3 (COMPUTER SCIENCE, INFORMATION SYSTEMS); Pub Date: 2023-02-09; DOI: 10.1162/dint_a_00191
Jiaxi Yang, Kui Chen, Kai Ding, Chongning Na, Meng Wang
ABSTRACT In recent years, feature engineering-based machine learning models have made significant progress in auto insurance fraud detection. However, most models or systems have focused only on structural data and have not used multi-modal data to improve fraud detection efficiency. To solve this problem, we adapt both natural language processing and computer vision techniques to our knowledge-based algorithm and construct an Auto Insurance Multi-modal Learning (AIML) framework. We then apply AIML to detect fraud behavior in auto insurance cases with data from real scenarios, and conduct experiments to examine the improvement in model performance with multi-modal data compared to a baseline model with structural data only. A self-designed Semi-Auto Feature Engineer (SAFE) algorithm for processing auto insurance data and a visual data processing framework are embedded within AIML. Results show that AIML substantially improves model performance in detecting fraud behavior compared to models that only use structural data.
Journal: Data Intelligence; Volume: 5 1; Pages: 388-412; Type: Journal Article
Citations: 0
Research e-infrastructures for open science: The national example of CSTCloud in China
IF 3.9; Zone 3 (Computer Science); Q3 (COMPUTER SCIENCE, INFORMATION SYSTEMS); Pub Date: 2023-02-09; DOI: 10.1162/dint_a_00196
Lili Zhang, Jianhui Li, P. Uhlir, Liangming Wen, Kaichao Wu, Ze Luo, Yude Liu
ABSTRACT This paper focuses on research e-infrastructures in the open science era. We analyze some of the challenges and opportunities of cloud-based science and introduce an example of a national solution in the China Science and Technology Cloud (CSTCloud). We selected three CSTCloud use cases in deploying open science modules, including scalable engineering in astronomical data management, integrated Earth-science resources for SDG-13 decision making, and the coupling of citizen science and artificial intelligence (AI) techniques in biodiversity. We conclude with a forecast on the future development of research e-infrastructures and introduce the idea of the Global Open Science Cloud (GOSC). We hope this analysis can provide some insights into the future development of research e-infrastructures in support of open science.
Journal: Data Intelligence; Volume: 5 1; Pages: 355-369; Type: Journal Article
Citations: 0
Towards Text-to-SQL over Aggregate Tables
IF 3.9; Zone 3 (Computer Science); Q3 (COMPUTER SCIENCE, INFORMATION SYSTEMS); Pub Date: 2023-02-09; DOI: 10.1162/dint_a_00194
Shuqin Li, Kaibin Zhou, Zeyang Zhuang, Haofen Wang, Jun Ma
ABSTRACT Text-to-SQL aims at translating textual questions into the corresponding SQL queries. Aggregate tables are widely created for high-frequency queries. Although text-to-SQL has emerged as an important task, recent studies have paid little attention to the task over aggregate tables. The growing number of aggregate tables brings two challenges: (1) the mapping between natural language questions and relational databases suffers from more ambiguity; (2) modern models usually adopt a self-attention mechanism to encode the database schema and the question. This mechanism has quadratic time complexity, which makes inference more time-consuming as the input sequence length grows. In this paper, we introduce a novel approach named WAGG for text-to-SQL over aggregate tables. To effectively select among ambiguous items, we propose a relation selection mechanism for relation computing. To deal with high computation costs, we introduce a dynamic pruning strategy to discard unrelated items, which are common for aggregate tables. We also construct a new large-scale dataset, SpiderwAGG, extended from the Spider dataset for validation, where extensive experiments show the effectiveness and efficiency of our proposed method, with a 4% increase in accuracy and a 15% decrease in inference time w.r.t. a strong baseline, RAT-SQL.
Journal: Data Intelligence; Volume: 5 1; Pages: 457-474; Type: Journal Article
Citations: 0
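The cost argument in the Text-to-SQL abstract above can be illustrated with a rough count. Self-attention scores every input token against every other token, so the number of pairwise scores grows with the square of the input length, and dropping unrelated schema items shrinks it quickly. The sketch below is only an illustration under stated assumptions: it is not WAGG's pruning strategy, and the helper names `attention_pairs` and `prune_schema`, the word-overlap score, and the toy schema are all hypothetical.

```python
def attention_pairs(n_tokens):
    """Number of pairwise attention scores in one self-attention pass (n squared)."""
    return n_tokens * n_tokens


def prune_schema(question, schema_items, keep):
    """Toy pruning (an assumption, not the paper's method): keep the `keep`
    schema items that share the most words with the question."""
    q_words = set(question.lower().split())

    def overlap(item):
        return len(q_words & set(item.lower().replace("_", " ").split()))

    # sorted() is stable, so ties keep their original schema order.
    return sorted(schema_items, key=overlap, reverse=True)[:keep]


# Hypothetical question and schema, including some aggregate tables.
question = "total sales per region last year"
schema = [
    "sales_total", "region_name", "customer_email", "agg_sales_by_region",
    "product_color", "order_year", "warehouse_id", "agg_sales_by_year",
]

kept = prune_schema(question, schema, keep=4)
before = attention_pairs(len(question.split()) + len(schema))
after = attention_pairs(len(question.split()) + len(kept))
print(kept)
print(before, after)
```

Treating each schema item as one token, halving the schema here roughly halves the number of attention scores; with longer schemas the quadratic term dominates and the saving is larger.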
Metadata as a Methodological Commons: From Aboutness Description to Cognitive Modeling
IF 3.9; Zone 3 (Computer Science); Q3 (COMPUTER SCIENCE, INFORMATION SYSTEMS); Pub Date: 2023-02-07; DOI: 10.1162/dint_a_00189
Wei Liu, Yaming Fu, Qianqian Liu
ABSTRACT Metadata is data about data, which is generated mainly for resource organization and description, facilitating finding, identifying, selecting and obtaining information. With the advancement of technologies, the acquisition of metadata has gradually become a critical step in data modeling and function operation, which has led to the formation of its methodological commons. A series of general operations has been developed to achieve structured description, semantic encoding and machine-understandable information, including entity definition, relation description, object analysis, attribute extraction, ontology modeling, data cleaning, disambiguation, alignment, mapping, relating, enriching, importing, exporting, service implementation, registry and discovery, monitoring, etc. Those operations are not only necessary elements in semantic technologies (including linked data) and knowledge graph technology, but have also developed into the common operation and primary strategy in building independent and knowledge-based information systems. In this paper, a series of metadata-related methods are collectively referred to as the 'metadata methodological commons', which has many best practices reflected in the various standard specifications of the Semantic Web. In the future construction of a multi-modal metaverse based on Web 3.0, it shall play an important role, for example, in building digital twins through adopting knowledge models, or supporting the modeling of the entire virtual world. Manual description and coding obviously cannot adapt to UGC (User Generated Contents) and AIGC (AI Generated Contents)-based content production in the metaverse era. The automatic processing of semantic formalization must be considered a sure way to adapt the metadata methodological commons to meet the future needs of the AI era.
Journal: Data Intelligence; Volume: 5 1; Pages: 289-302; Type: Journal Article
Citations: 2
Few-shot Named Entity Recognition with Joint Token and Sentence Awareness
IF 3.9; Zone 3 (Computer Science); Q3 (COMPUTER SCIENCE, INFORMATION SYSTEMS); Pub Date: 2023-01-09; DOI: 10.1162/dint_a_00195
Wen Wen, Yongbin Liu, Qiang Lin, Chunping Ouyang
ABSTRACT Few-shot learning has been proposed and is rapidly emerging as a viable means for completing various tasks. Recently, few-shot models have been used for Named Entity Recognition (NER). Prototypical networks show high efficiency on few-shot NER. However, existing prototypical methods only consider the similarity of tokens in query sets and support sets and ignore the semantic similarity among the sentences which contain these entities. We present a novel model, Few-shot Named Entity Recognition with Joint Token and Sentence Awareness (JTSA), to address the issue. Sentence awareness is introduced to probe the semantic similarity among the sentences. Token awareness is used to explore the similarity of the tokens. To further improve the robustness and results of the model, we adopt a joint learning scheme for few-shot NER. Experimental results demonstrate that our model outperforms state-of-the-art models on two standard few-shot NER datasets.
Journal: Data Intelligence; Volume: 5 1; Pages: 767-785; Type: Journal Article
Citations: 0
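The token-level comparison that the few-shot NER abstract above attributes to existing prototypical methods can be sketched as follows: each class prototype is the mean of its support-set token embeddings, and a query token takes the label of the nearest prototype. This is a generic prototypical-network classification step, not the JTSA model; the 2-D embeddings, the labels, and the function names `prototypes` and `classify` are made-up assumptions for illustration.

```python
import math
from collections import defaultdict


def prototypes(support):
    """Average the embeddings of each class in the support set."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in support:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def classify(query_vec, protos):
    """Return the label whose prototype is closest in Euclidean distance."""
    return min(protos, key=lambda label: math.dist(query_vec, protos[label]))


# Hypothetical support set: (token embedding, entity label) pairs.
support = [
    ([1.0, 0.0], "PER"), ([0.8, 0.2], "PER"),
    ([0.0, 1.0], "LOC"), ([0.2, 0.8], "LOC"),
]
protos = prototypes(support)
print(classify([0.7, 0.3], protos))
```

Because this step looks only at individual token embeddings, two tokens from very different sentences can land near the same prototype, which is the gap the abstract says sentence awareness is meant to address.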
MillenniumDB: An Open-Source Graph Database System
Zone 3 (Computer Science); Q3 (COMPUTER SCIENCE, INFORMATION SYSTEMS); Pub Date: 2023-01-01; DOI: 10.1162/dint_a_00229
Domagoj Vrgoč, Carlos Rojas, Renzo Angles, Marcelo Arenas, Diego Arroyuelo, Carlos Buil-Aranda, Aidan Hogan, Gonzalo Navarro, Cristian Riveros, Juan Romero
ABSTRACT In this systems paper, we present MillenniumDB: a novel graph database engine that is modular, persistent, and open source. MillenniumDB is based on a graph data model, which we call domain graphs, that provides a simple abstraction upon which a variety of popular graph models can be supported, thus providing a flexible data management engine for diverse types of knowledge graph. The engine itself is founded on a combination of tried and tested techniques from relational data management, state-of-the-art algorithms for worst-case-optimal joins, as well as graph-specific algorithms for evaluating path queries. In this paper, we present the main design principles underlying MillenniumDB, describing the abstract graph model and query semantics supported, the concrete data model and query syntax implemented, as well as the storage, indexing, query planning and query evaluation techniques used. We evaluate MillenniumDB over real-world data and queries from the Wikidata knowledge graph, where we find that it outperforms other popular persistent graph database engines (including both enterprise and open source alternatives) that support similar query features.
Citations: 1
A Knowledge Graph-Based Deep Learning Framework for Efficient Content Similarity Search of Sustainable Development Goals Data
Zone 3 (Computer Science); Q3 (COMPUTER SCIENCE, INFORMATION SYSTEMS); Pub Date: 2023-01-01; DOI: 10.1162/dint_a_00230
Irene Kilanioti, George A. Papadopoulos
ABSTRACT Sustainable development denotes the enhancement of living standards in the present without compromising future generations' resources. Sustainable Development Goals (SDGs) quantify the accomplishment of sustainable development and pave the way for a world worth living in for future generations. Scholars can contribute to the achievement of the SDGs by guiding the actions of practitioners based on the analysis of SDG data, as intended by this work. We propose a framework of algorithms based on dimensionality reduction methods with the use of Hilbert Space Filling Curves (HSFCs) in order to semantically cluster new uncategorised SDG data and novel indicators, and efficiently place them in the environment of a distributed knowledge graph store. First, we describe a framework of algorithms for the insertion of new indicators and their projection onto the HSFC based on transformer-based similarity assessment, and for the retrieval of indicators and load-balancing, along with an approach for data classification of entrant indicators. Then, a thorough case study in a distributed knowledge graph environment experimentally evaluates our framework. The results are presented and discussed in light of theory, along with the actual impact they can have for practitioners analysing SDG data, including intergovernmental organizations, government agencies and social welfare organizations. Our approach empowers SDG knowledge graphs for causal analysis, inference, and manifold interpretations of the societal implications of SDG-related actions, as data are accessed in reduced retrieval times. It facilitates quicker measurement of the influence of users and communities on specific goals and serves faster distributed knowledge matching, as the semantic cohesion of data is preserved.
Citations: 0
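The role of a Hilbert space-filling curve in the SDG abstract above can be sketched with the standard grid-to-curve mapping: each cell of a 2-D grid gets a 1-D position such that consecutive positions are always neighbouring cells, so sorting or range-partitioning by that position tends to keep nearby points together. The code below is the textbook coordinate-to-index conversion, not the authors' framework; the function name `hilbert_index`, the 8x8 grid, and the indicator coordinates are hypothetical, and real embeddings would first have to be reduced and quantised to grid cells.

```python
def hilbert_index(n, x, y):
    """Map cell (x, y) of an n-by-n grid (n a power of two) to its
    position along a Hilbert curve covering the grid."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented consistently.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


# Hypothetical indicators already projected to cells of an 8x8 grid.
points = {
    "indicator_a": (0, 0),
    "indicator_b": (1, 0),
    "indicator_c": (7, 7),
    "indicator_d": (0, 1),
}
ordered = sorted(points, key=lambda name: hilbert_index(8, *points[name]))
print(ordered)
```

In a distributed store, contiguous ranges of the 1-D index could then be assigned to different nodes, which is one plausible reading of the placement and load-balancing the abstract mentions.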
Research on Intelligent Organization and Application of Multi-source Heterogeneous Knowledge Resources for Energy Internet
IF 3.9; Zone 3 (Computer Science); Q3 (COMPUTER SCIENCE, INFORMATION SYSTEMS); Pub Date: 2023-01-01; DOI: 10.1162/dint_a_00158
Yuxuan Wang, Liqun Luo, Guangjian Li
Journal: Data Intelligence; Volume: 5 1; Pages: 75-99; Type: Journal Article
Citations: 1
Knowledge Graph based Mutual Attention for Machine Reading Comprehension over Anti-Terrorism Corpus
Zone 3, Computer Science Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2023-01-01 DOI: 10.1162/dint_a_00210
Feng Gao, Jin Hou, Jinguang Gu, Lihua Zhang
ABSTRACT Machine reading comprehension (MRC) has been a research focus in natural language processing and intelligence engineering. However, models and datasets for MRC tasks in the anti-terrorism domain are lacking. Moreover, current approaches struggle to embed accurate background knowledge and provide precise answers. To address these two problems, this paper first builds, in a semi-automatic manner, a text corpus and testbed focused on the anti-terrorism domain. It then proposes a knowledge-based MRC model that fuses domain-related triples from a large-scale encyclopedic knowledge base to enhance the semantics of the text. To eliminate knowledge noise that could lead to semantic deviation, the model uses a mixed mutual attention mechanism among questions, passages, and knowledge triples to select the most relevant triples before embedding their semantics into the sentences. Experimental results indicate that the proposed approach achieves a 70.70% EM value and an 87.91% F1 score, improvements of 4.23% and 3.35% over existing methods, respectively.
Citations: 0