Pub Date: 2023-01-01. DOI: 10.1016/j.aiopen.2023.08.001
Qingyao Ai , Ting Bai , Zhao Cao , Yi Chang , Jiawei Chen , Zhumin Chen , Zhiyong Cheng , Shoubin Dong , Zhicheng Dou , Fuli Feng , Shen Gao , Jiafeng Guo , Xiangnan He , Yanyan Lan , Chenliang Li , Yiqun Liu , Ziyu Lyu , Weizhi Ma , Jun Ma , Zhaochun Ren , Xiaofei Zhu
The research field of Information Retrieval (IR) has evolved significantly, expanding beyond traditional search to meet diverse user information needs. Recently, Large Language Models (LLMs) have demonstrated exceptional capabilities in text understanding, generation, and knowledge inference, opening up exciting avenues for IR research. LLMs not only facilitate generative retrieval but also offer improved solutions for user understanding, model evaluation, and user-system interactions. More importantly, the synergistic relationship among IR models, LLMs, and humans forms a new technical paradigm that is more powerful for information seeking. IR models provide real-time and relevant information, LLMs contribute internal knowledge, and humans play a central role as demanders and evaluators, ensuring the reliability of information services. Nevertheless, significant challenges exist, including computational costs, credibility concerns, domain-specific limitations, and ethical considerations. To thoroughly discuss the transformative impact of LLMs on IR research, the Chinese IR community conducted a strategic workshop in April 2023, yielding valuable insights. This paper provides a summary of the workshop's outcomes, including the rethinking of IR's core values, the mutual enhancement of LLMs and IR, the proposal of a novel IR technical paradigm, and open challenges.
Title: "Information Retrieval meets Large Language Models: A strategic report from Chinese IR community" (AI Open, vol. 4, 2023, pp. 80-90)
Pub Date: 2023-01-01. DOI: 10.1016/j.aiopen.2023.08.002
Rui Feng , Qi Ding , Weihao Qiu , Xiao Yang , Yang Yang , Chunping Wang
Traditional network embedding aims to learn representations by capturing a predefined vertex-to-vertex similarity measure. However, in practice, there are different types of similarity measures (e.g., connectivity and structural similarity), which are appropriate for different downstream applications. Meanwhile, it is hard to select the "best" similarity measure for a given application, since doing so requires domain knowledge of both the application scenario and network science. Moreover, these similarity measures sometimes need to be combined with each other to achieve better performance. Therefore, automatically integrating multiple types of similarity measures into a unified network embedding framework is critical for obtaining effective vertex representations for a downstream application. In this paper, we address the above problem in social networks and propose a semi-supervised representation learning algorithm. The general idea of our approach is to impose social influence, which occurs when one's opinions, emotions, or behaviors are affected by others in a social network. In particular, we build a connection between a user's representation vector and the probability of her being influenced by another user to have a particular label (e.g., fraud, personal interest, etc.). We conduct extensive experiments on six real-world datasets and find a clear improvement of our approach compared with several state-of-the-art baselines.
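The general setting of fusing several predefined similarity measures into one embedding can be illustrated with a minimal sketch (a weighted matrix combination followed by truncated SVD); this is a generic illustration of the problem, not the paper's semi-supervised algorithm:

```python
import numpy as np

def unified_embedding(similarity_matrices, weights, dim=16):
    """Fuse several vertex-to-vertex similarity matrices (e.g., one for
    connectivity, one for structural similarity) into a single embedding:
    weighted combination, then truncated SVD. A generic illustration of
    the problem setting, not the paper's algorithm."""
    fused = sum(w * S for w, S in zip(weights, similarity_matrices))
    fused = (fused + fused.T) / 2  # symmetrize before factorizing
    U, s, _ = np.linalg.svd(fused)
    # Scale the leading singular vectors to obtain vertex representations.
    return U[:, :dim] * np.sqrt(s[:dim])

# Two toy similarity views over five vertices.
rng = np.random.default_rng(0)
A, B = rng.random((5, 5)), rng.random((5, 5))
emb = unified_embedding([A, B], weights=[0.7, 0.3], dim=2)
print(emb.shape)  # (5, 2)
```

The `weights` vector is where a semi-supervised method would inject application-specific signal; here it is fixed by hand.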
Title: "A unified network embedding algorithm for multi-type similarity measures" (AI Open, vol. 4, 2023, pp. 64-72)
Pub Date: 2023-01-01. DOI: 10.1016/j.aiopen.2023.10.004
Liner Yang , Xin Liu , Tianxin Liao , Zhenghao Liu , Mengyan Wang , Xuezhi Fang , Erhong Yang
The task of Chinese Spelling Check (CSC) is crucial for identifying and rectifying spelling errors in Chinese texts. While prior work in this domain has predominantly relied on benchmarks such as SIGHAN for evaluating model performance, these benchmarks often exhibit an imbalanced distribution of spelling errors. They are typically constructed under idealized conditions, presuming the presence of only spelling errors in the input text. This assumption does not hold in real-world scenarios, where spell checkers frequently encounter a mix of spelling and grammatical errors, thereby presenting additional challenges. To address this gap and create a more realistic testing environment, we introduce a high-quality CSC evaluation benchmark named YACSC (Yet Another Chinese Spelling Check Dataset). YACSC is unique in that it includes annotations for both grammatical and spelling errors, rendering it a more reliable benchmark for CSC tasks. Furthermore, we propose a hierarchical network designed to integrate multidimensional information, leveraging semantic and phonetic aspects, as well as the structural forms of Chinese characters, to enhance the detection and correction of spelling errors. Through extensive experiments, we evaluate the limitations of existing CSC benchmarks and illustrate the application of our proposed system in real-world scenarios, particularly as a preliminary stage in writing assistant systems.
Title: "Is Chinese Spelling Check ready? Understanding the correction behavior in real-world scenarios" (AI Open, vol. 4, 2023, pp. 183-192)
Pub Date: 2023-01-01. DOI: 10.1016/j.aiopen.2022.12.003
Lingxi Zhang , Jing Zhang , Xirui Ke , Haoyang Li , Xinmei Huang , Zhonghui Shao , Shulin Cao , Xin Lv
Answering complex factual questions has drawn a lot of attention. Researchers leverage various data sources to support complex QA, such as unstructured texts, structured knowledge graphs and relational databases, semi-structured web tables, or even hybrid data sources. However, although the ideas behind these approaches are similar to some extent, there is not yet a consistent strategy for dealing with the various data sources. In this survey, we carefully examine how complex factual question answering has evolved across various data sources. We list the similarities among these approaches and group them into the analysis–extend–reason framework, despite the various question types and data sources that they focus on. We also discuss future directions for complex factual question answering as well as the relevant benchmarks.
Title: "A survey on complex factual question answering" (AI Open, vol. 4, 2023, pp. 1-12)
Pub Date: 2023-01-01. DOI: 10.1016/j.aiopen.2023.08.006
Nazar Zaki , Wenjian Qin , Anusuya Krishnan
Cervical cancer remains a significant health concern worldwide, where precise segmentation of cervical lesions is integral for effective diagnosis and treatment planning. This systematic review critically evaluates the application of graph-based methodologies for cervical cancer segmentation, identifying their potential, drawbacks, and avenues for future development. An exhaustive literature search across Scopus and PubMed databases resulted in 20 pertinent studies. These studies were assessed focusing on their implementation of graph-based techniques for cervical cancer segmentation, the utilized datasets, evaluation metrics, and reported precision levels. The review highlights the progressive strides made in the field, especially regarding the segmentation of intricate, non-convex regions and facilitating the detection and grading of cervical cancer using graph-based methodologies. Nonetheless, several constraints were evident, including a dearth of comparative performance analysis, reliance on high-resolution images, difficulties in specific boundary delineation, and the imperative for additional validation and diversified datasets. The review suggests future work to integrate advanced deep learning strategies for heightened accuracy, formulate hybrid methodologies to counteract existing limitations, and explore multi-modal fusion to boost segmentation precision. Emphasizing the explainability and interpretability of outcomes also stands paramount. Lastly, addressing critical challenges such as scarcity of annotated data, the need for real-time and interactive segmentation, and the segmentation of multiple objects or regions of interest remains a crucial frontier for future endeavors.
Title: "Graph-based methods for cervical cancer segmentation: Advancements, limitations, and future directions" (AI Open, vol. 4, 2023, pp. 42-55)
Pub Date: 2023-01-01. DOI: 10.1016/j.aiopen.2023.12.001
Hadi Abdine , Moussa Kamal Eddine , Davide Buscaldi , Michalis Vazirgiannis
Word sense induction (WSI) is a challenging problem in natural language processing that involves the unsupervised automatic detection of a word’s senses (i.e., meanings). Recent work achieves significant results on the WSI task by pre-training a language model that can exclusively disambiguate word senses. In contrast, others employ off-the-shelf pre-trained language models with additional strategies to induce senses. This paper proposes a novel unsupervised method based on hierarchical clustering and invariant information clustering (IIC). The IIC loss is used to train a small model to optimize the mutual information between two vector representations of a target word occurring in a pair of synthetic paraphrases. This model is later used in inference mode to extract a higher-quality vector representation to be used in the hierarchical clustering. We evaluate our method on two WSI tasks and in two distinct clustering configurations (fixed and dynamic number of clusters). We empirically show that our approach is at least on par with the state-of-the-art baselines, outperforming them in several configurations. The code and data to reproduce this work are available to the public.
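The IIC objective at the core of the method can be sketched in a few lines; the encoder that produces the soft cluster assignments for the two paraphrases is omitted, and shapes and names are illustrative:

```python
import numpy as np

def iic_mutual_information(p_x, p_xp, eps=1e-8):
    """Invariant Information Clustering (IIC) objective: the mutual
    information between soft cluster assignments of the same target word
    in two paraphrases. p_x and p_xp have shape (batch, n_clusters);
    training maximizes the returned value (i.e., minimizes its negation)."""
    joint = p_x.T @ p_xp / p_x.shape[0]  # joint distribution over cluster pairs
    joint = (joint + joint.T) / 2        # symmetrize: pair order is arbitrary
    joint = joint / joint.sum()
    marg_i = joint.sum(axis=1, keepdims=True)
    marg_j = joint.sum(axis=0, keepdims=True)
    return np.sum(joint * (np.log(joint + eps)
                           - np.log(marg_i + eps)
                           - np.log(marg_j + eps)))

# Consistent assignments across paraphrases yield high MI; uniform ones ~0.
aligned = np.array([[1., 0.], [0., 1.], [1., 0.], [0., 1.]])
uniform = np.full((4, 2), 0.5)
mi_hi = iic_mutual_information(aligned, aligned)
mi_lo = iic_mutual_information(uniform, uniform)
```

Maximizing this quantity pushes the small model toward assignments that are stable under paraphrasing, which is what makes the resulting vectors better inputs for the later agglomerative clustering.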
Title: "Word sense induction with agglomerative clustering and mutual information maximization" (AI Open, vol. 4, 2023, pp. 193-201)
Pub Date: 2023-01-01. DOI: 10.1016/j.aiopen.2023.08.009
Wenlong Fang, Yongbin Liu, Chunping Ouyang, Lin Ren, Jiale Li, Yaping Wan
Few-shot Named Entity Recognition (NER) is a challenging task that involves identifying new entity types using a limited number of labeled training instances. Currently, the majority of few-shot NER methods are span-based: they focus on the boundary information of candidate entity spans and on entity-level information. However, these methods often overlook token-level semantic information, which can limit their effectiveness. To address this issue, we propose a novel Joint Span and Token (JST) framework that integrates both the boundary information of an entity and the semantic information of each token that comprises it. The JST framework employs span features to extract the boundary features of the entity and token features to extract the semantic features of each token. Additionally, to reduce the negative impact of the Other class, we introduce a method to separate named entities from the Other class in semantic space, which helps to improve the distinction between entities and the Other class. We also use GPT for data augmentation on the support sentences, generating sentences similar to the originals; these augmented sentences increase sample diversity and the reliability of our model. Our experimental results on the Few-NERD and SNIPS datasets demonstrate that our model outperforms existing methods.
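A minimal sketch of the joint span-and-token scoring idea: the feature extractors, the additive fusion, and the prototype construction below are illustrative assumptions, not the paper's exact architecture:

```python
import numpy as np

def jst_span_score(tokens, start, end, proto_span, proto_token):
    """Hypothetical sketch of joint span-and-token scoring for few-shot
    NER: a candidate span is scored by combining (a) a boundary (span)
    feature and (b) averaged token-level features, each compared against
    a class prototype by negative Euclidean distance."""
    span_feat = np.concatenate([tokens[start], tokens[end]])  # boundary info
    token_feat = tokens[start:end + 1].mean(axis=0)           # token semantics
    span_sim = -np.linalg.norm(span_feat - proto_span)        # span-level match
    token_sim = -np.linalg.norm(token_feat - proto_token)     # token-level match
    return span_sim + token_sim

# Score a candidate span (tokens 1..3) against toy class prototypes.
rng = np.random.default_rng(1)
tokens = rng.random((5, 4))  # 5 tokens, 4-dim contextual features
score = jst_span_score(tokens, 1, 3, np.zeros(8), np.zeros(4))
```

The point of the token term is that two spans with identical boundaries but different interior tokens receive different scores, which a purely span-level model cannot express.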
Title: "Joint span and token framework for few-shot named entity recognition" (AI Open, vol. 4, 2023, pp. 111-119)
Pub Date: 2023-01-01. DOI: 10.1016/j.aiopen.2023.08.010
Zeyuan Yang , Zonghan Yang , Yichen Liu , Peng Li , Yang Liu
Continual learning aims to avoid catastrophic forgetting and to effectively leverage learned experiences when mastering new knowledge. Existing gradient projection approaches impose hard constraints on the optimization space for new tasks to minimize interference, which simultaneously hinders forward knowledge transfer. To address this issue, recent methods reuse frozen parameters with a growing network, resulting in high computational costs. It thus remains an open question whether forward knowledge transfer can be improved for gradient projection approaches using a fixed network architecture. In this work, we propose the Restricted Orthogonal Gradient prOjection (ROGO) framework. The basic idea is to adopt a restricted orthogonal constraint that allows parameters to be optimized in directions oblique to the whole frozen space, facilitating forward knowledge transfer while consolidating previous knowledge. Our framework requires neither data buffers nor extra parameters. Extensive experiments have demonstrated the superiority of our framework over several strong baselines. We also provide theoretical guarantees for our relaxation strategy.
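The contrast between strict orthogonal projection and a restricted (oblique) variant can be sketched as follows; the scalar `relax` is an illustrative knob standing in for the paper's restricted constraint, not its exact formulation:

```python
import numpy as np

def restricted_orthogonal_projection(grad, frozen_basis, relax=0.2):
    """Sketch of restricted orthogonal gradient projection. Strict
    orthogonal projection (relax=0) removes the entire component of the
    new task's gradient lying in the frozen subspace, which blocks
    forward transfer; retaining a fraction of that component makes the
    update oblique to the frozen space."""
    M = frozen_basis                   # (d, k) with orthonormal columns
    in_subspace = M @ (M.T @ grad)     # component that interferes with old tasks
    return grad - (1.0 - relax) * in_subspace

# Frozen subspace spanned by e1 in R^2.
M = np.array([[1.0], [0.0]])
g = np.array([1.0, 1.0])
print(restricted_orthogonal_projection(g, M, relax=0.0))  # [0. 1.]  strict
print(restricted_orthogonal_projection(g, M, relax=1.0))  # [1. 1.]  unconstrained
```

Intermediate values of `relax` interpolate between the two regimes, which is the trade-off between consolidation and forward transfer that the restricted constraint controls.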
Title: "Restricted orthogonal gradient projection for continual learning" (AI Open, vol. 4, 2023, pp. 98-110)
Pub Date: 2023-01-01. DOI: 10.1016/j.aiopen.2023.10.001
Chenzhan Shang , Yupeng Hou , Wayne Xin Zhao , Yaliang Li , Jing Zhang
A conversational recommender system (CRS) interacts with users through multi-turn dialogues in natural language, aiming to provide high-quality recommendations for users' immediate information needs. Although great efforts have been made to develop effective CRSs, most still focus on the contextual information of the current dialogue and usually suffer from the data scarcity issue. Therefore, we consider leveraging historical dialogue data to enrich the limited context of the current dialogue session.
In this paper, we propose a novel multi-grained hypergraph interest modeling approach to capture user interest in intricate historical data from different perspectives. As the core idea, we employ hypergraphs to represent the complicated semantic relations underlying historical dialogues. In our approach, we first employ the hypergraph structure to model users' historical dialogue sessions and form a session-based hypergraph, which captures coarse-grained, session-level relations. Second, to alleviate the issue of data scarcity, we use an external knowledge graph and construct a knowledge-based hypergraph that considers fine-grained, entity-level semantics. We further conduct multi-grained hypergraph convolution on the two kinds of hypergraphs and utilize the enhanced representations to develop an interest-aware CRS. Extensive experiments on two benchmarks, ReDial and TG-ReDial, validate the effectiveness of our approach on both recommendation and conversation tasks. Code is available at: https://github.com/RUCAIBox/MHIM.
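The hyperedge propagation step underlying such models can be sketched with a standard, simplified hypergraph convolution; the learnable weight matrix of a full layer is omitted for brevity:

```python
import numpy as np

def hypergraph_convolution(X, H):
    """One propagation step of a simplified hypergraph convolution:
    vertex features are aggregated into hyperedges (e.g., dialogue
    sessions, or entity groups from a knowledge graph) and then
    redistributed to vertices, with symmetric degree normalization."""
    Dv = H.sum(axis=1)                                  # vertex degrees
    De = H.sum(axis=0)                                  # hyperedge degrees
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(Dv, 1e-12)))
    De_inv = np.diag(1.0 / np.maximum(De, 1e-12))
    # Vertices -> hyperedges -> vertices.
    return Dv_inv_sqrt @ H @ De_inv @ H.T @ Dv_inv_sqrt @ X

# Three vertices, two hyperedges; vertex 1 sits in both hyperedges.
H = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
X = np.eye(3)  # toy vertex features
out = hypergraph_convolution(X, H)
print(out.shape)  # (3, 3)
```

Because a hyperedge can connect an arbitrary set of vertices, one convolution step already mixes information across a whole session or entity group, which is what an ordinary pairwise graph convolution cannot do in a single hop.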
{"title":"Multi-grained hypergraph interest modeling for conversational recommendation","authors":"Chenzhan Shang , Yupeng Hou , Wayne Xin Zhao , Yaliang Li , Jing Zhang","doi":"10.1016/j.aiopen.2023.10.001","DOIUrl":"https://doi.org/10.1016/j.aiopen.2023.10.001","url":null,"abstract":"<div><p>Conversational recommender system (CRS) interacts with users through multi-turn dialogues in natural language, which aims to provide high-quality recommendations for user’s instant information need. Although great efforts have been made to develop effective CRS, most of them still focus on the contextual information from the current dialogue, usually suffering from the data scarcity issue. Therefore, we consider leveraging historical dialogue data to enrich the limited contexts of the current dialogue session.</p><p>In this paper, we propose a novel multi-grained hypergraph interest modeling approach to capture user interest beneath intricate historical data from different perspectives. As the core idea, we employ <em>hypergraph</em> to represent complicated semantic relations underlying historical dialogues. In our approach, we first employ the hypergraph structure to model users’ historical dialogue sessions and form a <em>session-based hypergraph</em>, which captures <em>coarse-grained, session-level</em> relations. Second, to alleviate the issue of data scarcity, we use an external knowledge graph and construct a <em>knowledge-based hypergraph</em> considering <em>fine-grained, entity-level</em> semantics. We further conduct multi-grained hypergraph convolution on the two kinds of hypergraphs, and utilize the enhanced representations to develop interest-aware CRS. Extensive experiments on two benchmarks <span>ReDial</span> and <span>TG-ReDial</span> validate the effectiveness of our approach on both recommendation and conversation tasks. 
Code is available at: <span>https://github.com/RUCAIBox/MHIM</span><svg><path></path></svg>.</p></div>","PeriodicalId":100068,"journal":{"name":"AI Open","volume":"4 ","pages":"Pages 154-164"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666651023000177/pdfft?md5=845c75e23c419b9a9e76d0939d4efddc&pid=1-s2.0-S2666651023000177-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"92131677","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-01-01  DOI: 10.1016/j.aiopen.2023.08.011
Wanjun Zhong , Yifan Gao , Ning Ding , Zhiyuan Liu , Ming Zhou , Jiahai Wang , Jian Yin , Nan Duan
Task generalization has been a long-standing challenge in Natural Language Processing (NLP). Recent research attempts to improve the task generalization ability of pre-trained language models by mapping NLP tasks into human-readable prompted forms. However, these approaches require laborious and inflexible manual collection of prompts, and different prompts for the same downstream task may yield unstable performance. We propose Unified Schema Prompt, a flexible and extensible prompting method that automatically customizes learnable prompts for each task according to the task's input schema. It models the knowledge shared across tasks while preserving the characteristics of each task's schema, thereby enhancing task generalization ability. The schema prompt uses the explicit data structure of each task to formulate prompts, so little human effort is involved. To test the task generalization ability of the schema prompt at scale, we conduct schema prompt-based multitask pre-training on a wide variety of general NLP tasks. The framework achieves strong zero-shot and few-shot generalization performance on 16 unseen downstream tasks from 8 task types (e.g., QA, NLI, etc.). Furthermore, comprehensive analyses demonstrate the effectiveness of each component of the schema prompt, its flexibility in task compositionality, and its ability to improve performance under a full-data fine-tuning setting.
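The core idea above can be sketched with a toy prompt renderer: each task's input schema (its ordered field names) deterministically yields a prompted layout with learnable slot markers, so no prompt is hand-written per task. The function name, the `[FIELD_i]` marker convention, and the field names are illustrative assumptions, not the paper's implementation (where such slots would map to trainable soft-prompt embeddings).

```python
def schema_prompt(schema, example, n_soft=2):
    """Render an example into a schema-derived prompt string.

    schema:  ordered list of field names, e.g. ["question", "passage"].
    example: dict mapping each field name to its text.
    n_soft:  number of learnable soft-token placeholders per field,
             rendered here as [FIELD_i] markers.
    """
    parts = []
    for field in schema:
        # Per-field soft-prompt slots, derived from the schema alone.
        soft = " ".join(f"[{field.upper()}_{i}]" for i in range(n_soft))
        parts.append(f"{soft} {field}: {example[field]}")
    return " | ".join(parts)

# Two different task types share the same renderer; only schemas differ.
qa = schema_prompt(["question", "passage"],
                   {"question": "Who wrote Hamlet?",
                    "passage": "Hamlet is a tragedy by Shakespeare."})
nli = schema_prompt(["premise", "hypothesis"],
                    {"premise": "A man is running.",
                     "hypothesis": "Someone is moving."})
print(qa)
```

Because the prompt layout is a pure function of the schema, a new task type needs only its field list, which is what makes the approach extensible across the 8 task types evaluated in the paper.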
{"title":"Improving task generalization via unified schema prompt","authors":"Wanjun Zhong , Yifan Gao , Ning Ding , Zhiyuan Liu , Ming Zhou , Jiahai Wang , Jian Yin , Nan Duan","doi":"10.1016/j.aiopen.2023.08.011","DOIUrl":"https://doi.org/10.1016/j.aiopen.2023.08.011","url":null,"abstract":"<div><p>Task generalization has been a long-standing challenge in Natural Language Processing (NLP). Recent research attempts to improve the task generalization ability of pre-trained language models by mapping NLP tasks into human-readable prompted forms. However, these approaches require laborious and inflexible manual collection of prompts, and different prompts on the same downstream task may receive unstable performance. We propose Unified Schema Prompt, a flexible and extensible prompting method, which automatically customizes the learnable prompts for each task according to the task input schema. It models the shared knowledge between tasks, while keeping the characteristics of different task schema, and thus enhances task generalization ability. The schema prompt takes the explicit data structure of each task to formulate prompts so that little human effort is involved. To test the task generalization ability of schema prompt at scale, we conduct schema prompt-based multitask pre-training on a wide variety of general NLP tasks. The framework achieves strong zero-shot and few-shot generalization performance on 16 unseen downstream tasks from 8 task types (e.g., QA, NLI, etc.). 
Furthermore, comprehensive analyses demonstrate the effectiveness of each component in the schema prompt, its flexibility in task compositionality, and its ability to improve performance under a full-data fine-tuning setting.</p></div>","PeriodicalId":100068,"journal":{"name":"AI Open","volume":"4 ","pages":"Pages 120-129"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49710709","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}