In recent years, the size of pre-trained language models (PLMs) has grown by leaps and bounds. However, the efficiency issues of these large-scale PLMs limit their use in real-world scenarios. We present a suite of cost-effective techniques for using PLMs that addresses the efficiency issues of pre-training, fine-tuning, and inference. (1) We introduce knowledge inheritance to accelerate the pre-training process by exploiting existing PLMs instead of training models from scratch. (2) We explore the best practice of prompt tuning with large-scale PLMs. Compared with conventional fine-tuning, prompt tuning significantly reduces the number of task-specific parameters. (3) We implement a new inference toolkit, namely infmoe, for using large-scale PLMs with limited computational resources. Based on our cost-effective pipeline, we pre-train two models: an encoder-decoder bilingual model with 11 billion parameters (CPM-2) and its corresponding MoE version with 198 billion parameters. In our experiments, we compare CPM-2 with mT5 on downstream tasks. Experimental results show that CPM-2 has excellent general language intelligence. Moreover, we validate the efficiency of infmoe when performing inference with large-scale models that have tens of billions of parameters on a single GPU. All source code and model parameters are available at https://github.com/TsinghuaAI/CPM.
{"title":"CPM-2: Large-scale cost-effective pre-trained language models","authors":"Zhengyan Zhang , Yuxian Gu , Xu Han , Shengqi Chen , Chaojun Xiao , Zhenbo Sun, Yuan Yao, Fanchao Qi, Jian Guan, Pei Ke, Yanzheng Cai, Guoyang Zeng, Zhixing Tan, Zhiyuan Liu, Minlie Huang, Wentao Han, Yang Liu, Xiaoyan Zhu, Maosong Sun","doi":"10.1016/j.aiopen.2021.12.003","DOIUrl":"10.1016/j.aiopen.2021.12.003","url":null,"abstract":"<div><p>In recent years, the size of pre-trained language models (PLMs) has grown by leaps and bounds. However, efficiency issues of these large-scale PLMs limit their utilization in real-world scenarios. We present a suite of cost-effective techniques for the use of PLMs to deal with the efficiency issues of pre-training, fine-tuning, and inference. (1) We introduce knowledge inheritance to accelerate the pre-training process by exploiting existing PLMs instead of training models from scratch. (2) We explore the best practice of prompt tuning with large-scale PLMs. Compared with conventional fine-tuning, prompt tuning significantly reduces the number of task-specific parameters. (3) We implement a new inference toolkit, namely <span>infmoe</span>, for using large-scale PLMs with limited computational resources. Based on our cost-effective pipeline, we pre-train two models: an encoder-decoder bilingual model with 11 billion parameters (CPM-2) and its corresponding MoE version with 198 billion parameters. In our experiments, we compare CPM-2 with mT5 on downstream tasks. Experimental results show that CPM-2 has excellent general language intelligence. Moreover, we validate the efficiency of <span>infmoe</span> when conducting inference of large-scale models having tens of billions of parameters on a single GPU. All source code and model parameters are available at <span>https://github.com/TsinghuaAI/CPM</span><svg><path></path></svg>.</p></div>","PeriodicalId":100068,"journal":{"name":"AI Open","volume":"2 ","pages":"Pages 216-224"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666651021000310/pdfft?md5=46efc536c128aefd0ff69139f8627ddb&pid=1-s2.0-S2666651021000310-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90204116","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-01-01. DOI: 10.1016/j.aiopen.2021.07.003
Tiancui Zhang, Xiaoliang Chen, Yajun Du, Xianyong Li
Information propagation models in the Weibo network play a primary role in analyzing user behaviors, obtaining propagation paths, identifying opinion leaders, and discovering hot spots of public opinion. Existing research recognizes the critical role played by information propagation models from different aspects. However, few studies have investigated the specific details of information propagation in a systematic way. Spiking neural P (SNP, for short) systems are among the most promising research carriers for information propagation, owing to their concurrent structures and asynchronous firing rules. This paper proposes a simple and intuitive SNP variant, namely DWIP-SNP, for user behavior analysis in Weibo. The fundamental objects of information propagation in Weibo are represented by a similar SNP formalization. Forwarding, commenting, deleting, and other user behaviors in the Weibo network can thus be observed and processed more intuitively. The DWIP-SNP systems are then combined with time delays to capture dynamic information diffusion from the perspective of bio-computing systems. Finally, a real-world example of information propagation with a Weibo dataset is used to verify the effectiveness and feasibility of the model. The insights gained from the DWIP-SNP-based propagation model may assist user behavior understanding and information propagation analysis in other complex networks.
{"title":"The information propagation model of Weibo network based on spiking neural P systems","authors":"Tiancui Zhang , Xiaoliang Chen , Yajun Du , Xianyong Li","doi":"10.1016/j.aiopen.2021.07.003","DOIUrl":"10.1016/j.aiopen.2021.07.003","url":null,"abstract":"<div><p>Information propagation models in the Weibo network play a primary role in analyzing user behaviors, obtaining the propagation paths, determining the opinion leaders, and discovering the hot spots of public opinion. Existing research recognizes the critical role played by information propagation models from different aspects. However, few studies have investigated the specific details of information propagation in any systematic way. Spiking neural P (SNP, for short) systems are one of the most potential research carriers of information propagation by applying their concurrent structures and asynchronous firing rules. This paper proposes a simple and intuitive SNP variant, namely DWIP-SNP, for user behavior analysis in Weibo. The fundamental objects of information propagation in Weibo are represented by a similar SNP formalization. The forward, comment, delete, and other users’ behaviors in the Weibo network can be observed and proceeded more intuitively. Then, the DWIP-SNP systems are combined with time delays to indicate the dynamic information diffusion from the perspective of the Bio-computing systems. Finally, a real-world example of information propagation with Weibo data set is utilized to verify the effectiveness and feasibility of the model. The insights of the DWIP-SNP based propagation model gained from this study may be of assistance to user behavior understanding and information propagation in other complex networks.</p></div>","PeriodicalId":100068,"journal":{"name":"AI Open","volume":"2 ","pages":"Pages 135-142"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/j.aiopen.2021.07.003","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79850721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-01-01. DOI: 10.1016/j.aiopen.2021.05.002
Jiarong Xu, Junru Chen, Siqi You, Zhiqing Xiao, Yang Yang, Jiangang Lu
Machine learning (ML) technologies have achieved significant success in various downstream tasks, e.g., node classification, link prediction, community detection, graph classification, and graph clustering. However, many studies have shown that models built upon ML technologies are vulnerable to noise and adversarial attacks. A number of works have studied robust models against noise or adversarial examples in the image and text domains; however, learning robust models in graph domains is more challenging, because noise and perturbations of edges or node attributes easily propagate to neighboring nodes through the relational structure of the graph. In this paper, we investigate and summarize existing works that study deep learning models robust to adversarial attacks or noise on graphs, namely robust learning (models) on graphs. Specifically, we first present evaluation metrics for model robustness on graphs. Then, we provide a comprehensive taxonomy that groups robust models on graphs into five categories: anomaly detection, adversarial training, pre-processing, attention mechanisms, and certifiable robustness. Besides, we highlight some promising future directions in learning robust models on graphs. We hope this work can offer insights to relevant researchers and assist their studies.
{"title":"Robustness of deep learning models on graphs: A survey","authors":"Jiarong Xu, Junru Chen, Siqi You, Zhiqing Xiao, Yang Yang, Jiangang Lu","doi":"10.1016/j.aiopen.2021.05.002","DOIUrl":"10.1016/j.aiopen.2021.05.002","url":null,"abstract":"<div><p>Machine learning (ML) technologies have achieved significant success in various downstream tasks, e.g., node classification, link prediction, community detection, graph classification and graph clustering. However, many studies have shown that the models built upon ML technologies are vulnerable to noises and adversarial attacks. A number of works have studied the robust models against noise or adversarial examples in image domains and text processing domains, however, it is more challenging to learn robust models in graph domains. Adding noises or perturbations on graph data will make the robustness even harder to enhance – the noises and perturbations of edges or node attributes are easy to propagate to other neighbors via the relational information on a graph. In this paper, we investigate and summarize the existing works that study the robust deep learning models against adversarial attacks or noises on graphs, namely the robust learning (models) on graphs. Specifically, we first provide some robustness evaluation metrics of model robustness on graphs. Then, we comprehensively provide a taxonomy which groups robust models on graphs into five categories: anomaly detection, adversarial training, pre-processing, attention mechanism, and certifiable robustness. Besides, we emphasize some promising future directions in learning robust models on graphs. Hopefully, our works can offer insights for the relevant researchers, thus providing assistance for their studies.</p></div>","PeriodicalId":100068,"journal":{"name":"AI Open","volume":"2 ","pages":"Pages 69-78"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/j.aiopen.2021.05.002","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78272915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-01-01. DOI: 10.1016/j.aiopen.2021.09.002
Bingshan Zhu, Yang Yu, Mingying Zhang, Haopeng Ren, Canguang Li, Wenjian Hao, Lixi Wang, Yi Cai
Extracting entities and relations jointly is often complicated because relational triplets may overlap. In this paper, we propose a novel unified joint extraction model that considers the significant information that is useful for relation extraction between a pair of entities. We also consider the bidirectional interaction between named entity recognition and relation extraction. To this end, we apply a Bi-LSTM to capture sequential information and a graph convolutional network to capture significant regional information in the encoding part. The decoding part uses a multi-layer structure, consisting of a first decoding layer, an interactive layer, and a final decoding layer, to fuse bidirectional interactive information between named entity recognition and relation extraction. In this way, our method can simultaneously extract all entities and their relations, including overlapping relations. Experimental results show that our model outperforms baseline models on this task, achieving state-of-the-art performance on two public datasets.
{"title":"Incorporating bidirectional interactive information and regional features for relational facts extraction","authors":"Bingshan Zhu , Yang Yu , Mingying Zhang , Haopeng Ren , Canguang Li , Wenjian Hao , Lixi Wang , Yi Cai","doi":"10.1016/j.aiopen.2021.09.002","DOIUrl":"10.1016/j.aiopen.2021.09.002","url":null,"abstract":"<div><p>Extracting entity and relation jointly is often complicated since the relational triplets may be overlapped. In this paper, we propose a novel unified joint extraction model that considers the significant information which is useful for relation extraction between a pair of entities. We also consider bidirectional interaction between named entity recognition and relation extraction. To this end, we apply Bi-LSTM to capture sequential information and use Graph Convolutional Network to capture significant regional information in our encoding part. We use multi-layer structure in decoding part including first decode layer, interactive layer and final decode layer to fuse bidirectional interactive information between named entity recognition and relation extraction. In this way, our method can simultaneously extract all entities and their relations including overlapping relations. Experimental results show that our model performs better comparing with other baseline models in this task, and we achieve state-of-the-art performance on two public datasets.</p></div>","PeriodicalId":100068,"journal":{"name":"AI Open","volume":"2 ","pages":"Pages 175-185"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666651021000255/pdfft?md5=97db58ca1e40caebd6ee57606b699005&pid=1-s2.0-S2666651021000255-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84724666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-01-01. DOI: 10.1016/j.aiopen.2021.08.002
Xu Han, Zhengyan Zhang, Ning Ding, Yuxian Gu, Xiao Liu, Yuqi Huo, Jiezhong Qiu, Yuan Yao, Ao Zhang, Liang Zhang, Wentao Han, Minlie Huang, Qin Jin, Yanyan Lan, Yang Liu, Zhiyuan Liu, Zhiwu Lu, Xipeng Qiu, Ruihua Song, Jie Tang, Jun Zhu
Large-scale pre-trained models (PTMs) such as BERT and GPT have recently achieved great success and become a milestone in the field of artificial intelligence (AI). Owing to sophisticated pre-training objectives and huge model parameters, large-scale PTMs can effectively capture knowledge from massive labeled and unlabeled data. By storing knowledge in huge parameters and fine-tuning on specific tasks, the rich knowledge implicitly encoded in these parameters can benefit a variety of downstream tasks, which has been extensively demonstrated via experimental verification and empirical analysis. It is now the consensus of the AI community to adopt PTMs as the backbone for downstream tasks rather than learning models from scratch. In this paper, we take a deep look into the history of pre-training, especially its special relation with transfer learning and self-supervised learning, to reveal the crucial position of PTMs in the AI development spectrum. Further, we comprehensively review the latest breakthroughs of PTMs. These breakthroughs, driven by the surge of computational power and the increasing availability of data, move along four important directions: designing effective architectures, utilizing rich contexts, improving computational efficiency, and conducting interpretation and theoretical analysis. Finally, we discuss a series of open problems and research directions for PTMs, and hope our view can inspire and advance the future study of PTMs.
{"title":"Pre-trained models: Past, present and future","authors":"Xu Han , Zhengyan Zhang , Ning Ding , Yuxian Gu , Xiao Liu , Yuqi Huo , Jiezhong Qiu , Yuan Yao , Ao Zhang , Liang Zhang , Wentao Han , Minlie Huang , Qin Jin , Yanyan Lan , Yang Liu , Zhiyuan Liu , Zhiwu Lu , Xipeng Qiu , Ruihua Song , Jie Tang , Jun Zhu","doi":"10.1016/j.aiopen.2021.08.002","DOIUrl":"10.1016/j.aiopen.2021.08.002","url":null,"abstract":"<div><p>Large-scale pre-trained models (PTMs) such as BERT and GPT have recently achieved great success and become a milestone in the field of artificial intelligence (AI). Owing to sophisticated pre-training objectives and huge model parameters, large-scale PTMs can effectively capture knowledge from massive labeled and unlabeled data. By storing knowledge into huge parameters and fine-tuning on specific tasks, the rich knowledge implicitly encoded in huge parameters can benefit a variety of downstream tasks, which has been extensively demonstrated via experimental verification and empirical analysis. It is now the consensus of the AI community to adopt PTMs as backbone for downstream tasks rather than learning models from scratch. In this paper, we take a deep look into the history of pre-training, especially its special relation with transfer learning and self-supervised learning, to reveal the crucial position of PTMs in the AI development spectrum. Further, we comprehensively review the latest breakthroughs of PTMs. These breakthroughs are driven by the surge of computational power and the increasing availability of data, towards four important directions: designing effective architectures, utilizing rich contexts, improving computational efficiency, and conducting interpretation and theoretical analysis. Finally, we discuss a series of open problems and research directions of PTMs, and hope our view can inspire and advance the future study of PTMs.</p></div>","PeriodicalId":100068,"journal":{"name":"AI Open","volume":"2 ","pages":"Pages 225-250"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/j.aiopen.2021.08.002","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76058793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-01-01. DOI: 10.1016/j.aiopen.2021.03.001
Jing Zhang, Bo Chen, Lingxi Zhang, Xirui Ke, Haipeng Ding
Knowledge graph reasoning is a fundamental component supporting machine learning applications such as information extraction, information retrieval, and recommendation. Since knowledge graphs can be viewed as discrete symbolic representations of knowledge, reasoning on knowledge graphs can naturally leverage symbolic techniques. However, symbolic reasoning is intolerant of ambiguous and noisy data. In contrast, recent advances in deep learning have promoted neural reasoning on knowledge graphs, which is robust to ambiguous and noisy data but lacks the interpretability of symbolic reasoning. Considering the advantages and disadvantages of both methodologies, recent efforts have been made to combine the two reasoning methods. In this survey, we take a thorough look at the development of symbolic, neural, and hybrid reasoning on knowledge graphs. We survey two specific reasoning tasks, knowledge graph completion and question answering on knowledge graphs, and explain them in a unified reasoning framework. We also briefly discuss future directions for knowledge graph reasoning.
{"title":"Neural, symbolic and neural-symbolic reasoning on knowledge graphs","authors":"Jing Zhang, Bo Chen, Lingxi Zhang, Xirui Ke, Haipeng Ding","doi":"10.1016/j.aiopen.2021.03.001","DOIUrl":"10.1016/j.aiopen.2021.03.001","url":null,"abstract":"<div><p>Knowledge graph reasoning is the fundamental component to support machine learning applications such as information extraction, information retrieval, and recommendation. Since knowledge graphs can be viewed as the discrete symbolic representations of knowledge, reasoning on knowledge graphs can naturally leverage the symbolic techniques. However, symbolic reasoning is intolerant of the ambiguous and noisy data. On the contrary, the recent advances of deep learning have promoted neural reasoning on knowledge graphs, which is robust to the ambiguous and noisy data, but lacks interpretability compared to symbolic reasoning. Considering the advantages and disadvantages of both methodologies, recent efforts have been made on combining the two reasoning methods. In this survey, we take a thorough look at the development of the symbolic, neural and hybrid reasoning on knowledge graphs. We survey two specific reasoning tasks — knowledge graph completion and question answering on knowledge graphs, and explain them in a unified reasoning framework. We also briefly discuss the future directions for knowledge graph reasoning.</p></div>","PeriodicalId":100068,"journal":{"name":"AI Open","volume":"2 ","pages":"Pages 14-35"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/j.aiopen.2021.03.001","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73071933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-01-01. DOI: 10.1016/j.aiopen.2021.08.001
Shunyu Jiang, Fuli Feng, Weijian Chen, Xiang Li, Xiangnan He
Graph classification is a highly impactful task that plays a crucial role in a myriad of real-world applications such as molecular property prediction and protein function prediction. Aiming to handle new classes with limited labeled graphs, few-shot graph classification has become a bridge between existing graph classification solutions and practical usage. This work explores the potential of metric-based meta-learning for solving few-shot graph classification. We highlight the importance of considering structural characteristics in the solution and propose a novel framework that explicitly considers the global structure and local structure of the input graph. An implementation upon GIN, named SMF-GIN, is tested on two datasets, Chembl and TRIANGLES, where extensive experiments validate the effectiveness of the proposed method. Chembl is constructed to fill the gap of a missing large-scale benchmark for few-shot graph classification evaluation; it is released together with the implementation of SMF-GIN at: https://github.com/jiangshunyu/SMF-GIN.
{"title":"Structure-enhanced meta-learning for few-shot graph classification","authors":"Shunyu Jiang , Fuli Feng , Weijian Chen , Xiang Li , Xiangnan He","doi":"10.1016/j.aiopen.2021.08.001","DOIUrl":"10.1016/j.aiopen.2021.08.001","url":null,"abstract":"<div><p>Graph classification is a highly impactful task that plays a crucial role in a myriad of real-world applications such as molecular property prediction and protein function prediction. Aiming to handle the new classes with limited labeled graphs, few-shot graph classification has become a bridge of existing graph classification solutions and practical usage. This work explores the potential of metric-based meta-learning for solving few-shot graph classification. We highlight the importance of considering structural characteristics in the solution and propose a novel framework which explicitly considers <em>global structure</em> and <em>local structure</em> of the input graph. An implementation upon GIN, named SMF-GIN, is tested on two datasets, Chembl and TRIANGLES, where extensive experiments validate the effectiveness of the proposed method. The Chembl is constructed to fill in the gap of lacking large-scale benchmark for few-shot graph classification evaluation, which is released together with the implementation of SMF-GIN at: <span>https://github.com/jiangshunyu/SMF-GIN</span><svg><path></path></svg>.</p></div>","PeriodicalId":100068,"journal":{"name":"AI Open","volume":"2 ","pages":"Pages 160-167"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/j.aiopen.2021.08.001","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87440687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-01-01. DOI: 10.1016/j.aiopen.2021.06.002
Chongming Gao, Wenqiang Lei, Xiangnan He, Maarten de Rijke, Tat-Seng Chua
Recommender systems exploit interaction history to estimate user preference and have been heavily used in a wide range of industry applications. However, static recommendation models struggle to answer two important questions well due to inherent shortcomings: (a) What exactly does a user like? (b) Why does a user like an item? The shortcomings are due to the way static models learn user preference, i.e., without explicit instructions and active feedback from users. The recent rise of conversational recommender systems (CRSs) changes this situation fundamentally. In a CRS, users and the system can dynamically communicate through natural language interactions, which provide unprecedented opportunities to explicitly obtain the exact preferences of users. Considerable efforts, spread across disparate settings and applications, have been put into developing CRSs. However, existing models, technologies, and evaluation methods for CRSs are far from mature. In this paper, we provide a systematic review of the techniques used in current CRSs. We summarize the key challenges of developing CRSs in five directions: (1) question-based user preference elicitation; (2) multi-turn conversational recommendation strategies; (3) dialogue understanding and generation; (4) exploitation-exploration trade-offs; and (5) evaluation and user simulation. These research directions involve multiple research fields such as information retrieval (IR), natural language processing (NLP), and human-computer interaction (HCI). Based on these research directions, we discuss some future challenges and opportunities, and provide a road map for researchers from multiple communities to get started in this area. We hope this survey can help to identify and address challenges in CRSs and inspire future research.
{"title":"Advances and challenges in conversational recommender systems: A survey","authors":"Chongming Gao , Wenqiang Lei , Xiangnan He , Maarten de Rijke , Tat-Seng Chua","doi":"10.1016/j.aiopen.2021.06.002","DOIUrl":"10.1016/j.aiopen.2021.06.002","url":null,"abstract":"<div><p>Recommender systems exploit interaction history to estimate user preference, having been heavily used in a wide range of industry applications. However, static recommendation models are difficult to answer two important questions well due to inherent shortcomings: (a) What exactly does a user like? (b) Why does a user like an item? The shortcomings are due to the way that static models learn user preference, i.e., without explicit instructions and active feedback from users. The recent rise of conversational recommender systems (CRSs) changes this situation fundamentally. In a CRS, users and the system can dynamically communicate through natural language interactions, which provide unprecedented opportunities to explicitly obtain the exact preference of users. Considerable efforts, spread across disparate settings and applications, have been put into developing CRSs. Existing models, technologies, and evaluation methods for CRSs are far from mature. In this paper, we provide a systematic review of the techniques used in current CRSs. We summarize the key challenges of developing CRSs in five directions: (1) Question-based user preference elicitation. (2) Multi-turn conversational recommendation strategies. (3) Dialogue understanding and generation. (4) Exploitation-exploration trade-offs. (5) Evaluation and user simulation. These research directions involve multiple research fields like information retrieval (IR), natural language processing (NLP), and human-computer interaction (HCI). Based on these research directions, we discuss some future challenges and opportunities. We provide a road map for researchers from multiple communities to get started in this area. We hope this survey can help to identify and address challenges in CRSs and inspire future research.</p></div>","PeriodicalId":100068,"journal":{"name":"AI Open","volume":"2 ","pages":"Pages 100-126"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/j.aiopen.2021.06.002","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88248612","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-01-01. DOI: 10.1016/j.aiopen.2021.02.001
Xueyi Liu, Jie Tang
Graphs are a universal data structure widely used to organize real-world data. Various real-world networks, such as transportation networks and social and academic networks, can be represented by graphs. Recent years have witnessed the rapid development of methods that represent the vertices of a network in a low-dimensional vector space, referred to as network representation learning. Representation learning can facilitate the design of new algorithms on graph data. In this survey, we conduct a comprehensive review of the current literature on network representation learning. Existing algorithms can be categorized into three groups: shallow embedding models, heterogeneous network embedding models, and graph neural network based models. We review state-of-the-art algorithms for each category and discuss the essential differences between them. One advantage of this survey is that we systematically study the theoretical foundations underlying the different categories of algorithms, which offers deep insights for better understanding the development of the network representation learning field.
{"title":"Network representation learning: A macro and micro view","authors":"Xueyi Liu , Jie Tang","doi":"10.1016/j.aiopen.2021.02.001","DOIUrl":"10.1016/j.aiopen.2021.02.001","url":null,"abstract":"<div><p>Abstract</p><p>Graph is a universe data structure that is widely used to organize data in real-world. Various real-word networks like the transportation network, social and academic network can be represented by graphs. Recent years have witnessed the quick development on representing vertices in the network into a low-dimensional vector space, referred to as network representation learning. Representation learning can facilitate the design of new algorithms on the graph data. In this survey, we conduct a comprehensive review of current literature on network representation learning. Existing algorithms can be categorized into three groups: shallow embedding models, heterogeneous network embedding models, graph neural network based models. We review state-of-the-art algorithms for each category and discuss the essential differences between these algorithms. One advantage of the survey is that we systematically study the underlying theoretical foundations underlying the different categories of algorithms, which offers deep insights for better understanding the development of the network representation learning field.</p></div>","PeriodicalId":100068,"journal":{"name":"AI Open","volume":"2 ","pages":"Pages 43-64"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/j.aiopen.2021.02.001","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89127453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-01-01. DOI: 10.1016/j.aiopen.2021.05.001
Apurwa Yadav, Aarshil Patel, Manan Shah
Natural language processing (NLP) is the well-known technology behind the development of some widely known AI assistants such as SIRI, Natasha, and Watson. However, NLP is a diverse technology used for numerous purposes. NLP-based tools are widely used for disambiguation in requirements engineering (RE), which is the primary focus of this paper. A requirement document is a medium for users to deliver their expectations of the software; hence, an ambiguous requirement document may eventually lead to misconceptions in the software. Various tools based on different techniques are available for disambiguation in RE. In this paper, we analyze different disambiguation tools in order to compare and evaluate them. In our survey, we noticed that even though some disambiguation tools show promising results and can supposedly be relied upon, they fail to completely eliminate ambiguities. To avoid ambiguities, the requirement document has to be written in a formal language, which users do not prefer due to its lack of lucidity and readability. Nevertheless, some of the tools mentioned in this paper are still under development and might become capable of eliminating ambiguities in the future. We analyze existing research work and present an elaborate review of various disambiguation tools.
{"title":"A comprehensive review on resolving ambiguities in natural language processing","authors":"Apurwa Yadav , Aarshil Patel , Manan Shah","doi":"10.1016/j.aiopen.2021.05.001","DOIUrl":"10.1016/j.aiopen.2021.05.001","url":null,"abstract":"<div><p>Natural language processing is a known technology behind the development of some widely known AI assistants such as: SIRI, Natasha, and Watson. However, NLP is a diverse technology used for numerous purposes. NLP based tools are widely used for disambiguation in requirement engineering which will be the primary focus of this paper. A requirement document is a medium for the user to deliver one's expectations from the software. Hence, an ambiguous requirement document may eventually lead to misconceptions in a software. Various tools are available for disambiguation in RE based on different techniques. In this paper, we analyzed different disambiguation tools in order to compare and evaluate them. In our survey, we noticed that even though some disambiguation tools reflect promising results and can supposedly be relied upon, they fail to completely eliminate the ambiguities. In order to avoid ambiguities, the requirement document has to be written using formal language, which is not preferred by users due to its lack of lucidity and readability. Nevertheless, some of the tools we mentioned in this paper are still under development and in future might become capable of eliminating ambiguities. In this paper, we attempt to analyze some existing research work and present an elaborative review of various disambiguation tools.</p></div>","PeriodicalId":100068,"journal":{"name":"AI Open","volume":"2 ","pages":"Pages 85-92"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/j.aiopen.2021.05.001","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84363444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}