Pub Date: 2023-01-01. DOI: 10.1016/j.aiopen.2023.08.001
Qingyao Ai , Ting Bai , Zhao Cao , Yi Chang , Jiawei Chen , Zhumin Chen , Zhiyong Cheng , Shoubin Dong , Zhicheng Dou , Fuli Feng , Shen Gao , Jiafeng Guo , Xiangnan He , Yanyan Lan , Chenliang Li , Yiqun Liu , Ziyu Lyu , Weizhi Ma , Jun Ma , Zhaochun Ren , Xiaofei Zhu
The research field of Information Retrieval (IR) has evolved significantly, expanding beyond traditional search to meet diverse user information needs. Recently, Large Language Models (LLMs) have demonstrated exceptional capabilities in text understanding, generation, and knowledge inference, opening up exciting avenues for IR research. LLMs not only facilitate generative retrieval but also offer improved solutions for user understanding, model evaluation, and user-system interactions. More importantly, the synergistic relationship among IR models, LLMs, and humans forms a new technical paradigm that is more powerful for information seeking. IR models provide real-time and relevant information, LLMs contribute internal knowledge, and humans play a central role as demanders and evaluators, ensuring the reliability of information services. Nevertheless, significant challenges exist, including computational costs, credibility concerns, domain-specific limitations, and ethical considerations. To thoroughly discuss the transformative impact of LLMs on IR research, the Chinese IR community conducted a strategic workshop in April 2023, yielding valuable insights. This paper provides a summary of the workshop's outcomes, including the rethinking of IR's core values, the mutual enhancement of LLMs and IR, the proposal of a novel IR technical paradigm, and open challenges.
Title: "Information Retrieval meets Large Language Models: A strategic report from Chinese IR community" (AI Open, vol. 4, 2023, pp. 80-90)
Pub Date: 2023-01-01. DOI: 10.1016/j.aiopen.2023.08.002
Rui Feng , Qi Ding , Weihao Qiu , Xiao Yang , Yang Yang , Chunping Wang
Traditional network embedding aims to learn representations by capturing a predefined vertex-to-vertex similarity measure. However, in practice, there are different types of similarity measures (e.g., connectivity and structural similarity), which are appropriate for different downstream applications. Meanwhile, it is hard to select the "best" similarity measure for a given application, since doing so requires domain knowledge of both the application scenario and network science. Moreover, these similarity measures sometimes need to be combined with each other to achieve better performance. Therefore, automatically integrating multiple types of similarity measures into a unified network embedding framework is critical for obtaining effective vertex representations for a downstream application. In this paper, we address the above problem in social networks and propose a semi-supervised representation learning algorithm. The general idea of our approach is to impose social influence, which occurs when one's opinions, emotions, or behaviors are affected by others in a social network. In particular, we build a connection between a user's representation vector and the probability of her being influenced by another user to have a particular label (e.g., fraud, personal interest, etc.). We conduct extensive experiments on six real-world datasets and find a clear improvement of our approach compared with several state-of-the-art baselines.
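The general setting of fusing several predefined similarity measures into one embedding can be illustrated with a minimal sketch (a weighted matrix combination followed by truncated SVD); this is a generic illustration of the problem, not the paper's semi-supervised algorithm:

```python
import numpy as np

def unified_embedding(similarity_matrices, weights, dim=16):
    """Fuse several vertex-to-vertex similarity matrices (e.g., one for
    connectivity, one for structural similarity) into a single embedding:
    weighted combination, then truncated SVD. A generic illustration of
    the problem setting, not the paper's algorithm."""
    fused = sum(w * S for w, S in zip(weights, similarity_matrices))
    fused = (fused + fused.T) / 2  # symmetrize before factorizing
    U, s, _ = np.linalg.svd(fused)
    # Scale the leading singular vectors to obtain vertex representations.
    return U[:, :dim] * np.sqrt(s[:dim])

# Two toy similarity views over five vertices.
rng = np.random.default_rng(0)
A, B = rng.random((5, 5)), rng.random((5, 5))
emb = unified_embedding([A, B], weights=[0.7, 0.3], dim=2)
print(emb.shape)  # (5, 2)
```

The `weights` vector is where a semi-supervised method would inject application-specific signal; here it is fixed by hand.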
Title: "A unified network embedding algorithm for multi-type similarity measures" (AI Open, vol. 4, 2023, pp. 64-72)
Pub Date: 2023-01-01. DOI: 10.1016/j.aiopen.2023.10.004
Liner Yang , Xin Liu , Tianxin Liao , Zhenghao Liu , Mengyan Wang , Xuezhi Fang , Erhong Yang
The task of Chinese Spelling Check (CSC) is crucial for identifying and rectifying spelling errors in Chinese texts. While prior work in this domain has predominantly relied on benchmarks such as SIGHAN for evaluating model performance, these benchmarks often exhibit an imbalanced distribution of spelling errors. They are typically constructed under idealized conditions, presuming the presence of only spelling errors in the input text. This assumption does not hold in real-world scenarios, where spell checkers frequently encounter a mix of spelling and grammatical errors, thereby presenting additional challenges. To address this gap and create a more realistic testing environment, we introduce a high-quality CSC evaluation benchmark named YACSC (Yet Another Chinese Spelling Check Dataset). YACSC is unique in that it includes annotations for both grammatical and spelling errors, rendering it a more reliable benchmark for CSC tasks. Furthermore, we propose a hierarchical network designed to integrate multidimensional information, leveraging semantic and phonetic aspects, as well as the structural forms of Chinese characters, to enhance the detection and correction of spelling errors. Through extensive experiments, we evaluate the limitations of existing CSC benchmarks and illustrate the application of our proposed system in real-world scenarios, particularly as a preliminary stage in writing assistant systems.
Title: "Is Chinese Spelling Check ready? Understanding the correction behavior in real-world scenarios" (AI Open, vol. 4, 2023, pp. 183-192)
Pub Date: 2023-01-01. DOI: 10.1016/j.aiopen.2022.12.003
Lingxi Zhang , Jing Zhang , Xirui Ke , Haoyang Li , Xinmei Huang , Zhonghui Shao , Shulin Cao , Xin Lv
Answering complex factual questions has drawn a lot of attention. Researchers leverage various data sources to support complex QA, such as unstructured texts, structured knowledge graphs and relational databases, semi-structured web tables, or even hybrid data sources. However, although the ideas behind these approaches are similar to some extent, there is not yet a consistent strategy for dealing with the various data sources. In this survey, we carefully examine how complex factual question answering has evolved across various data sources. We list the similarities among these approaches and group them into the analysis–extend–reason framework, despite the various question types and data sources that they focus on. We also discuss future directions for complex factual question answering as well as the relevant benchmarks.
Title: "A survey on complex factual question answering" (AI Open, vol. 4, 2023, pp. 1-12)
Pub Date: 2023-01-01. DOI: 10.1016/j.aiopen.2023.08.006
Nazar Zaki , Wenjian Qin , Anusuya Krishnan
Cervical cancer remains a significant health concern worldwide, where precise segmentation of cervical lesions is integral for effective diagnosis and treatment planning. This systematic review critically evaluates the application of graph-based methodologies for cervical cancer segmentation, identifying their potential, drawbacks, and avenues for future development. An exhaustive literature search across Scopus and PubMed databases resulted in 20 pertinent studies. These studies were assessed focusing on their implementation of graph-based techniques for cervical cancer segmentation, the utilized datasets, evaluation metrics, and reported precision levels. The review highlights the progressive strides made in the field, especially regarding the segmentation of intricate, non-convex regions and facilitating the detection and grading of cervical cancer using graph-based methodologies. Nonetheless, several constraints were evident, including a dearth of comparative performance analysis, reliance on high-resolution images, difficulties in specific boundary delineation, and the imperative for additional validation and diversified datasets. The review suggests future work to integrate advanced deep learning strategies for heightened accuracy, formulate hybrid methodologies to counteract existing limitations, and explore multi-modal fusion to boost segmentation precision. Emphasizing the explainability and interpretability of outcomes also stands paramount. Lastly, addressing critical challenges such as scarcity of annotated data, the need for real-time and interactive segmentation, and the segmentation of multiple objects or regions of interest remains a crucial frontier for future endeavors.
Title: "Graph-based methods for cervical cancer segmentation: Advancements, limitations, and future directions" (AI Open, vol. 4, 2023, pp. 42-55)
Pub Date: 2023-01-01. DOI: 10.1016/j.aiopen.2023.12.001
Hadi Abdine , Moussa Kamal Eddine , Davide Buscaldi , Michalis Vazirgiannis
Word sense induction (WSI) is a challenging problem in natural language processing that involves the unsupervised automatic detection of a word’s senses (i.e., meanings). Recent work achieves significant results on the WSI task by pre-training a language model that can exclusively disambiguate word senses. In contrast, others employ off-the-shelf pre-trained language models with additional strategies to induce senses. This paper proposes a novel unsupervised method based on hierarchical clustering and invariant information clustering (IIC). The IIC loss is used to train a small model to optimize the mutual information between two vector representations of a target word occurring in a pair of synthetic paraphrases. This model is later used in inference mode to extract a higher-quality vector representation to be used in the hierarchical clustering. We evaluate our method on two WSI tasks and in two distinct clustering configurations (fixed and dynamic number of clusters). We empirically show that our approach is at least on par with the state-of-the-art baselines, outperforming them in several configurations. The code and data to reproduce this work are available to the public.
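The IIC objective at the core of the method can be sketched in a few lines; the encoder that produces the soft cluster assignments for the two paraphrases is omitted, and shapes and names are illustrative:

```python
import numpy as np

def iic_mutual_information(p_x, p_xp, eps=1e-8):
    """Invariant Information Clustering (IIC) objective: the mutual
    information between soft cluster assignments of the same target word
    in two paraphrases. p_x and p_xp have shape (batch, n_clusters);
    training maximizes the returned value (i.e., minimizes its negation)."""
    joint = p_x.T @ p_xp / p_x.shape[0]  # joint distribution over cluster pairs
    joint = (joint + joint.T) / 2        # symmetrize: pair order is arbitrary
    joint = joint / joint.sum()
    marg_i = joint.sum(axis=1, keepdims=True)
    marg_j = joint.sum(axis=0, keepdims=True)
    return np.sum(joint * (np.log(joint + eps)
                           - np.log(marg_i + eps)
                           - np.log(marg_j + eps)))

# Consistent assignments across paraphrases yield high MI; uniform ones ~0.
aligned = np.array([[1., 0.], [0., 1.], [1., 0.], [0., 1.]])
uniform = np.full((4, 2), 0.5)
mi_hi = iic_mutual_information(aligned, aligned)
mi_lo = iic_mutual_information(uniform, uniform)
```

Maximizing this quantity pushes the small model toward assignments that are stable under paraphrasing, which is what makes the resulting vectors better inputs for the later agglomerative clustering.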
Title: "Word sense induction with agglomerative clustering and mutual information maximization" (AI Open, vol. 4, 2023, pp. 193-201)
Pub Date: 2023-01-01. DOI: 10.1016/j.aiopen.2023.08.009
Wenlong Fang, Yongbin Liu, Chunping Ouyang, Lin Ren, Jiale Li, Yaping Wan
Few-shot Named Entity Recognition (NER) is a challenging task that involves identifying new entity types using a limited number of labeled training instances. Currently, the majority of few-shot NER methods are span-based: they focus on the boundary information of candidate entity spans and on entity-level information. However, these methods often overlook token-level semantic information, which can limit their effectiveness. To address this issue, we propose a novel Joint Span and Token (JST) framework that integrates both the boundary information of an entity and the semantic information of each token that comprises it. The JST framework employs span features to extract the boundary features of the entity and token features to extract the semantic features of each token. Additionally, to reduce the negative impact of the Other class, we introduce a method to separate named entities from the Other class in semantic space, which helps to improve the distinction between entities and the Other class. We also use GPT for data augmentation on the support sentences, generating sentences similar to the originals; these augmented sentences increase sample diversity and the reliability of our model. Our experimental results on the Few-NERD and SNIPS datasets demonstrate that our model outperforms existing methods.
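A minimal sketch of the joint span-and-token scoring idea: the feature extractors, the additive fusion, and the prototype construction below are illustrative assumptions, not the paper's exact architecture:

```python
import numpy as np

def jst_span_score(tokens, start, end, proto_span, proto_token):
    """Hypothetical sketch of joint span-and-token scoring for few-shot
    NER: a candidate span is scored by combining (a) a boundary (span)
    feature and (b) averaged token-level features, each compared against
    a class prototype by negative Euclidean distance."""
    span_feat = np.concatenate([tokens[start], tokens[end]])  # boundary info
    token_feat = tokens[start:end + 1].mean(axis=0)           # token semantics
    span_sim = -np.linalg.norm(span_feat - proto_span)        # span-level match
    token_sim = -np.linalg.norm(token_feat - proto_token)     # token-level match
    return span_sim + token_sim

# Score a candidate span (tokens 1..3) against toy class prototypes.
rng = np.random.default_rng(1)
tokens = rng.random((5, 4))  # 5 tokens, 4-dim contextual features
score = jst_span_score(tokens, 1, 3, np.zeros(8), np.zeros(4))
```

The point of the token term is that two spans with identical boundaries but different interior tokens receive different scores, which a purely span-level model cannot express.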
Title: "Joint span and token framework for few-shot named entity recognition" (AI Open, vol. 4, 2023, pp. 111-119)
Pub Date: 2023-01-01. DOI: 10.1016/j.aiopen.2023.08.010
Zeyuan Yang , Zonghan Yang , Yichen Liu , Peng Li , Yang Liu
Continual learning aims to avoid catastrophic forgetting and to effectively leverage learned experiences when mastering new knowledge. Existing gradient projection approaches impose hard constraints on the optimization space for new tasks to minimize interference, which simultaneously hinders forward knowledge transfer. To address this issue, recent methods reuse frozen parameters with a growing network, resulting in high computational costs. It thus remains an open question whether forward knowledge transfer can be improved for gradient projection approaches using a fixed network architecture. In this work, we propose the Restricted Orthogonal Gradient prOjection (ROGO) framework. The basic idea is to adopt a restricted orthogonal constraint that allows parameters to be optimized in directions oblique to the whole frozen space, facilitating forward knowledge transfer while consolidating previous knowledge. Our framework requires neither data buffers nor extra parameters. Extensive experiments have demonstrated the superiority of our framework over several strong baselines. We also provide theoretical guarantees for our relaxation strategy.
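The contrast between strict orthogonal projection and a restricted (oblique) variant can be sketched as follows; the scalar `relax` is an illustrative knob standing in for the paper's restricted constraint, not its exact formulation:

```python
import numpy as np

def restricted_orthogonal_projection(grad, frozen_basis, relax=0.2):
    """Sketch of restricted orthogonal gradient projection. Strict
    orthogonal projection (relax=0) removes the entire component of the
    new task's gradient lying in the frozen subspace, which blocks
    forward transfer; retaining a fraction of that component makes the
    update oblique to the frozen space."""
    M = frozen_basis                   # (d, k) with orthonormal columns
    in_subspace = M @ (M.T @ grad)     # component that interferes with old tasks
    return grad - (1.0 - relax) * in_subspace

# Frozen subspace spanned by e1 in R^2.
M = np.array([[1.0], [0.0]])
g = np.array([1.0, 1.0])
print(restricted_orthogonal_projection(g, M, relax=0.0))  # [0. 1.]  strict
print(restricted_orthogonal_projection(g, M, relax=1.0))  # [1. 1.]  unconstrained
```

Intermediate values of `relax` interpolate between the two regimes, which is the trade-off between consolidation and forward transfer that the restricted constraint controls.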
Title: "Restricted orthogonal gradient projection for continual learning" (AI Open, vol. 4, 2023, pp. 98-110)
Pub Date: 2023-01-01. DOI: 10.1016/j.aiopen.2023.10.001
Chenzhan Shang , Yupeng Hou , Wayne Xin Zhao , Yaliang Li , Jing Zhang
A conversational recommender system (CRS) interacts with users through multi-turn dialogues in natural language, aiming to provide high-quality recommendations for users' immediate information needs. Although great efforts have been made to develop effective CRSs, most still focus on the contextual information of the current dialogue and usually suffer from the data scarcity issue. Therefore, we consider leveraging historical dialogue data to enrich the limited context of the current dialogue session.
In this paper, we propose a novel multi-grained hypergraph interest modeling approach to capture user interest in intricate historical data from different perspectives. As the core idea, we employ hypergraphs to represent the complicated semantic relations underlying historical dialogues. In our approach, we first employ the hypergraph structure to model users' historical dialogue sessions and form a session-based hypergraph, which captures coarse-grained, session-level relations. Second, to alleviate the issue of data scarcity, we use an external knowledge graph and construct a knowledge-based hypergraph that considers fine-grained, entity-level semantics. We further conduct multi-grained hypergraph convolution on the two kinds of hypergraphs and utilize the enhanced representations to develop an interest-aware CRS. Extensive experiments on two benchmarks, ReDial and TG-ReDial, validate the effectiveness of our approach on both recommendation and conversation tasks. Code is available at: https://github.com/RUCAIBox/MHIM.
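The hyperedge propagation step underlying such models can be sketched with a standard, simplified hypergraph convolution; the learnable weight matrix of a full layer is omitted for brevity:

```python
import numpy as np

def hypergraph_convolution(X, H):
    """One propagation step of a simplified hypergraph convolution:
    vertex features are aggregated into hyperedges (e.g., dialogue
    sessions, or entity groups from a knowledge graph) and then
    redistributed to vertices, with symmetric degree normalization."""
    Dv = H.sum(axis=1)                                  # vertex degrees
    De = H.sum(axis=0)                                  # hyperedge degrees
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(Dv, 1e-12)))
    De_inv = np.diag(1.0 / np.maximum(De, 1e-12))
    # Vertices -> hyperedges -> vertices.
    return Dv_inv_sqrt @ H @ De_inv @ H.T @ Dv_inv_sqrt @ X

# Three vertices, two hyperedges; vertex 1 sits in both hyperedges.
H = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
X = np.eye(3)  # toy vertex features
out = hypergraph_convolution(X, H)
print(out.shape)  # (3, 3)
```

Because a hyperedge can connect an arbitrary set of vertices, one convolution step already mixes information across a whole session or entity group, which is what an ordinary pairwise graph convolution cannot do in a single hop.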
{"title":"Multi-grained hypergraph interest modeling for conversational recommendation","authors":"Chenzhan Shang , Yupeng Hou , Wayne Xin Zhao , Yaliang Li , Jing Zhang","doi":"10.1016/j.aiopen.2023.10.001","DOIUrl":"https://doi.org/10.1016/j.aiopen.2023.10.001","url":null,"abstract":"<div><p>Conversational recommender system (CRS) interacts with users through multi-turn dialogues in natural language, which aims to provide high-quality recommendations for user’s instant information need. Although great efforts have been made to develop effective CRS, most of them still focus on the contextual information from the current dialogue, usually suffering from the data scarcity issue. Therefore, we consider leveraging historical dialogue data to enrich the limited contexts of the current dialogue session.</p><p>In this paper, we propose a novel multi-grained hypergraph interest modeling approach to capture user interest beneath intricate historical data from different perspectives. As the core idea, we employ <em>hypergraph</em> to represent complicated semantic relations underlying historical dialogues. In our approach, we first employ the hypergraph structure to model users’ historical dialogue sessions and form a <em>session-based hypergraph</em>, which captures <em>coarse-grained, session-level</em> relations. Second, to alleviate the issue of data scarcity, we use an external knowledge graph and construct a <em>knowledge-based hypergraph</em> considering <em>fine-grained, entity-level</em> semantics. We further conduct multi-grained hypergraph convolution on the two kinds of hypergraphs, and utilize the enhanced representations to develop interest-aware CRS. Extensive experiments on two benchmarks <span>ReDial</span> and <span>TG-ReDial</span> validate the effectiveness of our approach on both recommendation and conversation tasks. 
Code is available at: <span>https://github.com/RUCAIBox/MHIM</span><svg><path></path></svg>.</p></div>","PeriodicalId":100068,"journal":{"name":"AI Open","volume":"4 ","pages":"Pages 154-164"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666651023000177/pdfft?md5=845c75e23c419b9a9e76d0939d4efddc&pid=1-s2.0-S2666651023000177-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"92131677","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-01-01  DOI: 10.1016/j.aiopen.2023.08.011
Wanjun Zhong , Yifan Gao , Ning Ding , Zhiyuan Liu , Ming Zhou , Jiahai Wang , Jian Yin , Nan Duan
Task generalization has been a long-standing challenge in Natural Language Processing (NLP). Recent research attempts to improve the task generalization ability of pre-trained language models by mapping NLP tasks into human-readable prompted forms. However, these approaches require laborious and inflexible manual collection of prompts, and different prompts for the same downstream task may yield unstable performance. We propose Unified Schema Prompt, a flexible and extensible prompting method that automatically customizes learnable prompts for each task according to the task's input schema. It models the knowledge shared across tasks while preserving the characteristics of each task's schema, thereby enhancing task generalization ability. The schema prompt uses the explicit data structure of each task to formulate prompts, so little human effort is involved. To test the task generalization ability of the schema prompt at scale, we conduct schema prompt-based multitask pre-training on a wide variety of general NLP tasks. The framework achieves strong zero-shot and few-shot generalization performance on 16 unseen downstream tasks from 8 task types (e.g., QA, NLI, etc.). Furthermore, comprehensive analyses demonstrate the effectiveness of each component of the schema prompt, its flexibility in task compositionality, and its ability to improve performance under a full-data fine-tuning setting.
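The core idea above can be sketched with a toy prompt renderer: each task's input schema (its ordered field names) deterministically yields a prompted layout with learnable slot markers, so no prompt is hand-written per task. The function name, the `[FIELD_i]` marker convention, and the field names are illustrative assumptions, not the paper's implementation (where such slots would map to trainable soft-prompt embeddings).

```python
def schema_prompt(schema, example, n_soft=2):
    """Render an example into a schema-derived prompt string.

    schema:  ordered list of field names, e.g. ["question", "passage"].
    example: dict mapping each field name to its text.
    n_soft:  number of learnable soft-token placeholders per field,
             rendered here as [FIELD_i] markers.
    """
    parts = []
    for field in schema:
        # Per-field soft-prompt slots, derived from the schema alone.
        soft = " ".join(f"[{field.upper()}_{i}]" for i in range(n_soft))
        parts.append(f"{soft} {field}: {example[field]}")
    return " | ".join(parts)

# Two different task types share the same renderer; only schemas differ.
qa = schema_prompt(["question", "passage"],
                   {"question": "Who wrote Hamlet?",
                    "passage": "Hamlet is a tragedy by Shakespeare."})
nli = schema_prompt(["premise", "hypothesis"],
                    {"premise": "A man is running.",
                     "hypothesis": "Someone is moving."})
print(qa)
```

Because the prompt layout is a pure function of the schema, a new task type needs only its field list, which is what makes the approach extensible across the 8 task types evaluated in the paper.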
{"title":"Improving task generalization via unified schema prompt","authors":"Wanjun Zhong , Yifan Gao , Ning Ding , Zhiyuan Liu , Ming Zhou , Jiahai Wang , Jian Yin , Nan Duan","doi":"10.1016/j.aiopen.2023.08.011","DOIUrl":"https://doi.org/10.1016/j.aiopen.2023.08.011","url":null,"abstract":"<div><p>Task generalization has been a long-standing challenge in Natural Language Processing (NLP). Recent research attempts to improve the task generalization ability of pre-trained language models by mapping NLP tasks into human-readable prompted forms. However, these approaches require laborious and inflexible manual collection of prompts, and different prompts on the same downstream task may receive unstable performance. We propose Unified Schema Prompt, a flexible and extensible prompting method, which automatically customizes the learnable prompts for each task according to the task input schema. It models the shared knowledge between tasks, while keeping the characteristics of different task schema, and thus enhances task generalization ability. The schema prompt takes the explicit data structure of each task to formulate prompts so that little human effort is involved. To test the task generalization ability of schema prompt at scale, we conduct schema prompt-based multitask pre-training on a wide variety of general NLP tasks. The framework achieves strong zero-shot and few-shot generalization performance on 16 unseen downstream tasks from 8 task types (e.g., QA, NLI, etc.). 
Furthermore, comprehensive analyses demonstrate the effectiveness of each component in the schema prompt, its flexibility in task compositionality, and its ability to improve performance under a full-data fine-tuning setting.</p></div>","PeriodicalId":100068,"journal":{"name":"AI Open","volume":"4 ","pages":"Pages 120-129"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49710709","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}