
AI Open: Latest Publications

A unified network embedding algorithm for multi-type similarity measures
Pub Date : 2023-01-01 DOI: 10.1016/j.aiopen.2023.08.002
Rui Feng, Qi Ding, Weihao Qiu, Xiao Yang, Yang Yang, Chunping Wang

Traditional network embedding aims to learn representations by capturing a predefined vertex-to-vertex similarity measure. However, in practice, there are different types of similarity measures (e.g., connectivity and structural similarity), which are appropriate for different downstream applications. Meanwhile, it is hard to select the “best” similarity measure that most benefits the application, given the domain knowledge required of both the application scenario and network science. It sometimes takes combining these similarity measures with one another to achieve better performance. Therefore, automatically integrating multiple types of similarity measures into a unified network embedding framework is critical for obtaining effective vertex representations for a downstream application. In this paper, we address the above problem in social networks and propose a semi-supervised representation learning algorithm. The general idea of our approach is to incorporate social influence, which occurs when one’s opinions, emotions, or behaviors are affected by others in a social network. In particular, we build a connection between a user’s representation vector and the probability of her being influenced by another user to take on a particular label (e.g., fraud, personal interest, etc.). We conduct experiments on six real-world datasets and find a clear improvement of our approach compared with several state-of-the-art baselines.
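
As a rough illustration of the influence-based idea (not the paper's actual parameterization), the sketch below models the probability that one user influences another toward a label as a logistic function of their embedding vectors; the function name and dimensionality are assumptions.

import numpy as np

def influence_probability(z_u, z_v):
    # Hypothetical sketch: P(v influences u to adopt a label) as a sigmoid
    # of the dot product between the two users' embedding vectors.
    return 1.0 / (1.0 + np.exp(-np.dot(z_u, z_v)))

# Toy usage with random 16-dimensional embeddings.
rng = np.random.default_rng(0)
z_u, z_v = rng.normal(size=16), rng.normal(size=16)
print(f"P(influence) = {influence_probability(z_u, z_v):.3f}")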

AI Open, Volume 4, Pages 64-72
Citations: 0
Is Chinese Spelling Check ready? Understanding the correction behavior in real-world scenarios
Pub Date : 2023-01-01 DOI: 10.1016/j.aiopen.2023.10.004
Liner Yang, Xin Liu, Tianxin Liao, Zhenghao Liu, Mengyan Wang, Xuezhi Fang, Erhong Yang

The task of Chinese Spelling Check (CSC) is crucial for identifying and rectifying spelling errors in Chinese texts. While prior work in this domain has predominantly relied on benchmarks such as SIGHAN for evaluating model performance, these benchmarks often exhibit an imbalanced distribution of spelling errors. They are typically constructed under idealized conditions, presuming the presence of only spelling errors in the input text. This assumption does not hold in real-world scenarios, where spell checkers frequently encounter a mix of spelling and grammatical errors, thereby presenting additional challenges. To address this gap and create a more realistic testing environment, we introduce a high-quality CSC evaluation benchmark named YACSC (Yet Another Chinese Spelling Check Dataset). YACSC is unique in that it includes annotations for both grammatical and spelling errors, rendering it a more reliable benchmark for CSC tasks. Furthermore, we propose a hierarchical network designed to integrate multidimensional information, leveraging semantic and phonetic aspects, as well as the structural forms of Chinese characters, to enhance the detection and correction of spelling errors. Through extensive experiments, we evaluate the limitations of existing CSC benchmarks and illustrate the application of our proposed system in real-world scenarios, particularly as a preliminary stage in writing assistant systems.
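
For illustration only (not the authors' hierarchical architecture), the sketch below fuses semantic, phonetic, and glyph-structure embeddings of each character with a single learned projection; module names, dimensions, and the fusion choice are all assumptions.

import torch
import torch.nn as nn

class MultiViewCharEncoder(nn.Module):
    # Hypothetical fusion of semantic, phonetic (pinyin), and glyph features.
    def __init__(self, dim=128):
        super().__init__()
        self.fuse = nn.Linear(3 * dim, dim)

    def forward(self, semantic, phonetic, glyph):
        # Concatenate the three views and project back to one representation.
        return torch.tanh(self.fuse(torch.cat([semantic, phonetic, glyph], dim=-1)))

encoder = MultiViewCharEncoder(dim=128)
sem, pho, gly = (torch.randn(2, 10, 128) for _ in range(3))  # 2 sentences, 10 chars
print(encoder(sem, pho, gly).shape)  # torch.Size([2, 10, 128])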

AI Open, Volume 4, Pages 183-192
Citations: 0
A survey on complex factual question answering
Pub Date : 2023-01-01 DOI: 10.1016/j.aiopen.2022.12.003
Lingxi Zhang, Jing Zhang, Xirui Ke, Haoyang Li, Xinmei Huang, Zhonghui Shao, Shulin Cao, Xin Lv

Answering complex factual questions has drawn a lot of attention. Researchers leverage various data sources to support complex QA, such as unstructured texts, structured knowledge graphs and relational databases, semi-structured web tables, or even hybrid data sources. However, although the ideas behind these approaches show similarity to some extent, there is not yet a consistent strategy to deal with various data sources. In this survey, we carefully examine how complex factual question answering has evolved across various data sources. We list the similarities among these approaches and group them into the analysis–extend–reason framework, despite the various question types and data sources that they focus on. We also address future directions for difficult factual question answering as well as the relevant benchmarks.

AI Open, Volume 4, Pages 1-12
Citations: 6
Graph-based methods for cervical cancer segmentation: Advancements, limitations, and future directions
Pub Date : 2023-01-01 DOI: 10.1016/j.aiopen.2023.08.006
Nazar Zaki, Wenjian Qin, Anusuya Krishnan

Cervical cancer remains a significant health concern worldwide, where precise segmentation of cervical lesions is integral for effective diagnosis and treatment planning. This systematic review critically evaluates the application of graph-based methodologies for cervical cancer segmentation, identifying their potential, drawbacks, and avenues for future development. An exhaustive literature search across Scopus and PubMed databases resulted in 20 pertinent studies. These studies were assessed focusing on their implementation of graph-based techniques for cervical cancer segmentation, the utilized datasets, evaluation metrics, and reported precision levels. The review highlights the progressive strides made in the field, especially regarding the segmentation of intricate, non-convex regions and facilitating the detection and grading of cervical cancer using graph-based methodologies. Nonetheless, several constraints were evident, including a dearth of comparative performance analysis, reliance on high-resolution images, difficulties in specific boundary delineation, and the imperative for additional validation and diversified datasets. The review suggests future work to integrate advanced deep learning strategies for heightened accuracy, formulate hybrid methodologies to counteract existing limitations, and explore multi-modal fusion to boost segmentation precision. Emphasizing the explainability and interpretability of outcomes also stands paramount. Lastly, addressing critical challenges such as scarcity of annotated data, the need for real-time and interactive segmentation, and the segmentation of multiple objects or regions of interest remains a crucial frontier for future endeavors.

AI Open, Volume 4, Pages 42-55
Citations: 1
Word sense induction with agglomerative clustering and mutual information maximization
Pub Date : 2023-01-01 DOI: 10.1016/j.aiopen.2023.12.001
Hadi Abdine, Moussa Kamal Eddine, Davide Buscaldi, Michalis Vazirgiannis

Word sense induction (WSI) is a challenging problem in natural language processing that involves the unsupervised automatic detection of a word’s senses (i.e., meanings). Recent work achieves significant results on the WSI task by pre-training a language model that can exclusively disambiguate word senses. In contrast, others employ off-the-shelf pre-trained language models with additional strategies to induce senses. This paper proposes a novel unsupervised method based on hierarchical clustering and invariant information clustering (IIC). The IIC loss is used to train a small model to optimize the mutual information between two vector representations of a target word occurring in a pair of synthetic paraphrases. This model is later used in inference mode to extract a higher-quality vector representation to be used in the hierarchical clustering. We evaluate our method on two WSI tasks and in two distinct clustering configurations (fixed and dynamic number of clusters). We empirically show that our approach is at least on par with the state-of-the-art baselines, outperforming them in several configurations. The code and data to reproduce this work are available to the public.
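
A minimal sketch of the two ingredients, assuming the small model outputs soft cluster assignments for the two paraphrase views of a target word: the IIC term below is the negative mutual information of their joint assignment matrix, and scikit-learn's AgglomerativeClustering stands in for the hierarchical step. Shapes and the number of clusters are placeholders.

import torch
from sklearn.cluster import AgglomerativeClustering

def iic_loss(p1, p2, eps=1e-8):
    # p1, p2: (batch, k) soft assignments for two views of the same word.
    joint = p1.t() @ p2 / p1.size(0)                   # (k, k) joint distribution
    joint = ((joint + joint.t()) / 2).clamp(min=eps)   # symmetrize
    pi = joint.sum(dim=1, keepdim=True)
    pj = joint.sum(dim=0, keepdim=True)
    return -(joint * (joint.log() - pi.log() - pj.log())).sum()  # negative MI

# Agglomerative (hierarchical) clustering of the induced sense vectors.
vectors = torch.randn(50, 64).numpy()                  # stand-in contextual vectors
senses = AgglomerativeClustering(n_clusters=4).fit_predict(vectors)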

AI Open, Volume 4, Pages 193-201
Citations: 0
Joint span and token framework for few-shot named entity recognition
Pub Date : 2023-01-01 DOI: 10.1016/j.aiopen.2023.08.009
Wenlong Fang, Yongbin Liu, Chunping Ouyang, Lin Ren, Jiale Li, Yaping Wan

Few-shot Named Entity Recognition (NER) is a challenging task that involves identifying new entity types using only a limited number of labeled instances for training. Currently, the majority of few-shot NER methods are span-based; they pay more attention to the boundary information of spans as candidate entities and to entity-level information. However, these methods often overlook token-level semantic information, which can limit their effectiveness. To address this issue, we propose a novel Joint Span and Token (JST) framework that integrates both the boundary information of an entity and the semantic information of each token that comprises it. The JST framework employs span features to extract the boundary features of the entity and token features to extract the semantic features of each token. Additionally, to reduce the negative impact of the Other class, we introduce a method to separate named entities from the Other class in semantic space, which helps improve the distinction between entities and the Other class. In addition, we use GPT for data augmentation on the support sentences, generating sentences similar to the original ones; these sentences increase the diversity of the samples and the reliability of our model. Our experimental results on the Few-NERD and SNIPS datasets demonstrate that our model outperforms existing methods.
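
As an illustration of the joint idea (feature choices and names are assumptions, not the paper's exact design), the sketch below builds a span representation from its boundary tokens together with the mean of its token-level representations.

import torch

def span_representation(token_reprs, start, end):
    # Boundary (start/end) embeddings concatenated with the mean over the
    # span's token embeddings: span-level plus token-level information.
    boundary = torch.cat([token_reprs[start], token_reprs[end]])   # (2*dim,)
    token_level = token_reprs[start:end + 1].mean(dim=0)           # (dim,)
    return torch.cat([boundary, token_level])                      # (3*dim,)

sentence = torch.randn(12, 256)     # 12 tokens, 256-dim encoder outputs
candidate = span_representation(sentence, start=3, end=5)
print(candidate.shape)              # torch.Size([768])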

AI Open, Volume 4, Pages 111-119
Citations: 0
Restricted orthogonal gradient projection for continual learning
Pub Date : 2023-01-01 DOI: 10.1016/j.aiopen.2023.08.010
Zeyuan Yang, Zonghan Yang, Yichen Liu, Peng Li, Yang Liu

Continual learning aims to avoid catastrophic forgetting and effectively leverage learned experiences to master new knowledge. Existing gradient projection approaches impose hard constraints on the optimization space for new tasks to minimize interference, which simultaneously hinders forward knowledge transfer. To address this issue, recent methods reuse frozen parameters with a growing network, resulting in high computational costs. Thus, it remains a challenge whether we can improve forward knowledge transfer for gradient projection approaches using a fixed network architecture. In this work, we propose the Restricted Orthogonal Gradient prOjection (ROGO) framework. The basic idea is to adopt a restricted orthogonal constraint allowing parameters optimized in the direction oblique to the whole frozen space to facilitate forward knowledge transfer while consolidating previous knowledge. Our framework requires neither data buffers nor extra parameters. Extensive experiments have demonstrated the superiority of our framework over several strong baselines. We also provide theoretical guarantees for our relaxing strategy.
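
For intuition only, the sketch below shows a generic relaxed gradient projection: lam = 1 gives a strictly orthogonal update with respect to a frozen subspace, while lam < 1 admits the oblique directions that a restricted constraint allows. The basis and coefficient are placeholders rather than ROGO's actual formulation.

import numpy as np

def restricted_projection(grad, basis, lam=0.7):
    # grad: (d,) gradient; basis: (d, k) orthonormal basis of the frozen subspace.
    # lam = 1.0 removes the entire component inside the frozen subspace;
    # lam < 1.0 keeps part of it, allowing an oblique (relaxed) update.
    return grad - lam * basis @ (basis.T @ grad)

rng = np.random.default_rng(0)
basis, _ = np.linalg.qr(rng.normal(size=(64, 8)))   # toy orthonormal frozen basis
grad = rng.normal(size=64)
update = restricted_projection(grad, basis, lam=0.7)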

AI Open, Volume 4, Pages 98-110
Citations: 0
Multi-grained hypergraph interest modeling for conversational recommendation
Pub Date : 2023-01-01 DOI: 10.1016/j.aiopen.2023.10.001
Chenzhan Shang, Yupeng Hou, Wayne Xin Zhao, Yaliang Li, Jing Zhang

A conversational recommender system (CRS) interacts with users through multi-turn dialogues in natural language, aiming to provide high-quality recommendations for users’ instant information needs. Although great efforts have been made to develop effective CRSs, most of them still focus on the contextual information from the current dialogue, usually suffering from the data scarcity issue. Therefore, we consider leveraging historical dialogue data to enrich the limited contexts of the current dialogue session.

In this paper, we propose a novel multi-grained hypergraph interest modeling approach to capture user interest beneath intricate historical data from different perspectives. As the core idea, we employ hypergraph to represent complicated semantic relations underlying historical dialogues. In our approach, we first employ the hypergraph structure to model users’ historical dialogue sessions and form a session-based hypergraph, which captures coarse-grained, session-level relations. Second, to alleviate the issue of data scarcity, we use an external knowledge graph and construct a knowledge-based hypergraph considering fine-grained, entity-level semantics. We further conduct multi-grained hypergraph convolution on the two kinds of hypergraphs, and utilize the enhanced representations to develop interest-aware CRS. Extensive experiments on two benchmarks ReDial and TG-ReDial validate the effectiveness of our approach on both recommendation and conversation tasks. Code is available at: https://github.com/RUCAIBox/MHIM.
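
As background on the hypergraph convolution mentioned above, a common generic formulation (HGNN-style, with unit hyperedge weights) propagates node features through the incidence matrix with degree normalization; the sketch below follows that formula and is not necessarily the exact operator used in MHIM.

import numpy as np

def hypergraph_conv(X, H, Theta):
    # X: (n, d) node features, H: (n, m) incidence matrix, Theta: (d, d') weights.
    # X' = Dv^-1/2 H De^-1 H^T Dv^-1/2 X Theta (unit hyperedge weights).
    Dv = np.diag(1.0 / np.sqrt(H.sum(axis=1)))   # vertex degree normalization
    De = np.diag(1.0 / H.sum(axis=0))            # hyperedge degree normalization
    return Dv @ H @ De @ H.T @ Dv @ X @ Theta

H = np.array([[1, 0], [1, 1], [0, 1], [1, 1]], dtype=float)  # 4 nodes, 2 hyperedges
out = hypergraph_conv(np.random.randn(4, 8), H, np.random.randn(8, 8))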

AI Open, Volume 4, Pages 154-164
Citations: 0
Improving task generalization via unified schema prompt
Pub Date : 2023-01-01 DOI: 10.1016/j.aiopen.2023.08.011
Wanjun Zhong, Yifan Gao, Ning Ding, Zhiyuan Liu, Ming Zhou, Jiahai Wang, Jian Yin, Nan Duan

Task generalization has been a long-standing challenge in Natural Language Processing (NLP). Recent research attempts to improve the task generalization ability of pre-trained language models by mapping NLP tasks into human-readable prompted forms. However, these approaches require laborious and inflexible manual collection of prompts, and different prompts on the same downstream task may receive unstable performance. We propose Unified Schema Prompt, a flexible and extensible prompting method, which automatically customizes the learnable prompts for each task according to the task input schema. It models the shared knowledge between tasks, while keeping the characteristics of different task schema, and thus enhances task generalization ability. The schema prompt takes the explicit data structure of each task to formulate prompts so that little human effort is involved. To test the task generalization ability of schema prompt at scale, we conduct schema prompt-based multitask pre-training on a wide variety of general NLP tasks. The framework achieves strong zero-shot and few-shot generalization performance on 16 unseen downstream tasks from 8 task types (e.g., QA, NLI, etc.). Furthermore, comprehensive analyses demonstrate the effectiveness of each component in the schema prompt, its flexibility in task compositionality, and its ability to improve performance under a full-data fine-tuning setting.
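
To make the schema-to-prompt idea concrete, here is a toy rendering that turns a task's input schema (field names and values) into a prompted form with markers for learnable slots; the task, field names, and slot markers are illustrative assumptions, not the paper's template.

def schema_prompt(task_name, schema, soft_slots=2):
    # '[P]' marks positions where learnable (soft) prompt tokens would go.
    soft = " ".join(["[P]"] * soft_slots)
    parts = [f"{soft} Task: {task_name}."]
    for field, value in schema.items():
        parts.append(f"{soft} {field}: {value}")
    return " ".join(parts)

example = {"premise": "A man is playing a guitar.",
           "hypothesis": "A person is making music."}
print(schema_prompt("natural language inference", example))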

AI Open, Volume 4, Pages 120-129
Citations: 0
Batch virtual adversarial training for graph convolutional networks
Pub Date : 2023-01-01 DOI: 10.1016/j.aiopen.2023.08.007
Zhijie Deng, Yinpeng Dong, Jun Zhu

We present batch virtual adversarial training (BVAT), a novel regularization method for graph convolutional networks (GCNs). BVAT addresses the issue that GCNs do not ensure the smoothness of the model’s output distribution against local perturbations around the input node features. We propose two algorithms, sampling-based BVAT and optimization-based BVAT, which promote the output smoothness of GCN classifiers based on the generated virtual adversarial perturbations for either a subset of independent nodes or all nodes via an elaborate optimization process. Extensive experiments on three citation network datasets Cora, Citeseer and Pubmed and a knowledge graph dataset Nell validate the efficacy of the proposed method in semi-supervised node classification tasks.
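
For reference, a minimal sketch of the virtual adversarial perturbation at the heart of VAT-style training (one power-iteration step); this is the generic procedure, not BVAT's batch- or graph-specific sampling and optimization strategies, and the model and input names are placeholders.

import torch
import torch.nn.functional as F

def virtual_adversarial_perturbation(model, x, xi=1e-6, eps=1.0):
    # One power-iteration estimate of the perturbation that most changes the
    # model's output distribution around inputs x of shape (batch, features).
    with torch.no_grad():
        p = F.softmax(model(x), dim=-1)
    d = xi * F.normalize(torch.randn_like(x).flatten(1), dim=1).view_as(x)
    d.requires_grad_(True)
    kl = F.kl_div(F.log_softmax(model(x + d), dim=-1), p, reduction="batchmean")
    grad = torch.autograd.grad(kl, d)[0]
    return eps * F.normalize(grad.flatten(1), dim=1).view_as(x)

# Usage sketch: regularize with the KL divergence between predictions at x and x + r.
# r = virtual_adversarial_perturbation(gcn_classifier, node_features)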

AI Open, Volume 4, Pages 73-79
Citations: 63