
AI Open — Latest Publications

Language as a latent sequence: Deep latent variable models for semi-supervised paraphrase generation
Pub Date : 2023-01-01 DOI: 10.1016/j.aiopen.2023.05.001
Jialin Yu , Alexandra I. Cristea , Anoushka Harit , Zhongtian Sun , Olanrewaju Tahir Aduragba , Lei Shi , Noura Al Moubayed

This paper explores deep latent variable models for semi-supervised paraphrase generation, where the missing target pair for unlabelled data is modelled as a latent paraphrase sequence. We present a novel unsupervised model named variational sequence auto-encoding reconstruction (VSAR), which performs latent sequence inference given an observed text. To leverage information from text pairs, we additionally introduce a novel supervised model we call dual directional learning (DDL), which is designed to integrate with our proposed VSAR model. Combining VSAR with DDL (DDL+VSAR) enables us to conduct semi-supervised learning. Still, the combined model suffers from a cold-start problem. To further combat this issue, we propose an improved weight initialisation solution, leading to a novel two-stage training scheme we call knowledge-reinforced-learning (KRL). Our empirical evaluations suggest that the combined model yields competitive performance against the state-of-the-art supervised baselines on complete data. Furthermore, in scenarios where only a fraction of the labelled pairs are available, our combined model consistently outperforms the strong supervised model baseline (DDL) by a significant margin (p<.05; Wilcoxon test). Our code is publicly available at https://github.com/jialin-yu/latent-sequence-paraphrase.
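Conceptually, the semi-supervised objective combines the supervised DDL term on labelled pairs with the unsupervised VSAR term on unlabelled text. A minimal sketch of such a combined loss, where the weighting factor `alpha` is an illustrative assumption rather than the paper's exact formulation:

```python
def semi_supervised_loss(ddl_losses, vsar_losses, alpha=1.0):
    """Combine per-example losses from labelled pairs (DDL) and
    unlabelled texts (VSAR) into one training objective.

    `alpha` (the unsupervised weight) is a hypothetical knob for
    illustration; the paper's weighting may differ."""
    sup = sum(ddl_losses) / max(len(ddl_losses), 1)      # supervised term
    unsup = sum(vsar_losses) / max(len(vsar_losses), 1)  # unsupervised ELBO term
    return sup + alpha * unsup
```

With no labelled pairs at all, the objective reduces to the pure VSAR term, which is what makes the unlabelled data usable.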

AI Open, Volume 4, Pages 19–32 (2023).
Citations: 0
UPRec: User-aware Pre-training for sequential Recommendation
Pub Date : 2023-01-01 DOI: 10.1016/j.aiopen.2023.08.008
Chaojun Xiao , Ruobing Xie , Yuan Yao , Zhiyuan Liu , Maosong Sun , Xu Zhang , Leyu Lin

Recent years have witnessed the success of pre-trained models in alleviating the data sparsity problem in recommender systems. However, existing pre-trained models for recommendation mainly focus on leveraging universal sequence patterns from user behavior sequences and item information, while ignoring the heterogeneous user information that captures personalized interests, which has been shown to contribute to personalized recommendation. In this paper, we propose a simple yet effective model, called User-aware Pre-training for Recommendation (UPRec), which can flexibly encode heterogeneous user information into the sequential modeling of user behaviors. Specifically, UPRec first encodes the sequential behavior to generate user embeddings, and then jointly optimizes the model with a sequential objective and a user-aware objective constructed from user attributes and structured social graphs. Comprehensive experimental results on two real-world large-scale recommendation datasets demonstrate that UPRec can effectively enrich user representations with user attributes and social relations and thus provide more appropriate recommendations for users.
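The first step, encoding a user's behavior sequence into a user embedding, can be illustrated with a toy encoder. Mean pooling stands in here for the learned sequential encoder; it is an assumption for illustration only:

```python
def mean_pool_user_embedding(item_embeddings):
    """Toy user embedding: mean-pool the embeddings of the items in
    the user's behavior sequence. The paper uses a learned sequential
    encoder; mean pooling is a stand-in for illustration."""
    dim = len(item_embeddings[0])
    n = len(item_embeddings)
    return [sum(e[d] for e in item_embeddings) / n for d in range(dim)]
```

The resulting vector is what the user-aware objectives (attribute prediction, social-graph terms) would then be computed against.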

AI Open, Volume 4, Pages 137–144 (2023).
Citations: 0
Learning fair representations via an adversarial framework
Pub Date : 2023-01-01 DOI: 10.1016/j.aiopen.2023.08.003
Huadong Qiu , Rui Feng , Ruoyun Hu , Xiao Yang , Shaowa Lin , Quanjin Tao , Yang Yang

Fairness has become a central issue for our research community as classification algorithms are adopted in societally critical domains such as recidivism prediction and loan approval. In this work, we consider the potential bias based on protected attributes (e.g., race and gender), and tackle this problem by learning latent representations of individuals that are statistically indistinguishable between protected groups while sufficiently preserving other information for classification. To this end, we develop a minimax adversarial framework with a generator, which captures the data distribution and generates latent representations, and a critic, which ensures that the distributions across different protected groups are similar. Our framework provides theoretical guarantees with respect to statistical parity and individual fairness. Empirical results on four real-world datasets also show that the learned representations can effectively be used for classification tasks such as credit risk prediction while obstructing information related to protected groups, especially when removing protected attributes is not sufficient for fair classification.
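The statistical-parity criterion the framework targets can be made concrete with a small helper that measures the gap in positive-prediction rates across protected groups (a zero gap means parity holds). This is a generic check, not the paper's implementation:

```python
def statistical_parity_gap(preds, groups):
    """Difference between the highest and lowest positive-prediction
    rates across protected groups; 0.0 means statistical parity."""
    rates = {}
    for p, g in zip(preds, groups):
        rates.setdefault(g, []).append(p)
    means = [sum(v) / len(v) for v in rates.values()]
    return max(means) - min(means)
```

A critic trained to distinguish groups from the latent representations is, in effect, driving this gap toward zero for any downstream classifier built on those representations.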

AI Open, Volume 4, Pages 91–97 (2023).
Citations: 4
Associating multiple vision transformer layers for fine-grained image representation
Pub Date : 2023-01-01 DOI: 10.1016/j.aiopen.2023.09.001
Fayou Sun , Hea Choon Ngo , Yong Wee Sek , Zuqiang Meng

Accurate discriminative region proposal is critical for fine-grained image recognition. The vision transformer (ViT) has had a striking impact on computer vision thanks to its innate multi-head self-attention mechanism. However, its attention maps become progressively similar after certain layers, and because ViT relies on a classification token for classification, it cannot effectively select discriminative image patches for fine-grained image classification. To accurately detect discriminative regions, we propose a novel network, AMTrans, which efficiently increases the number of layers to learn diverse features and utilizes integrated raw attention maps to capture more salient features. Specifically, we employ DeepViT as the backbone to resolve the attention collapse issue. We then fuse the attention weight of each head within each layer to produce an attention weight map. After that, we iteratively apply recurrent residual refinement blocks to promote salient features and then utilize a semantic grouping method to propose the discriminative feature region. Extensive experiments show that AMTrans achieves state-of-the-art performance under the same settings on four widely used fine-grained datasets: Stanford-Cars, Stanford-Dogs, CUB-200-2011, and ImageNet.
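The per-layer fusion of head attention weights can be illustrated by averaging across heads. Averaging is one plausible fusion rule, assumed here for illustration; the paper's exact operator may differ:

```python
def fuse_head_attention(attn):
    """Fuse per-head attention maps (heads x tokens x tokens, as
    nested lists) into a single attention weight map by averaging
    over the head dimension."""
    heads, n = len(attn), len(attn[0])
    return [[sum(attn[h][i][j] for h in range(heads)) / heads
             for j in range(n)]
            for i in range(n)]
```

The fused map from each layer is what downstream refinement stages would consume when ranking image patches by saliency.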

AI Open, Volume 4, Pages 130–136 (2023).
Citations: 0
MOTT: A new model for multi-object tracking based on green learning paradigm
Pub Date : 2023-01-01 DOI: 10.1016/j.aiopen.2023.09.002
Shan Wu , Amnir Hadachi , Chaoru Lu , Damien Vivet

Multi-object tracking (MOT) is one of the most essential and challenging tasks in computer vision (CV). Unlike object detectors, today's MOT systems are more complicated and consist of several neural network models, so the balance between system performance and runtime is crucial for online scenarios. While some works achieve improvements by adding more modules, we propose a pruned model built on a state-of-the-art Transformer backbone. Our model saves up to 62% of the FLOPs of other Transformer-based models and runs almost twice as fast, while its results remain competitive with state-of-the-art methods. Moreover, we will open-source our modified Transformer backbone model for general CV tasks as well as the MOT system.

AI Open, Volume 4, Pages 145–153 (2023).
Citations: 0
Semantic graph based topic modelling framework for multilingual fake news detection
Pub Date : 2023-01-01 DOI: 10.1016/j.aiopen.2023.08.004
Rami Mohawesh , Xiao Liu , Hilya Mudrika Arini , Yutao Wu , Hui Yin

Fake news detection is one of the most alluring problems that has grabbed the interest of Machine Learning (ML) and Natural Language Processing (NLP) experts in recent years. The majority of existing studies on detecting fake news are conducted in English, restricting their application outside the English-speaking population. Despite the growth in multilingual web content, the lack of annotated corpora and technologies makes it difficult to identify false news in low-resource languages. Moreover, existing works cannot collect richer semantic and contextual characteristics from documents in a particular multilingual text corpus. To address these challenges in multilingual fake news detection, we develop a new semantic graph attention-based representation learning framework to extract structural and semantic representations of texts. Our experiments on TALLIP fake news datasets show that classification performance is significantly enhanced, by 1% to 7% in accuracy, and that our proposed framework outperforms the state-of-the-art techniques for the multilingual fake news detection task.
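The core step of graph attention underlying such a framework can be sketched as a softmax-weighted sum over neighbor features. This is the generic graph-attention aggregation, not the paper's exact architecture:

```python
import math

def attention_aggregate(neighbor_feats, scores):
    """Softmax the raw attention scores, then take the weighted sum
    of the neighbor feature vectors -- the core aggregation step of
    graph attention."""
    exps = [math.exp(s) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(neighbor_feats[0])
    return [sum(w * f[d] for w, f in zip(weights, neighbor_feats))
            for d in range(dim)]
```

Stacking such layers over a semantic graph of a document yields the structural and semantic representations the abstract refers to.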

AI Open, Volume 4, Pages 33–41 (2023).
Citations: 2
AdaDS: Adaptive data selection for accelerating pre-trained language model knowledge distillation
Pub Date : 2023-01-01 DOI: 10.1016/j.aiopen.2023.08.005
Qinhong Zhou , Peng Li , Yang Liu , Yuyang Guan , Qizhou Xing , Ming Chen , Maosong Sun , Yang Liu

Knowledge distillation (KD) is a widely used method for transferring knowledge from large teacher models to computationally efficient student models. Unfortunately, the computational cost of KD becomes unaffordable as pre-trained language models (PLMs) grow larger. Computing the KD loss on only part of the training set is a promising way to accelerate KD. However, existing works heuristically leverage only one static data selection strategy during the KD process, demonstrating inconsistent improvements across different distillation scenarios. In this work, we conduct a thorough study of various typical data selection strategies for KD, and show that this problem arises because the best data selection strategy depends on various factors, including the task, the selected data size, and the training stage. To automatically adapt to these factors, we propose a framework named AdaDS that learns to choose the data selection strategy adaptively during the KD process. Experimental results show that our proposed method is effective for various tasks and selected data sizes under both the fine-tuning and pre-training stages, achieving performance comparable to DistilBERT with only 10% of the queries to the teacher model.
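Computing the KD loss on only a subset of the training data can be sketched with a fixed top-k selection under some per-example score (e.g., student uncertainty). The fixed scoring rule is an assumption for illustration; AdaDS itself learns which strategy to apply at each stage:

```python
def select_for_kd(scores, budget):
    """Pick the indices of the `budget` highest-scoring training
    examples; only these would be forwarded through the teacher,
    so the teacher cost scales with `budget` instead of the full
    training set size."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i],
                    reverse=True)
    return sorted(ranked[:budget])
```

With a budget of 10% of the training set, the teacher forward passes drop accordingly, which is where the reported query savings come from.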

AI Open, Volume 4, Pages 56–63 (2023).
Citations: 0
MONEY: Ensemble learning for stock price movement prediction via a convolutional network with adversarial hypergraph model
Pub Date : 2023-01-01 DOI: 10.1016/j.aiopen.2023.10.002
Zhongtian Sun , Anoushka Harit , Alexandra I. Cristea , Jingyun Wang , Pietro Lio

Stock price prediction is challenging in financial investment, with the AI boom leading to increased interest from researchers. Despite these recent advances, many studies are limited to capturing the time-series characteristics of price movement via recurrent neural networks (RNNs) and neglect other critical relevant factors, such as industry, shareholders, and news. On the other hand, graph neural networks have been applied to a broad range of tasks due to their superior performance in capturing complex relations among entities and in representation learning. This paper investigates the effectiveness of using graph neural networks for stock price movement prediction. Inspired by a recent study, we capture complex group-level information (the co-movement of similar companies) via hypergraphs. Unlike other hypergraph studies, we also use a graph model to learn pairwise relations. Moreover, we are the first to demonstrate that this simple graph model should be applied before the RNNs, rather than after, as prior research suggested: the long-term dependencies of similar companies can then be learnt by the subsequent RNNs, which augments their predictability. We also apply adversarial training to capture the stochastic nature of the financial market and enhance the generalisation of the proposed model. Hence, we contribute a novel ensemble learning framework to predict stock price movement, named MONEY. It comprises (a) a Graph Convolution Network (GCN) representing pairwise industry and price information, and (b) a hypergraph convolution network for group-oriented information transmission via hyperedges, with adversarial training that adds perturbations to the inputs before the last prediction layer. Real-world data experiments demonstrate that MONEY significantly outperforms, on average, the state-of-the-art methods and performs particularly well in the bear market.
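Group-level propagation via hyperedges can be illustrated with a toy smoothing step in which every stock in a hyperedge (a group of co-moving companies) receives the hyperedge mean. This is a deliberate simplification of hypergraph convolution, shown only to convey the mechanism:

```python
def hypergraph_smooth(x, hyperedges):
    """x: {node: feature value}; each hyperedge is a set of co-moving
    nodes (e.g., similar companies). Each node's new value is the
    mean of the means of the hyperedges it belongs to; nodes in no
    hyperedge keep their value."""
    acc = {n: [] for n in x}
    for edge in hyperedges:
        m = sum(x[n] for n in edge) / len(edge)
        for n in edge:
            acc[n].append(m)
    return {n: (sum(v) / len(v) if v else x[n]) for n, v in acc.items()}
```

In the full model this group-level pass is combined with a pairwise GCN pass, and the result is fed to the RNNs rather than the other way round.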

AI Open, Volume 4, Pages 165–174 (2023). Open access.
引用次数: 0
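As a rough illustration of the group-level message passing that a hypergraph convolution layer performs, the sketch below implements one unweighted, degree-normalised hypergraph convolution step in NumPy. This is a minimal sketch of the generic operation, not the paper's actual MONEY implementation; the names `H`, `X`, and `Theta` are illustrative.

```python
import numpy as np

def hypergraph_conv(X, H, Theta):
    """One unweighted hypergraph convolution step: X' = Dv^-1 H De^-1 H^T X Theta.

    X:     (n_nodes, d_in)  node features (e.g. per-company embeddings)
    H:     (n_nodes, n_edges) incidence matrix; H[i, j] = 1 if node i is in hyperedge j
    Theta: (d_in, d_out)    learnable projection
    """
    De_inv = np.diag(1.0 / H.sum(axis=0))  # inverse hyperedge degrees
    Dv_inv = np.diag(1.0 / H.sum(axis=1))  # inverse node degrees
    # Aggregate node features into hyperedges (H^T X), average within each
    # hyperedge (De^-1), scatter back to nodes (H), and normalise per node (Dv^-1).
    return Dv_inv @ H @ De_inv @ H.T @ X @ Theta
```

Because each step averages over hyperedge members, companies sharing a hyperedge (e.g. the same industry group) exchange information in a single layer, which is the "group-oriented information transmission" the abstract refers to.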
Interactive active learning for fairness with partial group label
Pub Date : 2023-01-01 DOI: 10.1016/j.aiopen.2023.10.003
Zeyu Yang , Jizhi Zhang , Fuli Feng , Chongming Gao , Qifan Wang , Xiangnan He

The rapid development of AI technologies has led to numerous applications across various domains of human society. Ensuring fairness and preventing discrimination are critical considerations in the development of AI models. However, incomplete information often hinders the full collection of sensitive attributes in real-world applications, primarily due to the high cost and potential privacy violations associated with such data collection. A common remedy is label reconstruction: building a separate learner to predict the sensitive attributes. However, existing methods focus solely on improving the prediction accuracy of this sensitive learner as a standalone model, ignoring the gap between its accuracy and the fairness of the base model. To bridge this gap, this paper proposes an interactive learning framework that optimizes the sensitive learner while accounting for the fairness of the base learner. Furthermore, a new active sampling strategy is developed to select the data most valuable to the sensitive learner with respect to the fairness of the base model. Comprehensive evaluations on various datasets and fairness criteria demonstrate the effectiveness of the proposed method in improving model fairness.

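The active sampling step described above can be illustrated with plain uncertainty sampling: query labels for the unlabelled points where the sensitive-attribute predictor is least certain. This is a minimal baseline sketch only; the paper's actual strategy additionally weighs candidates by their effect on the base model's fairness, which is omitted here, and the function name is illustrative.

```python
import numpy as np

def uncertainty_sample(proba, k):
    """Return the indices of the k unlabelled points with the highest
    predictive entropy under the sensitive-attribute learner.

    proba: (n, c) array of predicted class probabilities per point.
    """
    eps = 1e-12  # avoid log(0)
    entropy = -(proba * np.log(proba + eps)).sum(axis=1)
    # Highest-entropy (most uncertain) points first.
    return np.argsort(-entropy)[:k]
```

In an interactive loop, the selected points would be sent to an annotator for their sensitive group label, the sensitive learner retrained, and the base model's fairness re-evaluated.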
{"title":"Interactive active learning for fairness with partial group label","authors":"Zeyu Yang ,&nbsp;Jizhi Zhang ,&nbsp;Fuli Feng ,&nbsp;Chongming Gao ,&nbsp;Qifan Wang ,&nbsp;Xiangnan He","doi":"10.1016/j.aiopen.2023.10.003","DOIUrl":"https://doi.org/10.1016/j.aiopen.2023.10.003","url":null,"abstract":"<div><p>The rapid development of AI technologies has found numerous applications across various domains in human society. Ensuring fairness and preventing discrimination are critical considerations in the development of AI models. However, incomplete information often hinders the complete collection of sensitive attributes in real-world applications, primarily due to the high cost and potential privacy violations associated with such data collection. Label reconstruction through building another learner on sensitive attributes is a common approach to address this issue. However, existing methods focus solely on improving the prediction accuracy of the sensitive learner as a separate model, while ignoring the disparity between its accuracy and the fairness of the base model. To bridge this gap, this paper proposes an interactive learning framework that aims to optimize the sensitive learner while considering the fairness of the base learner. Furthermore, a new active sampling strategy is developed to select the most valuable data for the sensitive learner regarding the fairness of the base model. 
The effectiveness of our proposed method in improving model fairness is demonstrated through comprehensive evaluations conducted on various datasets and fairness criteria.</p></div>","PeriodicalId":100068,"journal":{"name":"AI Open","volume":"4 ","pages":"Pages 175-182"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666651023000190/pdfft?md5=8647172d4d8f417e44b8c64861c1afd4&pid=1-s2.0-S2666651023000190-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"92131676","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Information Retrieval meets Large Language Models: A strategic report from Chinese IR community
Pub Date : 2023-01-01 DOI: 10.1016/j.aiopen.2023.08.001
Qingyao Ai , Ting Bai , Zhao Cao , Yi Chang , Jiawei Chen , Zhumin Chen , Zhiyong Cheng , Shoubin Dong , Zhicheng Dou , Fuli Feng , Shen Gao , Jiafeng Guo , Xiangnan He , Yanyan Lan , Chenliang Li , Yiqun Liu , Ziyu Lyu , Weizhi Ma , Jun Ma , Zhaochun Ren , Xiaofei Zhu

The research field of Information Retrieval (IR) has evolved significantly, expanding beyond traditional search to meet diverse user information needs. Recently, Large Language Models (LLMs) have demonstrated exceptional capabilities in text understanding, generation, and knowledge inference, opening up exciting avenues for IR research. LLMs not only facilitate generative retrieval but also offer improved solutions for user understanding, model evaluation, and user-system interactions. More importantly, the synergistic relationship among IR models, LLMs, and humans forms a new technical paradigm that is more powerful for information seeking. IR models provide real-time and relevant information, LLMs contribute internal knowledge, and humans play the central role of demanders and evaluators of the reliability of information services. Nevertheless, significant challenges exist, including computational costs, credibility concerns, domain-specific limitations, and ethical considerations. To thoroughly discuss the transformative impact of LLMs on IR research, the Chinese IR community conducted a strategic workshop in April 2023, yielding valuable insights. This paper summarizes the workshop's outcomes, including a rethinking of IR's core values, the mutual enhancement of LLMs and IR, the proposal of a novel IR technical paradigm, and open challenges.

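The paradigm sketched above, where the IR model supplies relevant passages and the LLM contributes the reading and reasoning, can be illustrated with a toy retrieve-then-read pipeline. The keyword-overlap retriever and prompt template below are illustrative stand-ins only, and no actual LLM is called.

```python
def retrieve(query, corpus, k=2):
    """Toy retriever: rank documents by word overlap with the query."""
    q_terms = set(query.lower().split())
    ranked = sorted(corpus, key=lambda doc: -len(q_terms & set(doc.lower().split())))
    return ranked[:k]

def build_prompt(query, passages):
    """Assemble a retrieval-augmented prompt for an LLM reader."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Answer using only the passages below.\n{context}\nQuestion: {query}"
```

A production system would replace the overlap score with a learned retriever and pass the assembled prompt to an LLM, with the human user then judging the reliability of the answer, matching the three-way division of roles the report describes.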
{"title":"Information Retrieval meets Large Language Models: A strategic report from Chinese IR community","authors":"Qingyao Ai ,&nbsp;Ting Bai ,&nbsp;Zhao Cao ,&nbsp;Yi Chang ,&nbsp;Jiawei Chen ,&nbsp;Zhumin Chen ,&nbsp;Zhiyong Cheng ,&nbsp;Shoubin Dong ,&nbsp;Zhicheng Dou ,&nbsp;Fuli Feng ,&nbsp;Shen Gao ,&nbsp;Jiafeng Guo ,&nbsp;Xiangnan He ,&nbsp;Yanyan Lan ,&nbsp;Chenliang Li ,&nbsp;Yiqun Liu ,&nbsp;Ziyu Lyu ,&nbsp;Weizhi Ma ,&nbsp;Jun Ma ,&nbsp;Zhaochun Ren ,&nbsp;Xiaofei Zhu","doi":"10.1016/j.aiopen.2023.08.001","DOIUrl":"https://doi.org/10.1016/j.aiopen.2023.08.001","url":null,"abstract":"<div><p>The research field of Information Retrieval (IR) has evolved significantly, expanding beyond traditional search to meet diverse user information needs. Recently, Large Language Models (LLMs) have demonstrated exceptional capabilities in text understanding, generation, and knowledge inference, opening up exciting avenues for IR research. LLMs not only facilitate generative retrieval but also offer improved solutions for user understanding, model evaluation, and user-system interactions. More importantly, the synergistic relationship among IR models, LLMs, and humans forms a new technical paradigm that is more powerful for information seeking. IR models provide real-time and relevant information, LLMs contribute internal knowledge, and humans play a central role of demanders and evaluators to the reliability of information services. Nevertheless, significant challenges exist, including computational costs, credibility concerns, domain-specific limitations, and ethical considerations. To thoroughly discuss the transformative impact of LLMs on IR research, the Chinese IR community conducted a strategic workshop in April 2023, yielding valuable insights. 
This paper provides a summary of the workshop’s outcomes, including the rethinking of IR’s core values, the mutual enhancement of LLMs and IR, the proposal of a novel IR technical paradigm, and open challenges.</p></div>","PeriodicalId":100068,"journal":{"name":"AI Open","volume":"4 ","pages":"Pages 80-90"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49710721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0