Language as a latent sequence: Deep latent variable models for semi-supervised paraphrase generation
Jialin Yu, Alexandra I. Cristea, Anoushka Harit, Zhongtian Sun, Olanrewaju Tahir Aduragba, Lei Shi, Noura Al Moubayed
Pub Date: 2023-01-01 | DOI: 10.1016/j.aiopen.2023.05.001 | AI Open, vol. 4, pp. 19-32
This paper explores deep latent variable models for semi-supervised paraphrase generation, where the missing target pair for unlabelled data is modelled as a latent paraphrase sequence. We present a novel unsupervised model named variational sequence auto-encoding reconstruction (VSAR), which performs latent sequence inference given an observed text. To leverage information from text pairs, we additionally introduce a novel supervised model we call dual directional learning (DDL), which is designed to integrate with our proposed VSAR model. Combining VSAR with DDL (DDL+VSAR) enables us to conduct semi-supervised learning. Still, the combined model suffers from a cold-start problem. To combat this issue, we propose an improved weight initialisation solution, leading to a novel two-stage training scheme we call knowledge-reinforced-learning (KRL). Our empirical evaluations suggest that the combined model yields competitive performance against state-of-the-art supervised baselines on complete data. Furthermore, in scenarios where only a fraction of the labelled pairs are available, our combined model consistently outperforms the strong supervised baseline (DDL) by a significant margin (p < .05; Wilcoxon test). Our code is publicly available at https://github.com/jialin-yu/latent-sequence-paraphrase.
UPRec: User-aware Pre-training for sequential Recommendation
Chaojun Xiao, Ruobing Xie, Yuan Yao, Zhiyuan Liu, Maosong Sun, Xu Zhang, Leyu Lin
Pub Date: 2023-01-01 | DOI: 10.1016/j.aiopen.2023.08.008 | AI Open, vol. 4, pp. 137-144
Recent years have witnessed the success of pre-trained models in alleviating the data sparsity problem in recommender systems. However, existing pre-trained models for recommendation mainly focus on leveraging universal sequence patterns from user behavior sequences and item information, while ignoring heterogeneous user information that captures personalized interests, even though such information has been shown to contribute to personalized recommendation. In this paper, we propose a simple yet effective model, called User-aware Pre-training for Recommendation (UPRec), which can flexibly encode heterogeneous user information into the sequential modeling of user behaviors. Specifically, UPRec first encodes sequential behaviors to generate user embeddings, and then jointly optimizes the model with a sequential objective and a user-aware objective constructed from user attributes and structured social graphs. Comprehensive experimental results on two real-world large-scale recommendation datasets demonstrate that UPRec can effectively enrich user representations with user attributes and social relations and thus provide more appropriate recommendations for users.
Learning fair representations via an adversarial framework
Huadong Qiu, Rui Feng, Ruoyun Hu, Xiao Yang, Shaowa Lin, Quanjin Tao, Yang Yang
Pub Date: 2023-01-01 | DOI: 10.1016/j.aiopen.2023.08.003 | AI Open, vol. 4, pp. 91-97
Fairness has become a central issue for our research community as classification algorithms are adopted in societally critical domains such as recidivism prediction and loan approval. In this work, we consider potential bias based on protected attributes (e.g., race and gender), and tackle this problem by learning latent representations of individuals that are statistically indistinguishable between protected groups while sufficiently preserving other information for classification. To do so, we develop a minimax adversarial framework with a generator to capture the data distribution and generate latent representations, and a critic to ensure that the distributions across different protected groups are similar. Our framework provides theoretical guarantees with respect to statistical parity and individual fairness. Empirical results on four real-world datasets also show that the learned representation can effectively be used for classification tasks such as credit risk prediction while obstructing information related to protected groups, especially when removing protected attributes is not sufficient for fair classification.
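The statistical-parity criterion that the framework above guarantees can be checked in a few lines. The sketch below is our illustration, not the authors' implementation: it measures the gap in positive-prediction rates between two protected groups, which a parity-fair classifier drives to zero.

```python
# Illustrative sketch (not the paper's code): statistical parity measured as
# the gap in positive-prediction rates between two protected groups.

def positive_rate(preds, groups, g):
    """Fraction of positive predictions among members of group g."""
    member_preds = [p for p, grp in zip(preds, groups) if grp == g]
    return sum(member_preds) / len(member_preds)

def parity_gap(preds, groups):
    """Absolute difference in positive rates between the two groups."""
    gs = sorted(set(groups))
    return abs(positive_rate(preds, groups, gs[0])
               - positive_rate(preds, groups, gs[1]))

# Toy example: group "a" receives positives 2/3 of the time, group "b" 1/3.
preds = [1, 1, 0, 1, 0, 0]
groups = ["a", "a", "a", "b", "b", "b"]
print(round(parity_gap(preds, groups), 3))  # → 0.333
```

A representation satisfying statistical parity would make any classifier trained on it produce a gap near zero regardless of group membership.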
Associating multiple vision transformer layers for fine-grained image representation
Fayou Sun, Hea Choon Ngo, Yong Wee Sek, Zuqiang Meng
Pub Date: 2023-01-01 | DOI: 10.1016/j.aiopen.2023.09.001 | AI Open, vol. 4, pp. 130-136
Accurate discriminative region proposal has an important effect on fine-grained image recognition. The vision transformer (ViT) has had a striking impact on computer vision due to its innate multi-head self-attention mechanism. However, its attention maps become increasingly similar after certain layers, and since ViT relies on a classification token for prediction, it cannot effectively select discriminative image patches for fine-grained image classification. To accurately detect discriminative regions, we propose a novel network, AMTrans, which efficiently increases layers to learn diverse features and utilizes integrated raw attention maps to capture more salient features. Specifically, we employ DeepViT as the backbone to address the attention-collapse issue. We then fuse the attention weights of each head within each layer to produce an attention weight map. After that, we alternately apply recurrent residual refinement blocks to promote salient features and use a semantic grouping method to propose the discriminative feature region. Extensive experiments show that AMTrans achieves state-of-the-art performance under the same settings on four widely used fine-grained datasets: Stanford-Cars, Stanford-Dogs, CUB-200-2011, and ImageNet.
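The head-fusion step described above, combining per-head attention weights within a layer into a single attention weight map, can be sketched as an element-wise mean over heads. This is a simplified assumption on our part; the paper may combine heads differently (e.g., with learned weights).

```python
# Illustrative sketch (an assumption, not AMTrans itself): fuse the attention
# maps of all heads in one layer into a single map by element-wise averaging.

def fuse_heads(attn):
    """attn: list of per-head attention maps, each a 2-D list of equal shape.
    Returns their element-wise mean as one fused attention weight map."""
    heads = len(attn)
    rows, cols = len(attn[0]), len(attn[0][0])
    return [[sum(attn[h][i][j] for h in range(heads)) / heads
             for j in range(cols)] for i in range(rows)]

# Two toy heads attending to opposite patches average into a balanced map.
fused = fuse_heads([[[1.0, 0.0]], [[0.0, 1.0]]])
print(fused)  # → [[0.5, 0.5]]
```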
MOTT: A new model for multi-object tracking based on green learning paradigm
Shan Wu, Amnir Hadachi, Chaoru Lu, Damien Vivet
Pub Date: 2023-01-01 | DOI: 10.1016/j.aiopen.2023.09.002 | AI Open, vol. 4, pp. 145-153
Multi-object tracking (MOT) is one of the most essential and challenging tasks in computer vision (CV). Unlike object detectors, MOT systems nowadays are more complicated and consist of several neural network models, so the balance between system performance and runtime is crucial for online scenarios. While some works achieve improvements by adding more modules, we propose a pruned model built on a state-of-the-art Transformer backbone. Our model saves up to 62% of FLOPs compared with other Transformer-based models and runs almost twice as fast, while its results remain competitive with state-of-the-art methods. Moreover, we will open-source our modified Transformer backbone for general CV tasks as well as the MOT system.
Semantic graph based topic modelling framework for multilingual fake news detection
Rami Mohawesh, Xiao Liu, Hilya Mudrika Arini, Yutao Wu, Hui Yin
Pub Date: 2023-01-01 | DOI: 10.1016/j.aiopen.2023.08.004 | AI Open, vol. 4, pp. 33-41
Fake news detection is one of the most alluring problems that has grabbed the interest of Machine Learning (ML) and Natural Language Processing (NLP) experts in recent years. The majority of existing studies on fake news detection target English, restricting their application outside the English-speaking population. Despite the growth of multilingual web content, the lack of annotated corpora and tools makes it difficult to identify false news in low-resource languages. Moreover, existing works cannot extract rich semantic and contextual characteristics from documents in a multilingual text corpus. To address these challenges and tackle multilingual fake news detection, we develop a new semantic graph attention-based representation learning framework to extract structural and semantic representations of texts. Our experiments on the TALLIP fake news datasets show that classification performance is significantly enhanced, by 1% to 7% in accuracy, and that our proposed framework outperforms state-of-the-art techniques for the multilingual fake news detection task.
AdaDS: Adaptive data selection for accelerating pre-trained language model knowledge distillation
Qinhong Zhou, Peng Li, Yang Liu, Yuyang Guan, Qizhou Xing, Ming Chen, Maosong Sun, Yang Liu
Pub Date: 2023-01-01 | DOI: 10.1016/j.aiopen.2023.08.005 | AI Open, vol. 4, pp. 56-63
Knowledge distillation (KD) is a widely used method for transferring knowledge from large teacher models to computationally efficient student models. Unfortunately, the computational cost of KD becomes unaffordable as pre-trained language models (PLMs) grow larger. Computing the KD loss on only part of the training set is a promising way to accelerate KD. However, existing works heuristically apply only one static data selection strategy throughout the KD process, yielding inconsistent improvements across different distillation scenarios. In this work, we conduct a thorough study of typical data selection strategies for KD and show that this inconsistency arises because the best strategy depends on several factors, including the task, the selected data size, and the training stage. To automatically adapt to these factors, we propose a framework named AdaDS that learns to choose the data selection strategy adaptively during the KD process. Experimental results show that our proposed method is effective for various tasks and selected data sizes in both fine-tuning and pre-training stages, achieving performance comparable to DistilBERT with only 10% of the queries to the teacher model.
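A static, uncertainty-based selection strategy of the kind the paper compares against can be sketched in a few lines: rank examples by the entropy of the teacher's output distribution, keep the most uncertain fraction, and compute the KD loss only on that subset. The stdlib-only sketch below is illustrative of this one strategy, not of AdaDS itself, which learns to switch among such strategies.

```python
import math

def entropy(p):
    """Shannon entropy of a probability distribution (a list summing to 1)."""
    return -sum(x * math.log(x) for x in p if x > 0)

def kd_loss(teacher_p, student_p):
    """Cross-entropy of the student against the teacher's soft labels."""
    return -sum(t * math.log(s) for t, s in zip(teacher_p, student_p) if s > 0)

def select_and_distill(teacher_probs, student_probs, keep_ratio=0.5):
    """One static strategy: keep the examples whose teacher distribution has
    the highest entropy and average the KD loss over that subset only."""
    ranked = sorted(range(len(teacher_probs)),
                    key=lambda i: entropy(teacher_probs[i]), reverse=True)
    kept = ranked[:max(1, int(len(ranked) * keep_ratio))]
    return sum(kd_loss(teacher_probs[i], student_probs[i]) for i in kept) / len(kept)

# Toy batch of two 2-class outputs: only the uncertain first example is kept,
# so the student sees half the queries the full batch would require.
teacher = [[0.5, 0.5], [0.9, 0.1]]
student = [[0.6, 0.4], [0.8, 0.2]]
loss = select_and_distill(teacher, student, keep_ratio=0.5)
```

Selecting before querying the teacher is what saves computation: examples outside the kept subset never contribute to the KD loss.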
MONEY: Ensemble learning for stock price movement prediction via a convolutional network with adversarial hypergraph model
Zhongtian Sun, Anoushka Harit, Alexandra I. Cristea, Jingyun Wang, Pietro Lio
Pub Date: 2023-01-01 | DOI: 10.1016/j.aiopen.2023.10.002 | AI Open, vol. 4, pp. 165-174
Stock price prediction is challenging in financial investment, with the AI boom leading to increased interest from researchers. Despite these recent advances, many studies are limited to capturing the time series characteristics of price movement via recurrent neural networks (RNNs), while neglecting other critical relevant factors such as industry, shareholders, and news. On the other hand, graph neural networks have been applied to a broad range of tasks due to their superior performance in capturing complex relations among entities and in representation learning. This paper investigates the effectiveness of graph neural networks for stock price movement prediction. Inspired by a recent study, we capture complex group-level information (the co-movement of similar companies) via hypergraphs. Unlike other hypergraph studies, we also use a graph model to learn pairwise relations. Moreover, we are the first to demonstrate that this simple graph model should be applied before the RNNs rather than after, as prior research suggested: this way, the long-term dependencies of similar companies can be learnt by the subsequent RNNs, which augments their predictability. We also apply adversarial training to capture the stochastic nature of the financial market and enhance the generalisation of the proposed model. Hence, we contribute a novel ensemble learning framework for stock price movement prediction, named MONEY. It comprises (a) a Graph Convolution Network (GCN), representing pairwise industry and price information, and (b) a hypergraph convolution network for group-oriented information transmission via hyperedges, with adversarial training that adds perturbations to the inputs before the last prediction layer. Real-world data experiments demonstrate that MONEY significantly outperforms state-of-the-art methods on average and performs particularly well in bear markets.
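Group-oriented transmission via hyperedges reduces, in its simplest form, to a two-step aggregation: pool member-node features into each hyperedge, then distribute the pooled messages back to the incident nodes. The unweighted sketch below is our simplification of one hypergraph-convolution step; the paper's layer additionally uses learned weights and adversarial perturbations, and the tickers and the "tech" hyperedge are hypothetical.

```python
# Simplified sketch (not MONEY's layer): one unweighted hypergraph-convolution
# step as node -> hyperedge -> node mean aggregation.

def hypergraph_propagate(features, hyperedges):
    """features: node -> feature vector; hyperedges: name -> member node list.
    Averages node features into each hyperedge, then averages the messages of
    a node's incident hyperedges back into that node."""
    dim = len(next(iter(features.values())))
    # Step 1: each hyperedge (e.g. an industry group) pools its member nodes.
    edge_msgs = {}
    for name, members in hyperedges.items():
        edge_msgs[name] = [sum(features[m][d] for m in members) / len(members)
                           for d in range(dim)]
    # Step 2: each node averages the messages of the hyperedges containing it.
    out = {}
    for node, feat in features.items():
        incident = [edge_msgs[n] for n, mem in hyperedges.items() if node in mem]
        if not incident:
            out[node] = feat[:]  # isolated node keeps its own features
            continue
        out[node] = [sum(m[d] for m in incident) / len(incident)
                     for d in range(dim)]
    return out

# Toy market: two stocks sharing a "tech" hyperedge move toward a shared mean,
# capturing the co-movement of similar companies.
feats = {"AAA": [1.0, 0.0], "BBB": [0.0, 1.0], "CCC": [4.0, 4.0]}
edges = {"tech": ["AAA", "BBB"], "energy": ["CCC"]}
print(hypergraph_propagate(feats, edges)["AAA"])  # → [0.5, 0.5]
```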
Interactive active learning for fairness with partial group label
Zeyu Yang, Jizhi Zhang, Fuli Feng, Chongming Gao, Qifan Wang, Xiangnan He
Pub Date: 2023-01-01 | DOI: 10.1016/j.aiopen.2023.10.003 | AI Open, vol. 4, pp. 175-182
The rapid development of AI technologies has found numerous applications across various domains in human society. Ensuring fairness and preventing discrimination are critical considerations in the development of AI models. However, incomplete information often hinders the complete collection of sensitive attributes in real-world applications, primarily due to the high cost and the potential privacy violations associated with such data collection. Label reconstruction, building another learner to predict sensitive attributes, is a common approach to address this issue. However, existing methods focus solely on improving the prediction accuracy of this sensitive learner as a separate model, ignoring the disparity between its accuracy and the fairness of the base model. To bridge this gap, this paper proposes an interactive learning framework that optimizes the sensitive learner while accounting for the fairness of the base learner. Furthermore, a new active sampling strategy is developed to select the data most valuable to the sensitive learner with regard to the fairness of the base model. The effectiveness of our proposed method in improving model fairness is demonstrated through comprehensive evaluations conducted on various datasets and fairness criteria.
Pub Date : 2023-01-01 DOI: 10.1016/j.aiopen.2023.08.001
Qingyao Ai , Ting Bai , Zhao Cao , Yi Chang , Jiawei Chen , Zhumin Chen , Zhiyong Cheng , Shoubin Dong , Zhicheng Dou , Fuli Feng , Shen Gao , Jiafeng Guo , Xiangnan He , Yanyan Lan , Chenliang Li , Yiqun Liu , Ziyu Lyu , Weizhi Ma , Jun Ma , Zhaochun Ren , Xiaofei Zhu
The research field of Information Retrieval (IR) has evolved significantly, expanding beyond traditional search to meet diverse user information needs. Recently, Large Language Models (LLMs) have demonstrated exceptional capabilities in text understanding, generation, and knowledge inference, opening up exciting avenues for IR research. LLMs not only facilitate generative retrieval but also offer improved solutions for user understanding, model evaluation, and user-system interactions. More importantly, the synergistic relationship among IR models, LLMs, and humans forms a new technical paradigm that is more powerful for information seeking. IR models provide real-time and relevant information, LLMs contribute internal knowledge, and humans play a central role as demanders and evaluators, safeguarding the reliability of information services. Nevertheless, significant challenges exist, including computational costs, credibility concerns, domain-specific limitations, and ethical considerations. To discuss the transformative impact of LLMs on IR research thoroughly, the Chinese IR community conducted a strategic workshop in April 2023, yielding valuable insights. This paper provides a summary of the workshop’s outcomes, including the rethinking of IR’s core values, the mutual enhancement of LLMs and IR, the proposal of a novel IR technical paradigm, and open challenges.
{"title":"Information Retrieval meets Large Language Models: A strategic report from Chinese IR community","authors":"Qingyao Ai , Ting Bai , Zhao Cao , Yi Chang , Jiawei Chen , Zhumin Chen , Zhiyong Cheng , Shoubin Dong , Zhicheng Dou , Fuli Feng , Shen Gao , Jiafeng Guo , Xiangnan He , Yanyan Lan , Chenliang Li , Yiqun Liu , Ziyu Lyu , Weizhi Ma , Jun Ma , Zhaochun Ren , Xiaofei Zhu","doi":"10.1016/j.aiopen.2023.08.001","DOIUrl":"https://doi.org/10.1016/j.aiopen.2023.08.001","url":null,"abstract":"<div><p>The research field of Information Retrieval (IR) has evolved significantly, expanding beyond traditional search to meet diverse user information needs. Recently, Large Language Models (LLMs) have demonstrated exceptional capabilities in text understanding, generation, and knowledge inference, opening up exciting avenues for IR research. LLMs not only facilitate generative retrieval but also offer improved solutions for user understanding, model evaluation, and user-system interactions. More importantly, the synergistic relationship among IR models, LLMs, and humans forms a new technical paradigm that is more powerful for information seeking. IR models provide real-time and relevant information, LLMs contribute internal knowledge, and humans play a central role as demanders and evaluators, safeguarding the reliability of information services. Nevertheless, significant challenges exist, including computational costs, credibility concerns, domain-specific limitations, and ethical considerations. To discuss the transformative impact of LLMs on IR research thoroughly, the Chinese IR community conducted a strategic workshop in April 2023, yielding valuable insights. 
This paper provides a summary of the workshop’s outcomes, including the rethinking of IR’s core values, the mutual enhancement of LLMs and IR, the proposal of a novel IR technical paradigm, and open challenges.</p></div>","PeriodicalId":100068,"journal":{"name":"AI Open","volume":"4 ","pages":"Pages 80-90"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49710721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}