Pub Date : 2025-01-31DOI: 10.1109/TBDATA.2025.3533905
Bayaraa Enkhsaikhan;Ohyun Jo
We explored the application of Risk-averse Reinforcement Learning (Risk-averse RL) in Constrained Markov Decision Process (CMDP) in optimizing investment portfolios, incorporating constraints assessment. The investment portfolio must be always constrained with risk characteristics by investors and regulators. Therefore, the hard constraint is necessary for the practical Portfolio optimization. Moreover, traditional portfolio optimization techniques lack flexibility to model complex dynamic financial market. To address this issue, Augmented Lagrangian Multiplier (ALM) was employed to enforce constraints on the agent, mitigating the impact of risk in the decision process. Our proposal of the risk-constrained RL algorithm demonstrated no constraint violations during the testing phase, and outperformance compared to other Risk-averse RL algorithms, fulfilling our primary goal. This suggests that incorporating a risk-constrained RL technique holds promise for portfolio optimization, particularly for risk-averse investors.
{"title":"Risk-Constrained Reinforcement Learning With Augmented Lagrangian Multiplier for Portfolio Optimization","authors":"Bayaraa Enkhsaikhan;Ohyun Jo","doi":"10.1109/TBDATA.2025.3533905","DOIUrl":"https://doi.org/10.1109/TBDATA.2025.3533905","url":null,"abstract":"We explored the application of Risk-averse Reinforcement Learning (Risk-averse RL) in Constrained Markov Decision Process (CMDP) in optimizing investment portfolios, incorporating constraints assessment. The investment portfolio must be always constrained with risk characteristics by investors and regulators. Therefore, the hard constraint is necessary for the practical Portfolio optimization. Moreover, traditional portfolio optimization techniques lack flexibility to model complex dynamic financial market. To address this issue, Augmented Lagrangian Multiplier (ALM) was employed to enforce constraints on the agent, mitigating the impact of risk in the decision process. Our proposal of the risk-constrained RL algorithm demonstrated no constraint violations during the testing phase, and outperformance compared to other Risk-averse RL algorithms, fulfilling our primary goal. This suggests that incorporating a risk-constrained RL technique holds promise for portfolio optimization, particularly for risk-averse investors.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 5","pages":"2489-2502"},"PeriodicalIF":5.7,"publicationDate":"2025-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144934479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Heart failure (HF) poses a significant public health challenge, with a rising global mortality rate. Early detection and prevention of HF could significantly reduce its impact. We introduce a novel methodology for predicting HF risk using 12-lead electrocardiograms (ECGs). We present a novel, lightweight dual attention ECG network designed to capture complex ECG features essential for early HF risk prediction, despite the notable imbalance between low and high-risk groups. This network incorporates a cross-lead attention module and 12 lead-specific temporal attention modules, focusing on cross-lead interactions and each lead's local dynamics. To further alleviate model overfitting, we leverage a large language model (LLM) with a public ECG-Report dataset for pretraining on an ECG-Report alignment task. The network is then fine-tuned for HF risk prediction using two specific cohorts from the U.K. Biobank study, focusing on patients with hypertension (UKB-HYP) and those who have had a myocardial infarction (UKB-MI). The results reveal that LLM-informed pre-training substantially enhances HF risk prediction in these cohorts. The dual attention design not only improves interpretability but also predictive accuracy, outperforming existing competitive methods with C-index scores of 0.6349 for UKB-HYP and 0.5805 for UKB-MI. This demonstrates our method's potential in advancing HF risk assessment with clinical complex ECG data.
{"title":"Large Language Model-Informed ECG Dual Attention Network for Heart Failure Risk Prediction","authors":"Chen Chen;Lei Li;Marcel Beetz;Abhirup Banerjee;Ramneek Gupta;Vicente Grau","doi":"10.1109/TBDATA.2025.3536922","DOIUrl":"https://doi.org/10.1109/TBDATA.2025.3536922","url":null,"abstract":"Heart failure (HF) poses a significant public health challenge, with a rising global mortality rate. Early detection and prevention of HF could significantly reduce its impact. We introduce a novel methodology for predicting HF risk using 12-lead electrocardiograms (ECGs). We present a novel, lightweight dual attention ECG network designed to capture complex ECG features essential for early HF risk prediction, despite the notable imbalance between low and high-risk groups. This network incorporates a cross-lead attention module and 12 lead-specific temporal attention modules, focusing on cross-lead interactions and each lead's local dynamics. To further alleviate model overfitting, we leverage a large language model (LLM) with a public ECG-Report dataset for pretraining on an ECG-Report alignment task. The network is then fine-tuned for HF risk prediction using two specific cohorts from the U.K. Biobank study, focusing on patients with hypertension (UKB-HYP) and those who have had a myocardial infarction (UKB-MI). The results reveal that LLM-informed pre-training substantially enhances HF risk prediction in these cohorts. The dual attention design not only improves interpretability but also predictive accuracy, outperforming existing competitive methods with C-index scores of 0.6349 for UKB-HYP and 0.5805 for UKB-MI. This demonstrates our method's potential in advancing HF risk assessment with clinical complex ECG data.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 3","pages":"948-960"},"PeriodicalIF":7.5,"publicationDate":"2025-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10858425","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143949113","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-01-30DOI: 10.1109/TBDATA.2025.3536927
Luis Ibañez-Lissen;Lorena González-Manzano;José M. de Fuentes;Manuel Goyanes
Fake content is a noteworthy threat which is managed by assorted means. This is a serious problem for online shopping platforms whose products can be affected by negative or positive reviews. Artificial intelligence is commonly applied for fake review generation, being transfer learning a promising approach to reduce training requirements. However, the feasibility of generating in-context fake reviews using transfer learning has not been explored yet. This paper analyses the suitability of a couple of transformers (T5 and BART) to generate realistic in-context fake reviews. Results show that 1) the diversity of generated reviews is comparable to existing works; 2) human-based detection is close to random; 3) just reviews generated with one of the used transformers can be detected with 38% precision; and 1 h of training and 8 k real reviews are needed to produce realistic fake reviews.
{"title":"Use of Transfer Learning for Affordable In-Context Fake Review Generation","authors":"Luis Ibañez-Lissen;Lorena González-Manzano;José M. de Fuentes;Manuel Goyanes","doi":"10.1109/TBDATA.2025.3536927","DOIUrl":"https://doi.org/10.1109/TBDATA.2025.3536927","url":null,"abstract":"Fake content is a noteworthy threat which is managed by assorted means. This is a serious problem for online shopping platforms whose products can be affected by negative or positive reviews. Artificial intelligence is commonly applied for fake review generation, being transfer learning a promising approach to reduce training requirements. However, the feasibility of generating in-context fake reviews using transfer learning has not been explored yet. This paper analyses the suitability of a couple of transformers (T5 and BART) to generate realistic in-context fake reviews. Results show that 1) the diversity of generated reviews is comparable to existing works; 2) human-based detection is close to random; 3) just reviews generated with one of the used transformers can be detected with 38% precision; and 1 h of training and 8 k real reviews are needed to produce realistic fake reviews.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 3","pages":"976-987"},"PeriodicalIF":7.5,"publicationDate":"2025-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10858443","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143949266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Most existing methods for predicting drug-drug interactions (DDI) predominantly concentrate on capturing the explicit relationships among drugs, overlooking the valuable implicit correlations present between drug pairs (DPs), which leads to weak predictions. To address this issue, this paper introduces a hierarchical multi-relational graph representation learning (HMGRL) approach. Within the framework of HMGRL, we leverage a wealth of drug-related heterogeneous data sources to construct heterogeneous graphs, where nodes represent drugs and edges denote clear and various associations. The relational graph convolutional network (RGCN) is employed to capture diverse explicit relationships between drugs from these heterogeneous graphs. Additionally, a multi-view differentiable spectral clustering (MVDSC) module is developed to capture multiple valuable implicit correlations between DPs. Within the MVDSC, we utilize multiple DP features to construct graphs, where nodes represent DPs and edges denote different implicit correlations. Subsequently, multiple DP representations are generated through graph cutting, each emphasizing distinct implicit correlations. The graph-cutting strategy enables our HMGRL to identify strongly connected communities of graphs, thereby reducing the fusion of irrelevant features. By combining every representation view of a DP, we create high-level DP representations for predicting DDIs. Two genuine datasets spanning three distinct tasks are adopted to gauge the efficacy of our HMGRL. Experimental outcomes unequivocally indicate that HMGRL surpasses several leading-edge methods in performance.
{"title":"Hierarchical Multi-Relational Graph Representation Learning for Large-Scale Prediction of Drug-Drug Interactions","authors":"Mengying Jiang;Guizhong Liu;Yuanchao Su;Weiqiang Jin;Biao Zhao","doi":"10.1109/TBDATA.2025.3536924","DOIUrl":"https://doi.org/10.1109/TBDATA.2025.3536924","url":null,"abstract":"Most existing methods for predicting drug-drug interactions (DDI) predominantly concentrate on capturing the explicit relationships among drugs, overlooking the valuable implicit correlations present between drug pairs (DPs), which leads to weak predictions. To address this issue, this paper introduces a hierarchical multi-relational graph representation learning (HMGRL) approach. Within the framework of HMGRL, we leverage a wealth of drug-related heterogeneous data sources to construct heterogeneous graphs, where nodes represent drugs and edges denote clear and various associations. The relational graph convolutional network (RGCN) is employed to capture diverse explicit relationships between drugs from these heterogeneous graphs. Additionally, a multi-view differentiable spectral clustering (MVDSC) module is developed to capture multiple valuable implicit correlations between DPs. Within the MVDSC, we utilize multiple DP features to construct graphs, where nodes represent DPs and edges denote different implicit correlations. Subsequently, multiple DP representations are generated through graph cutting, each emphasizing distinct implicit correlations. The graph-cutting strategy enables our HMGRL to identify strongly connected communities of graphs, thereby reducing the fusion of irrelevant features. By combining every representation view of a DP, we create high-level DP representations for predicting DDIs. Two genuine datasets spanning three distinct tasks are adopted to gauge the efficacy of our HMGRL. Experimental outcomes unequivocally indicate that HMGRL surpasses several leading-edge methods in performance.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 3","pages":"961-975"},"PeriodicalIF":7.5,"publicationDate":"2025-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143949322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Large Vision-Language Models (LVLMs) have made significant strides in various multimodal tasks. Notably, GPT4V, Claude, Gemini, and others showcase exceptional multimodal capabilities, marked by profound comprehension and reasoning skills. This study introduces a comprehensive and efficient evaluation framework, TinyLVLM-eHub, to assess LVLMs’ performance, including proprietary models. TinyLVLM-eHub covers six key multimodal capabilities, such as visual perception, knowledge acquisition, reasoning, commonsense understanding, object hallucination, and embodied intelligence. The benchmark, utilizing 2.1K image-text pairs, provides a user-friendly and accessible platform for LVLM evaluation. The evaluation employs the ChatGPT Ensemble Evaluation (CEE) method, which improves alignment with human evaluation compared to word-matching approaches. Results reveal that closed-source API models like GPT4V and GeminiPro-V excel in most capabilities compared to previous open-source LVLMs, though they show some vulnerability in object hallucination. This evaluation underscores areas for LVLM improvement in real-world applications and serves as a foundational assessment for future multimodal advancements.
{"title":"TinyLVLM-eHub: Towards Comprehensive and Efficient Evaluation for Large Vision-Language Models","authors":"Wenqi Shao;Meng Lei;Yutao Hu;Peng Gao;Peng Xu;Kaipeng Zhang;Fanqing Meng;Siyuan Huang;Hongsheng Li;Yu Qiao;Ping Luo","doi":"10.1109/TBDATA.2025.3536930","DOIUrl":"https://doi.org/10.1109/TBDATA.2025.3536930","url":null,"abstract":"Large Vision-Language Models (LVLMs) have made significant strides in various multimodal tasks. Notably, GPT4V, Claude, Gemini, and others showcase exceptional multimodal capabilities, marked by profound comprehension and reasoning skills. This study introduces a comprehensive and efficient evaluation framework, TinyLVLM-eHub, to assess LVLMs’ performance, including proprietary models. TinyLVLM-eHub covers six key multimodal capabilities, such as visual perception, knowledge acquisition, reasoning, commonsense understanding, object hallucination, and embodied intelligence. The benchmark, utilizing 2.1K image-text pairs, provides a user-friendly and accessible platform for LVLM evaluation. The evaluation employs the ChatGPT Ensemble Evaluation (CEE) method, which improves alignment with human evaluation compared to word-matching approaches. Results reveal that closed-source API models like GPT4V and GeminiPro-V excel in most capabilities compared to previous open-source LVLMs, though they show some vulnerability in object hallucination. This evaluation underscores areas for LVLM improvement in real-world applications and serves as a foundational assessment for future multimodal advancements.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 3","pages":"933-947"},"PeriodicalIF":7.5,"publicationDate":"2025-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143949200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-01-30DOI: 10.1109/TBDATA.2025.3536929
Peipeng Yu;Jiahan Chen;Xuan Feng;Zhihua Xia
The powerful ability of ChatGPT has caused widespread concern in the academic community. Malicious users could synthesize dummy academic content through ChatGPT, which is extremely harmful to academic rigor and originality. The need to develop ChatGPT-written content detection algorithms calls for large-scale datasets. In this paper, we initially investigate the possible negative impact of ChatGPT on academia, and present a large-scale CHatGPT-writtEn AbsTract dataset (CHEAT) to support the development of detection algorithms. In particular, the ChatGPT-written abstract dataset contains 35,304 synthetic abstracts, with $Generation$, $Polish$, and $Fusion$ as prominent representatives. Based on these data, we perform a thorough analysis of the existing text synthesis detection algorithms. We show that ChatGPT-written abstracts are detectable with well-trained detectors, while the detection difficulty increases with more human guidance involved.
ChatGPT的强大能力引起了学术界的广泛关注。恶意用户可以通过ChatGPT合成虚假的学术内容,这对学术严谨性和原创性是极其有害的。开发chatgpt编写的内容检测算法需要大规模的数据集。在本文中,我们初步研究了ChatGPT对学术界可能产生的负面影响,并提出了一个大规模的ChatGPT - written AbsTract dataset (CHEAT)来支持检测算法的开发。特别是,chatgpt编写的摘要数据集包含35304个合成摘要,其中$Generation$, $Polish$和$Fusion$是突出的代表。基于这些数据,我们对现有的文本合成检测算法进行了深入的分析。我们表明,chatgpt编写的摘要可以被训练有素的检测器检测到,而检测难度随着人工指导的增加而增加。
{"title":"CHEAT: A Large-Scale Dataset for Detecting CHatGPT-writtEn AbsTracts","authors":"Peipeng Yu;Jiahan Chen;Xuan Feng;Zhihua Xia","doi":"10.1109/TBDATA.2025.3536929","DOIUrl":"https://doi.org/10.1109/TBDATA.2025.3536929","url":null,"abstract":"The powerful ability of ChatGPT has caused widespread concern in the academic community. Malicious users could synthesize dummy academic content through ChatGPT, which is extremely harmful to academic rigor and originality. The need to develop ChatGPT-written content detection algorithms calls for large-scale datasets. In this paper, we initially investigate the possible negative impact of ChatGPT on academia, and present a large-scale CHatGPT-writtEn AbsTract dataset (CHEAT) to support the development of detection algorithms. In particular, the ChatGPT-written abstract dataset contains 35,304 synthetic abstracts, with <inline-formula><tex-math>$Generation$</tex-math></inline-formula>, <inline-formula><tex-math>$Polish$</tex-math></inline-formula>, and <inline-formula><tex-math>$Fusion$</tex-math></inline-formula> as prominent representatives. Based on these data, we perform a thorough analysis of the existing text synthesis detection algorithms. We show that ChatGPT-written abstracts are detectable with well-trained detectors, while the detection difficulty increases with more human guidance involved.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 3","pages":"898-906"},"PeriodicalIF":7.5,"publicationDate":"2025-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143949213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Considering the potential of tools such as ChatGPT or Gemini to generate texts in a similar way to a human would do, having reliable detectors of AI –AI-generated content (AIGC)– is vital to combat the misuse and the surrounding negative consequences of those tools. Most research on AIGC detection has focused on the English language, often overlooking other languages that also have tools capable of generating human-like texts, such is the case of the Spanish language. This paper proposes a novel multilingual and multi-task approach for detecting machine versus human-generated text. The first task classifies whether a text is written by a machine or by a human, which is the research objective of this paper. The second task consists in detect the language of the text. To evaluate the results of our approach, this study has framed the scope of the AuTexTification shared task and also we have collected a different dataset in Spanish. The experiments carried out in Spanish and English show that our approach is very competitive concerning the state of the art, as well as it can generalize better, thus being able to detect an AI-generated text in multiple domains.
{"title":"To Write or Not to Write as a Machine? That’s the Question","authors":"Robiert Sepúlveda-Torres;Iván Martínez-Murillo;Estela Saquete;Elena Lloret;Manuel Palomar","doi":"10.1109/TBDATA.2025.3536938","DOIUrl":"https://doi.org/10.1109/TBDATA.2025.3536938","url":null,"abstract":"Considering the potential of tools such as ChatGPT or Gemini to generate texts in a similar way to a human would do, having reliable detectors of AI –AI-generated content (AIGC)– is vital to combat the misuse and the surrounding negative consequences of those tools. Most research on AIGC detection has focused on the English language, often overlooking other languages that also have tools capable of generating human-like texts, such is the case of the Spanish language. This paper proposes a novel multilingual and multi-task approach for detecting machine versus human-generated text. The first task classifies whether a text is written by a machine or by a human, which is the research objective of this paper. The second task consists in detect the language of the text. To evaluate the results of our approach, this study has framed the scope of the AuTexTification shared task and also we have collected a different dataset in Spanish. The experiments carried out in Spanish and English show that our approach is very competitive concerning the state of the art, as well as it can generalize better, thus being able to detect an AI-generated text in multiple domains.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 3","pages":"1042-1053"},"PeriodicalIF":7.5,"publicationDate":"2025-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10858399","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143949173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-01-30DOI: 10.1109/TBDATA.2025.3536926
Yan Wang;Huiyu Zhou;Xinghui Dong
Natural terrain scene images play important roles in the geographical research and application. However, it is challenging to collect a large set of terrain scene images. Recently, great progress has been made in image generation. Although impressive results can be achieved, the efficiency of the state-of-the-art methods, e.g., the Vector Quantized Generative Adversarial Network (VQGAN), is still dissatisfying. The VQGAN confronts two issues, i.e., high space complexity and heavy computational demand. To efficiently fulfill the terrain scene generation task, we first collect a Natural Terrain Scene Data Set (NTSD), which contains 36,672 images divided into 38 classes. Then we propose a Lightweight VQGAN (Lit-VQGAN), which uses the fewer parameters and has the lower computational complexity, compared with the VQGAN. A lightweight super-resolution network is further adopted, to speedily derive a high-resolution image from the image that the Lit-VQGAN generates. The Lit-VQGAN can be trained and tested on the NTSD. To our knowledge, either the NTSD or the Lit-VQGAN has not been exploited before.1 Experimental results show that the Lit-VQGAN is more efficient and effective than the VQGAN for the image generation task. These promising results should be due to the lightweight yet effective networks that we design.
{"title":"Terrain Scene Generation Using a Lightweight Vector Quantized Generative Adversarial Network","authors":"Yan Wang;Huiyu Zhou;Xinghui Dong","doi":"10.1109/TBDATA.2025.3536926","DOIUrl":"https://doi.org/10.1109/TBDATA.2025.3536926","url":null,"abstract":"Natural terrain scene images play important roles in the geographical research and application. However, it is challenging to collect a large set of terrain scene images. Recently, great progress has been made in image generation. Although impressive results can be achieved, the efficiency of the state-of-the-art methods, e.g., the Vector Quantized Generative Adversarial Network (VQGAN), is still dissatisfying. The VQGAN confronts two issues, i.e., high space complexity and heavy computational demand. To efficiently fulfill the terrain scene generation task, we first collect a Natural Terrain Scene Data Set (NTSD), which contains 36,672 images divided into 38 classes. Then we propose a Lightweight VQGAN (Lit-VQGAN), which uses the fewer parameters and has the lower computational complexity, compared with the VQGAN. A lightweight super-resolution network is further adopted, to speedily derive a high-resolution image from the image that the Lit-VQGAN generates. The Lit-VQGAN can be trained and tested on the NTSD. To our knowledge, either the NTSD or the Lit-VQGAN has not been exploited before.<sup>1</sup> Experimental results show that the Lit-VQGAN is more efficient and effective than the VQGAN for the image generation task. These promising results should be due to the lightweight yet effective networks that we design.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 3","pages":"988-1000"},"PeriodicalIF":7.5,"publicationDate":"2025-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143949212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We study a novel problem in this paper, that is, if a modern text-to-image diffusion model can tailor any image classifier across domains and categories. Existing domain adaption works exploit both source and target data for domain alignment so as to transfer the knowledge from the labeled source data to the unlabeled target data. However, as the development of text-to-image diffusion models, we wonder if the high-fidelity synthetic data can serve as a surrogate of the source data in real world. In this way, we do not need to collect and annotate the source data for each image classification task in a one-for-one manner. Instead, we utilize only one off-the-shelf text-to-image model to synthesize images with labels derived from text prompts, and then leverage them as a bridge to dig out the knowledge from the task-agnostic text-to-image generator to the task-oriented image classifier via domain adaptation. Such a one-for-all adaptation paradigm allows us to adapt anything in the world using only one text-to-image generator as well as any unlabeled target data. Extensive experiments validate the feasibility of this idea, which even surprisingly surpasses the state-of-the-art domain adaptation works using the source data collected and annotated in real world.
{"title":"Adapt Anything: Tailor Any Image Classifier Across Domains and Categories Using Text-to-Image Diffusion Models","authors":"Weijie Chen;Haoyu Wang;Shicai Yang;Lei Zhang;Wei Wei;Yanning Zhang;Luojun Lin;Di Xie;Yueting Zhuang","doi":"10.1109/TBDATA.2025.3536933","DOIUrl":"https://doi.org/10.1109/TBDATA.2025.3536933","url":null,"abstract":"We study a novel problem in this paper, that is, if a modern text-to-image diffusion model can tailor any image classifier across domains and categories. Existing domain adaption works exploit both source and target data for domain alignment so as to transfer the knowledge from the labeled source data to the unlabeled target data. However, as the development of text-to-image diffusion models, we wonder if the high-fidelity synthetic data can serve as a surrogate of the source data in real world. In this way, we do not need to collect and annotate the source data for each image classification task in a one-for-one manner. Instead, we utilize only one off-the-shelf text-to-image model to synthesize images with labels derived from text prompts, and then leverage them as a bridge to dig out the knowledge from the task-agnostic text-to-image generator to the task-oriented image classifier via domain adaptation. Such a one-for-all adaptation paradigm allows us to adapt anything in the world using only one text-to-image generator as well as any unlabeled target data. Extensive experiments validate the feasibility of this idea, which even surprisingly surpasses the state-of-the-art domain adaptation works using the source data collected and annotated in real world.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 3","pages":"1013-1026"},"PeriodicalIF":7.5,"publicationDate":"2025-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143949114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Text data augmentation is an effective strategy for overcoming the challenge of limited sample sizes in many natural language processing (NLP) tasks. This challenge is especially prominent in the few-shot learning (FSL) scenario, where the data in the target domain is generally much scarcer and of lowered quality. A natural and widely used strategy to mitigate such challenges is to perform data augmentation to better capture data invariance and increase the sample size. However, current text data augmentation methods either can’t ensure the correct labeling of the generated data (lacking faithfulness), or can’t ensure sufficient diversity in the generated data (lacking compactness), or both. Inspired by the recent success of large language models (LLM), especially the development of ChatGPT, we propose a text data augmentation approach based on ChatGPT (named ”AugGPT”). AugGPT rephrases each sentence in the training samples into multiple conceptually similar but semantically different samples. The augmented samples can then be used in downstream model training. Experiment results on multiple few-shot learning text classification tasks show the superior performance of the proposed AugGPT approach over state-of-the-art text data augmentation methods in terms of testing accuracy and distribution of the augmented samples.
{"title":"AugGPT: Leveraging ChatGPT for Text Data Augmentation","authors":"Haixing Dai;Zhengliang Liu;Wenxiong Liao;Xiaoke Huang;Yihan Cao;Zihao Wu;Lin Zhao;Shaochen Xu;Fang Zeng;Wei Liu;Ninghao Liu;Sheng Li;Dajiang Zhu;Hongmin Cai;Lichao Sun;Quanzheng Li;Dinggang Shen;Tianming Liu;Xiang Li","doi":"10.1109/TBDATA.2025.3536934","DOIUrl":"https://doi.org/10.1109/TBDATA.2025.3536934","url":null,"abstract":"Text data augmentation is an effective strategy for overcoming the challenge of limited sample sizes in many natural language processing (NLP) tasks. This challenge is especially prominent in the few-shot learning (FSL) scenario, where the data in the target domain is generally much scarcer and of lowered quality. A natural and widely used strategy to mitigate such challenges is to perform data augmentation to better capture data invariance and increase the sample size. However, current text data augmentation methods either can’t ensure the correct labeling of the generated data (lacking faithfulness), or can’t ensure sufficient diversity in the generated data (lacking compactness), or both. Inspired by the recent success of large language models (LLM), especially the development of ChatGPT, we propose a text data augmentation approach based on ChatGPT (named ”AugGPT”). AugGPT rephrases each sentence in the training samples into multiple conceptually similar but semantically different samples. The augmented samples can then be used in downstream model training. Experiment results on multiple few-shot learning text classification tasks show the superior performance of the proposed AugGPT approach over state-of-the-art text data augmentation methods in terms of testing accuracy and distribution of the augmented samples.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 3","pages":"907-918"},"PeriodicalIF":7.5,"publicationDate":"2025-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143949265","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}