Pub Date : 2025-01-30 DOI: 10.1109/TBDATA.2025.3536926
Yan Wang;Huiyu Zhou;Xinghui Dong
Natural terrain scene images play important roles in geographical research and applications. However, it is challenging to collect a large set of terrain scene images. Recently, great progress has been made in image generation. Although impressive results can be achieved, the efficiency of state-of-the-art methods, e.g., the Vector Quantized Generative Adversarial Network (VQGAN), is still unsatisfactory. The VQGAN faces two issues, i.e., high space complexity and heavy computational demand. To efficiently fulfill the terrain scene generation task, we first collect a Natural Terrain Scene Data Set (NTSD), which contains 36,672 images divided into 38 classes. Then we propose a Lightweight VQGAN (Lit-VQGAN), which uses fewer parameters and has lower computational complexity than the VQGAN. A lightweight super-resolution network is further adopted to speedily derive a high-resolution image from the image that the Lit-VQGAN generates. The Lit-VQGAN can be trained and tested on the NTSD. To our knowledge, neither the NTSD nor the Lit-VQGAN has been exploited before. Experimental results show that the Lit-VQGAN is more efficient and effective than the VQGAN for the image generation task. These promising results can be attributed to the lightweight yet effective networks that we design.
{"title":"Terrain Scene Generation Using a Lightweight Vector Quantized Generative Adversarial Network","authors":"Yan Wang;Huiyu Zhou;Xinghui Dong","doi":"10.1109/TBDATA.2025.3536926","DOIUrl":"https://doi.org/10.1109/TBDATA.2025.3536926","url":null,"abstract":"Natural terrain scene images play important roles in the geographical research and application. However, it is challenging to collect a large set of terrain scene images. Recently, great progress has been made in image generation. Although impressive results can be achieved, the efficiency of the state-of-the-art methods, e.g., the Vector Quantized Generative Adversarial Network (VQGAN), is still dissatisfying. The VQGAN confronts two issues, i.e., high space complexity and heavy computational demand. To efficiently fulfill the terrain scene generation task, we first collect a Natural Terrain Scene Data Set (NTSD), which contains 36,672 images divided into 38 classes. Then we propose a Lightweight VQGAN (Lit-VQGAN), which uses the fewer parameters and has the lower computational complexity, compared with the VQGAN. A lightweight super-resolution network is further adopted, to speedily derive a high-resolution image from the image that the Lit-VQGAN generates. The Lit-VQGAN can be trained and tested on the NTSD. To our knowledge, either the NTSD or the Lit-VQGAN has not been exploited before.<sup>1</sup> Experimental results show that the Lit-VQGAN is more efficient and effective than the VQGAN for the image generation task. These promising results should be due to the lightweight yet effective networks that we design.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 3","pages":"988-1000"},"PeriodicalIF":7.5,"publicationDate":"2025-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143949212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-01-30 DOI: 10.1109/TBDATA.2025.3536933
Weijie Chen;Haoyu Wang;Shicai Yang;Lei Zhang;Wei Wei;Yanning Zhang;Luojun Lin;Di Xie;Yueting Zhuang
We study a novel problem in this paper, namely, whether a modern text-to-image diffusion model can tailor any image classifier across domains and categories. Existing domain adaptation works exploit both source and target data for domain alignment so as to transfer the knowledge from the labeled source data to the unlabeled target data. However, with the development of text-to-image diffusion models, we wonder whether high-fidelity synthetic data can serve as a surrogate for source data in the real world. In this way, we do not need to collect and annotate the source data for each image classification task in a one-for-one manner. Instead, we utilize only one off-the-shelf text-to-image model to synthesize images with labels derived from text prompts, and then leverage them as a bridge to distill knowledge from the task-agnostic text-to-image generator into the task-oriented image classifier via domain adaptation. Such a one-for-all adaptation paradigm allows us to adapt anything in the world using only one text-to-image generator as well as any unlabeled target data. Extensive experiments validate the feasibility of this idea, which even surprisingly surpasses state-of-the-art domain adaptation works that use source data collected and annotated in the real world.
{"title":"Adapt Anything: Tailor Any Image Classifier Across Domains and Categories Using Text-to-Image Diffusion Models","authors":"Weijie Chen;Haoyu Wang;Shicai Yang;Lei Zhang;Wei Wei;Yanning Zhang;Luojun Lin;Di Xie;Yueting Zhuang","doi":"10.1109/TBDATA.2025.3536933","DOIUrl":"https://doi.org/10.1109/TBDATA.2025.3536933","url":null,"abstract":"We study a novel problem in this paper, that is, if a modern text-to-image diffusion model can tailor any image classifier across domains and categories. Existing domain adaption works exploit both source and target data for domain alignment so as to transfer the knowledge from the labeled source data to the unlabeled target data. However, as the development of text-to-image diffusion models, we wonder if the high-fidelity synthetic data can serve as a surrogate of the source data in real world. In this way, we do not need to collect and annotate the source data for each image classification task in a one-for-one manner. Instead, we utilize only one off-the-shelf text-to-image model to synthesize images with labels derived from text prompts, and then leverage them as a bridge to dig out the knowledge from the task-agnostic text-to-image generator to the task-oriented image classifier via domain adaptation. Such a one-for-all adaptation paradigm allows us to adapt anything in the world using only one text-to-image generator as well as any unlabeled target data. Extensive experiments validate the feasibility of this idea, which even surprisingly surpasses the state-of-the-art domain adaptation works using the source data collected and annotated in real world.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 3","pages":"1013-1026"},"PeriodicalIF":7.5,"publicationDate":"2025-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143949114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-01-30 DOI: 10.1109/TBDATA.2025.3536934
Haixing Dai;Zhengliang Liu;Wenxiong Liao;Xiaoke Huang;Yihan Cao;Zihao Wu;Lin Zhao;Shaochen Xu;Fang Zeng;Wei Liu;Ninghao Liu;Sheng Li;Dajiang Zhu;Hongmin Cai;Lichao Sun;Quanzheng Li;Dinggang Shen;Tianming Liu;Xiang Li
Text data augmentation is an effective strategy for overcoming the challenge of limited sample sizes in many natural language processing (NLP) tasks. This challenge is especially prominent in the few-shot learning (FSL) scenario, where the data in the target domain is generally much scarcer and of lower quality. A natural and widely used strategy to mitigate such challenges is to perform data augmentation to better capture data invariance and increase the sample size. However, current text data augmentation methods either cannot ensure the correct labeling of the generated data (lacking faithfulness), cannot ensure sufficient diversity in the generated data (lacking compactness), or both. Inspired by the recent success of large language models (LLMs), especially the development of ChatGPT, we propose a text data augmentation approach based on ChatGPT, named "AugGPT". AugGPT rephrases each sentence in the training samples into multiple conceptually similar but semantically different samples. The augmented samples can then be used in downstream model training. Experimental results on multiple few-shot text classification tasks show the superior performance of the proposed AugGPT approach over state-of-the-art text data augmentation methods in terms of testing accuracy and the distribution of the augmented samples.
{"title":"AugGPT: Leveraging ChatGPT for Text Data Augmentation","authors":"Haixing Dai;Zhengliang Liu;Wenxiong Liao;Xiaoke Huang;Yihan Cao;Zihao Wu;Lin Zhao;Shaochen Xu;Fang Zeng;Wei Liu;Ninghao Liu;Sheng Li;Dajiang Zhu;Hongmin Cai;Lichao Sun;Quanzheng Li;Dinggang Shen;Tianming Liu;Xiang Li","doi":"10.1109/TBDATA.2025.3536934","DOIUrl":"https://doi.org/10.1109/TBDATA.2025.3536934","url":null,"abstract":"Text data augmentation is an effective strategy for overcoming the challenge of limited sample sizes in many natural language processing (NLP) tasks. This challenge is especially prominent in the few-shot learning (FSL) scenario, where the data in the target domain is generally much scarcer and of lowered quality. A natural and widely used strategy to mitigate such challenges is to perform data augmentation to better capture data invariance and increase the sample size. However, current text data augmentation methods either can’t ensure the correct labeling of the generated data (lacking faithfulness), or can’t ensure sufficient diversity in the generated data (lacking compactness), or both. Inspired by the recent success of large language models (LLM), especially the development of ChatGPT, we propose a text data augmentation approach based on ChatGPT (named ”AugGPT”). AugGPT rephrases each sentence in the training samples into multiple conceptually similar but semantically different samples. The augmented samples can then be used in downstream model training. Experiment results on multiple few-shot learning text classification tasks show the superior performance of the proposed AugGPT approach over state-of-the-art text data augmentation methods in terms of testing accuracy and distribution of the augmented samples.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 3","pages":"907-918"},"PeriodicalIF":7.5,"publicationDate":"2025-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143949265","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-01-30 DOI: 10.1109/TBDATA.2025.3536939
Yingbo Zhang;Shumin Ren;Jiao Wang;Chaoying Zhan;Mengqiao He;Xingyun Liu;Rongrong Wu;Jing Zhao;Cong Wu;Chuanzhu Fan;Bairong Shen
Whether viewed as an expert or as a source of ‘knowledge hallucination’, the use of ChatGPT in medical practice has stirred ongoing debate. This study sought to evaluate ChatGPT's capabilities in the field of clinical genetics, focusing on tasks such as ‘Clinical genetics exams’, ‘Associations between genetic diseases and pathogenic genes’, and ‘Limitations and trends in clinical genetics’. Results indicated that ChatGPT performed exceptionally well in question-answering tasks, particularly in clinical genetics exams and diagnosing single-gene diseases. It also effectively outlined the current limitations and prospective trends in clinical genetics. However, ChatGPT struggled to provide comprehensive answers regarding multi-gene or epigenetic diseases, particularly with respect to genetic variations or chromosomal abnormalities. In terms of systematic summarization and inference, some randomness was evident in ChatGPT's responses. In summary, while ChatGPT possesses a foundational understanding of general knowledge in clinical genetics due to hyperparameter learning, it encounters significant challenges when delving into specialized knowledge and navigating the complexities of clinical genetics, particularly in mitigating ‘Knowledge Hallucination’. To optimize its performance and depth of expertise in clinical genetics, integration with specialized knowledge databases and knowledge graphs is imperative.
{"title":"Expertise or Hallucination? A Comprehensive Evaluation of ChatGPT's Aptitude in Clinical Genetics","authors":"Yingbo Zhang;Shumin Ren;Jiao Wang;Chaoying Zhan;Mengqiao He;Xingyun Liu;Rongrong Wu;Jing Zhao;Cong Wu;Chuanzhu Fan;Bairong Shen","doi":"10.1109/TBDATA.2025.3536939","DOIUrl":"https://doi.org/10.1109/TBDATA.2025.3536939","url":null,"abstract":"Whether viewed as an expert or as a source of ‘knowledge hallucination’, the use of ChatGPT in medical practice has stirred ongoing debate. This study sought to evaluate ChatGPT's capabilities in the field of clinical genetics, focusing on tasks such as ‘Clinical genetics exams’, ‘Associations between genetic diseases and pathogenic genes’, and ‘Limitations and trends in clinical genetics’. Results indicated that ChatGPT performed exceptionally well in question-answering tasks, particularly in clinical genetics exams and diagnosing single-gene diseases. It also effectively outlined the current limitations and prospective trends in clinical genetics. However, ChatGPT struggled to provide comprehensive answers regarding multi-gene or epigenetic diseases, particularly with respect to genetic variations or chromosomal abnormalities. In terms of systematic summarization and inference, some randomness was evident in ChatGPT's responses. In summary, while ChatGPT possesses a foundational understanding of general knowledge in clinical genetics due to hyperparameter learning, it encounters significant challenges when delving into specialized knowledge and navigating the complexities of clinical genetics, particularly in mitigating ‘Knowledge Hallucination’. To optimize its performance and depth of expertise in clinical genetics, integration with specialized knowledge databases and knowledge graphs is imperative.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 3","pages":"919-932"},"PeriodicalIF":7.5,"publicationDate":"2025-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143949097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-01-30 DOI: 10.1109/TBDATA.2025.3536937
Mohammad Nadeem;Shahab Saquib Sohail;Dag Øivind Madsen;Ahmed Ibrahim Alzahrani;Javier Del Ser;Khan Muhammad
Recent years have witnessed tremendous advancements in AI tools (e.g., ChatGPT, GPT-4, and Bard), driven by the growing power, reasoning, and efficiency of Large Language Models (LLMs). LLMs have been shown to excel in tasks ranging from poem writing and coding to essay generation and puzzle solving. Despite their proficiency in general queries, specialized tasks such as metaphor understanding and fake news detection often require fine-tuned models, posing a comparison challenge with specialized Deep Learning (DL). We propose an assessment framework to compare task-specific intelligence with general-purpose LLMs on suicide and depression tendency identification. For this purpose, we trained two DL models on a suicide and depression detection dataset and then tested their performance on a test set. Afterward, the same test set was used to evaluate the performance of four LLMs (GPT-3.5, GPT-4, Google Bard, and MS Bing) using four classification metrics. The BERT-based DL model performed best among all, with a testing accuracy of 94.61%, while GPT-4 was the runner-up with an accuracy of 92.5%. The results demonstrate that LLMs do not outperform the specialized DL models but achieve comparable performance, making them a decent option for downstream tasks without specialized training. However, the LLMs outperformed the specialized models on the reduced dataset.
{"title":"A Multi-Modal Assessment Framework for Comparison of Specialized Deep Learning and General-Purpose Large Language Models","authors":"Mohammad Nadeem;Shahab Saquib Sohail;Dag Øivind Madsen;Ahmed Ibrahim Alzahrani;Javier Del Ser;Khan Muhammad","doi":"10.1109/TBDATA.2025.3536937","DOIUrl":"https://doi.org/10.1109/TBDATA.2025.3536937","url":null,"abstract":"Recent years have witnessed tremendous advancements in Al tools (e.g., ChatGPT, GPT-4, and Bard), driven by the growing power, reasoning, and efficiency of Large Language Models (LLMs). LLMs have been shown to excel in tasks ranging from poem writing and coding to essay generation and puzzle solving. Despite their proficiency in general queries, specialized tasks such as metaphor understanding and fake news detection often require finely tuned models, posing a comparison challenge with specialized Deep Learning (DL). We propose an assessment framework to compare task-specific intelligence with general-purpose LLMs on suicide and depression tendency identification. For this purpose, we trained two DL models on a suicide and depression detection dataset, followed by testing their performance on a test set. Afterward, the same test dataset is used to evaluate the performance of four LLMs (GPT-3.5, GPT-4, Google Bard, and MS Bing) using four classification metrics. The BERT-based DL model performed the best among all, with a testing accuracy of 94.61%, while GPT-4 was the runner-up with accuracy 92.5%. Results demonstrate that LLMs do not outperform the specialized DL models but are able to achieve comparable performance, making them a decent option for downstream tasks without specialized training. However, LLMs outperformed specialized models on the reduced dataset.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 3","pages":"1001-1012"},"PeriodicalIF":7.5,"publicationDate":"2025-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143949116","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-01-27 DOI: 10.1109/TBDATA.2025.3533916
Hongchun Lu;Songlin He;Xue Li;Min Han;Chase Wu
Fine-grained image retrieval (FGIR) is a crucial task in computer vision, with broad applications in areas such as biodiversity monitoring, e-commerce, and medical diagnostics. However, capturing discriminative feature information to generate binary codes is difficult because of high intraclass variance and low interclass variance. To address this challenge, we (i) build a novel and highly reliable fine-grained deep hash learning framework for more accurate retrieval of fine-grained images; (ii) propose a part-significant-region erasure method that forces the network to generate compact binary codes; (iii) introduce a CNN-guided Transformer structure for fine-grained retrieval tasks that effectively captures contextual feature relationships in fine-grained images to mine more discriminative regional features; and (iv) design a multistage mixture loss to optimize network training and enhance feature representation. Experiments were conducted on three publicly available fine-grained datasets. The results show that our method effectively improves the performance of fine-grained image retrieval.
{"title":"SRGTNet: Subregion-Guided Transformer Hash Network for Fine-Grained Image Retrieval","authors":"Hongchun Lu;Songlin He;Xue Li;Min Han;Chase Wu","doi":"10.1109/TBDATA.2025.3533916","DOIUrl":"https://doi.org/10.1109/TBDATA.2025.3533916","url":null,"abstract":"Fine-grained image retrieval (FGIR) is a crucial task in computer vision, with broad applications in areas such as biodiversity monitoring, e-commerce, and medical diagnostics. However, capturing discriminative feature information to generate binary codes is difficult because of high intraclass variance and low interclass variance. To address this challenge, we (i) build a novel and highly reliable fine-grained deep hash learning framework for more accurate retrieval of fine-grained images. (ii) We propose a part significant region erasure method that forces the network to generate compact binary codes. (iii) We introduce a CNN-guided Transformer structure for use in fine-grained retrieval tasks to capture fine-grained images effectively in contextual feature relationships to mine more discriminative regional features. (iv) A multistage mixture loss is designed to optimize network training and enhance feature representation. Experiments were conducted on three publicly available fine-grained datasets. The results show that our method effectively improves the performance of fine-grained image retrieval.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 5","pages":"2388-2400"},"PeriodicalIF":5.7,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144989930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-01-27 DOI: 10.1109/TBDATA.2025.3534625
Ao Chen;Xiren Zhou;Yizhan Fan;Huanhuan Chen
Anomaly detection (AD) is gaining prominence, especially in situations with limited labeled data or unknown anomalies, demanding an efficient approach with minimal reliance on labeled data or prior knowledge. Building upon the framework of Learning in the Model Space (LMS), this paper proposes conducting AD through Learning in the Multi-Level Model Spaces (MLMS). LMS transforms the data from the data space to the model space by representing each data instance with a fitted model. In MLMS, to fully capture the dynamic characteristics within the data, multi-level details of the original data instance are decomposed. These details are individually fitted, resulting in a set of fitted models that capture the multi-level dynamic characteristics of the original instance. Representing each data instance with a set of fitted models, rather than a single one, transforms it from the data space into the multi-level model spaces. The pairwise difference measurement between model sets is introduced, fully considering the distance between fitted models and the intra-class aggregation of similar models at each level of detail. Subsequently, effective AD can be implemented in the multi-level model spaces, with or without sufficient multi-class labeled data. Experiments on multiple AD datasets demonstrate the effectiveness of the proposed method.
{"title":"Anomaly Detection in Multi-Level Model Space","authors":"Ao Chen;Xiren Zhou;Yizhan Fan;Huanhuan Chen","doi":"10.1109/TBDATA.2025.3534625","DOIUrl":"https://doi.org/10.1109/TBDATA.2025.3534625","url":null,"abstract":"Anomaly detection (AD) is gaining prominence, especially in situations with limited labeled data or unknown anomalies, demanding an efficient approach with minimal reliance on labeled data or prior knowledge. Building upon the framework of Learning in the Model Space (LMS), this paper proposes conducting AD through Learning in the Multi-Level Model Spaces (MLMS). LMS transforms the data from the data space to the model space by representing each data instance with a fitted model. In MLMS, to fully capture the dynamic characteristics within the data, multi-level details of the original data instance are decomposed. These details are individually fitted, resulting in a set of fitted models that capture the multi-level dynamic characteristics of the original instance. Representing each data instance with a set of fitted models, rather than a single one, transforms it from the data space into the multi-level model spaces. The pairwise difference measurement between model sets is introduced, fully considering the distance between fitted models and the intra-class aggregation of similar models at each level of detail. Subsequently, effective AD can be implemented in the multi-level model spaces, with or without sufficient multi-class labeled data. Experiments on multiple AD datasets demonstrate the effectiveness of the proposed method.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 5","pages":"2376-2387"},"PeriodicalIF":5.7,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144990026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-01-27 DOI: 10.1109/TBDATA.2025.3533919
Lanyu Shang;Yang Zhang;Yawen Deng;Dong Wang
With the prevalence of social media and short video sharing platforms (e.g., TikTok, YouTube Shorts), the proliferation of healthcare misinformation has become a widespread and concerning issue that threatens public health and undermines trust in mass media. This paper focuses on the important problem of detecting multimodal healthcare misinformation in short videos on TikTok. Our objective is to accurately identify misleading healthcare information that is jointly conveyed by the visual, audio, and textual content within TikTok short videos. Three critical challenges exist in solving our problem: (i) how to effectively extract information from distractive and manipulated visual content in short videos; (ii) how to efficiently identify the interrelation of the heterogeneous visual and speech content in short videos; and (iii) how to accurately capture the complex dependency of the densely connected sequential content in short videos. To address the above challenges, we develop MultiTec, a multimodal detector that explicitly explores the audio and visual content in short videos to investigate both the sequential relation of video elements and their inter-modality dependencies to jointly detect misinformation in healthcare videos on TikTok. To the best of our knowledge, MultiTec is the first modality-aware dual-attentive short video detection model for multimodal healthcare misinformation on TikTok. We evaluate MultiTec on two real-world healthcare video datasets collected from TikTok. Evaluation results show that MultiTec achieves substantial performance gains compared to state-of-the-art baselines in accurately detecting misleading healthcare short videos.
{"title":"MultiTec: A Data-Driven Multimodal Short Video Detection Framework for Healthcare Misinformation on TikTok","authors":"Lanyu Shang;Yang Zhang;Yawen Deng;Dong Wang","doi":"10.1109/TBDATA.2025.3533919","DOIUrl":"https://doi.org/10.1109/TBDATA.2025.3533919","url":null,"abstract":"With the prevalence of social media and short video sharing platforms (e.g., TikTok, YouTube Shorts), the proliferation of healthcare misinformation has become a widespread and concerning issue that threatens public health and undermines trust in mass media. This paper focuses on an important problem of detecting multimodal healthcare misinformation in short videos on TikTok. Our objective is to accurately identify misleading healthcare information that is jointly conveyed by the visual, audio, and textual content within the TikTok short videos. Three critical challenges exist in solving our problem: i) how to effectively extract information from distractive and manipulated visual content in short videos? ii) How to efficiently identify the interrelation of the heterogeneous visual and speech content in short videos? iii) How to accurately capture the complex dependency of the densely connected sequential content in short videos? To address the above challenges, we develop <italic>MultiTec</i>, a multimodal detector that explicitly explores the audio and visual content in short videos to investigate both the sequential relation of video elements and their inter-modality dependencies to jointly detect misinformation in healthcare videos on TikTok. To the best of our knowledge, MultiTec is the first modality-aware dual-attentive short video detection model for multimodal healthcare misinformation on TikTok. We evaluate MultiTec on two real-world healthcare video datasets collected from TikTok. Evaluation results show that MultiTec achieves substantial performance gains compared to state-of-the-art baselines in accurately detecting misleading healthcare short videos.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 5","pages":"2471-2488"},"PeriodicalIF":5.7,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10854802","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144934330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-01-27 DOI: 10.1109/TBDATA.2025.3533882
Yangwen Yu;Victor O. K. Li;Jacqueline C. K. Lam;Kelvin Chan;Qi Zhang
Accurate and comprehensive air pollution data is essential for understanding and addressing environmental challenges. Missing data can impair accurate analysis and decision-making. This study presents a novel approach, named CNN-Transformer-based Spatial-Temporal Data Imputation (CTDI), for imputing missing air pollution data. Data pre-processing incorporates observed air pollution data and related urban data to produce 24-hour period tensors as input samples. 1-by-1 CNN layers capture the interaction between different types of input data. Deep learning transformer architecture is employed in a spatial-temporal (S-T) transformer module to capture long-range dependencies and extract complex relationships in both spatial and temporal dimensions. Hong Kong air pollution data is statistically analyzed and used to evaluate CTDI in its recovery of generated and actual patterns of missing data. Experimental results show that CTDI consistently outperforms existing imputation methods across all evaluated scenarios, including cases with higher rates of missing data, thereby demonstrating its robustness and effectiveness in enhancing air quality monitoring. Additionally, ablation experiments reveal that each component significantly contributes to the model's performance, with the temporal transformer proving particularly crucial under varying rates of missing data.
{"title":"CTDI: CNN-Transformer-Based Spatial-Temporal Missing Air Pollution Data Imputation","authors":"Yangwen Yu;Victor O. K. Li;Jacqueline C. K. Lam;Kelvin Chan;Qi Zhang","doi":"10.1109/TBDATA.2025.3533882","DOIUrl":"https://doi.org/10.1109/TBDATA.2025.3533882","url":null,"abstract":"Accurate and comprehensive air pollution data is essential for understanding and addressing environmental challenges. Missing data can impair accurate analysis and decision-making. This study presents a novel approach, named CNN-Transformer-based Spatial-Temporal Data Imputation (CTDI), for imputing missing air pollution data. Data pre-processing incorporates observed air pollution data and related urban data to produce 24-hour period tensors as input samples. 1-by-1 CNN layers capture the interaction between different types of input data. Deep learning transformer architecture is employed in a spatial-temporal (S-T) transformer module to capture long-range dependencies and extract complex relationships in both spatial and temporal dimensions. Hong Kong air pollution data is statistically analyzed and used to evaluate CTDI in its recovery of generated and actual patterns of missing data. Experimental results show that CTDI consistently outperforms existing imputation methods across all evaluated scenarios, including cases with higher rates of missing data, thereby demonstrating its robustness and effectiveness in enhancing air quality monitoring. Additionally, ablation experiments reveal that each component significantly contributes to the model's performance, with the temporal transformer proving particularly crucial under varying rates of missing data.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 5","pages":"2443-2456"},"PeriodicalIF":5.7,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144934506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-01-27 DOI: 10.1109/TBDATA.2025.3533892
Sensen Zhang;Haibo Hong;Mande Xie
Currently, deep neural networks (DNNs) are susceptible to adversarial attacks, particularly when the network's structure and parameters are known, while most existing attacks do not perform satisfactorily in black-box settings. In this context, model augmentation is considered effective for improving the success rates of black-box attacks with adversarial examples. However, existing model augmentation methods tend to rely on a single transformation, which limits the diversity of the augmented model collection and thus affects the transferability of adversarial examples. In this paper, we first propose the random diversity ensemble method (RDE-MI-FGSM) to effectively enhance the diversity of the augmented model collection, thereby improving the transferability of the generated adversarial examples. Afterwards, we put forward the random diversity variance ensemble method (RDE-VRA-MI-FGSM), which adopts variance reduction augmentation (VRA) to improve the gradient variance of the enhanced model set and avoid falling into a poor local optimum, further improving the transferability of adversarial examples. Experimental results demonstrate that our approaches are compatible with many existing transfer-based attacks and can effectively improve the transferability of gradient-based adversarial attacks on the ImageNet dataset. Moreover, our proposals achieve higher attack success rates even when the target model adopts advanced defenses. Specifically, we achieve an average attack success rate of 91.4% on the defense models, which is higher than other baseline approaches.
{"title":"Enhancing the Transferability of Adversarial Examples With Random Diversity Ensemble and Variance Reduction Augmentation","authors":"Sensen Zhang;Haibo Hong;Mande Xie","doi":"10.1109/TBDATA.2025.3533892","DOIUrl":"https://doi.org/10.1109/TBDATA.2025.3533892","url":null,"abstract":"Currently, deep neural networks (DNNs) are susceptible to adversarial attacks, particularly when the network's structure and parameters are known, while most of the existing attacks do not perform satisfactorily in the presence of black-box settings. In this context, model augmentation is considered to be effective to improve the success rates of black-box attacks on adversarial examples. However, the existing model augmentation methods tend to rely on a single transformation, which limits the diversity of augmented model collections and thus affects the transferability of adversarial examples. In this paper, we first propose the random diversity ensemble method (RDE-MI-FGSM) to effectively enhance the diversity of the augmented model collection, thereby improving the transferability of the generated adversarial examples. Afterwards, we put forward the random diversity variance ensemble method (RDE-VRA-MI-FGSM), which adopts variance reduction augmentation (VRA) to improve the gradient variance of the enhanced model set and avoid falling into a poor local optimum, so as to further improve the transferability of adversarial examples. Furthermore, experimental results demonstrate that our approaches are compatible with many existing transfer-based attacks and can effectively improve the transferability of gradient-based adversarial attacks on the ImageNet dataset. Also, our proposals have achieved higher attack success rates even if the target model adopts advanced defenses. Specifically, we have achieved an average attack success rate of 91.4% on the defense model, which is higher than other baseline approaches.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 5","pages":"2417-2430"},"PeriodicalIF":5.7,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144990138","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}