Adversarial attacks and defenses on text-to-image diffusion models: A survey
Pub Date: 2024-09-18 | DOI: 10.1016/j.inffus.2024.102701
Chenyu Zhang, Mingwang Hu, Wenhui Li, Lanjun Wang
Recently, the text-to-image diffusion model has gained considerable attention from the community due to its exceptional image generation capability. A representative model, Stable Diffusion, amassed more than 10 million users within just two months of its release. This surge in popularity has facilitated studies on the robustness and safety of the model, leading to the proposal of various adversarial attack methods. Simultaneously, there has been a marked increase in research focused on defense methods to improve the robustness and safety of these models. In this survey, we provide a comprehensive review of the literature on adversarial attacks and defenses targeting text-to-image diffusion models. We begin with an overview of text-to-image diffusion models, followed by an introduction to a taxonomy of adversarial attacks and an in-depth review of existing attack methods. We then present a detailed analysis of current defense methods that improve model robustness and safety. Finally, we discuss ongoing challenges and explore promising future research directions. For a complete list of the adversarial attack and defense methods covered in this survey, please refer to our curated repository at https://github.com/datar001/Awesome-AD-on-T2IDM.
Warning:
This paper includes model-generated content that may contain offensive or distressing material.
{"title":"Adversarial attacks and defenses on text-to-image diffusion models: A survey","authors":"Chenyu Zhang, Mingwang Hu, Wenhui Li, Lanjun Wang","doi":"10.1016/j.inffus.2024.102701","DOIUrl":"10.1016/j.inffus.2024.102701","url":null,"abstract":"<div><p>Recently, the text-to-image diffusion model has gained considerable attention from the community due to its exceptional image generation capability. A representative model, Stable Diffusion, amassed more than 10 million users within just two months of its release. This surge in popularity has facilitated studies on the robustness and safety of the model, leading to the proposal of various adversarial attack methods. Simultaneously, there has been a marked increase in research focused on defense methods to improve the robustness and safety of these models. In this survey, we provide a comprehensive review of the literature on adversarial attacks and defenses targeting text-to-image diffusion models. We begin with an overview of text-to-image diffusion models, followed by an introduction to a taxonomy of adversarial attacks and an in-depth review of existing attack methods. We then present a detailed analysis of current defense methods that improve model robustness and safety. Finally, we discuss ongoing challenges and explore promising future research directions. For a complete list of the adversarial attack and defense methods covered in this survey, please refer to our curated repository at <span><span>https://github.com/datar001/Awesome-AD-on-T2IDM</span><svg><path></path></svg></span>.</p></div><div><h3>Warning:</h3><p>This paper includes model-generated content that may contain offensive or distressing material.</p></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"114 ","pages":"Article 102701"},"PeriodicalIF":14.7,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142272914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MMSR: Symbolic regression is a multi-modal information fusion task
Pub Date: 2024-09-16 | DOI: 10.1016/j.inffus.2024.102681
Yanjie Li , Jingyi Liu , Min Wu , Lina Yu , Weijun Li , Xin Ning , Wenqiang Li , Meilan Hao , Yusong Deng , Shu Wei
Mathematical formulas are the crystallization of human wisdom accumulated over thousands of years of exploring the laws of nature. Describing the complex laws of nature with a concise mathematical formula is a constant pursuit of scientists and a great challenge for artificial intelligence; this field is called symbolic regression (SR). Symbolic regression was originally formulated as a combinatorial optimization problem and solved with Genetic Programming (GP) and reinforcement learning algorithms. However, GP is sensitive to hyperparameters, and both types of algorithms are inefficient. To address this, researchers have treated the mapping from data to expressions as a translation problem and introduced large-scale pre-trained models. However, data and expression skeletons do not have the clear word-level correspondences that two natural languages do; they are better viewed as two modalities (e.g., image and text). Therefore, in this paper, we propose MMSR, which solves the SR problem as a purely multi-modal problem and introduces contrastive learning during training for modal alignment, facilitating later modal feature fusion. Notably, to better promote modal feature fusion, we train the contrastive learning loss and the other losses simultaneously, which requires only one-step training, instead of first training the contrastive loss and then the other losses; our experiments show that joint training lets the feature extraction and feature fusion modules adapt to each other better. Experimental results show that, compared with multiple large-scale pre-trained baselines, MMSR achieves state-of-the-art results on multiple mainstream datasets, including SRBench. Our code is open source at https://github.com/1716757342/MMSR.
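To make the one-step joint training concrete, here is a minimal PyTorch-style sketch under stated assumptions: the tiny stand-in encoders, the InfoNCE alignment loss, the toy generation loss and the 0.1 weighting are illustrative choices, not MMSR's actual architecture or hyperparameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def info_nce(z_a, z_b, temperature=0.07):
    """Symmetric InfoNCE between paired data and expression embeddings."""
    z_a, z_b = F.normalize(z_a, dim=-1), F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature                  # (B, B) similarity matrix
    targets = torch.arange(z_a.size(0))
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Stand-in encoders for the two modalities: numeric sample points and expression tokens.
data_encoder = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 64))
expr_encoder = nn.Sequential(nn.Embedding(100, 64), nn.Flatten(), nn.Linear(8 * 64, 64))
decoder_head = nn.Linear(64, 100)                         # toy next-token prediction head
params = list(data_encoder.parameters()) + list(expr_encoder.parameters()) + list(decoder_head.parameters())
opt = torch.optim.Adam(params, lr=1e-3)

x = torch.randn(16, 20)                                   # flattened (x, y) sample points
tokens = torch.randint(0, 100, (16, 8))                   # expression-skeleton token ids
z_data, z_expr = data_encoder(x), expr_encoder(tokens)
align_loss = info_nce(z_data, z_expr)                     # contrastive modal alignment
gen_loss = F.cross_entropy(decoder_head(z_data), tokens[:, 0])  # toy generation loss
loss = gen_loss + 0.1 * align_loss                        # single-step joint objective
opt.zero_grad(); loss.backward(); opt.step()
```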
{"title":"MMSR: Symbolic regression is a multi-modal information fusion task","authors":"Yanjie Li , Jingyi Liu , Min Wu , Lina Yu , Weijun Li , Xin Ning , Wenqiang Li , Meilan Hao , Yusong Deng , Shu Wei","doi":"10.1016/j.inffus.2024.102681","DOIUrl":"10.1016/j.inffus.2024.102681","url":null,"abstract":"<div><p>Mathematical formulas are the crystallization of human wisdom in exploring the laws of nature for thousands of years. Describing the complex laws of nature with a concise mathematical formula is a constant pursuit of scientists and a great challenge for artificial intelligence. This field is called symbolic regression (SR). Symbolic regression was originally formulated as a combinatorial optimization problem, and Genetic Programming (GP) and Reinforcement Learning algorithms were used to solve it. However, GP is sensitive to hyperparameters, and these two types of algorithms are inefficient. To solve this problem, researchers treat the mapping from data to expressions as a translation problem. And the corresponding large-scale pre-trained model is introduced. However, the data and expression skeletons do not have very clear word correspondences as the two languages do. Instead, they are more like two modalities (e.g., image and text). Therefore, in this paper, we proposed MMSR. The SR problem is solved as a pure multi-modal problem, and contrastive learning is also introduced in the training process for modal alignment to facilitate later modal feature fusion. It is worth noting that to better promote the modal feature fusion, we adopt the strategy of training contrastive learning loss and other losses at the same time, which only needs one-step training, instead of training contrastive learning loss first and then training other losses. Because our experiments prove training together can make the feature extraction module and feature fusion module wearing-in better. Experimental results show that compared with multiple large-scale pre-training baselines, MMSR achieves the most advanced results on multiple mainstream datasets including SRBench. Our code is open source at <span><span>https://github.com/1716757342/MMSR</span><svg><path></path></svg></span>.</p></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"114 ","pages":"Article 102681"},"PeriodicalIF":14.7,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142272907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
D3WC: Deep three-way clustering with granular evidence fusion
Pub Date: 2024-09-16 | DOI: 10.1016/j.inffus.2024.102699
Hengrong Ju , Jing Guo , Weiping Ding , Xibei Yang
Deep clustering has gained significant traction as an unsupervised learning method, demonstrating considerable success in processing high-dimensional samples in data mining and computer vision. However, the ambiguity of high-dimensional data presents a challenge for deep clustering, which struggles to manage data uncertainty directly. In addition, while similarities and correlations in data often concentrate in local neighborhoods, traditional deep clustering methods frequently overlook these local relationships. To overcome these limitations, this paper presents a novel deep three-way clustering with granular evidence fusion. First, a fused contrastive deep FCM clustering network framework is introduced to project data from complex original data space to a more suitable deep feature space. Second, drawing upon the principles of three-way decision, the clustering results of the first stage are divided into positive, boundary, and negative regions, effectively addressing data uncertainty. Finally, a novel semiball neighborhood granulation method is employed to construct information granules for uncertain samples. This paper further leverages evidence theory to integrate belief information in these information granules, facilitating the redistribution of uncertain data. By emphasizing local structures, the proposed method effectively describes the characteristics of complex data. Experimental results confirm the effectiveness of this approach, showcasing its ability to enhance the clustering process.
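As a rough illustration of the three-way split described above, the following sketch assigns samples to positive, boundary, and negative regions from a fuzzy membership matrix; the thresholds alpha and beta are hypothetical values for illustration, not the paper's settings.

```python
# Illustrative three-way split of clustering results: samples whose top fuzzy
# membership is high go to the positive region, clearly low ones to the
# negative region, and the rest to the boundary region, which would then be
# granulated and redistributed via the evidence-fusion step described above.
import numpy as np

def three_way_regions(memberships, alpha=0.7, beta=0.4):
    """memberships: (n_samples, n_clusters) fuzzy membership matrix (rows sum to 1)."""
    top = memberships.max(axis=1)
    positive = np.where(top >= alpha)[0]                    # confidently assigned
    negative = np.where(top < beta)[0]                      # confidently unassigned
    boundary = np.where((top >= beta) & (top < alpha))[0]   # uncertain samples
    return positive, boundary, negative

u = np.random.dirichlet(np.ones(3), size=10)                # toy membership matrix
pos, bnd, neg = three_way_regions(u)
```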
{"title":"D3WC: Deep three-way clustering with granular evidence fusion","authors":"Hengrong Ju , Jing Guo , Weiping Ding , Xibei Yang","doi":"10.1016/j.inffus.2024.102699","DOIUrl":"10.1016/j.inffus.2024.102699","url":null,"abstract":"<div><p>Deep clustering has gained significant traction as an unsupervised learning method, demonstrating considerable success in processing high-dimensional samples in data mining and computer vision. However, the ambiguity of high-dimensional data presents a challenge for deep clustering, which struggles to manage data uncertainty directly. In addition, while similarities and correlations in data often concentrate in local neighborhoods, traditional deep clustering methods frequently overlook these local relationships. To overcome these limitations, this paper presents a novel deep three-way clustering with granular evidence fusion. First, a fused contrastive deep FCM clustering network framework is introduced to project data from complex original data space to a more suitable deep feature space. Second, drawing upon the principles of three-way decision, the clustering results of the first stage are divided into positive, boundary, and negative regions, effectively addressing data uncertainty. Finally, a novel semiball neighborhood granulation method is employed to construct information granules for uncertain samples. This paper further leverages evidence theory to integrate belief information in these information granules, facilitating the redistribution of uncertain data. By emphasizing local structures, the proposed method effectively describes the characteristics of complex data. Experimental results confirm the effectiveness of this approach, showcasing its ability to enhance the clustering process.</p></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"114 ","pages":"Article 102699"},"PeriodicalIF":14.7,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142272906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Transformers in biosignal analysis: A review
Pub Date: 2024-09-16 | DOI: 10.1016/j.inffus.2024.102697
Ayman Anwar , Yassin Khalifa , James L. Coyle , Ervin Sejdic
Transformer architectures have become increasingly popular in healthcare applications. Through outstanding performance in natural language processing and a superior capability to encode sequences, transformers have influenced researchers across healthcare domains. Biosignal processing, in particular, has been a major focus of healthcare research aimed at understanding and assessing complex physiological processes. Since their advent, multiple variants of transformer architectures have been leveraged by numerous studies to classify, analyze, and extract physiological events encoded within biosignals. In this paper, we conduct a comprehensive survey that bridges these research endeavors and highlights the most common and state-of-the-art transformer architectures utilized across the various subfields of biosignal analysis. Additionally, we provide an objective comparison between transformers and similar sequence-specialized neural networks to highlight strengths, weaknesses, and best practices in biosignal analysis. In doing so, we aspire to provide a roadmap for researchers interested in leveraging transformer architectures for biosignal analysis applications.
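As a point of reference for readers new to the area, the following is a minimal sketch of the kind of model the survey covers: a 1-D biosignal split into fixed-length segments, embedded, and passed through a standard Transformer encoder for classification. The segment length, model width, and head count are arbitrary illustrative choices, not recommendations drawn from the survey.

```python
import torch
import torch.nn as nn

class BiosignalTransformer(nn.Module):
    def __init__(self, seg_len=50, d_model=64, nhead=4, num_layers=2, n_classes=5):
        super().__init__()
        self.seg_len = seg_len
        self.embed = nn.Linear(seg_len, d_model)            # segment -> token embedding
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):                                    # x: (batch, signal_len)
        b, n = x.shape
        x = x[:, : (n // self.seg_len) * self.seg_len]       # drop the ragged tail
        tokens = x.reshape(b, -1, self.seg_len)              # (batch, n_segments, seg_len)
        h = self.encoder(self.embed(tokens))                 # contextualised segments
        return self.head(h.mean(dim=1))                      # mean-pool then classify

logits = BiosignalTransformer()(torch.randn(8, 1000))        # 8 signals, 1000 samples each
```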
{"title":"Transformers in biosignal analysis: A review","authors":"Ayman Anwar , Yassin Khalifa , James L. Coyle , Ervin Sejdic","doi":"10.1016/j.inffus.2024.102697","DOIUrl":"10.1016/j.inffus.2024.102697","url":null,"abstract":"<div><p>Transformer architectures have become increasingly popular in healthcare applications. Through outstanding performance in natural language processing and superior capability to encode sequences, transformers have influenced researchers from various healthcare domains. Biosignal processing, in particular, has been a main focus in healthcare research to understand and assess complex physiological processes. Since their advent, multiple variants of transformer architectures have been leveraged by numerous studies to classify, analyze, and extract physiological events encoded within biosignals. In this paper, we aim to conduct a comprehensive survey that bridges research endeavors and highlights the most common and state-of-the-art transformer architectures utilized across the various subfields of biosignal analysis. Additionally, we also provide an objective comparison between transformers and similar sequence-specialized neural networks to highlight strengths, weaknesses, and best practices in biosignal analysis. By doing so, we aspire to provide a roadmap for researchers interested in leveraging transformer architectures for biosignal analysis applications.</p></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"114 ","pages":"Article 102697"},"PeriodicalIF":14.7,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142272909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ℓp-norm constrained one-class classifier combination
Pub Date: 2024-09-16 | DOI: 10.1016/j.inffus.2024.102700
Sepehr Nourmohammadi , Shervin Rahimzadeh Arashloo , Josef Kittler
Classifier fusion is established as an effective methodology for boosting performance in different classification settings, and one-class classification is no exception. In this study, we consider the one-class classifier fusion problem by modelling the sparsity/uniformity of the ensemble. To this end, we formulate a convex objective function to learn the weights in a linear ensemble model and impose a variable ℓp-norm (p ≥ 1) constraint on the weight vector. The vector-norm constraint enables the model to adapt to the intrinsic uniformity/sparsity of the ensemble in the space of base learners and acts as a (soft) classifier selection mechanism by shaping the relative magnitudes of the fusion weights. Drawing on the Frank–Wolfe algorithm, we then present an effective approach to solve the proposed convex constrained optimisation problem efficiently.
We evaluate the proposed one-class classifier combination approach on multiple data sets from diverse application domains and illustrate its merits in comparison to the existing approaches.
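For intuition, the sketch below shows a generic Frank–Wolfe iteration under an ℓp-ball constraint, using the closed-form linear minimization oracle obtained from the dual norm q = p/(p-1). The toy least-squares objective and all constants are assumptions for illustration and do not reproduce the paper's one-class formulation.

```python
import numpy as np

def lmo_lp_ball(grad, radius, p):
    """argmin_{||s||_p <= radius} <grad, s>, closed form for p > 1."""
    q = p / (p - 1.0)                                   # dual norm exponent
    g = np.abs(grad)
    denom = np.sum(g ** q) ** ((q - 1.0) / q)           # ||grad||_q^{q-1}
    if denom == 0.0:
        return np.zeros_like(grad)
    return -radius * np.sign(grad) * g ** (q - 1.0) / denom

def frank_wolfe(grad_fn, dim, radius=1.0, p=1.5, n_iter=200):
    w = np.zeros(dim)                                   # feasible starting point
    for t in range(n_iter):
        s = lmo_lp_ball(grad_fn(w), radius, p)          # linear minimization oracle
        gamma = 2.0 / (t + 2.0)                         # standard step-size schedule
        w = (1.0 - gamma) * w + gamma * s
    return w

# Toy usage: fuse 4 base scores by least squares against a target signal.
rng = np.random.default_rng(0)
S, y = rng.normal(size=(100, 4)), rng.normal(size=100)
w = frank_wolfe(lambda w: S.T @ (S @ w - y) / len(y), dim=4)
```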
{"title":"ℓp-norm constrained one-class classifier combination","authors":"Sepehr Nourmohammadi , Shervin Rahimzadeh Arashloo , Josef Kittler","doi":"10.1016/j.inffus.2024.102700","DOIUrl":"10.1016/j.inffus.2024.102700","url":null,"abstract":"<div><p>Classifier fusion is established as an effective methodology for boosting performance in different classification settings and one-class classification is no exception. In this study, we consider the one-class classifier fusion problem by modelling the sparsity/uniformity of the ensemble. To this end, we formulate a convex objective function to learn the weights in a linear ensemble model and impose a variable <span><math><msub><mrow><mi>ℓ</mi></mrow><mrow><mi>p</mi><mo>≥</mo><mn>1</mn></mrow></msub></math></span>-norm constraint on the weight vector. The vector-norm constraint enables the model to adapt to the intrinsic uniformity/sparsity of the ensemble in the space of base learners and acts as a (soft) classifier selection mechanism by shaping the relative magnitudes of fusion weights. Drawing on the Frank–Wolfe algorithm, we then present an effective approach to solve the proposed convex constrained optimisation problem efficiently.</p><p>We evaluate the proposed one-class classifier combination approach on multiple data sets from diverse application domains and illustrate its merits in comparison to the existing approaches.</p></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"114 ","pages":"Article 102700"},"PeriodicalIF":14.7,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142272910","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lead-fusion Barlow twins: A fused self-supervised learning method for multi-lead electrocardiograms
Pub Date: 2024-09-16 | DOI: 10.1016/j.inffus.2024.102698
Wenhan Liu , Shurong Pan , Zhoutong Li , Sheng Chang , Qijun Huang , Nan Jiang
Nowadays, deep learning depends on large-scale labeled datasets, which limits its broader application in electrocardiogram (ECG) analysis, as manual labeling of ECGs is consistently costly. To overcome this issue, this paper proposes a fused self-supervised learning (SSL) method for multi-lead ECGs: lead-fusion Barlow twins (LFBT). It uses unlabeled ECG datasets to pretrain an encoder group with a fused loss that combines intra-lead and inter-lead BT losses. By employing BT, LFBT avoids the need for additional techniques to prevent trivial solutions (collapse) during pretraining. Moreover, multi-branch concatenation (MBC) fuses information from all leads when transferring pretrained encoders to downstream tasks. According to the experiments, LFBT can extract prior knowledge from unlabeled ECG datasets, enabling a deep learning model to achieve performance comparable to its supervised counterpart (trained from scratch) while using 3–5× fewer labels. Furthermore, LFBT is robust when applied to uncurated ECGs from real-world hospitals, with no significant performance decline observed after pretraining. Model interpretation based on gradient-weighted class activation mapping (GradCAM) indicates that LFBT helps models focus on critical waveform changes when training data and labels are insufficient. Compared with previous methods, LFBT demonstrates advantages in both performance and implementation. In summary, LFBT shows considerable potential to reduce the need for manual labeling of ECGs, thereby advancing deep learning applications in real-world ECG-based diagnoses. Code is available at https://github.com/Aiwiscal/ECG_SSL_LFBT.
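For readers unfamiliar with Barlow twins, the sketch below shows a standard BT redundancy-reduction loss and one plausible way to fuse intra-lead and inter-lead terms, as the abstract describes; the encoders, the off-diagonal weight, and the inter-lead weight mu are illustrative assumptions rather than LFBT's actual design.

```python
import torch

def barlow_twins_loss(z1, z2, lambda_offdiag=5e-3):
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-6)          # batch-normalise embeddings
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-6)
    c = z1.t() @ z2 / z1.size(0)                         # (D, D) cross-correlation matrix
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()       # push correlations to identity
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()
    return on_diag + lambda_offdiag * off_diag

def lead_fusion_loss(encoders, views, mu=1.0):
    """views[i][j]: augmented view j of lead i; encoders[i]: encoder for lead i."""
    z = [[encoders[i](v) for v in views[i]] for i in range(len(views))]
    intra = sum(barlow_twins_loss(z[i][0], z[i][1]) for i in range(len(z)))
    inter = sum(barlow_twins_loss(z[i][0], z[j][0])
                for i in range(len(z)) for j in range(i + 1, len(z)))
    return intra + mu * inter                            # fused pretraining objective
```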
{"title":"Lead-fusion Barlow twins: A fused self-supervised learning method for multi-lead electrocardiograms","authors":"Wenhan Liu , Shurong Pan , Zhoutong Li , Sheng Chang , Qijun Huang , Nan Jiang","doi":"10.1016/j.inffus.2024.102698","DOIUrl":"10.1016/j.inffus.2024.102698","url":null,"abstract":"<div><p>Nowadays, deep learning depends on large-scale labeled datasets, which limits its broader application in electrocardiogram (ECG) analysis, as manual labeling of ECGs is consistently costly. To overcome this issue, this paper proposes a fused self-supervised learning (SSL) method for multi-lead ECGs: lead-fusion Barlow twins (LFBT). It utilizes unlabeled ECG datasets to pretrain an encoder group using a fused loss. This loss fuses intra-lead and inter-lead BT loss. By employing BT, LFBT avoids the need for additional techniques to prevent trivial solutions (collapse) in pretraining. Moreover, multi-branch concatenation (MBC) fuses information from all leads when transferring pretrained encoders to downstream tasks. According to the experiments, LFBT can extract prior knowledge from unlabeled ECG datasets, making a deep learning model yield comparable performances with its supervised counterpart (trained from scratch) using 3<span><math><mrow><mo>∼</mo><mn>5</mn><mo>×</mo></mrow></math></span> fewer labels. Furthermore, LFBT is robust when applied to uncurated ECGs from real-world hospitals, with no significant performance decline observed after pretraining. Model interpretation based on gradient-weighted class activation mapping (GradCAM) indicates that LFBT helps models focus on critical waveform changes when training data and labels are insufficient. Compared with previous methods, LFBT demonstrates advantages in performance and implementation. To summarize, LFBT shows considerable potential in reducing the need for manual labeling of ECGs, thereby advancing deep learning applications in real-world ECG-based diagnoses. Code is available at <span><span>https://github.com/Aiwiscal/ECG_SSL_LFBT</span><svg><path></path></svg></span>.</p></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"114 ","pages":"Article 102698"},"PeriodicalIF":14.7,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142272911","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Survey on data fusion approaches for fall-detection
Pub Date: 2024-09-16 | DOI: 10.1016/j.inffus.2024.102696
Ehsan Rassekh, Lauro Snidaro
Human fall detection is a critical research area focused on developing methods and systems that can automatically detect and recognize falls, particularly among the elderly and individuals with disabilities. Falls are a major cause of injuries and deaths among these populations, and timely intervention can reduce the severity of consequences. This article presents a comprehensive review of fall detection systems, emphasizing the use of cutting-edge technologies such as deep learning, sensor fusion, and machine learning. The research explores a variety of methodologies and strategies employed in fall detection systems, including the integration of wearable sensors, smartphones, and cameras. By examining various fall detection techniques and their experimental results, the article highlights the effectiveness of these systems in identifying and classifying falls. The study also addresses the challenges and limitations associated with fall detection systems, emphasizing the need for ongoing research and advancements. In summary, this research contributes to the development of advanced fall detection systems, demonstrating their potential to improve the quality of life for the elderly, alleviate healthcare burdens, and provide reliable solutions for fall detection and classification.
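As a toy illustration of one fusion strategy such systems use, the sketch below performs score-level (late) fusion of an accelerometer detector and a gyroscope detector; the threshold rules and weights are placeholders, not a design recommended by the survey.

```python
import numpy as np

def accel_score(acc, g=9.81):
    """Fall likelihood from accelerometer magnitude: free-fall dip followed by an impact peak."""
    mag = np.linalg.norm(acc, axis=1)
    return float(np.clip((mag.max() - 2 * g) / g, 0, 1) * (mag.min() < 0.5 * g))

def gyro_score(gyro):
    """Fall likelihood from peak angular velocity (rad/s); 6 rad/s is an arbitrary cap."""
    return float(np.clip(np.linalg.norm(gyro, axis=1).max() / 6.0, 0, 1))

def fused_fall_decision(acc, gyro, weights=(0.6, 0.4), threshold=0.5):
    score = weights[0] * accel_score(acc) + weights[1] * gyro_score(gyro)
    return score >= threshold, score

# Example window: 2 s of 50 Hz accelerometer and gyroscope samples (no fall here).
acc = np.random.normal([0, 0, 9.81], 0.3, size=(100, 3))
print(fused_fall_decision(acc, np.random.normal(0, 0.2, size=(100, 3))))
```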
{"title":"Survey on data fusion approaches for fall-detection","authors":"Ehsan Rassekh, Lauro Snidaro","doi":"10.1016/j.inffus.2024.102696","DOIUrl":"10.1016/j.inffus.2024.102696","url":null,"abstract":"<div><p>Human fall detection is a critical research area focused on developing methods and systems that can automatically detect and recognize falls, particularly among the elderly and individuals with disabilities. Falls are a major cause of injuries and deaths among these populations, and timely intervention can reduce the severity of consequences. This article presents a comprehensive review of fall detection systems, emphasizing the use of cutting-edge technologies such as deep learning, sensor fusion, and machine learning. The research explores a variety of methodologies and strategies employed in fall detection systems, including the integration of wearable sensors, smartphones, and cameras. By examining various fall detection techniques and their experimental results, the article highlights the effectiveness of these systems in identifying and classifying falls. The study also addresses the challenges and limitations associated with fall detection systems, emphasizing the need for ongoing research and advancements. In summary, this research contributes to the development of advanced fall detection systems, demonstrating their potential to improve the quality of life for the elderly, alleviate healthcare burdens, and provide reliable solutions for fall detection and classification.</p></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"114 ","pages":"Article 102696"},"PeriodicalIF":14.7,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142272912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dynamic prediction and optimization of tunneling parameters with high reliability based on a hybrid intelligent algorithm
Pub Date: 2024-09-16 | DOI: 10.1016/j.inffus.2024.102705
Hongyu Chen , Qiping Geoffrey Shen , Miroslaw J. Skibniewski , Yuan Cao , Yang Liu
In this paper, a hybrid intelligent framework comprising Bayesian optimization (BO), gradient boosting with categorical features (CatBoost) and the nondominated sorting genetic algorithm-III (NSGA-III) was proposed to support multiobjective optimization of shield construction parameters without large sample datasets, improve the shield performance, and ensure reliable and interpretable results. First, with the use of the specific tunneling energy, advancing speed and cutter wear as objective functions, a BO-CatBoost prediction model for shield construction parameters and various objectives was constructed, and the key influencing factors were identified via the SHapley Additive exPlanations (SHAP) method. Then, a BO-CatBoost-NSGA-III model was developed to obtain Pareto solutions under different scenarios involving the adjustment of the key influencing factors. Finally, adopting the Wuhan Metro as the background, the accuracy, stability, and generalizability of the constructed algorithm were verified. The results indicated that (1) the developed BO-CatBoost algorithm is superior to 9 other algorithms. The R2 values of the proposed approach were 0.976 and 0.901–0.976 on the test set. (2) The developed BO-CatBoost-NSGA-III algorithm could be used to obtain Pareto solutions under different scenarios via the adjustment of the key influencing factors with the SHAP method, and the optimal solutions could facilitate improvements in the advancing speed, specific tunneling energy and cutter wear of 3.45 %, 6.09 %, and 0.52 %, respectively, with an overall average reliability of 90.5 %. (3) By comparing various prediction algorithms, optimization schemes of different objectives and geological conditions, the accuracy, stability, and generalizability of the constructed algorithm were verified. The developed BO-CatBoost-NSGA-III framework could enable dynamic adjustment of shield construction parameters for decision-making purposes in the event of conflicting shield construction objectives and exhibits generality.
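To convey the shape of such a pipeline (surrogate models feeding a many-objective search), here is a heavily simplified sketch that uses CatBoost surrogates and pymoo's NSGA-III on synthetic stand-in data; it is not the authors' implementation, and the feature bounds, objectives, and hyperparameters are illustrative assumptions (the Bayesian-optimization tuning step is only indicated in a comment).

```python
import numpy as np
from catboost import CatBoostRegressor
from pymoo.core.problem import ElementwiseProblem
from pymoo.algorithms.moo.nsga3 import NSGA3
from pymoo.util.ref_dirs import get_reference_directions
from pymoo.optimize import minimize

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 5))                          # stand-in construction parameters
Y = np.c_[X.sum(1), (X ** 2).sum(1), np.abs(X - 0.5).sum(1)]  # stand-in objective values

# One surrogate per objective; hyperparameters would come from Bayesian optimisation.
surrogates = [CatBoostRegressor(iterations=200, depth=4, verbose=False).fit(X, Y[:, k])
              for k in range(3)]

class TunnelingProblem(ElementwiseProblem):
    def __init__(self):
        super().__init__(n_var=5, n_obj=3, xl=np.zeros(5), xu=np.ones(5))

    def _evaluate(self, x, out, *args, **kwargs):
        # In the paper's setting the objectives would be specific tunneling energy,
        # cutter wear, and (negated) advancing speed; here all stand-ins are minimised.
        out["F"] = [m.predict(x.reshape(1, -1))[0] for m in surrogates]

ref_dirs = get_reference_directions("das-dennis", 3, n_partitions=12)
res = minimize(TunnelingProblem(), NSGA3(ref_dirs=ref_dirs), ("n_gen", 50), seed=1)
print(res.F[:5])                                              # a slice of the Pareto front
```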
{"title":"Dynamic prediction and optimization of tunneling parameters with high reliability based on a hybrid intelligent algorithm","authors":"Hongyu Chen , Qiping Geoffrey Shen , Miroslaw J. Skibniewski , Yuan Cao , Yang Liu","doi":"10.1016/j.inffus.2024.102705","DOIUrl":"10.1016/j.inffus.2024.102705","url":null,"abstract":"<div><p>In this paper, a hybrid intelligent framework comprising Bayesian optimization (BO), gradient boosting with categorical features (CatBoost) and the nondominated sorting genetic algorithm-III (NSGA-III) was proposed to support multiobjective optimization of shield construction parameters without large sample datasets, improve the shield performance, and ensure reliable and interpretable results. First, with the use of the specific tunneling energy, advancing speed and cutter wear as objective functions, a BO-CatBoost prediction model for shield construction parameters and various objectives was constructed, and the key influencing factors were identified via the SHapley Additive exPlanations (SHAP) method. Then, a BO-CatBoost-NSGA-III model was developed to obtain Pareto solutions under different scenarios involving the adjustment of the key influencing factors. Finally, adopting the Wuhan Metro as the background, the accuracy, stability, and generalizability of the constructed algorithm were verified. The results indicated that (1) the developed BO-CatBoost algorithm is superior to 9 other algorithms. The R<sup>2</sup> values of the proposed approach were 0.976 and 0.901–0.976 on the test set. (2) The developed BO-CatBoost-NSGA-III algorithm could be used to obtain Pareto solutions under different scenarios via the adjustment of the key influencing factors with the SHAP method, and the optimal solutions could facilitate improvements in the advancing speed, specific tunneling energy and cutter wear of 3.45 %, 6.09 %, and 0.52 %, respectively, with an overall average reliability of 90.5 %. (3) By comparing various prediction algorithms, optimization schemes of different objectives and geological conditions, the accuracy, stability, and generalizability of the constructed algorithm were verified. The developed BO-CatBoost-NSGA-III framework could enable dynamic adjustment of shield construction parameters for decision-making purposes in the event of conflicting shield construction objectives and exhibits generality.</p></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"114 ","pages":"Article 102705"},"PeriodicalIF":14.7,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142272913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-sensor temporal-spatial graph network fusion empirical mode decomposition convolution for machine fault diagnosis
Pub Date: 2024-09-15 | DOI: 10.1016/j.inffus.2024.102708
Kuangchi Sun, Aijun Yin
Multi-sensor time-series data collected at different locations contain not only temporal correlation information but also spatial correlation information, which is valuable for machine fault diagnosis. Existing graph construction methods mainly apply different data analysis techniques to connect nodes and edges; few works, however, consider the location of the sensor itself and the temporal correlation of multi-sensor time-series data. To mine the relationship between spatial and temporal information, a multi-sensor temporal-spatial graph is constructed in this paper. Here, the data points of different sensors serve as different nodes, representing the spatial feature information, while temporal information is carried by the connections between nodes of the same sensor. Moreover, an empirical mode decomposition graph convolution network (EGCN) is proposed to extract features. Specifically, the traditional graph convolution operator is replaced with empirical mode decomposition, which decomposes the input features into multiple intrinsic mode features to achieve adaptive feature extraction and improve the representation capability of the network. Finally, the different fault types are classified by fully connected layers. Experiments on different test rigs demonstrate that the proposed method achieves a diagnostic accuracy exceeding 99 % under limited fault samples.
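The graph construction described above can be illustrated with a few lines of NumPy: every data point of every sensor becomes a node, temporal edges connect consecutive points of the same sensor, and spatial edges connect different sensors at the same time step. The window length and sensor count are arbitrary, and the EMD-based convolution itself is not sketched here.

```python
import numpy as np

def temporal_spatial_adjacency(n_sensors, n_steps):
    n_nodes = n_sensors * n_steps                        # node id = sensor * n_steps + t
    A = np.zeros((n_nodes, n_nodes))
    for s in range(n_sensors):
        for t in range(n_steps):
            u = s * n_steps + t
            if t + 1 < n_steps:                          # temporal edge within a sensor
                A[u, u + 1] = A[u + 1, u] = 1
            for s2 in range(s + 1, n_sensors):           # spatial edge across sensors
                v = s2 * n_steps + t
                A[u, v] = A[v, u] = 1
    return A

signals = np.random.randn(4, 128)                        # 4 sensors, 128 time steps
A = temporal_spatial_adjacency(*signals.shape)
node_features = signals.reshape(-1, 1)                   # one scalar feature per node
print(A.shape, node_features.shape)                      # (512, 512) (512, 1)
```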
{"title":"Multi-sensor temporal-spatial graph network fusion empirical mode decomposition convolution for machine fault diagnosis","authors":"Kuangchi Sun, Aijun Yin","doi":"10.1016/j.inffus.2024.102708","DOIUrl":"10.1016/j.inffus.2024.102708","url":null,"abstract":"<div><p>Multi-sensor time-series data at different locations contains not only temporal correlation information but also spatial correlation information which is treasure for machine fault diagnosis. Existing graph construction methods mainly apply different data analysis methods to connect nodes and edges. Few works, however, consider the location of the sensor itself and temporal correlation information of multi-sensor time-series data. To mine the relationship between spatial information and temporal information, the multi-sensor temporal-spatial graph is constructed in this paper. Hereinto, the different data points of multi-sensor are severed as different nodes which represents the spatial feature information. The temporal information is contained between different nodes of the same sensor. Moreover, an empirical mode decomposition graph convolution network (EGCN) is proposed to extract the feature. Specifically, the traditional graph convolution operator is changed to empirical mode decomposition which can decompose the input features into multiple intrinsic modal features to achieve adaptive feature extraction and improve the representation capability of the network. Finally, the different fault types can be classified by fully connected layers. Experiments from different test rigs demonstrate that the proposed method achieves a diagnostic accuracy exceeding 99 % under limited fault samples.</p></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"114 ","pages":"Article 102708"},"PeriodicalIF":14.7,"publicationDate":"2024-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142272908","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Review of multimodal machine learning approaches in healthcare
Pub Date: 2024-09-14 | DOI: 10.1016/j.inffus.2024.102690
Felix Krones , Umar Marikkar , Guy Parsons , Adam Szmul , Adam Mahdi
Machine learning methods in healthcare have traditionally focused on using data from a single modality, limiting their ability to effectively replicate the clinical practice of integrating multiple sources of information for improved decision making. Clinicians typically rely on a variety of data sources including patients’ demographic information, laboratory data, vital signs and various imaging data modalities to make informed decisions and contextualise their findings. Recent advances in machine learning have facilitated the more efficient incorporation of multimodal data, resulting in applications that better represent the clinician’s approach. Here, we provide an overview of multimodal machine learning approaches in healthcare, encompassing various data modalities commonly used in clinical diagnoses, such as imaging, text, time series and tabular data. We discuss key stages of model development, including pre-training, fine-tuning and evaluation. Additionally, we explore common data fusion approaches used in modelling, highlighting their advantages and performance challenges. An overview is provided of 17 multimodal clinical datasets with detailed description of the specific data modalities used in each dataset. Over 50 studies have been reviewed, with a predominant focus on the integration of imaging and tabular data. While multimodal techniques have shown potential in improving predictive accuracy across many healthcare areas, our review highlights that the effectiveness of a method is contingent upon the specific data and task at hand.
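As a concrete example of one fusion pattern such reviews distinguish, the sketch below implements intermediate (joint) fusion in PyTorch: an imaging branch and a tabular branch are encoded separately and their features concatenated before the prediction head. The layer sizes and the binary task are illustrative assumptions.

```python
import torch
import torch.nn as nn

class JointFusionModel(nn.Module):
    def __init__(self, n_tabular=16, n_classes=2):
        super().__init__()
        self.image_branch = nn.Sequential(                # tiny CNN standing in for a
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),     # pretrained imaging encoder
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.tabular_branch = nn.Sequential(              # demographics, labs, vital signs
            nn.Linear(n_tabular, 32), nn.ReLU())
        self.head = nn.Linear(8 + 32, n_classes)          # fused representation -> label

    def forward(self, image, tabular):
        fused = torch.cat([self.image_branch(image),
                           self.tabular_branch(tabular)], dim=1)
        return self.head(fused)

model = JointFusionModel()
logits = model(torch.randn(4, 1, 64, 64), torch.randn(4, 16))   # e.g. X-ray + lab values
```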
{"title":"Review of multimodal machine learning approaches in healthcare","authors":"Felix Krones , Umar Marikkar , Guy Parsons , Adam Szmul , Adam Mahdi","doi":"10.1016/j.inffus.2024.102690","DOIUrl":"10.1016/j.inffus.2024.102690","url":null,"abstract":"<div><p>Machine learning methods in healthcare have traditionally focused on using data from a single modality, limiting their ability to effectively replicate the clinical practice of integrating multiple sources of information for improved decision making. Clinicians typically rely on a variety of data sources including patients’ demographic information, laboratory data, vital signs and various imaging data modalities to make informed decisions and contextualise their findings. Recent advances in machine learning have facilitated the more efficient incorporation of multimodal data, resulting in applications that better represent the clinician’s approach. Here, we provide an overview of multimodal machine learning approaches in healthcare, encompassing various data modalities commonly used in clinical diagnoses, such as imaging, text, time series and tabular data. We discuss key stages of model development, including pre-training, fine-tuning and evaluation. Additionally, we explore common data fusion approaches used in modelling, highlighting their advantages and performance challenges. An overview is provided of 17 multimodal clinical datasets with detailed description of the specific data modalities used in each dataset. Over 50 studies have been reviewed, with a predominant focus on the integration of imaging and tabular data. While multimodal techniques have shown potential in improving predictive accuracy across many healthcare areas, our review highlights that the effectiveness of a method is contingent upon the specific data and task at hand.</p></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"114 ","pages":"Article 102690"},"PeriodicalIF":14.7,"publicationDate":"2024-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1566253524004688/pdfft?md5=c13f0b2819a78d412d45575c042d7e61&pid=1-s2.0-S1566253524004688-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142240687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}