Distributed leader-following bipartite consensus for one-sided Lipschitz multi-agent systems via dual-terminal event-triggered mechanism
Yanjun Zhao, Haibin Sun, Xiangyu Wang, Dong Yang, Ticao Jiao
Pub Date: 2024-10-19 | DOI: 10.1016/j.neunet.2024.106808
This article analyzes leader-following bipartite consensus for one-sided Lipschitz multi-agent systems via a dual-terminal event-triggered output feedback control approach. A distributed observer is designed to estimate unknown system states by employing relative output information at triggering instants, and an event-triggered output feedback controller is then proposed. Dual-terminal dynamic event-triggered mechanisms are proposed for the sensor–observer and controller–actuator channels, which substantially reduce communication, and Zeno behavior is ruled out. A new generalized one-sided Lipschitz condition is proposed to handle the nonlinear term and achieve bipartite consensus. Stability conditions are presented to guarantee leader-following bipartite consensus. Finally, one-link robot manipulator systems are introduced to demonstrate the effectiveness of the designed scheme. The results show that the robot manipulator agents track the reference trajectories bi-directionally while reducing communication by 61.22% and 68.04% in the sensor–observer and controller–actuator channels, respectively.
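To make the triggering idea concrete, below is a minimal sketch of a generic dynamic event-triggered rule of the kind used in such channels; the variable names (eta, sigma, theta, lam) and the specific condition are illustrative assumptions, not the paper's exact mechanism.

```python
def dynamic_etm_step(eta, e_norm, y_norm, sigma=0.5, theta=1.0, lam=0.1, dt=1e-3):
    """One step of a generic dynamic event-triggered mechanism (ETM).

    e_norm: norm of the error between the last transmitted and the current
    signal; y_norm: norm of the current signal. A transmission is triggered
    only when the internal dynamic variable eta can no longer compensate the
    error, which spaces triggers apart and helps exclude Zeno behavior.
    """
    gap = sigma * y_norm**2 - e_norm**2      # static triggering margin
    triggered = eta + theta * gap < 0.0      # dynamic rule: transmit if violated
    eta = eta + dt * (-lam * eta + gap)      # eta evolves between events
    return max(eta, 0.0), triggered
```

After a trigger the sender refreshes the transmitted value, so e_norm returns to zero and eta recovers; running one such rule in the sensor–observer channel and another in the controller–actuator channel gives the dual-terminal setup.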
{"title":"Distributed leader-following bipartite consensus for one-sided Lipschitz multi-agent systems via dual-terminal event-triggered mechanism","authors":"Yanjun Zhao , Haibin Sun , Xiangyu Wang , Dong Yang , Ticao Jiao","doi":"10.1016/j.neunet.2024.106808","DOIUrl":"10.1016/j.neunet.2024.106808","url":null,"abstract":"<div><div>This article analyses leader-following bipartite consensus for one-sided Lipschitz multi-agent systems by dual-terminal event-triggered output feedback control approach. A distributed observer is designed to estimate unknown system states by employing relative output information at triggering time instants, and then an event-triggered output feedback controller is proposed. Dual-terminal dynamic event-triggered mechanisms are proposed in sensor–observer channel and controller–actuator channel, which can save communication resources to a great extent, and the Zeno behavior is ruled out. A new generalized one-sided Lipschitz condition is proposed to handle the nonlinear term and achieve bipartite consensus. Some stability conditions are presented to guarantee leader-following bipartite consensus. Finally, one-link robot manipulator systems are introduced to demonstrate the availability of the designed scheme. The results demonstrate that the agents of the robot manipulators can track the reference trajectories bi-directionally, and effectively reduce communication resources by 61.22% and 68.04% at the sensor–observer and controller–actuator channels, respectively.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"181 ","pages":"Article 106808"},"PeriodicalIF":6.0,"publicationDate":"2024-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142511781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hypergraph contrastive attention networks for hyperedge prediction with negative samples evaluation
Junbo Wang, Jianrui Chen, Zhihui Wang, Maoguo Gong
Pub Date: 2024-10-19 | DOI: 10.1016/j.neunet.2024.106807
Hyperedge prediction aims to predict common relations among multiple nodes that will occur in the future or remain undiscovered in the current hypergraph. It is traditionally modeled as a classification task that performs hypergraph feature learning and classifies target samples as either present or absent. However, these approaches face two issues: (i) in hyperedge feature learning, they fail to measure the influence of nodes on the hyperedges that include them and on neighboring hyperedges, and (ii) in the binary classification task, the quality of the generated negative samples directly impacts the prediction results. To this end, we propose a Hypergraph Contrastive Attention Network (HCAN) model for hyperedge prediction. Inspired by brain organization, HCAN considers the influence of hyperedges of different orders through an order-propagation attention mechanism. It also utilizes a contrastive mechanism to measure the reliability of attention effectively. Furthermore, we design a negative sample generator that produces three different types of negative samples. We evaluate the impact of the various negative samples on the model and analyze the problems of binary classification modeling. The effectiveness of HCAN in hyperedge prediction is validated by comparison with 12 baselines on 9 datasets. Our implementations will be publicly available at https://github.com/jianruichen/HCAN.
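As an illustration of how negative hyperedges can be produced at three difficulty levels, here is a hedged sketch; the corruption schemes (whole-edge resampling, half replacement, single-node replacement) are common choices in the hyperedge-prediction literature and are assumptions, not necessarily the paper's three types.

```python
import random

def negative_hyperedge(hyperedge, nodes, scheme="replace_one"):
    """Corrupt a positive hyperedge into a candidate negative sample.

    Three illustrative schemes of increasing difficulty for the classifier:
    resample the whole edge, replace half of its nodes, or replace one node.
    Assumes the node set is much larger than the hyperedge.
    """
    he = list(hyperedge)
    pool = [v for v in nodes if v not in set(he)]
    if scheme == "random":                        # easiest negatives
        he = random.sample(list(nodes), k=len(he))
    elif scheme == "replace_half":                # harder negatives
        idx = random.sample(range(len(he)), k=max(1, len(he) // 2))
        repl = random.sample(pool, k=len(idx))    # distinct replacements
        for i, r in zip(idx, repl):
            he[i] = r
    elif scheme == "replace_one":                 # hardest: minimal perturbation
        he[random.randrange(len(he))] = random.choice(pool)
    return frozenset(he)
```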
{"title":"Hypergraph contrastive attention networks for hyperedge prediction with negative samples evaluation","authors":"Junbo Wang , Jianrui Chen , Zhihui Wang , Maoguo Gong","doi":"10.1016/j.neunet.2024.106807","DOIUrl":"10.1016/j.neunet.2024.106807","url":null,"abstract":"<div><div>Hyperedge prediction aims to predict common relations among multiple nodes that will occur in the future or remain undiscovered in the current hypergraph. It is traditionally modeled as a classification task, which performs hypergraph feature learning and classifies the target samples as either present or absent. However, these approaches involve two issues: (i) in hyperedge feature learning, they fail to measure the influence of nodes on the hyperedges that include them and the neighboring hyperedges, and (ii) in the binary classification task, the quality of the generated negative samples directly impacts the prediction results. To this end, we propose a Hypergraph Contrastive Attention Network (HCAN) model for hyperedge prediction. Inspired by the brain organization, HCAN considers the influence of hyperedges with different orders through the order propagation attention mechanism. It also utilizes the contrastive mechanism to measure the reliability of attention effectively. Furthermore, we design a negative sample generator to produce three different types of negative samples. We evaluate the impact of various negative samples on the model and analyze the problems of binary classification modeling. The effectiveness of HCAN in hyperedge prediction is validated by experimentally comparing 12 baselines on 9 datasets. Our implementations will be publicly available at <span><span>https://github.com/jianruichen/HCAN</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"181 ","pages":"Article 106807"},"PeriodicalIF":6.0,"publicationDate":"2024-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142511785","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cut-and-Paste: Subject-driven video editing with attention control
Zhichao Zuo, Zhao Zhang, Yan Luo, Yang Zhao, Haijun Zhang, Yi Yang, Meng Wang
Pub Date: 2024-10-19 | DOI: 10.1016/j.neunet.2024.106818
This paper presents a novel framework termed Cut-and-Paste for real-world semantic video editing under the guidance of a text prompt and an additional reference image. While text-driven video editing has demonstrated a remarkable ability to generate highly diverse videos following given text prompts, plain textual prompts offer only coarse control over fine-grained semantic edits in terms of object details and edited regions, and cumbersome long text descriptions are usually needed. We therefore investigate subject-driven video editing for more precise control of both edited regions and background preservation, as well as fine-grained semantic generation. We achieve this by introducing a reference image as supplementary input to text-driven video editing, which avoids the need to devise a cumbersome text prompt describing the detailed appearance of the object. To limit the editing area, we adapt a cross-attention control method from image editing and extend it to video editing by fusing the attention maps of adjacent frames, which strikes a balance between maintaining the video background and spatio-temporal consistency. Compared with current methods, the whole process is akin to "cutting" the source object to be edited and then "pasting" the target object provided by the reference image. We demonstrate that our method performs favorably against prior arts for video editing guided by a text prompt and an extra reference image, as measured by both quantitative and subjective evaluations.
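A minimal sketch of the adjacent-frame attention-fusion idea, assuming per-frame cross-attention maps for the edited-object token have already been extracted; the simple three-frame averaging and the threshold value are illustrative choices, not the paper's exact scheme.

```python
import numpy as np

def fuse_attention_masks(attn_maps, thresh=0.35):
    """Temporally fuse per-frame cross-attention maps into binary edit masks.

    attn_maps: (T, H, W) attention of the edited-object token, in [0, 1].
    Averaging each frame with its neighbors suppresses flicker before
    thresholding into a spatial mask that limits the edited region.
    """
    attn_maps = np.asarray(attn_maps)
    t = np.arange(len(attn_maps))
    prev, nxt = np.maximum(t - 1, 0), np.minimum(t + 1, len(attn_maps) - 1)
    fused = (attn_maps[prev] + attn_maps + attn_maps[nxt]) / 3.0
    return fused > thresh

def paste(source, edited, mask):
    """Keep the background from `source`; take the edited object where mask=1."""
    return np.where(mask[..., None], edited, source)
```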
{"title":"Cut-and-Paste: Subject-driven video editing with attention control","authors":"Zhichao Zuo , Zhao Zhang , Yan Luo , Yang Zhao , Haijun Zhang , Yi Yang , Meng Wang","doi":"10.1016/j.neunet.2024.106818","DOIUrl":"10.1016/j.neunet.2024.106818","url":null,"abstract":"<div><div>This paper presents a novel framework termed Cut-and-Paste for real-word semantic video editing under the guidance of text prompt and additional reference image. While the text-driven video editing has demonstrated remarkable ability to generate highly diverse videos following given text prompts, the fine-grained semantic edits are hard to control by plain textual prompt only in terms of object details and edited region, and cumbersome long text descriptions are usually needed for the task. We therefore investigate subject-driven video editing for more precise control of both edited regions and background preservation, and fine-grained semantic generation. We achieve this goal by introducing an reference image as supplementary input to the text-driven video editing, which avoids racking your brain to come up with a cumbersome text prompt describing the detailed appearance of the object. To limit the editing area, we refer to a method of cross attention control in image editing and successfully extend it to video editing by fusing the attention map of adjacent frames, which strikes a balance between maintaining video background and spatio-temporal consistency. Compared with current methods, the whole process of our method is like “cut” the source object to be edited and then “paste” the target object provided by reference image. We demonstrate that our method performs favorably over prior arts for video editing under the guidance of text prompt and extra reference image, as measured by both quantitative and subjective evaluations.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"181 ","pages":"Article 106818"},"PeriodicalIF":6.0,"publicationDate":"2024-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142540159","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Quality-related fault detection for dynamic process based on quality-driven long short-term memory network and autoencoder
Yishun Liu, Keke Huang, Benedict Jun Ma, Ke Wei, Yuxuan Li, Chunhua Yang, Weihua Gui
Pub Date: 2024-10-19 | DOI: 10.1016/j.neunet.2024.106819
Fault detection plays a crucial role in industrial dynamic processes, as it enables the timely prevention of production losses. However, as industrial dynamic processes become increasingly coupled and complex, they introduce uneven dynamics within the collected data, posing significant challenges for effectively extracting dynamic features. In addition, it is difficult to distinguish whether a fault is quality-related, which can result in unnecessary repairs and large losses. To deal with these issues, this paper proposes a novel fault detection method based on a quality-driven long short-term memory network and autoencoder (QLSTM-AE). Specifically, an LSTM network is first employed to extract dynamic features, while quality variables are simultaneously incorporated in parallel to capture quality-related features. Then, a fault detection strategy based on the squared prediction error (SPE) statistic of the reconstruction error and the Hotelling T² (H²) quality monitoring statistic is designed, which can distinguish various types of faults to realize accurate monitoring of dynamic processes. Finally, several experiments on numerical simulations and the Tennessee Eastman (TE) benchmark process demonstrate the reliability and effectiveness of the proposed QLSTM-AE method, which achieves higher accuracy and separates different faults more efficiently than some state-of-the-art methods.
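For reference, the two monitoring statistics can be computed as below, assuming x_hat is the autoencoder reconstruction and z the latent quality-related features; the 99% empirical-quantile control limit is a common convention, not necessarily the paper's.

```python
import numpy as np

def spe(x, x_hat):
    """Squared prediction error of the reconstruction: ||x - x_hat||^2 per sample."""
    return np.sum((x - x_hat) ** 2, axis=-1)

def hotelling_t2(z, z_train):
    """Hotelling T^2 of latent features z w.r.t. the training distribution.

    z, z_train: 2-D arrays of shape (n_samples, n_features).
    """
    mu = z_train.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(z_train, rowvar=False))
    d = z - mu
    return np.einsum("ij,jk,ik->i", d, cov_inv, d)

# Control limits as empirical 99% quantiles over fault-free training data;
# a sample would then flag a quality-related fault when T^2 exceeds its
# limit, and a process-only fault when only SPE exceeds its limit.
```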
{"title":"Quality-related fault detection for dynamic process based on quality-driven long short-term memory network and autoencoder","authors":"Yishun Liu , Keke Huang , Benedict Jun Ma , Ke Wei , Yuxuan Li , Chunhua Yang , Weihua Gui","doi":"10.1016/j.neunet.2024.106819","DOIUrl":"10.1016/j.neunet.2024.106819","url":null,"abstract":"<div><div>Fault detection consistently plays a crucial role in industrial dynamic processes as it enables timely prevention of production losses. However, since industrial dynamic processes become increasingly coupled and complex, they introduce uneven dynamics within the collected data, posing significant challenges in effectively extracting dynamic features. In addition, it is a tricky business to distinguish whether the fault that occurs is quality-related or not, resulting in unnecessary repairing and large losses. In order to deal with these issues, this paper comes up with a novel fault detection method based on quality-driven long short-term memory and autoencoder (QLSTM-AE). Specifically, an LSTM network is initially employed to extract dynamic features, while quality variables are simultaneously incorporated in parallel to capture quality-related features. Then, a fault detection strategy based on reconstruction error statistic squared prediction error (<span><math><mrow><mi>S</mi><mi>P</mi><mi>E</mi></mrow></math></span>) and the quality monitoring statistic Hotelling <span><math><msup><mrow><mi>T</mi></mrow><mrow><mn>2</mn></mrow></msup></math></span> (<span><math><msup><mrow><mi>H</mi></mrow><mrow><mn>2</mn></mrow></msup></math></span>) is designed, which can distinguish various types of faults to realize accurate monitoring for dynamic processes. Finally, several experiments conducted on numerical simulations and the Tennessee Eastman (TE) benchmark process demonstrate the reliability and effectiveness of the proposed QLSTM-AE method, which indicates it has higher accuracy and can separate different faults efficiently compared to some state-of-the-art methods.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"181 ","pages":"Article 106819"},"PeriodicalIF":6.0,"publicationDate":"2024-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142511789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fractional-order stochastic gradient descent method with momentum and energy for deep neural networks
Xingwen Zhou, Zhenghao You, Weiguo Sun, Dongdong Zhao, Shi Yan
Pub Date: 2024-10-19 | DOI: 10.1016/j.neunet.2024.106810
In this paper, a novel fractional-order stochastic gradient descent with momentum and energy (FOSGDME) approach is proposed. Specifically, to address the challenge of converging to a true extreme point encountered by existing fractional gradient algorithms, a novel fractional-order stochastic gradient descent (FOSGD) method is presented by modifying the definition of the Caputo fractional-order derivative. A FOSGD with momentum (FOSGDM) is then established by incorporating momentum information to further improve convergence speed and accuracy. In addition, to improve robustness and accuracy, a FOSGD with momentum and energy is established by further introducing an energy formulation. Extensive experimental results on the CIFAR-10 image classification dataset with ResNet and DenseNet demonstrate that the proposed FOSGD, FOSGDM and FOSGDME algorithms are superior to integer-order optimization algorithms and achieve state-of-the-art performance.
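A minimal scalar sketch of a Caputo-style fractional gradient step with momentum, under the common formulation that scales the gradient by |θ − θ_prev|^(1−α)/Γ(2−α); the paper's modified derivative definition and its energy term are not reproduced here.

```python
import math

def fosgdm_step(theta, theta_prev, grad, velocity,
                alpha=0.9, lr=0.01, beta=0.9, eps=1e-8):
    """One scalar update of a Caputo-style fractional SGD with momentum.

    The gradient is scaled by |theta - theta_prev|^(1 - alpha) / Gamma(2 - alpha),
    the usual Caputo-derived factor; the update reduces to plain SGD with
    momentum as alpha -> 1. The eps keeps updates alive when the parameter
    has not moved since the previous step.
    """
    frac = abs(theta - theta_prev) ** (1.0 - alpha) / math.gamma(2.0 - alpha)
    velocity = beta * velocity + (1.0 - beta) * grad * (frac + eps)
    return theta - lr * velocity, velocity
```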
{"title":"Fractional-order stochastic gradient descent method with momentum and energy for deep neural networks","authors":"Xingwen Zhou , Zhenghao You , Weiguo Sun , Dongdong Zhao , Shi Yan","doi":"10.1016/j.neunet.2024.106810","DOIUrl":"10.1016/j.neunet.2024.106810","url":null,"abstract":"<div><div>In this paper, a novel fractional-order stochastic gradient descent with momentum and energy (FOSGDME) approach is proposed. Specifically, to address the challenge of converging to a real extreme point encountered by the existing fractional gradient algorithms, a novel fractional-order stochastic gradient descent (FOSGD) method is presented by modifying the definition of the Caputo fractional-order derivative. A FOSGD with moment (FOSGDM) is established by incorporating momentum information to accelerate the convergence speed and accuracy further. In addition, to improve the robustness and accuracy, a FOSGD with moment and energy is established by further introducing energy formation. The extensive experimental results on the image classification CIFAR-10 dataset obtained with ResNet and DenseNet demonstrate that the proposed FOSGD, FOSGDM and FOSGDME algorithms are superior to the integer order optimization algorithms, and achieve state-of-the-art performance.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"181 ","pages":"Article 106810"},"PeriodicalIF":6.0,"publicationDate":"2024-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142511784","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reducing semantic ambiguity in domain adaptive semantic segmentation via probabilistic prototypical pixel contrast
Xiaoke Hao, Shiyu Liu, Chuanbo Feng, Ye Zhu
Pub Date: 2024-10-18 | DOI: 10.1016/j.neunet.2024.106806
Domain adaptation aims to reduce the model degradation on the target domain caused by the domain shift between the source and target domains. Although encouraging performance has been achieved by combining contrastive learning with the self-training paradigm, such methods suffer in ambiguous scenarios caused by scale, illumination, or overlap when deploying deterministic embeddings. To address these issues, we propose probabilistic prototypical pixel contrast (PPPC), a universal adaptation framework that models each pixel embedding as a probability distribution via a multivariate Gaussian, fully exploiting the uncertainty within embeddings and ultimately improving the representation quality of the model. In addition, we derive prototypes from posterior probability estimation, which helps push the decision boundary away from ambiguous points. Moreover, we employ an efficient method to compute similarity between distributions, eliminating the need for sampling and reparameterization and thereby significantly reducing computational overhead. Further, we dynamically select ambiguous crops at the image level to enlarge the number of boundary points involved in contrastive learning, which benefits the establishment of precise distributions for each category. Extensive experimentation demonstrates that PPPC not only helps to address ambiguity at the pixel level, yielding discriminative representations, but also achieves significant improvements in both synthetic-to-real and day-to-night adaptation tasks. It surpasses the previous state-of-the-art (SOTA) by +5.2% mIoU in the most challenging daytime-to-nighttime adaptation scenario and exhibits stronger generalization on other unseen datasets. The code and models are available at https://github.com/DarlingInTheSV/Probabilistic-Prototypical-Pixel-Contrast.
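One closed-form, sampling-free similarity between diagonal Gaussian embeddings is the negative squared 2-Wasserstein distance, sketched below; treating this as the distribution similarity is an assumption for illustration, not necessarily the measure used in PPPC.

```python
import numpy as np

def gaussian_similarity(mu1, var1, mu2, var2):
    """Closed-form similarity between diagonal Gaussian embeddings.

    Uses the negative squared 2-Wasserstein distance, which for diagonal
    Gaussians is ||mu1 - mu2||^2 + ||sqrt(var1) - sqrt(var2)||^2 -- no
    sampling or reparameterization is needed, so it is cheap per pixel pair.
    """
    w2 = np.sum((mu1 - mu2) ** 2, axis=-1) \
       + np.sum((np.sqrt(var1) - np.sqrt(var2)) ** 2, axis=-1)
    return -w2
```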
{"title":"Reducing semantic ambiguity in domain adaptive semantic segmentation via probabilistic prototypical pixel contrast","authors":"Xiaoke Hao, Shiyu Liu, Chuanbo Feng, Ye Zhu","doi":"10.1016/j.neunet.2024.106806","DOIUrl":"10.1016/j.neunet.2024.106806","url":null,"abstract":"<div><div>Domain adaptation aims to reduce the model degradation on the target domain caused by the domain shift between the source and target domains. Although encouraging performance has been achieved by combining contrastive learning with the self-training paradigm, they suffer from ambiguous scenarios caused by scale, illumination, or overlapping when deploying deterministic embedding. To address these issues, we propose probabilistic prototypical pixel contrast (PPPC), a universal adaptation framework that models each pixel embedding as a probability via multivariate Gaussian distribution to fully exploit the uncertainty within them, eventually improving the representation quality of the model. In addition, we derive prototypes from probability estimation posterior probability estimation which helps to push the decision boundary away from the ambiguity points. Moreover, we employ an efficient method to compute similarity between distributions, eliminating the need for sampling and reparameterization, thereby significantly reducing computational overhead. Further, we dynamically select the ambiguous crops at the image level to enlarge the number of boundary points involved in contrastive learning, which benefits the establishment of precise distributions for each category. Extensive experimentation demonstrates that PPPC not only helps to address ambiguity at the pixel level, yielding discriminative representations but also achieves significant improvements in both synthetic-to-real and day-to-night adaptation tasks. It surpasses the previous state-of-the-art (SOTA) by +5.2% mIoU in the most challenging daytime-to-nighttime adaptation scenario, exhibiting stronger generalization on other unseen datasets. The code and models are available at <span><span>https://github.com/DarlingInTheSV/Probabilistic-Prototypical-Pixel-Contrast</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"181 ","pages":"Article 106806"},"PeriodicalIF":6.0,"publicationDate":"2024-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142511790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GO-MAE: Self-supervised pre-training via masked autoencoder for OCT image classification of gynecology
Haoran Wang, Xinyu Guo, Kaiwen Song, Mingyang Sun, Yanbin Shao, Songfeng Xue, Hongwei Zhang, Tianyu Zhang
Pub Date: 2024-10-18 | DOI: 10.1016/j.neunet.2024.106817
Genitourinary syndrome of menopause (GSM) is a physiological disorder caused by reduced levels of oestrogen in menopausal women. Its symptoms gradually worsen with age and prolonged menopausal status, gravely impacting the quality of life as well as the physical and mental health of patients. In this regard, the optical coherence tomography (OCT) system effectively reduces the patient's burden in clinical diagnosis with its noncontact, noninvasive tomographic imaging process. Consequently, supervised computer vision models applied to OCT images have yielded excellent results for disease diagnosis. However, manually labeling an extensive number of medical images is expensive and time-consuming. To this end, this paper proposes GO-MAE, a pretraining framework for self-supervised learning of GSM OCT images based on the Masked Autoencoder (MAE). To the best of our knowledge, this is the first study to apply self-supervised learning methods to GSM disease screening. Focusing on the semantic complexity and feature sparsity of GSM OCT images, the contribution of this study is two-pronged. First, a dynamic masking strategy is introduced for OCT characteristics in downstream tasks, which reduces the interference of invalid features on the model and shortens training time. Second, in the encoder design of the MAE, we propose a parallel convolutional neural network and transformer architecture (C&T) that fuses local and global representations of the relevant lesions in an interactive manner, so that the model can still learn rich differences in feature information without labels. A series of experimental results on the acquired GSM-OCT dataset reveal that GO-MAE yields significant improvements over existing state-of-the-art techniques. Furthermore, the superiority of the model in terms of robustness and interpretability is verified through a series of comparative experiments and visualization operations, demonstrating its great potential for screening GSM symptoms.
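To illustrate what a dynamic masking strategy for sparse OCT patches might look like, here is a hedged sketch that scores patches by variance and masks the least informative first; the scoring rule is an assumption for illustration, not the paper's exact strategy.

```python
import numpy as np

def dynamic_mask(patches, mask_ratio=0.75):
    """Select which patches an MAE keeps, favoring informative ones.

    patches: (N, D) flattened image patches. Low-variance (mostly empty)
    patches are masked first, so sparse OCT backgrounds do not dominate
    the visible set that the encoder sees.
    """
    scores = patches.var(axis=1)                  # proxy for patch information
    n_keep = max(1, int(round(len(patches) * (1 - mask_ratio))))
    keep = np.argsort(-scores)[:n_keep]           # keep the most informative
    mask = np.ones(len(patches), dtype=bool)
    mask[keep] = False                            # False = visible to encoder
    return keep, mask
```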
{"title":"GO-MAE: Self-supervised pre-training via masked autoencoder for OCT image classification of gynecology","authors":"Haoran Wang, Xinyu Guo, Kaiwen Song, Mingyang Sun, Yanbin Shao, Songfeng Xue, Hongwei Zhang, Tianyu Zhang","doi":"10.1016/j.neunet.2024.106817","DOIUrl":"10.1016/j.neunet.2024.106817","url":null,"abstract":"<div><div>Genitourinary syndrome of menopause (GSM) is a physiological disorder caused by reduced levels of oestrogen in menopausal women. Gradually, its symptoms worsen with age and prolonged menopausal status, which gravely impacts the quality of life as well as the physical and mental health of the patients. In this regard, optical coherence tomography (OCT) system effectively reduces the patient’s burden in clinical diagnosis with its noncontact, noninvasive tomographic imaging process. Consequently, supervised computer vision models applied on OCT images have yielded excellent results for disease diagnosis. However, manual labeling on an extensive number of medical images is expensive and time-consuming. To this end, this paper proposes GO-MAE, a pretraining framework for self-supervised learning of GSM OCT images based on Masked Autoencoder (MAE). To the best of our knowledge, this is the first study that applies self-supervised learning methods on the field of GSM disease screening. Focusing on the semantic complexity and feature sparsity of GSM OCT images, the objective of this study is two-pronged: first, a dynamic masking strategy is introduced for OCT characteristics in downstream tasks. This method can reduce the interference of invalid features on the model and shorten the training time. In the encoder design of MAE, we propose a convolutional neural network and transformer parallel network architecture (C&T), which aims to fuse the local and global representations of the relevant lesions in an interactive manner such that the model can still learn the richer differences between the feature information without labels. Thereafter, a series of experimental results on the acquired GSM-OCT dataset revealed that GO-MAE yields significant improvements over existing state-of-the-art techniques. Furthermore, the superiority of the model in terms of robustness and interpretability was verified through a series of comparative experiments and visualization operations, which consequently demonstrated its great potential for screening GSM symptoms.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"181 ","pages":"Article 106817"},"PeriodicalIF":6.0,"publicationDate":"2024-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142578522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deployable mixed-precision quantization with co-learning and one-time search
Shiguang Wang, Zhongyu Zhang, Guo Ai, Jian Cheng
Pub Date: 2024-10-18 | DOI: 10.1016/j.neunet.2024.106812
Mixed-precision quantization plays a pivotal role in deploying deep neural networks in resource-constrained environments. However, the task of finding the optimal bit-width configuration for each layer under deployable mixed-precision quantization has barely been explored and remains a challenge. In this work, we present Cobits, an efficient and effective deployable mixed-precision quantization framework based on the relationship between the range of the real-valued input and the range of the quantized real values. It assigns a higher bit-width to quantizers with a narrower quantized real-valued range and a lower bit-width to quantizers with a wider one. Cobits employs a co-learning approach to entangle and learn quantization parameters across various bit-widths, distinguishing between shared and specific parts: the shared part collaborates, while the specific part isolates precision conflicts. Additionally, we upgrade the normal quantizer to a dynamic quantizer to mitigate statistical issues in the deployable mixed-precision supernet. Over the trained mixed-precision supernet, we utilize the quantized real-valued ranges to derive quantized-bit-sensitivity, which serves as an importance indicator for efficiently determining bit-width configurations, eliminating the need for iterative evaluations on a validation dataset. Extensive experiments show that Cobits outperforms previous state-of-the-art quantization methods on the ImageNet and COCO datasets while retaining superior efficiency. The approach dynamically adapts to varying bit-widths and generalizes to various deployable backends. The code will be made public at https://github.com/sunnyxiaohu/cobits.
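A toy sketch of range-driven bit-width assignment under an average-bit budget: narrower quantized ranges get more bits, wider ranges fewer. The greedy budget relaxation and the candidate set (2, 4, 8) are illustrative assumptions, not Cobits' search procedure.

```python
def assign_bitwidths(ranges, candidates=(2, 4, 8), avg_budget=4.0):
    """Map each layer's quantized real-valued range to a bit-width.

    Layers are ranked by range (narrow first) and split evenly across the
    candidate bit-widths, high bits first; the assignment is then greedily
    relaxed until the average bit-width budget holds.
    """
    order = sorted(range(len(ranges)), key=lambda i: ranges[i])  # narrow first
    high_first = sorted(candidates, reverse=True)
    bits = [0] * len(ranges)
    for rank, i in enumerate(order):
        bits[i] = high_first[rank * len(candidates) // len(ranges)]
    while sum(bits) / len(bits) > avg_budget:
        reducible = [i for i in range(len(bits)) if bits[i] > min(candidates)]
        if not reducible:
            break
        j = max(reducible, key=lambda i: (bits[i], ranges[i]))   # widest range first
        bits[j] = max(c for c in candidates if c < bits[j])
    return bits
```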
{"title":"Deployable mixed-precision quantization with co-learning and one-time search","authors":"Shiguang Wang , Zhongyu Zhang , Guo Ai , Jian Cheng","doi":"10.1016/j.neunet.2024.106812","DOIUrl":"10.1016/j.neunet.2024.106812","url":null,"abstract":"<div><div>Mixed-precision quantization plays a pivotal role in deploying deep neural networks in resource-constrained environments. However, the task of finding the optimal bit-width configurations for different layers under <strong>deployable mixed-precision quantization</strong> has barely been explored and remains a challenge. In this work, we present Cobits, an efficient and effective deployable mixed-precision quantization framework based on the relationship between the range of real-valued input and the range of quantized real-valued. It assigns a higher bit-width to the quantizer with a narrower quantized real-valued range and a lower bit-width to the quantizer with a wider quantized real-valued range. Cobits employs a co-learning approach to entangle and learn quantization parameters across various bit-widths, distinguishing between shared and specific parts. The shared part collaborates, while the specific part isolates precision conflicts. Additionally, we upgrade the normal quantizer to dynamic quantizer to mitigate statistical issues in the deployable mixed-precision supernet. Over the trained mixed-precision supernet, we utilize the quantized real-valued ranges to derive <em>quantized-bit-sensitivity</em>, which can serve as importance indicators for efficiently determining bit-width configurations, eliminating the need for iterative validation dataset evaluations. Extensive experiments show that Cobits outperforms previous state-of-the-art quantization methods on the ImageNet and COCO datasets while retaining superior efficiency. We show this approach dynamically adapts to varying bit-width and can generalize to various deployable backends. The code will be made public in <span><span>https://github.com/sunnyxiaohu/cobits</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"181 ","pages":"Article 106812"},"PeriodicalIF":6.0,"publicationDate":"2024-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142551933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rectangling and enhancing underwater stitched image via content-aware warping and perception balancing
Laibin Chang, Yunke Wang, Bo Du, Chang Xu
Pub Date: 2024-10-18 | DOI: 10.1016/j.neunet.2024.106809
Single underwater images often face limitations in field-of-view and visual perception due to scattering and absorption. Numerous image stitching techniques have attempted to provide a wider viewing range, but the resulting stitched images may exhibit unsightly irregular boundaries. Unlike natural landscapes, the absence of reliable high-fidelity references in water complicates the replicability of deep learning-based methods, leading to unpredictable distortions in cross-domain applications. To address these challenges, we propose an Underwater Wide-field Image Rectangling and Enhancement (UWIRE) framework comprising two procedures, i.e., the R-procedure and the E-procedure, both of which employ self-coordinated modes and require only a single underwater stitched image as input. The R-procedure rectangles the irregular boundaries in stitched images by employing initial shape resizing and mesh-based image-preservation warping. Instead of local linear constraints, we use complementary optimization of boundary, structure, and content to ensure a natural appearance with minimal distortion. The E-procedure enhances the rectangled image by employing parameter-adaptive correction to balance information distribution across channels. We further propose an attentive weight-guided fusion method to balance the perception of color restoration, contrast enhancement, and texture sharpening in a complementary manner. Comprehensive experiments demonstrate the superior performance of our UWIRE framework over state-of-the-art image rectangling and enhancement methods in both quantitative and qualitative evaluation.
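As a rough idea of parameter-adaptive per-channel correction, the sketch below affinely stretches each channel between adaptive percentiles, a classic gray-world-style baseline; it stands in for, and is much simpler than, the E-procedure itself.

```python
import numpy as np

def balance_channels(img):
    """Parameter-adaptive per-channel correction for an underwater image.

    img: float array in [0, 1], shape (H, W, 3). Each channel is affinely
    stretched between its own 1st and 99th percentiles, compensating the
    heavy red attenuation typical under water, then global brightness is
    restored to the input level.
    """
    out = np.empty_like(img)
    for c in range(3):
        lo, hi = np.percentile(img[..., c], (1, 99))
        out[..., c] = np.clip((img[..., c] - lo) / max(hi - lo, 1e-6), 0.0, 1.0)
    out *= img.mean() / max(out.mean(), 1e-6)   # restore global brightness
    return np.clip(out, 0.0, 1.0)
```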
{"title":"Rectangling and enhancing underwater stitched image via content-aware warping and perception balancing","authors":"Laibin Chang , Yunke Wang , Bo Du , Chang Xu","doi":"10.1016/j.neunet.2024.106809","DOIUrl":"10.1016/j.neunet.2024.106809","url":null,"abstract":"<div><div>Single underwater images often face limitations in field-of-view and visual perception due to scattering and absorption. Numerous image stitching techniques have attempted to provide a wider viewing range, but the resulting stitched images may exhibit unsightly irregular boundaries. Unlike natural landscapes, the absence of reliable high-fidelity references in water complicates the replicability of these deep learning-based methods, leading to unpredictable distortions in cross-domain applications. To address these challenges, we propose an Underwater Wide-field Image Rectangling and Enhancement (UWIRE) framework that incorporates two procedures, <em>i.e.</em>, the R-procedure and E-procedure, both of which employ self-coordinated modes, requiring only a single underwater stitched image as input. The R-procedure rectangles the irregular boundaries in stitched images by employing the initial shape resizing and mesh-based image preservation warping. Instead of local linear constraints, we use complementary optimization of boundary–structure–content to ensure a natural appearance with minimal distortion. The E-procedure enhances the rectangled image by employing parameter-adaptive correction to balance information distribution across channels. We further propose an attentive weight-guided fusion method to balance the perception of color restoration, contrast enhancement, and texture sharpening in a complementary manner. Comprehensive experiments demonstrate the superior performance of our UWIRE framework over state-of-the-art image rectangling and enhancement methods, both in quantitative and qualitative evaluation.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"181 ","pages":"Article 106809"},"PeriodicalIF":6.0,"publicationDate":"2024-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142551934","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Robust generalized PCA for enhancing discriminability and recoverability
Zhenlei Dai, Liangchen Hu, Huaijiang Sun
Pub Date: 2024-10-18 | DOI: 10.1016/j.neunet.2024.106814
The dependency of low-dimensional embeddings on the principal component space seriously limits the effectiveness of existing robust principal component analysis (PCA) algorithms. Simply projecting the original sample coordinates onto orthogonal principal component directions may not effectively address various noise-corrupted scenarios, impairing both discriminability and recoverability. Our method addresses this issue through a generalized PCA (GPCA), which optimizes a regression bias rather than the sample mean, leading to more adaptable properties. We propose a robust GPCA model whose joint loss and regularization are based on the ℓ2,μ and ℓ2,ν norms, respectively. This approach not only mitigates sensitivity to outliers but also enhances flexibility in feature extraction and selection. Additionally, we introduce a truncated and reweighted loss strategy, where truncation eliminates severely deviated outliers and reweighting prioritizes the remaining samples. These innovations collectively improve the GPCA model's performance. To solve the proposed model, we propose a non-greedy iterative algorithm and theoretically guarantee its convergence. Experimental results demonstrate that the proposed GPCA model outperforms previous robust PCA models in both recoverability and discriminability.
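For concreteness, the ℓ2,p row-wise matrix norm and an IRLS-style truncated reweighting are sketched below; the weight formula (p/2)·r^(p−2) and the quantile truncation threshold are standard devices and are assumptions here, not the paper's exact scheme.

```python
import numpy as np

def l2p_norm(E, p):
    """The l_{2,p} matrix norm: sum over rows of ||row||_2^p, with p in (0, 2]."""
    return np.sum(np.linalg.norm(E, axis=1) ** p)

def truncated_weights(residuals, p=1.0, trunc_quantile=0.9, eps=1e-8):
    """Per-sample weights for a truncated, reweighted l_{2,p} loss.

    Samples whose residual exceeds the truncation threshold are dropped
    (weight 0); the rest get the IRLS-style weight (p/2) * r^(p-2), so
    smaller residuals count more, which damps the influence of outliers.
    """
    r = np.asarray(residuals, dtype=float)
    thr = np.quantile(r, trunc_quantile)
    w = 0.5 * p * np.maximum(r, eps) ** (p - 2.0)
    w[r > thr] = 0.0                       # truncate severely deviated samples
    return w
```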
{"title":"Robust generalized PCA for enhancing discriminability and recoverability","authors":"Zhenlei Dai , Liangchen Hu , Huaijiang Sun","doi":"10.1016/j.neunet.2024.106814","DOIUrl":"10.1016/j.neunet.2024.106814","url":null,"abstract":"<div><div>The dependency of low-dimensional embedding to principal component space seriously limits the effectiveness of existing robust principal component analysis (PCA) algorithms. Simply projecting the original sample coordinates onto orthogonal principal component directions may not effectively address various noise-corrupted scenarios, impairing both discriminability and recoverability. Our method addresses this issue through a generalized PCA (GPCA), which optimizes regression bias rather than sample mean, leading to more adaptable properties. And, we propose a robust GPCA model with joint loss and regularization based on the <span><math><msub><mrow><mi>ℓ</mi></mrow><mrow><mn>2</mn><mo>,</mo><mi>μ</mi></mrow></msub></math></span> norm and <span><math><msub><mrow><mi>ℓ</mi></mrow><mrow><mn>2</mn><mo>,</mo><mi>ν</mi></mrow></msub></math></span> norms, respectively. This approach not only mitigates sensitivity to outliers but also enhances feature extraction and selection flexibility. Additionally, we introduce a truncated and reweighted loss strategy, where truncation eliminates severely deviated outliers, and reweighting prioritizes the remaining samples. These innovations collectively improve the GPCA model’s performance. To solve the proposed model, we propose a non-greedy iterative algorithm and theoretically guarantee the convergence. Experimental results demonstrate that the proposed GPCA model outperforms the previous robust PCA models in both recoverability and discrimination.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"181 ","pages":"Article 106814"},"PeriodicalIF":6.0,"publicationDate":"2024-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142511796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}