Pub Date : 2024-09-02 DOI: 10.1007/s13042-024-02335-9
Zhenbo Zhang, Zhiguo Feng, Aiqi Long, Zhiyu Wang
With the rapid advancement of artificial intelligence and automation technology, interest in autonomous driving research is growing. However, under heavy rain, fog, and other adverse weather conditions, suspended atmospheric particles degrade the images captured by the vehicle’s visual perception system, which hinders the autonomous driving system’s accurate perception of the road environment. To address these challenges, this article presents a computationally efficient, end-to-end, light-weight Transformer-like neural network called LWTD (Light-Weight Transformer-like DehazeNet) that reconstructs haze-free images for driving tasks. It is based on a reformulated Atmospheric Scattering Model (ASM) and requires no prior knowledge. First, the atmospheric light and transmission map are simplified into a feature map, a CMT (Convolutional Mapping Transformer) module is developed to extract global features, and the hazy image is decomposed into a base layer (global features) and a detail layer (local features) at the Low-Level, Medium-Level, and High-Level stages. Meanwhile, a channel attention module is introduced to weight each feature and to fuse the features with the reformulated ASM to restore the haze-free image. Second, a joint loss function of the graphical features is formulated to further direct the network to converge in the direction of abundant features. In addition, a dataset of real-world fog driving is constructed. Extensive experiments with synthetic and natural hazy images confirmed the superiority of the proposed method through quantitative and qualitative evaluations on various datasets. Furthermore, additional experiments validated the applicability of the proposed method for traffic participant detection and semantic segmentation tasks. The source code has been made publicly available at https://github.com/ZebGH/LWTD-Net.
Title: LWTD: a novel light-weight transformer-like CNN architecture for driving scene dehazing. Journal: International Journal of Machine Learning and Cybernetics.
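The LWTD abstract refers to the atmospheric scattering model (ASM) and to a reformulation that folds atmospheric light and transmission into one feature map. The following is a minimal per-pixel sketch, assuming the standard ASM, I = J·t + A·(1 − t), and a single-map reformulation of the form J = K·I − K + b. Whether LWTD uses exactly this form is not stated in the abstract, and the function names are illustrative only.

```python
def add_haze(j, t, a):
    """Standard ASM: hazy intensity from clean intensity j, transmission t, airlight a."""
    return j * t + a * (1.0 - t)


def k_map(i, t, a, b=1.0):
    """Fold t and a into one quantity K (assumed reformulation, undefined at i == 1)."""
    return ((i - a) / t + (a - b)) / (i - 1.0)


def dehaze(i, k, b=1.0):
    """Recover the clean intensity from the hazy intensity i and the single map K."""
    return k * i - k + b


# Round trip on one pixel: haze it, then invert with the single-map form.
clean, transmission, airlight = 0.3, 0.6, 0.9
hazy = add_haze(clean, transmission, airlight)
restored = dehaze(hazy, k_map(hazy, transmission, airlight))
```

In a learned model the network would predict K directly from the hazy image instead of computing it from known t and A, which are only available here because the haze is synthetic.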
With deep representation learning advances, supervised dependency parsing has achieved a notable enhancement. However, when the training data is drawn from various predefined out-domains, the parsing performance drops sharply due to the domain distribution shift. The key to addressing this problem is to model the associations and differences between multiple source and target domains. In this work, we propose an innovative domain-aware adversarial and parameter generation network for multi-source cross-domain dependency parsing where a domain-aware parameter generation network is used for identifying domain-specific features and an adversarial network is used for learning domain-invariant ones. Experiments on the benchmark datasets reveal that our model outperforms strong BERT-enhanced baselines by 2 points in the average labeled attachment score (LAS). Detailed analysis of various domain representation strategies shows that our proposed distributed domain embedding can accurately capture domain relevance, which motivates the domain-aware parameter generation network to emphasize useful domain-specific representations and disregard unnecessary or even harmful ones. Additionally, extensive comparison experiments show deeper insights on the contributions of the two components.
Title: Multi-source domain adaptation for dependency parsing via domain-aware feature generation. Authors: Ying Li, Zhenguo Zhang, Yantuan Xian, Zhengtao Yu, Shengxiang Gao, Cunli Mao, Yuxin Huang. Pub Date: 2024-09-02. DOI: 10.1007/s13042-024-02306-0. Journal: International Journal of Machine Learning and Cybernetics.
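The dependency parsing abstract reports gains in labeled attachment score (LAS). As a reminder of what that metric measures, here is a small sketch: LAS is the fraction of tokens whose predicted head and relation label both match the gold annotation, while the unlabeled score (UAS) checks the head only. The toy sentence and the function name are illustrative, and evaluation details such as punctuation handling vary between benchmarks.

```python
def attachment_scores(gold, pred):
    """Return (UAS, LAS) for one sentence.

    gold and pred are equal-length lists of (head_index, relation_label),
    one entry per token.
    """
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and the same length")
    n = len(gold)
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))  # head only
    las_hits = sum(g == p for g, p in zip(gold, pred))        # head and label
    return uas_hits / n, las_hits / n


gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "amod")]
pred = [(2, "nsubj"), (0, "root"), (2, "iobj"), (2, "amod")]
uas, las = attachment_scores(gold, pred)
```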
Pub Date : 2024-08-31 DOI: 10.1007/s13042-024-02327-9
Jiawei Liu, Wenyi Xiao, Hongtao Cheng, Chuan Shi
SAT-based formal verification is a systematic process to prove the correctness of computer hardware design based on formal specifications, providing an alternative to time-consuming simulations and ensuring design reliability and accuracy. Predicting the runtime of SAT solvers is important to effectively allocate verification resources and determine if the verification can be completed within time limits. Predicting SAT solver runtime is challenging due to variations in solving time across different solvers and dependence on problem complexity and solver mechanisms. Existing approaches rely on feature engineering and machine learning, but they have drawbacks in terms of expert knowledge requirements and time-consuming feature extraction. To address this, using graph neural networks (GNNs) for runtime prediction is considered, as they excel in capturing graph topology and relationships. However, directly applying existing GNNs to predict SAT solver runtime does not yield satisfactory results, as SAT solvers’ proving procedure is crucial. In this paper, we propose a novel model, TESS, that integrates the working mechanism of SAT solvers with graph neural networks (GNNs) for predicting solving time. The model incorporates a graph representation inspired by the CDCL paradigm, proposes adaptive aggregation for multilayer information and separate modules for conflict learning. Experimental results on multiple datasets validate the effectiveness, scalability, and robustness of our model, outperforming baselines in SAT solver runtime prediction.
Title: Graph neural network based time estimator for SAT solver. Journal: International Journal of Machine Learning and Cybernetics.
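The TESS abstract relies on turning a SAT instance into a graph that a GNN can process. The sketch below builds a generic literal-clause graph from a CNF formula given as DIMACS-style integer lists. This is an assumed, commonly used encoding, not the CDCL-inspired representation of the paper, which the abstract does not describe. Node numbering and the function name are illustrative.

```python
def cnf_to_literal_clause_graph(num_vars, clauses):
    """Build an undirected literal-clause graph.

    Nodes 0 .. 2*num_vars-1 are literals (x_v at 2*v, not x_v at 2*v+1, v zero-based);
    nodes 2*num_vars .. are clauses. Returns (node_count, edge_list).
    """
    def lit_node(lit):
        v = abs(lit) - 1
        return 2 * v + (1 if lit < 0 else 0)

    num_lit_nodes = 2 * num_vars
    # Link each literal to its negation so messages can pass between them.
    edges = [(2 * v, 2 * v + 1) for v in range(num_vars)]
    for ci, clause in enumerate(clauses):
        clause_node = num_lit_nodes + ci
        for lit in clause:
            if lit == 0 or abs(lit) > num_vars:
                raise ValueError(f"invalid literal {lit}")
            edges.append((lit_node(lit), clause_node))
    return num_lit_nodes + len(clauses), edges


# (x1 or not x2) and (x2 or x3)
node_count, edges = cnf_to_literal_clause_graph(3, [[1, -2], [2, 3]])
```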
Pub Date : 2024-08-31 DOI: 10.1007/s13042-024-02325-x
James Ciyu Qin, Rujun Jiang, Huadong Mo, Daoyi Dong
This paper introduces a novel mixed integer programming (MIP) reformulation for the joint chance-constrained optimal power flow problem under uncertain load and renewable energy generation. Unlike traditional models, our approach incorporates a comprehensive evaluation of system-wide risk without decomposing joint chance constraints into individual constraints, thus preventing overly conservative solutions and ensuring robust system security. A significant innovation in our method is the use of historical data to form a sample average approximation that directly informs the MIP model, bypassing the need for distributional assumptions to enhance solution robustness. Additionally, we implement a model improvement strategy to reduce the computational burden, making our method more scalable for large-scale power systems. Our approach is validated against benchmark systems, i.e., IEEE 14-, 57- and 118-bus systems, demonstrating superior performance in terms of cost-efficiency and robustness, with lower computational demand compared to existing methods.
Title: A data-driven mixed integer programming approach for joint chance-constrained optimal power flow under uncertainty. Journal: International Journal of Machine Learning and Cybernetics.
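The optimal power flow abstract contrasts joint chance constraints with their decomposition into individual constraints, and uses a sample average approximation (SAA) built from historical data. The toy sketch below, for a fixed decision, estimates both quantities from samples: the share of scenarios in which all constraints hold at once, and the share in which each constraint holds on its own. The scenario values and limit functions are made up for illustration and are not taken from the paper.

```python
def joint_satisfaction_rate(scenarios, constraints):
    """Share of scenarios where every constraint g(s) <= 0 holds simultaneously."""
    ok = sum(all(g(s) <= 0 for g in constraints) for s in scenarios)
    return ok / len(scenarios)


def individual_satisfaction_rates(scenarios, constraints):
    """Share of scenarios satisfying each constraint separately."""
    n = len(scenarios)
    return [sum(g(s) <= 0 for s in scenarios) / n for g in constraints]


def saa_feasible(scenarios, constraints, eps):
    """SAA check of P(all constraints hold) >= 1 - eps, with a small float tolerance."""
    return joint_satisfaction_rate(scenarios, constraints) >= 1.0 - eps - 1e-12


# Hypothetical sampled line flows with an upper limit of 1.0 and a lower limit of -0.2.
flows = [0.2, 0.5, 0.8, 1.1, -0.3]
limits = [lambda s: s - 1.0, lambda s: -0.2 - s]
joint = joint_satisfaction_rate(flows, limits)
separate = individual_satisfaction_rates(flows, limits)
```

Here each limit holds in 80% of the samples, yet both hold together in only 60%, which is why the joint probability has to be tracked directly. In a MIP version of SAA, a common construction attaches one binary indicator per scenario and bounds the number of violated scenarios, but the paper's exact reformulation is not given in the abstract.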
Pub Date : 2024-08-30 DOI: 10.1007/s13042-024-02342-w
Yifan Chen, Haoliang Xiong, Kuntao Li, Weixing Mai, Yun Xue, Qianhua Cai, Fenghuan Li
Multimodal aspect-based sentiment analysis (MABSA), which aims to identify the sentiment polarity of each aspect mentioned in an image-text pair, has sparked considerable research interest in the field of multimodal analysis. Although existing approaches have shown remarkable results in incorporating external knowledge to enhance visual entity information, they still struggle with two problems: (1) image-aspect global relevance and (2) entity-aspect local alignment. To tackle these issues, we propose a Relevance-Aware Visual Entity Filter Network (REF) for MABSA. Specifically, we utilize the nouns of adjective-noun pairs (ANPs) extracted from the given image as bridges to facilitate cross-modal feature alignment. Moreover, we introduce an additional “UNRELATED” marker word and utilize Contrastive Content Re-sourcing (CCR) and Contrastive Content Swapping (CCS) constraints to obtain accurate attention weights that identify image-aspect relevance and dynamically control the contribution of visual information. We further use the reversed attention weight distributions to selectively filter out aspect-unrelated visual entities for better entity-aspect alignment. Comprehensive experimental results demonstrate the consistent superiority of our REF model over state-of-the-art approaches on the Twitter-2015 and Twitter-2017 datasets.
Title: Relevance-aware visual entity filter network for multimodal aspect-based sentiment analysis. Journal: International Journal of Machine Learning and Cybernetics.
Pub Date : 2024-08-30 DOI: 10.1007/s13042-024-02340-y
Jilong He, Cong Cao
The Bratu-type equation is a fundamental differential equation with numerous applications in engineering fields, such as radiative heat transfer, thermal reaction, and nanotechnology. This paper introduces a novel approach known as the rational polynomial neural network, in which rational orthogonal polynomials are used within the neural network’s hidden layer. To solve the equation, the initial and boundary value conditions of the differential equation are integrated, together with the rational polynomial neural network, into the construction of the numerical solution. This construction transforms the Bratu-type equation into a set of nonlinear equations, which are subsequently solved using an appropriate optimization technique. Finally, three sets of numerical examples are presented to validate the efficacy and versatility of the proposed rational orthogonal neural network method, with comparisons made across different hyperparameters. Furthermore, the experimental results are compared against traditional methods such as the Adomian decomposition method, genetic algorithm, Laplace transform method, spectral method, and multilayer perceptron; the proposed method consistently performs best.
Title: A new neural network method for solving Bratu type equations with rational polynomials. Journal: International Journal of Machine Learning and Cybernetics.
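For readers unfamiliar with the Bratu problem, the sketch below uses the classical one-dimensional boundary value form u'' + λ·exp(u) = 0 with u(0) = u(1) = 0, which has the closed-form solution u(x) = −2·ln(cosh((x − 1/2)·θ/2) / cosh(θ/4)), where θ satisfies θ = sqrt(2λ)·cosh(θ/4). Such a closed form is the kind of reference a numerical solver can be checked against. The specific Bratu-type variants solved in the paper are not stated in the abstract, so this instance is an assumption, as are the function names and the bisection bracket.

```python
import math


def bratu_theta(lam, lo=0.0, hi=2.0, iters=80):
    """Solve theta = sqrt(2*lam) * cosh(theta/4) by bisection on [lo, hi] (lower branch)."""
    def f(th):
        return th - math.sqrt(2.0 * lam) * math.cosh(th / 4.0)

    if f(lo) * f(hi) > 0:
        raise ValueError("no sign change in bracket; adjust lo/hi or lam")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def bratu_exact(x, theta):
    """Closed-form solution of the classical Bratu boundary value problem."""
    return -2.0 * math.log(math.cosh(0.5 * theta * (x - 0.5)) / math.cosh(0.25 * theta))


def residual(x, lam, theta, h=1e-4):
    """Finite-difference estimate of u'' + lam * exp(u) at x; should be near zero."""
    u_xx = (bratu_exact(x + h, theta) - 2.0 * bratu_exact(x, theta)
            + bratu_exact(x - h, theta)) / (h * h)
    return u_xx + lam * math.exp(bratu_exact(x, theta))


theta = bratu_theta(1.0)
peak = bratu_exact(0.5, theta)  # maximum of u, attained at the midpoint
```

The same residual function can be pointed at any candidate solution, including a neural network output, to measure how well it satisfies the equation at chosen collocation points.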
Pub Date : 2024-08-30 DOI: 10.1007/s13042-024-02336-8
Yunpeng Mei, Shuze Wang, Zhuo Li, Jian Sun, Gang Wang
Accurate six degrees of freedom (6-DoF) pose estimation is crucial for robust visual perception in fields such as smart manufacturing. Traditional RGB-based methods, though widely used, often face difficulties in adapting to dynamic scenes, understanding contextual information, and capturing temporal variations effectively. To address these challenges, we introduce a novel multi-modal 6-DoF pose estimation framework. This framework uses RGB images as the primary input and integrates spatial cues, including keypoint heatmaps and affinity fields, through a spatially aligned approach inspired by the Trans-UNet architecture. Our multi-modal method enhances both contextual understanding and temporal consistency. Experimental results on the Objectron dataset demonstrate that our approach surpasses existing algorithms across most categories. Furthermore, real-world tests confirm the accuracy and practical applicability of our method for robotic tasks, such as precision grasping, highlighting its effectiveness for real-world applications.
Title: Multi-modal 6-DoF object pose tracking: integrating spatial cues with monocular RGB imagery. Journal: International Journal of Machine Learning and Cybernetics.
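The pose estimation abstract lists keypoint heatmaps among its spatial cues. As an illustration of that cue only, the sketch below renders a Gaussian heatmap around a 2D keypoint and reads the keypoint back as the location of the maximum. The Gaussian form, the sigma value, and the function names are assumptions. The abstract does not say how the paper builds its heatmaps or affinity fields.

```python
import math


def keypoint_heatmap(width, height, cx, cy, sigma=1.5):
    """Return a height x width grid with a Gaussian bump centred on (cx, cy)."""
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]


def argmax_2d(heat):
    """Return (x, y) of the largest value in the grid."""
    _, x, y = max((v, x, y) for y, row in enumerate(heat) for x, v in enumerate(row))
    return x, y


heat = keypoint_heatmap(8, 6, cx=5, cy=2)
located = argmax_2d(heat)
```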
Pub Date : 2024-08-30 DOI: 10.1007/s13042-024-02350-w
Limin Ma, Can Tong, Shouliang Qi, Yudong Yao, Yueyang Teng
Nonnegative matrix factorization (NMF) for image clustering attains impressive machine learning performances. However, the current iterative methods for optimizing NMF problems involve numerous matrix calculations and suffer from high computational costs in large-scale images. To address this issue, this paper presents an ordered subsets orthogonal NMF framework (OS-ONMF) that divides the data matrix in an orderly manner into several subsets and performs NMF on each subset. It balances clustering performance and computational efficiency. After decomposition, each ordered subset still contains the core information of the original data. That is, blocking does not reduce image resolutions but can greatly shorten running time. This framework is a general model that can be applied to various existing iterative update algorithms. We also provide a subset selection method and a convergence analysis of the algorithm. Finally, we conducted clustering experiments on seven real-world image datasets. The experimental results showed that the proposed method can greatly shorten the running time without reducing clustering accuracy.
Title: An ordered subsets orthogonal nonnegative matrix factorization framework with application to image clustering. Journal: International Journal of Machine Learning and Cybernetics.
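The OS-ONMF abstract describes dividing the data matrix "in an orderly manner" into subsets that each retain the core information of the original data. One plausible reading, sketched below, is an interleaved partition of column indices, so that every subset samples the whole matrix evenly. This is an assumption: the abstract does not give the paper's subset selection rule, and the helper names are hypothetical.

```python
def ordered_subsets(n_columns, n_subsets):
    """Interleaved partition: subset k takes columns k, k + n_subsets, k + 2*n_subsets, ..."""
    if not 1 <= n_subsets <= n_columns:
        raise ValueError("n_subsets must be between 1 and n_columns")
    return [list(range(k, n_columns, n_subsets)) for k in range(n_subsets)]


def take_columns(matrix, indices):
    """Extract the given columns from a row-major list-of-lists matrix."""
    return [[row[j] for j in indices] for row in matrix]


subsets = ordered_subsets(10, 3)
data = [[10 * r + c for c in range(10)] for r in range(2)]
block = take_columns(data, subsets[1])
```

An iterative NMF update would then cycle through the blocks, applying its usual update rule to one block at a time, so each pass over the data involves smaller matrix products.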
Pub Date : 2024-08-29 DOI: 10.1007/s13042-024-02317-x
Jin Fan, Jiaqian Xiang, Jie Liu, Zheyu Wang, Huifeng Wu
Long-term time series forecasting (LTSF) plays a crucial role in various domains, using a large amount of historical data to forecast trends over an extended future time range. However, in real-life scenarios, the performance of LTSF is often hindered by missing data. Few-shot learning aims to address data scarcity, but there is relatively little research on using it to tackle sample scarcity in long-term time series forecasting tasks, and most few-shot learning methods rely on transfer learning. To address this problem, this paper proposes a Siamese network-based time series Transformer (SiaTST) for LTSF in a few-shot setting. To increase the diversity of input scales and better capture local features in time series, we adopt a dual-level hierarchical input strategy. Additionally, we introduce a learnable prediction token (LPT) to capture global features of the time series. Furthermore, a feature fusion layer is used to capture dependencies among multiple variables and integrate information from different levels. Experimental results on 7 popular LTSF datasets demonstrate that our proposed model achieves state-of-the-art performance.
Title: Long-term time series forecasting based on Siamese network: a perspective on few-shot learning. Journal: International Journal of Machine Learning and Cybernetics.
Pub Date: 2024-08-29 DOI: 10.1007/s13042-024-02358-2
Zayn Wang
Visual health and good eyesight are of great importance in our lives, yet ocular diseases can inflict emotional and financial hardship on patients and their families. While various clinical methods exist for diagnosing ocular conditions, early screening of retinal images is both cost-effective and able to detect potential ocular diseases at earlier stages. Many studies have applied Convolutional Neural Networks (CNNs) to image recognition. Nevertheless, the applicability of most networks tends to be limited across domains: when a model trained well on one domain is applied to another, accuracy may decline significantly, which constrains the networks’ practical implementation and wider adoption. In this work, we present a domain adaptive framework, ResNet-50 with Maximum Mean Discrepancy (RMMD). We first adopt the ResNet-50 architecture as the base network, a popular baseline for testing whether an added module improves accuracy. We then introduce Maximum Mean Discrepancy (MMD), a metric for quantifying the difference between domains, and integrate it into the loss function so that the network becomes confused about which domain a sample comes from. Results on the OIA-ODIR dataset support the efficacy of the proposed network. Our framework attains an F1 score of 40.51% and an AUC (Area Under the Receiver Operating Characteristic Curve) of 81.06%, improvements of 9.52 and 7.18 percentage points, respectively, over the unmodified ResNet-50 (30.99% F1 and 73.88% AUC).
Enhancing ocular diseases recognition with domain adaptive framework: leveraging domain confusion. Zayn Wang. International Journal of Machine Learning and Cybernetics, 2024-08-29. https://doi.org/10.1007/s13042-024-02358-2
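The idea of adding an MMD term to a training loss, as described in the abstract above, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not state the kernel, the layer whose features are compared, or the weighting of the MMD term, so the Gaussian (RBF) kernel, the biased estimator, and the names `rbf_kernel`, `mmd_squared`, `total_loss`, `sigma`, and `lam` are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(a, b, sigma=1.0):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    # Pairwise squared Euclidean distances via ||x||^2 + ||y||^2 - 2 x.y
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def mmd_squared(source, target, sigma=1.0):
    """Biased estimate of the squared MMD between two feature batches."""
    k_ss = rbf_kernel(source, source, sigma)
    k_tt = rbf_kernel(target, target, sigma)
    k_st = rbf_kernel(source, target, sigma)
    return float(k_ss.mean() + k_tt.mean() - 2.0 * k_st.mean())


def total_loss(classification_loss, source_features, target_features,
               lam=0.5, sigma=1.0):
    """Task loss plus a weighted MMD penalty (lam is a hypothetical weight)."""
    penalty = mmd_squared(source_features, target_features, sigma)
    return classification_loss + lam * penalty
```

In this sketch, two feature batches drawn from the same distribution give a squared MMD close to zero, while a batch whose mean is shifted gives a clearly larger value. Minimising `total_loss` therefore pushes a feature extractor toward representations on which the source and target domains are hard to tell apart.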