International Journal of Machine Learning and Cybernetics: latest articles, page 4

LWTD: a novel light-weight transformer-like CNN architecture for driving scene dehazing

IF 5.6; Zone 3 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

International Journal of Machine Learning and Cybernetics

Pub Date : 2024-09-02 DOI: 10.1007/s13042-024-02335-9

Zhenbo Zhang, Zhiguo Feng, Aiqi Long, Zhiyu Wang

With the rapid advancement of artificial intelligence and automation technology, interest in autonomous driving research continues to grow. However, in heavy rain, fog, and other adverse weather, suspended atmospheric particles degrade the images captured by the vehicle’s visual perception system, which hinders the autonomous driving system’s accurate perception of the road environment. To address these challenges, this article presents LWTD (Light-Weight Transformer-like DehazeNet), a computationally efficient, end-to-end, light-weight Transformer-like neural network that reconstructs haze-free images for driving tasks. It is based on a reformulated atmospheric scattering model (ASM) and requires no prior knowledge. First, the atmospheric light and transmission map are simplified into a feature map, a CMT (Convolutional Mapping Transformer) module is developed to extract global features, and the hazy image is decomposed into a base layer (global features) and a detail layer (local features) at low-level, medium-level, and high-level stages. A channel attention module then assigns a weight to each feature, and the weighted features are fused with the reformulated ASM to restore the haze-free image. Second, a joint loss function over graphical features is formulated to steer the network toward convergence on feature-rich solutions. In addition, a dataset of real-world foggy driving scenes is constructed. Extensive experiments on synthetic and natural hazy images confirm the superiority of the proposed method in quantitative and qualitative evaluations on various datasets. Additional experiments validate its applicability to traffic participant detection and semantic segmentation tasks. The source code is publicly available at https://github.com/ZebGH/LWTD-Net.
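
For orientation, the atmospheric scattering model is commonly written as I(x) = J(x)t(x) + A(1 - t(x)), where I is the hazy image, J the haze-free scene, t the transmission map, and A the atmospheric light. The sketch below works on a single grayscale pixel and assumes the reformulation takes the widely used single-map form J(x) = K(x)I(x) - K(x) + b; the abstract does not state the exact form LWTD uses, and the function names, the constant b, and the pixel values are illustrative only.

```python
def add_haze(j, t, a):
    """Standard ASM: hazy pixel I = J*t + A*(1 - t)."""
    return j * t + a * (1.0 - t)


def k_map(i, t, a, b=1.0):
    """Fold t and A into one value K (assumed reformulation; requires i != 1)."""
    return ((i - a) / t + (a - b)) / (i - 1.0)


def dehaze_with_k(i, k, b=1.0):
    """Recover the haze-free pixel from the single map K: J = K*I - K + b."""
    return k * i - k + b


# One grayscale pixel with made-up values in [0, 1].
j_true, t, a = 0.4, 0.5, 0.9
i_hazy = add_haze(j_true, t, a)
j_rec = dehaze_with_k(i_hazy, k_map(i_hazy, t, a))
print(round(i_hazy, 6), round(j_rec, 6))
```

In a learned model, a network would predict K directly from the hazy image instead of computing it from known t and A as done here.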

Citations: 0

Multi-source domain adaptation for dependency parsing via domain-aware feature generation

IF 5.6; Zone 3 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

International Journal of Machine Learning and Cybernetics

Pub Date : 2024-09-02 DOI: 10.1007/s13042-024-02306-0

Ying Li, Zhenguo Zhang, Yantuan Xian, Zhengtao Yu, Shengxiang Gao, Cunli Mao, Yuxin Huang

With advances in deep representation learning, supervised dependency parsing has improved notably. However, when the training data is drawn from various predefined out-domains, parsing performance drops sharply because of the domain distribution shift. The key to addressing this problem is to model the associations and differences between multiple source and target domains. In this work, we propose an innovative domain-aware adversarial and parameter generation network for multi-source cross-domain dependency parsing, in which a domain-aware parameter generation network identifies domain-specific features and an adversarial network learns domain-invariant ones. Experiments on the benchmark datasets show that our model outperforms strong BERT-enhanced baselines by 2 points in average labeled attachment score (LAS). Detailed analysis of various domain representation strategies shows that our proposed distributed domain embedding can accurately capture domain relevance, which motivates the domain-aware parameter generation network to emphasize useful domain-specific representations and disregard unnecessary or even harmful ones. Additionally, extensive comparison experiments provide deeper insight into the contributions of the two components.
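
To make the reported metric concrete, the following minimal sketch computes the unlabeled and labeled attachment scores (UAS and LAS) for a toy sentence. LAS counts a token as correct only when both its predicted head and its dependency label match the gold annotation. The token tuples and the function name are illustrative, and evaluation details such as punctuation handling, which the abstract does not specify, are ignored.

```python
def attachment_scores(gold, pred):
    """Return (UAS, LAS) over equal-length lists of (head_index, relation_label)."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and the same length")
    # UAS: the head matches; LAS: head and label both match.
    head_ok = sum(g[0] == p[0] for g, p in zip(gold, pred))
    both_ok = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return head_ok / n, both_ok / n


# Toy four-token sentence: one wrong label and one wrong head in the prediction.
gold = [(2, "nsubj"), (0, "root"), (4, "det"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (4, "amod"), (3, "obj")]
uas, las = attachment_scores(gold, pred)
print(uas, las)
```

A gain of "2 points" in LAS corresponds to an increase of 0.02 on this 0-to-1 scale.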

Citations: 0

Graph neural network based time estimator for SAT solver

IF 5.6; Zone 3 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

International Journal of Machine Learning and Cybernetics

Pub Date : 2024-08-31 DOI: 10.1007/s13042-024-02327-9

Jiawei Liu, Wenyi Xiao, Hongtao Cheng, Chuan Shi

SAT-based formal verification is a systematic process for proving the correctness of computer hardware designs against formal specifications, providing an alternative to time-consuming simulations and ensuring design reliability and accuracy. Predicting the runtime of SAT solvers is important for allocating verification resources effectively and for determining whether verification can be completed within time limits. The prediction is challenging because solving time varies across solvers and depends on both problem complexity and solver mechanisms. Existing approaches rely on feature engineering and machine learning, but they require expert knowledge and time-consuming feature extraction. Graph neural networks (GNNs) are a natural alternative for runtime prediction, as they excel at capturing graph topology and relationships. However, directly applying existing GNNs to predict SAT solver runtime does not yield satisfactory results, because the solver’s proving procedure also plays a crucial role. In this paper, we propose a novel model, TESS, that integrates the working mechanism of SAT solvers with GNNs to predict solving time. The model incorporates a graph representation inspired by the conflict-driven clause learning (CDCL) paradigm, adaptive aggregation of multilayer information, and separate modules for conflict learning. Experimental results on multiple datasets validate the effectiveness, scalability, and robustness of our model, which outperforms baselines in SAT solver runtime prediction.
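
As an illustration of how a SAT instance can be turned into GNN input, the sketch below builds a literal-clause bipartite graph from DIMACS-style clauses. This is a common generic encoding and is an assumption here: the CDCL-inspired representation used by TESS is not described in the abstract, and the node numbering scheme and function name are hypothetical.

```python
def cnf_to_bipartite(clauses):
    """Build literal-clause edges from DIMACS-style clauses (non-zero ints; -v negates v)."""
    num_vars = max(abs(lit) for clause in clauses for lit in clause)

    def lit_node(lit):
        # Positive literal of variable v -> node 2*(v-1); negative literal -> 2*(v-1)+1.
        return 2 * (abs(lit) - 1) + (1 if lit < 0 else 0)

    # One edge per literal occurrence: (literal node, clause index).
    edges = [(lit_node(lit), c) for c, clause in enumerate(clauses) for lit in clause]
    return num_vars, edges


# (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
num_vars, edges = cnf_to_bipartite([[1, -2], [2, 3], [-1, -3]])
print(num_vars, edges)
```

A message-passing model would then alternate updates between literal nodes and clause nodes along these edges and regress the solving time from a pooled graph embedding.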

Citations: 0

A data-driven mixed integer programming approach for joint chance-constrained optimal power flow under uncertainty

IF 5.6; Zone 3 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

International Journal of Machine Learning and Cybernetics

Pub Date : 2024-08-31 DOI: 10.1007/s13042-024-02325-x

James Ciyu Qin, Rujun Jiang, Huadong Mo, Daoyi Dong

This paper introduces a novel mixed integer programming (MIP) reformulation for the joint chance-constrained optimal power flow problem under uncertain load and renewable energy generation. Unlike traditional models, our approach incorporates a comprehensive evaluation of system-wide risk without decomposing joint chance constraints into individual constraints, thus preventing overly conservative solutions and ensuring robust system security. A significant innovation in our method is the use of historical data to form a sample average approximation that directly informs the MIP model, bypassing the need for distributional assumptions to enhance solution robustness. Additionally, we implement a model improvement strategy to reduce the computational burden, making our method more scalable for large-scale power systems. Our approach is validated against benchmark systems, i.e., IEEE 14-, 57- and 118-bus systems, demonstrating superior performance in terms of cost-efficiency and robustness, with lower computational demand compared to existing methods.
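
The difference between joint and individual chance constraints can be seen on a handful of scenarios. The sketch below evaluates a sample average approximation (SAA) for a fixed candidate decision: each historical scenario yields one slack value per constraint, and the joint constraint requires that the number of scenarios with any violation stays within floor(epsilon * N). The scenario data, function names, and the convention that a constraint holds when its value is at most zero are assumptions for illustration; the paper's specific MIP reformulation is not given in the abstract.

```python
def joint_violation_rate(slacks):
    """Fraction of scenarios in which at least one constraint is violated.

    slacks[i][k] is the value of constraint k in scenario i; it holds when <= 0.
    """
    return sum(any(g > 0 for g in row) for row in slacks) / len(slacks)


def individual_violation_rates(slacks):
    """Per-constraint violation frequency across the same scenarios."""
    n = len(slacks)
    return [sum(row[k] > 0 for row in slacks) / n for k in range(len(slacks[0]))]


def satisfies_joint_saa(slacks, epsilon):
    """SAA check: violated scenarios must not exceed floor(epsilon * N)."""
    violated = sum(any(g > 0 for g in row) for row in slacks)
    return violated <= int(epsilon * len(slacks))


# Ten made-up scenarios, two constraints; each constraint fails in a different scenario.
slacks = [[-1.0, -1.0]] * 7 + [[0.5, -1.0], [-1.0, 0.3], [-1.0, -1.0]]
print(joint_violation_rate(slacks), individual_violation_rates(slacks))
print(satisfies_joint_saa(slacks, 0.1), satisfies_joint_saa(slacks, 0.2))
```

Each constraint on its own is violated in 10% of scenarios, yet the system as a whole fails in 20% of them, which is why enforcing the constraints one at a time can misstate system-wide risk. In an optimization model, the per-scenario violation flag would typically become a binary variable.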

Citations: 0

Relevance-aware visual entity filter network for multimodal aspect-based sentiment analysis

IF 5.6; Zone 3 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

International Journal of Machine Learning and Cybernetics

Pub Date : 2024-08-30 DOI: 10.1007/s13042-024-02342-w

Yifan Chen, Haoliang Xiong, Kuntao Li, Weixing Mai, Yun Xue, Qianhua Cai, Fenghuan Li

Multimodal aspect-based sentiment analysis (MABSA), which aims to identify the sentiment polarity of each aspect mentioned in an image-text pair, has sparked considerable research interest in the field of multimodal analysis. Although existing approaches have shown remarkable results in incorporating external knowledge to enhance visual entity information, they still struggle with two problems: (1) image-aspect global relevance and (2) entity-aspect local alignment. To tackle these issues, we propose a Relevance-Aware Visual Entity Filter Network (REF) for MABSA. Specifically, we use the nouns of the ANPs extracted from the given image as bridges to facilitate cross-modal feature alignment. Moreover, we introduce an additional “UNRELATED” marker word and apply Contrastive Content Re-sourcing (CCR) and Contrastive Content Swapping (CCS) constraints to obtain accurate attention weights, which identify image-aspect relevance and dynamically control the contribution of visual information. We further use the reversed attention weight distributions to selectively filter out aspect-unrelated visual entities for better entity-aspect alignment. Comprehensive experimental results demonstrate the consistent superiority of our REF model over state-of-the-art approaches on the Twitter-2015 and Twitter-2017 datasets.

Citations: 0

A new neural network method for solving Bratu type equations with rational polynomials

IF 5.6; Zone 3 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

International Journal of Machine Learning and Cybernetics

Pub Date : 2024-08-30 DOI: 10.1007/s13042-024-02340-y

Jilong He, Cong Cao

The Bratu-type equation is a fundamental differential equation with numerous applications in engineering fields, such as radiative heat transfer, thermal reaction, and nanotechnology. This paper introduces a novel approach known as the rational polynomial neural network, in which rational orthogonal polynomials are used within the neural network’s hidden layer. To solve the equation, the initial boundary value conditions of the differential equation are combined with the rational polynomial neural network in the construction of the numerical solution. This construction transforms the Bratu-type equation into a set of nonlinear equations, which are subsequently solved using an appropriate optimization technique. Finally, three sets of numerical examples are presented to validate the efficacy and versatility of the proposed rational orthogonal neural network method, with comparisons made across different hyperparameters. Furthermore, when the experimental results are compared against traditional methods such as the Adomian decomposition method, genetic algorithm, Laplace transform method, spectral method, and multilayer perceptron, our method consistently achieves the best performance.
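
The idea of building the boundary conditions into the numerical solution can be sketched on the classical one-dimensional Bratu problem u'' + λe^u = 0 with u(0) = u(1) = 0, which has a known closed-form solution. In the sketch below the trial solution x(1 - x)N(x) satisfies both boundary conditions for any weights, and the squared residual at collocation points is the quantity an optimizer would minimize. The monomial basis is a stand-in assumption for the paper's rational orthogonal polynomials, and the abstract does not say which Bratu-type variants are treated; all names are illustrative.

```python
import math


def bratu_theta(lam, iters=200):
    """Solve theta = sqrt(2*lam)*cosh(theta/4) by fixed-point iteration (lower branch).

    Intended for lam comfortably below the critical value near 3.51.
    """
    theta = 0.0
    for _ in range(iters):
        theta = math.sqrt(2.0 * lam) * math.cosh(theta / 4.0)
    return theta


def bratu_exact(x, lam=1.0):
    """Closed-form solution of u'' + lam*exp(u) = 0 with u(0) = u(1) = 0."""
    theta = bratu_theta(lam)
    return -2.0 * math.log(math.cosh((x - 0.5) * theta / 2.0) / math.cosh(theta / 4.0))


def trial_solution(x, weights):
    """x*(1-x)*N(x): zero at both ends regardless of the weights (monomial basis assumed)."""
    net = sum(w * x ** k for k, w in enumerate(weights))
    return x * (1.0 - x) * net


def residual(u, x, lam=1.0, h=1e-4):
    """Equation residual u'' + lam*exp(u), with u'' from central differences."""
    second = (u(x + h) - 2.0 * u(x) + u(x - h)) / (h * h)
    return second + lam * math.exp(u(x))


points = [0.1 * k for k in range(1, 10)]
exact_res = max(abs(residual(bratu_exact, x)) for x in points)
trial_loss = sum(residual(lambda s: trial_solution(s, [0.5, 0.0, 0.0]), x) ** 2 for x in points)
print(exact_res, trial_loss)
```

The exact solution drives the residual to nearly zero, while the untrained trial solution leaves a visible loss; fitting the weights to reduce that loss is the nonlinear system the abstract refers to.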

Citations: 0

Multi-modal 6-DoF object pose tracking: integrating spatial cues with monocular RGB imagery

IF 5.6; Zone 3 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

International Journal of Machine Learning and Cybernetics

Pub Date : 2024-08-30 DOI: 10.1007/s13042-024-02336-8

Yunpeng Mei, Shuze Wang, Zhuo Li, Jian Sun, Gang Wang

Accurate six degrees of freedom (6-DoF) pose estimation is crucial for robust visual perception in fields such as smart manufacturing. Traditional RGB-based methods, though widely used, often face difficulties in adapting to dynamic scenes, understanding contextual information, and capturing temporal variations effectively. To address these challenges, we introduce a novel multi-modal 6-DoF pose estimation framework. This framework uses RGB images as the primary input and integrates spatial cues, including keypoint heatmaps and affinity fields, through a spatially aligned approach inspired by the Trans-UNet architecture. Our multi-modal method enhances both contextual understanding and temporal consistency. Experimental results on the Objectron dataset demonstrate that our approach surpasses existing algorithms across most categories. Furthermore, real-world tests confirm the accuracy and practical applicability of our method for robotic tasks, such as precision grasping, highlighting its effectiveness for real-world applications.
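
One of the spatial cues mentioned above, the keypoint heatmap, is usually a grid with a Gaussian bump at each keypoint location. The sketch below generates such a map for one keypoint and decodes it again by taking the maximum. Grid size, sigma, and function names are illustrative assumptions; affinity fields and the fusion with RGB features described in the abstract are not shown.

```python
import math


def keypoint_heatmap(height, width, cx, cy, sigma=1.5):
    """Return a height x width grid with a Gaussian bump centred on pixel (cx, cy)."""
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2)) for x in range(width)]
        for y in range(height)
    ]


def argmax_2d(grid):
    """Return (x, y) of the largest value, i.e. the decoded keypoint location."""
    best = max((v, x, y) for y, row in enumerate(grid) for x, v in enumerate(row))
    return best[1], best[2]


heat = keypoint_heatmap(8, 10, cx=6, cy=3)
peak = argmax_2d(heat)
print(peak)
```

Because the map has the same spatial layout as the image, it can be stacked with RGB channels or feature maps, which is what makes a spatially aligned fusion possible.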

Citations: 0

An ordered subsets orthogonal nonnegative matrix factorization framework with application to image clustering

IF 5.6; Zone 3 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

International Journal of Machine Learning and Cybernetics

Pub Date : 2024-08-30 DOI: 10.1007/s13042-024-02350-w

Limin Ma, Can Tong, Shouliang Qi, Yudong Yao, Yueyang Teng

Nonnegative matrix factorization (NMF) achieves impressive performance in image clustering. However, current iterative methods for optimizing NMF problems involve numerous matrix calculations and incur high computational costs on large-scale image data. To address this issue, this paper presents an ordered subsets orthogonal NMF framework (OS-ONMF) that divides the data matrix in an orderly manner into several subsets and performs NMF on each subset, balancing clustering performance and computational efficiency. After decomposition, each ordered subset still contains the core information of the original data. That is, blocking does not reduce image resolution but can greatly shorten running time. The framework is a general model that can be applied to various existing iterative update algorithms. We also provide a subset selection method and a convergence analysis of the algorithm. Finally, we conduct clustering experiments on seven real-world image datasets. The experimental results show that the proposed method can greatly shorten the running time without reducing clustering accuracy.
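
To show where the matrix-calculation cost comes from, the sketch below implements one step of the standard Lee-Seung multiplicative update for V ≈ WH in plain Python, together with an interleaved column split of the data matrix. The split rule is an assumption: the abstract does not say how OS-ONMF forms its ordered subsets or how per-subset results are combined, and the orthogonality constraint is omitted. Matrix sizes and helper names are illustrative.

```python
import random


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frob_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


def nmf_step(v, w, h, eps=1e-9):
    """One multiplicative update of H, then W, for min ||V - WH||_F^2."""
    wt = transpose(w)
    num, den = matmul(wt, v), matmul(matmul(wt, w), h)
    h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(len(h[0]))] for i in range(len(h))]
    ht = transpose(h)
    num, den = matmul(v, ht), matmul(w, matmul(h, ht))
    w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(len(w[0]))] for i in range(len(w))]
    return w, h


def column_subsets(v, n_subsets):
    """Interleaved column split (assumed rule): subset s takes columns s, s+n, s+2n, ..."""
    cols = transpose(v)
    return [transpose(cols[s::n_subsets]) for s in range(n_subsets)]


random.seed(0)
v = [[random.random() for _ in range(8)] for _ in range(6)]  # 6 x 8 data matrix
w = [[random.random() for _ in range(2)] for _ in range(6)]  # rank-2 factors
h = [[random.random() for _ in range(8)] for _ in range(2)]
err_before = frob_error(v, w, h)
for _ in range(50):
    w, h = nmf_step(v, w, h)
err_after = frob_error(v, w, h)
subsets = column_subsets(v, 2)
print(err_before, err_after, len(subsets))
```

Every update multiplies the full data matrix by a factor matrix, so running the same updates on smaller blocks reduces the work per iteration, which is the efficiency argument the abstract makes.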

Citations: 0

Long-term time series forecasting based on Siamese network: a perspective on few-shot learning

IF 5.6 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

International Journal of Machine Learning and Cybernetics

Pub Date : 2024-08-29 DOI: 10.1007/s13042-024-02317-x

Jin Fan, Jiaqian Xiang, Jie Liu, Zheyu Wang, Huifeng Wu

Long-term time series forecasting (LTSF) plays a crucial role in various domains, using a large amount of historical data to forecast trends over an extended future time range. However, in real-life scenarios, the performance of LTSF is often hindered by missing data. Few-shot learning aims to address data scarcity, but there is relatively little research on using it to tackle sample scarcity in long-term time series forecasting tasks, and most few-shot learning methods rely on transfer learning. To address this problem, this paper proposes a Siamese network-based time series Transformer (SiaTST) for LTSF in a few-shot setting. To increase the diversity of input scales and better capture local features in time series, we adopt a dual-level hierarchical input strategy. Additionally, we introduce a learnable prediction token (LPT) to capture global features of the time series. Furthermore, a feature fusion layer is used to capture dependencies among multiple variables and integrate information from different levels. Experimental results on 7 popular LTSF datasets demonstrate that our proposed model achieves state-of-the-art performance.



Citations: 0

Enhancing ocular diseases recognition with domain adaptive framework: leveraging domain confusion

IF 5.6 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

International Journal of Machine Learning and Cybernetics

Pub Date : 2024-08-29 DOI: 10.1007/s13042-024-02358-2

Zayn Wang

Visual health and optimal eyesight hold immense importance in our lives. However, ocular diseases can inflict emotional and financial hardships on patients and families. While various clinical methods exist for diagnosing ocular conditions, early screening of retinal images is not only cost-effective but can also detect potential ocular diseases at earlier stages. Meanwhile, many studies have applied Convolutional Neural Networks (CNNs) to image recognition. Nevertheless, most networks generalize poorly across domains: when a model trained well on one domain is applied to another, accuracy may decline significantly, which constrains practical deployment and wider adoption. In this work, we present a domain adaptive framework, ResNet-50 with Maximum Mean Discrepancy (RMMD). First, we adopt the ResNet-50 architecture as the backbone, a popular baseline for testing whether an added module improves accuracy. We then introduce Maximum Mean Discrepancy (MMD), a metric for quantifying domain differences, and integrate it into the loss function so that the network becomes confused about domain disparities. Results on the OIA-ODIR dataset substantiate the efficacy of the proposed network. Our framework attains 40.51% F1 and 81.06% AUC (Area Under the Receiver Operating Characteristic Curve), improvements of 9.52 and 7.18 percentage points, respectively, over the plain ResNet-50 baseline (30.99% F1 and 73.88% AUC).
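The idea of adding MMD to the loss can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the kernel, the layer whose features are compared, or the loss weighting, so the Gaussian kernel, the `bandwidth` and `mmd_weight` parameters, and all function names below are assumptions chosen for illustration. The sketch computes the standard biased estimate of squared MMD between a batch of source-domain features and a batch of target-domain features, then adds it to a classification loss.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, y: Vector, bandwidth: float = 1.0) -> float:
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs: Sequence[Vector], ys: Sequence[Vector], bandwidth: float) -> float:
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(source: Sequence[Vector], target: Sequence[Vector],
                bandwidth: float = 1.0) -> float:
    """Biased estimate of squared MMD: E[k(s,s')] + E[k(t,t')] - 2 E[k(s,t)]."""
    return (mean_kernel(source, source, bandwidth)
            + mean_kernel(target, target, bandwidth)
            - 2.0 * mean_kernel(source, target, bandwidth))


def total_loss(classification_loss: float, source: Sequence[Vector],
               target: Sequence[Vector], mmd_weight: float = 1.0) -> float:
    """Hypothetical combined objective: task loss plus a weighted MMD penalty."""
    return classification_loss + mmd_weight * mmd_squared(source, target)


# Toy feature batches (made-up numbers, not data from the paper).
source_features = [[0.0, 0.0], [0.1, 0.2]]
target_features = [[3.0, 3.0], [3.1, 2.9]]
print(mmd_squared(source_features, source_features))  # identical batches
print(mmd_squared(source_features, target_features))  # well-separated batches
```

Minimizing the penalty pulls the two feature distributions together, which is one common way to make a network's features less domain-specific. In a real training loop the same computation would be written with a differentiable tensor library so that gradients flow back into the backbone.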



Citations: 0