Cross-project defect prediction (CPDP) aims to predict software defects in a target project domain by leveraging information from different source project domains, allowing testers to identify defective modules quickly. However, CPDP models often underperform due to differing data distributions between source and target domains, class imbalance, and the presence of noisy and irrelevant instances in both source and target projects. Additionally, standard features often fail to capture sufficient semantic and contextual information from the source project, leading to poor prediction performance in the target project. To address these challenges, this research proposes SMOTE Correlation and Attention Gated recurrent unit based Long Short-Term Memory optimization (SCAG-LSTM), which first employs a novel hybrid technique that extends the synthetic minority over-sampling technique (SMOTE) with edited nearest neighbors (ENN) to rebalance class distributions and mitigate the issues caused by noisy and irrelevant instances in both source and target domains. Furthermore, correlation-based feature selection (CFS) with best-first search (BFS) is used to identify and select the most important features, reducing the differences in data distribution among projects. Additionally, SCAG-LSTM integrates bidirectional gated recurrent unit (Bi-GRU) and bidirectional long short-term memory (Bi-LSTM) networks to enhance the effectiveness of the long short-term memory (LSTM) model. These components efficiently capture semantic and contextual information as well as dependencies within the data, leading to more accurate predictions. Moreover, an attention mechanism is incorporated into the model to focus on key features, further improving prediction performance. Experiments are conducted on the AEEEM datasets (apache_lucene, equinox, eclipse_jdt_core, eclipse_pde_ui, and mylyn) and the PROMISE (predictor models in software engineering) datasets. On AEEEM, SCAG-LSTM is compared with the active-learning-based method (ALTRA), the multi-source-based cross-project defect prediction method (MSCPDP), and the two-phase feature importance amplification method (TFIA); on PROMISE, it is compared with the two-phase transfer learning method (TPTL), the domain adaptive kernel twin support vector machines method (DA-KTSVMO), and the generative adversarial long short-term memory neural networks method (GB-CPDP). The results demonstrate that the proposed SCAG-LSTM model improves on the baseline models by 33.03%, 29.15%, and 1.48% in F1-measure and by 16.32%, 34.41%, and 3.59% in Area Under the Curve (AUC) on the AEEEM datasets, while on the PROMISE datasets it improves the baselines' F1-measure by 42.60%, 32.00%, and 25.10% and AUC by 34.90%, 27.80%, and 12.96%. These findings suggest that the proposed model exhibits strong predictive performance.
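As a rough sketch of the rebalancing step only, the snippet below applies imbalanced-learn's stock SMOTEENN combination (SMOTE over-sampling followed by ENN cleaning) to synthetic stand-in data; it approximates, but is not, the paper's extended hybrid, and all shapes and parameters are assumptions.

```python
# Hypothetical sketch: class rebalancing with SMOTE + ENN via imbalanced-learn.
# This uses the stock SMOTEENN combination, not the paper's extended hybrid;
# the synthetic "metrics" below are placeholders, not real defect data.
import numpy as np
from imblearn.combine import SMOTEENN

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))            # stand-in software metric features
y = (rng.random(500) < 0.1).astype(int)   # ~10% defective modules (minority)

X_res, y_res = SMOTEENN(random_state=0).fit_resample(X, y)
print(f"minority share: {y.mean():.2f} -> {y_res.mean():.2f}")
```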
{"title":"Cross-Project Defect Prediction Based on Domain Adaptation and LSTM Optimization","authors":"Khadija Javed, Shengbing Ren, M. Asim, M. A. Wani","doi":"10.3390/a17050175","DOIUrl":"https://doi.org/10.3390/a17050175","url":null,"abstract":"Cross-project defect prediction (CPDP) aims to predict software defects in a target project domain by leveraging information from different source project domains, allowing testers to identify defective modules quickly. However, CPDP models often underperform due to different data distributions between source and target domains, class imbalances, and the presence of noisy and irrelevant instances in both source and target projects. Additionally, standard features often fail to capture sufficient semantic and contextual information from the source project, leading to poor prediction performance in the target project. To address these challenges, this research proposes Smote Correlation and Attention Gated recurrent unit based Long Short-Term Memory optimization (SCAG-LSTM), which first employs a novel hybrid technique that extends the synthetic minority over-sampling technique (SMOTE) with edited nearest neighbors (ENN) to rebalance class distributions and mitigate the issues caused by noisy and irrelevant instances in both source and target domains. Furthermore, correlation-based feature selection (CFS) with best-first search (BFS) is utilized to identify and select the most important features, aiming to reduce the differences in data distribution among projects. Additionally, SCAG-LSTM integrates bidirectional gated recurrent unit (Bi-GRU) and bidirectional long short-term memory (Bi-LSTM) networks to enhance the effectiveness of the long short-term memory (LSTM) model. These components efficiently capture semantic and contextual information as well as dependencies within the data, leading to more accurate predictions. Moreover, an attention mechanism is incorporated into the model to focus on key features, further improving prediction performance. Experiments are conducted on apache_lucene, equinox, eclipse_jdt_core, eclipse_pde_ui, and mylyn (AEEEM) and predictor models in software engineering (PROMISE) datasets and compared with active learning-based method (ALTRA), multi-source-based cross-project defect prediction method (MSCPDP), the two-phase feature importance amplification method (TFIA) on AEEEM and the two-phase transfer learning method (TPTL), domain adaptive kernel twin support vector machines method (DA-KTSVMO), and generative adversarial long-short term memory neural networks method (GB-CPDP) on PROMISE datasets. The results demonstrate that the proposed SCAG-LSTM model enhances the baseline models by 33.03%, 29.15% and 1.48% in terms of F1- measure and by 16.32%, 34.41% and 3.59% in terms of Area Under the Curve (AUC) on the AEEEM dataset, while on the PROMISE dataset it enhances the baseline models’ F1- measure by 42.60%, 32.00% and 25.10% and AUC by 34.90%, 27.80% and 12.96%. 
These findings suggest that the proposed model exhibits strong predictive performance.","PeriodicalId":502609,"journal":{"name":"Algorithms","volume":"55 6","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140662665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
To address the brain storm optimization (BSO) algorithm's limited ability to escape local optima, which contributes to its inadequate optimization precision, we developed a flock decision mutation approach that substantially enhances the efficacy of the BSO algorithm. Furthermore, to remedy the insufficient population diversity of the BSO algorithm, we introduced a strategy that uses a good point set to improve the quality of the initial population. Simultaneously, we substituted spectral clustering for the K-means clustering approach to improve the clustering accuracy of the algorithm. This work thus introduces an enhanced brain storm optimization algorithm founded on a flock decision mutation strategy (FDIBSO). The improved algorithm was compared against contemporary leading algorithms on the CEC2018 benchmark suite. The experimental section additionally employs AUV intelligence evaluation as an application case, addressing the combined weight model under various dimensional settings to further substantiate the efficacy of the FDIBSO algorithm. The findings indicate that FDIBSO surpasses BSO and other enhanced algorithms in addressing intricate optimization challenges.
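As a small illustration of one ingredient described above, the sketch below clusters a candidate-solution population with spectral clustering in place of K-means; the population and parameters are assumptions, and the full FDIBSO loop (good point set initialization, flock decision mutation) is not reproduced.

```python
# Illustration only: clustering a BSO-style population with spectral
# clustering instead of K-means (population and parameters are assumptions).
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(1)
population = rng.uniform(-5.0, 5.0, size=(50, 10))   # 50 candidate solutions

labels = SpectralClustering(
    n_clusters=5, affinity="nearest_neighbors",
    n_neighbors=10, random_state=1,
).fit_predict(population)

# In BSO, each cluster's best individual would seed the next round of "ideas".
for k in range(5):
    print(f"cluster {k}: {np.sum(labels == k)} individuals")
```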
{"title":"Improved Brain Storm Optimization Algorithm Based on Flock Decision Mutation Strategy","authors":"Yanchi Zhao, Jia-Ping Cheng, Jing Cai","doi":"10.3390/a17050172","DOIUrl":"https://doi.org/10.3390/a17050172","url":null,"abstract":"To tackle the problem of the brain storm optimization (BSO) algorithm’s suboptimal capability for avoiding local optima, which contributes to its inadequate optimization precision, we developed a flock decision mutation approach that substantially enhances the efficacy of the BSO algorithm. Furthermore, to solve the problem of insufficient BSO algorithm population diversity, we introduced a strategy that utilizes the good point set to enhance the initial population’s quality. Simultaneously, we substituted the K-means clustering approach with spectral clustering to improve the clustering accuracy of the algorithm. This work introduced an enhanced version of the brain storm optimization algorithm founded on a flock decision mutation strategy (FDIBSO). The improved algorithm was compared against contemporary leading algorithms through the CEC2018. The experimental section additionally employs the AUV intelligence evaluation as an application case. It addresses the combined weight model under various dimensional settings to substantiate the efficacy of the FDIBSO algorithm further. The findings indicate that FDIBSO surpasses BSO and other enhanced algorithms for addressing intricate optimization challenges.","PeriodicalId":502609,"journal":{"name":"Algorithms","volume":"12 5","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140666986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Research on baggage flow plays a pivotal role in achieving the efficient and intelligent allocation and scheduling of airport service resources, and is a fundamental element in the design, development, and process optimization of airport baggage handling systems. This paper examines baggage checked in by departing passengers at airports. The current state of research on baggage flow demand is first reviewed and analyzed. Then, using examples of objective data, it is concluded that while there is a significant correlation between airport passenger flow and baggage flow, an increase in passenger flow does not necessarily produce a proportional increase in baggage flow. Building on existing results on the factors influencing baggage flow sorting and classification, the main influencing factors are divided into two categories: macro-influencing factors and micro-influencing factors. When studying the relationship between the economy and baggage flow, a comprehensive analysis that includes multiple economic indicators is recommended, rather than reliance on GDP alone. The paper then provides a brief overview of prevalent transportation flow prediction methods, categorizing algorithmic models into three groups: those based on mathematical and statistical models, those based on intelligent algorithms, and combined models utilizing artificial neural networks. The structures, strengths, and weaknesses of various transportation flow prediction algorithms are analyzed, as well as their application scenarios. The potential advantages of using artificial neural network-based combined prediction models for baggage flow forecasting are explained. The review concludes with an outlook on research regarding the demand for baggage flow, and may provide further assistance to scholars in airport management and baggage handling system development.
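To make the multiple-indicator recommendation concrete, here is a toy regression on synthetic data; the indicators, coefficients, and the sub-proportional passenger-to-baggage relationship are invented purely for illustration.

```python
# Toy illustration on synthetic data: relating baggage flow to several
# indicators rather than GDP alone (all numbers here are invented).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
passengers = rng.uniform(1e4, 5e4, 200)      # daily departing passengers
gdp_index  = rng.uniform(90, 120, 200)       # regional GDP index
tourism    = rng.uniform(0.0, 1.0, 200)      # share of leisure travelers
X = np.column_stack([passengers, gdp_index, tourism])

# Baggage flow grows with passengers, but less than proportionally.
y = 0.6 * passengers + 2_000 * tourism + rng.normal(0, 500, 200)

model = LinearRegression().fit(X, y)
print(dict(zip(["passengers", "gdp_index", "tourism"], model.coef_.round(2))))
```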
{"title":"An Overview of Demand Analysis and Forecasting Algorithms for the Flow of Checked Baggage among Departing Passengers","authors":"Bo Jiang, Guofu Ding, Jianlin Fu, Jian Zhang, Yong Zhang","doi":"10.3390/a17050173","DOIUrl":"https://doi.org/10.3390/a17050173","url":null,"abstract":"The research on baggage flow plays a pivotal role in achieving the efficient and intelligent allocation and scheduling of airport service resources, as well as serving as a fundamental element in determining the design, development, and process optimization of airport baggage handling systems. This paper examines baggage checked in by departing passengers at airports. The crrent state of the research on baggage flow demand is first reviewed and analyzed. Then, using examples of objective data, it is concluded that while there is a significant correlation between airport passenger flow and baggage flow, an increase in passenger flow does not necessarily result in a proportional increase in baggage flow. According to the existing research results on the influencing factors of baggage flow sorting and classification, the main influencing factors of baggage flow are divided into two categories: macro-influencing factors and micro-influencing factors. When studying the relationship between the economy and baggage flow, it is recommended to use a comprehensive analysis that includes multiple economic indicators, rather than relying solely on GDP. This paper provides a brief overview of prevalent transportation flow prediction methods, categorizing algorithmic models into three groups: based on mathematical and statistical models, intelligent algorithmic-based models, and combined algorithmic models utilizing artificial neural networks. The structures, strengths, and weaknesses of various transportation flow prediction algorithms are analyzed, as well as their application scenarios. The potential advantages of using artificial neural network-based combined prediction models for baggage flow forecasting are explained. It concludes with an outlook on research regarding the demand for baggage flow. This review may provide further research assistance to scholars in airport management and baggage handling system development.","PeriodicalId":502609,"journal":{"name":"Algorithms","volume":"72 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140670501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data meshes are an approach to data architecture and organization that treats data as a product and focuses on decentralizing data ownership and access. The data mesh has recently emerged as a field that presents quite a few challenges related to data ownership, governance, security, monitoring, and observability. To address these challenges, this paper introduces an innovative algorithmic framework that leverages data blueprints to enable the dynamic creation of data meshes and data products in response to user requests, ensuring that stakeholders have access to specific portions of the data mesh as needed. Ownership and governance concerns are addressed through a unique mechanism involving Blockchain and Non-Fungible Tokens (NFTs), which facilitates the secure and transparent transfer of data ownership, with the ability to mint time-based NFTs. By combining these advancements with the fundamental tenets of data meshes, this research offers a comprehensive solution to the challenges surrounding data ownership and governance. It empowers stakeholders to navigate the complexities of data management within a decentralized architecture, ensuring a secure, efficient, and user-centric approach to data utilization. The proposed framework is demonstrated using real-world data from a poultry meat production factory.
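The following is a deliberately simplified, hypothetical model of a time-based ownership token for a data product; a real deployment would mint such tokens on a blockchain (e.g., as ERC-721 NFTs), which is not shown here, and all names below are invented.

```python
# Hypothetical, greatly simplified stand-in for a time-based ownership token;
# an actual system would mint this on-chain (e.g., ERC-721), not in memory.
from dataclasses import dataclass
import time

@dataclass
class TimeBoundToken:
    token_id: int
    data_product: str   # e.g., a slice of the data mesh
    owner: str
    expires_at: float   # unix timestamp after which access lapses

    def grants_access(self, now=None):
        now = time.time() if now is None else now
        return now < self.expires_at

token = TimeBoundToken(1, "poultry/line-3/quality-metrics", "stakeholder-A",
                       time.time() + 3600)   # one-hour access window
print(token.grants_access())                 # True until it expires
```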
{"title":"Security and Ownership in User-Defined Data Meshes","authors":"Michalis Pingos, Panayiotis Christodoulou, Andreas S. Andreou","doi":"10.3390/a17040169","DOIUrl":"https://doi.org/10.3390/a17040169","url":null,"abstract":"Data meshes are an approach to data architecture and organization that treats data as a product and focuses on decentralizing data ownership and access. It has recently emerged as a field that presents quite a few challenges related to data ownership, governance, security, monitoring, and observability. To address these challenges, this paper introduces an innovative algorithmic framework leveraging data blueprints to enable the dynamic creation of data meshes and data products in response to user requests, ensuring that stakeholders have access to specific portions of the data mesh as needed. Ownership and governance concerns are addressed through a unique mechanism involving Blockchain and Non-Fungible Tokens (NFTs). This facilitates the secure and transparent transfer of data ownership, with the ability to mint time-based NFTs. By combining these advancements with the fundamental tenets of data meshes, this research offers a comprehensive solution to the challenges surrounding data ownership and governance. It empowers stakeholders to navigate the complexities of data management within a decentralized architecture, ensuring a secure, efficient, and user-centric approach to data utilization. The proposed framework is demonstrated using real-world data from a poultry meat production factory.","PeriodicalId":502609,"journal":{"name":"Algorithms","volume":"29 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140673025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A logotype is a rectangular region defined by a set of characteristics, derived from pixel information and region shape, that differ from those of the surrounding text. In this paper, a new method for automatic logo detection is proposed and tested on the public Tobacco800 database. Our method outputs a set of regions of an official document with a high probability of containing a logo, using a new approach based on a variation of the feature-rectangles method available in the literature. Candidate regions were computed using the longest-increasing-run algorithm over the indices of the document's blank lines. Those regions were further refined by a feature-rectangle-expansion method with forward checking, where the rectangle expansion can occur in parallel in each region. Finally, a C4.5 decision tree was trained and tested against a set of 1291 official documents to evaluate its performance. The strategic combination of the three previous steps offers a precision and recall for logo detection of 98.9% and 89.9%, respectively, while also being resistant to noise and low-quality documents. The method is also able to reduce the processing area of the document while maintaining a low percentage of false negatives.
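As a toy sketch of the candidate-region idea, the function below segments a binarized page into bands of consecutive non-blank rows; it is a simplification of the paper's longest-increasing-run step over blank-line indices, and all shapes and values are assumptions.

```python
# Illustrative sketch: find row bands containing ink on a binarized page,
# a simplified stand-in for the paper's blank-line candidate-region step.
import numpy as np

def ink_bands(binary_img):
    """Return (start_row, end_row) spans of consecutive rows containing ink."""
    blank = binary_img.sum(axis=1) == 0
    bands, start = [], None
    for i, is_blank in enumerate(blank):
        if not is_blank and start is None:
            start = i
        elif is_blank and start is not None:
            bands.append((start, i))
            start = None
    if start is not None:
        bands.append((start, len(blank)))
    return bands

page = np.zeros((100, 80), dtype=int)
page[5:20, 10:40] = 1   # a compact, logo-like region
page[40:90, 5:75] = 1   # a larger text block
print(ink_bands(page))  # [(5, 20), (40, 90)]
```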
{"title":"A Multi-Stage Method for Logo Detection in Scanned Official Documents Based on Image Processing","authors":"María Guijarro, Juan Bayon, Daniel Martín-Carabias, Joaquín Recas","doi":"10.3390/a17040170","DOIUrl":"https://doi.org/10.3390/a17040170","url":null,"abstract":"A logotype is a rectangular region defined by a set of characteristics, which come from the pixel information and region shape, that differ from those of the text. In this paper, a new method for automatic logo detection is proposed and tested using the public Tobacco800 database. Our method outputs a set of regions from an official document with a high probability to contain a logo using a new approach based on the variation of the feature rectangles method available in the literature. Candidate regions were computed using the longest increasing run algorithm over the document blank lines’ indices. Those regions were further refined by using a feature-rectangle-expansion method with forward checking, where the rectangle expansion can occur in parallel in each region. Finally, a C4.5 decision tree was trained and tested against a set of 1291 official documents to evaluate its performance. The strategic combination of the three previous steps offers a precision and recall for logo detention of 98.9% and 89.9%, respectively, being also resistant to noise and low-quality documents. The method is also able to reduce the processing area of the document while maintaining a low percentage of false negatives.","PeriodicalId":502609,"journal":{"name":"Algorithms","volume":"48 11","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140676252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Childhood stroke can lead to lifelong disability. Developing algorithms for the timely recognition of clinical and paraclinical signs is crucial to ensure prompt stroke diagnosis and minimize decision-making time. This study aimed to characterize clinical and paraclinical symptoms of childhood and neonatal stroke as relevant diagnostic criteria encountered in clinical practice, in order to develop algorithms for prompt stroke diagnosis. The analysis included data from 402 pediatric case histories from 2010 to 2016 and 108 prospective stroke cases from 2017 to 2020. Stroke cases were predominantly diagnosed in newborns, with 362 (71%, 95% CI 68.99–73.01) cases occurring within the first 28 days of life and 148 (29%, 95% CI 26.99–31.01) occurring later. The findings enable the development of algorithms for timely stroke recognition, facilitating the selection of optimal treatment options for newborns and children of various age groups. Logistic regression serves as the basis for deriving these algorithms, with the aim of initiating early treatment and reducing lifelong morbidity and mortality in children. The study outcomes include the formulation of algorithms for the timely recognition of newborn stroke, with plans to adopt these algorithms and train a fuzzy-classifier-based diagnostic model using machine learning techniques for efficient stroke recognition.
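Purely as an illustrative sketch of the modeling approach named above, the snippet below fits a logistic regression to synthetic stand-in features; it does not use the study's variables or data.

```python
# Illustrative only: a logistic-regression recognizer of the kind the study
# derives its diagnostic algorithms from (features and labels are synthetic).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(510, 6))   # stand-ins for clinical/paraclinical signs
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1.0, 510) > 1.0).astype(int)

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict_proba(X[:3])[:, 1])   # per-case stroke probability
```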
{"title":"Pediatric Ischemic Stroke: Clinical and Paraclinical Manifestations—Algorithms for Diagnosis and Treatment","authors":"Niels Wessel, M. Sprincean, Ludmila Sidorenko, N. Revenco, S. Hadjiu","doi":"10.3390/a17040171","DOIUrl":"https://doi.org/10.3390/a17040171","url":null,"abstract":"Childhood stroke can lead to lifelong disability. Developing algorithms for timely recognition of clinical and paraclinical signs is crucial to ensure prompt stroke diagnosis and minimize decision-making time. This study aimed to characterize clinical and paraclinical symptoms of childhood and neonatal stroke as relevant diagnostic criteria encountered in clinical practice, in order to develop algorithms for prompt stroke diagnosis. The analysis included data from 402 pediatric case histories from 2010 to 2016 and 108 prospective stroke cases from 2017 to 2020. Stroke cases were predominantly diagnosed in newborns, with 362 (71%, 95% CI 68.99–73.01) cases occurring within the first 28 days of birth, and 148 (29%, 95% CI 26.99–31.01) cases occurring after 28 days. The findings of the study enable the development of algorithms for timely stroke recognition, facilitating the selection of optimal treatment options for newborns and children of various age groups. Logistic regression serves as the basis for deriving these algorithms, aiming to initiate early treatment and reduce lifelong morbidity and mortality in children. The study outcomes include the formulation of algorithms for timely recognition of newborn stroke, with plans to adopt these algorithms and train a fuzzy classifier-based diagnostic model using machine learning techniques for efficient stroke recognition.","PeriodicalId":502609,"journal":{"name":"Algorithms","volume":"4 7","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140675248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Transformer architecture has gained widespread acceptance in image segmentation. However, it sacrifices local feature details and requires extensive data for training, posing challenges to its integration into computer-aided medical image segmentation. To address these challenges, we introduce CCFNet, a collaborative cross-fusion network that continuously and interactively fuses a CNN and a Transformer to exploit context dependencies. In particular, when integrating CNN features into the Transformer, the correlations between local and global tokens are adaptively fused through collaborative self-attention fusion to minimize the semantic disparity between the two types of features. When integrating Transformer features into the CNN, a spatial feature injector reduces the spatial information gap caused by the asymmetry of the extracted features. In addition, CCFNet runs the Transformer and the CNN in parallel, independently encoding hierarchical global and local representations while effectively aggregating the different features, which preserves both global representations and local features. Experimental findings on two public medical image segmentation datasets show that our approach is competitive with current state-of-the-art methods.
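As a conceptual sketch of cross-attention fusion between local CNN features and global Transformer tokens (not CCFNet's actual modules), consider the following PyTorch fragment; all dimensions and the single attention layer are illustrative assumptions.

```python
# Conceptual sketch (PyTorch): local CNN features attend to global Transformer
# tokens via cross-attention. This is not CCFNet's exact module; dimensions
# are illustrative.
import torch
import torch.nn as nn

cnn_feat = torch.randn(1, 64, 32, 32)            # B, C, H, W from a CNN stage
local_tok = cnn_feat.flatten(2).transpose(1, 2)  # B, H*W, C token view
global_tok = torch.randn(1, 196, 64)             # B, N, C Transformer tokens

attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
fused, _ = attn(query=local_tok, key=global_tok, value=global_tok)
fused = fused.transpose(1, 2).reshape(1, 64, 32, 32)  # back to a feature map
print(fused.shape)   # torch.Size([1, 64, 32, 32])
```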
{"title":"CCFNet: Collaborative Cross-Fusion Network for Medical Image Segmentation","authors":"Jialu Chen, Baohua Yuan","doi":"10.3390/a17040168","DOIUrl":"https://doi.org/10.3390/a17040168","url":null,"abstract":"The Transformer architecture has gained widespread acceptance in image segmentation. However, it sacrifices local feature details and necessitates extensive data for training, posing challenges to its integration into computer-aided medical image segmentation. To address the above challenges, we introduce CCFNet, a collaborative cross-fusion network, which continuously fuses a CNN and Transformer interactively to exploit context dependencies. In particular, when integrating CNN features into Transformer, the correlations between local and global tokens are adaptively fused through collaborative self-attention fusion to minimize the semantic disparity between these two types of features. When integrating Transformer features into the CNN, it uses the spatial feature injector to reduce the spatial information gap between features due to the asymmetry of the extracted features. In addition, CCFNet implements the parallel operation of Transformer and the CNN and independently encodes hierarchical global and local representations when effectively aggregating different features, which can preserve global representations and local features. The experimental findings from two public medical image segmentation datasets reveal that our approach exhibits competitive performance in comparison to current state-of-the-art methods.","PeriodicalId":502609,"journal":{"name":"Algorithms","volume":"121 16","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140677907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We develop decision support and automation for the task of ultrasonic non-destructive evaluation data analysis. First, we develop a probabilistic model for the task and then implement the model as a series of neural networks based on Conditional Score-Based Diffusion and Denoising Diffusion Probabilistic Model architectures. We use the neural networks to generate estimates of the time of flight of the peak amplitude response, and we perform a series of tests probing their behavior, capacity, and characteristics in terms of the probabilistic model. We train the neural networks on a series of datasets constructed from ultrasonic non-destructive evaluation data acquired during an inspection at a nuclear power generation facility. We modulate the partition classifying nominal and anomalous data in the dataset and observe that the probabilistic model predicts trends in neural network model performance, thereby demonstrating a principled basis for explainability. We improve on previous related work in that our methods are self-supervised and require no data annotation or pre-processing, and we train on a per-dataset basis, meaning we do not rely on out-of-distribution generalization. The capacity of the probabilistic model to predict trends in neural network performance, together with the quality of the estimates sampled from the neural networks, supports the development of a technical justification for using the method in safety-critical contexts such as nuclear applications. The method may provide a basis or template for extension to similar non-destructive evaluation tasks in other industrial contexts.
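For orientation, the snippet below shows the standard DDPM forward (noising) process applied to a stand-in 1-D signal; the schedule, shapes, and signal are assumptions, and the paper's conditional score-based models are not reproduced.

```python
# Toy DDPM-style forward diffusion on a stand-in 1-D "A-scan" signal.
# Schedule and shapes are assumptions; a denoising network trained to
# predict `noise` from (x_t, t) is the piece this sketch omits.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)   # cumulative signal retention

x0 = torch.randn(1, 512)                         # placeholder ultrasonic trace
t = torch.tensor([250])                          # an intermediate timestep
noise = torch.randn_like(x0)
x_t = alphas_bar[t].sqrt() * x0 + (1.0 - alphas_bar[t]).sqrt() * noise
print(x_t.shape)
```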
{"title":"Evaluating Diffusion Models for the Automation of Ultrasonic Nondestructive Evaluation Data Analysis","authors":"Nick Torenvliet, John Zelek","doi":"10.3390/a17040167","DOIUrl":"https://doi.org/10.3390/a17040167","url":null,"abstract":"We develop decision support and automation for the task of ultrasonic non-destructive evaluation data analysis. First, we develop a probabilistic model for the task and then implement the model as a series of neural networks based on Conditional Score-Based Diffusion and Denoising Diffusion Probabilistic Model architectures. We use the neural networks to generate estimates for peak amplitude response time of flight and perform a series of tests probing their behavior, capacity, and characteristics in terms of the probabilistic model. We train the neural networks on a series of datasets constructed from ultrasonic non-destructive evaluation data acquired during an inspection at a nuclear power generation facility. We modulate the partition classifying nominal and anomalous data in the dataset and observe that the probabilistic model predicts trends in neural network model performance, thereby demonstrating a principled basis for explainability. We improve on previous related work as our methods are self-supervised and require no data annotation or pre-processing, and we train on a per-dataset basis, meaning we do not rely on out-of-distribution generalization. The capacity of the probabilistic model to predict trends in neural network performance, as well as the quality of the estimates sampled from the neural networks, support the development of a technical justification for usage of the method in safety-critical contexts such as nuclear applications. The method may provide a basis or template for extension into similar non-destructive evaluation tasks in other industrial contexts.","PeriodicalId":502609,"journal":{"name":"Algorithms","volume":"114 50","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140678145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The early diagnosis of diabetic retinopathy (DR) can effectively prevent irreversible vision loss and assist ophthalmologists in providing timely and accurate treatment plans. However, existing deep learning methods perceive information at different scales in retinal fundus images only weakly, and their ability to segment subtle lesions is also insufficient. This paper addresses these issues and proposes MLNet for DR lesion segmentation, which mainly consists of a Multi-Scale Attention Block (MSAB) and a Lesion Perception Block (LPB). The MSAB is designed to capture multi-scale lesion features in fundus images, while the LPB perceives subtle lesions in depth. In addition, a novel loss function with tailored lesion weights is designed to reduce the influence of imbalanced datasets on the algorithm. MLNet is compared with other state-of-the-art methods on the DDR and DIARETDB1 datasets, achieving the best results of 51.81% mAUPR, 49.85% mDice, and 37.19% mIoU on DDR, and 67.16% mAUPR and 61.82% mDice on DIARETDB1. In a generalization experiment on the IDRiD dataset, MLNet achieves 59.54% mAUPR, the best among the compared methods. The results show that MLNet has outstanding DR lesion segmentation ability.
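As a sketch of the idea behind a lesion-weighted loss (the specific weighting below is an assumption, not MLNet's formula), one can up-weight the sparse lesion pixels in a binary cross-entropy term:

```python
# Sketch of a lesion-weighted BCE loss; the weighting scheme is an assumed
# stand-in for the paper's tailored lesion weights, not its actual formula.
import torch
import torch.nn.functional as F

def lesion_weighted_bce(logits, target, lesion_weight=10.0):
    # Up-weight the sparse lesion pixels to counter class imbalance.
    w = torch.where(target > 0.5,
                    torch.full_like(target, lesion_weight),
                    torch.ones_like(target))
    return F.binary_cross_entropy_with_logits(logits, target, weight=w)

logits = torch.randn(2, 1, 64, 64)                   # raw predictions
target = (torch.rand(2, 1, 64, 64) > 0.97).float()   # ~3% lesion pixels
print(lesion_weighted_bce(logits, target).item())
```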
{"title":"Diabetic Retinopathy Lesion Segmentation Method Based on Multi-Scale Attention and Lesion Perception","authors":"Ye Bian, Chengyong Si, Lei Wang","doi":"10.3390/a17040164","DOIUrl":"https://doi.org/10.3390/a17040164","url":null,"abstract":"The early diagnosis of diabetic retinopathy (DR) can effectively prevent irreversible vision loss and assist ophthalmologists in providing timely and accurate treatment plans. However, the existing methods based on deep learning have a weak perception ability of different scale information in retinal fundus images, and the segmentation capability of subtle lesions is also insufficient. This paper aims to address these issues and proposes MLNet for DR lesion segmentation, which mainly consists of the Multi-Scale Attention Block (MSAB) and the Lesion Perception Block (LPB). The MSAB is designed to capture multi-scale lesion features in fundus images, while the LPB perceives subtle lesions in depth. In addition, a novel loss function with tailored lesion weight is designed to reduce the influence of imbalanced datasets on the algorithm. The performance comparison between MLNet and other state-of-the-art methods is carried out in the DDR dataset and DIARETDB1 dataset, and MLNet achieves the best results of 51.81% mAUPR, 49.85% mDice, and 37.19% mIoU in the DDR dataset, and 67.16% mAUPR and 61.82% mDice in the DIARETDB1 dataset. The generalization experiment of MLNet in the IDRiD dataset achieves 59.54% mAUPR, which is the best among other methods. The results show that MLNet has outstanding DR lesion segmentation ability.","PeriodicalId":502609,"journal":{"name":"Algorithms","volume":" 57","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140683403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Predicting vehicle mobility is crucial in domains such as ride-hailing, where the balance between supply and demand is paramount. Since city road networks can be easily represented as graphs, recent works have exploited graph neural networks (GNNs) to produce more accurate predictions on real traffic data. However, a better understanding of the characteristics and limitations of this approach is needed. In this work, we compare several GNN aggregated mobility prediction schemes to a selection of other approaches in a very restricted and controlled simulation scenario. The city graph employed represents roads as directed edges and road intersections as nodes. Individual vehicle mobility is modeled as transitions between nodes in the graph, and a time series of aggregated mobility is computed by counting the vehicles at each node at any given time. Three main approaches are employed to construct the aggregated mobility predictors. First, the behavior of the moving individuals is assumed to follow a Markov chain (MC) model whose transition matrix is inferred via a least squares estimation procedure; the recurrent application of this MC provides the aggregated mobility predictions. Second, a multilayer perceptron (MLP) is trained so that, given the node occupation at a given time, it can recursively provide predictions for the next values of the time series. Third, we train a GNN (according to the city graph) on the time series data via a supervised learning formulation that computes, through an embedding construction for each node in the graph, the aggregated mobility predictions. Several mobility patterns are simulated in the city to generate different time series for testing purposes. The proposed schemes are assessed against different baseline prediction procedures. The comparison illustrates several limitations of the GNN approaches in the selected scenario and uncovers future lines of investigation.
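The first approach lends itself to a compact worked example: given aggregated node counts, the MC transition matrix can be estimated by least squares and applied recurrently. The toy three-node version below is a sketch under invented data, not the paper's code.

```python
# Toy sketch of the MC baseline: estimate the transition matrix P from
# aggregated node counts via least squares (3-node graph, synthetic data).
import numpy as np

rng = np.random.default_rng(4)
P_true = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.2, 0.3, 0.5]])

counts = [np.array([100.0, 50.0, 30.0])]       # vehicles per node at t = 0
for _ in range(200):                           # noisy aggregated dynamics
    counts.append(counts[-1] @ P_true + rng.normal(0.0, 1.0, 3))

X = np.vstack(counts[:-1])                     # n_t
Y = np.vstack(counts[1:])                      # n_{t+1} ≈ n_t @ P
P_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)  # least squares estimate of P

next_pred = counts[-1] @ P_hat                 # one-step-ahead prediction
print(np.round(P_hat, 2))
```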
{"title":"Predicting the Aggregate Mobility of a Vehicle Fleet within a City Graph","authors":"J. F. Sánchez-Rada, Raquel Vila-Rodríguez, Jesús Montes, Pedro J. Zufiria","doi":"10.3390/a17040166","DOIUrl":"https://doi.org/10.3390/a17040166","url":null,"abstract":"Predicting vehicle mobility is crucial in domains such as ride-hailing, where the balance between offer and demand is paramount. Since city road networks can be easily represented as graphs, recent works have exploited graph neural networks (GNNs) to produce more accurate predictions on real traffic data. However, a better understanding of the characteristics and limitations of this approach is needed. In this work, we compare several GNN aggregated mobility prediction schemes to a selection of other approaches in a very restricted and controlled simulation scenario. The city graph employed represents roads as directed edges and road intersections as nodes. Individual vehicle mobility is modeled as transitions between nodes in the graph. A time series of aggregated mobility is computed by counting vehicles in each node at any given time. Three main approaches are employed to construct the aggregated mobility predictors. First, the behavior of the moving individuals is assumed to follow a Markov chain (MC) model whose transition matrix is inferred via a least squares estimation procedure; the recurrent application of this MC provides the aggregated mobility prediction values. Second, a multilayer perceptron (MLP) is trained so that—given the node occupation at a given time—it can recursively provide predictions for the next values of the time series. Third, we train a GNN (according to the city graph) with the time series data via a supervised learning formulation that computes—through an embedding construction for each node in the graph—the aggregated mobility predictions. Some mobility patterns are simulated in the city to generate different time series for testing purposes. The proposed schemes are comparatively assessed compared to different baseline prediction procedures. The comparison illustrates several limitations of the GNN approaches in the selected scenario and uncovers future lines of investigation.","PeriodicalId":502609,"journal":{"name":"Algorithms","volume":" 35","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140684257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}