Title: Underwater Mediterranean image analysis based on the compute continuum paradigm
Pub Date: 2024-08-12  DOI: 10.1016/j.future.2024.107481
Human activity depends on the oceans for food, transportation, leisure, and many other purposes. Oceans cover 70% of the Earth's surface, yet most of that area remains unexplored, which is why underwater imaging is a valuable resource for Marine Science. Images are acquired by observing systems, e.g. autonomous underwater vehicles or underwater observatories, that presently transmit all the raw data to land stations. However, transferring such volumes of data can be challenging given the limited power supply and transmission bandwidth of these systems. In this paper, we discuss these aspects, and in particular how Edge and Cloud computing can be coupled to manage the full processing pipeline effectively according to the Compute Continuum paradigm.
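A minimal sketch of how such an edge-side gate might look (all names, thresholds, and cost figures below are illustrative assumptions, not the paper's pipeline): frames are scored on the device and only promising ones are uploaded, subject to the remaining energy and bandwidth budget.

```python
# Hypothetical edge-side policy: pre-filter frames on the device and upload only
# those likely to contain objects of interest, subject to energy/bandwidth budgets.
from dataclasses import dataclass

@dataclass
class EdgeBudget:
    battery_wh: float        # remaining energy on the observing system
    uplink_mbps: float       # available acoustic/satellite bandwidth

def relevance_score(frame) -> float:
    # Stand-in for a lightweight on-device model; here just normalised mean intensity.
    return sum(frame) / (255.0 * len(frame))

def route_frame(frame, frame_mb: float, budget: EdgeBudget,
                tx_wh_per_mb: float = 0.05, threshold: float = 0.6) -> str:
    score = relevance_score(frame)
    tx_cost = frame_mb * tx_wh_per_mb
    if score >= threshold and budget.battery_wh > tx_cost and budget.uplink_mbps > 0:
        budget.battery_wh -= tx_cost
        return "upload_to_cloud"        # full-resolution analysis on shore
    if score >= threshold:
        return "store_locally"          # keep for later bulk transfer
    return "discard_or_summarise"       # only metadata leaves the device

frame = [180] * 1_000                   # fake 1000-pixel grayscale frame
print(route_frame(frame, frame_mb=2.0, budget=EdgeBudget(battery_wh=50.0, uplink_mbps=0.1)))
```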
{"title":"Underwater Mediterranean image analysis based on the compute continuum paradigm","authors":"","doi":"10.1016/j.future.2024.107481","DOIUrl":"10.1016/j.future.2024.107481","url":null,"abstract":"<div><p>Human activity depends on the oceans for food, transportation, leisure, and many more purposes. Oceans cover 70% of the Earth’s surface, but most of them are unknown to humankind. This is the reason why underwater imaging is a valuable resource asset to Marine Science. Images are acquired with observing systems, e.g. autonomous underwater vehicles or underwater observatories, that presently transmit all the raw data to land stations. However, the transfer of such an amount of data could be challenging, considering the limited power supply and transmission bandwidth of these systems. In this paper, we discuss these aspects, and in particular how it is possible to couple Edge and Cloud computing for effective management of the full processing pipeline according to the Compute Continuum paradigm.</p></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":null,"pages":null},"PeriodicalIF":6.2,"publicationDate":"2024-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0167739X2400431X/pdfft?md5=604d28ee54ea8468beac4eeba5484fd0&pid=1-s2.0-S0167739X2400431X-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142020484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Certificateless Proxy Re-encryption with Cryptographic Reverse Firewalls for Secure Cloud Data Sharing
Pub Date: 2024-08-10  DOI: 10.1016/j.future.2024.08.002
Cloud computing has made data sharing more convenient than ever before. However, data security is a major concern that prevents cloud computing from being widely adopted. A potential solution for secure data sharing in cloud computing is proxy re-encryption (PRE), which allows a proxy to transform encrypted data from one key to another without accessing the plaintext. When using PRE, various challenges arise, including information leakage by a trusted third party, collusion attacks, and issues associated with revocation. To overcome these challenges, this paper proposes a novel Certificateless Proxy Re-encryption with Cryptographic Reverse Firewalls scheme for Secure Cloud Data Sharing (CLPRE-CRF). The new scheme enables secure distribution of encrypted data from a data owner to users through public clouds. Meanwhile, the CLPRE-CRF scheme can resist exfiltration of secret information and forgery of ciphertexts in case the scheme is compromised. In addition, the scheme provides a flexible revocation mechanism to prevent unauthorized access to private data. The security analysis demonstrates that CLPRE-CRF resists chosen-plaintext and collusion attacks. Moreover, the performance evaluation indicates that our scheme achieves 14% and 22% reductions in the computation costs of the encryption and decryption algorithms, respectively. Therefore, the proposed CLPRE-CRF scheme is well-suited for cloud computing environments.
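For readers unfamiliar with proxy re-encryption, the toy sketch below illustrates only the basic primitive, using the classic BBS98-style ElGamal construction; it is not the certificateless CLPRE-CRF scheme and has none of its reverse-firewall or revocation features, and the group parameters are deliberately tiny.

```python
# Toy BBS98-style ElGamal proxy re-encryption, for illustration only. NOT the CLPRE-CRF
# scheme (no certificateless keys, reverse firewalls, or revocation); parameters are
# far too small to be secure.
import secrets
from math import gcd

p = 2**127 - 1          # a known prime (toy parameter)
g = 3

def keygen():
    while True:
        x = secrets.randbelow(p - 2) + 1
        if gcd(x, p - 1) == 1:                      # x must be invertible mod p-1
            return x, pow(g, x, p)                  # (secret key, public key)

def encrypt(pk, m):
    r = secrets.randbelow(p - 2) + 1
    return (m * pow(g, r, p)) % p, pow(pk, r, p)    # (c1, c2) = (m*g^r, pk^r)

def rekey(sk_a, sk_b):
    return (sk_b * pow(sk_a, -1, p - 1)) % (p - 1)  # rk = b * a^(-1) mod (p-1)

def reencrypt(rk, ct):                              # proxy never sees the plaintext
    c1, c2 = ct
    return c1, pow(c2, rk, p)                       # g^(a*r) -> g^(b*r)

def decrypt(sk, ct):
    c1, c2 = ct
    shared = pow(c2, pow(sk, -1, p - 1), p)         # recover g^r
    return (c1 * pow(shared, -1, p)) % p

sk_a, pk_a = keygen()                               # data owner A
sk_b, pk_b = keygen()                               # recipient B
ct_a = encrypt(pk_a, 42)
ct_b = reencrypt(rekey(sk_a, sk_b), ct_a)           # cloud proxy transforms A's ciphertext
assert decrypt(sk_b, ct_b) == 42
```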
{"title":"Certificateless Proxy Re-encryption with Cryptographic Reverse Firewalls for Secure Cloud Data Sharing","authors":"","doi":"10.1016/j.future.2024.08.002","DOIUrl":"10.1016/j.future.2024.08.002","url":null,"abstract":"<div><p>Cloud computing has enabled data-sharing to be more convenient than ever before. However, data security is a major concern that prevents cloud computing from being widely adopted. A potential solution to secure data-sharing in cloud computing is proxy re-encryption (PRE), which allows a proxy to transform encrypted data from one key to another without accessing the plaintext. When using PRE, various challenges arise, including the leak of information by a trusted third party, collusion attacks, and issues associated with revocation. To overcome these challenges, this paper proposes a novel Certificateless Proxy Reencryption with Cryptographic Reverse Firewall for Secure Cloud Data Sharing (CLPRE-CRF). The new scheme enables secure distribution of encrypted data from a data owner to users through public clouds. Meanwhile, the CLPRE-CRF scheme can resist exfiltration of secret information and forgery of ciphertext in case the scheme is compromised. In addition, the scheme provides a flexible revocation mechanism to prevent unauthorized access to private data. The security analysis demonstrates that the CLPRE-CRF resists chosen-plaintext attacks and collusion attacks. Moreover, performance evaluation indicates that our scheme achieves a 14% and 22% reduction in computation costs during the encryption and decryption algorithms, respectively. Therefore, the proposed CLPRE-CRF scheme is well-suited for cloud computing environments.</p></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":null,"pages":null},"PeriodicalIF":6.2,"publicationDate":"2024-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141979542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Network-aware federated neural architecture search
Pub Date: 2024-08-08  DOI: 10.1016/j.future.2024.07.053
The cooperation between Deep Learning (DL) and edge devices has further advanced technological developments, allowing smart devices to serve as both data sources and endpoints for DL-powered applications. However, the success of DL relies on optimal Deep Neural Network (DNN) architectures, and manually developing such systems requires extensive expertise and time. Neural Architecture Search (NAS) has emerged to automate the search for the best-performing neural architectures. Meanwhile, Federated Learning (FL) addresses data privacy concerns by enabling collaborative model development without exchanging the private data of clients.
In an FL system, network limitations can lead to biased model training, slower convergence, and increased communication overhead. On the other hand, traditional DNN architecture design, with its emphasis on validation accuracy, often overlooks the computational efficiency and size constraints of edge devices. This research aims to develop a comprehensive framework that balances the trade-off between model performance and communication efficiency while incorporating FL into an iterative NAS algorithm. The framework addresses the specific requirements of FL, optimizes DNNs through NAS, and ensures computational efficiency while accounting for the network constraints of edge devices.
To address these challenges, we introduce Network-Aware Federated Neural Architecture Search (NAFNAS), an open-source federated neural network pruning framework with network emulation support. Through comprehensive testing, we demonstrate the feasibility of our approach, efficiently reducing DNN size and mitigating communication challenges. Additionally, we propose Network and Distribution Aware Client Grouping (NetDAG), a novel client grouping algorithm tailored for FL with diverse DNN architectures, which considerably enhances the efficiency of communication rounds and the balance of updates.
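As a rough illustration of the kind of mechanism involved (not NAFNAS itself), the numpy sketch below combines magnitude-based pruning on each client with a mask-aware federated average on the server; the model, data, and hyperparameters are synthetic.

```python
# Minimal sketch (not NAFNAS): clients prune their local model by weight magnitude
# before uploading, and the server averages only the weights each client kept.
import numpy as np

def local_update(global_w, X, y, lr=0.1, sparsity=0.5):
    w = global_w.copy()
    grad = X.T @ (X @ w - y) / len(y)             # one gradient step of linear regression
    w -= lr * grad
    k = int(len(w) * sparsity)                    # zero out the k smallest-magnitude weights
    mask = np.ones_like(w)
    mask[np.argsort(np.abs(w))[:k]] = 0.0
    return w * mask, mask

def aggregate(updates):
    ws = np.stack([w for w, _ in updates])
    masks = np.stack([m for _, m in updates])
    counts = np.maximum(masks.sum(axis=0), 1.0)   # avoid division by zero
    return ws.sum(axis=0) / counts                # average over clients that kept each weight

rng = np.random.default_rng(0)
d, global_w = 8, np.zeros(8)
clients = [(rng.normal(size=(32, d)), rng.normal(size=32)) for _ in range(4)]
for _ in range(10):                               # a few federated rounds
    global_w = aggregate([local_update(global_w, X, y) for X, y in clients])
```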
{"title":"Network-aware federated neural architecture search","authors":"","doi":"10.1016/j.future.2024.07.053","DOIUrl":"10.1016/j.future.2024.07.053","url":null,"abstract":"<div><p>The cooperation between Deep Learning (DL) and edge devices has further advanced technological developments, allowing smart devices to serve as both data sources and endpoints for DL-powered applications. However, the success of DL relies on optimal Deep Neural Network (DNN) architectures, and manually developing such systems requires extensive expertise and time. Neural Architecture Search (NAS) has emerged to automate the search for the best-performing neural architectures. Meanwhile, Federated Learning (FL) addresses data privacy concerns by enabling collaborative model development without exchanging the private data of clients.</p><p>In a FL system, network limitations can lead to biased model training, slower convergence, and increased communication overhead. On the other hand, traditional DNN architecture design, emphasizing validation accuracy, often overlooks computational efficiency and size constraints of edge devices. This research aims to develop a comprehensive framework that effectively balances trade-offs between model performance, communication efficiency, and the incorporation of FL into an iterative NAS algorithm. This framework aims to overcome challenges by addressing the specific requirements of FL, optimizing DNNs through NAS, and ensuring computational efficiency while considering the network constraints of edge devices.</p><p>To address these challenges, we introduce Network-Aware Federated Neural Architecture Search (NAFNAS), an open-source federated neural network pruning framework with network emulation support. Through comprehensive testing, we demonstrate the feasibility of our approach, efficiently reducing DNN size and mitigating communication challenges. Additionally, we propose Network and Distribution Aware Client Grouping (NetDAG), a novel client grouping algorithm tailored for FL with diverse DNN architectures, considerably enhancing efficiency of communication rounds and update balance.</p></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":null,"pages":null},"PeriodicalIF":6.2,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141992847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Context aware clustering and meta-heuristic resource allocation for NB-IoT D2D devices in smart healthcare applications
Pub Date: 2024-08-06  DOI: 10.1016/j.future.2024.08.001
The utilization of Device-to-Device (D2D) communication among Narrowband Internet of Things (NB-IoT) devices offers significant potential for advancing intelligent healthcare systems due to its superior data rates, low power consumption, and spectral efficiency. In D2D communication, strategies to mitigate interference and ensure coexistence with cellular networks are crucial. These strategies aim to enhance user data rates by optimally allocating spectrum and managing the transmission power of D2D devices, which presents a complex engineering challenge. Existing studies are limited either by the inadequate integration of NB-IoT D2D communication methods for healthcare, lacking intelligent, distributed, and autonomous decision-making for reliable data transmission, or by insufficient healthcare event management policies during resource allocation in smart healthcare systems. In this work, we introduce an Intelligent Resource Allocation for Smart Healthcare (iRASH) system, designed to optimize D2D communication within NB-IoT environments. iRASH integrates the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Ant Colony Optimization (ACO) algorithms to address the unique requirements of healthcare applications. The proposed system utilizes Belief-Desire-Intention (BDI) agents for dynamic and intelligent clustering of D2D devices, facilitating autonomous decision-making and efficient resource allocation. This approach not only enhances data transmission rates but also reduces power consumption; the underlying allocation problem is formulated as a Multi-objective Integer Linear Programming (MILP) problem. Given the NP-hard nature of this problem, iRASH incorporates a polynomial-time meta-heuristic ACO algorithm that provides a suboptimal solution. This algorithm adheres to the principles of distributed D2D communication, promoting equitable resource distribution and substantial improvements in utility, energy efficiency, and scalability. Our system is validated through simulations on the Network Simulator version 3 (NS-3) platform, demonstrating significant advances over existing state-of-the-art solutions in terms of data rate, power efficiency, and system adaptability. Compared to the benchmark, iRASH demonstrates improvements of up to 35% in utility and 50% in energy cost, confirming its effectiveness. The outcomes highlight iRASH's potential to revolutionize D2D communications in smart healthcare settings, paving the way for more responsive and reliable IoT applications.
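The clustering stage could, for instance, look like the sketch below, which covers only the DBSCAN step (device coordinates, eps, and min_samples are made-up values); the resulting clusters would then feed the ACO-based spectrum/power allocator described above.

```python
# Illustrative clustering stage only (parameters are invented): group NB-IoT D2D
# devices by physical proximity with DBSCAN; each cluster could then be assigned
# spectrum and transmission power by a separate allocator.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(1)
positions = np.vstack([
    rng.normal(loc=(0, 0), scale=5, size=(20, 2)),    # ward A devices (metres)
    rng.normal(loc=(60, 40), scale=5, size=(20, 2)),   # ward B devices
    rng.uniform(-100, 100, size=(5, 2)),                # scattered outliers
])
labels = DBSCAN(eps=10.0, min_samples=4).fit_predict(positions)
clusters = {c: np.where(labels == c)[0] for c in set(labels) if c != -1}
print({c: len(idx) for c, idx in clusters.items()}, "noise:", int((labels == -1).sum()))
```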
{"title":"Context aware clustering and meta-heuristic resource allocation for NB-IoT D2D devices in smart healthcare applications","authors":"","doi":"10.1016/j.future.2024.08.001","DOIUrl":"10.1016/j.future.2024.08.001","url":null,"abstract":"<div><p>The utilization of Device-to-Device (D2D) communication among Narrowband Internet of Things (NB-IoT) devices offers significant potential for advancing intelligent healthcare systems due to its superior data rates, low power consumption, and spectral efficiency. In D2D communication, strategies to mitigate interference and ensure coexistence with cellular networks are crucial. These strategies are aimed at enhancing user data rates by optimally allocating spectrum and managing the transmission power of D2D devices, presenting a complex engineering challenge. Existing studies are limited either by the inadequate integration of NB-IoT D2D communication methods for healthcare, lacking intelligent, distributed, and autonomous decision-making for reliable data transmission, or by insufficient healthcare event management policies during resource allocation in smart healthcare systems. In this work, we introduce an Intelligent Resource Allocation for Smart Healthcare (iRASH) system, designed to optimize D2D communication within NB-IoT environments. The iRASH innovatively integrates the Density-based Spatial Clustering of Applications with Noise (DBSCAN) and Ant Colony Optimization (ACO) algorithms to effectively address the unique requirements of healthcare applications. The proposed system utilizes Belief-Desire-Intention (BDI) agents for dynamic and intelligent clustering of D2D devices, facilitating autonomous decision-making and efficient resource allocation. This approach not only enhances data transmission rates but also reduces power consumption, and is formulated as a Multi-objective Integer Linear Programming (MILP) problem. Given the NP-hard nature of this problem, iRASH incorporates a polynomial-time meta-heuristic-based ACO algorithm, which provides a suboptimal solution. This algorithm adheres to the principles of distributed D2D communication, promoting equitable resource distribution and substantial improvements in utility, energy efficiency, and scalability. Our system is validated through simulations on the Network Simulator version 3 (NS-3) platform, demonstrating significant advancements over existing state-of-the-art solutions in terms of data rate, power efficiency, and system adaptability. As high as improvements of 35% in utility and 50% in energy cost are demonstrated by the iRASH system compared to the benchmark, proving its effectiveness. The outcomes highlight iRASH’s potential to revolutionize D2D communications in smart healthcare settings, paving the way for more responsive and reliable IoT applications.</p></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":null,"pages":null},"PeriodicalIF":6.2,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141979541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Decentralised Identity Management solution for zero-trust multi-domain Computing Continuum frameworks
Pub Date: 2024-08-06  DOI: 10.1016/j.future.2024.08.003
The adoption of the Computing Continuum is characterised by the seamless integration of diverse computing environments and devices. In this dynamic landscape, sharing resources across the continuum is becoming a reality, and security must take a step forward, especially in terms of authentication and authorisation for such distributed and heterogeneous environments. The need for robust identity management is paramount and, in this regard, Decentralised Identity Management (DIM) emerges as a promising solution. It leverages decentralised technologies to secure and facilitate identity interactions across the Computing Continuum. In particular, to enhance security and privacy, it is desirable to apply the principles of Self-Sovereign Identity (SSI). In this paradigm, users have full ownership and control of their digital identities, which empowers individuals to manage and share their identity data on a need-to-know basis. These mechanisms can help improve security properties during continuum resource management operations. In this context, this paper presents the design, workflows and implementation of a solution that provides authentication/authorisation features to distributed zero-trust based infrastructures across the continuum, enhancing security in the resource sharing and resource acquisition stages. To this aim, the solution relies on key aspects such as decentralisation, interoperability, trust management and privacy-enhancing capabilities. Decentralisation leverages distributed ledger technologies, such as blockchain, to establish a decentralised identity ecosystem. The solution prioritises interoperability, enabling nodes to seamlessly access and share their identities across different domains and environments. Trustworthiness is at the core of DIM, and privacy is also considered, incorporating privacy-preserving techniques that allow individuals to selectively disclose identity attributes while safeguarding sensitive information. The implementation includes the operations needed to enhance continuum frameworks with decentralised authentication and authorisation features. Performance has been evaluated by measuring the impact of adopting the solution. The most expensive task, self-identity generation, takes only a few seconds in our deployment and is executed only once. Authorisation tasks operate in the millisecond range, a negligible overhead when incorporated into resource acquisition processes in frameworks such as Liqo, used in the scope of the FLUIDOS project.
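The sketch below illustrates only the selective-disclosure idea, using Python's standard library: the issuer signs salted attribute commitments, the holder reveals a chosen subset, and the verifier checks the revealed pairs against the signed digest. The HMAC stands in for a real asymmetric signature anchored in a ledger, and the attribute names are invented; this is not the paper's DIM implementation.

```python
# Toy selective-disclosure sketch (stdlib only, not a full SSI/DIM stack). In a real
# deployment the HMAC stand-in would be an asymmetric signature (e.g. Ed25519).
import hashlib, hmac, json, secrets

ISSUER_KEY = secrets.token_bytes(32)              # stand-in for the issuer's signing key

def commit(attrs):
    salted = {k: (secrets.token_hex(16), v) for k, v in attrs.items()}
    digests = {k: hashlib.sha256(f"{s}|{v}".encode()).hexdigest() for k, (s, v) in salted.items()}
    payload = json.dumps(digests, sort_keys=True).encode()
    signature = hmac.new(ISSUER_KEY, payload, hashlib.sha256).hexdigest()
    return salted, {"digests": digests, "signature": signature}

def present(salted, disclose):
    return {k: salted[k] for k in disclose}       # holder reveals only selected (salt, value) pairs

def verify(credential, disclosed):
    payload = json.dumps(credential["digests"], sort_keys=True).encode()
    if not hmac.compare_digest(
            hmac.new(ISSUER_KEY, payload, hashlib.sha256).hexdigest(),
            credential["signature"]):
        return False
    return all(hashlib.sha256(f"{s}|{v}".encode()).hexdigest() == credential["digests"][k]
               for k, (s, v) in disclosed.items())

salted, credential = commit({"node_id": "edge-42", "domain": "fluidos-demo", "role": "provider"})
assert verify(credential, present(salted, ["role"]))      # disclose only the role attribute
```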
{"title":"Decentralised Identity Management solution for zero-trust multi-domain Computing Continuum frameworks","authors":"","doi":"10.1016/j.future.2024.08.003","DOIUrl":"10.1016/j.future.2024.08.003","url":null,"abstract":"<div><p>The adoption of the Computing Continuum is characterised by the seamless integration of diverse computing environments and devices. In this dynamic landscape, sharing resources across the continuum is becoming a reality and security must move an step forward, specially in terms of authentication and authorisation for such a distributed and heterogeneous environments. The need for robust identity management is paramount and, in this regard, Decentralised Identity Management (DIM) emerges as a promising solution. It leverages decentralised technologies to secure and facilitate identity interactions across the Computing Continuum. Particularly, to enhance security and privacy, it would be desirable to apply the principles of Self-Sovereign Identity (SSI). In this paradigm, users have full ownership and control of their digital identities that empowers individuals to manage and share their identity data on a need-to-know basis. These mechanisms could contribute to improve security properties during continuum resource management operations. In this context, this paper presents the design, workflows and implementation of a solution that provides authentication/authorisation features to distributed zero-trust based infrastructures across the continuum, enhancing security in resource sharing and resource acquisition stages. To this aim, the solution relies on key aspects like decentralisation, interoperability, trust management and privacy-enhancing capabilities. The decentralisation leverages distributed ledger technologies, such as blockchain, to establish a decentralised identity ecosystem. The solution prioritises interoperability, enabling nodes to seamlessly access and share their identities across different domains and environments. Trustworthiness is at the core of DIM, and privacy is also considered, incorporating privacy-preserving techniques that individuals to selectively disclose identity attributes while safeguarding sensitive information. The implementation includes different operations for allowing continuum frameworks to be enhanced with decentralised authentication and authorisation features. The performance has been evaluated measuring the impact for the adoption of the solution. The most expensive task, the self-identity generation, takes only a few seconds (in our deployment) and it is only executed once. 
Authorisation tasks operate in the millisecond range, which is a totally invaluable time if incorporated into resource acquisition processes in frameworks such as Liqo, used in the scope of FLUIDOS project.</p></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":null,"pages":null},"PeriodicalIF":6.2,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0167739X24004291/pdfft?md5=b118fab0128173d8752d4ab90e0703c8&pid=1-s2.0-S0167739X24004291-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141915032","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: 15+ years of joint parallel application performance analysis/tools training with Scalasca/Score-P and Paraver/Extrae toolsets
Pub Date: 2024-08-02  DOI: 10.1016/j.future.2024.07.050
The diverse landscape of distributed heterogeneous computer systems currently available, and being created to address computational challenges with the highest performance requirements, presents daunting complexity for application developers. They must effectively decompose and distribute their application functionality and data, efficiently orchestrating the associated communication and synchronisation, on multi-/many-core CPU processors with multiple attached accelerator devices, organised into compute nodes connected by interconnection networks of various topologies.
Sophisticated compilers, runtime systems and libraries are (loosely) matched with debugging, performance measurement and analysis tools: proprietary versions provided by integrators/vendors exclusively for their systems are complemented by portable, primarily open-source, equivalents developed and supported by the international research community over many years. The Scalasca and Paraver toolsets are two widely employed examples of the latter, installed on everything from personal notebook computers to the largest leadership HPC systems. Over more than fifteen years their developers have worked closely together in numerous collaborative projects, culminating in a universal parallel performance assessment and optimisation methodology focused on application execution efficiency and scalability, together with the associated training and coaching of application developers (often in teams) in its productive use; both are reviewed in this article along with the lessons learnt.
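The abstract does not spell out the methodology's metrics, but the efficiency factors popularised by this tools community (load balance, communication efficiency, and their product, parallel efficiency) can be computed from per-rank timings as in the sketch below; the timing values are invented.

```python
# Hedged illustration: POP-style efficiency factors computed from per-rank timings.
# The methodology reviewed in the article is broader; the formulas here follow the
# commonly published definitions (load balance = avg/max useful compute).
def efficiency_factors(useful, total):
    """useful[i]: computation time of rank i; total[i]: wall-clock time of rank i."""
    avg_useful = sum(useful) / len(useful)
    max_useful = max(useful)
    runtime = max(total)
    load_balance = avg_useful / max_useful
    comm_efficiency = max_useful / runtime
    return {"load_balance": load_balance,
            "communication_efficiency": comm_efficiency,
            "parallel_efficiency": load_balance * comm_efficiency}

# Example: 4 ranks, one straggler dominated by communication/wait time.
print(efficiency_factors(useful=[9.0, 9.2, 8.8, 6.5], total=[10.0, 10.1, 9.9, 10.2]))
```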
{"title":"15+ years of joint parallel application performance analysis/tools training with Scalasca/Score-P and Paraver/Extrae toolsets","authors":"","doi":"10.1016/j.future.2024.07.050","DOIUrl":"10.1016/j.future.2024.07.050","url":null,"abstract":"<div><p>The diverse landscape of distributed heterogeneous computer systems currently available and being created to address computational challenges with the highest performance requirements presents daunting complexity for application developers. They must effectively decompose and distribute their application functionality and data, efficiently orchestrating the associated communication and synchronisation, on multi/manycore CPU processors with multiple attached acceleration devices structured within compute nodes with interconnection networks of various topologies.</p><p>Sophisticated compilers, runtime systems and libraries are (loosely) matched with debugging, performance measurement and analysis tools, with proprietary versions by integrators/vendors provided exclusively for their systems complemented by portable (primarily) open-source equivalents developed and supported by the international research community over many years. The <em>Scalasca</em> and <em>Paraver</em> toolsets are two widely employed examples of the latter, installed on personal notebook computers through to the largest leadership HPC systems. Over more than fifteen years their developers have worked closely together in numerous collaborative projects culminating in the creation of a universal parallel performance assessment and optimisation methodology focused on application execution efficiency and scalability, and the associated training and coaching of application developers (often in teams) in its productive use, reviewed in this article with lessons learnt therefrom.</p></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":null,"pages":null},"PeriodicalIF":6.2,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0167739X24004187/pdfft?md5=6d95fa0157a348afe6a2f74eeb9b1f7c&pid=1-s2.0-S0167739X24004187-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141915232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: A cross-modal high-resolution image generation approach based on cloud-terminal collaboration for low-altitude intelligent network
Pub Date: 2024-08-02  DOI: 10.1016/j.future.2024.07.054
The advancement of digitization and automation in Low Altitude Intelligent Networking (LAIN) is constrained by limited computational resources and the absence of a dedicated modal transformation mechanism, affecting the performance of latency-sensitive missions. This study addresses these challenges by proposing a Downscaling Reconstruction Multi-scale Locally Focused Generative Adversarial Network (DR-MFGAN) with Federated Learning (FL). This integration employs wavelet-transform downscaling and zero-shot residual learning techniques to create noise-suppressed image pairs, ultimately facilitating high-quality image reconstruction. The core network is composed of multidimensional residual blocks and a generative adversarial network, and feature extraction is further enhanced through a cross-channel attention mechanism. Finally, distributed training based on Federated Learning ensures effective training for nodes with small data volumes. Experimental results demonstrate significant improvements: an 18.18% reduction in Mean Squared Error (MSE), a 33.52% increase in Peak Signal to Noise Ratio (PSNR), and a 39.54% improvement in Learned Perceptual Image Patch Similarity (LPIPS). The edge terminal can provide high-resolution imagery with limited data, achieving precise cross-modal transformations. This approach enhances LAIN capabilities, addressing computational and transformation challenges to support critical latency-sensitive missions.
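The wavelet-downscaling step alone can be illustrated as follows (a single level of Haar-style averaging in numpy on synthetic data); DR-MFGAN's residual blocks, attention, and adversarial training are not shown.

```python
# Sketch of the wavelet-downscaling step only: the approximation band gives a
# half-resolution, noise-attenuated image that can be paired with the original
# to train a reconstruction network.
import numpy as np

def haar_downscale(img):
    """img: 2-D array with even height/width; returns the approximation (LL-style) band."""
    rows = (img[0::2, :] + img[1::2, :]) / 2.0      # average adjacent rows
    return (rows[:, 0::2] + rows[:, 1::2]) / 2.0     # then adjacent columns

rng = np.random.default_rng(0)
clean = rng.uniform(0, 1, size=(256, 256))
noisy = clean + rng.normal(scale=0.05, size=clean.shape)
pair = (haar_downscale(noisy), clean)                # (low-res input, high-res target)
print(pair[0].shape, pair[1].shape)                  # (128, 128) (256, 256)
```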
{"title":"A cross-modal high-resolution image generation approach based on cloud-terminal collaboration for low-altitude intelligent network","authors":"","doi":"10.1016/j.future.2024.07.054","DOIUrl":"10.1016/j.future.2024.07.054","url":null,"abstract":"<div><p>The advancement of digitization and automation in Low Altitude Intelligent Networking (LAIN) is constrained by limited computational resources and the absence of a dedicated modal transformation mechanism, affecting the performance of latency-sensitive missions. This study addresses these challenges by proposing a Downscaling Reconstruction Multi-scale Locally Focused Generative Adversarial Network (DR-MFGAN) with Federated Learning (FL). This integration employs wavelet transform downscaling and zero-shot residual learning techniques to create noise-suppressed image pairs, ultimately facilitating high-quality image reconstruction. The core network structure is composed of multidimensional residual blocks and generative confrontation network, and feature extraction is further enhanced through cross channel attention mechanism. Finally, distributed training based on Federated Learning ensures the training effectiveness of nodes with small data volumes.Experimental results demonstrate significant improvements: an 18.18% reduction in Mean Squared Error (MSE), a 33.52% increase in Peak Signal to Noise Ratio (PSNR), and a 39.54% improvement in Learned Perceptual Image Patch Similarity (LPIPS). The edge terminal can provide high-resolution imagery with limited data, achieving precise cross-modal transformations. This approach enhances LAIN capabilities, addressing computational and transformation challenges to support critical latency-sensitive missions.</p></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":null,"pages":null},"PeriodicalIF":6.2,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141915070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Self-adaptive asynchronous federated optimizer with adversarial sharpness-aware minimization
Pub Date: 2024-07-31  DOI: 10.1016/j.future.2024.07.045
The past years have witnessed the success of a distributed learning system called Federated Learning (FL). Recently, asynchronous FL (AFL) has demonstrated its potential for concurrency compared to mainstream synchronous FL. However, the inherent systematic and statistical heterogeneity presents several impediments to AFL: on the client side, discrepancies in trips and local model drift impede global performance enhancement; on the server side, dynamic communication leads to significant fluctuations in gradient arrival time, while asynchronously arriving gradients of ambiguous value are not fully leveraged. In this paper, we propose an adaptive AFL framework, ARDAGH, which systematically addresses the aforementioned challenges. Firstly, to address the discrepancies in client trips, ARDAGH ensures their convergence by incorporating only 1-bit feedback information into the downlink. Secondly, to counter client drift, ARDAGH generalizes the local models by employing our novel adversarial sharpness-aware minimization, which does not rely on additional global variables. Thirdly, in the face of gradient latency issues, ARDAGH employs a communication-aware dropout strategy to adaptively compress gradients and ensure similar transmission times. Finally, to fully unleash the potential of each gradient, we establish a consistent optimal direction by conceptualizing the aggregation as an optimizer with successive momentum. Based on the comprehensive solution provided by ARDAGH, an algorithm named FedAMO is derived, and its superiority is confirmed by experimental results obtained under challenging prototype and simulation settings. In particular, on typical sentiment analysis tasks, FedAMO demonstrates an improvement of up to 5.351% with a 20.056-fold acceleration compared to conventional asynchronous methods.
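As a generic illustration of treating aggregation as a server-side optimizer with momentum over asynchronously arriving updates (this is not FedAMO; the staleness discount and hyperparameters are assumptions):

```python
# Generic sketch: a server-side momentum optimizer applied to each asynchronously
# arriving client update, down-weighted by its staleness.
import numpy as np

class MomentumAggregator:
    def __init__(self, dim, lr=1.0, beta=0.9):
        self.w = np.zeros(dim)       # global model
        self.v = np.zeros(dim)       # server momentum buffer
        self.version = 0
        self.lr, self.beta = lr, beta

    def apply(self, client_delta, client_version):
        staleness = self.version - client_version
        scale = 1.0 / (1.0 + staleness)              # simple staleness discount
        self.v = self.beta * self.v + scale * client_delta
        self.w += self.lr * self.v
        self.version += 1
        return self.w

agg = MomentumAggregator(dim=4)
# Updates arrive out of order: the second one was computed against an older model.
agg.apply(np.array([0.1, -0.2, 0.0, 0.3]), client_version=0)
agg.apply(np.array([0.05, 0.1, -0.1, 0.0]), client_version=0)   # staleness = 1
```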
{"title":"Self-adaptive asynchronous federated optimizer with adversarial sharpness-aware minimization","authors":"","doi":"10.1016/j.future.2024.07.045","DOIUrl":"10.1016/j.future.2024.07.045","url":null,"abstract":"<div><p>The past years have witnessed the success of a distributed learning system called Federated Learning (FL). Recently, asynchronous FL (AFL) has demonstrated its potential in concurrency compared to mainstream synchronous FL. However, the inherent systematic and statistical heterogeneity has presented several impediments to AFL: On the client side, the discrepancies in trips and local model drift impede global performance enhancement; On the server side, dynamic communication leads to significant fluctuations in gradient arrival time, while asynchronous arrival gradients with ambiguous value are not fully leveraged. In this paper, we propose an adaptive AFL framework, ARDAGH, which systematically addresses the aforementioned challenges: Firstly, to address the discrepancies in client trips, ARDAGH ensures their convergence by incorporating only 1-bit feedback information into the downlink. Secondly, to counter the drift of clients, ARDAGH generalizes the local models by employing our novel adversarial sharpness-aware minimization, which does not necessitate reliance on additional global variables. Thirdly, in the face of gradient latency issues, ARDAGH employs a communication-aware dropout strategy to adaptively compress gradients to ensure similar transmission times. Finally, to fully unleash the potential of each gradient, we establish a consistent optimal direction by conceptualizing the aggregation as an optimizer with successive momentum. In light of the comprehensive solution provided by ARDAGH, an algorithm named FedAMO is derived, and its superiority is confirmed by experimental results obtained under challenging prototype and simulation settings. Particularly in typical sentiment analysis tasks, FedAMO demonstrates an improvement of up to 5.351% with a 20.056-fold acceleration compared to conventional asynchronous methods.</p></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":null,"pages":null},"PeriodicalIF":6.2,"publicationDate":"2024-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141915134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: A study on characterizing energy, latency and security for Intrusion Detection Systems on heterogeneous embedded platforms
Pub Date: 2024-07-31  DOI: 10.1016/j.future.2024.07.051
Drone swarms are increasingly being used for critical missions and need to be protected against malicious users. Intrusion Detection Systems (IDS) are used to analyze network traffic in order to detect possible threats. Modern IDSs rely on machine learning models for this purpose. Optimizing the execution of resource-hungry IDS algorithms on resource-constrained drone devices, in terms of energy consumption, response time, memory footprint and guaranteed level of security, makes it possible to extend the duration of missions. In addition, the embedded platforms used in drones often incorporate heterogeneous computing platforms on which IDSs could be executed. In this paper, we present a methodology and results for characterizing the execution of different IDS models on various processing elements, namely Central Processing Units (CPU), Graphical Processing Units (GPU), Deep Learning Accelerators (DLA) and Field-Programmable Gate Arrays (FPGA). In effect, drones operate in different mission contexts in terms of criticality level, energy and memory budgets, and traffic load, so it is important to identify which IDS model to run on which processing element in a given context. To this end, we evaluated several metrics on different platforms: energy and resource consumption, accuracy of malicious traffic detection, and response time. Different models, namely Random Forests (RF), Convolutional Neural Networks (CNN) and Dense Neural Networks (DNN), have been implemented and characterized on different processing elements/platforms. This study has shown that matching the chosen implementation to the resources available on the drone is a judicious strategy. It highlights the disparity between the characteristics of IDS implementations. For example, the inference time ranges from 1.27 μs to 30 ms, the energy consumption per inference is between 10.7 μJ and 70 mJ, and the accuracy of the IDS models is between 65.73% and 81.59%. In addition, we develop a set of guidelines for choosing the best IDS model given a mission context.
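In the spirit of such guidelines, a selection step might look like the following sketch; the candidate figures are placeholders, not the measured values reported in the paper.

```python
# Hypothetical selection helper: among measured (model, platform) candidates, pick the
# most accurate one that fits the mission's latency and per-inference energy budgets.
from dataclasses import dataclass

@dataclass
class Candidate:
    model: str
    platform: str
    latency_ms: float
    energy_mj: float      # energy per inference
    accuracy: float

def select_ids(candidates, max_latency_ms, max_energy_mj):
    feasible = [c for c in candidates
                if c.latency_ms <= max_latency_ms and c.energy_mj <= max_energy_mj]
    return max(feasible, key=lambda c: c.accuracy) if feasible else None

candidates = [
    Candidate("RF",  "CPU",  0.5,  0.2, 0.72),    # placeholder numbers
    Candidate("CNN", "GPU", 12.0, 45.0, 0.81),
    Candidate("DNN", "FPGA", 2.0,  5.0, 0.78),
]
print(select_ids(candidates, max_latency_ms=5.0, max_energy_mj=10.0))   # -> DNN on FPGA
```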
{"title":"A study on characterizing energy, latency and security for Intrusion Detection Systems on heterogeneous embedded platforms","authors":"","doi":"10.1016/j.future.2024.07.051","DOIUrl":"10.1016/j.future.2024.07.051","url":null,"abstract":"<div><p>Drone swarms are increasingly being used for critical missions and need to be protected against malicious users. Intrusion Detection Systems (IDS) are used to analyze network traffic in order to detect possible threats. Modern IDSs rely on machine learning models for this purpose. Optimizing the execution of resource-hungry IDS algorithms on resource-constrained drone devices, in terms of energy consumption, response time, memory footprint and guaranteed level of security, allows to extend the duration of missions. In addition, the embedded platforms used in drones often incorporate heterogeneous computing platforms on which IDSs could be executed. In this paper, we present a methodology and results about characterizing the execution of different IDS models on various processing elements, namely, Central Processing Units (CPU), Graphical Processing Units (GPU), Deep Learning Accelerators (DLA) and Field-Programmable Gate Array (FPGA). In effect, drones operate in different mission contexts in terms of criticality level, energy and memory budgets, and traffic load, so it is important to identify which IDS model to run on which processing element in a given context. For this sake, we evaluated several metrics on different platforms: energy and resource consumption, accuracy for malicious traffic detection and response time. Different models, namely Random Forests (RF), Convolutional Neural Networks (CNN) and Dense Neural Networks (DNN), have been implemented and characterized on different processing elements/platforms. This study has shown that relating the chosen implementation to the resources available on the drone is a judicious strategy to work on. It highlights the disparity between IDS implementations characteristics. For example, the inference time ranges from <span><math><mrow><mn>1</mn><mo>.</mo><mn>27</mn><mspace></mspace><mi>μ</mi><mi>s</mi></mrow></math></span> to 30 ms, the energy consumption per inference is between <span><math><mrow><mn>10</mn><mo>.</mo><mn>7</mn><mspace></mspace><mi>μ</mi><mi>J</mi></mrow></math></span> and 70 mJ, and the accuracy of the IDS models is between 65.73% and 81.59%. In addition, we develop a set of guidelines for choosing the best IDS model given a mission context.</p></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":null,"pages":null},"PeriodicalIF":6.2,"publicationDate":"2024-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141979544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Harnessing federated learning for anomaly detection in supercomputer nodes
Pub Date: 2024-07-31  DOI: 10.1016/j.future.2024.07.052
High-performance computing (HPC) systems are a crucial component of modern society, with a significant impact in areas ranging from economics to scientific research, thanks to their unrivaled computational capabilities. For this reason, the worldwide installed base of HPC systems is trending steeply upwards, with no sign of slowing down. However, these machines are complex (comprising millions of heterogeneous components), hard to manage effectively, and very costly, both in economic investment and in energy consumption. Therefore, maximizing their productivity is of paramount importance. For instance, anomalies and faults can generate significant downtime because they are difficult to detect promptly, as there are potentially many sources of issues preventing the correct functioning of computing nodes.
In recent years, several data-driven methods have been proposed to automatically detect anomalies in HPC systems, exploiting the fact that modern supercomputers are typically endowed with fine-grained monitoring infrastructures that collect data which can be used to characterize system behavior. Thus, it is possible to teach Machine Learning (ML) models to distinguish normal and anomalous states automatically. In this paper, we contribute to this line of research with a novel intuition, namely exploiting Federated Learning (FL) to improve the accuracy of anomaly detection models for HPC nodes. Although FL is not typically exploited in the HPC context, we show that FL can boost several types of underlying ML models, from supervised to unsupervised ones. We demonstrate our approach on a production Tier-0 supercomputer hosted in Italy. Applying FL to anomaly detection improves the average f-score from 0.46 to 0.87. Our research also shows that FL can reduce the data collection time required to develop a representative data set, facilitating faster deployment of anomaly detection models. ML models need 5 months of training data for effective anomaly detection, whereas using FL reduces the required training set 15-fold, to 1.25 weeks.
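A minimal sketch of the overall FL pattern follows (assuming per-node labelled telemetry, a plain logistic-regression detector, and FedAvg aggregation; none of this reproduces the paper's models or data): each node trains locally on its own monitoring data and only model weights are exchanged.

```python
# Minimal FedAvg sketch for per-node anomaly classifiers (synthetic data throughout).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def local_train(w, X, y, lr=0.5, epochs=20):
    w = w.copy()
    for _ in range(epochs):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)   # logistic-regression gradient
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
d, nodes = 6, []
w_true = rng.normal(size=d)
for _ in range(5):                                    # five compute nodes
    X = rng.normal(size=(200, d))
    y = (sigmoid(X @ w_true) > 0.5).astype(float)     # synthetic anomaly labels
    nodes.append((X, y))

w_global = np.zeros(d)
for _ in range(20):                                   # federated rounds
    w_global = np.mean([local_train(w_global, X, y) for X, y in nodes], axis=0)

X_test, y_test = nodes[0]
acc = np.mean((sigmoid(X_test @ w_global) > 0.5) == y_test)
print(f"node-0 accuracy after FL: {acc:.2f}")
```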
{"title":"Harnessing federated learning for anomaly detection in supercomputer nodes","authors":"","doi":"10.1016/j.future.2024.07.052","DOIUrl":"10.1016/j.future.2024.07.052","url":null,"abstract":"<div><p>High-performance computing (HPC) systems are a crucial component of modern society, with a significant impact in areas ranging from economics to scientific research, thanks to their unrivaled computational capabilities. For this reason, the worldwide HPC installation is steeply trending upwards, with no sign of slowing down. However, these machines are both complex, comprising millions of heterogeneous components, hard to effectively manage, and very costly (both in terms of economic investment and of energy consumption). Therefore, maximizing their productivity is of paramount importance. For instance, anomalies and faults can generate significant downtime due to the difficulty of promptly detecting them, as there are potentially many sources of issues preventing the correct functioning of computing nodes.</p><p>In recent years, several data-driven methods have been proposed to automatically detect anomalies in HPC systems, exploiting the fact that modern supercomputers are typically endowed with fine-grained monitoring infrastructures, collecting data that can be used to characterize the system behavior. Thus, it is possible to teach Machine Learning (ML) models to distinguish normal and anomalous states automatically. In this paper, we contribute to this line of research with a novel intuition, namely exploiting Federated Learning (FL) to improve the accuracy of anomaly detection models for HPC nodes. Although FL is not typically exploited in the HPC context, we show that FL can boost several types of underlying ML models, from supervised to unsupervised ones. We demonstrate our approach on a production Tier-0 supercomputer hosted in Italy. Applying FL to anomaly detection improves the average f-score from 0.46 to 0.87. Our research also shows FL can reduce the data collection time required to develop a representation data set, facilitating faster deployment of anomaly detection models. ML models need 5 months of training data for efficient anomaly detection performance while using FL reduces the training set by 15 times to 1.25 weeks.</p></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":null,"pages":null},"PeriodicalIF":6.2,"publicationDate":"2024-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141915220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}