Muhammad Shalahuddin, W. Sunindyo, Mohammad Ridwan Effendi, K. Surendro
Modelers often create diverse system dynamics models for the same issue, depending on their viewpoints, which can reduce stakeholder confidence. Validating system dynamics models can restore that confidence. This study proposes fuzzy‐set qualitative comparative analysis (fsQCA), a technique based on a set‐theoretic approach, to validate the causal connections between entities in causal loop diagram (CLD) models. The case study analyzed the issue of Indonesian mobile network operators with limited sample data, using fsQCA to test the causal connections between entities in the CLD model that require validation. After the CLD model was created through the system dynamics methodology, fsQCA was employed to refine it. fsQCA combines qualitative comparative analysis (QCA) with fuzzy set theory, permitting partial set membership, and can identify causal links among entities in the CLD model. It supports testing causal relationships with limited sample data and boosts stakeholder confidence in the CLD model.
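The subset-consistency test at the heart of fsQCA can be sketched in a few lines. The formulas below are the standard consistency and coverage measures from the QCA literature; the fuzzy membership scores and the condition/outcome names are illustrative assumptions, not data from the study:

```python
def consistency(x, y):
    """Consistency of 'X is sufficient for Y': sum(min(x, y)) / sum(x)."""
    return sum(min(xi, yi) for xi, yi in zip(x, y)) / sum(x)

def coverage(x, y):
    """Coverage: the share of the outcome Y accounted for by X."""
    return sum(min(xi, yi) for xi, yi in zip(x, y)) / sum(y)

# Hypothetical fuzzy memberships for five cases (e.g., mobile operators)
x = [0.8, 0.6, 0.9, 0.3, 0.7]  # condition, e.g. "aggressive network investment"
y = [0.9, 0.7, 0.8, 0.4, 0.9]  # outcome, e.g. "subscriber growth"

print(round(consistency(x, y), 3))  # → 0.97
print(round(coverage(x, y), 3))     # → 0.865
```

A consistency close to 1 indicates the condition behaves as a near-subset of the outcome, which is the evidence fsQCA uses to retain a causal link in the CLD.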
"Fuzzy‐set qualitative comparative analysis (fsQCA) for validating causal relationships in system dynamics models." Engineering Reports, 2024-02-04. DOI: 10.1002/eng2.12855.
To address the problem that the whale optimization algorithm tends to fall into local optima and fails to balance exploration and exploitation, an elitist whale optimization algorithm with a nonlinear parameter (EWOANP) is proposed in this paper. An elitist strategy based on random Cauchy mutation is used in the shrinking encircling mechanism to increase the chance of escaping local optima: mutant solutions are generated by random Cauchy mutation, after which the better individuals are selected to proceed to the next iteration. A nonlinear parameter is then used in the logarithmic spiral mechanism to balance exploration and exploitation. Numerical optimization experiments on the IEEE CEC2020 benchmark suite compare EWOANP with eleven other algorithms, and the results show that EWOANP outperforms most competitors. Finally, a backpropagation neural network is optimized by EWOANP to build a prediction model for the sulfur content in molten iron. Experimental results on production data indicate that the proposed model has relatively small fluctuation in errors and, compared with the other seven competitors, better prediction performance (reported value: 0.916619).
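The elitist Cauchy-mutation step described above can be sketched as follows. The sphere objective, mutation scale, and population size are illustrative assumptions (and the nonlinear spiral parameter is omitted), not the paper's exact formulation:

```python
import math
import random

def sphere(x):
    """Toy objective: minimize the sum of squares."""
    return sum(v * v for v in x)

def cauchy_mutation(x, scale=1.0):
    # Standard Cauchy sample via inverse CDF: tan(pi * (u - 0.5))
    return [v + scale * math.tan(math.pi * (random.random() - 0.5)) for v in x]

def elitist_step(pop, f):
    """Mutate every individual; keep whichever of (parent, mutant) is better."""
    out = []
    for x in pop:
        m = cauchy_mutation(x)
        out.append(m if f(m) < f(x) else x)
    return out

random.seed(42)
pop = [[random.uniform(-5, 5) for _ in range(3)] for _ in range(10)]
start = min(map(sphere, pop))
for _ in range(100):
    pop = elitist_step(pop, sphere)
end = min(map(sphere, pop))
print(end <= start)  # elitism guarantees the best fitness never worsens
```

The heavy tails of the Cauchy distribution occasionally produce large jumps, which is what gives the population a chance to escape a local optimum, while the elitist selection ensures no iteration ever degrades a solution.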
Yajing Zhang, Guoxu Zhang. "An elitist whale optimization algorithm with the nonlinear parameter: Algorithm and application." Engineering Reports, 2024-02-04. DOI: 10.1002/eng2.12857.
F. Karaca, Mert Guney, A. Agibayeva, Nurlan Otesh, M. Kulimbet, Natalya Glushkova, Yuefang Chang, Akira Sekikawa, K. Davletov
The present study introduces a concentration estimation model for indoor fine particulate matter (PM2.5) during cooking activities in typical Kazakh houses, which are generally poorly ventilated and have high emission levels. The aim is to identify factors influencing PM2.5 concentrations during cooking and to elucidate the mechanisms underlying the build‐up and decay of PM2.5 concentrations. The methodology combines PM2.5 sampling, monitoring, and modeling to predict household PM2.5 levels and estimate daily concentrations. Specifically, USEPA's IAQX v1.1 was employed to simulate a one‐zone concept (the kitchen) for concentrations related to cooking activities in several households. The results reveal that PM2.5 concentrations varied between 13 and 266 μg/m3 during cooking activities. Kitchen size, air exchange characteristics, and the type of food and cooking style were identified as important factors influencing the observed concentrations. The model accurately captured concentration trends (R > 0.9), although certain predictions overestimated the measurements, attributable to inaccuracies in the selected air exchange and emission rates. Cooking activities contributed between 9% and 94% of household air pollutant (HAP) PM2.5 levels. Notably, during the non‐heating period of the year (the warmer half), cooking became a major contributor to indoor PM2.5 concentrations, whereas during the heating period (the colder half), outdoor PM levels and household ventilation practices played the primary roles in regulating indoor concentrations. This study presents one of the first efforts to assess household air pollutants in Central Asia, providing a foundation and insights into the indoor air quality of Kazakh houses, where understanding remains limited. Future research should develop advanced models that account for individual activity patterns and specific house types for improved accuracy and representativeness.
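The one-zone kitchen simulation can be sketched as a standard single-zone mass balance, dC/dt = E/V + a·P·C_out − (a + k)·C, the kind of model that tools such as IAQX implement. All parameter values below are illustrative assumptions, not the study's inputs:

```python
def simulate(emission_ug_per_h, volume_m3, ach_per_h, deposition_per_h,
             c_out=10.0, penetration=0.8, c0=10.0, hours=2.0, dt=1 / 60):
    """Forward-Euler integration of dC/dt = E/V + a*P*Cout - (a + k)*C.

    E: emission rate (ug/h), V: room volume (m^3), a: air changes per hour,
    P: particle penetration factor, k: deposition rate (1/h).
    Returns the concentration time series in ug/m^3.
    """
    c, t, series = c0, 0.0, []
    while t < hours:
        dcdt = (emission_ug_per_h / volume_m3
                + ach_per_h * penetration * c_out
                - (ach_per_h + deposition_per_h) * c)
        c += dcdt * dt
        t += dt
        series.append(c)
    return series

# 30 min of cooking in a 30 m^3 kitchen, then 90 min of decay
cooking = simulate(6000.0, 30.0, 0.5, 0.2, hours=0.5)
decay = simulate(0.0, 30.0, 0.5, 0.2, c0=cooking[-1], hours=1.5)
print(round(cooking[-1], 1), round(decay[-1], 1))
```

The same structure shows why kitchen size and air exchange dominate the observed range: a smaller V or lower a raises the steady-state concentration (E/V + a·P·C_out) / (a + k) directly.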
"Indoor air quality in Kazakh households: Evaluating PM2.5 levels generated by cooking activities." Engineering Reports, 2024-02-02. DOI: 10.1002/eng2.12845.
This work proposes the use of an unmanned aerial vehicle (UAV) with an autopilot to identify defects in municipal sewerage pipes. The framework includes an effective autopilot control mechanism that directs the flight path of the UAV within a sewer line. The UAV's camera provides important contextual data during a sewage inspection, helping to analyze the sewerage line's internal condition; when a defect is present, camera‐recorded imagery yields rich interior visual detail. In sewerage inspection, however, the impact of a false negative is significantly higher than that of a false positive, and identifying defective pipelines without producing false negatives is one of the most difficult parts of the procedure. To reduce both false negatives and false positives, a guided image filter (GIF) is applied during the pre‐processing stage. The Gabor transform (GT) and stroke width transform (SWT) are then used to extract features from the UAV‐captured imagery, and a weighted naive Bayes classifier (WNBC) uses those features to label each image as “defective” or “not defective.” Next, images of defective sewerage lines are analyzed using speeded‐up robust features (SURF) and deep learning to identify the type of defect. The proposed methodology achieved more favorable outcomes than prior approaches on the following metrics: mean PSNR (71.854), mean MSE (0.0618), mean RMSE (0.2485), mean SSIM (98.71%), mean accuracy (98.372), mean specificity (97.837%), mean precision (93.296%), mean recall (94.255%), mean F1‐score (93.773%), and mean processing time (35.43 min).
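The final classification stage can be sketched as a weighted Gaussian naive Bayes classifier, where each feature's log-likelihood contribution is scaled by a weight. The per-feature weights and the toy feature vectors below are illustrative assumptions, not the paper's actual GT/SWT features or training data:

```python
import math

def fit(samples, labels):
    """Per-class prior, feature means, and variances for Gaussian NB."""
    model = {}
    for c in set(labels):
        rows = [s for s, l in zip(samples, labels) if l == c]
        prior = len(rows) / len(samples)
        means, variances = [], []
        for col in zip(*rows):
            m = sum(col) / len(col)
            means.append(m)
            variances.append(max(sum((v - m) ** 2 for v in col) / len(col), 1e-6))
        model[c] = (prior, means, variances)
    return model

def predict(model, x, weights):
    """Weighted log-likelihood: each feature's term is scaled by its weight."""
    def score(c):
        prior, means, variances = model[c]
        return math.log(prior) + sum(
            w * (-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v))
            for xi, m, v, w in zip(x, means, variances, weights))
    return max(model, key=score)

# Toy features per image, e.g. [Gabor energy, stroke-width variance]
samples = [[0.9, 0.8], [0.8, 0.9], [0.85, 0.7],
           [0.2, 0.1], [0.1, 0.2], [0.15, 0.25]]
labels = ["defective"] * 3 + ["not defective"] * 3
weights = [1.0, 0.5]  # hypothetical: trust the first feature more

model = fit(samples, labels)
print(predict(model, [0.88, 0.75], weights))  # → defective
```

Weighting lets the classifier lean on the more discriminative feature, which matters when false negatives are costlier than false positives.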
B. Pandey, Digvijay Pandey, S. K. Sahani. "Autopilot control unmanned aerial vehicle system for sewage defect detection using deep learning." Engineering Reports, 2024-01-29. DOI: 10.1002/eng2.12852.
In recent years, penetration testing (pen‐testing) has emerged as a crucial process for evaluating the security of network infrastructure by simulating real‐world cyber‐attacks. Automating pen‐testing through reinforcement learning (RL) enables more frequent assessments, minimizes human effort, and enhances scalability. However, real‐world pen‐testing tasks often involve incomplete knowledge of the target network system, and managing the resulting uncertainty via partially observable Markov decision processes (POMDPs) remains a persistent challenge. Moreover, RL agents must formulate intricate strategies to cope with partially observable environments, which increases computational cost and training time. To address these issues, this study introduces EPPTA (efficient POMDP‐driven penetration testing agent), an agent built on an asynchronous RL framework for pen‐testing in partially observable environments. EPPTA incorporates an implicit belief module, grounded in the belief update formula of the traditional POMDP model, which represents the agent's probabilistic estimate of the current environment state. Furthermore, by integrating the algorithm with Sample Factory, a high‐performance RL framework, EPPTA reduces convergence time roughly 20‐fold compared with existing pen‐testing methods. Empirical results across various pen‐testing scenarios validate EPPTA's superior task reward and scalability, supporting efficient and advanced evaluation of network infrastructure security.
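The belief update that such a module is grounded on can be sketched for a discrete POMDP: b'(s') ∝ O(o | s', a) · Σ_s T(s' | s, a) · b(s). The formula is standard; the tiny two-state "host compromised?" model below is an illustrative assumption, not EPPTA's actual state space:

```python
def belief_update(b, T, O, a, o):
    """b: dict state->prob; T[a][s][s']: transition; O[a][s'][o]: observation.

    Returns the normalized posterior belief after taking action a and
    observing o.
    """
    states = list(b)
    unnorm = {
        s2: O[a][s2][o] * sum(T[a][s][s2] * b[s] for s in states)
        for s2 in states
    }
    z = sum(unnorm.values())
    return {s2: p / z for s2, p in unnorm.items()}

# States: target host is 'safe' or 'vuln'; one action 'scan';
# observations 'alert' / 'quiet' (all probabilities are toy values)
T = {'scan': {'safe': {'safe': 0.9, 'vuln': 0.1},
              'vuln': {'safe': 0.0, 'vuln': 1.0}}}
O = {'scan': {'safe': {'alert': 0.2, 'quiet': 0.8},
              'vuln': {'alert': 0.7, 'quiet': 0.3}}}

b = belief_update({'safe': 0.5, 'vuln': 0.5}, T, O, 'scan', 'alert')
print(round(b['vuln'], 3))  # → 0.811: belief in 'vuln' rises after an alert
```

An asynchronous implementation amortizes this update inside the agent's network rather than computing it explicitly, which is what "implicit belief module" suggests.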
Zegang Li, Qian Zhang, Guangwen Yang. "EPPTA: Efficient partially observable reinforcement learning agent for penetration testing applications." Engineering Reports, 2023-12-15. DOI: 10.1002/eng2.12818.
Global climate change is an important issue that all of humanity needs to address together. Precipitation is a key climatic variable for agricultural development and food security, and the study of precipitation and its associated climatic factors is important for the analysis of global change. Northeast China, an important part of China's food production, has a temperate monsoon climate in which rain and heat arrive together, favoring crop growth. This paper designs a scientific workflow for climate data analysis with a learning‐based method. Using climate data from typical CMIP6 models, a machine learning approach establishes regression relationships between precipitation and climate variables such as temperature, humidity, and wind speed in Northeast China, validated through a time series approach. A weight‐based model ensemble method and a learning‐based bias correction method are designed so that the ensemble model achieves better performance. Precipitation trends in Northeast China are also analyzed under three Shared Socio‐economic Pathways (SSPs), helping researchers analyze the long‐term evolution of climate and its driving factors.
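A weight-based model ensemble of the kind described can be sketched by weighting each member inversely to its historical error and taking the weighted mean of the members' forecasts. The member errors and forecast values below are illustrative assumptions, not CMIP6 outputs:

```python
def ensemble_weights(errors):
    """Weights inversely proportional to each member's historical error."""
    inv = [1.0 / e for e in errors]
    total = sum(inv)
    return [w / total for w in inv]

def ensemble_predict(predictions, weights):
    """Weighted mean of per-model forecasts for the same time step."""
    return sum(p * w for p, w in zip(predictions, weights))

# Toy RMSEs of three CMIP6-style members against observations (mm/day)
errors = [1.2, 0.8, 2.0]
w = ensemble_weights(errors)
print([round(x, 3) for x in w])  # → [0.323, 0.484, 0.194]
print(ensemble_predict([3.1, 2.9, 3.6], w))
```

The lowest-error member dominates the combination, which is the basic rationale for skill-weighted ensembles over a plain multi-model mean.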
Jiaxu Guo, Yidan Xu, Liang Hu, Xianwei Wu, Gaochao Xu, Xilong Che. "A learning‐based approach to regression analysis for climate data–A case of Northeast China." Engineering Reports, 2023-12-11. DOI: 10.1002/eng2.12797.
High‐performance computing plays an increasingly fundamental role in advancing scientific research and engineering. However, the ever‐expanding scale of scientific simulations poses challenges for efficient data I/O and storage. Data compression has garnered significant attention as a way to reduce data transmission and storage costs while enhancing performance. In particular, the BZIP2 lossless compression algorithm is widely used for its excellent compression ratio, moderate compression speed, high reliability, and open‐source nature. This paper presents the design and realization of a parallelized BZIP2 algorithm tailored to the New‐Generation Sunway supercomputing platform. Leveraging the distinctive cache behavior of the New‐Generation Sunway processor, we propose highly tuned multi‐threading and multi‐node implementations of BZIP2 for different scenarios, along with efficient BZIP2 libraries based on the management processing element and computing processing elements that support the commonly used high‐level (de)compression interfaces. Test results indicate that our multi‐threading implementation achieves a maximum speedup of 23.09 (8.57) in decompression (compression) over the sequential implementation. Furthermore, the multi‐node implementation achieves 50.81% (26.35%) parallel efficiency and peak performance of 16.6 GB/s (52.8 GB/s) for compression (decompression) when scaling up to 2048 processes.
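The block-parallel idea behind multi-threaded BZIP2 can be sketched in Python: because the bzip2 format allows concatenated streams, independent chunks can be compressed in parallel and simply joined, as tools like pbzip2 do. This illustrates only the principle, not the Sunway implementation:

```python
import bz2
from concurrent.futures import ThreadPoolExecutor

CHUNK = 900_000  # bzip2's classic maximum block size

def compress_parallel(data: bytes, workers: int = 4) -> bytes:
    # Split into independent chunks and compress each in its own thread.
    # bz2.compress releases the GIL in CPython, so threads overlap, and
    # the bzip2 format permits concatenating the per-chunk streams.
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(bz2.compress, chunks))

payload = b"engineering reports " * 100_000  # ~2 MB of compressible text
packed = compress_parallel(payload)
# bz2.decompress handles concatenated streams, so the round trip works
assert bz2.decompress(packed) == payload
print(f"{len(payload)} -> {len(packed)} bytes")
```

Chunking at block granularity costs a little compression ratio (no cross-chunk matches) in exchange for near-linear parallel speedup, the same trade-off any multi-threaded BZIP2 must manage.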
Xiaohui Liu, Zekun Yin, Haodong Tian, Wubing Wan, Mengyuan Hua, Wenlai Zhao, Zhenchun Huang, Ping Gao, Fangjin Zhu, Hua Wang, Xiaohui Duan. "Refactoring BZIP2 on the new‐generation sunway supercomputer." Engineering Reports, 2023-11-03. DOI: 10.1002/eng2.12806.
Zhaoqi Sun, Zhen Wang, Mengyuan Hua, Puyu Xiong, Wubing Wan, Ping Gao, Wenlai Zhao, Zhenchun Huang, Lin Han
With the increasing popularity of high‐resolution displays, there is a growing demand for more realistically rendered images. Ray tracing has become the most effective algorithm for image rendering, but its complexity and large data volume require sophisticated HPC solutions. In this article, we present our efforts to port the ray tracing engine CYCLES of Blender to the new generation of Sunway supercomputers. We propose optimizations tailored to the new hardware architecture, including a multi‐level parallel scheme that efficiently maps and scales Blender onto the novel Sunway architecture, strategies to address memory bottlenecks, a revised task dispatching method that achieves excellent load balancing, and a pipeline approach that maximizes computation–communication overlap. Combining all these optimizations reduces the rendering time for a single‐frame image from 2260 s with the single‐core serial version to 71 s with 48 processes, a speedup of about 128×.
"Accelerating ray tracing engine of BLENDER on the new Sunway architecture." Engineering Reports, 2023-10-31. DOI: 10.1002/eng2.12789.
Yue Shu, Xuhong Qiang, Xu Jiang, Yi Xiao, Hao Dong
Compared with traditional techniques, bonding technology is better suited to civil structure reinforcement because of its cost‐efficiency and superior mechanical properties. However, research on the long‐term performance of single‐lap joints (SLJs) requires better organization and comprehension. This article investigates the long‐term performance and optimization design of SLJs. The main factors influencing long‐term performance at both the material and component levels are discussed, and the moisture diffusion mechanisms of bulk adhesives and the degradation mechanisms of SLJs are explored. The review of optimization design focuses on evaluating overlap length, adhesive layer thickness, and variation of the adhesive along the overlap length, based on the available literature. It is found that the applicability of diffusion models should be validated, and that model selection should consider the working environment and the type of adhesive. Exploring failure mechanisms and design criteria for mixed SLJs in hygrothermal environments, with or without sustained or alternating loads, is significant for optimization design. This article identifies the limitations of available studies on the shear strength and long‐term performance of SLJs and provides insights into the challenges and prospects of their optimization design.
{"title":"Long‐term performance of single‐lap joints: Review, challenges and prospects in civil engineering","authors":"Yue Shu, Xuhong Qiang, Xu Jiang, Yi Xiao, Hao Dong","doi":"10.1002/eng2.12769","DOIUrl":"https://doi.org/10.1002/eng2.12769","url":null,"abstract":"Abstract Compared with traditional technology, bonding technology is more suitable for civil structure reinforcement because of its cost‐efficiency and superior mechanical properties. However, research on the long‐term performance of single‐lap joints (SLJs) requires better organization and comprehension. This article aims to investigate the long‐term performance and optimization design of SLJs. The main factors influencing the long‐term performance of SLJs from both material and component levels are discussed. The moisture diffusion mechanisms of bulk adhesives and the degradation mechanisms of SLJs are explored. Moreover, the optimization design of SLJs focuses on evaluating the overlap length, adhesive layer thicknesses, and changes in adhesives along the overlap length based on available literature. It is found that the applicability of diffusion models should be validated, and the selection of the models should consider working environments and types of adhesives. Exploring failure mechanisms and design criteria for the mixed SLJs in hygrothermal environments with/without sustained or alternating load is significant for the optimization design. This article indicates the limitations on the shear strength and long‐term performance of SLJs in available studies and provides insights into the challenges and prospects of their optimization design.","PeriodicalId":11735,"journal":{"name":"Engineering Reports","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135011417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
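The moisture diffusion models whose applicability the review says should be validated are, in the simplest case, Fickian. As an orientation aid only (this sketch is not taken from the article; the function name and parameter values are illustrative), the classical series solution of Fick's second law for fractional moisture uptake of a plate of thickness h can be written as:

```python
import math

def fickian_uptake(D, h, t, terms=200):
    """Fractional moisture uptake M_t / M_inf of a plate of thickness h
    exposed on both faces, for 1-D Fickian diffusion with diffusivity D
    at time t. Classical series solution:
        M_t/M_inf = 1 - (8/pi^2) * sum_n exp(-(2n+1)^2 pi^2 D t / h^2) / (2n+1)^2
    """
    if t <= 0:
        return 0.0
    s = 0.0
    for n in range(terms):
        k = 2 * n + 1
        s += math.exp(-k * k * math.pi ** 2 * D * t / h ** 2) / (k * k)
    return 1.0 - 8.0 / math.pi ** 2 * s

# Illustrative values only: D ~ 1e-13 m^2/s, adhesive layer h = 1 mm.
# Uptake rises monotonically from 0 toward full saturation (1.0).
```

Real validation, as the review stresses, means checking whether measured uptake curves actually follow this shape for the given adhesive and environment, and switching to dual-stage or anomalous-diffusion models when they do not.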
Abstract Molecular dynamics simulation is a common method for understanding the microscopic world. Traditional general‐purpose high‐performance computing platforms suffer from low computational and power efficiency, which constrains the practical application of large‐scale, long‐duration many‐body molecular dynamics simulations. To address these problems, a novel molecular dynamics accelerator for the Tersoff potential is designed on field‐programmable gate array (FPGA) platforms, enabling the acceleration of LAMMPS using FPGAs. First, an on‐the‐fly method is proposed to build neighbor lists and reduce storage usage. In addition, multilevel parallelization is implemented so that the accelerator can be flexibly deployed on FPGAs of different scales while achieving good performance. Finally, mathematical models of the accelerator are built, and a method for using these models to determine the optimal‐performance parameters is proposed. Experimental results show that, when tested on the Xilinx Alveo U200, the proposed accelerator achieves a performance of 9.51 ns/day for the Tersoff simulation of a 55,296‐atom system, a 2.00× increase in performance compared to the Intel i7‐8700K and 1.70× compared to the NVIDIA Tesla K40c under the same test case. In addition, in terms of computational efficiency and power efficiency, the proposed accelerator achieves improvements of 2.00× and 7.19× over the Intel i7‐8700K, and 4.33× and 2.11× over the NVIDIA Titan Xp, respectively.
{"title":"Field‐programmable gate array acceleration of the Tersoff potential in LAMMPS","authors":"Quan Deng, Qiang Liu","doi":"10.1002/eng2.12694","DOIUrl":"https://doi.org/10.1002/eng2.12694","url":null,"abstract":"Abstract Molecular dynamics simulation is a common method to help humans understand the microscopic world. The traditional general‐purpose high‐performance computing platforms are hindered by low computational and power efficiency, constraining the practical application of large‐scale and long‐time many‐body molecular dynamics simulations. In order to address these problems, a novel molecular dynamics accelerator for the Tersoff potential is designed based on field‐programmable gate array (FPGA) platforms, which enables the acceleration of LAMMPS using FPGAs. Firstly, an on‐the‐fly method is proposed to build neighbor lists and reduce storage usage. Besides, multilevel parallelizations are implemented to enable the accelerator to be flexibly deployed on FPGAs of different scales and achieve good performance. Finally, mathematical models of the accelerator are built, and a method for using the models to determine the optimal‐performance parameters is proposed. Experimental results show that, when tested on the Xilinx Alveo U200, the proposed accelerator achieves a performance of 9.51 ns/day for the Tersoff simulation in a 55,296‐atom system, which is a 2.00× increase in performance when compared to Intel I7‐8700K and 1.70× to NVIDIA Tesla K40c under the same test case. In addition, in terms of computational efficiency and power efficiency, the proposed accelerator achieves improvements of 2.00× and 7.19× compared to Intel I7‐8700K, and 4.33× and 2.11× compared to NVIDIA Titan Xp, respectively.","PeriodicalId":11735,"journal":{"name":"Engineering Reports","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135792476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
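The abstract's "on‐the‐fly" neighbor‐list construction is specific to the paper's FPGA datapath and is not reproduced here. For readers unfamiliar with what a neighbor list is, the following is a minimal CPU‐side sketch of the standard cell‐list technique that molecular dynamics neighbor searches (including LAMMPS's) are built on; the function name, box shape (cubic, periodic), and coordinates are illustrative assumptions, not the paper's design:

```python
import itertools

def neighbor_pairs(positions, box, cutoff):
    """Cell-list neighbor search in a cubic periodic box of edge `box`.
    Atoms (coordinates assumed in [0, box)) are binned into cells of edge
    >= cutoff, so each atom need only be compared against atoms in its own
    and the 26 adjacent cells. Returns the set of pairs (i, j), i < j,
    whose minimum-image distance is below `cutoff`."""
    ncell = max(1, int(box // cutoff))
    edge = box / ncell
    cells = {}
    for idx, (x, y, z) in enumerate(positions):
        key = (int(x / edge) % ncell, int(y / edge) % ncell, int(z / edge) % ncell)
        cells.setdefault(key, []).append(idx)
    pairs = set()
    cut2 = cutoff * cutoff
    for (cx, cy, cz), atoms in cells.items():
        for dx, dy, dz in itertools.product((-1, 0, 1), repeat=3):
            nbr = ((cx + dx) % ncell, (cy + dy) % ncell, (cz + dz) % ncell)
            for i in atoms:
                for j in cells.get(nbr, ()):
                    if i >= j:
                        continue
                    # Minimum-image distance under periodic boundaries.
                    d2 = 0.0
                    for a, b in zip(positions[i], positions[j]):
                        d = abs(a - b)
                        d = min(d, box - d)
                        d2 += d * d
                    if d2 < cut2:
                        pairs.add((i, j))
    return pairs
```

The storage cost the paper avoids is exactly the `pairs` set: building and streaming neighbor pairs on the fly, instead of materializing the full list, is what reduces on‐chip memory usage in the FPGA design.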