
Journal of Computational Science: Latest Articles

Private linear equation solving: An application to federated learning and extreme learning machines
IF 3.7 | Zone 3 (Computer Science) | Q2 Computer Science, Interdisciplinary Applications | Pub Date: 2025-08-26 | DOI: 10.1016/j.jocs.2025.102693 | Vol. 92, Article 102693
Daniel Heinlein, Anton Akusok, Kaj-Mikael Björk, Leonardo Espinosa-Leal
In federated learning, multiple devices each compute a part of a common machine learning model using their own private data. These partial models (or their parameters) are then exchanged through a central server that builds an aggregated model. This sharing process may leak information about the data used to train them. The problem intensifies as the machine learning model becomes simpler, indicating a higher risk for single-hidden-layer feedforward neural networks such as extreme learning machines. In this paper, we establish a mechanism to disguise the input data of a system of linear equations while guaranteeing that the modifications do not alter the solutions, and we propose two possible approaches for applying these techniques to federated learning. Our findings show that extreme learning machines can be used in federated learning with an extra security layer, making them attractive in learning schemes with limited computational resources.
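The idea of disguising a linear system without changing its solution can be illustrated with a textbook masking step: left-multiplying both sides of A x = b by a random invertible matrix P. This is only a sketch under that assumption; the abstract does not say which transformation the authors use, and this masking alone is not a privacy guarantee. The function names (`solve`, `random_invertible`, `disguise`) are hypothetical.

```python
from fractions import Fraction
import random

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with exact rational arithmetic."""
    n = len(A)
    # Augmented matrix [A | b] with Fraction entries to avoid rounding.
    M = [[Fraction(v) for v in row] + [Fraction(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        # Pick a row with a nonzero pivot in this column and swap it up.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

def random_invertible(n, rng):
    """Hypothetical mask: unit lower-triangular times unit upper-triangular (det = 1)."""
    L = [[1 if i == j else (rng.randint(-3, 3) if j < i else 0) for j in range(n)]
         for i in range(n)]
    U = [[1 if i == j else (rng.randint(-3, 3) if j > i else 0) for j in range(n)]
         for i in range(n)]
    return [[sum(L[i][k] * U[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def disguise(A, b, rng):
    """Return (P A, P b) for a random invertible P; the solution x is unchanged."""
    n = len(A)
    P = random_invertible(n, rng)
    PA = [[sum(P[i][k] * A[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    Pb = [sum(P[i][k] * b[k] for k in range(n)) for i in range(n)]
    return PA, Pb

A = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]   # illustrative data, not from the paper
b = [3, 5, 6]
PA, Pb = disguise(A, b, random.Random(0))
x_plain = solve(A, b)
x_masked = solve(PA, Pb)                # same solution as x_plain
```

Because P is invertible, (P A) x = P b holds exactly when A x = b does, so whoever solves the masked system recovers the same x without seeing A or b directly.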
Citations: 0
An adaptive Hamiltonian circuit of virtual sample generation for a small dataset
IF 3.7 | Zone 3 (Computer Science) | Q2 Computer Science, Interdisciplinary Applications | Pub Date: 2025-08-22 | DOI: 10.1016/j.jocs.2025.102711 | Vol. 92, Article 102711
Totok Sutojo, Supriadi Rustad, Muhamad Akrom, Wahyu Aji Eko Prabowo, De Rosal Ignatius Moses Setiadi, Hermawan Kresno Dipojono, Yoshitada Morikawa
Small datasets often lead to poor performance of data-driven prediction models because of uneven data distribution and large data spacing. One popular approach to this issue is to use virtual samples during machine learning (ML) model training. This study proposes a Hamiltonian Circuit Virtual Sample Generation (HCVSG) method that distributes virtual samples generated by interpolation while integrating the K-Nearest Neighbors (KNN) algorithm in model development. The Hamiltonian circuit is chosen because it does not depend on a distribution assumption and because a dataset admits multiple circuits, which allows adaptive sample distribution by selecting the circuit that produces the minimum error. The method supports improving feature-target correlation, reducing the risk of overfitting, and stabilizing error values as model complexity increases. Applying it to three materials-research datasets (MLCC, PSH, and EFD) shows that HCVSG significantly improves prediction accuracy compared to conventional KNN and eight MTD-based methods. Distributing virtual samples along the Hamiltonian circuit helps fill the information gap and makes the data distribution more even, ultimately improving the predictive model's performance.
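As a rough illustration of placing interpolated virtual samples along a closed circuit through the real samples, the sketch below assumes plain linear interpolation on each edge of a given visiting order. The interpolation scheme, the function name `virtual_samples_on_circuit`, and the `per_edge` parameter are assumptions for illustration; the abstract does not specify these details or how candidate circuits are scored.

```python
def virtual_samples_on_circuit(points, order, per_edge=1):
    """Linearly interpolate `per_edge` virtual samples on each edge of a closed circuit.

    points: list of tuples holding the real samples (features and target together).
    order:  a permutation of range(len(points)) giving the visiting order;
            the circuit closes by returning from the last point to the first.
    """
    samples = []
    n = len(order)
    for i in range(n):
        a = points[order[i]]
        b = points[order[(i + 1) % n]]   # wrap around to close the circuit
        for k in range(1, per_edge + 1):
            t = k / (per_edge + 1)       # interior fractions only, endpoints excluded
            samples.append(tuple((1 - t) * ai + t * bi for ai, bi in zip(a, b)))
    return samples

# Four illustrative real samples at the corners of a square.
real = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
virtual = virtual_samples_on_circuit(real, order=[0, 1, 2, 3], per_edge=1)
```

A different `order` visits the same points along a different circuit and therefore yields a different set of virtual samples, which is what makes it possible to compare circuits by the error they produce.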
Citations: 0
Tuning sensitivity of black phosphorene surface doped SnS, SnSe, GeS, and GeSe quantum dots toward water molecule and other small toxic molecules
IF 3.7 | Zone 3 (Computer Science) | Q2 Computer Science, Interdisciplinary Applications | Pub Date: 2025-08-21 | DOI: 10.1016/j.jocs.2025.102707 | Vol. 92, Article 102707
Mamori Habiba, Moatassim Hajar, El Kenz Abdallah, Benyoussef Abdelilah, Taleb Abdelhafed, Abdel Ghafour El Hachimi, Zaari Halima
In this work, Density Functional Theory (DFT) was employed to investigate how doping black phosphorene with SnS, GeS, SnSe, and GeSe quantum dots affects its sensitivity toward adsorbed gas molecules, namely NO2 and H2S. The interaction of the H2O molecule with the doped black phosphorene surface is also investigated to evaluate the impact of humidity on the sensing response. The results reveal large changes in the electronic band distribution upon exposure to the selected gas molecules, shifting the electronic band character from hole doping to electron doping, which can promote the electrical conductivity and the sensing properties of the doped phosphorene structures.
Citations: 0
Making hierarchically aware decisions on short findings for automatic summarisation
IF 3.7 | Zone 3 (Computer Science) | Q2 Computer Science, Interdisciplinary Applications | Pub Date: 2025-08-16 | DOI: 10.1016/j.jocs.2025.102692 | Vol. 91, Article 102692
Emrah Inan
An impression in a typical radiology report emphasises critical information by providing a conclusion and reasoning based on the findings. However, the findings and impression sections of these reports generally contain brief texts, as they highlight crucial observations derived from the clinical radiograph. In this scenario, abstractive summarisation models often experience a degradation in performance when generating short impressions. To address this challenge in the summarisation task, our work proposes a method that combines well-known fine-tuned text classification and abstractive summarisation language models. Since fine-tuning a language model requires an extensive, well-defined training dataset and is a time-consuming task dependent on high GPU resources, we employ prompt engineering, which uses prompt templates to programme language models and improve their performance. Our method first predicts whether the given findings text is normal or abnormal by leveraging a fine-tuned language model. Then, we apply a radiology-specific BART model to generate the summary for abnormal findings. In the zero-shot setting, our method achieves remarkable results compared to existing approaches on a real-world dataset. In particular, our method achieves scores of 37.43 for ROUGE-1, 21.72 for ROUGE-2, and 35.52 for ROUGE-L.
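The two-stage decision described above (classify the findings first, then summarise only abnormal ones) can be sketched as a small pipeline. Everything here is a stand-in: `toy_classifier` and `toy_summariser` are hypothetical placeholders for the fine-tuned classifier and the radiology-specific BART model, and returning a fixed template for normal findings is an assumption, since the abstract does not say how normal reports are handled.

```python
from typing import Callable

def summarise_findings(
    findings: str,
    classify: Callable[[str], str],
    summarise: Callable[[str], str],
    normal_impression: str = "No acute abnormality.",
) -> str:
    """Stage 1: label the findings. Stage 2: summarise only when abnormal."""
    label = classify(findings)
    if label == "normal":
        return normal_impression   # assumed fixed template for normal reports
    return summarise(findings)

# Toy stand-ins so the sketch runs without any model; both are hypothetical.
def toy_classifier(text: str) -> str:
    cues = ("opacity", "effusion", "fracture")
    return "abnormal" if any(c in text.lower() for c in cues) else "normal"

def toy_summariser(text: str) -> str:
    # Keep only the first sentence as a crude "summary".
    return text.split(".")[0].strip() + "."

normal_out = summarise_findings(
    "Lungs are clear. Heart size is normal.", toy_classifier, toy_summariser)
abnormal_out = summarise_findings(
    "Small left pleural effusion. No pneumothorax.", toy_classifier, toy_summariser)
```

Passing the two models in as callables keeps the routing logic independent of any particular model library.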
Citations: 0
Solving the dopant diffusion dynamics with physics-informed neural networks
IF 3.7 | Zone 3 (Computer Science) | Q2 Computer Science, Interdisciplinary Applications | Pub Date: 2025-08-15 | DOI: 10.1016/j.jocs.2025.102695 | Vol. 92, Article 102695
Sungyeop Lee, Jisu Ryu, Young-Gu Kim, Dae Sin Kim, Hiroo Koshimoto, Jaeshin Park
Simulation plays a crucial role in semiconductor chip manufacturing. In particular, process simulation is primarily used to solve the dopant diffusion dynamics, which describe the temporal evolution of doping profiles during the thermal annealing process. The diffusion dynamics constitute a multiscale problem, formulated as a set of coupled partial differential equations (PDEs) in the concentrations of dopants and point defects. In this paper, we demonstrate that Physics-Informed Neural Networks (PINNs) can accurately predict not only the evolution of the doping profile but also the unknown physical parameters, specifically the diffusivities appearing as PDE coefficients. Furthermore, we propose a physics-informed calibration method, which performs PDE-constrained optimization by leveraging a pre-trained PINN model. We experimentally verify that this post-processing significantly improves the accuracy of coefficient fine-tuning. To the best of our knowledge, this is the first demonstration of an annealing simulation for the semiconductor diffusion process using a physics-informed machine learning approach. This framework is expected to enable more efficient calibration of simulation parameters based on measurement data.
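To show why a PDE residual can identify a diffusivity, the sketch below uses a deliberate simplification: a single linear diffusion equation u_t = D u_xx with a known analytic solution, rather than the coupled dopant and point-defect system of the paper. The residual is estimated by central differences instead of automatic differentiation, and all names (`u_exact`, `pde_residual`) are hypothetical.

```python
import math

def u_exact(x, t, D, k=1.0):
    """Analytic solution of u_t = D u_xx for a single sine mode."""
    return math.exp(-D * k * k * t) * math.sin(k * x)

def pde_residual(u, x, t, D, h=1e-3):
    """Central-difference estimate of u_t - D * u_xx at the point (x, t)."""
    u_t = (u(x, t + h) - u(x, t - h)) / (2 * h)
    u_xx = (u(x + h, t) - 2 * u(x, t) + u(x - h, t)) / (h * h)
    return u_t - D * u_xx

D_true = 0.5                                # illustrative value, not from the paper
u = lambda x, t: u_exact(x, t, D_true)

# The residual vanishes (up to discretisation error) for the true diffusivity...
r_true = pde_residual(u, x=0.7, t=0.2, D=D_true)
# ...and is clearly nonzero for a wrong guess, which is the signal used to fit D.
r_wrong = pde_residual(u, x=0.7, t=0.2, D=0.1)
```

In a PINN, the squared residual summed over many sample points becomes a loss term, and a trainable D is driven toward the value that makes it small.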
Citations: 0
Helium focused ion beam damage in silicon: Physics-informed neural network modeling of helium bubble nucleation and early growth
IF 3.7 | Zone 3 (Computer Science) | Q2 Computer Science, Interdisciplinary Applications | Pub Date: 2025-08-13 | DOI: 10.1016/j.jocs.2025.102696 | Vol. 92, Article 102696
Shupeng Gao, Qi Li, M.A. Gosalvez, Xi Lin, Yan Xing, Zaifa Zhou, Qianhuang Chen
Currently, the time and cost required to obtain large datasets limit the application of data-driven machine learning in nanoscale manufacturing. Here, we focus on predicting the nanoscale damage induced by helium focused ion beams (He-FIBs) on silicon substrates. We briefly review the most relevant atomistic defects and the partial differential equations (PDEs), or rate equations, that describe the mutual creation and annihilation of the defects, eventually leading to the amorphization of the substrate and to the nucleation and early growth of helium bubbles. The novelty comes from the use of a physics-informed neural network (PINN) to quantitatively simulate the evolution of the bubbles, thus bypassing the dataset availability problem. As usual, the proposed PINN learns the underlying physics by incorporating the residuals of the PDEs and the corresponding Initial Conditions (ICs) and Boundary Conditions (BCs) in the network's loss function. However, the system of PDEs poses some challenges to the PINN modeling strategy. We find that (i) hard constraints need to be imposed on the network output in order to satisfy both BCs and ICs, (ii) all the inputs and outputs of the PINN need to be carefully normalized to ensure convergence during training, and (iii) customized weights need to be applied to all the PDE loss terms in order to balance their contributions, thus improving the accuracy of the PINN predictions. Once trained, the network achieves good prediction accuracy over the entire space-time domain for various ion beam energies and doses. Comparisons are provided against previous experiments and traditional numerical simulations, which are also implemented in this study using the Finite Difference Method (FDM). While the L2 relative errors for all collocation points remain below 10%, the accuracy of the PINN decreases at lower beam energies and larger ion doses, due to the presence of higher numerical gradients.
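The three measures listed in (i) to (iii) can be sketched on a toy problem. The example assumes a one-dimensional domain with zero Dirichlet boundaries and an initial profile that vanishes at both ends; the paper's actual equations, conditions, and weights are not given in the abstract. `raw_network` is a hypothetical stand-in for an untrained network, and the output transform shown is one common way to impose hard constraints, not necessarily the authors' choice.

```python
import math

def normalise(v, lo, hi):
    """Map v from [lo, hi] to [-1, 1] before it enters the network (measure ii)."""
    return 2.0 * (v - lo) / (hi - lo) - 1.0

def raw_network(xn, tn):
    """Stand-in for an arbitrary, untrained network output (hypothetical)."""
    return math.tanh(1.3 * xn - 0.7 * tn + 0.2)

def u0(x):
    """Illustrative initial profile that vanishes at both boundaries."""
    return math.sin(math.pi * x)

def u_hard(x, t, x_max=1.0, t_max=1.0):
    """Output transform (measure i): the IC and BCs hold for any raw_network.

    The factor t switches the network term off at t = 0 (initial condition);
    the factor x * (x_max - x) switches it off at both boundaries.
    """
    n = raw_network(normalise(x, 0.0, x_max), normalise(t, 0.0, t_max))
    return u0(x / x_max) + t * x * (x_max - x) * n

def weighted_loss(residuals, weights):
    """Measure iii: weighted sum of mean-squared residuals, one weight per PDE."""
    return sum(w * sum(r * r for r in rs) / len(rs)
               for rs, w in zip(residuals, weights))
```

With the constraints built into `u_hard`, the loss only has to contain PDE residual terms, so the weights in `weighted_loss` balance the equations against one another rather than against separate IC and BC penalties.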
Citations: 0
Variational Bayes for analysis of masked data
IF 3.7 | Zone 3 (Computer Science) | Q2 Computer Science, Interdisciplinary Applications | Pub Date: 2025-08-08 | DOI: 10.1016/j.jocs.2025.102690 | Vol. 91, Article 102690
Himanshu Rai, Sanjeev K. Tomer
Bayesian competing risks analysis in the presence of masked data often leads to an intractable posterior, for which Markov chain Monte Carlo (MCMC) methods are frequently used to evaluate various estimators of interest. However, when several risks are analyzed simultaneously, MCMC methods may consume a substantial amount of computation time. This paper introduces Variational Bayes, a machine learning technique, as an efficient alternative to MCMC for Bayesian analysis of competing risk data. Variational Bayes demonstrates faster convergence than MCMC, particularly for extensive competing risk datasets. We compare the performance of variational Bayes against Gibbs sampling with respect to the number of considered risks through a simulation study. Additionally, we apply the two methods to analyze a real data set of computer hard drives.
Citations: 0
BAHA: Binary artificial hummingbird algorithm for feature selection
IF 3.7 | Zone 3 (Computer Science) | Q2 Computer Science, Interdisciplinary Applications | Pub Date: 2025-07-31 | DOI: 10.1016/j.jocs.2025.102686 | Vol. 92, Article 102686
Ali Hamdipour, Abdolali Basiri, Mostafa Zaare, Seyedali Mirjalili
The classification accuracy of a dataset depends on its features. Irrelevant and redundant features reduce classification accuracy. Identifying and removing such features is the main purpose of feature selection, which is an important step in the data science lifecycle. The objective of the wrapper feature selection method is to reduce the number of selected features (NSF) while improving classification accuracy by working on a set of features. Feature selection is a challenging and computationally expensive problem that falls under the NP-complete category, so it requires a computationally cheap and efficient algorithm. The artificial hummingbird algorithm (AHA) is a biologically inspired optimization technique that mimics the unique flight capabilities and intelligent foraging tactics of hummingbirds in nature. Since feature selection is inherently a binary problem, this paper proposes a binary form of the AHA meta-heuristic and shows that binarizing it improves its performance on feature selection problems. The proposed method is tested on a standard benchmark dataset and compared with four state-of-the-art feature selection algorithms: Automata-based improved equilibrium optimizer with U-shaped transfer function (AIEOU), Whale optimization approaches for wrapper feature selection (WOA-CM), Ring theory-based harmony search (RTHS), and Adaptive switching gray-whale optimizer (ASGW). The results show the effectiveness of the proposed algorithm in searching for optimal feature subsets. The source code of the proposed algorithm is publicly available at https://github.com/alihamdipour/baha.
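A common way to binarize a continuous meta-heuristic for wrapper feature selection is to pass each coordinate through a transfer function and to score a feature mask by a weighted sum of error and subset size. The sketch below assumes an S-shaped (sigmoid) transfer function and a weight `alpha`; neither is confirmed by the abstract as the choice made in BAHA, and the function names are hypothetical.

```python
import math
import random

def sigmoid(v):
    """S-shaped transfer function mapping a real value to a probability."""
    return 1.0 / (1.0 + math.exp(-v))

def binarise(position, rng):
    """Turn a continuous position vector into a 0/1 feature mask."""
    return [1 if rng.random() < sigmoid(v) else 0 for v in position]

def wrapper_fitness(error_rate, mask, alpha=0.99):
    """Weighted sum of classification error and fraction of selected features.

    Lower is better: at equal error, the mask with fewer features wins.
    """
    return alpha * error_rate + (1.0 - alpha) * sum(mask) / len(mask)

rng = random.Random(42)
mask = binarise([2.5, -3.0, 0.1, -0.4, 4.0], rng)   # illustrative position vector
score = wrapper_fitness(error_rate=0.12, mask=mask)  # error rate would come from a classifier
```

In a full wrapper loop, `error_rate` would be obtained by training and validating a classifier on only the features where `mask` is 1.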
Citations: 0
Anomaly detection and root cause analysis using convolutional autoencoders: A real case study
IF 3.7 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-07-30 DOI: 10.1016/j.jocs.2025.102685
Piero Danti, Alessandro Innocenti, Sascha Sandomier
Anomaly detection is the process of identifying unusual patterns in data that may indicate a deviation from the expected norm. This paper proposes a semi-supervised deep learning solution to detect anomalies in a YANMAR energy device that produces heat and power using an internal combustion engine fueled by natural gas. The equipment under analysis is a 20 kWe micro-cogeneration unit installed in the energy plant of a school facility. The dataset considered in this work consists of 12 features sampled every 15 min. The authors use an autoencoder with 1-D convolutional layers, which retain temporal correlations, trained to learn the normal behavior of the cogenerator and to report previously unseen operating conditions. Because autoencoders tend to yield false positives, a Fast-Fourier-Transform-based technique is applied to filter spurious detections and improve the algorithm's robustness. As a final contribution, a naive methodology for identifying the root cause of the anomalies is described, and its effectiveness is demonstrated on a real malfunction of the combined heat and power (CHP) unit.
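The general idea of flagging anomalies from autoencoder reconstruction error, and then asking which feature contributed most, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not describe the authors' scoring rule, threshold, or root-cause procedure, so the mean-squared-error score, the fixed `threshold`, and the per-feature ranking are hypothetical choices, and the FFT-based filtering step is not reproduced because its details are not given. The reconstruction `x_hat` is assumed to come from an already trained autoencoder.

```python
def window_errors(x, x_hat):
    """Per-feature mean squared reconstruction error for one time window.

    x and x_hat are lists of time steps; each time step is a list of
    feature values (12 features in the paper's dataset).
    """
    n_steps = len(x)
    n_feat = len(x[0])
    errs = [0.0] * n_feat
    for row, row_hat in zip(x, x_hat):
        for j in range(n_feat):
            errs[j] += (row[j] - row_hat[j]) ** 2
    return [e / n_steps for e in errs]


def is_anomalous(per_feature_err, threshold):
    """Flag the window when the mean error over features exceeds threshold.

    The threshold is hypothetical; it would typically be calibrated on
    reconstruction errors of known-normal data.
    """
    return sum(per_feature_err) / len(per_feature_err) > threshold


def rank_root_causes(per_feature_err, names):
    """Order features by error contribution, largest first (assumed heuristic)."""
    return sorted(zip(names, per_feature_err), key=lambda p: p[1], reverse=True)


# Toy window with two features and two time steps.
x = [[1.0, 2.0], [1.0, 2.0]]
x_hat = [[1.0, 2.0], [1.0, 4.0]]
errs = window_errors(x, x_hat)
print(is_anomalous(errs, threshold=0.5), rank_root_causes(errs, ["a", "b"]))
```

Ranking features by their share of the reconstruction error is one common, simple way to point at a likely source of an anomaly; whether it matches the paper's methodology cannot be confirmed from the abstract.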
Citations: 0
An improved K-means algorithm based on persistent homology
IF 3.7 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-07-29 DOI: 10.1016/j.jocs.2025.102680
NingNing Peng, Shanjunshu Gao, Xingzi Yin, Xueyan Zhan
The traditional K-means algorithm has several limitations, including sensitivity to the initial centers, unstable clustering results, convergence to locally optimal clusterings, and a large number of iterations. In this paper, we propose an improved clustering algorithm called PH-K-means that uses persistent homology to identify k clusters in the data set. The algorithm calculates the length of the longest Betti number to obtain k Betti numbers, which represent the k clusters respectively. The data are then partitioned according to the k Betti numbers, and the mean of the data in each Betti number is used as the initial center of the corresponding cluster. The algorithm iterates until the difference between the sums of squared errors of two successive clusterings is less than a threshold value. The PH-K-means algorithm is tested on seven common data sets, and the results show that it has high accuracy, stable clustering results, and requires fewer iterations than traditional K-means, K-means++, UK-means, and K-means algorithms.
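One way to read the initialization described above is through 0-dimensional persistent homology: as a distance threshold grows, connected components of the point cloud merge, and the k components that survive longest can seed K-means. The sketch below follows that reading and is not the authors' implementation. It finds the k longest-lived components by merging points along the shortest edges until k components remain, uses their means as initial centers, and stops when the sum of squared errors (SSE) changes by less than `tol` between iterations. The function names, `tol`, and `max_iter` are assumptions.

```python
import math
from itertools import combinations


def h0_components(points, k):
    """Merge points along shortest edges (Kruskal) until k components remain.

    These are the k connected components that persist longest in the
    0-dimensional Vietoris-Rips filtration of the point cloud.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    edges = sorted(combinations(range(len(points)), 2),
                   key=lambda e: math.dist(points[e[0]], points[e[1]]))
    remaining = len(points)
    for a, b in edges:
        if remaining == k:
            break
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            remaining -= 1
    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    return list(groups.values())


def mean(pts):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(pts) for c in zip(*pts))


def ph_kmeans(points, k, tol=1e-6, max_iter=100):
    """K-means seeded with the means of the k most persistent components."""
    centers = [mean(g) for g in h0_components(points, k)]
    prev_sse = math.inf
    clusters = []
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        sse = 0.0
        for p in points:
            d = [math.dist(p, c) for c in centers]
            j = d.index(min(d))
            clusters[j].append(p)
            sse += d[j] ** 2
        if abs(prev_sse - sse) < tol:  # SSE change between iterations
            break
        prev_sse = sse
        # Keep the old center if a cluster happens to be empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
centers, clusters = ph_kmeans(pts, k=2)
print(sorted(centers))
```

Because the seeding is deterministic, repeated runs on the same data give the same result, which is consistent with the stability claim in the abstract. The pairwise edge sort costs O(n² log n), so this sketch suits small data sets only.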
Citations: 0