Digital Discovery: latest articles

Challenges for error-correction coding in DNA data storage: photolithographic synthesis and DNA decay†
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-10-18 DOI: 10.1039/D4DD00220B
Andreas L. Gimpel, Wendelin J. Stark, Reinhard Heckel and Robert N. Grass

Efficient error-correction codes are crucial for realizing DNA's potential as a long-lasting, high-density storage medium for digital data. At the same time, new workflows promising low-cost, resilient DNA data storage are challenging their design and error-correcting capabilities. This study characterizes the errors and biases in two new additions to the state-of-the-art workflow in DNA data storage: photolithographic synthesis and DNA decay. Photolithographic synthesis offers low-cost, scalable oligonucleotide synthesis but suffers from high error rates, necessitating sophisticated error-correction schemes, for example codes introducing within-sequence redundancy combined with clustering and alignment techniques for retrieval. On the other hand, the decoding of oligo fragments after DNA decay promises unprecedented storage densities, but complicates data recovery by requiring the reassembly of full-length sequences or the use of partial sequences for decoding. Our analysis provides a detailed account of the error patterns and biases present in photolithographic synthesis and DNA decay, and identifies considerable bias stemming from sequencing workflows. We implement our findings into a digital twin of the two workflows, offering a tool for developing error-correction codes and providing benchmarks for the evaluation of codec performance.
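
As a rough illustration of the kind of error channel such a digital twin has to simulate, the sketch below passes an oligo through independent substitution, insertion and deletion errors. The function name `corrupt_oligo`, the error rates and the assumption that errors are independent and position-invariant are all hypothetical; the abstract reports that the measured errors are biased, so this is a toy baseline and not the authors' tool.

```python
import random

BASES = "ACGT"

def corrupt_oligo(seq, p_sub, p_ins, p_del, rng):
    """Pass one oligo through a toy i.i.d. substitution/insertion/deletion channel."""
    out = []
    for base in seq:
        if rng.random() < p_ins:
            out.append(rng.choice(BASES))  # spurious base inserted before this position
        r = rng.random()
        if r < p_del:
            continue  # this base is lost
        if r < p_del + p_sub:
            # replace the base with one of the three other bases
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

# Hypothetical rates, chosen only to make the errors visible in a short read.
rng = random.Random(0)
reads = [corrupt_oligo("ACGTACGTACGT", p_sub=0.05, p_ins=0.05, p_del=0.10, rng=rng)
         for _ in range(5)]
```

A codec could be benchmarked by encoding data, corrupting every oligo with a function of this shape, and checking whether decoding still recovers the input.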

Digital Discovery, 2024, issue 12, pp. 2497-2508. PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00220b?page=search
Citations: 0
Spectra to structure: contrastive learning framework for library ranking and generating molecular structures for infrared spectra†
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-10-17 DOI: 10.1039/D4DD00135D
Ganesh Chandan Kanakala, Bhuvanesh Sridharan and U. Deva Priyakumar

Inferring complete molecular structure from infrared (IR) spectra is a challenging task. In this work, we propose SMEN (Spectra and Molecule Encoder Network), a framework for scoring molecules against given IR spectra. The proposed framework uses contrastive optimization to obtain similar embedding for a molecule and its spectra. For this study, we consider the QM9 dataset with molecules consisting of less than 9 heavy atoms and obtain simulated spectra. Using the proposed method, we can rank the molecules using embedding similarity and obtain a Top 1 accuracy of ∼81%, Top 3 accuracy of ∼96%, and Top 10 accuracy of ∼99% on the evaluation set. We extend SMEN to build a generative transformer for a direct molecule prediction from IR spectra. The proposed method can significantly help molecule library ranking tasks and aid the problem of inferring molecular structures from spectra.
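
The library-ranking step can be sketched as follows, assuming the spectrum and the candidate molecules have already been embedded into a shared vector space. Cosine similarity, the helper names and the toy two-dimensional embeddings are assumptions made for illustration; the abstract does not state which similarity measure SMEN uses.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_library(spectrum_emb, molecule_embs):
    """Return library indices sorted from most to least similar to the spectrum."""
    scores = [cosine(spectrum_emb, m) for m in molecule_embs]
    return sorted(range(len(molecule_embs)), key=lambda i: scores[i], reverse=True)

def top_k_accuracy(spectrum_embs, molecule_embs, true_idx, k):
    """Fraction of spectra whose true molecule appears among the k best-ranked candidates."""
    hits = sum(t in rank_library(s, molecule_embs)[:k]
               for s, t in zip(spectrum_embs, true_idx))
    return hits / len(true_idx)

# Toy embeddings (hypothetical): three library molecules and three query spectra.
molecules = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
spectra = [[0.9, 0.1], [0.6, 0.5], [0.1, 0.9]]
true_idx = [0, 0, 1]  # the second spectrum is deliberately a top-1 miss
```

Top 1, Top 3 and Top 10 figures like those quoted above correspond to calling `top_k_accuracy` with `k` set to 1, 3 and 10 on an evaluation set.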

Digital Discovery, 2024, issue 12, pp. 2417-2423. PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00135d?page=search
Citations: 0
Application of object detection and action recognition toward automated recognition of chemical experiments†
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-10-16 DOI: 10.1039/D4DD00015C
Ryosuke Sasaki, Mikito Fujinami and Hiromi Nakai

Developments in deep learning-based computer vision technology have significantly improved the performance of applied research. The use of image recognition methods to manually conduct chemical experiments is promising for digitizing traditional practices in terms of experimental recording, hazard management, and educational applications. This study investigated the feasibility of automatically recognizing manual chemical experiments using recent image recognition technology. Both object detection and action recognition were evaluated, that is, the identification of the locations and types of objects in images and the inference of human actions in videos. The image and video datasets for the chemical experiments were originally constructed by capturing scenes from actual organic chemistry laboratories. The assessment of inference accuracy indicates that image recognition methods can effectively detect chemical apparatuses and classify manipulations in experiments.

Digital Discovery, 2024, issue 12, pp. 2458-2464. PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00015c?page=search
Citations: 0
In situ synthesis within micron-sized hydrogel reactors created via programmable aerosol chemistry†
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-10-15 DOI: 10.1039/D4DD00139G
Luokun Zhang and S. Hessam M. Mehr

Recent progress in materials science and complex chemical systems has highlighted the critical role of containers in directing and modulating reactivity. Micron-sized reactors are especially attractive due to their significantly different surface/volume ratios compared to traditional laboratory glassware, while still providing high experimental throughput and being easily observable using optical microscopy. Despite their promise, there is a gap in adapting chemical synthesis protocols to work within microspheres. We demonstrate a programmable aerosol chemistry setup that automates the generation of calcium alginate microspheres and allows them to be used as micro-reactors for exploration of chemical reactivity. A range of reactions can be adapted for in situ synthesis within the forming microspheres by pre-loading the precursor solutions with solid and soluble reagents, exemplified by our preparation of Prussian blue and quinhydrone. The micro-reactors are permeable, allowing rapid uptake and release of small molecule reagents and products. Larger particles trapped within the calcium alginate matrix can also be released, triggered via rapid disassembly of the microspheres in response to calcium binders like EDTA. As our standard programmable apparatus is extensible to broad reagent types and reaction stoichiometries, we expect that its adoption will accelerate exploration of chemical reactivity and discovery within micro-reactors.

Digital Discovery, 2024, issue 12, pp. 2424-2433. PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00139g?page=search
Citations: 0
Sorting polyolefins with near-infrared spectroscopy: identification of optimal data analysis pipelines and machine learning classifiers†‡
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-10-15 DOI: 10.1039/D4DD00235K
Bradley P. Sutliff, Peter A. Beaucage, Debra J. Audus, Sara V. Orski and Tyler B. Martin

Polyolefins (POs) are the largest class of polymers produced worldwide. Despite the intrinsic chemical similarities within this class of polymers, they are often physically incompatible. This combination presents a significant hurdle for high-throughput recycling systems that strive to sort various types of plastics from one another. Some research has been done to show that near-infrared spectroscopy (NIR) can sort POs from other plastics, but they generally fall short of sorting POs from one another. In this work, we enhance NIR spectroscopy-based sortation by screening over 12 000 machine-learning pipelines to enable sorting of PO species beyond what is possible using current NIR databases. These pipelines include a series of scattering corrections, filtering and differentiation, data scaling, dimensionality reduction, and machine learning classifiers. Common scattering corrections and preprocessing steps include scatter correction, linear detrending, and Savitzky–Golay filtering. Dimensionality reduction techniques such as principal component analysis (PCA), functional principal component analysis (fPCA) and uniform manifold approximation and projection (UMAP) were also investigated for classification enhancements. This analysis of preprocessing steps and classification algorithm combinations identified multiple data pipelines capable of successfully sorting PO materials with over 95% accuracy. Through rigorous testing, this study provides recommendations for consistently applying preprocessing and classification techniques without over-complicating the data analysis. This work also provides a set of preprocessing steps, a chosen classifier, and tuned hyperparameters that may be useful for benchmarking new models and data sets. Finally, the approach outlined here is ready to be applied by the developers of materials sortation equipment so that we can improve the value and purity of recycled plastic waste streams.
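
Two of the preprocessing stages named above can be written compactly as composable steps. The sketch below uses standard normal variate (SNV) as an example of a scatter correction, followed by linear detrending; choosing SNV, the function names and the plain-list representation of a spectrum are assumptions for illustration and do not reproduce the authors' pipelines.

```python
import statistics

def snv(spectrum):
    """Standard normal variate: centre one spectrum and scale it to unit standard deviation."""
    mean = statistics.fmean(spectrum)
    std = statistics.pstdev(spectrum)
    return [(x - mean) / std for x in spectrum]

def detrend_linear(spectrum):
    """Subtract the least-squares straight line fitted against the channel index."""
    n = len(spectrum)
    x_mean = (n - 1) / 2
    y_mean = statistics.fmean(spectrum)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(spectrum))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    slope = sxy / sxx
    return [y - (y_mean + slope * (x - x_mean)) for x, y in enumerate(spectrum)]

def preprocess(spectrum, steps):
    """Apply the preprocessing steps in order, as one candidate pipeline would."""
    for step in steps:
        spectrum = step(spectrum)
    return spectrum
```

Screening many pipelines then amounts to enumerating different `steps` lists (and downstream classifiers) and scoring each combination on held-out spectra.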

Digital Discovery, 2024, issue 11, pp. 2341-2355. PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00235k?page=search
Citations: 0
High accuracy uncertainty-aware interatomic force modeling with equivariant Bayesian neural networks†
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-10-15 DOI: 10.1039/D4DD00183D
Tim Rensmeyer, Ben Craig, Denis Kramer and Oliver Niggemann

Ab initio molecular dynamics simulations of material properties have become a cornerstone in the development of novel materials for a wide range of applications such as battery technology and catalysis. Unfortunately, their high computational demand can make them unsuitable in many applications. Consequently, surrogate modeling via neural networks has become an active field of research. Two of the major obstacles to their practical application in many cases are assessing the reliability of the neural network predictions and the difficulty of generating suitable datasets to train the neural network in the first place. Bayesian neural networks offer a promising framework for modeling uncertainty, active learning and improving data efficiency and robustness by incorporating prior physical knowledge. However, due to the high computational demand and slow convergence of the gold standard approach of Monte Carlo Markov Chain (MCMC) sampling methods, variational inference via Monte Carlo dropout is currently the only sampling method successfully applied in this domain. Since MCMC methods have often displayed a superior quality in their uncertainty quantification, developing a suitable MCMC method in this domain would be a significant advance in making neural network-based molecular dynamics simulations more practically viable. In this paper, we demonstrate that convergence for state-of-the-art models with high-quality MCMC methods can still be achieved in a practical amount of time by introducing a novel parameter-specific adaptive step size scheme. In addition, we introduce a new stochastic neural network model based on the NequIP architecture and demonstrate that, when combined with our novel sampling algorithm, we obtain predictions with state-of-the-art accuracy as well as a significantly improved measure of uncertainty over Monte Carlo dropout. Lastly, we show that the proposed algorithm can even outperform deep ensembles while sampling from a single Markov chain.

Digital Discovery, 2024, issue 11, pp. 2356-2366. PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00183d?page=search
Citations: 0
Artificial intelligence-enabled optimization of battery-grade lithium carbonate production†
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-10-14 DOI: 10.1039/D4DD00159A
S. Shayan Mousavi Masouleh, Corey A. Sanz, Ryan P. Jansonius, Samuel Shi, Maria J. Gendron Romero, Jason E. Hein and Jason Hattrick-Simpers

By 2035, the need for battery-grade lithium is expected to quadruple. About half of this lithium is currently sourced from brines and must be converted from lithium chloride into lithium carbonate (Li2CO3) through a process called softening. Conventional softening methods using sodium or potassium salts contribute to carbon emissions during reagent mining and battery manufacturing, exacerbating global warming. This study introduces an alternative approach using carbon dioxide (CO2(g)) as the carbonating reagent in the lithium softening process, offering a carbon capture solution. We employed an active learning-driven high-throughput method to rapidly capture CO2(g) and convert it to lithium carbonate. The model was simplified by focusing on the elemental concentrations of C, Li, and N for practical measurement and tracking, avoiding the complexities of ion speciation equilibria. This approach led to an optimized lithium carbonate process that capitalizes on CO2(g) capture and improves the battery metal supply chain's carbon efficiency.

Digital Discovery, 2024, issue 11, pp. 2320-2326. PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00159a?page=search
Citations: 0
Correction: A smile is all you need: predicting limiting activity coefficients from SMILES with natural language processing
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-10-14 DOI: 10.1039/D4DD90045F
Benedikt Winter, Clemens Winter, Johannes Schilling and André Bardow

Correction for ‘A smile is all you need: predicting limiting activity coefficients from SMILES with natural language processing’ by Benedikt Winter et al., Digital Discovery, 2022, 1, 859–869, https://doi.org/10.1039/D2DD00058J.

[此处更正了文章 DOI:10.1039/D2DD00058J]。
{"title":"Correction: A smile is all you need: predicting limiting activity coefficients from SMILES with natural language processing","authors":"Benedikt Winter, Clemens Winter, Johannes Schilling and André Bardow","doi":"10.1039/D4DD90045F","DOIUrl":"10.1039/D4DD90045F","url":null,"abstract":"<p >Correction for ‘A smile is all you need: predicting limiting activity coefficients from SMILES with natural language processing’ by Benedikt Winter <em>et al.</em>, <em>Digital Discovery</em>, 2022, <strong>1</strong>, 859–869, https://doi.org/10.1039/D2DD00058J.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 11","pages":" 2384-2384"},"PeriodicalIF":6.2,"publicationDate":"2024-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11472119/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142482280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Embedding DNA-based natural language in microbes for the benefit of future researchers†
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-10-14 DOI: 10.1039/D4DD00251B
Heqian Zhang, Jiaquan Huang, Xiaoyu Wang, Zhizeng Gao, Song Meng, Hang Li, Shanshan Zhou, Shang Wang, Shan Wang, Xunyou Yan, Xinwei Yang, Xiaoluo Huang and Zhiwei Qin

Microorganisms are valuable resources as antibiotic producers, biocontrol agents, and symbiotic agents in various ecosystems and organisms. Over the past decades, there has been a notable increase in the identification and generation of both wild-type and genetically modified microbial strains by research laboratories worldwide. However, a substantial portion of the information represented in these strains remains scattered across the scientific literature. To facilitate the work of future researchers, in this perspective article we advocate the adoption of the DNA-based natural language (DBNL) algorithm standard and demonstrate it using a Streptomyces species as a proof of concept. This standard enables sophisticated genome sequencing and the subsequent extraction of valuable information encoded within a particular microbial species. In addition, it allows access to such information for continued research and applications even if a currently cultivated microbe cannot be cultured in the future. Embracing the DBNL algorithm standard promises to enhance the efficiency and effectiveness of microbial research, paving the way for innovative solutions and discoveries in diverse fields.

Citations: 0
Autonomous robotic experimentation system for powder X-ray diffraction†
IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date : 2024-10-14 DOI: 10.1039/D4DD00190G
Yuto Yotsumoto, Yusaku Nakajima, Ryusei Takamoto, Yasuo Takeichi and Kanta Ono

The automation of materials research is essential for accelerating scientific discovery. Powder X-ray diffraction (PXRD) plays a crucial role in analyzing crystal structures and quantifying phase compositions in materials science. However, current methods face challenges in reproducibility and efficiency. To address these issues, we developed an autonomous robotic experimentation (ARE) system for PXRD that integrates the entire process from sample preparation to data analysis. This system combines a robotic arm for precise sample preparation with machine learning-based techniques for automated data analysis. Our approach consistently produced high-quality samples with reduced background noise, achieving accuracy comparable to manual preparation techniques. We also investigated the relationship between sample quantity and analysis accuracy, demonstrating the system's ability to obtain reliable results with significantly reduced sample amounts. This work advances laboratory automation capabilities and contributes to the development of autonomous materials discovery and optimization processes. By addressing key challenges in PXRD automation, our research enables more efficient and reproducible materials characterization methodologies.

Citations: 0