Pub Date: 2024-06-19 DOI: 10.1038/s43588-024-00637-0
Francisco Barreras, Duncan J. Watts
Large-scale GPS location datasets hold immense potential for measuring human mobility and interpersonal contact, both of which are essential for data-driven epidemiology. However, despite their potential and widespread adoption during the COVID-19 pandemic, there are several challenges with these data that raise concerns regarding the validity and robustness of its applications. Here we outline two types of challenges—some related to accessing and processing these data, and some related to data quality—and propose several research directions to address them moving forward. While large-scale GPS location datasets have been instrumental to applications in epidemiology, there are still several challenges with these data that should be considered and addressed to make data-driven epidemiology more reliable.
{"title":"The exciting potential and daunting challenge of using GPS human-mobility data for epidemic modeling","authors":"Francisco Barreras, Duncan J. Watts","doi":"10.1038/s43588-024-00637-0","DOIUrl":"10.1038/s43588-024-00637-0","url":null,"abstract":"Large-scale GPS location datasets hold immense potential for measuring human mobility and interpersonal contact, both of which are essential for data-driven epidemiology. However, despite their potential and widespread adoption during the COVID-19 pandemic, there are several challenges with these data that raise concerns regarding the validity and robustness of its applications. Here we outline two types of challenges—some related to accessing and processing these data, and some related to data quality—and propose several research directions to address them moving forward. While large-scale GPS location datasets have been instrumental to applications in epidemiology, there are still several challenges with these data that should be considered and addressed to make data-driven epidemiology more reliable.","PeriodicalId":74246,"journal":{"name":"Nature computational science","volume":null,"pages":null},"PeriodicalIF":12.0,"publicationDate":"2024-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141428408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-06-19 DOI: 10.1038/s43588-024-00655-y
Ananya Rastogi
{"title":"The whole picture in digital pathology","authors":"Ananya Rastogi","doi":"10.1038/s43588-024-00655-y","DOIUrl":"10.1038/s43588-024-00655-y","url":null,"abstract":"","PeriodicalId":74246,"journal":{"name":"Nature computational science","volume":null,"pages":null},"PeriodicalIF":12.0,"publicationDate":"2024-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141428409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-06-17 DOI: 10.1038/s43588-024-00639-y
Jenna C. Fromer, Connor W. Coley
Small molecules exhibiting desirable property profiles are often discovered through an iterative process of designing, synthesizing and testing sets of molecules. The selection of molecules to synthesize from all possible candidates is a complex decision-making process that typically relies on expert chemist intuition. Here we propose a quantitative decision-making framework, SPARROW, that prioritizes molecules for evaluation by balancing expected information gain and synthetic cost. SPARROW integrates molecular design, property prediction and retrosynthetic planning to balance the utility of testing a molecule with the cost of batch synthesis. We demonstrate, through three case studies, that the developed algorithm captures the non-additive costs inherent to batch synthesis, leverages common reaction steps and intermediates, and scales to hundreds of molecules. The downselection of compounds for synthesis is a key challenge in molecular design cycles that typically relies on expert chemist intuition. Fromer and Coley propose a cost-aware method to automatically select compounds and synthetic routes.
{"title":"An algorithmic framework for synthetic cost-aware decision making in molecular design","authors":"Jenna C. Fromer, Connor W. Coley","doi":"10.1038/s43588-024-00639-y","DOIUrl":"10.1038/s43588-024-00639-y","url":null,"abstract":"Small molecules exhibiting desirable property profiles are often discovered through an iterative process of designing, synthesizing and testing sets of molecules. The selection of molecules to synthesize from all possible candidates is a complex decision-making process that typically relies on expert chemist intuition. Here we propose a quantitative decision-making framework, SPARROW, that prioritizes molecules for evaluation by balancing expected information gain and synthetic cost. SPARROW integrates molecular design, property prediction and retrosynthetic planning to balance the utility of testing a molecule with the cost of batch synthesis. We demonstrate, through three case studies, that the developed algorithm captures the non-additive costs inherent to batch synthesis, leverages common reaction steps and intermediates, and scales to hundreds of molecules. The downselection of compounds for synthesis is a key challenge in molecular design cycles that typically relies on expert chemist intuition. Fromer and Coley propose a cost-aware method to automatically select compounds and synthetic routes.","PeriodicalId":74246,"journal":{"name":"Nature computational science","volume":null,"pages":null},"PeriodicalIF":12.0,"publicationDate":"2024-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141422123","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-06-14 DOI: 10.1038/s43588-024-00648-x
Tianyu Wang
A recent study shows that, by leveraging nonlinear optical processes in disordered media, photonic processors can transform high-dimensional machine-learning data, using nonlinear functions that are otherwise challenging for digital electronic processors to compute.
{"title":"A nonlinear dimension for machine learning in optical disordered media","authors":"Tianyu Wang","doi":"10.1038/s43588-024-00648-x","DOIUrl":"10.1038/s43588-024-00648-x","url":null,"abstract":"A recent study shows that, by leveraging nonlinear optical processes in disordered media, photonic processors can transform high-dimensional machine-learning data, using nonlinear functions that are otherwise challenging for digital electronic processors to compute.","PeriodicalId":74246,"journal":{"name":"Nature computational science","volume":null,"pages":null},"PeriodicalIF":12.0,"publicationDate":"2024-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141322108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-06-14 DOI: 10.1038/s43588-024-00644-1
Hao Wang, Jianqi Hu, Andrea Morandi, Alfonso Nardi, Fei Xia, Xuanchen Li, Romolo Savo, Qiang Liu, Rachel Grange, Sylvain Gigan
Neural networks find widespread use in scientific and technological applications, yet their implementations in conventional computers have encountered bottlenecks due to ever-expanding computational needs. Photonic computing is a promising neuromorphic platform with potential advantages of massive parallelism, ultralow latency and reduced energy consumption but mostly for computing linear operations. Here we demonstrate a large-scale, high-performance nonlinear photonic neural system based on a disordered polycrystalline slab composed of lithium niobate nanocrystals. Mediated by random quasi-phase-matching and multiple scattering, linear and nonlinear optical speckle features are generated as the interplay between the simultaneous linear random scattering and the second-harmonic generation, defining a complex neural network in which the second-order nonlinearity acts as internal nonlinear activation functions. Benchmarked against linear random projection, such nonlinear mapping embedded with rich physical computational operations shows improved performance across a large collection of machine learning tasks in image classification, regression and graph classification. Demonstrating up to 27,648 input and 3,500 nonlinear output nodes, the combination of optical nonlinearity and random scattering serves as a scalable computing engine for diverse applications. Nonlinear optical computations have been essential yet challenging for developing optical neural networks with appreciable expressivity. In this paper, light scattering is combined with optical nonlinearity to empower a high-performance, large-scale nonlinear photonic neural system.
{"title":"Large-scale photonic computing with nonlinear disordered media","authors":"Hao Wang, Jianqi Hu, Andrea Morandi, Alfonso Nardi, Fei Xia, Xuanchen Li, Romolo Savo, Qiang Liu, Rachel Grange, Sylvain Gigan","doi":"10.1038/s43588-024-00644-1","DOIUrl":"10.1038/s43588-024-00644-1","url":null,"abstract":"Neural networks find widespread use in scientific and technological applications, yet their implementations in conventional computers have encountered bottlenecks due to ever-expanding computational needs. Photonic computing is a promising neuromorphic platform with potential advantages of massive parallelism, ultralow latency and reduced energy consumption but mostly for computing linear operations. Here we demonstrate a large-scale, high-performance nonlinear photonic neural system based on a disordered polycrystalline slab composed of lithium niobate nanocrystals. Mediated by random quasi-phase-matching and multiple scattering, linear and nonlinear optical speckle features are generated as the interplay between the simultaneous linear random scattering and the second-harmonic generation, defining a complex neural network in which the second-order nonlinearity acts as internal nonlinear activation functions. Benchmarked against linear random projection, such nonlinear mapping embedded with rich physical computational operations shows improved performance across a large collection of machine learning tasks in image classification, regression and graph classification. Demonstrating up to 27,648 input and 3,500 nonlinear output nodes, the combination of optical nonlinearity and random scattering serves as a scalable computing engine for diverse applications. Nonlinear optical computations have been essential yet challenging for developing optical neural networks with appreciable expressivity. In this paper, light scattering is combined with optical nonlinearity to empower a high-performance, large-scale nonlinear photonic neural system.","PeriodicalId":74246,"journal":{"name":"Nature computational science","volume":null,"pages":null},"PeriodicalIF":12.0,"publicationDate":"2024-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141322109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-06-14 DOI: 10.1038/s43588-024-00642-3
Mai Ha Vu, Philippe A. Robert, Rahmad Akbar, Bartlomiej Swiatczak, Geir Kjetil Sandve, Dag Trygve Truslew Haug, Victor Greiff
Apparent parallels between natural language and antibody sequences have led to a surge in deep language models applied to antibody sequences for predicting cognate antigen recognition. However, a linguistic formal definition of antibody language does not exist, and insight into how antibody language models capture antibody-specific binding features remains largely uninterpretable. Here we describe how a linguistic formalization of the antibody language, by characterizing its tokens and grammar, could address current challenges in antibody language model rule mining. The parallels between natural language and antibody sequences could serve as a stepping stone to using deep language models for analyzing antibody sequences. This Perspective discusses how issues in antibody language model rule mining could be addressed by linguistically formalizing the antibody language.
{"title":"Linguistics-based formalization of the antibody language as a basis for antibody language models","authors":"Mai Ha Vu, Philippe A. Robert, Rahmad Akbar, Bartlomiej Swiatczak, Geir Kjetil Sandve, Dag Trygve Truslew Haug, Victor Greiff","doi":"10.1038/s43588-024-00642-3","DOIUrl":"10.1038/s43588-024-00642-3","url":null,"abstract":"Apparent parallels between natural language and antibody sequences have led to a surge in deep language models applied to antibody sequences for predicting cognate antigen recognition. However, a linguistic formal definition of antibody language does not exist, and insight into how antibody language models capture antibody-specific binding features remains largely uninterpretable. Here we describe how a linguistic formalization of the antibody language, by characterizing its tokens and grammar, could address current challenges in antibody language model rule mining. The parallels between natural language and antibody sequences could serve as a stepping stone to using deep language models for analyzing antibody sequences. This Perspective discusses how issues in antibody language model rule mining could be addressed by linguistically formalizing the antibody language.","PeriodicalId":74246,"journal":{"name":"Nature computational science","volume":null,"pages":null},"PeriodicalIF":12.0,"publicationDate":"2024-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141322110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-06-12 DOI: 10.1038/s43588-024-00636-1
Chiheb Ben Mahmoud, John L. A. Gardner, Volker L. Deringer
As machine learning models are becoming mainstream tools for molecular and materials research, there is an urgent need to improve the nature, quality, and accessibility of atomistic data. In turn, there are opportunities for a new generation of generally applicable datasets and distillable models.
{"title":"Data as the next challenge in atomistic machine learning","authors":"Chiheb Ben Mahmoud, John L. A. Gardner, Volker L. Deringer","doi":"10.1038/s43588-024-00636-1","DOIUrl":"10.1038/s43588-024-00636-1","url":null,"abstract":"As machine learning models are becoming mainstream tools for molecular and materials research, there is an urgent need to improve the nature, quality, and accessibility of atomistic data. In turn, there are opportunities for a new generation of generally applicable datasets and distillable models.","PeriodicalId":74246,"journal":{"name":"Nature computational science","volume":null,"pages":null},"PeriodicalIF":12.0,"publicationDate":"2024-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141312454","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-06-07 DOI: 10.1038/s43588-024-00646-z
Gokul Gowri, Kuanwei Sheng, Peng Yin
Orthogonal DNA barcode library design is an essential task in bioengineering. Here we present seqwalk, an efficient method for designing barcode libraries that satisfy a sequence symmetry minimization (SSM) heuristic for orthogonality, with theoretical guarantees of maximal or near-maximal library size under certain design constraints. Seqwalk encodes SSM constraints in a de Bruijn graph representation of sequence space, enabling the application of recent advances in discrete mathematics [1] to the problem of orthogonal sequence design. We demonstrate the scalability of seqwalk by designing a library of >10^6 SSM-satisfying barcode sequences in less than 20 s on a standard laptop. Seqwalk is a scalable method for designing orthogonal DNA barcode libraries, producing one million barcodes in 20 s on a standard laptop.
{"title":"Scalable design of orthogonal DNA barcode libraries","authors":"Gokul Gowri, Kuanwei Sheng, Peng Yin","doi":"10.1038/s43588-024-00646-z","DOIUrl":"10.1038/s43588-024-00646-z","url":null,"abstract":"Orthogonal DNA barcode library design is an essential task in bioengineering. Here we present seqwalk, an efficient method for designing barcode libraries that satisfy a sequence symmetry minimization (SSM) heuristic for orthogonality, with theoretical guarantees of maximal or near-maximal library size under certain design constraints. Seqwalk encodes SSM constraints in a de Bruijn graph representation of sequence space, enabling the application of recent advances in discrete mathematics [1] to the problem of orthogonal sequence design. We demonstrate the scalability of seqwalk by designing a library of >10^6 SSM-satisfying barcode sequences in less than 20 s on a standard laptop. Seqwalk is a scalable method for designing orthogonal DNA barcode libraries, producing one million barcodes in 20 s on a standard laptop.","PeriodicalId":74246,"journal":{"name":"Nature computational science","volume":null,"pages":null},"PeriodicalIF":12.0,"publicationDate":"2024-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11208133/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141289021","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-06-07 DOI: 10.1038/s43588-024-00647-y
Yifan Yang, Fan Xu
Morphing soft matter, which is capable of changing its shape and function in response to stimuli, has wide-ranging applications in robotics, medicine and biology. Recently, computational models have accelerated its development. Here, we highlight advances and challenges in developing computational techniques, and explore the potential applications enabled by such models.
{"title":"Computational morphology and morphogenesis for empowering soft-matter engineering","authors":"Yifan Yang, Fan Xu","doi":"10.1038/s43588-024-00647-y","DOIUrl":"10.1038/s43588-024-00647-y","url":null,"abstract":"Morphing soft matter, which is capable of changing its shape and function in response to stimuli, has wide-ranging applications in robotics, medicine and biology. Recently, computational models have accelerated its development. Here, we highlight advances and challenges in developing computational techniques, and explore the potential applications enabled by such models.","PeriodicalId":74246,"journal":{"name":"Nature computational science","volume":null,"pages":null},"PeriodicalIF":12.0,"publicationDate":"2024-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141289020","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}