Multimodal large language models have been recognized as a historic milestone in artificial intelligence and have demonstrated revolutionary potential not only in commercial applications but also in many scientific fields. Here we give a brief overview of multimodal large language models through the lens of bioimage analysis and discuss how we could build these models as a community to facilitate biology research.
Title: Multimodal large language models for bioimage analysis. Authors: Shanghang Zhang, Gaole Dai, Tiejun Huang, Jianxu Chen. Journal: Nature Methods. Pub Date: 2024-08-09. DOI: 10.1038/s41592-024-02334-2
Pub Date: 2024-08-09. DOI: 10.1038/s41592-024-02369-5
Vivien Marx
As scientists avidly use, tinker and build with artificial intelligence tools, best practices begin to emerge.
Title: Quest for AI literacy. Journal: Nature Methods. PDF: https://www.nature.com/articles/s41592-024-02369-5.pdf
Pub Date: 2024-08-09. DOI: 10.1038/s41592-024-02351-1
Eloise Berson, Philip Chung, Camilo Espinosa, Thomas J. Montine, Nima Aghaeepour
Advancements in artificial intelligence (AI) have led to unprecedented success in modeling technically challenging domains including language, audio, image and video understanding. Here we discuss the opportunities represented by recent AI methods to advance immunology research.
Title: Unlocking human immune system complexity through AI. Journal: Nature Methods.
Pub Date: 2024-08-09. DOI: 10.1038/s41592-024-02362-y
Judith Bernett, David B. Blumenthal, Dominik G. Grimm, Florian Haselbeck, Roman Joeres, Olga V. Kalinina, Markus List
Machine learning methods for extracting patterns from high-dimensional data are very important in the biological sciences. However, in certain cases, real-world applications cannot confirm the reported prediction performance. One of the main reasons for this is data leakage, which can be seen as the illicit sharing of information between the training data and the test data, resulting in performance estimates that are far better than the performance observed in the intended application scenario. Data leakage can be difficult to detect in biological datasets due to their complex dependencies. With this in mind, we present seven questions that should be asked to prevent data leakage when constructing machine learning models in biological domains. We illustrate the usefulness of our questions by applying them to nontrivial examples. Our goal is to raise awareness of potential data leakage problems and to promote robust and reproducible machine learning-based research in biology. This Perspective discusses the issue of data leakage in machine learning-based models and presents seven questions designed to identify and avoid the problems resulting from data leakage.
Title: Guiding questions to avoid data leakage in biological machine learning applications. Journal: Nature Methods.
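The data leakage described in the abstract above can be illustrated with a minimal sketch. This is not the authors' method and does not reproduce their seven questions; it only shows one common safeguard, assuming each sample carries a label for a group of related samples (for example a sequence cluster or a patient). The function names `group_split` and `shared_groups` and the `famA`...`famD` labels are hypothetical.

```python
import random


def group_split(groups, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) so that no group appears on both sides."""
    unique = sorted(set(groups))
    rng = random.Random(seed)
    rng.shuffle(unique)
    # Hold out whole groups, never individual samples.
    n_test = max(1, round(len(unique) * test_fraction))
    test_groups = set(unique[:n_test])
    train_idx = [i for i, g in enumerate(groups) if g not in test_groups]
    test_idx = [i for i, g in enumerate(groups) if g in test_groups]
    return train_idx, test_idx


def shared_groups(groups, train_idx, test_idx):
    """Groups present in both splits; a non-empty result signals possible leakage."""
    return {groups[i] for i in train_idx} & {groups[i] for i in test_idx}


# Hypothetical data: eight samples drawn from four clusters of related samples.
groups = ["famA", "famA", "famB", "famB", "famC", "famC", "famD", "famD"]

# A naive alternating split places every cluster on both sides.
naive_train, naive_test = list(range(0, 8, 2)), list(range(1, 8, 2))

# A group-aware split keeps each cluster entirely in train or entirely in test.
train_idx, test_idx = group_split(groups)

print("naive split shares:", sorted(shared_groups(groups, naive_train, naive_test)))
print("group split shares:", sorted(shared_groups(groups, train_idx, test_idx)))
```

Splitting by group rather than by sample is only one safeguard; which dependencies define a group is a domain decision that this sketch leaves open.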
Pub Date: 2024-08-09. DOI: 10.1038/s41592-024-02338-y
Omar O. Abudayyeh, Jonathan S. Gootenberg
Artificial intelligence-enabled computational tools not only help us to elucidate biological processes but also facilitate the programming of biology through molecular and cellular engineering.
Title: Programmable biology through artificial intelligence: from nucleic acids to proteins to cells. Journal: Nature Methods.
Pub Date: 2024-08-09. DOI: 10.1038/s41592-024-02353-z
Artur Szałata, Karin Hrovatin, Sören Becker, Alejandro Tejada-Lapuerta, Haotian Cui, Bo Wang, Fabian J. Theis
Recent efforts to construct reference maps of cellular phenotypes have expanded the volume and diversity of single-cell omics data, providing an unprecedented resource for studying cell properties. Despite the availability of rich datasets and their continued growth, current single-cell models are unable to fully capitalize on the information they contain. Transformers have become the architecture of choice for foundation models in other domains owing to their ability to generalize to heterogeneous, large-scale datasets. Thus, the question arises of whether transformers could set off a similar shift in the field of single-cell modeling. Here we first describe the transformer architecture and its single-cell adaptations and then present a comprehensive review of the existing applications of transformers in single-cell analysis and critically discuss their future potential for single-cell biology. By studying limitations and technical challenges, we aim to provide a structured outlook for future research directions at the intersection of machine learning and single-cell biology. This Perspective presents a comprehensive and in-depth overview of computational models based on the deep learning architecture of transformers for single-cell omics analysis.
Title: Transformers in single-cell omics: a review and new perspectives. Journal: Nature Methods.
Pub Date: 2024-08-09. DOI: 10.1038/s41592-024-02331-5
Alexander Sasse, Maria Chikina, Sara Mostafavi
By exploiting recent advances in modern artificial intelligence and large-scale functional genomic datasets, sequence-to-function models learn the relationship between genomic DNA and its multilayer gene regulatory functions. These models are poised to uncover mechanistic relationships across layers of cellular biology, which will transform our understanding of cis gene regulation and open new avenues for discovering disease mechanisms.
Title: Unlocking gene regulation with sequence-to-function models. Journal: Nature Methods.
Pub Date: 2024-08-09. DOI: 10.1038/s41592-024-02324-4
Benjamin M. Gyori, Olga Vitek
Mass spectrometry-based proteomics provides broad and quantitative detection of the proteome, but its results are mostly presented as protein lists. Artificial intelligence approaches will exploit prior knowledge from literature and harmonize fragmented datasets to enable mechanistic and functional interpretation of proteomics experiments.
Title: Beyond protein lists: AI-assisted interpretation of proteomic investigations in the context of evolving scientific knowledge. Journal: Nature Methods.
Pub Date: 2024-08-09. DOI: 10.1038/s41592-024-02388-2
Nina Vogt
Virtual models of rats can be used to simulate behaviors and gain insights into the underlying neural activity.
Title: Studying naturalistic behavior virtually. Journal: Nature Methods.