Pub Date: 2024-12-06; eCollection Date: 2024-12-13; DOI: 10.1016/j.patter.2024.101115
Jake Crawford, Maria Chikina, Casey S Greene
Guidelines in statistical modeling for genomics hold that simpler models have advantages over more complex ones. Potential advantages include cost, interpretability, and improved generalization across datasets or biological contexts. We directly tested the assumption that small gene signatures generalize better by examining the generalization of mutation status prediction models across datasets (from cell lines to human tumors and vice versa) and biological contexts (holding out entire cancer types from pan-cancer data). We compared two model selection strategies: selecting on cross-validation performance alone, and selecting on cross-validation performance combined with regularization strength. We did not observe that more regularized signatures generalized better. This result held across both generalization problems and for both linear models (LASSO logistic regression) and non-linear ones (neural networks). When the goal of an analysis is to produce generalizable predictive models, we recommend choosing the ones that perform best on held-out data or in cross-validation instead of those that are smaller or more regularized.
Title: Best holdout assessment is sufficient for cancer transcriptomic model selection. Journal: Patterns, volume 5, issue 12, article 101115. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11701843/pdf/
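The two selection strategies compared in this abstract can be illustrated with a minimal sketch. It assumes that "combining cross-validation performance with regularization strength" means something like the familiar one-standard-error rule (pick the most regularized model whose mean score is within one standard error of the best); that choice, the function names, and the example scores are all hypothetical and are not taken from the paper.

```python
from math import sqrt
from statistics import mean, stdev


def select_best(cv_scores):
    """Return the regularization strength with the highest mean CV score.

    cv_scores maps a regularization strength to its per-fold scores.
    """
    return max(cv_scores, key=lambda strength: mean(cv_scores[strength]))


def select_most_regularized_within_1se(cv_scores):
    """Return the largest strength whose mean score is within one standard
    error of the best mean score (an assumed heuristic, not the authors')."""
    best = select_best(cv_scores)
    best_folds = cv_scores[best]
    standard_error = stdev(best_folds) / sqrt(len(best_folds))
    threshold = mean(best_folds) - standard_error
    eligible = [s for s in cv_scores if mean(cv_scores[s]) >= threshold]
    # Assumption: a larger strength value means a more regularized model.
    return max(eligible)


# Hypothetical per-fold scores for four regularization strengths.
example_scores = {
    0.001: [0.80, 0.82, 0.81, 0.79],
    0.01: [0.83, 0.85, 0.84, 0.82],
    0.1: [0.83, 0.84, 0.83, 0.82],
    1.0: [0.70, 0.72, 0.71, 0.69],
}

print(select_best(example_scores))
print(select_most_regularized_within_1se(example_scores))
```

On these made-up scores the two strategies disagree, which is exactly the situation the study addresses; the abstract's recommendation corresponds to the first function.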
Pub Date: 2024-11-25; eCollection Date: 2024-12-13; DOI: 10.1016/j.patter.2024.101099
Charles H Martin, Ganesh Mani
This article examines the convergence of physics, chemistry, and artificial intelligence (AI), highlighted by recent Nobel Prizes. It traces the historical development of neural networks, emphasizing interdisciplinary research's role in advancing AI. The authors advocate for nurturing AI-enabled polymaths to bridge the gap between theoretical advancements and practical applications, driving progress toward artificial general intelligence (AGI).
Title: The recent Physics and Chemistry Nobel Prizes, AI, and the convergence of knowledge fields. Journal: Patterns, volume 5, issue 12, article 101099. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11701849/pdf/
Pub Date: 2024-11-19; eCollection Date: 2024-12-13; DOI: 10.1016/j.patter.2024.101097
Yimu Pan, Manas Mehta, Jeffery A Goldstein, Joseph Ngonzi, Lisa M Bebell, Drucilla J Roberts, Chrystalle Katte Carreon, Kelly Gallagher, Rachel E Walker, Alison D Gernand, James Z Wang
The placenta is vital to maternal and child health but often overlooked in pregnancy studies. Addressing the need for a more accessible and cost-effective method of placental assessment, our study introduces a computational tool designed for the analysis of placental photographs. Leveraging images and pathology reports collected from sites in the United States and Uganda over a 12-year period, we developed a cross-modal contrastive learning algorithm consisting of pre-alignment, distillation, and retrieval modules. Moreover, the proposed robustness evaluation protocol enables statistical assessment of performance improvements, provides deeper insight into the impact of different features on predictions, and offers practical guidance for its application in a variety of settings. Through extensive experimentation, our tool demonstrates an average area under the receiver operating characteristic curve score of over 82% in both internal and external validations, which underscores the potential of our tool to enhance clinical care across diverse environments.
Title: Cross-modal contrastive learning for unified placenta analysis using photographs. Journal: Patterns, volume 5, issue 12, article 101097. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11701861/pdf/
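For readers unfamiliar with the metric reported above, the area under the receiver operating characteristic curve (AUROC) equals the probability that a randomly chosen positive case is scored higher than a randomly chosen negative one, with ties counted as half. The sketch below computes it directly from that definition; the function name and the example labels and scores are hypothetical and unrelated to the study's data.

```python
def auroc(labels, scores):
    """Compute AUROC by comparing every positive/negative pair.

    labels: iterable of 0/1 ground-truth values.
    scores: iterable of model scores, higher meaning "more likely positive".
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical example: 8 of the 9 positive/negative pairs are ordered correctly.
example_labels = [1, 1, 0, 0, 1, 0]
example_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
print(auroc(example_labels, example_scores))
```

The pairwise form is quadratic in the number of samples, so it is suited to illustration rather than large evaluations.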
Pub Date: 2024-11-08; DOI: 10.1016/j.patter.2024.101095
So Nakagawa, Shoichi Sakaguchi
Hou and He et al. [1] developed a new RNA virus identification tool named LucaProt, a transformer-based bioinformatics tool that uses sequence and structural characteristics of RNA-dependent RNA polymerases (RdRPs), which are essential for almost all RNA viruses. LucaProt can identify RdRPs from highly diverse RNA viruses, unveiling the hidden RNA virosphere.
Title: Exploring the hidden world of RNA viruses with a transformer-based tool. Journal: Patterns, volume 5, issue 11, article 101095. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11573883/pdf/
Pub Date: 2024-11-08; DOI: 10.1016/j.patter.2024.101094
James Z Wang, Brad Wyble
In this opinion piece, the authors, from the fields of artificial intelligence (AI) and psychology, reflect on how the foundational discoveries of Nobel laureates Hopfield and Hinton have influenced their research. They also discuss emerging directions in AI and the challenges that lie ahead for neural networks and machine learning.
Title: Hopfield and Hinton's neural network revolution and the future of AI. Journal: Patterns, volume 5, issue 11, article 101094. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11573896/pdf/
Pub Date: 2024-11-08; DOI: 10.1016/j.patter.2024.101096
Hyunghoon Cho
The possibility that single-cell gene expression datasets could leak information about individuals' genotypes has been largely unexplored. Walker et al. showed that even noisy genotype predictions derived from these data can be linked to the corresponding genotype profiles with significant accuracy.
Title: Privacy of single-cell gene expression data. Journal: Patterns, volume 5, issue 11, article 101096. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11573887/pdf/
Pub Date: 2024-11-08; DOI: 10.1016/j.patter.2024.101092
Kelly Widdicks, Faiza Samreen, Gordon S Blair, Susannah Rennie, John Watkins
Digital research infrastructure (DRI) for environmental science requires significant transformation to support the changing nature of science and utilize digital innovations. Numerous challenges prevent this change yet simultaneously pose exciting principles to drive the future of DRI. This opinion piece details a multi-dimensional approach toward these futures for the environmental community.
Title: A multi-dimensional approach to the future of digital research infrastructure for systemic environmental science. Journal: Patterns, volume 5, issue 11, article 101092. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11573892/pdf/
Pub Date: 2024-11-08; DOI: 10.1016/j.patter.2024.101078
Arthur Gwagwa, Warmhold Jan Thomas Mollema
As the geopolitical superpowers race to regulate the digital realm, their divergent rights-centered, market-driven, and social-control-based approaches require a global compact on digital regulation. If diverse regulatory jurisdictions remain, forms of domination entailed by cultural imposition and hermeneutical injustice related to AI legislation and AI systems will follow. We argue for consensual regulation on shared substantive issues, accompanied by proper standardization and coordination. Failure to attain consensus will fragment global digital regulation, enable regulatory capture by authoritarian powers or bad corporate actors, and deepen the historical geopolitical power asymmetries between the global South and the global North. To prevent an unjust regulatory landscape where the global South's cultural and hermeneutic resources are absent, two principles for the Global Digital Compact to counter these prospective harms are proposed and discussed: (1) "recognitive consensus on key substantive benefits and harms" and (2) "procedural consensus on global coordination and essential standards."
Title: How could the United Nations Global Digital Compact prevent cultural imposition and hermeneutical injustice? Journal: Patterns, volume 5, issue 11, article 101078. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11573906/pdf/
Pub Date: 2024-11-08; DOI: 10.1016/j.patter.2024.101080
Angelina Wang, Aaron Hertzmann, Olga Russakovsky
Benchmarks and leaderboards are commonly used to track the fairness impacts of artificial intelligence (AI) models. Many critics argue against this practice, since it incentivizes optimizing for metrics in an attempt to build the "most fair" AI model. However, this is an inherently impossible task since different applications have different considerations. While we agree with the critiques against leaderboards, we believe that the use of benchmarks can be reformed. Thus far, the critiques of leaderboards and benchmarks have become unhelpfully entangled. However, benchmarks, when not used for leaderboards, offer important tools for understanding a model. We advocate for collecting benchmarks into carefully curated "benchmark suites," which can provide researchers and practitioners with tools for understanding the wide range of potential harms and trade-offs among different aspects of fairness. We describe the research needed to build these benchmark suites so that they can better assess different usage modalities, cover potential harms, and reflect diverse perspectives. By moving away from leaderboards and instead thoughtfully designing and compiling benchmark suites, we can better monitor and improve the fairness impacts of AI technology.
Title: Benchmark suites instead of leaderboards for evaluating AI fairness. Journal: Patterns, volume 5, issue 11, article 101080. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11573903/pdf/
Pub Date: 2024-11-08; DOI: 10.1016/j.patter.2024.101077
Inken Hagestedt, Ian Hales, Eric Boernert, Holger R Roth, Michael A Hoeh, Robin Röhm, Ellie Dobson, José Tomás Prieto
We discuss the real-world application of federated learning (FL) in the healthcare and life sciences industry, noting a tipping point in its adoption beyond academia. Sharing our experiences with multi-hospital and multi-pharma collaborations, we highlight the importance of involving key stakeholders to develop production-grade FL solutions that are fully compliant with stringent privacy and security standards.
Title: Toward a tipping point in federated learning in healthcare and life sciences. Journal: Patterns, volume 5, issue 11, article 101077. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11573894/pdf/