African populations are diverse in their ethnicity, language, culture, and genetics. Although plagued by high disease burdens, until recently the continent has largely been excluded from biomedical studies. Along with limitations in research and clinical infrastructure, human capacity, and funding, this omission has resulted in an underrepresentation of African data and disadvantaged African scientists. This review interrogates the relative abundance of biomedical data from Africa, primarily in genomics and other omics. The visibility of African science through publications is also discussed. A challenge encountered in this review is the relative lack of annotation of data on their geographical or population origin, with African countries represented as a single group. In addition to the abovementioned limitations, the global representation of African data may also be attributed to the hesitation to deposit data in public repositories. Whatever the reason, the disparity should be addressed, as African data have enormous value for scientists in Africa and globally.
African Global Representation in Biomedical Sciences. Nicola Mulder, Lyndon Zass, Yosr Hamdi, Houcemeddine Othman, Sumir Panji, Imane Allali, Yasmina Jaufeerally Fakim. Annual Review of Biomedical Data Science, pp. 57-81. Pub Date: 2021-07-20. DOI: 10.1146/annurev-biodatasci-102920-112550
Pub Date: 2021-07-20; Epub Date: 2021-06-01; DOI: 10.1146/annurev-biodatasci-110920-093120
Tracey Holloway, Daegan Miller, Susan Anenberg, Minghui Diao, Bryan Duncan, Arlene M Fiore, Daven K Henze, Jeremy Hess, Patrick L Kinney, Yang Liu, Jessica L Neu, Susan M O'Neill, M Talat Odman, R Bradley Pierce, Armistead G Russell, Daniel Tong, J Jason West, Mark A Zondlo
Data from satellite instruments provide estimates of gas and particle levels relevant to human health, even pollutants invisible to the human eye. However, the successful interpretation of satellite data requires an understanding of how satellites relate to other data sources, as well as factors affecting their application to health challenges. Drawing from the expertise and experience of the 2016-2020 NASA HAQAST (Health and Air Quality Applied Sciences Team), we present a review of satellite data for air quality and health applications. We include a discussion of satellite data for epidemiological studies and health impact assessments, as well as the use of satellite data to evaluate air quality trends, support air quality regulation, characterize smoke from wildfires, and quantify emission sources. The primary advantage of satellite data compared to in situ measurements, e.g., from air quality monitoring stations, is their spatial coverage. Satellite data can reveal where pollution levels are highest around the world, how levels have changed over daily to decadal periods, and where pollutants are transported from urban to global scales. To date, air quality and health applications have primarily utilized satellite observations and satellite-derived products relevant to near-surface particulate matter <2.5 μm in diameter (PM2.5) and nitrogen dioxide (NO2). Health and air quality communities have grown increasingly engaged in the use of satellite data, and this trend is expected to continue. From health researchers to air quality managers, and from global applications to community impacts, satellite data are transforming the way air pollution exposure is evaluated.
Satellite Monitoring for Air Quality and Health. Annual Review of Biomedical Data Science, pp. 417-447.
Pub Date: 2021-07-20; DOI: 10.1146/annurev-biodatasci-030221-125715
Sandra Soo-Jin Lee
The collection and use of human genetic data raise important ethical questions about how to balance individual autonomy and privacy with the potential for public good. The proliferation of local, national, and international efforts to collect genetic data and create linkages to support large-scale initiatives in precision medicine and the learning health system creates new demands for broad data sharing that involve managing competing interests and careful consideration of what constitutes appropriate ethical trade-offs. This review describes these emerging ethical issues with a focus on approaches to consent and issues related to justice in the shifting genomic research ecosystem.
The Ethics of Consent in a Shifting Genomic Ecosystem. Annual Review of Biomedical Data Science, pp. 145-164. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8683157/pdf/nihms-1760354.pdf
Pub Date: 2021-07-20; Epub Date: 2021-05-14; DOI: 10.1146/annurev-biodatasci-021821-061045
Qingyu Chen, Robert Leaman, Alexis Allot, Ling Luo, Chih-Hsuan Wei, Shankai Yan, Zhiyong Lu
The COVID-19 (coronavirus disease 2019) pandemic has had a significant impact on society, both because of the serious health effects of COVID-19 and because of public health measures implemented to slow its spread. Many of these difficulties are fundamentally information needs; attempts to address these needs have caused an information overload for both researchers and the public. Natural language processing (NLP), the branch of artificial intelligence that interprets human language, can be applied to address many of the information needs made urgent by the COVID-19 pandemic. This review surveys approximately 150 NLP studies and more than 50 systems and datasets addressing the COVID-19 pandemic. We detail work on four core NLP tasks: information retrieval, named entity recognition, literature-based discovery, and question answering. We also describe work that directly addresses aspects of the pandemic through four additional tasks: topic modeling, sentiment and emotion analysis, caseload forecasting, and misinformation detection. We conclude by discussing observable trends and remaining challenges.
Artificial Intelligence in Action: Addressing the COVID-19 Pandemic with Natural Language Processing. Annual Review of Biomedical Data Science, pp. 313-339.
Pub Date: 2021-07-20; Epub Date: 2021-05-13; DOI: 10.1146/annurev-biodatasci-020221-123602
George-John Nychas, Emma Sims, Panagiotis Tsakanikas, Fady Mohareb
Food safety is one of the main challenges of the agri-food industry that is expected to be addressed in the current environment of tremendous technological progress, where consumers' lifestyles and preferences are in a constant state of flux. Food chain transparency and trust are drivers for food integrity control and for improvements in efficiency and economic growth. Similarly, the circular economy has great potential to reduce wastage and improve the efficiency of operations in multi-stakeholder ecosystems. Throughout the food chain cycle, all food commodities are exposed to multiple hazards, resulting in a high likelihood of contamination. Such biological or chemical hazards may be naturally present at any stage of food production, whether accidentally introduced or fraudulently imposed, risking consumers' health and their faith in the food industry. Nowadays, a massive amount of data is generated, not only from the next generation of food safety monitoring systems and along the entire food chain (primary production included) but also from the Internet of things, media, and other devices. These data should be used for the benefit of society, and the scientific field of data science should be a vital player in helping to make this possible.
Data Science in the Food Industry. Annual Review of Biomedical Data Science, pp. 341-367.
Pub Date: 2021-07-20; Epub Date: 2021-05-13; DOI: 10.1146/annurev-biodatasci-031121-103035
Yancong Zhang, Kelsey N Thompson, Tobyn Branck, Yan Yan, Long H Nguyen, Eric A Franzosa, Curtis Huttenhower
Shotgun metatranscriptomics (MTX) is an increasingly practical way to survey microbial community gene function and regulation at scale. This review begins by summarizing the motivations for community transcriptomics and the history of the field. We then explore the principles, best practices, and challenges of contemporary MTX workflows: beginning with laboratory methods for isolation and sequencing of community RNA, followed by informatics methods for quantifying RNA features, and finally statistical methods for detecting differential expression in a community context. In the second half of the review, we survey important biological findings from the MTX literature, drawing examples from the human microbiome, other (nonhuman) host-associated microbiomes, and the environment. Across these examples, MTX methods prove invaluable for probing microbe-microbe and host-microbe interactions, the dynamics of energy harvest and chemical cycling, and responses to environmental stresses. We conclude with a review of open challenges in the MTX field, including making assays and analyses more robust, accessible, and adaptable to new technologies; deciphering roles for millions of uncharacterized microbial transcripts; and solving applied problems such as biomarker discovery and development of microbial therapeutics.
Metatranscriptomics for the Human Microbiome and Microbial Community Functional Profiling. Annual Review of Biomedical Data Science, pp. 279-311.
Pub Date: 2021-07-20; Epub Date: 2021-04-23; DOI: 10.1146/annurev-biodatasci-092820-025214
Jonathan Li, Ernest Fraenkel
Induced pluripotent stem cell (iPSC) technology holds promise for modeling neurodegenerative diseases. Traditional approaches for disease modeling using animal and cellular models require knowledge of disease mutations. However, many patients with neurodegenerative diseases do not have a known genetic cause. iPSCs offer a way to generate patient-specific models and study pathways of dysfunction in an in vitro setting in order to understand the causes and subtypes of neurodegeneration. Furthermore, iPSC-based models can be used to search for candidate therapeutics using high-throughput screening. Here we review how iPSC-based models are currently being used to further our understanding of neurodegenerative diseases, as well as discuss their challenges and future directions.
Phenotyping Neurodegeneration in Human iPSCs. Annual Review of Biomedical Data Science, pp. 83-100. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9237961/pdf/nihms-1816934.pdf
Pub Date: 2021-07-20; DOI: 10.1146/annurev-biodatasci-012721-122807
Xinyue Zhang, Peng Gao, Michael P Snyder
Human health is regulated by complex interactions among the genome, the microbiome, and the environment. While extensive research has been conducted on the human genome and microbiome, little is known about the human exposome. The exposome comprises the totality of chemical, biological, and physical exposures that individuals encounter over their lifetimes. Traditional environmental and biological monitoring only targets specific substances, whereas exposomic approaches identify and quantify thousands of substances simultaneously using nontargeted high-throughput and high-resolution analyses. The quantified self (QS) aims at enhancing our understanding of human health and disease through self-tracking. QS measurements are critical in exposome research, as external exposures impact an individual's health, behavior, and biology. This review discusses both the achievements and the shortcomings of current research and methodologies on the QS and the exposome and proposes future research directions.
The Exposome in the Era of the Quantified Self. Annual Review of Biomedical Data Science, pp. 255-277.
Pub Date: 2021-07-20; Epub Date: 2021-06-01; DOI: 10.1146/annurev-biodatasci-092820-033938
Irene Y Chen, Shalmali Joshi, Marzyeh Ghassemi, Rajesh Ranganath
Machine learning can be used to make sense of healthcare data. Probabilistic machine learning models help provide a complete picture of observed data in healthcare. In this review, we examine how probabilistic machine learning can advance healthcare. We consider challenges in the predictive model building pipeline where probabilistic models can be beneficial, including calibration and missing data. Beyond predictive models, we also investigate the utility of probabilistic machine learning models in phenotyping, in generative models for clinical use cases, and in reinforcement learning.
Probabilistic Machine Learning for Healthcare. Annual Review of Biomedical Data Science, pp. 393-415.
Pub Date: 2021-07-20; Epub Date: 2021-05-11; DOI: 10.1146/annurev-biodatasci-122320-120920
Yoo-Ah Kim, Mark D M Leiserson, Priya Moorjani, Roded Sharan, Damian Wojtowicz, Teresa M Przytycka
Mutations are the driving force of evolution, yet they underlie many diseases, in particular, cancer. They are thought to arise from a combination of stochastic errors in DNA processing, naturally occurring DNA damage (e.g., the spontaneous deamination of methylated CpG sites), replication errors, and dysregulation of DNA repair mechanisms. High-throughput sequencing has made it possible to generate large datasets to study mutational processes in health and disease. Since the emergence of the first mutational process studies in 2012, this field has gained increasing attention and has already accumulated a host of computational approaches and biomedical applications.
Mutational Signatures: From Methods to Mechanisms. Annual Review of Biomedical Data Science, pp. 189-206. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12232997/pdf/