Annual Review of Biomedical Data Science: Latest Articles

Data Science in the Food Industry.
IF 6 Pub Date: 2021-07-20 Epub Date: 2021-05-13 DOI: 10.1146/annurev-biodatasci-020221-123602
George-John Nychas, Emma Sims, Panagiotis Tsakanikas, Fady Mohareb

Food safety is one of the main challenges of the agri-food industry that is expected to be addressed in the current environment of tremendous technological progress, where consumers' lifestyles and preferences are in a constant state of flux. Food chain transparency and trust are drivers for food integrity control and for improvements in efficiency and economic growth. Similarly, the circular economy has great potential to reduce wastage and improve the efficiency of operations in multi-stakeholder ecosystems. Throughout the food chain cycle, all food commodities are exposed to multiple hazards, resulting in a high likelihood of contamination. Such biological or chemical hazards may be naturally present at any stage of food production, whether accidentally introduced or fraudulently imposed, risking consumers' health and their faith in the food industry. Nowadays, a massive amount of data is generated, not only from the next generation of food safety monitoring systems and along the entire food chain (primary production included) but also from the Internet of things, media, and other devices. These data should be used for the benefit of society, and the scientific field of data science should be a vital player in helping to make this possible.

Citations: 15
Metatranscriptomics for the Human Microbiome and Microbial Community Functional Profiling.
IF 6 Pub Date: 2021-07-20 Epub Date: 2021-05-13 DOI: 10.1146/annurev-biodatasci-031121-103035
Yancong Zhang, Kelsey N Thompson, Tobyn Branck, Yan Yan, Long H Nguyen, Eric A Franzosa, Curtis Huttenhower

Shotgun metatranscriptomics (MTX) is an increasingly practical way to survey microbial community gene function and regulation at scale. This review begins by summarizing the motivations for community transcriptomics and the history of the field. We then explore the principles, best practices, and challenges of contemporary MTX workflows: beginning with laboratory methods for isolation and sequencing of community RNA, followed by informatics methods for quantifying RNA features, and finally statistical methods for detecting differential expression in a community context. In the second half of the review, we survey important biological findings from the MTX literature, drawing examples from the human microbiome, other (nonhuman) host-associated microbiomes, and the environment. Across these examples, MTX methods prove invaluable for probing microbe-microbe and host-microbe interactions, the dynamics of energy harvest and chemical cycling, and responses to environmental stresses. We conclude with a review of open challenges in the MTX field, including making assays and analyses more robust, accessible, and adaptable to new technologies; deciphering roles for millions of uncharacterized microbial transcripts; and solving applied problems such as biomarker discovery and development of microbial therapeutics.

Citations: 25
Phenotyping Neurodegeneration in Human iPSCs.
IF 6 Pub Date: 2021-07-20 Epub Date: 2021-04-23 DOI: 10.1146/annurev-biodatasci-092820-025214
Jonathan Li, Ernest Fraenkel

Induced pluripotent stem cell (iPSC) technology holds promise for modeling neurodegenerative diseases. Traditional approaches for disease modeling using animal and cellular models require knowledge of disease mutations. However, many patients with neurodegenerative diseases do not have a known genetic cause. iPSCs offer a way to generate patient-specific models and study pathways of dysfunction in an in vitro setting in order to understand the causes and subtypes of neurodegeneration. Furthermore, iPSC-based models can be used to search for candidate therapeutics using high-throughput screening. Here we review how iPSC-based models are currently being used to further our understanding of neurodegenerative diseases, as well as discuss their challenges and future directions.

Citations: 3
Probabilistic Machine Learning for Healthcare.
IF 6 Pub Date: 2021-07-20 Epub Date: 2021-06-01 DOI: 10.1146/annurev-biodatasci-092820-033938
Irene Y Chen, Shalmali Joshi, Marzyeh Ghassemi, Rajesh Ranganath

Machine learning can be used to make sense of healthcare data. Probabilistic machine learning models help provide a complete picture of observed data in healthcare. In this review, we examine how probabilistic machine learning can advance healthcare. We consider challenges in the predictive model building pipeline where probabilistic models can be beneficial, including calibration and missing data. Beyond predictive models, we also investigate the utility of probabilistic machine learning models in phenotyping, in generative models for clinical use cases, and in reinforcement learning.

Citations: 33
The Exposome in the Era of the Quantified Self.
IF 6 Pub Date: 2021-07-20 DOI: 10.1146/annurev-biodatasci-012721-122807
Xinyue Zhang, Peng Gao, Michael P Snyder

Human health is regulated by complex interactions among the genome, the microbiome, and the environment. While extensive research has been conducted on the human genome and microbiome, little is known about the human exposome. The exposome comprises the totality of chemical, biological, and physical exposures that individuals encounter over their lifetimes. Traditional environmental and biological monitoring only targets specific substances, whereas exposomic approaches identify and quantify thousands of substances simultaneously using nontargeted high-throughput and high-resolution analyses. The quantified self (QS) aims at enhancing our understanding of human health and disease through self-tracking. QS measurements are critical in exposome research, as external exposures impact an individual's health, behavior, and biology. This review discusses both the achievements and the shortcomings of current research and methodologies on the QS and the exposome and proposes future research directions.

Citations: 10
Mutational Signatures: From Methods to Mechanisms.
IF 6 Pub Date: 2021-07-20 Epub Date: 2021-05-11 DOI: 10.1146/annurev-biodatasci-122320-120920
Yoo-Ah Kim, Mark D M Leiserson, Priya Moorjani, Roded Sharan, Damian Wojtowicz, Teresa M Przytycka

Mutations are the driving force of evolution, yet they underlie many diseases, in particular, cancer. They are thought to arise from a combination of stochastic errors in DNA processing, naturally occurring DNA damage (e.g., the spontaneous deamination of methylated CpG sites), replication errors, and dysregulation of DNA repair mechanisms. High-throughput sequencing has made it possible to generate large datasets to study mutational processes in health and disease. Since the emergence of the first mutational process studies in 2012, this field has gained increasing attention and has already accumulated a host of computational approaches and biomedical applications.

Citations: 17
Perspectives on Allele-Specific Expression.
IF 6 Pub Date: 2021-07-20 Epub Date: 2021-04-28 DOI: 10.1146/annurev-biodatasci-021621-122219
Siobhan Cleary, Cathal Seoighe

Diploidy has profound implications for population genetics and susceptibility to genetic diseases. Although two copies are present for most genes in the human genome, they are not necessarily both active or active at the same level in a given individual. Genomic imprinting, resulting in exclusive or biased expression in favor of the allele of paternal or maternal origin, is now believed to affect hundreds of human genes. A far greater number of genes display unequal expression of gene copies due to cis-acting genetic variants that perturb gene expression. The availability of data generated by RNA sequencing applied to large numbers of individuals and tissue types has generated unprecedented opportunities to assess the contribution of genetic variation to allelic imbalance in gene expression. Here we review the insights gained through the analysis of these data about the extent of the genetic contribution to allelic expression imbalance, the tools and statistical models for gene expression imbalance, and what the results obtained reveal about the contribution of genetic variants that alter gene expression to complex human diseases and phenotypes.

Citations: 12
Using Phecodes for Research with the Electronic Health Record: From PheWAS to PheRS.
IF 7 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2021-07-20 DOI: 10.1146/annurev-biodatasci-122320-112352
Lisa Bastarache

Electronic health records (EHRs) are a rich source of data for researchers, but extracting meaningful information out of this highly complex data source is challenging. Phecodes represent one strategy for defining phenotypes for research using EHR data. They are a high-throughput phenotyping tool based on ICD (International Classification of Diseases) codes that can be used to rapidly define the case/control status of thousands of clinically meaningful diseases and conditions. Phecodes were originally developed to conduct phenome-wide association studies to scan for phenotypic associations with common genetic variants. Since then, phecodes have been used to support a wide range of EHR-based phenotyping methods, including the phenotype risk score. This review aims to comprehensively describe the development, validation, and applications of phecodes and suggest some future directions for phecodes and high-throughput phenotyping.

Citations: 0
Illuminating the Virosphere Through Global Metagenomics.
IF 6 Pub Date: 2021-07-20 DOI: 10.1146/annurev-biodatasci-012221-095114
Lee Call, Stephen Nayfach, Nikos C Kyrpides

Viruses are the most abundant biological entity on Earth, infect cellular organisms from all domains of life, and are central players in the global biosphere. Over the last century, the discovery and characterization of viruses have progressed steadily alongside much of modern biology. In terms of outright numbers of novel viruses discovered, however, the last few years have been by far the most transformative for the field. Advances in methods for identifying viral sequences in genomic and metagenomic datasets, coupled to the exponential growth of environmental sequencing, have greatly expanded the catalog of known viruses and fueled the tremendous growth of viral sequence databases. Development and implementation of new standards, along with careful study of the newly discovered viruses, have transformed and will continue to transform our understanding of microbial evolution, ecology, and biogeochemical cycles, leading to new biotechnological innovations across many diverse fields, including environmental, agricultural, and biomedical sciences.

Citations: 13
Ethical Machine Learning in Healthcare.
IF 7 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2021-07-01 Epub Date: 2021-05-06 DOI: 10.1146/annurev-biodatasci-092820-114757
Irene Y Chen, Emma Pierson, Sherri Rose, Shalmali Joshi, Kadija Ferryman, Marzyeh Ghassemi

The use of machine learning (ML) in healthcare raises numerous ethical concerns, especially as models can amplify existing health inequities. Here, we outline ethical considerations for equitable ML in the advancement of healthcare. Specifically, we frame ethics of ML in healthcare through the lens of social justice. We describe ongoing efforts and outline challenges in a proposed pipeline of ethical ML in health, ranging from problem selection to postdeployment considerations. We close by summarizing recommendations to address these challenges.

Citations: 0