
Annual Review of Biomedical Data Science: Latest Articles

Single-Cell Multiomics.
IF 6 Pub Date : 2023-08-10 Epub Date: 2023-05-09 DOI: 10.1146/annurev-biodatasci-020422-050645
Emily Flynn, Ana Almonte-Loya, Gabriela K Fragiadakis

Single-cell RNA sequencing methods have led to improved understanding of the heterogeneity and transcriptomic states present in complex biological systems. Recently, the development of novel single-cell technologies for assaying additional modalities, specifically genomic, epigenomic, proteomic, and spatial data, allows for unprecedented insight into cellular biology. While certain technologies collect multiple measurements from the same cells simultaneously, even when modalities are separately assayed in different cells, we can apply novel computational methods to integrate these data. The application of computational integration methods to multimodal paired and unpaired data results in rich information about the identities of the cells present and the interactions between different levels of biology, such as between genetic variation and transcription. In this review, we both discuss the single-cell technologies for measuring these modalities and describe and characterize a variety of computational integration methods for combining the resulting data to leverage multimodal information toward greater biological insight.

Citations: 0
A Review of and Roadmap for Data Science and Machine Learning for the Neuropsychiatric Phenotype of Autism
IF 6 Pub Date : 2023-03-07 DOI: 10.48550/arXiv.2303.03577
Peter Washington, D. Wall
Autism spectrum disorder (autism) is a neurodevelopmental delay that affects at least 1 in 44 children. Like many neurological disorder phenotypes, the diagnostic features are observable, can be tracked over time, and can be managed or even eliminated through proper therapy and treatments. However, there are major bottlenecks in the diagnostic, therapeutic, and longitudinal tracking pipelines for autism and related neurodevelopmental delays, creating an opportunity for novel data science solutions to augment and transform existing workflows and provide increased access to services for affected families. Several efforts previously conducted by a multitude of research labs have spawned great progress toward improved digital diagnostics and digital therapies for children with autism. We review the literature on digital health methods for autism behavior quantification and beneficial therapies using data science. We describe both case-control studies and classification systems for digital phenotyping. We then discuss digital diagnostics and therapeutics that integrate machine learning models of autism-related behaviors, including the factors that must be addressed for translational use. Finally, we describe ongoing challenges and potential opportunities for the field of autism data science. Given the heterogeneous nature of autism and the complexities of the relevant behaviors, this review contains insights that are relevant to neurological behavior analysis and digital psychiatry more broadly. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 6 is August 2023. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Citations: 12
Challenges and Opportunities for Developing More Generalizable Polygenic Risk Scores.
IF 7 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2022-08-10 Epub Date: 2022-05-16 DOI: 10.1146/annurev-biodatasci-111721-074830
Ying Wang, Kristin Tsuo, Masahiro Kanai, Benjamin M Neale, Alicia R Martin

Polygenic risk scores (PRS) estimate an individual's genetic likelihood of complex traits and diseases by aggregating information across multiple genetic variants identified from genome-wide association studies. PRS can predict a broad spectrum of diseases and have therefore been widely used in research settings. Some work has investigated their potential applications as biomarkers in preventative medicine, but significant work is still needed to definitively establish and communicate absolute risk to patients for genetic and modifiable risk factors across demographic groups. However, the biggest limitation of PRS currently is that they show poor generalizability across diverse ancestries and cohorts. Major efforts are underway through methodological development and data generation initiatives to improve their generalizability. This review aims to comprehensively discuss current progress on the development of PRS, the factors that affect their generalizability, and promising areas for improving their accuracy, portability, and implementation.
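
As a rough illustration of the aggregation this abstract describes, the sketch below assumes the common additive form in which a score is the sum of an individual's risk-allele dosages, each weighted by a GWAS effect size. The function name, the three variants, and the weights are hypothetical and are not taken from the review.

```python
from typing import Sequence


def polygenic_risk_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Return the weighted sum of risk-allele dosages.

    dosages: copies of the risk allele per variant (typically 0, 1, or 2).
    weights: per-variant effect sizes, e.g. GWAS log odds ratios (assumed).
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical example: three variants with made-up effect sizes.
example_weights = [0.12, -0.05, 0.30]
example_dosages = [2, 1, 0]
example_score = polygenic_risk_score(example_dosages, example_weights)
```

A raw score like this is only meaningful relative to a reference distribution, and because the weights come from a particular GWAS cohort, the same weights may transfer poorly to other ancestries, which is the generalizability problem the review addresses.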

Citations: 0
Genome Privacy and Trust.
IF 6 Pub Date : 2022-08-10 DOI: 10.1146/annurev-biodatasci-122120-021311
Gamze Gürsoy

Genomics data are important for advancing biomedical research, improving clinical care, and informing other disciplines such as forensics and genealogy. However, privacy concerns arise when genomic data are shared. In particular, the identifying nature of genetic information, its direct relationship to health status, and the potential financial harm and stigmatization posed to individuals and their blood relatives call for a survey of the privacy issues related to sharing genetic and related data and potential solutions to overcome these issues. In this work, we provide an overview of the importance of genomic privacy, the information gleaned from genomics data, the sources of potential private information leakages in genomics, and ways to preserve privacy while utilizing the genetic information in research. We discuss the relationship between trust in the scientific community and protecting privacy, illuminating a future roadmap for data sharing and study participation.

Citations: 1
Importance of Including Non-European Populations in Large Human Genetic Studies to Enhance Precision Medicine.
IF 6 Pub Date : 2022-08-10 Epub Date: 2022-05-16 DOI: 10.1146/annurev-biodatasci-122220-112550
Dan Ju, Daniel Hui, Dorothy A Hammond, Ambroise Wonkam, Sarah A Tishkoff

One goal of genomic medicine is to uncover an individual's genetic risk for disease, which generally requires data connecting genotype to phenotype, as done in genome-wide association studies (GWAS). While there may be clinical promise to employing prediction tools such as polygenic risk scores (PRS), it currently stands that individuals of non-European ancestry may not reap the benefits of genomic medicine because of underrepresentation in large-scale genetics studies. Here, we discuss why this inequity poses a problem for genomic medicine and the reasons for the low transferability of PRS across populations. We also survey the ancestry representation of published GWAS and investigate how estimates of ancestry diversity in GWAS participants might be biased. We highlight the importance of expanding genetic research in Africa, one of the most underrepresented regions in human genomics research, and discuss issues of ethics, resources, and technology for equitable advancement of genomic medicine.

Citations: 0
Cotranslational Mechanisms of Protein Biogenesis and Complex Assembly in Eukaryotes.
IF 6 Pub Date : 2022-08-10 Epub Date: 2022-04-26 DOI: 10.1146/annurev-biodatasci-121721-095858
Fabián Morales-Polanco, Jae Ho Lee, Natália M Barbosa, Judith Frydman

The formation of protein complexes is crucial to most biological functions. The cellular mechanisms governing protein complex biogenesis are not yet well understood, but some principles of cotranslational and posttranslational assembly are beginning to emerge. In bacteria, this process is favored by operons encoding subunits of protein complexes. Eukaryotic cells do not have polycistronic mRNAs, raising the question of how they orchestrate the encounter of unassembled subunits. Here we review the constraints and mechanisms governing eukaryotic co- and posttranslational protein folding and assembly, including the influence of elongation rate on nascent chain targeting, folding, and chaperone interactions. Recent evidence shows that mRNAs encoding subunits of oligomeric assemblies can undergo localized translation and form cytoplasmic condensates that might facilitate the assembly of protein complexes. Understanding the interplay between localized mRNA translation and cotranslational proteostasis will be critical to defining protein complex assembly in vivo.

Citations: 0
Phenotypic Causal Inference Using Genome-Wide Association Study Data: Mendelian Randomization and Beyond.
IF 7 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2022-08-10 Epub Date: 2022-04-01 DOI: 10.1146/annurev-biodatasci-122120-024910
Venexia M Walker, Jie Zheng, Tom R Gaunt, George Davey Smith

Summary statistics for genome-wide association studies (GWAS) are increasingly available for downstream analyses. Meanwhile, the popularity of causal inference methods has grown as we look to gather robust evidence for novel medical and public health interventions. This has led to the development of methods that use GWAS summary statistics for causal inference. Here, we describe these methods in order of their escalating complexity, from genetic associations to extensions of Mendelian randomization that consider thousands of phenotypes simultaneously. We also cover the assumptions and limitations of these approaches before considering the challenges faced by researchers performing causal inference using GWAS data. GWAS summary statistics constitute an important data source for causal inference research that offers a counterpoint to nongenetic methods when triangulating evidence. Continued efforts to address the challenges in using GWAS data for causal inference will allow the full impact of these approaches to be realized.

Citations: 0
Best Practices on Big Data Analytics to Address Sex-Specific Biases in Our Understanding of the Etiology, Diagnosis, and Prognosis of Diseases.
IF 7 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2022-08-10 Epub Date: 2022-05-13 DOI: 10.1146/annurev-biodatasci-122120-025806
Su Golder, Karen O'Connor, Yunwen Wang, Robin Stevens, Graciela Gonzalez-Hernandez

A bias in health research to favor understanding diseases as they present in men can have a grave impact on the health of women. This paper reports on a conceptual review of the literature on machine learning or natural language processing (NLP) techniques to interrogate big data for identifying sex-specific health disparities. We searched Ovid MEDLINE, Embase, and PsycINFO in October 2021 using synonyms and indexing terms for (a) "women," "men," or "sex"; (b) "big data," "artificial intelligence," or "NLP"; and (c) "disparities" or "differences." From 902 records, 22 studies met the inclusion criteria and were analyzed. Results demonstrate that the inclusion by sex is inconsistent and often unreported, although the inclusion of men in these studies is disproportionately less than women. Even though artificial intelligence and NLP techniques are widely applied in health research, few studies use them to take advantage of unstructured text to investigate sex-related differences or disparities. Researchers are increasingly aware of sex-based data bias, but the process toward correction is slow. We reflect on best practices on using big data analytics to address sex-specific biases in understanding the etiology, diagnosis, and prognosis of diseases.

Citations: 0
Discovering Biological Conflict Systems Through Genome Analysis: Evolutionary Principles and Biochemical Novelty.
IF 6 Pub Date : 2022-05-24 DOI: 10.1146/annurev-biodatasci-122220-101119
L. Aravind, L. Iyer, A. M. Burroughs
Biological replicators, from genes within a genome to whole organisms, are locked in conflicts. Comparative genomics has revealed a staggering diversity of molecular armaments and mechanisms regulating their deployment, collectively termed biological conflict systems. These encompass toxins used in inter- and intraspecific interactions, self/nonself discrimination, antiviral immune mechanisms, and counter-host effectors deployed by viruses and intragenomic selfish elements. These systems possess shared syntactical features in their organizational logic and a set of effectors targeting genetic information flow through the Central Dogma, certain membranes, and key molecules like NAD+. These principles can be exploited to discover new conflict systems through sensitive computational analyses. This has led to significant advances in our understanding of the biology of these systems and furnished new biotechnological reagents for genome editing, sequencing, and beyond. We discuss these advances using specific examples of toxins, restriction-modification, apoptosis, CRISPR/second messenger-regulated systems, and other enigmatic nucleic acid-targeting systems. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 5 is August 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Citations: 8
Developing and Implementing Predictive Models in a Learning Healthcare System: Traditional and Artificial Intelligence Approaches in the Veterans Health Administration.
IF 6 Pub Date : 2022-05-24 DOI: 10.1146/annurev-biodatasci-122220-110053
D. Atkins, C. A. Makridis, G. Alterovitz, R. Ramoni, C. Clancy
Predicting clinical risk is an important part of healthcare and can inform decisions about treatments, preventive interventions, and provision of extra services. The field of predictive models has been revolutionized over the past two decades by electronic health record data; the ability to link such data with other demographic, socioeconomic, and geographic information; the availability of high-capacity computing; and new machine learning and artificial intelligence methods for extracting insights from complex datasets. These advances have produced a new generation of computerized predictive models, but debate continues about their development, reporting, validation, evaluation, and implementation. In this review we reflect on more than 10 years of experience at the Veterans Health Administration, the largest integrated healthcare system in the United States, in developing, testing, and implementing such models at scale. We report lessons from the implementation of national risk prediction models and suggest an agenda for research. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 5 is August 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Citations: 2