
Annual Review of Biomedical Data Science: Latest Articles

Identification of Splice Variants and Isoforms in Transcriptomics and Proteomics.
IF 7 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2023-08-10 DOI: 10.1146/annurev-biodatasci-020722-044021
Taojunfeng Su, Michael A R Hollas, Ryan T Fellers, Neil L Kelleher

Alternative splicing is pivotal to the regulation of gene expression and protein diversity in eukaryotic cells. The detection of alternative splicing events requires specific omics technologies. Although short-read RNA sequencing has successfully supported a plethora of investigations on alternative splicing, the emerging technologies of long-read RNA sequencing and top-down mass spectrometry open new opportunities to identify alternative splicing and protein isoforms with less ambiguity. Here, we summarize improvements in short-read RNA sequencing for alternative splicing analysis, including percent splicing index estimation and differential analysis. We also review the computational methods used in top-down proteomics analysis regarding proteoform identification, including the construction of databases of protein isoforms and statistical analyses of search results. While many improvements in sequencing and computational methods will result from emerging technologies, there should be future endeavors to increase the effectiveness, integration, and proteome coverage of alternative splicing events.
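
The abstract mentions percent splicing index estimation and differential analysis without defining either. As a minimal sketch, assuming the common "percent spliced in" (PSI) convention in which the index is the fraction of junction-supporting reads that include the alternative exon, the calculation can be illustrated as follows. The function name and the read counts are hypothetical, and real tools typically also correct for effective isoform length, which this sketch omits.

```python
def percent_spliced_in(inclusion_reads: int, exclusion_reads: int) -> float:
    """Return PSI = inclusion / (inclusion + exclusion) for one splicing event.

    Assumption: raw junction read counts are used without length normalization.
    """
    total = inclusion_reads + exclusion_reads
    if total == 0:
        raise ValueError("no junction reads support either isoform")
    return inclusion_reads / total


# Hypothetical read counts for one exon-skipping event in two conditions.
psi_control = percent_spliced_in(inclusion_reads=80, exclusion_reads=20)
psi_treated = percent_spliced_in(inclusion_reads=30, exclusion_reads=70)

# A simple differential summary: the change in PSI between conditions.
delta_psi = psi_treated - psi_control
print(f"PSI control={psi_control:.2f}, treated={psi_treated:.2f}, delta={delta_psi:+.2f}")
```

A negative delta here would suggest less exon inclusion in the treated condition; judging whether such a change is statistically significant requires replicate-aware models that are beyond this sketch.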

Source: Annual Review of Biomedical Data Science, vol. 6, pp. 357-376. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10840079/pdf/
Citations: 0
The All of Us Data and Research Center: Creating a Secure, Scalable, and Sustainable Ecosystem for Biomedical Research.
IF 7 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2023-08-10 DOI: 10.1146/annurev-biodatasci-122120-104825
Kelsey R Mayo, Melissa A Basford, Robert J Carroll, Moira Dillon, Heather Fullen, Jesse Leung, Hiral Master, Shimon Rura, Lina Sulieman, Nan Kennedy, Eric Banks, David Bernick, Asmita Gauchan, Lee Lichtenstein, Brandy M Mapes, Kayla Marginean, Steve L Nyemba, Andrea Ramirez, Charissa Rotundo, Keri Wolfe, Weiyi Xia, Romuladus E Azuine, Robert M Cronin, Joshua C Denny, Abel Kho, Christopher Lunt, Bradley Malin, Karthik Natarajan, Consuelo H Wilkins, Hua Xu, George Hripcsak, Dan M Roden, Anthony A Philippakis, David Glazer, Paul A Harris

The All of Us Research Program's Data and Research Center (DRC) was established to help acquire, curate, and provide access to one of the world's largest and most diverse datasets for precision medicine research. Already, over 500,000 participants are enrolled in All of Us, 80% of whom are underrepresented in biomedical research, and data are being analyzed by a community of over 2,300 researchers. The DRC created this thriving data ecosystem by collaborating with engaged participants, innovative program partners, and empowered researchers. In this review, we first describe how the DRC is organized to meet the needs of this broad group of stakeholders. We then outline guiding principles, common challenges, and innovative approaches used to build the All of Us data ecosystem. Finally, we share lessons learned to help others navigate important decisions and trade-offs in building a modern biomedical data platform.

Source: Annual Review of Biomedical Data Science, vol. 6, pp. 443-464. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11157478/pdf/
Citations: 0
Noninvasive Prenatal Testing Using Circulating DNA and RNA: Advances, Challenges, and Possibilities.
IF 7 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2023-08-10 Epub Date: 2023-05-17 DOI: 10.1146/annurev-biodatasci-020722-094144
Mira N Moufarrej, Diana W Bianchi, Gary M Shaw, David K Stevenson, Stephen R Quake

Prenatal screening using sequencing of circulating cell-free DNA has transformed obstetric care over the past decade and significantly reduced the number of invasive diagnostic procedures like amniocentesis for genetic disorders. Nonetheless, emergency care remains the only option for complications like preeclampsia and preterm birth, two of the most prevalent obstetrical syndromes. Advances in noninvasive prenatal testing expand the scope of precision medicine in obstetric care. In this review, we discuss advances, challenges, and possibilities toward the goal of providing proactive, personalized prenatal care. The highlighted advances focus mainly on cell-free nucleic acids; however, we also review research that uses signals from metabolomics, proteomics, intact cells, and the microbiome. We discuss ethical challenges in providing care. Finally, we look to future possibilities, including redefining disease taxonomy and moving from biomarker correlation to biological causation.

Source: Annual Review of Biomedical Data Science, vol. 6, pp. 397-418. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10528197/pdf/
Citations: 0
A Review of and Roadmap for Data Science and Machine Learning for the Neuropsychiatric Phenotype of Autism
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2023-03-07 DOI: 10.48550/arXiv.2303.03577
Peter Washington, D. Wall
Autism spectrum disorder (autism) is a neurodevelopmental delay that affects at least 1 in 44 children. Like many neurological disorder phenotypes, the diagnostic features are observable, can be tracked over time, and can be managed or even eliminated through proper therapy and treatments. However, there are major bottlenecks in the diagnostic, therapeutic, and longitudinal tracking pipelines for autism and related neurodevelopmental delays, creating an opportunity for novel data science solutions to augment and transform existing workflows and provide increased access to services for affected families. Several efforts previously conducted by a multitude of research labs have spawned great progress toward improved digital diagnostics and digital therapies for children with autism. We review the literature on digital health methods for autism behavior quantification and beneficial therapies using data science. We describe both case-control studies and classification systems for digital phenotyping. We then discuss digital diagnostics and therapeutics that integrate machine learning models of autism-related behaviors, including the factors that must be addressed for translational use. Finally, we describe ongoing challenges and potential opportunities for the field of autism data science. Given the heterogeneous nature of autism and the complexities of the relevant behaviors, this review contains insights that are relevant to neurological behavior analysis and digital psychiatry more broadly. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 6 is August 2023. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Citations: 12
Challenges and Opportunities for Developing More Generalizable Polygenic Risk Scores.
IF 7 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2022-08-10 Epub Date: 2022-05-16 DOI: 10.1146/annurev-biodatasci-111721-074830
Ying Wang, Kristin Tsuo, Masahiro Kanai, Benjamin M Neale, Alicia R Martin

Polygenic risk scores (PRS) estimate an individual's genetic likelihood of complex traits and diseases by aggregating information across multiple genetic variants identified from genome-wide association studies. PRS can predict a broad spectrum of diseases and have therefore been widely used in research settings. Some work has investigated their potential applications as biomarkers in preventative medicine, but significant work is still needed to definitively establish and communicate absolute risk to patients for genetic and modifiable risk factors across demographic groups. However, the biggest limitation of PRS currently is that they show poor generalizability across diverse ancestries and cohorts. Major efforts are underway through methodological development and data generation initiatives to improve their generalizability. This review aims to comprehensively discuss current progress on the development of PRS, the factors that affect their generalizability, and promising areas for improving their accuracy, portability, and implementation.
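
The aggregation described in the abstract is often implemented as a weighted sum of risk-allele counts. The following is a minimal sketch under that assumption: each variant contributes its allele dosage (0, 1, or 2) multiplied by a per-variant effect size taken from a genome-wide association study. The function name, weights, and dosages are hypothetical and are not taken from the review.

```python
from typing import Sequence


def polygenic_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Return sum_j weights[j] * dosages[j] for one individual.

    Assumption: dosages and weights refer to the same variants, in the same
    order, with effect sizes aligned to the counted allele.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must cover the same variants")
    return sum(w * d for w, d in zip(weights, dosages))


# Hypothetical per-variant effect sizes (e.g., log odds ratios) and one
# individual's risk-allele counts at the same three variants.
weights = [0.12, -0.05, 0.30]
dosages = [2, 1, 0]

score = polygenic_score(dosages, weights)
print(f"polygenic score = {score:.2f}")
```

Because the weights are estimated in a particular cohort, the same sum can predict poorly in a population with different allele frequencies or linkage patterns, which is the generalizability problem the abstract highlights.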

Source: Annual Review of Biomedical Data Science, vol. 5, pp. 293-320. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9828290/pdf/nihms-1857872.pdf
Citations: 0
Genome Privacy and Trust.
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2022-08-10 DOI: 10.1146/annurev-biodatasci-122120-021311
Gamze Gürsoy

Genomics data are important for advancing biomedical research, improving clinical care, and informing other disciplines such as forensics and genealogy. However, privacy concerns arise when genomic data are shared. In particular, the identifying nature of genetic information, its direct relationship to health status, and the potential financial harm and stigmatization posed to individuals and their blood relatives call for a survey of the privacy issues related to sharing genetic and related data and potential solutions to overcome these issues. In this work, we provide an overview of the importance of genomic privacy, the information gleaned from genomics data, the sources of potential private information leakages in genomics, and ways to preserve privacy while utilizing the genetic information in research. We discuss the relationship between trust in the scientific community and protecting privacy, illuminating a future roadmap for data sharing and study participation.

Source: Annual Review of Biomedical Data Science, vol. 5, pp. 163-181.
Citations: 1
Importance of Including Non-European Populations in Large Human Genetic Studies to Enhance Precision Medicine.
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2022-08-10 Epub Date: 2022-05-16 DOI: 10.1146/annurev-biodatasci-122220-112550
Dan Ju, Daniel Hui, Dorothy A Hammond, Ambroise Wonkam, Sarah A Tishkoff

One goal of genomic medicine is to uncover an individual's genetic risk for disease, which generally requires data connecting genotype to phenotype, as done in genome-wide association studies (GWAS). While there may be clinical promise to employing prediction tools such as polygenic risk scores (PRS), it currently stands that individuals of non-European ancestry may not reap the benefits of genomic medicine because of underrepresentation in large-scale genetics studies. Here, we discuss why this inequity poses a problem for genomic medicine and the reasons for the low transferability of PRS across populations. We also survey the ancestry representation of published GWAS and investigate how estimates of ancestry diversity in GWAS participants might be biased. We highlight the importance of expanding genetic research in Africa, one of the most underrepresented regions in human genomics research, and discuss issues of ethics, resources, and technology for equitable advancement of genomic medicine.

Source: Annual Review of Biomedical Data Science, vol. 5, pp. 321-339. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9904154/pdf/nihms-1864817.pdf
Citations: 0
Cotranslational Mechanisms of Protein Biogenesis and Complex Assembly in Eukaryotes.
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2022-08-10 Epub Date: 2022-04-26 DOI: 10.1146/annurev-biodatasci-121721-095858
Fabián Morales-Polanco, Jae Ho Lee, Natália M Barbosa, Judith Frydman

The formation of protein complexes is crucial to most biological functions. The cellular mechanisms governing protein complex biogenesis are not yet well understood, but some principles of cotranslational and posttranslational assembly are beginning to emerge. In bacteria, this process is favored by operons encoding subunits of protein complexes. Eukaryotic cells do not have polycistronic mRNAs, raising the question of how they orchestrate the encounter of unassembled subunits. Here we review the constraints and mechanisms governing eukaryotic co- and posttranslational protein folding and assembly, including the influence of elongation rate on nascent chain targeting, folding, and chaperone interactions. Recent evidence shows that mRNAs encoding subunits of oligomeric assemblies can undergo localized translation and form cytoplasmic condensates that might facilitate the assembly of protein complexes. Understanding the interplay between localized mRNA translation and cotranslational proteostasis will be critical to defining protein complex assembly in vivo.

Source: Annual Review of Biomedical Data Science, vol. 5, pp. 67-94. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11040709/pdf/
Citations: 0
Phenotypic Causal Inference Using Genome-Wide Association Study Data: Mendelian Randomization and Beyond.
IF 7 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2022-08-10 Epub Date: 2022-04-01 DOI: 10.1146/annurev-biodatasci-122120-024910
Venexia M Walker, Jie Zheng, Tom R Gaunt, George Davey Smith

Summary statistics for genome-wide association studies (GWAS) are increasingly available for downstream analyses. Meanwhile, the popularity of causal inference methods has grown as we look to gather robust evidence for novel medical and public health interventions. This has led to the development of methods that use GWAS summary statistics for causal inference. Here, we describe these methods in order of their escalating complexity, from genetic associations to extensions of Mendelian randomization that consider thousands of phenotypes simultaneously. We also cover the assumptions and limitations of these approaches before considering the challenges faced by researchers performing causal inference using GWAS data. GWAS summary statistics constitute an important data source for causal inference research that offers a counterpoint to nongenetic methods when triangulating evidence. Continued efforts to address the challenges in using GWAS data for causal inference will allow the full impact of these approaches to be realized.
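
The abstract does not name a specific estimator. As one illustration of how GWAS summary statistics can feed a Mendelian randomization analysis, the sketch below assumes the widely used fixed-effect inverse-variance weighted (IVW) estimator, which combines per-variant associations with the exposure and the outcome into a single causal effect estimate. The function name and all numbers are hypothetical, and the sketch assumes independent variants that satisfy the instrumental variable assumptions.

```python
import math
from typing import Sequence, Tuple


def ivw_estimate(
    beta_exposure: Sequence[float],
    beta_outcome: Sequence[float],
    se_outcome: Sequence[float],
) -> Tuple[float, float]:
    """Fixed-effect IVW estimate and its standard error.

    estimate = sum(bx * by / se^2) / sum(bx^2 / se^2)
    std_err  = sqrt(1 / sum(bx^2 / se^2))
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must cover the same variants")
    numerator = sum(
        bx * by / se**2 for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(bx**2 / se**2 for bx, se in zip(beta_exposure, se_outcome))
    if denominator == 0:
        raise ValueError("no variant is associated with the exposure")
    return numerator / denominator, math.sqrt(1 / denominator)


# Hypothetical summary statistics for three independent variants; each
# variant-outcome effect is exactly half the variant-exposure effect.
beta_exposure = [0.10, 0.20, 0.30]
beta_outcome = [0.05, 0.10, 0.15]
se_outcome = [0.01, 0.01, 0.02]

estimate, std_err = ivw_estimate(beta_exposure, beta_outcome, se_outcome)
print(f"IVW causal estimate = {estimate:.3f} (SE {std_err:.3f})")
```

With a single variant, the same formula reduces to the ratio of the variant-outcome effect to the variant-exposure effect. Violations such as horizontal pleiotropy bias this estimate, which is why the assumptions and limitations mentioned in the abstract matter.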

Annual Review of Biomedical Data Science, volume 5, pages 1-17. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7614231/pdf/EMS167448.pdf
Citations: 0
Best Practices on Big Data Analytics to Address Sex-Specific Biases in Our Understanding of the Etiology, Diagnosis, and Prognosis of Diseases.
IF 7 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2022-08-10 Epub Date: 2022-05-13 DOI: 10.1146/annurev-biodatasci-122120-025806
Su Golder, Karen O'Connor, Yunwen Wang, Robin Stevens, Graciela Gonzalez-Hernandez

A bias in health research to favor understanding diseases as they present in men can have a grave impact on the health of women. This paper reports on a conceptual review of the literature on machine learning or natural language processing (NLP) techniques to interrogate big data for identifying sex-specific health disparities. We searched Ovid MEDLINE, Embase, and PsycINFO in October 2021 using synonyms and indexing terms for (a) "women," "men," or "sex"; (b) "big data," "artificial intelligence," or "NLP"; and (c) "disparities" or "differences." From 902 records, 22 studies met the inclusion criteria and were analyzed. Results demonstrate that the inclusion by sex is inconsistent and often unreported, although the inclusion of men in these studies is disproportionately lower than that of women. Even though artificial intelligence and NLP techniques are widely applied in health research, few studies use them to take advantage of unstructured text to investigate sex-related differences or disparities. Researchers are increasingly aware of sex-based data bias, but the process toward correction is slow. We reflect on best practices on using big data analytics to address sex-specific biases in understanding the etiology, diagnosis, and prognosis of diseases.

Annual Review of Biomedical Data Science, volume 5, pages 251-267. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11524028/pdf/
Citations: 0