Pub Date: 2023-08-10 DOI: 10.1146/annurev-biodatasci-020722-044021
Taojunfeng Su, Michael A R Hollas, Ryan T Fellers, Neil L Kelleher
Alternative splicing is pivotal to the regulation of gene expression and protein diversity in eukaryotic cells. The detection of alternative splicing events requires specific omics technologies. Although short-read RNA sequencing has successfully supported a plethora of investigations on alternative splicing, the emerging technologies of long-read RNA sequencing and top-down mass spectrometry open new opportunities to identify alternative splicing and protein isoforms with less ambiguity. Here, we summarize improvements in short-read RNA sequencing for alternative splicing analysis, including percent splicing index estimation and differential analysis. We also review the computational methods used in top-down proteomics analysis regarding proteoform identification, including the construction of databases of protein isoforms and statistical analyses of search results. While many improvements in sequencing and computational methods will result from emerging technologies, there should be future endeavors to increase the effectiveness, integration, and proteome coverage of alternative splicing events.
Title: Identification of Splice Variants and Isoforms in Transcriptomics and Proteomics. Journal: Annual Review of Biomedical Data Science, volume 6, pages 357-376. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10840079/pdf/
Pub Date: 2023-08-10 DOI: 10.1146/annurev-biodatasci-122120-104825
Kelsey R Mayo, Melissa A Basford, Robert J Carroll, Moira Dillon, Heather Fullen, Jesse Leung, Hiral Master, Shimon Rura, Lina Sulieman, Nan Kennedy, Eric Banks, David Bernick, Asmita Gauchan, Lee Lichtenstein, Brandy M Mapes, Kayla Marginean, Steve L Nyemba, Andrea Ramirez, Charissa Rotundo, Keri Wolfe, Weiyi Xia, Romuladus E Azuine, Robert M Cronin, Joshua C Denny, Abel Kho, Christopher Lunt, Bradley Malin, Karthik Natarajan, Consuelo H Wilkins, Hua Xu, George Hripcsak, Dan M Roden, Anthony A Philippakis, David Glazer, Paul A Harris
The All of Us Research Program's Data and Research Center (DRC) was established to help acquire, curate, and provide access to one of the world's largest and most diverse datasets for precision medicine research. Already, over 500,000 participants are enrolled in All of Us, 80% of whom are underrepresented in biomedical research, and data are being analyzed by a community of over 2,300 researchers. The DRC created this thriving data ecosystem by collaborating with engaged participants, innovative program partners, and empowered researchers. In this review, we first describe how the DRC is organized to meet the needs of this broad group of stakeholders. We then outline guiding principles, common challenges, and innovative approaches used to build the All of Us data ecosystem. Finally, we share lessons learned to help others navigate important decisions and trade-offs in building a modern biomedical data platform.
Title: The All of Us Data and Research Center: Creating a Secure, Scalable, and Sustainable Ecosystem for Biomedical Research. Journal: Annual Review of Biomedical Data Science, volume 6, pages 443-464. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11157478/pdf/
Pub Date: 2023-08-10 Epub Date: 2023-05-17 DOI: 10.1146/annurev-biodatasci-020722-094144
Mira N Moufarrej, Diana W Bianchi, Gary M Shaw, David K Stevenson, Stephen R Quake
Prenatal screening using sequencing of circulating cell-free DNA has transformed obstetric care over the past decade and significantly reduced the number of invasive diagnostic procedures like amniocentesis for genetic disorders. Nonetheless, emergency care remains the only option for complications like preeclampsia and preterm birth, two of the most prevalent obstetrical syndromes. Advances in noninvasive prenatal testing expand the scope of precision medicine in obstetric care. In this review, we discuss advances, challenges, and possibilities toward the goal of providing proactive, personalized prenatal care. The highlighted advances focus mainly on cell-free nucleic acids; however, we also review research that uses signals from metabolomics, proteomics, intact cells, and the microbiome. We discuss ethical challenges in providing care. Finally, we look to future possibilities, including redefining disease taxonomy and moving from biomarker correlation to biological causation.
Title: Noninvasive Prenatal Testing Using Circulating DNA and RNA: Advances, Challenges, and Possibilities. Journal: Annual Review of Biomedical Data Science, volume 6, pages 397-418. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10528197/pdf/
Pub Date: 2023-03-07 DOI: 10.48550/arXiv.2303.03577
Peter Washington, D. Wall
Autism spectrum disorder (autism) is a neurodevelopmental delay that affects at least 1 in 44 children. Like many neurological disorder phenotypes, the diagnostic features are observable, can be tracked over time, and can be managed or even eliminated through proper therapy and treatments. However, there are major bottlenecks in the diagnostic, therapeutic, and longitudinal tracking pipelines for autism and related neurodevelopmental delays, creating an opportunity for novel data science solutions to augment and transform existing workflows and provide increased access to services for affected families. Several efforts previously conducted by a multitude of research labs have spawned great progress toward improved digital diagnostics and digital therapies for children with autism. We review the literature on digital health methods for autism behavior quantification and beneficial therapies using data science. We describe both case-control studies and classification systems for digital phenotyping. We then discuss digital diagnostics and therapeutics that integrate machine learning models of autism-related behaviors, including the factors that must be addressed for translational use. Finally, we describe ongoing challenges and potential opportunities for the field of autism data science. Given the heterogeneous nature of autism and the complexities of the relevant behaviors, this review contains insights that are relevant to neurological behavior analysis and digital psychiatry more broadly. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 6 is August 2023. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Title: A Review of and Roadmap for Data Science and Machine Learning for the Neuropsychiatric Phenotype of Autism. Journal: Annual Review of Biomedical Data Science.
Pub Date: 2022-08-10 Epub Date: 2022-05-16 DOI: 10.1146/annurev-biodatasci-111721-074830
Ying Wang, Kristin Tsuo, Masahiro Kanai, Benjamin M Neale, Alicia R Martin
Polygenic risk scores (PRS) estimate an individual's genetic likelihood of complex traits and diseases by aggregating information across multiple genetic variants identified from genome-wide association studies. PRS can predict a broad spectrum of diseases and have therefore been widely used in research settings. Some work has investigated their potential applications as biomarkers in preventative medicine, but significant work is still needed to definitively establish and communicate absolute risk to patients for genetic and modifiable risk factors across demographic groups. However, the biggest limitation of PRS currently is that they show poor generalizability across diverse ancestries and cohorts. Major efforts are underway through methodological development and data generation initiatives to improve their generalizability. This review aims to comprehensively discuss current progress on the development of PRS, the factors that affect their generalizability, and promising areas for improving their accuracy, portability, and implementation.
Title: Challenges and Opportunities for Developing More Generalizable Polygenic Risk Scores. Journal: Annual Review of Biomedical Data Science, volume 5, pages 293-320. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9828290/pdf/nihms-1857872.pdf
Pub Date: 2022-08-10 DOI: 10.1146/annurev-biodatasci-122120-021311
Gamze Gürsoy
Genomics data are important for advancing biomedical research, improving clinical care, and informing other disciplines such as forensics and genealogy. However, privacy concerns arise when genomic data are shared. In particular, the identifying nature of genetic information, its direct relationship to health status, and the potential financial harm and stigmatization posed to individuals and their blood relatives call for a survey of the privacy issues related to sharing genetic and related data and potential solutions to overcome these issues. In this work, we provide an overview of the importance of genomic privacy, the information gleaned from genomics data, the sources of potential private information leakages in genomics, and ways to preserve privacy while utilizing the genetic information in research. We discuss the relationship between trust in the scientific community and protecting privacy, illuminating a future roadmap for data sharing and study participation.
Title: Genome Privacy and Trust. Journal: Annual Review of Biomedical Data Science, volume 5, pages 163-181.
Pub Date: 2022-08-10 Epub Date: 2022-05-16 DOI: 10.1146/annurev-biodatasci-122220-112550
Dan Ju, Daniel Hui, Dorothy A Hammond, Ambroise Wonkam, Sarah A Tishkoff
One goal of genomic medicine is to uncover an individual's genetic risk for disease, which generally requires data connecting genotype to phenotype, as done in genome-wide association studies (GWAS). While there may be clinical promise to employing prediction tools such as polygenic risk scores (PRS), it currently stands that individuals of non-European ancestry may not reap the benefits of genomic medicine because of underrepresentation in large-scale genetics studies. Here, we discuss why this inequity poses a problem for genomic medicine and the reasons for the low transferability of PRS across populations. We also survey the ancestry representation of published GWAS and investigate how estimates of ancestry diversity in GWAS participants might be biased. We highlight the importance of expanding genetic research in Africa, one of the most underrepresented regions in human genomics research, and discuss issues of ethics, resources, and technology for equitable advancement of genomic medicine.
Title: Importance of Including Non-European Populations in Large Human Genetic Studies to Enhance Precision Medicine. Journal: Annual Review of Biomedical Data Science, volume 5, pages 321-339. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9904154/pdf/nihms-1864817.pdf
Pub Date: 2022-08-10 Epub Date: 2022-04-26 DOI: 10.1146/annurev-biodatasci-121721-095858
Fabián Morales-Polanco, Jae Ho Lee, Natália M Barbosa, Judith Frydman
The formation of protein complexes is crucial to most biological functions. The cellular mechanisms governing protein complex biogenesis are not yet well understood, but some principles of cotranslational and posttranslational assembly are beginning to emerge. In bacteria, this process is favored by operons encoding subunits of protein complexes. Eukaryotic cells do not have polycistronic mRNAs, raising the question of how they orchestrate the encounter of unassembled subunits. Here we review the constraints and mechanisms governing eukaryotic co- and posttranslational protein folding and assembly, including the influence of elongation rate on nascent chain targeting, folding, and chaperone interactions. Recent evidence shows that mRNAs encoding subunits of oligomeric assemblies can undergo localized translation and form cytoplasmic condensates that might facilitate the assembly of protein complexes. Understanding the interplay between localized mRNA translation and cotranslational proteostasis will be critical to defining protein complex assembly in vivo.
Title: Cotranslational Mechanisms of Protein Biogenesis and Complex Assembly in Eukaryotes. Journal: Annual Review of Biomedical Data Science, volume 5, pages 67-94. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11040709/pdf/
Pub Date: 2022-08-10 Epub Date: 2022-04-01 DOI: 10.1146/annurev-biodatasci-122120-024910
Venexia M Walker, Jie Zheng, Tom R Gaunt, George Davey Smith
Summary statistics for genome-wide association studies (GWAS) are increasingly available for downstream analyses. Meanwhile, the popularity of causal inference methods has grown as we look to gather robust evidence for novel medical and public health interventions. This has led to the development of methods that use GWAS summary statistics for causal inference. Here, we describe these methods in order of their escalating complexity, from genetic associations to extensions of Mendelian randomization that consider thousands of phenotypes simultaneously. We also cover the assumptions and limitations of these approaches before considering the challenges faced by researchers performing causal inference using GWAS data. GWAS summary statistics constitute an important data source for causal inference research that offers a counterpoint to nongenetic methods when triangulating evidence. Continued efforts to address the challenges in using GWAS data for causal inference will allow the full impact of these approaches to be realized.
Title: Phenotypic Causal Inference Using Genome-Wide Association Study Data: Mendelian Randomization and Beyond. Journal: Annual Review of Biomedical Data Science, volume 5, pages 1-17. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7614231/pdf/EMS167448.pdf
Pub Date: 2022-08-10 Epub Date: 2022-05-13 DOI: 10.1146/annurev-biodatasci-122120-025806
Su Golder, Karen O'Connor, Yunwen Wang, Robin Stevens, Graciela Gonzalez-Hernandez
A bias in health research to favor understanding diseases as they present in men can have a grave impact on the health of women. This paper reports on a conceptual review of the literature on machine learning or natural language processing (NLP) techniques to interrogate big data for identifying sex-specific health disparities. We searched Ovid MEDLINE, Embase, and PsycINFO in October 2021 using synonyms and indexing terms for (a) "women," "men," or "sex"; (b) "big data," "artificial intelligence," or "NLP"; and (c) "disparities" or "differences." From 902 records, 22 studies met the inclusion criteria and were analyzed. Results demonstrate that the inclusion by sex is inconsistent and often unreported, although the inclusion of men in these studies is disproportionately lower than that of women. Even though artificial intelligence and NLP techniques are widely applied in health research, few studies use them to take advantage of unstructured text to investigate sex-related differences or disparities. Researchers are increasingly aware of sex-based data bias, but the process toward correction is slow. We reflect on best practices on using big data analytics to address sex-specific biases in understanding the etiology, diagnosis, and prognosis of diseases.
Title: Best Practices on Big Data Analytics to Address Sex-Specific Biases in Our Understanding of the Etiology, Diagnosis, and Prognosis of Diseases. Journal: Annual Review of Biomedical Data Science, volume 5, pages 251-267. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11524028/pdf/