Cell atlases are essential companions to the genome: they elucidate how genes are used in a cell type-specific manner and how gene usage changes over the lifetime of an organism. This review explores recent advances in whole-organism single-cell atlases, which enable an understanding of cellular heterogeneity and of tissue and cell fate, in both health and disease. We provide an overview of recent efforts to build cell atlases across species and discuss the challenges the field currently faces. Moreover, we propose a knowledgebase that can scale with the number of experiments and computational approaches, and a new feedback loop for the development and benchmarking of computational methods that includes contributions from users. These two aspects are key for community efforts in single-cell biology that will help produce a comprehensive annotated map of cell types and states at unparalleled resolution.
The spatial organization of the genome in the cell nucleus is pivotal to cell function. However, how 3D genome organization and its dynamics influence cellular phenotypes remains poorly understood. The recent development of single-cell technologies for probing the 3D genome, especially single-cell Hi-C (scHi-C), has ushered in a new era of unveiling cell-to-cell variability of 3D genome features at unprecedented resolution. Here, we review recent developments in computational approaches to the analysis of scHi-C data, including data processing, dimensionality reduction, imputation for enhancing data quality, and the characterization of 3D genome features at single-cell resolution. While much progress has been made in computational method development for analyzing single-cell 3D genomes, substantial future work is needed to improve data interpretation and multimodal data integration, which are critical to revealing fundamental connections between genome structure and function among heterogeneous cell populations in various biological contexts.
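To make the imputation step concrete: scHi-C contact matrices are extremely sparse, and one common strategy is to smooth each bin pair over its genomic neighborhood before downstream analysis. Below is a minimal sketch of this idea only, not any specific published method; the function name `smooth_contacts` and the uniform averaging window are our own illustrative choices.

```python
import numpy as np

def smooth_contacts(contacts, window=1):
    """Impute a sparse single-cell contact matrix by averaging each
    bin pair with its genomic neighbors (a simple local smoothing),
    one common strategy for mitigating scHi-C sparsity."""
    n = contacts.shape[0]
    imputed = np.zeros_like(contacts, dtype=float)
    for i in range(n):
        for j in range(n):
            i0, i1 = max(0, i - window), min(n, i + window + 1)
            j0, j1 = max(0, j - window), min(n, j + window + 1)
            imputed[i, j] = contacts[i0:i1, j0:j1].mean()
    # Symmetrize, since Hi-C contact maps are symmetric.
    return (imputed + imputed.T) / 2
```

Published methods typically combine such smoothing with distance normalization or random-walk propagation; this sketch shows only the core neighborhood-averaging idea.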
The accumulation of vast amounts of multimodal data for the human brain, in both normal and disease conditions, has provided unprecedented opportunities for understanding why and how brain disorders arise. Compared with traditional analyses of single datasets, the integration of multimodal datasets covering different types of data (e.g., genomics, transcriptomics, and imaging) has shed light on the mechanisms underlying brain disorders in greater detail, at both microscopic and macroscopic levels. In this review, we first briefly introduce the popular large datasets for the brain. Then, we discuss in detail how integration of multimodal human brain datasets can reveal the genetic predispositions and the abnormal molecular pathways of brain disorders. Finally, we present an outlook on how future data integration efforts may advance the diagnosis and treatment of brain disorders.
Electronic health records (EHRs) are becoming a vital source of data for healthcare quality improvement, research, and operations. However, much of the most valuable information contained in EHRs remains buried in unstructured text. The field of clinical text mining has advanced rapidly in recent years, transitioning from rule-based approaches to machine learning and, more recently, deep learning. With new methods come new challenges, however, especially for those new to the field. This review provides an overview of clinical text mining for those who are encountering it for the first time (e.g., physician researchers, operational analytics teams, machine learning scientists from other domains). While not a comprehensive survey, this review describes the state of the art, with a particular focus on new tasks and methods developed over the past few years. It also identifies key barriers between these remarkable technical advances and the practical realities of implementation in health systems and in industry.
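To illustrate the rule-based end of the spectrum the review describes, a classic early clinical text-mining approach is pattern matching over free text. The sketch below is a hypothetical example of our own (the pattern and function names are not from the review): a regular expression that pulls drug-dose-unit mentions out of a clinical note.

```python
import re

# A toy rule-based extractor in the style of early clinical
# text-mining systems: a regex capturing drug, dose, and unit.
DOSE_PATTERN = re.compile(
    r"\b(?P<drug>[A-Za-z]+)\s+(?P<dose>\d+(?:\.\d+)?)\s*(?P<unit>mg|mcg|g|mL)\b"
)

def extract_doses(note):
    """Return (drug, dose, unit) tuples found in a free-text note."""
    return [(m.group("drug"), float(m.group("dose")), m.group("unit"))
            for m in DOSE_PATTERN.finditer(note)]
```

Rules like this are brittle (misspellings, abbreviations, and context all break them), which is precisely why the field moved toward machine learning and deep learning for named entity recognition.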
African populations are diverse in their ethnicity, language, culture, and genetics. Although plagued by high disease burdens, until recently the continent has largely been excluded from biomedical studies. Along with limitations in research and clinical infrastructure, human capacity, and funding, this omission has resulted in an underrepresentation of African data and disadvantaged African scientists. This review interrogates the relative abundance of biomedical data from Africa, primarily in genomics and other omics. The visibility of African science through publications is also discussed. A challenge encountered in this review is the relative lack of annotation of data on their geographical or population origin, with African countries often represented as a single group. In addition to the abovementioned limitations, the limited global representation of African data may also be attributed to hesitation to deposit data in public repositories. Whatever the reason, the disparity should be addressed, as African data have enormous value for scientists in Africa and globally.
Data from satellite instruments provide estimates of gas and particle levels relevant to human health, even pollutants invisible to the human eye. However, the successful interpretation of satellite data requires an understanding of how satellites relate to other data sources, as well as factors affecting their application to health challenges. Drawing from the expertise and experience of the 2016-2020 NASA HAQAST (Health and Air Quality Applied Sciences Team), we present a review of satellite data for air quality and health applications. We include a discussion of satellite data for epidemiological studies and health impact assessments, as well as the use of satellite data to evaluate air quality trends, support air quality regulation, characterize smoke from wildfires, and quantify emission sources. The primary advantage of satellite data compared to in situ measurements, e.g., from air quality monitoring stations, is their spatial coverage. Satellite data can reveal where pollution levels are highest around the world, how levels have changed over daily to decadal periods, and where pollutants are transported from urban to global scales. To date, air quality and health applications have primarily utilized satellite observations and satellite-derived products relevant to near-surface particulate matter <2.5 μm in diameter (PM2.5) and nitrogen dioxide (NO2). Health and air quality communities have grown increasingly engaged in the use of satellite data, and this trend is expected to continue. From health researchers to air quality managers, and from global applications to community impacts, satellite data are transforming the way air pollution exposure is evaluated.
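The health impact assessments mentioned above typically reduce a gridded satellite-derived pollutant field to a population-weighted exposure estimate. The following is a minimal sketch of that calculation under our own simplifying assumptions (a single PM2.5 grid co-registered with a population grid; the function name is illustrative).

```python
import numpy as np

def population_weighted_pm25(pm25_grid, population_grid):
    """Collapse a gridded satellite-derived PM2.5 field into one
    population-weighted exposure value, the summary quantity commonly
    used in epidemiological studies and health impact assessments."""
    pm25 = np.asarray(pm25_grid, dtype=float)
    pop = np.asarray(population_grid, dtype=float)
    return float((pm25 * pop).sum() / pop.sum())
```

Weighting by population rather than averaging over area matters because pollution and people are both unevenly distributed: a grid cell over a city contributes far more to aggregate exposure than an equally polluted cell over open ocean.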
Next-generation sequencing technologies have revolutionized our ability to catalog the landscape of somatic mutations in tumor genomes. These mutations can sometimes create so-called neoantigens, which allow the immune system to detect and eliminate tumor cells. However, efforts to stimulate the immune system to eliminate tumors based on their molecular differences have had less success than hoped, and there are conflicting reports about the role of neoantigens in the success of this approach. Here we review some of the conflicting evidence in the literature and highlight key aspects of the tumor-immune interface that are emerging as major determinants of whether mutation-derived neoantigens will contribute to an immunotherapy response. Accounting for these factors is expected to improve success rates of future immunotherapy approaches.
The collection and use of human genetic data raise important ethical questions about how to balance individual autonomy and privacy with the potential for public good. The proliferation of local, national, and international efforts to collect genetic data and create linkages to support large-scale initiatives in precision medicine and the learning health system creates new demands for broad data sharing that involve managing competing interests and careful consideration of what constitutes appropriate ethical trade-offs. This review describes these emerging ethical issues with a focus on approaches to consent and issues related to justice in the shifting genomic research ecosystem.
The COVID-19 (coronavirus disease 2019) pandemic has had a significant impact on society, both because of the serious health effects of COVID-19 and because of public health measures implemented to slow its spread. Many of these difficulties are fundamentally information needs; attempts to address these needs have caused an information overload for both researchers and the public. Natural language processing (NLP), the branch of artificial intelligence that interprets human language, can be applied to address many of the information needs made urgent by the COVID-19 pandemic. This review surveys approximately 150 NLP studies and more than 50 systems and datasets addressing the COVID-19 pandemic. We detail work on four core NLP tasks: information retrieval, named entity recognition, literature-based discovery, and question answering. We also describe work that directly addresses aspects of the pandemic through four additional tasks: topic modeling, sentiment and emotion analysis, caseload forecasting, and misinformation detection. We conclude by discussing observable trends and remaining challenges.
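Of the core tasks listed above, information retrieval has the simplest classical baseline: TF-IDF term weighting with cosine similarity. The sketch below is a self-contained toy version of that baseline, not any system surveyed in the review; the function name and toy corpus are our own.

```python
import math
from collections import Counter

def tfidf_rank(query, docs):
    """Rank documents against a query using TF-IDF weighting and
    cosine similarity, the classic information-retrieval baseline."""
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(t for doc in tokenized for t in set(doc))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}

    def vec(tokens):
        tf = Counter(tokens)
        return {t: tf[t] * idf.get(t, 1.0) for t in tf}

    def cosine(a, b):
        num = sum(a[t] * b.get(t, 0.0) for t in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return num / (na * nb) if na and nb else 0.0

    q = vec(query.lower().split())
    scores = [cosine(q, vec(doc)) for doc in tokenized]
    # Return document indices from best to worst match.
    return sorted(range(n), key=lambda i: -scores[i])
```

Modern COVID-19 retrieval systems layer neural re-ranking on top of (or replace) this kind of lexical matching, but the baseline remains the standard point of comparison.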