M. Okomo-Adhiambo, E. Ramos, Reagan J. Kelly, Yatish Jain, R. Tatusov, A. Montmayeur, Gregory Doho, Rachel L. Marine, T. Ng, Adam C. Retchless, S. Oberste, P. Rota, X. Wang, Agha N. Khan
Next-generation sequencing (NGS) has become a vital tool in clinical microbiology, with numerous applications in infectious disease diagnostics, outbreak investigations, and public health surveillance. Although NGS enables comprehensive pathogen detection in a relatively short time and at low cost, the enormous amount of genomic data it generates makes it challenging to organize, archive, analyze, and report results within a clinically relevant timeframe. Automated pipelines are a first step toward standardizing NGS data processing and reporting, eliminating common bottlenecks in bioinformatics analysis and providing rapid turnaround. Here, we present the Viral NGS Pipeline, optimized for identification and whole-genome assembly of viruses, and the Bacterial Meningococcus Genome Analysis Platform (BMGAP), designed for genotypic characterization of meningitis pathogens. Together, these pipelines have been used to analyze more than 11,000 clinical samples and isolates. The pipelines can be deployed on both standalone and cloud-based servers, making them accessible to internal CDC users as well as to external partners, including state public health laboratories and other collaborators worldwide. These automated pipelines have the potential to contribute to the development of unbiased NGS-based clinical assays for pathogen detection that demand rapid turnaround times, and they are expected to play a key role in future infectious disease surveillance.
Automated Next Generation Sequencing Bioinformatics Pipelines for Pathogen Discovery and Surveillance. Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, 2017-08-20. DOI: https://doi.org/10.1145/3107411.3108192
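The abstract describes the pipelines only at a high level. What follows is a minimal sketch of what one standardized pre-processing stage in such a pipeline might look like. It is not the Viral NGS Pipeline or BMGAP code. It assumes Phred+33-encoded FASTQ input. The stage names, the thresholds (`min_mean_q=20`, `min_len=50`), and the per-sample read-count report are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List, Tuple


@dataclass
class Read:
    name: str
    seq: str
    qual: str


def parse_fastq(lines: List[str]) -> Iterator[Read]:
    # A FASTQ record spans four lines: @name, sequence, '+', quality string.
    for i in range(0, len(lines) - 3, 4):
        header, seq, _, qual = (line.strip() for line in lines[i:i + 4])
        yield Read(header.lstrip("@"), seq, qual)


def mean_phred(qual: str, offset: int = 33) -> float:
    # Phred+33 encoding: quality score = ord(char) - 33.
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def quality_filter(reads: List[Read], min_mean_q: float = 20.0) -> List[Read]:
    # Hypothetical stage: drop reads whose mean base quality is too low.
    return [r for r in reads if mean_phred(r.qual) >= min_mean_q]


def length_filter(reads: List[Read], min_len: int = 50) -> List[Read]:
    # Hypothetical stage: drop reads shorter than min_len bases.
    return [r for r in reads if len(r.seq) >= min_len]


Stage = Tuple[str, Callable[[List[Read]], List[Read]]]


def run_pipeline(lines: List[str], stages: List[Stage]) -> Tuple[List[Read], Dict[str, int]]:
    # Run each stage in order and record how many reads survive it,
    # so every sample gets the same standardized processing report.
    reads = list(parse_fastq(lines))
    report = {"input_reads": len(reads)}
    for name, stage in stages:
        reads = stage(reads)
        report[name] = len(reads)
    return reads, report
```

Each stage is a plain function that takes reads and returns reads. That makes stages easy to add, reorder, or swap while every sample still receives the same report.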
Naveen Mangalakumar, A. Alkhateeb, H. Pham, L. Rueda, A. Ngom
Studying gene expression across different time intervals of breast cancer survival may provide new insights into recovery from the disease. In this work, we propose a hierarchical clustering method that separates groups of gene time-series profiles that lie farthest from the rest of the profiles across the different time intervals. These isolated outliers can serve as potential biomarkers of breast cancer survivability. Gene expression values at the observed time points are interpolated with cubic splines to create a trend profile for each gene. After universally aligning the profiles to minimize the vertical area between each pair of profiles, we cluster the genes using hierarchical clustering based on the minimized vertical distances [1]. The number of clusters was chosen using the profile alignment and agglomerative clustering (PAAC) index together with visual inspection of the clusters. Our study suggests that combining an appropriate clustering method, distance function, and cluster validity index is a suitable model for identifying genes as informative biomarkers of breast cancer survivability.
Outlier Genes as Biomarkers of Breast Cancer Survivability in Time-Series Data. Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, 2017-08-20. DOI: https://doi.org/10.1145/3107411.3108202
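The profile-based clustering can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation:

- "Universal alignment" is taken to mean shifting each interpolated profile vertically to zero mean. This minimizes the integrated squared vertical distance between every pair of shifted profiles (approximately so on the sampling grid).
- The linkage is average linkage.
- The number of clusters `k` is supplied by hand rather than chosen with the PAAC index.

```python
import bisect
import math
from typing import Callable, List, Sequence


def natural_cubic_spline(xs: Sequence[float], ys: Sequence[float]) -> Callable[[float], float]:
    """Interpolate (xs, ys) with a natural cubic spline (zero second derivative at both ends)."""
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    m = [0.0] * n  # second derivatives at the knots
    if n > 2:
        # Tridiagonal system for m[1..n-2], solved with the Thomas algorithm.
        a = [h[i - 1] for i in range(1, n - 1)]
        b = [2.0 * (h[i - 1] + h[i]) for i in range(1, n - 1)]
        c = [h[i] for i in range(1, n - 1)]
        d = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1]) for i in range(1, n - 1)]
        k = n - 2
        for i in range(1, k):
            w = a[i] / b[i - 1]
            b[i] -= w * c[i - 1]
            d[i] -= w * d[i - 1]
        sol = [0.0] * k
        sol[-1] = d[-1] / b[-1]
        for i in range(k - 2, -1, -1):
            sol[i] = (d[i] - c[i] * sol[i + 1]) / b[i]
        m[1:n - 1] = sol

    def s(t: float) -> float:
        j = max(0, min(n - 2, bisect.bisect_right(xs, t) - 1))
        hj, left, right = h[j], xs[j + 1] - t, t - xs[j]
        return ((m[j] * left ** 3 + m[j + 1] * right ** 3) / (6 * hj)
                + (ys[j] / hj - m[j] * hj / 6) * left
                + (ys[j + 1] / hj - m[j + 1] * hj / 6) * right)
    return s


def aligned_profile(xs, ys, grid) -> List[float]:
    # Sample the spline, then shift it vertically to zero mean (assumed alignment rule).
    s = natural_cubic_spline(xs, ys)
    values = [s(t) for t in grid]
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def vertical_distance(p: List[float], q: List[float], dx: float) -> float:
    # Trapezoid-rule estimate of sqrt(integral of (p - q)^2) over the grid.
    sq = [(a - b) ** 2 for a, b in zip(p, q)]
    return math.sqrt(dx * (sum(sq) - 0.5 * (sq[0] + sq[-1])))


def average_linkage(dist: List[List[float]], k: int) -> List[List[int]]:
    # Naive agglomerative clustering: repeatedly merge the closest pair of clusters.
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = sum(dist[i][j] for i in clusters[x] for j in clusters[y]) / (len(clusters[x]) * len(clusters[y]))
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] += clusters.pop(y)
    return clusters


def cluster_genes(xs, expr, k: int, points: int = 101) -> List[List[int]]:
    grid = [xs[0] + (xs[-1] - xs[0]) * i / (points - 1) for i in range(points)]
    dx = grid[1] - grid[0]
    profiles = [aligned_profile(xs, ys, grid) for ys in expr]
    dist = [[vertical_distance(p, q, dx) for q in profiles] for p in profiles]
    return average_linkage(dist, k)
```

For example, `cluster_genes([0, 1, 2, 3], [[0, 1, 2, 3], [1, 2, 3, 4], [0, 3, 0, 3]], k=2)` places the two linear profiles in one cluster, because they differ only by a vertical shift. The oscillating profile ends up on its own, which is the kind of isolated profile the abstract treats as a candidate biomarker.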
The majority of genes in eukaryotes consist of multiple protein domains that can be independently gained or lost during evolution. This gain and loss of protein domains, through domain duplications, transfers, or losses, has important evolutionary and functional consequences for genes. Yet most computational methods for studying gene evolution treat genes as the basic unit of evolution and assume that evolutionary processes such as duplications and losses act on entire genes rather than on parts of genes. In particular, even though it is well understood that domains evolve inside genes and genes inside species, no existing computational framework simultaneously models the evolution of domains, genes, and species while accounting for their interdependence. Here, we develop a three-tree model of domain evolution that explicitly captures the interdependence of domain-, gene-, and species-level evolution. Our model extends the classical phylogenetic reconciliation framework, which infers gene family evolution by comparing gene trees with a species tree, by explicitly accounting for domain-level events. The new model decouples domain-level events from gene-level events and provides a much finer-grained, easily interpretable view of gene family and domain family evolution. Specifically, we:

(i) introduce the new three-tree computational framework;
(ii) prove that the associated optimization problem is NP-hard;
(iii) devise an efficient heuristic solution for the problem;
(iv) apply our algorithm to a large dataset of about 4000 domain trees and 7000 gene trees from 12 fly species; and
(v) demonstrate the impact of the new framework by comparing the inferred evolutionary histories against those obtained with existing approaches.

Our experimental results show that, compared to existing approaches, using the three-tree model has a significant impact on the inference of both domain-level and gene-level events. It also significantly affects the inference of domain content in ancestral genes and of gene content in ancestral species.
An Integrated Reconciliation Framework for Domain, Gene, and Species Level Evolution. Lei Li, Mukul S. Bansal. Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, 2017-08-20. DOI: https://doi.org/10.1145/3107411.3108220
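The three-tree model extends classical gene-tree/species-tree reconciliation. For readers unfamiliar with that baseline, here is a minimal sketch of the classical two-tree duplication-loss reconciliation using the LCA mapping. It is not the authors' three-tree framework or their heuristic, and it does not model domain-level events or transfers. Trees are assumed to be binary and encoded as nested tuples, with leaf names as strings.

```python
from typing import Dict, FrozenSet, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]


def leaves(t: Tree) -> FrozenSet[str]:
    return frozenset([t]) if isinstance(t, str) else leaves(t[0]) | leaves(t[1])


def clade_depths(species_tree: Tree) -> Dict[FrozenSet[str], int]:
    # Map every species-tree clade (identified by its leaf set) to its depth; root depth = 0.
    depths: Dict[FrozenSet[str], int] = {}
    stack = [(species_tree, 0)]
    while stack:
        node, d = stack.pop()
        depths[leaves(node)] = d
        if not isinstance(node, str):
            stack.extend((child, d + 1) for child in node)
    return depths


def lca(depths: Dict[FrozenSet[str], int], species: FrozenSet[str]) -> FrozenSet[str]:
    # Clades are nested or disjoint, so the deepest clade containing all species is their LCA.
    return max((c for c in depths if species <= c), key=depths.__getitem__)


def reconcile(gene_tree: Tree, species_tree: Tree, gene_to_species: Dict[str, str]) -> Tuple[int, int]:
    """Return (duplications, losses) under the classical LCA mapping."""
    depths = clade_depths(species_tree)
    counts = {"dup": 0, "loss": 0}

    def visit(g: Tree) -> FrozenSet[str]:
        if isinstance(g, str):
            return frozenset([gene_to_species[g]])
        kids = [visit(child) for child in g]
        here = lca(depths, kids[0] | kids[1])
        is_dup = here in kids  # a child maps to the same species node
        counts["dup"] += int(is_dup)
        for k in kids:
            # Species-tree edges skipped between parent and child mappings count as losses.
            counts["loss"] += depths[k] - depths[here] - (0 if is_dup else 1)
        return here

    visit(gene_tree)
    return counts["dup"], counts["loss"]
```

Take species tree `(("A", "B"), "C")` and gene tree `(("a1", "b1"), ("a2", "c1"))`, where genes `a*`, `b*` and `c*` come from species A, B and C. For this input the sketch infers one duplication at the root and two losses.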
Influenza A virus (IAV) is remarkably adept at surviving in human populations. IAV thrives even among populations with widespread access to vaccines and antiviral drugs, and it remains a major cause of morbidity and mortality. Correlated mutations are an important factor in IAV evolution and are critical for host adaptation and pathogenicity. The large number of publicly available IAV sequences, combined with the virus's rapid and complex evolutionary dynamics, offers interesting opportunities but also poses unique challenges for analyzing correlated mutations in influenza proteomes.

In this work, we performed a comprehensive analysis of correlated mutations in IAV using a network-theory approach. The residues of each protein are the nodes of the graph, and edges are created from inter-residue correlated mutations. We used the maximal information coefficient (MIC) to compute correlations between residues; an edge connects two nodes if their MIC exceeds a threshold. We built a modular and robust pipeline and applied it to multiple datasets of the H1N1, H3N2, H5, and H7N9 subtypes. We then studied the structural dynamics of IAV subsystems based on the topological properties of their networks, which led to several important conclusions. The main finding is that correlated mutation networks in IAV are subtype- and host-specific, and that the differences between subtypes and hosts are significant. For each network, we identified the nodes with the highest degree, along with the edges and triplets with the strongest weights.

To contextualize our results, we performed an entropy analysis to obtain a global view of sequence variation. We also computed solvent accessibility profiles to identify statistical differences in correlation profiles between surface and buried residues. To understand the extent of co-variation among the 10 IAV proteins, we created protein correlation graphs. In these graphs the proteins are nodes, and the strength of the connection between two nodes depends on the number of correlated mutations between their residues. Finally, we developed a web application and visualization tools for exploring the results and searching for correlated mutations.
Network Analysis of Correlated Mutations in Influenza. Uday Yallapragada, I. Vaisman. Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, 2017-08-20. DOI: https://doi.org/10.1145/3107411.3108237
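Below is a minimal sketch of building a correlated-mutation network from a multiple sequence alignment. MIC needs a dedicated estimator, so this sketch substitutes a stand-in correlation score: mutual information normalized by the smaller column entropy. The threshold value is hypothetical. The per-column Shannon entropy used in the entropy analysis falls out of the same code.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def entropy(col: List[str]) -> float:
    # Shannon entropy (bits) of one alignment column.
    n = len(col)
    return -sum(c / n * math.log2(c / n) for c in Counter(col).values())


def mutual_information(col_a: List[str], col_b: List[str]) -> float:
    # Mutual information (bits) between two alignment columns.
    n = len(col_a)
    pa, pb, pab = Counter(col_a), Counter(col_b), Counter(zip(col_a, col_b))
    return sum(c / n * math.log2((c / n) / ((pa[a] / n) * (pb[b] / n)))
               for (a, b), c in pab.items())


def correlation_network(alignment: List[str], threshold: float = 0.5) -> Tuple[Dict[Tuple[int, int], float], Counter]:
    """Residue network: nodes are alignment columns, edges join columns whose score >= threshold."""
    length = len(alignment[0])
    cols = [[seq[i] for seq in alignment] for i in range(length)]
    entropies = [entropy(col) for col in cols]
    edges: Dict[Tuple[int, int], float] = {}
    for i, j in combinations(range(length), 2):
        h = min(entropies[i], entropies[j])
        if h <= 0:
            continue  # a conserved column cannot co-vary with anything
        score = mutual_information(cols[i], cols[j]) / h  # stand-in for MIC, in [0, 1]
        if score >= threshold:
            edges[(i, j)] = score
    degree: Counter = Counter()
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return edges, degree
```

The returned degree counter identifies the highest-degree residues, and the edge weights rank the most strongly correlated residue pairs.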
Session details: Session 4: Genomic Variation and Disease. Anna M. Ritz. Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, 2017-08-20. DOI: https://doi.org/10.1145/3254547
D. Adjeroh, Maen Allaga, Jun Tan, Jie Lin, Yue Jiang, A. Abbasi, Xiaobo Zhou
In this work, we study string-based approaches to the problem of RNA-protein interaction (RPI) prediction. We apply string algorithms and data structures to extract effective string patterns for predicting RPI. These patterns use both sequence information (protein and RNA sequences) and structure information (protein and RNA secondary structures). This yields several string-based models for predicting interacting RNA-protein pairs. We present results demonstrating the effectiveness of the proposed string-based models, including comparisons against state-of-the-art methods.
String-Based Models for Predicting RNA-Protein Interaction. Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, 2017-08-20. DOI: https://doi.org/10.1145/3107411.3107508
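The abstract does not specify which string patterns are extracted. The sketch below therefore uses plain k-mer composition as a representative string-based encoding; this is an assumption, not the authors' model. Proteins are first mapped to a reduced 7-letter alphabet, an illustrative grouping in the style of the conjoint triad encoding. With that alphabet, protein 3-mers yield 343 features instead of 8000. Secondary-structure strings (for example, dot-bracket notation for RNA) could be encoded the same way by passing their own alphabet to `kmer_profile`.

```python
from collections import Counter
from itertools import product
from typing import Dict, List

RNA_ALPHABET = "ACGU"

# Illustrative 7-class reduced amino-acid alphabet (conjoint-triad-style grouping).
AA_GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_CLASS: Dict[str, str] = {aa: str(i) for i, group in enumerate(AA_GROUPS) for aa in group}
PROTEIN_ALPHABET = "".join(str(i) for i in range(len(AA_GROUPS)))


def kmer_profile(seq: str, alphabet: str, k: int) -> List[float]:
    """Normalized count of every possible k-mer over `alphabet`, in the order given by `alphabet`."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values()) or 1
    return [counts["".join(p)] / total for p in product(alphabet, repeat=k)]


def encode_protein(seq: str) -> str:
    # Map each residue to its class label; unknown residues (e.g. 'X') are dropped.
    return "".join(AA_CLASS[aa] for aa in seq.upper() if aa in AA_CLASS)


def pair_features(rna: str, protein: str, k_rna: int = 3, k_prot: int = 3) -> List[float]:
    # One fixed-length vector per RNA-protein pair: 4**k_rna RNA features + 7**k_prot protein features.
    rna = rna.upper().replace("T", "U")
    return kmer_profile(rna, RNA_ALPHABET, k_rna) + kmer_profile(encode_protein(protein), PROTEIN_ALPHABET, k_prot)
```

The resulting fixed-length vectors can be fed to any standard binary classifier to predict whether a given RNA-protein pair interacts.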
Drug repositioning is a promising strategy in drug discovery. New biomedical insights into drug-target-disease relationships are important for drug repositioning, and such relationships have recently been studied intensively. Most of these studies use network-based computational approaches built on drug and disease similarities. However, a common limitation of existing approaches is that both drug similarities and disease similarities are defined from a single feature of drugs or diseases. In reality, the relationships between drug (or disease) pairs can be characterized by many different features, so it is increasingly important to include them in drug repositioning studies.

In this study, we propose a flexible and robust multi-source learning (FRMSL) framework that integrates multiple heterogeneous data sources for drug-disease association prediction. We first construct a two-layer heterogeneous network consisting of drug nodes, disease nodes, and known drug-disease relationships. The drug repositioning problem can then be treated as a missing-link prediction problem on this heterogeneous graph and solved with the Kronecker regularized least squares (KronRLS) method. Multiple data sources describing drugs and diseases are incorporated into the framework through similarity-based kernels. In practice, a major challenge in such data integration projects is data incompleteness, which stems from the way the data are generated and collected. To address this issue, we develop a novel multi-view learning algorithm based on symmetric nonnegative matrix factorization (SymNMF). Extensive experiments show that our framework outperforms several recent network-based methods.
A Flexible and Robust Multi-Source Learning Algorithm for Drug Repositioning. Huiyuan Chen, Jing Li. Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, 2017-08-20. DOI: https://doi.org/10.1145/3107411.3107473
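KronRLS treats the drug-disease matrix `Y` as a missing-link problem and scores every pair with vec(F) = K (K + sigma I)^-1 vec(Y), where K is the Kronecker product of the drug and disease kernels. The sketch below solves that system directly. This costs O((nm)^3) and is only practical for toy sizes; practical implementations use eigendecompositions of the two kernels instead. `average_kernels` is a hypothetical placeholder for multi-source integration (a uniform average). It is not the FRMSL SymNMF-based multi-view algorithm. The default `sigma=1.0` is arbitrary.

```python
from typing import List

Matrix = List[List[float]]


def kron(A: Matrix, B: Matrix) -> Matrix:
    # Kronecker product: entry [i*p + k][j*q + l] = A[i][j] * B[k][l].
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def solve(A: Matrix, rhs: List[float]) -> List[float]:
    # Gaussian elimination with partial pivoting (fine for toy-sized systems).
    n = len(A)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def average_kernels(kernels: List[Matrix]) -> Matrix:
    # Hypothetical placeholder for combining several similarity sources: a uniform average.
    n = len(kernels[0])
    return [[sum(K[i][j] for K in kernels) / len(kernels) for j in range(n)] for i in range(n)]


def kronrls(K_drug: Matrix, K_dis: Matrix, Y: Matrix, sigma: float = 1.0) -> Matrix:
    """Score every drug-disease pair: vec(F) = K (K + sigma*I)^-1 vec(Y), K = K_drug (x) K_dis."""
    K = kron(K_drug, K_dis)
    N = len(K)
    y = [v for row in Y for v in row]  # row-major vec, matching the kron index order
    A = [[K[i][j] + (sigma if i == j else 0.0) for j in range(N)] for i in range(N)]
    alpha = solve(A, y)
    f = [sum(K[i][j] * alpha[j] for j in range(N)) for i in range(N)]
    m = len(K_dis)
    return [f[i * m:(i + 1) * m] for i in range(len(K_drug))]
```

For example, take drug kernel `[[1, 0.9], [0.9, 1]]`, an identity disease kernel, and `Y = [[1, 0], [0, 0]]`. Drug 1 then gets a positive score for disease 0, which is associated with the similar drug 0, and a score of zero for disease 1.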
Blake Camp, J. Mandivarapu, Jay Mehta, Nagashayana Ramamurthy, James Wingo, A. Bourgeois, Xiaojun Cao, Rajshekhar Sunderraman
The CDC's Epi-Info is widely used by epidemiologists and public health researchers to collect and analyze public health data, especially during outbreaks. As it exists today, Epi-Info runs only on the Windows platform and consists of separate code bases for several different devices and use cases. Software portability has become increasingly important over the past few years. In this poster, we present a cross-platform architecture for Epi-Info. To simplify and expedite future development, the architecture uses Electron, AngularJS, and Python, and it can run on virtually any desktop or laptop computer. The code can also be easily deployed to the Web, and it could be a viable solution for several mobile use cases.
A Cross-Platform System Architecture for Form Design and Data Analytics for Public Health. Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, 2017-08-20. DOI: https://doi.org/10.1145/3107411.3108223
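The poster does not describe how the Electron/AngularJS front end communicates with the Python analytics code. One common pattern, assumed here purely for illustration, is for the Electron process to spawn a Python worker and exchange line-delimited JSON over its standard streams. The sketch shows only the Python side: a small command dispatcher with one hypothetical `frequency` command, which builds a simple frequency table for one variable.

```python
import json
from collections import Counter
from typing import Any, Callable, Dict, List, TextIO


def frequency(records: List[Dict[str, Any]], field: str) -> List[Dict[str, Any]]:
    # Frequency table for one variable, most common value first.
    counts = Counter(r.get(field) for r in records)
    total = sum(counts.values())
    return [{"value": v, "count": c, "percent": round(100.0 * c / total, 1)}
            for v, c in counts.most_common()]


# Hypothetical command registry; new analyses register here without touching the UI code.
HANDLERS: Dict[str, Callable[[Dict[str, Any]], Any]] = {
    "frequency": lambda req: frequency(req["records"], req["field"]),
}


def handle(request: Dict[str, Any]) -> Dict[str, Any]:
    handler = HANDLERS.get(request.get("command"))
    if handler is None:
        return {"ok": False, "error": f"unknown command: {request.get('command')!r}"}
    try:
        return {"ok": True, "result": handler(request)}
    except (KeyError, TypeError) as exc:
        return {"ok": False, "error": repr(exc)}


def serve(inp: TextIO, out: TextIO) -> None:
    # Line-delimited JSON: one request per input line, one response per output line.
    for line in inp:
        if line.strip():
            out.write(json.dumps(handle(json.loads(line))) + "\n")
            out.flush()
```

A worker entry point would call `serve(sys.stdin, sys.stdout)`. Because the protocol is plain JSON, the same handlers could also sit behind an HTTP endpoint for the Web deployment the abstract mentions.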
N. Yanamala, M. Orandle, V. Kodali, Lindsey M. Bishop, P. Zeidler-Erdely, J. Roberts, V. Castranova, A. Erdely
Globally, carbon nanotubes (CNTs) make up 30% of the total engineered nanomaterial market, and multi-walled carbon nanotubes (MWCNTs) account for 94% of that share. Recent experimental evidence points to significant pulmonary toxicity of MWCNTs, including inflammation, sub-pleural fibrosis, and granuloma formation. Although numerous studies explore the adverse potential of various CNTs, their comparability is often limited. The studies differ in:

- administered dose;
- the physico-chemical characteristics of the CNTs studied (e.g., agglomeration/aggregation state, metal impurities, stiffness, length);
- the exposure methods employed; and
- the endpoints monitored.

In this study, we attempted to identify protein markers that are consistent across different MWCNT studies by applying sparse supervised classification methods. Mice were exposed, at various concentrations, to one of the following: two different pristine or as-produced MWCNTs, their polymer-coated counterparts, or a well-studied reference material, MWCNT-7. We analyzed a panel of proteins measured in bronchoalveolar lavage fluid collected from these mice at various post-exposure time points. The main objective was to use sparse classification methods to identify a small number of highly predictive and correlated markers (4 to 7 out of a panel of 52 proteins). These markers should distinguish exposure to MWCNTs and/or be attributable to MWCNT toxicity in mice. Using this approach, we identified a small subset of proteins that clearly distinguishes each exposure. MDC/CCL22, in particular, was associated with various MWCNT exposures independently of the exposure route tested (oropharyngeal aspiration versus inhalation). The approaches presented in this study could enable comparisons not only within a class of engineered nanomaterials but also between different classes of nanomaterials. This study thus serves as a "proof of concept" that can be extended to future nanomaterial risk profiling studies. Such studies could use it to inform decisions about dose- and time-response relationships and to define relevant experimental conditions.
Supervised Machine Learning Approaches Predict and Characterize Nanomaterial Exposures: MWCNT Markers in Lung Lavage Fluid. Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, 2017-08-20. DOI: https://doi.org/10.1145/3107411.3108181
Microbiota are ecological communities of microorganisms found throughout nature. In humans and animals, microbiota can reside on or within the body and exist in commensal or mutualistic relationships with their host, influencing physiological functions and playing critical roles in the host's development. These communities can be highly complex. The intestinal microbiota, for example, comprises hundreds of species that interact with other microorganisms in the community as well as with their host. Recent studies have demonstrated that the microbiota affects a wide range of physiological processes, including digestion, development of the immune system, and inflammation. Further, significant alterations in intestinal microbiota composition have been shown to correlate with several diseases, including obesity, diabetes, cancer, asthma, and even autism spectrum disorder. Characterizing the microbiota and understanding its relation to health and disease could therefore significantly improve human health. Efforts to characterize microbiota have benefited greatly from technical advances in DNA sequencing. In particular, low-cost culture-independent sequencing has made metagenomic and metatranscriptomic surveys practical for microbial communities of bacteria, archaea, viruses, and fungi associated with the human body, other hosts, and the environment. The resulting data have stimulated the development of many new computational approaches to meta'omic sequence analysis, including metagenomic assembly, microbial identification, and gene, transcript, and metabolic pathway profiling. Recent advances in untargeted metabolomics have likewise spurred the development of many tools that enhance the functional profiling of microbial communities. Through invited talks, this workshop will highlight recent advances in computational methods for metagenomics and metabolomics and present pressing challenges.
A hands-on tutorial will provide an introduction to computational metagenomics. This workshop is timely and will broaden the scope of the conference to cover these pressing topics.
A Workshop on Microbiomics, Metagenomics, and Metabolomics. S. Hassoun, C. Huttenhower. Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, 2017-08-20. DOI: 10.1145/3107411.3108172 (https://doi.org/10.1145/3107411.3108172)