
Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics: Latest Articles

Automated Next Generation Sequencing Bioinformatics Pipelines for Pathogen Discovery and Surveillance
M. Okomo-Adhiambo, E. Ramos, Reagan J. Kelly, Yatish Jain, R. Tatusov, A. Montmayeur, Gregory Doho, Rachel L. Marine, T. Ng, Adam C. Retchless, S. Oberste, P. Rota, X. Wang, Agha N. Khan
Next-generation sequencing (NGS) has become a vital tool in clinical microbiology, with numerous applications in infectious disease diagnostics, outbreak investigations, and public health surveillance. Although NGS enables comprehensive pathogen detection in a relatively short time and at low cost, the enormous volume of genomic data it generates creates a critical challenge: organizing, archiving, analyzing, and reporting the results effectively within a clinically relevant timeframe. Automated pipelines are a first step toward standardizing NGS data processing and reporting; they remove common bottlenecks in bioinformatics analysis and provide rapid turnaround. Here, we present the Viral NGS Pipeline, optimized for identification and whole-genome assembly of viruses, and the Bacterial Meningococcus Genome Analysis Platform (BMGAP), designed for genotypic characterization of meningitis pathogens. Together, these pipelines have been used to analyze more than 11,000 clinical samples and isolates. They can be deployed on both standalone and cloud-based servers, making them accessible to internal CDC users as well as external partners, including state public health laboratories and other collaborators worldwide. These automated pipelines could contribute to the development of unbiased NGS-based clinical assays for pathogen detection that demand rapid turnaround times, and they are expected to play a key role in future infectious disease surveillance.
DOI: https://doi.org/10.1145/3107411.3108192; published 2017-08-20
Citations: 0
Outlier Genes as Biomarkers of Breast Cancer Survivability in Time-Series Data
Naveen Mangalakumar, A. Alkhateeb, H. Pham, L. Rueda, A. Ngom
Studying gene expression across different time intervals of breast cancer survival may provide new insights into recovery from the disease. In this work, we propose a hierarchical clustering method that separates out groups of gene time-series profiles lying farthest from the rest of the profiles across the different time intervals. These isolated outliers can serve as potential biomarkers of breast cancer survivability. The expression values of each gene at the measured time points are interpolated with cubic splines to create a trend profile for that gene. After universally aligning the profiles to minimize the vertical area between each pair of profiles, we cluster the genes using hierarchical clustering based on the minimized vertical distances [1]. An appropriate number of clusters was chosen based on the profile alignment and agglomerative clustering (PAAC) index as well as visual inspection of the clusters. Our study suggests that combining proper clustering, a suitable distance function, and index-based cluster validation provides a suitable model for identifying genes as informative biomarkers of breast cancer survivability.
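The abstract describes the profile pipeline only in words. The Python sketch below shows one way the steps could fit together, under stated assumptions: distance is taken as squared vertical area (so the alignment reduces to subtracting each curve's mean), average linkage is used, and the number of clusters is a parameter rather than being chosen with the PAAC index. All function names are hypothetical; this is not the authors' code.

```python
from collections import Counter

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.interpolate import CubicSpline
from scipy.spatial.distance import squareform


def aligned_profiles(times, expr, grid_size=200):
    """Cubic-spline each gene's expression, then shift it vertically to zero mean.

    Assumption: on a uniform grid, subtracting each curve's mean is the vertical
    shift that minimizes the squared area between every pair of curves at once.
    """
    grid = np.linspace(times[0], times[-1], grid_size)
    curves = np.array([CubicSpline(times, row)(grid) for row in expr])
    return curves - curves.mean(axis=1, keepdims=True)


def vertical_distances(curves):
    """Root-mean-square vertical gap between every pair of aligned curves."""
    diff = curves[:, None, :] - curves[None, :, :]
    return np.sqrt(np.mean(diff ** 2, axis=2))


def cluster_genes(times, expr, n_clusters):
    """Average-linkage hierarchical clustering on the aligned-profile distances."""
    dist = vertical_distances(aligned_profiles(times, expr))
    tree = linkage(squareform(dist, checks=False), method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


def outlier_genes(labels, max_size=1):
    """Indices of genes that end up in clusters with at most max_size members."""
    sizes = Counter(labels)
    return [i for i, lab in enumerate(labels) if sizes[lab] <= max_size]
```

Genes returned by `outlier_genes` play the role of the isolated outlier profiles; in the study, the cluster count was chosen with the PAAC index and visual inspection rather than fixed by hand.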
DOI: https://doi.org/10.1145/3107411.3108202; published 2017-08-20
Citations: 2
An Integrated Reconciliation Framework for Domain, Gene, and Species Level Evolution
Lei Li, Mukul S. Bansal
The majority of genes in eukaryotes consist of multiple protein domains that can be independently lost or gained during evolution. This gain and loss of protein domains, through domain duplications, transfers, or losses, has important evolutionary and functional consequences for genes. Yet most computational methods for studying gene evolution treat genes as the basic unit of evolution and assume that evolutionary processes such as duplications and losses act on entire genes rather than on parts of genes. In particular, even though it is well understood that domains evolve inside genes and genes inside species, no existing computational framework simultaneously models the evolution of domains, genes, and species while accounting for their interdependency. Here, we develop a three-tree model of domain evolution that explicitly captures the interdependence of domain-, gene-, and species-level evolution. Our model extends the classical phylogenetic reconciliation framework, which infers gene family evolution by comparing gene trees with a species tree, by explicitly accounting for domain-level events. The new model decouples domain-level events from gene-level events and provides a much more fine-grained, easily interpretable view of gene family and domain family evolution. Specifically, we (i) introduce the new three-tree computational framework, (ii) prove that the associated optimization problem is NP-hard, (iii) devise an efficient heuristic solution for the problem, (iv) apply our algorithm to a large dataset of about 4000 domain trees and 7000 gene trees from 12 fly species, and (v) demonstrate the impact of using our new computational framework by comparing the inferred evolutionary histories against those obtained using existing approaches.
Our experimental results show that using the new three-tree model has a significant impact on the inference of both domain-level and gene-level events, and on the inference of domain content in ancestral genes and gene content in ancestral species, compared to existing approaches.
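The three-tree model itself is not specified in the abstract, but the classical two-tree reconciliation it extends can be sketched compactly. The Python below is a minimal illustration, assuming binary trees written as nested tuples and a duplication-loss model only (no transfers, no domain level): each gene-tree node is mapped to the lowest common ancestor (LCA) of its species in the species tree, a node is a duplication when it maps to the same species clade as one of its children, and skipped species-tree levels count as losses. All names are hypothetical.

```python
def species_clades(tree):
    """Map every clade (frozenset of species) of a nested-tuple tree to its depth."""
    depth = {}

    def walk(node, d):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child, d + 1) for child in node))
        depth[leaves] = d
        return leaves

    walk(tree, 0)
    return depth


def reconcile(gene_tree, species_tree, species_of):
    """LCA reconciliation of a binary gene tree; returns (duplications, losses)."""
    depth = species_clades(species_tree)
    counts = {"dup": 0, "loss": 0}

    def lca(species):
        # Clades containing `species` form a chain; the smallest one is the LCA.
        return min((c for c in depth if species <= c), key=len)

    def walk(node):
        if isinstance(node, str):
            return lca(frozenset([species_of[node]]))
        left, right = walk(node[0]), walk(node[1])
        here = lca(left | right)
        is_dup = here in (left, right)
        counts["dup"] += is_dup
        for child in (left, right):
            # A speciation expects each child one level deeper and a duplication
            # expects no step; every extra species-tree level skipped is a loss.
            counts["loss"] += depth[child] - depth[here] - (0 if is_dup else 1)
        return here

    walk(gene_tree)
    return counts["dup"], counts["loss"]
```

For example, with species tree `(("A", "B"), "C")`, a gene tree whose root joins an A-B subtree with an A-C subtree is explained by one duplication at the root and two losses.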
DOI: https://doi.org/10.1145/3107411.3108220; published 2017-08-20
Citations: 10
Network Analysis of Correlated Mutations in Influenza
Uday Yallapragada, I. Vaisman
Influenza A virus (IAV) is remarkably adept at surviving in human populations. IAV thrives even among populations with widespread access to vaccines and antiviral drugs, and it remains a major cause of morbidity and mortality. Correlated mutations are an important factor in IAV evolution and are critical for host adaptation and pathogenicity. The large sets of publicly available IAV sequences, combined with the virus's rapid and complex evolutionary dynamics, present interesting opportunities and unique challenges for analyzing correlated mutations in influenza proteomes. In this work, we performed a comprehensive analysis of correlated mutations in IAV using a network theory approach, in which the residues of each protein act as nodes and edges are created based on inter-residue correlated mutations. Our approach used the maximal information coefficient (MIC) to compute correlations between residues; an edge connects two nodes if their MIC exceeds a threshold. We created a modular and robust pipeline and applied it to multiple datasets of the H1N1, H3N2, H5, and H7N9 subtypes. We studied the structural dynamics of IAV subsystems based on the topological properties of their networks, which led to several important conclusions. The main finding is that correlated mutation networks in IAV are subtype- and host-specific, and the differences between subtypes and hosts are significant. For each network, we identified the nodes with the highest degree, along with the edges and triplets with the strongest weights. To contextualize our results, we performed entropy analysis to obtain a global view of sequence variation, and computed solvent accessibility profiles to identify statistical differences in correlation profiles between surface and buried residues.
To understand the extent of co-variation among the 10 proteins in IAV sequences, we created visualizations of protein correlation graphs in which the proteins act as nodes and the strength of the connection between two nodes depends on the number of correlated mutations between residues of the connected proteins. We also developed a web application and visualization tools for exploring the results and searching for correlated mutations.
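A minimal Python sketch of the residue-network construction, under stated assumptions: each alignment column is treated as a residue node, and plain mutual information (in nats) stands in for the maximal information coefficient, which is not reimplemented here. An edge joins two columns whose score exceeds the threshold, and node degrees are counted for a highest-degree analysis like the one described above. The function names, the threshold, and the toy alignment are illustrative only.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(col_x, col_y):
    """Mutual information (nats) between two alignment columns.

    Stand-in for MIC, which is NOT what this function computes.
    """
    n = len(col_x)
    px, py = Counter(col_x), Counter(col_y)
    pxy = Counter(zip(col_x, col_y))
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


def correlation_network(alignment, threshold):
    """Residue network: nodes are column indices, edges link columns scoring above threshold."""
    columns = list(zip(*alignment))
    edges = {}
    for i, j in combinations(range(len(columns)), 2):
        score = mutual_information(columns[i], columns[j])
        if score > threshold:
            edges[(i, j)] = score
    degree = Counter()
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return edges, degree
```

On a toy alignment such as `["AKL", "AKL", "GRL", "GRL"]`, the first two columns co-vary perfectly and are linked, while the invariant third column carries no information and stays isolated.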
DOI: https://doi.org/10.1145/3107411.3108237; published 2017-08-20
Citations: 0
Session details: Session 4: Genomic Variation and Disease
Anna M. Ritz
DOI: https://doi.org/10.1145/3254547; published 2017-08-20
Citations: 0
String-Based Models for Predicting RNA-Protein Interaction
D. Adjeroh, Maen Allaga, Jun Tan, Jie Lin, Yue Jiang, A. Abbasi, Xiaobo Zhou
In this work, we study string-based approaches to predicting RNA-protein interaction (RPI). We apply string algorithms and data structures to extract effective string patterns for RPI prediction, using both sequence information (protein and RNA sequences) and structure information (protein and RNA secondary structures). This yields several string-based models for predicting interacting RNA-protein pairs. Our results demonstrate the effectiveness of the proposed string-based models, including comparisons against state-of-the-art methods.
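The abstract does not say which string patterns the models extract. As a hedged illustration only, the Python below builds a common baseline encoding: normalized k-mer frequencies for the RNA and protein sequences of a candidate pair, concatenated into one feature vector that a classifier (not shown) could score. The alphabets, k values, and function names are assumptions; the same function could be applied to secondary-structure strings by passing their alphabet instead.

```python
from collections import Counter
from itertools import product

RNA_ALPHABET = "ACGU"
PROTEIN_ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def kmer_profile(seq, alphabet, k):
    """Normalized counts of every k-mer over `alphabet`, in itertools.product order."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    vector = [counts["".join(p)] for p in product(alphabet, repeat=k)]
    # Windows containing symbols outside the alphabet are ignored.
    total = sum(vector) or 1
    return [c / total for c in vector]


def pair_features(rna, protein, k_rna=3, k_protein=2):
    """Concatenate RNA and protein k-mer profiles into one vector per candidate pair."""
    return (kmer_profile(rna, RNA_ALPHABET, k_rna)
            + kmer_profile(protein, PROTEIN_ALPHABET, k_protein))
```

With the defaults, every pair maps to a fixed-length vector of 4^3 + 20^2 = 464 features, regardless of sequence lengths.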
DOI: https://doi.org/10.1145/3107411.3107508; published 2017-08-20
Citations: 0
A Flexible and Robust Multi-Source Learning Algorithm for Drug Repositioning
Huiyuan Chen, Jing Li
Drug repositioning is a promising strategy in drug discovery. New biomedical insights into drug-target-disease relationships are important for drug repositioning, and such relationships have recently been studied intensively. Most studies use network-based computational approaches built on drug and disease similarities. However, a common limitation of existing approaches is that drug similarities and disease similarities are each defined on a single feature of the drugs or diseases. In reality, the relationships between drug (or disease) pairs can be characterized by many different features, so it is increasingly important to include them in drug repositioning studies. In this study, we propose a flexible and robust multi-source learning (FRMSL) framework that integrates multiple heterogeneous data sources for drug-disease association prediction. We first construct a two-layer heterogeneous network consisting of drug nodes, disease nodes, and known drug-disease relationships. The drug repositioning problem can then be treated as a missing-link prediction problem on this heterogeneous graph and solved using the Kronecker regularized least squares (KronRLS) method. Multiple data sources describing drugs and diseases are incorporated into the framework through similarity-based kernels. In practice, a major challenge in such data integration projects is data incompleteness, which stems from how the data are generated and collected. To address this issue, we develop a novel multi-view learning algorithm based on symmetric nonnegative matrix factorization (SymNMF). Extensive experimental studies show that our framework outperforms several recent network-based methods.
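KronRLS has a closed form that avoids building the full Kronecker kernel. The Python sketch below assumes drugs index the rows and diseases the columns of a 0/1 association matrix `Y`, with one symmetric similarity kernel per side; it solves the regularized system through the eigendecompositions of the two kernels. Averaging several similarity kernels is a simple stand-in for multi-source integration and is an assumption, not the paper's SymNMF-based multi-view step. The function names and the regularization parameter `sigma` are illustrative.

```python
import numpy as np


def combine_kernels(kernels, weights=None):
    """Weighted average of several similarity kernels over the same entities (assumed scheme)."""
    if weights is None:
        weights = np.full(len(kernels), 1.0 / len(kernels))
    return sum(w * K for w, K in zip(weights, kernels))


def kron_rls(K_drug, K_disease, Y, sigma=1.0):
    """Kronecker regularized least squares scores for every drug-disease pair.

    Solves (kron(K_disease, K_drug) + sigma * I) vec(A) = vec(Y), with vec taken
    column by column, without forming the Kronecker product, and returns the
    predictions K_drug @ A @ K_disease.
    """
    lam_d, U_d = np.linalg.eigh(K_drug)
    lam_s, U_s = np.linalg.eigh(K_disease)
    projected = U_d.T @ Y @ U_s       # labels expressed in the joint eigenbasis
    eig = np.outer(lam_d, lam_s)      # eigenvalues of the Kronecker kernel
    return U_d @ (eig / (eig + sigma) * projected) @ U_s.T
```

High-scoring entries of the returned matrix at positions where `Y` is 0 would be the candidate repositioning pairs.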
DOI: https://doi.org/10.1145/3107411.3107473; published 2017-08-20
Citations: 18
A Cross-Platform System Architecture for Form Design and Data Analytics for Public Health
Blake Camp, J. Mandivarapu, Jay Mehta, Nagashayana Ramamurthy, James Wingo, A. Bourgeois, Xiaojun Cao, Rajshekhar Sunderraman
The CDC's Epi-Info is widely used by epidemiologists and public health researchers to collect and analyze public health data, especially during outbreaks. Currently, Epi-Info runs only on the Windows platform and consists of separate codebases for several different devices and use cases. Software portability has become increasingly important over the past few years. In this poster, we present a cross-platform architecture for Epi-Info. To simplify and expedite future development, the architecture uses Electron, AngularJS, and Python and can run on virtually any desktop or laptop computer. Additionally, the code can easily be deployed to the Web and could become a viable solution for several mobile use cases.
DOI: https://doi.org/10.1145/3107411.3108223; published 2017-08-20
Citations: 1
Supervised Machine Learning Approaches Predict and Characterize Nanomaterial Exposures: MWCNT Markers in Lung Lavage Fluid
N. Yanamala, M. Orandle, V. Kodali, Lindsey M. Bishop, P. Zeidler-Erdely, J. Roberts, V. Castranova, A. Erdely
Globally, carbon nanotubes (CNTs) make up 30% of the total engineered nanomaterial market, and multi-walled carbon nanotubes (MWCNTs) account for 94% of that share. Recent experimental evidence points to significant pulmonary toxicity associated with MWCNTs, including inflammation, sub-pleural fibrosis, and granuloma formation. Although numerous studies have explored the adverse potential of various CNTs, their comparability is often limited by differences in administered dose, in the physico-chemical characteristics of the CNTs studied (e.g., agglomeration/aggregation state, metal impurities, stiffness, length), in the exposure methods employed, and in the end points monitored. In this study, we attempted to identify protein markers that are consistent across different MWCNT studies by applying sparse supervised classification methods. We analyzed a panel of proteins measured in bronchoalveolar lavage fluid collected from mice at various post-exposure time points and concentrations after exposure to two different pristine (as-produced) MWCNTs, their polymer-coated counterparts, or a well-studied reference material, MWCNT-7. The main objective was to leverage sparse classification methods to identify a small number of highly predictive and correlated markers (4 to 7 out of a panel of 52 proteins) that can distinguish exposure to MWCNTs and/or be attributed to MWCNT toxicity in mice. Using this approach, we identified a small subset of proteins that clearly distinguishes each exposure. MDC/CCL22, in particular, was associated with various MWCNT exposures independently of the exposure route tested (oropharyngeal aspiration versus inhalation). The approaches presented here could enable comparisons not only within a class of engineered nanomaterials but also between different classes. This study thus serves as a "proof of concept" that can be extended to future nanomaterial risk-profiling studies by informing decisions about dose- and time-response relationships and by helping to define relevant experimental conditions.
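The abstract does not name the specific sparse classifier that was used. As a hedged illustration of the general idea (a penalty that drives most marker weights to exactly zero, so that only a few predictive markers survive), the sketch below fits an L1-penalized logistic regression by proximal gradient descent in plain Python. Everything here is an assumption: the data are synthetic (8 hypothetical "marker" columns, of which only columns 0 and 3 carry the exposed-versus-control signal), and the helper names `make_synthetic_panel` and `fit_l1_logistic` and the settings `lam`, `lr`, and `n_iter` are illustrative choices, not the authors' method, data, or 52-protein panel.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t * |v|: shrinks v toward 0 and zeroes it if |v| <= t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def make_synthetic_panel(n=120, p=8, informative=(0, 3), seed=0):
    """Hypothetical toy data: only the `informative` columns depend on the label."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # balanced groups: 1 = "exposed", 0 = "control" (illustrative)
        sign = 1.0 if label else -1.0
        row = [rng.gauss(0.0, 1.0) for _ in range(p)]  # pure-noise "markers"
        for j in informative:
            row[j] = sign + 0.5 * rng.gauss(0.0, 1.0)  # noisy copy of the label signal
        X.append(row)
        y.append(label)
    return X, y


def fit_l1_logistic(X, y, lam=0.2, lr=0.5, n_iter=1000):
    """L1-penalized logistic regression fitted by proximal gradient descent (ISTA).

    X is a list of samples (each a list of p floats) and y a list of 0/1 labels.
    The intercept is not penalized. Returns (weights, intercept).
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(n_iter):
        # Residuals sigmoid(x_i . w + b) - y_i drive the mean logistic-loss gradient.
        resid = [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
                 for xi, yi in zip(X, y)]
        grad_w = [sum(r * xi[j] for r, xi in zip(resid, X)) / n for j in range(p)]
        grad_b = sum(resid) / n
        # Gradient step on the smooth loss, then soft-thresholding for the L1 term;
        # this second step is what sets weak weights to exactly zero.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


X, y = make_synthetic_panel()
w, b = fit_l1_logistic(X, y)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
print("selected marker indices:", selected)
```

Raising `lam` keeps fewer markers and lowering it admits more. Because the soft-thresholding step yields exact zeros rather than merely small weights, L1 penalties are a common choice when the goal is a short marker list rather than only an accurate classifier.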
DOI: 10.1145/3107411.3108181 (https://doi.org/10.1145/3107411.3108181), published 2017-08-20
Citations: 0
A Workshop on Microbiomics, Metagenomics, and Metabolomics
S. Hassoun, C. Huttenhower
Microbiota are ecological communities of microorganisms found throughout nature. In humans and animals, microbiota can reside on or within the body and exist in a commensal or mutualistic relationship with their host, influencing physiological functions and playing critical roles in the host's development. These microbial communities can be very complex. The intestinal microbiota, for example, comprises hundreds of species that interact with one another and with their host. Recent studies have demonstrated that the microbiota affects a wide range of physiological processes, including digestion, immune-system development, and inflammation. Further, significant alterations in the composition of the intestinal microbiota have been shown to correlate with several diseases, including obesity, diabetes, cancer, asthma, and even autism spectrum disorder. Characterizing the microbiota and understanding its relation to health and disease stand to significantly improve human health. Efforts to characterize microbiota have benefited greatly from technical advances in DNA sequencing. In particular, low-cost, culture-independent sequencing has made metagenomic and metatranscriptomic surveys of microbial communities practical, including the bacteria, archaea, viruses, and fungi associated with the human body, other hosts, and the environment. The resulting data have stimulated the development of many new computational approaches to meta'omic sequence analysis, including metagenomic assembly, microbial identification, and gene, transcript, and metabolic pathway profiling. Recent advances in untargeted metabolomics have likewise stimulated the development of many tools that enhance the functional profiling of microbial communities. Through invited talks, this workshop will highlight recent advances in computational methods for metagenomics and metabolomics and present pressing open challenges. A hands-on tutorial will provide an introduction to computational metagenomics. This timely workshop will broaden the scope of the conference to cover these important topics.
DOI: 10.1145/3107411.3108172 (https://doi.org/10.1145/3107411.3108172), published 2017-08-20
Citations: 0