Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics: Latest Articles

Analysis of Single Cells on a Pseudotime Scale along Postnatal Pancreatic Beta Cell Development

Pub Date : 2017-08-20 DOI: 10.1145/3107411.3107458

F. Mulas, Chun Zeng, Yinghui Sui, Tiffany Guan, Nathanael Miller, Yuliang Tan, Fenfen Liu, Wen Jin, Andrea C. Carrano, M. Huising, O. Shirihai, Gene W. Yeo, M. Sander

Single-cell RNA-seq generates gene expression profiles of individual cells and has furthered our understanding of the developmental and cellular hierarchy within complex tissues. One computational challenge in analyzing single-cell data sets is reconstructing the progression of individual cells with respect to the gradual transition of their transcriptomes. While a number of single-cell ordering tools have been proposed, many of these require knowledge of progression markers or time delineators. Here, we adapted an algorithm previously developed for temporally ordering bulk microarray samples [1] to reconstruct the postnatal developmental trajectory of pancreatic beta-cells. To accomplish this, we applied a multi-step pipeline to analyze single-cell RNA-seq data sets from isolated beta-cells at five different time points between birth and post-weaning. Specifically, we i) ordered cells along a linear trajectory (the Pseudotime Scale) by applying one-dimensional principal component analysis to the normalized data matrix; ii) identified annotated and de novo gene sets significantly regulated along the trajectory; iii) built a network of top-regulated genes using protein interaction repositories; and iv) scored genes for their network connectivity to transcription factors [2]. A systematic comparison showed that our approach ordered the cells in our data set more accurately than previously reported methods and allowed for direct comparisons with external data sets. Importantly, our analysis revealed previously unobserved changes in beta-cell metabolism and in levels of mitochondrial reactive oxygen species. We demonstrated experimentally a role for these changes in the regulation of postnatal beta-cell proliferation. Our pipeline identified maturation-related changes in gene expression not captured when evaluating bulk gene expression data across the developmental time course. The proposed methodology has broad applicability beyond the context described here and could be used to examine the trajectory of other single cell types along a continuous course of cell state changes.
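
Step i) of the pipeline, ordering cells by their position on a single principal component, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy matrix, and the use of power iteration in place of a library PCA routine are all assumptions made for illustration, and the normalization step is assumed to have been done already.

```python
import math
import random


def first_pc_scores(matrix, n_iter=200, seed=0):
    """Project each row (cell) onto the first principal component.

    `matrix` is a list of rows, one per cell; each row holds normalized
    expression values, one per gene. Power iteration is used so that the
    sketch needs only the standard library.
    """
    n_cells, n_genes = len(matrix), len(matrix[0])
    # Center every gene (column) at zero mean.
    means = [sum(row[j] for row in matrix) / n_cells for j in range(n_genes)]
    centered = [[row[j] - means[j] for j in range(n_genes)] for row in matrix]

    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
    for _ in range(n_iter):
        # One power-iteration step: v <- X^T (X v), then renormalize.
        scores = [sum(x * w for x, w in zip(row, v)) for row in centered]
        v_new = [sum(centered[i][j] * scores[i] for i in range(n_cells))
                 for j in range(n_genes)]
        norm = math.sqrt(sum(w * w for w in v_new))
        if norm == 0.0:  # no variance left to capture
            break
        v = [w / norm for w in v_new]
    return [sum(x * w for x, w in zip(row, v)) for row in centered]


def pseudotime_order(matrix):
    """Return cell indices sorted by their first-component score."""
    scores = first_pc_scores(matrix)
    return sorted(range(len(matrix)), key=lambda i: scores[i])


# Invented toy data: 5 cells x 3 genes, listed in shuffled order.
cells = [
    [2.0, 4.1, 8.0],
    [0.0, 0.2, 10.0],
    [4.0, 7.9, 6.1],
    [1.0, 2.0, 9.0],
    [3.0, 6.0, 7.0],
]
print(pseudotime_order(cells))
```

The sign of a principal component is arbitrary, so the recovered order may run from early to late or the reverse; known collection time points would be needed to orient it.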

Citations: 0

Session details: Session 15: Sequence Analysis and Genome Assembly

Pub Date : 2017-08-20 DOI: 10.1145/3254558

C. Boucher

Citations: 0

Novel Unsupervised Named Entity Recognition Used in Text Annotation Tool (OntoMate) At Rat Genome Database

Pub Date : 2017-08-20 DOI: 10.1145/3107411.3108198

O. Ghiasvand, M. Shimoyama

In model organism databases, one of the important tasks is to convert free text in biomedical literature to a structured data format. Curators in the Rat Genome Database (RGD), the primary source of rat genomic, genetic, and physiological data, spend considerable time and effort curating functional information for genes, QTLs, and strains from the literature. To increase curation efficiency and prioritize literature for data extraction, OntoMate was developed at RGD. This tool tags PubMed abstracts with genes, gene names, gene mutations, organism names, and terms from 16 ontologies/vocabularies, including synonyms and aliases, used to represent functional information. In this project, we have used an unsupervised tagging method to reduce the human effort of creating training data. In this approach, a machine learning tool based on decision tree classification techniques has been developed. Mentions that belong uniquely to one semantic type serve as positive samples, and those with semantic types other than the desired group are assumed to be negative samples. An interface allows the user to create a complex query incorporating terms from any of the ontologies, gene symbols, organisms, dates, and other parameters. The results return abstracts along with all tagged parameters indicated in the query, along with children of the ontology terms chosen. Results can be further filtered by the user through a panel that lists organisms, genes, and diseases with the number of papers returned. Abstracts and papers are provided in rank order by relevance to the query. The tool is fully integrated into curation software, so citations and abstracts can be automatically entered into the RGD database and assigned an ID, and genes and ontology terms in the tags can be checked to create annotations linked to the paper. The system was built with a scalable and open architecture, and literature is updated daily. This tool uses Solr indexing technology and categorizes papers based on a relevance score. It indexes and tags more than 27 million abstracts. With the use of bioNLP tools, RGD has added more automation to its curation workflow.
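
The rule for deriving training labels without manual annotation can be sketched as follows. This is an illustrative reading of the abstract, not OntoMate's code: the function name, the lexicon structure, and the decision to skip ambiguous or unknown mentions are assumptions.

```python
def weak_labels(mentions, lexicon, target_type):
    """Assign training labels to mentions without manual annotation.

    mentions    : iterable of mention strings found in abstracts
    lexicon     : dict mapping a lowercased term to the set of semantic
                  types it can denote (hypothetical structure)
    target_type : the semantic type being learned, e.g. "gene"

    Returns (mention, label) pairs: label 1 when the term maps only to
    target_type, label 0 when it maps only to other types. Ambiguous and
    unknown mentions are skipped (an assumption of this sketch).
    """
    labeled = []
    for mention in mentions:
        types = lexicon.get(mention.lower(), set())
        if types == {target_type}:
            labeled.append((mention, 1))
        elif types and target_type not in types:
            labeled.append((mention, 0))
    return labeled


# Invented toy lexicon and mentions.
lexicon = {
    "tp53": {"gene"},
    "hypertension": {"disease"},
    "cat": {"gene", "organism"},  # ambiguous, so it is left unlabeled
    "rat": {"organism"},
}
mentions = ["Tp53", "hypertension", "CAT", "rat", "foo"]
print(weak_labels(mentions, lexicon, "gene"))
```

The resulting pairs could then be turned into feature vectors and passed to a decision tree learner, which is the classifier family the abstract names.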

Citations: 1

Investigating Rigidity Properties of Protein Cavities

Pub Date : 2017-08-20 DOI: 10.1145/3107411.3107502

Stephanie Mason, T. Woods, B. Chen, F. Jagodzinski

Cavities in proteins facilitate a variety of biochemical processes. The shapes and sizes of cavities are factors that contribute to specificity in ligand binding, and docking with other biomolecules. A deep understanding of cavity properties may enable new insights into protein-protein interactions, ligand binding, and structure-based drug design studies. In this work we explore how biological properties such as size and residue membership of protein cavities correlate with the flexibility of the cavity as computed using an efficient graph theoretic rigidity algorithm. We hypothesize that various rigidity properties of protein cavities are dependent on cavity surface area. In this work we enumerate a set of cavity rigidity metrics, and demonstrate their use in characterizing over 120,000 cavities from approximately 2,500 chains. We show that cavity size indeed does correlate with some -- but not all -- cavity rigidity metrics.

Citations: 0

Session details: Session 20: Big Data in Bioinformatics II

Pub Date : 2017-08-20 DOI: 10.1145/3254563

T. Pollard

Citations: 0

Learning Deep Representations from Heterogeneous Patient Data for Predictive Diagnosis

Pub Date : 2017-08-20 DOI: 10.1145/3107411.3107433

Chongyu Zhou, Yao Jia, M. Motani, J. Chew

Predictive diagnosis benefits both patients and hospitals. Major challenges limiting the effectiveness of machine-learning-based predictive diagnosis include the lack of efficient feature selection methods and the heterogeneity of measured patient data (e.g., vital signs). In this paper, we propose DLFS, an efficient feature selection scheme based on deep learning that is applicable to heterogeneous data. DLFS is unsupervised in nature and can automatically learn compact representations from patient data for efficient prediction. We investigate the specific problem of predicting patients' length of stay in the hospital within a predictive diagnosis framework that uses DLFS for feature selection. Real patient data from the pneumonia database of the National University Health System (NUHS) in Singapore are collected to verify the effectiveness of DLFS. By running experiments on real-world patient data and comparing with several other commonly used feature selection methods, we demonstrate the advantage of the proposed DLFS scheme.

Citations: 17

Identification and Prediction of Intrinsically Disordered Regions in Proteins Using n-grams

Pub Date : 2017-08-20 DOI: 10.1145/3107411.3107480

Mauricio Oberti, I. Vaisman

Intrinsically disordered proteins (IDPs) play an important role in many biological processes and are closely related to human diseases. They also have the potential to serve as targets for drug discovery, especially in disordered binding regions. Accurate prediction of IDPs is challenging: most methods rely on sequence profiles to improve accuracy, which makes them computationally expensive. This paper describes a method based on n-gram frequencies using reduced amino acid alphabets, which tries to overcome this challenge by utilizing only sequence information. Our results show that the described IDP prediction approach performs at the same level as some of the other state-of-the-art ab initio methods. Moreover, the simplicity of n-grams allows the construction of decision trees that can provide important insights into common patterns and properties associated with disordered regions.
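
As a rough illustration of the feature extraction described above, the sketch below maps a protein sequence onto a reduced alphabet and computes n-gram frequencies. The four-letter grouping used here (hydrophobic, polar, negatively charged, positively charged) is a hypothetical choice made for this example; the abstract does not state which reduced alphabets the authors used.

```python
from collections import Counter

# Hypothetical reduced alphabet: 20 amino acids collapsed into 4 classes.
_GROUPS = (
    ("AVLIMFWC", "h"),  # hydrophobic
    ("STNQGPYH", "p"),  # polar
    ("DE", "n"),        # negatively charged
    ("KR", "c"),        # positively charged
)
REDUCED = {aa: letter for group, letter in _GROUPS for aa in group}


def reduce_sequence(seq):
    """Rewrite a sequence in the reduced alphabet; unknown residues become 'x'."""
    return "".join(REDUCED.get(aa, "x") for aa in seq.upper())


def ngram_frequencies(seq, n=2):
    """Return the relative frequency of each overlapping n-gram in the reduced sequence."""
    reduced = reduce_sequence(seq)
    counts = Counter(reduced[i:i + n] for i in range(len(reduced) - n + 1))
    total = sum(counts.values())
    if total == 0:  # sequence shorter than n
        return {}
    return {gram: count / total for gram, count in counts.items()}


print(reduce_sequence("MKDE"))
print(ngram_frequencies("MKDE", n=2))
```

Frequency vectors of this kind need only the raw sequence, which is what makes them cheap compared with profile-based features, and they can be fed directly to a decision tree learner.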

Citations: 2

Session details: Session 11: Applications to Microbes and Imaging Genetics

Pub Date : 2017-08-20 DOI: 10.1145/3254554

A. Wright

Citations: 0

Instrumenting the Health Care Enterprise for Discovery in the Course of Clinical Care

Pub Date : 2017-08-20 DOI: 10.1145/3107411.3121000

S. Murphy

Objectives: Although patients may have a wealth of imaging, genomic, monitoring, and personal device data, it has yet to be fully integrated into clinical care. Methods: We identify three reasons for the lack of integration. The first is that "Big Data" is poorly managed by most Electronic Medical Record Systems (EMRS). The data is mostly available on "cloud-native" platforms that are outside the scope of most EMRS, and even checking whether such data is available for a patient often must be done outside the EMRS. The second reason is that extracting features from the Big Data that are relevant to healthcare often requires complex machine learning algorithms, such as determining whether a genomic variant is protein-altering. The third reason is that applications that present the Big Data need to be modified constantly to reflect the current state of knowledge, such as instructing when to order a new set of genomic tests. In some cases, the applications need to be updated nightly. Results: A new architecture for the EMRS is evolving that could unite Big Data, machine learning, and clinical care through a microservice-based architecture that can host applications focused on quite specific aspects of clinical care, such as managing cancer immunotherapy. Conclusion: Informatics innovation, medical research, and clinical care go hand in hand as we look to infuse science-based practice into healthcare. Innovative methods will lead to a new ecosystem of apps interacting with healthcare providers to fulfill a promise that is still to be determined.

Citations: 1

Determining Optimal Features for Predicting Type IV Secretion System Effector Proteins for Coxiella burnetii

Pub Date : 2017-08-20 DOI: 10.1145/3107411.3107416

Zhila Esna Ashari Esfahani, K. Brayton, S. Broschat

Type IV secretion systems (T4SS) are constructed from multiple protein complexes that exist in some types of bacterial pathogens and are responsible for delivering type IV effector proteins into host cells. Effectors target eukaryotic cells and try to manipulate host cell processes and the immune system of the host. Some work has been done to validate effectors experimentally, and recently a few scoring and machine-learning-based methods have been developed to predict effectors from whole genome sequences. However, different studies have suggested different types of features to be effective. In this work, we gathered the features proposed in previous reports and calculated their values for a dataset of effectors and non-effectors of Coxiella burnetii. Then we ranked the features based on their importance in classifying effectors and non-effectors to determine the set of optimal features. Finally, a Support Vector Machine model was developed to test the optimal features by comparing them to a set of features proposed in a previous study. The outcome of the comparison supports the effectiveness of our optimal features.
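
The feature-ranking step can be sketched as below. The abstract does not name the ranking criterion, so this example substitutes a two-class Fisher score (squared difference of class means over the sum of class variances) purely for illustration; the function names and the toy data are likewise assumptions.

```python
from statistics import mean, pvariance


def fisher_scores(X, y):
    """Score each feature by class separation relative to within-class spread.

    X : list of samples, each a list of feature values
    y : list of labels, 1 for effector and 0 for non-effector
    """
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        gap = (mean(pos) - mean(neg)) ** 2
        spread = pvariance(pos) + pvariance(neg)
        if spread > 0:
            scores.append(gap / spread)
        elif gap > 0:
            scores.append(float("inf"))  # perfectly separated, no spread
        else:
            scores.append(0.0)           # feature is constant across classes
    return scores


def rank_features(X, y):
    """Return feature indices ordered from most to least discriminative."""
    scores = fisher_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


# Invented toy data: feature 0 separates the classes, feature 1 barely does.
X = [[5.0, 1.0], [6.0, 2.0], [1.0, 1.5], [2.0, 2.5]]
y = [1, 1, 0, 0]
print(fisher_scores(X, y))
print(rank_features(X, y))
```

A top-ranked subset chosen this way would then be passed to a classifier such as the Support Vector Machine mentioned in the abstract and compared against a baseline feature set.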

Citations: 5
