
Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics: Latest Papers

Session details: Session 20: Big Data in Bioinformatics II
T. Pollard
DOI: https://doi.org/10.1145/3254563 (published 2017-08-20)
Citations: 0
Learning Deep Representations from Heterogeneous Patient Data for Predictive Diagnosis
Chongyu Zhou, Yao Jia, M. Motani, J. Chew
Predictive diagnosis benefits both patients and hospitals. Major challenges limiting the effectiveness of machine-learning-based predictive diagnosis include the lack of efficient feature selection methods and the heterogeneity of measured patient data (e.g., vital signs). In this paper, we propose DLFS, an efficient feature selection scheme based on deep learning that is applicable to heterogeneous data. DLFS is unsupervised and automatically learns compact representations from patient data for efficient prediction. We investigate the specific problem of predicting a patient's length of hospital stay within a predictive diagnosis framework that uses DLFS for feature selection. Real patient data from the pneumonia database of the National University Health System (NUHS) in Singapore are used to verify the effectiveness of DLFS. By running experiments on these real-world patient data and comparing DLFS with several other commonly used feature selection methods, we demonstrate the advantage of the proposed scheme.
DOI: https://doi.org/10.1145/3107411.3107433 (published 2017-08-20)
Citations: 17
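The abstract does not describe the DLFS architecture, so the following is only a generic sketch of the underlying idea: put heterogeneous patient fields on a common numeric footing, then learn a compact representation without labels. It assumes numeric vitals are z-scored and categorical fields are one-hot encoded, and it uses a tiny one-hidden-layer tanh autoencoder trained by plain gradient descent as a stand-in; it is not the authors' method. The column meanings and values in the usage part are hypothetical.

```python
import numpy as np

def encode_heterogeneous(numeric, categorical):
    """Z-score numeric columns and one-hot encode categorical columns.

    numeric: rows of numeric measurements (hypothetical vitals)
    categorical: rows of category labels (hypothetical fields)
    """
    numeric = np.asarray(numeric, dtype=float)
    mu, sd = numeric.mean(axis=0), numeric.std(axis=0)
    sd[sd == 0] = 1.0  # avoid division by zero for constant columns
    blocks = [(numeric - mu) / sd]
    for j in range(len(categorical[0])):
        levels = sorted({row[j] for row in categorical})
        blocks.append(np.array([[1.0 if row[j] == lv else 0.0 for lv in levels]
                                for row in categorical]))
    return np.hstack(blocks)

def train_autoencoder(X, k, lr=0.1, epochs=3000, seed=0):
    """h = tanh(X W1 + b1), X_hat = h W2 + b2; full-batch gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1, b1 = rng.normal(scale=0.1, size=(d, k)), np.zeros(k)
    W2, b2 = rng.normal(scale=0.1, size=(k, d)), np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)
        err = H @ W2 + b2 - X
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean squared reconstruction error
        g_out = 2.0 * err / err.size
        g_h = (g_out @ W2.T) * (1.0 - H ** 2)
        W2 -= lr * (H.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (X.T @ g_h)
        b1 -= lr * g_h.sum(axis=0)
    return (W1, b1), losses

def encode(X, params):
    """Compact k-dimensional representation from the trained encoder."""
    W1, b1 = params
    return np.tanh(X @ W1 + b1)

# Hypothetical toy data: temperature, heart rate, systolic BP; sex, smoking status
numeric = [[37.0, 80, 120], [38.5, 95, 110], [36.8, 72, 125], [39.1, 105, 100]]
categorical = [["M", "smoker"], ["F", "non"], ["F", "smoker"], ["M", "non"]]
X = encode_heterogeneous(numeric, categorical)
params, losses = train_autoencoder(X, k=2)
codes = encode(X, params)
```

In a real pipeline the learned codes would replace or rank the raw columns before fitting a length-of-stay predictor; the choice of k, depth, and training schedule here is arbitrary.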
Session details: Session 11: Applications to Microbes and Imaging Genetics
A. Wright
DOI: https://doi.org/10.1145/3254554 (published 2017-08-20)
Citations: 0
Novel Unsupervised Named Entity Recognition Used in Text Annotation Tool (OntoMate) At Rat Genome Database
O. Ghiasvand, M. Shimoyama
In model organism databases, one important task is converting free text in the biomedical literature into a structured data format. Curators at the Rat Genome Database (RGD), the primary source of rat genomic, genetic, and physiological data, spend considerable time and effort curating functional information for genes, QTLs, and strains from the literature. To increase curation efficiency and prioritize literature for data extraction, RGD developed OntoMate. This tool tags PubMed abstracts with genes, gene names, gene mutations, organism names, and terms (including synonyms and aliases) from 16 ontologies/vocabularies used to represent functional information. In this project, we used an unsupervised tagging method to reduce the human effort needed to create training data. In this approach, a machine learning tool based on decision tree classification was developed: mentions that belong uniquely to one semantic type serve as positive samples, and mentions whose semantic types lie outside the desired group are treated as negative samples. An interface allows the user to build complex queries incorporating terms from any of the ontologies, gene symbols, organisms, dates, and other parameters. Queries return abstracts along with all tagged parameters indicated in the query, as well as children of the chosen ontology terms. Users can filter the results further through a panel that lists organisms, genes, and diseases with the number of papers returned. Abstracts and papers are ranked by relevance to the query. The tool is fully integrated into the curation software, so citations and abstracts can be entered into the RGD database automatically and assigned IDs, and the genes and ontology terms in the tags can be checked to create annotations linked to the paper. The system was built with a scalable, open architecture, and the literature is updated daily. The tool uses Solr indexing technology, categorizes papers by relevance score, and indexes and tags more than 27 million abstracts. With the use of bioNLP tools, RGD has added more automation to its curation workflow.
DOI: https://doi.org/10.1145/3107411.3108198 (published 2017-08-20)
Citations: 1
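The positive/negative labeling rule in the OntoMate abstract can be sketched as a small weak-labeling function. This is an illustration under assumptions, not RGD's code: the lexicon (surface string to set of semantic types) and its entries are hypothetical, and skipping ambiguous mentions (those that can denote the target type and another type) is an assumption the abstract does not state.

```python
# Hypothetical lexicon: lowercase surface string -> semantic types it can denote
LEXICON = {
    "lep": {"gene"},
    "hypertension": {"disease"},
    "cat": {"gene", "organism"},          # ambiguous: gene symbol or animal
    "rattus norvegicus": {"organism"},
}

def weak_labels(mentions, lexicon, target_type):
    """Return (mention, label) pairs for one target semantic type.

    label 1: the mention's only semantic type is target_type (positive sample)
    label 0: the mention's types exclude target_type (negative sample)
    Ambiguous and unknown mentions are skipped (an assumption).
    """
    labeled = []
    for mention in mentions:
        types = lexicon.get(mention.lower())
        if not types:
            continue  # not in the lexicon: no weak label
        if types == {target_type}:
            labeled.append((mention, 1))
        elif target_type not in types:
            labeled.append((mention, 0))
    return labeled

pairs = weak_labels(["Lep", "hypertension", "CAT", "Rattus norvegicus", "foo"],
                    LEXICON, "gene")
```

Pairs like these, turned into context features, could then train a decision-tree classifier (for example scikit-learn's `DecisionTreeClassifier`) without any manually annotated data.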
Session details: Session 15: Sequence Analysis and Genome Assembly
C. Boucher
DOI: https://doi.org/10.1145/3254558 (published 2017-08-20)
Citations: 0
Session details: Session 13: Knowledge Representation Applications
P. Veltri
DOI: https://doi.org/10.1145/3254556 (published 2017-08-20)
Citations: 0
Identification and Prediction of Intrinsically Disordered Regions in Proteins Using n-grams
Mauricio Oberti, I. Vaisman
Intrinsically disordered proteins (IDPs) play an important role in many biological processes and are closely related to human diseases. They also have the potential to serve as targets for drug discovery, especially in disordered binding regions. Accurate prediction of IDPs is challenging: most methods rely on sequence profiles to improve accuracy, which makes them computationally expensive. This paper describes a method based on n-gram frequencies over reduced amino acid alphabets, which tries to overcome this challenge by using only sequence information. Our results show that the described IDP prediction approach performs at the same level as some other state-of-the-art ab initio methods. Moreover, the simplicity of n-grams allows the construction of decision trees that can provide important insights into common patterns and properties associated with disordered regions.
DOI: https://doi.org/10.1145/3107411.3107480 (published 2017-08-20)
Citations: 2
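To make "n-gram frequencies over a reduced amino acid alphabet" concrete, here is a minimal sketch. The abstract does not say which reduced alphabets were used, so the five-group hydropathy/charge mapping below is a hypothetical choice for illustration, as are the function names and the windowing scheme.

```python
from collections import Counter

# Hypothetical 5-letter reduced alphabet (illustrative grouping only)
GROUPS = {
    "AVLIMFWC": "h",  # hydrophobic
    "STNQY": "p",     # polar, uncharged
    "KRH": "+",       # positively charged
    "DE": "-",        # negatively charged
    "GP": "s",        # conformationally special (glycine, proline)
}
REDUCE = {aa: sym for members, sym in GROUPS.items() for aa in members}

def reduce_sequence(seq):
    """Map each residue to its group symbol; non-standard residues become 'x'."""
    return "".join(REDUCE.get(aa, "x") for aa in seq.upper())

def ngram_frequencies(seq, n):
    """Relative frequency of each n-gram in the reduced sequence."""
    reduced = reduce_sequence(seq)
    total = len(reduced) - n + 1
    if total <= 0:
        return {}
    counts = Counter(reduced[i:i + n] for i in range(total))
    return {gram: c / total for gram, c in counts.items()}

def window_features(seq, i, n, half_width):
    """n-gram frequencies in a window centered on residue i (clipped at the ends)."""
    start, end = max(0, i - half_width), min(len(seq), i + half_width + 1)
    return ngram_frequencies(seq[start:end], n)
```

Per-residue windows like this yield small, interpretable feature vectors, which is what makes decision trees over them readable; the window width and n would need tuning.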
Development of a Polymer-Theoretic Approach to Describing Constraints on Reactions Between Antipeptide Antibodies and Intrinsically Disordered Peptide Antigens: Implications for B-Cell Epitope Prediction
S. Caoili
B-cell epitope prediction aims to support translational applications as exemplified by peptide-based vaccine design. This entails selection of immunizing peptide sequences that tend to be intrinsically disordered and thus appropriately described within the framework of polymer theory. A fully extended hexapeptide sequence spans a typical antibody footprint; but disordered peptides are flexible rather than rigid, such that their B-cell epitopes may vary in length according to the diversity of conformations assumed upon binding by antibodies. Hence, peptides were modeled herein as worm-like chains, using an interpolated approximation of the radial probability density distribution function to estimate the probability that the ends of a peptidic sequence are separated by a distance less than or equal to a typical antibody footprint diameter. The results suggest that the epitopes are likely to be no more than 17 residues long, which is consistent with available structural data on immune complexes consisting of antipeptide antibodies bound to cognate peptide antigens. For such antigens, B-cell epitope prediction thus could proceed with initial scanning for intrinsically disordered sequences of length up to a physicochemically plausible maximum value (e.g., 17 residues), with analysis of progressively longer subsequences to identify nonredundant sets of putative epitopes (e.g., based on predicted affinity).
DOI: https://doi.org/10.1145/3107411.3108190 (published 2017-08-20)
Citations: 0
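The worm-like-chain reasoning in the epitope abstract can be sketched as follows. The mean-square end-to-end distance of a worm-like chain with contour length L and persistence length P is the standard result ⟨R²⟩ = 2PL − 2P²(1 − e^(−L/P)). The paper used an interpolated approximation of the WLC radial density; the sketch instead swaps in a simpler Gaussian radial distribution with that ⟨R²⟩, which is known to be inaccurate for short chains (it even assigns probability to distances beyond the contour length), so it should not be expected to reproduce the paper's numbers. The rise per residue (0.38 nm), persistence length (0.4 nm), and the 2.0 nm diameter in the usage loop are assumed, hypothetical values.

```python
import math

def wlc_mean_square_end_to_end(L, P):
    """<R^2> = 2PL - 2P^2 (1 - exp(-L/P)) for a worm-like chain
    with contour length L and persistence length P (same length units)."""
    return 2.0 * P * L - 2.0 * P * P * (1.0 - math.exp(-L / P))

def prob_ends_within(d, n_residues, rise=0.38, persistence=0.4):
    """P(end-to-end distance <= d) for a peptide of n_residues.

    Gaussian approximation of the radial distribution (swapped in for the
    paper's interpolated WLC density). rise and persistence are in nm and
    are assumed values, not taken from the paper.
    """
    if n_residues < 2:
        raise ValueError("need at least two residues")
    L = (n_residues - 1) * rise  # contour length between terminal C-alpha atoms
    r2 = wlc_mean_square_end_to_end(L, persistence)
    x = d * math.sqrt(3.0 / (2.0 * r2))
    # CDF of the Maxwell-type radial distribution of a 3D Gaussian end-to-end vector
    return math.erf(x) - (2.0 / math.sqrt(math.pi)) * x * math.exp(-x * x)

# Hypothetical footprint diameter of 2.0 nm, for illustration only
for n in (6, 12, 17, 24):
    print(n, round(prob_ends_within(2.0, n), 3))
```

Scanning n until this probability drops below a chosen threshold gives a maximum plausible epitope length under these assumptions; both the threshold and the parameters are choices the user must justify.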
Instrumenting the Health Care Enterprise for Discovery in the Course of Clinical Care
S. Murphy
Objectives: Although patients may have a wealth of imaging, genomic, monitoring, and personal device data, these data have yet to be fully integrated into clinical care. Methods: We identify three reasons for the lack of integration. The first is that "Big Data" is poorly managed by most Electronic Medical Record Systems (EMRS). The data are mostly available on "cloud-native" platforms that are outside the scope of most EMRS, and even checking whether such data are available for a patient often must be done outside the EMRS. The second reason is that extracting healthcare-relevant features from Big Data often requires complex machine learning algorithms, for example to determine whether a genomic variant is protein-altering. The third reason is that applications that present Big Data need to be modified constantly to reflect the current state of knowledge, such as guidance on when to order a new set of genomic tests. In some cases, the applications need to be updated nightly. Results: A new EMRS architecture is evolving that could unite Big Data, machine learning, and clinical care through microservices that host applications focused on quite specific aspects of clinical care, such as managing cancer immunotherapy. Conclusion: Informatics innovation, medical research, and clinical care go hand in hand as we look to infuse science-based practice into healthcare. Innovative methods will lead to a new ecosystem of apps interacting with healthcare providers to fulfill a promise that is still to be determined.
DOI: https://doi.org/10.1145/3107411.3121000 (published 2017-08-20)
Citations: 1
Determining Optimal Features for Predicting Type IV Secretion System Effector Proteins for Coxiella burnetii
Zhila Esna Ashari Esfahani, K. Brayton, S. Broschat
Type IV secretion systems (T4SS) are built from multiple protein complexes, exist in some types of bacterial pathogens, and are responsible for delivering type IV effector proteins into host cells. Effectors target eukaryotic cells and attempt to manipulate host cell processes and the host immune system. Some work has been done to validate effectors experimentally, and recently a few scoring-based and machine-learning-based methods have been developed to predict effectors from whole genome sequences. However, these studies have proposed different types of features as effective. In this work, we gathered the features proposed in previous reports and calculated their values for a dataset of Coxiella burnetii effectors and non-effectors. We then ranked the features by their importance in classifying effectors and non-effectors to determine the set of optimal features. Finally, we developed a Support Vector Machine model to test the optimal features by comparing them with a set of features proposed in a previous study. The outcome of the comparison supports the effectiveness of our optimal features.
DOI: https://doi.org/10.1145/3107411.3107416 (published 2017-08-20)
Citations: 5
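The abstract says features were ranked by their importance for separating effectors from non-effectors but does not name the criterion. As a hedged illustration only, the sketch below swaps in a simple univariate Fisher score, (mean₁ − mean₀)² / (var₁ + var₀), per feature. The feature names and values are hypothetical.

```python
import math
import statistics

def fisher_score(pos, neg):
    """(mean_pos - mean_neg)^2 / (var_pos + var_neg) for one feature."""
    mp, mn = statistics.fmean(pos), statistics.fmean(neg)
    denom = statistics.pvariance(pos) + statistics.pvariance(neg)
    if denom == 0:
        return 0.0 if mp == mn else math.inf
    return (mp - mn) ** 2 / denom

def rank_features(X, y, names):
    """Sort feature names by descending Fisher score.

    X: list of samples (lists of floats); y: 1 = effector, 0 = non-effector.
    """
    scores = {}
    for j, name in enumerate(names):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores[name] = fisher_score(pos, neg)
    return sorted(names, key=lambda nm: scores[nm], reverse=True), scores

# Hypothetical features: C-terminal net charge, motif presence, protein length
names = ["c_terminal_charge", "motif_present", "length_aa"]
X = [[3.0, 1.0, 500], [2.5, 1.0, 320], [2.8, 0.0, 410],
     [0.2, 0.0, 480], [0.5, 0.0, 300], [0.1, 1.0, 390]]
y = [1, 1, 1, 0, 0, 0]
ranked, scores = rank_features(X, y, names)
```

A univariate score ignores feature interactions; the top-ranked subset would then be evaluated with a classifier such as scikit-learn's `SVC`, ideally under cross-validation so that the ranking step does not leak test data.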