Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics: Latest Publications

Using Energy-Minimization Profiles to Measure Protein Resistance to Drugs
E. Thompson, Tess Thackray, Cecilia Kalthoff, Ryan Rapoport, F. Jagodzinski
A mutation to the amino acid sequence of a protein can cause a biomolecule to be resistant to the intended effects of a drug. Assessing changes in a drug's efficacy in response to mutations via mutagenesis wet-lab experiments is prohibitively time-consuming for even a single point mutation, let alone for all possible mutations. Existing approaches for inferring mutation-induced drug resistance are available, but all of them reason about mutations of residues at or very near the protein-drug interface. However, there are examples of mutations far away from the region where the ligand binds that nonetheless render a protein resistant to the effects of the drug. We present a proof-of-concept computational pipeline that generates in silico the set of all possible single point mutations in a protein-ligand complex. We assess drug resistance via energy profiles from short molecular dynamics runs of the mutants. We assess the impact of mutations far away from the protein-drug interface and provide case studies exploring how amino acid substitutions both near and far from where the ligand interacts with a protein target have a stabilizing or destabilizing effect on the protein-drug complex.
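The abstract's first step, generating every possible single point mutation, is easy to picture at the sequence level. The Python sketch below is illustrative only and is not the authors' pipeline. It assumes the 20 standard amino acids and works on a plain sequence string rather than a 3D protein-ligand complex. It omits the molecular dynamics scoring entirely. The function name `single_point_mutants` is hypothetical.

```python
# Hypothetical sketch: sequence-level enumeration of every single point mutant.
# Building each mutant in 3D and scoring it with short MD runs is not shown.

STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def single_point_mutants(sequence: str) -> list[tuple[str, str]]:
    """Return (label, mutant_sequence) pairs for every single substitution.

    Labels follow the common <wild type><1-based position><substitute>
    convention, e.g. "M1A". For a sequence of n standard residues this
    yields 19 * n mutants.
    """
    mutants = []
    for i, wild_type in enumerate(sequence):
        for substitute in STANDARD_AMINO_ACIDS:
            if substitute == wild_type:
                continue  # skip the identity "mutation"
            mutant = sequence[:i] + substitute + sequence[i + 1:]
            mutants.append((f"{wild_type}{i + 1}{substitute}", mutant))
    return mutants


if __name__ == "__main__":
    demo = single_point_mutants("MKV")
    print(len(demo))  # 57 (3 residues x 19 substitutions)
    print(demo[0])    # ('M1A', 'AKV')
```

For a 300-residue protein the same enumeration yields 300 x 19 = 5,700 mutants.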
DOI: https://doi.org/10.1145/3388440.3414703 | Published: 2020-09-21
Citations: 2
Prefix/Suffix Variation in Retinoic Acid Response Elements
Y. Zhuang, Kara L. Cerveny, Anna M. Ritz
DOI: https://doi.org/10.1145/3388440.3414914 | Published: 2020-09-21
Citations: 0
Efficient Exploration of Protein Conformational Pathways using RRT* and MC
Fatemeh Afrasiabi, Nurit Haspel
The conformational space of proteins is complex and high-dimensional, which makes its analysis a highly challenging task. Understanding the structure and dynamics of proteins is essential to understanding their cellular function. Because some proteins have rapid dynamics, it is often hard to characterize intermediate structures and conformational trajectories experimentally. Conformational pathways, which describe how proteins transition from one conformation to another in response to a shift in conditions, are likewise hard to describe experimentally. They are also challenging to compute, since physics-based simulations are time-consuming and often do not span sufficient time scales to capture a full pathway. In previous work, we combined evolutionary information or rigidity analysis obtained from proteins' sequence and structure with an efficient tree-based conformational search to elucidate the conformational trajectories of proteins. We incorporated backbone + Cβ resolution and helped limit the search space by identifying mobile regions in a molecule. In this work, we use a hybrid algorithm that combines Monte Carlo (MC) sampling with RRT*, a variant of the robotics-based Rapidly-exploring Random Trees (RRT) method, to make the search more accurate and efficient and to produce smooth conformational pathways.
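The abstract combines Monte Carlo (MC) sampling with RRT*. As a hedged illustration of the MC half only, the sketch below implements the standard Metropolis acceptance criterion and a simple refinement loop. It is not the authors' hybrid algorithm, and RRT* tree growth and rewiring are not shown. `energy` and `propose` are hypothetical placeholders for a real force field and conformational move set. Energies and temperature share arbitrary units.

```python
import math
import random


def metropolis_accept(delta_energy: float, temperature: float,
                      rng: random.Random) -> bool:
    """Metropolis criterion: accept downhill moves, accept uphill moves with
    probability exp(-dE / T). k_B is folded into T (arbitrary units)."""
    if delta_energy <= 0:
        return True
    return rng.random() < math.exp(-delta_energy / temperature)


def mc_refine(x, energy, propose, temperature=1.0, n_steps=1000, seed=0):
    """Run n_steps Metropolis Monte Carlo moves starting from conformation x.

    energy(x) -> float and propose(x, rng) -> new conformation are
    hypothetical placeholders for a real force field and move set.
    """
    rng = random.Random(seed)
    current, current_e = x, energy(x)
    for _ in range(n_steps):
        candidate = propose(current, rng)
        candidate_e = energy(candidate)
        if metropolis_accept(candidate_e - current_e, temperature, rng):
            current, current_e = candidate, candidate_e
    return current, current_e


if __name__ == "__main__":
    # Toy 1-D "conformation" with a quadratic energy well at x = 0.
    best, e = mc_refine(5.0, lambda x: x * x,
                        lambda x, rng: x + rng.uniform(-0.5, 0.5),
                        temperature=0.01, n_steps=2000)
    print(best, e)
```

In a hybrid scheme, a loop like `mc_refine` could locally relax conformations proposed by the tree search. How the paper couples the two is not specified here.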
DOI: https://doi.org/10.1145/3388440.3414705 | Published: 2020-09-21
Citations: 3
Deep Neural Network Modeling for Phenotypic Prediction of Metagenomic Samples
Yassin Mreyoud, Tae-Hyuk Ahn
The increasing popularity of metagenomic sequencing has made a wealth of 16S rRNA and whole-genome sequence data available. Microbes play an important role in the health and disease of humans, pets, and livestock. Characterizing such microbes and their relative abundances is important for identifying sample phenotypes such as disease. In the past, machine-learning-based methods have been applied to predict host disease status and overall health from taxonomic abundance profiles. Here we apply deep neural network modeling to taxonomic profiles for faster, more precise, and more effective prediction of metagenomic sample phenotypes.
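A taxonomic abundance profile has to become a fixed-length numeric vector before a neural network can use it. The sketch below assumes per-taxon read counts as input and normalizes them to relative abundances. It then passes the result through a tiny two-layer network with hand-set weights. The taxon list, layer sizes and weights are hypothetical, and this is not the architecture or training procedure from the paper.

```python
import math


def relative_abundance(counts: dict[str, int], taxa: list[str]) -> list[float]:
    """Turn raw per-taxon read counts into a fixed-order relative-abundance
    vector. Taxa missing from a sample get 0; counts for taxa outside the
    chosen list are ignored, so the vector sums to 1 when any listed taxon
    is present."""
    total = sum(counts.get(t, 0) for t in taxa)
    if total == 0:
        return [0.0] * len(taxa)
    return [counts.get(t, 0) / total for t in taxa]


def dense(x, weights, biases, activation):
    """One fully connected layer: activation(W x + b), row by row."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def relu(z: float) -> float:
    return max(0.0, z)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_phenotype(profile, w1, b1, w2, b2) -> float:
    """Two-layer network: ReLU hidden layer, single sigmoid output read as
    the probability of the phenotype (e.g. diseased vs healthy)."""
    hidden = dense(profile, w1, b1, relu)
    return dense(hidden, w2, b2, sigmoid)[0]
```

For example, `relative_abundance({"Bacteroides": 30, "Prevotella": 10}, ["Bacteroides", "Prevotella", "Faecalibacterium"])` gives `[0.75, 0.25, 0.0]`. The counts are invented for illustration. In practice the weights would be learned, for example with a deep learning framework, rather than set by hand.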
DOI: https://doi.org/10.1145/3388440.3414921 | Published: 2020-09-21
Citations: 1
PeakMatcher
R. J. Nowling, C. R. Beal, Scott J. Emrich, S. Behura, M. Halfon, M. Duman-Scheel
When reference genome assemblies are updated, peaks from DNA enrichment assays such as ChIP-Seq and FAIRE-Seq need to be called again against the new assembly. PeakMatcher is an open-source package that aids in validation by matching peaks, either across two genome assemblies using read alignments or within the same genome. PeakMatcher calculates recall and precision and also outputs lists of peak-to-peak matches. To match peaks across assemblies, PeakMatcher first finds all reads aligned to one genome that overlap a given list of peaks. It then uses the read names to locate where those reads align against a second genome. Finally, it finds and outputs all peaks called against the second genome that overlap those aligned reads. PeakMatcher uses the resulting peak-read-peak relationships to discover 1-to-1, 1-to-many, and many-to-many relationships. Overlap queries are performed with interval trees for efficiency. We evaluated PeakMatcher on two data sets. The first data set was FAIRE-Seq (Formaldehyde-Assisted Isolation of Regulatory Elements Sequencing) of DNA isolated from embryos of the mosquito Aedes aegypti [2, 4]. We implemented a peak-calling pipeline and validated it on the older (highly fragmented) AaegL3 assembly [5]. PeakMatcher matched 92.9% (precision) of the 121,594 previously called peaks from [2, 4] with 89.4% (recall) of the 124,959 peaks called with our new pipeline. Next, we applied the peak-calling pipeline to call FAIRE peaks using the newer, chromosome-complete AaegL5 assembly [3]. PeakMatcher found matches for 14 of the 16 experimentally validated AaegL3 FAIRE peaks from [2, 4]. We validated the matches by comparing nearby genes across the genomes. Nearby genes were consistent for 11 of the 14 peaks; inconsistencies for at least two of the remaining peaks were clearly attributable to differences between the assemblies. When applied to all of the peaks, PeakMatcher matched 78.8% (precision) of the 124,959 AaegL3 peaks with 76.7% (recall) of the 128,307 AaegL5 peaks. The second data set was STARR-Seq (Self-Transcribing Active Regulatory Region Sequencing) of Drosophila melanogaster DNA in S2 culture cells [1]. We called STARR peaks against two versions (dm3 and r5.53) of the D. melanogaster genome [6]. PeakMatcher matched 77.4% (precision) of the 4,195 dm3 peaks with 94.8% (recall) of the 3,114 r5.53 peaks. PeakMatcher and associated documentation are available on GitHub (https://github.com/rnowling/peak-matcher) under the open-source Apache Software License v2. PeakMatcher was written in Python 3 using the intervaltree library.
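The peak-read-peak matching described above can be sketched in plain Python. This is a hypothetical illustration, not PeakMatcher's code. It assumes one alignment per read and half-open (chromosome, start, end) coordinates. It uses linear scans where PeakMatcher uses the intervaltree library. Following the reported numbers, precision is read here as the fraction of first-assembly peaks with at least one match. Recall is the fraction of second-assembly peaks with at least one match.

```python
from collections import defaultdict

Interval = tuple[str, int, int]  # (chromosome, start, end), half-open [start, end)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def match_peaks(peaks_a, peaks_b, reads_a, reads_b):
    """Return the set of (peak_a_id, peak_b_id) pairs linked by shared reads.

    peaks_*: dict peak_id -> Interval on that assembly
    reads_*: dict read_name -> Interval (one alignment per read, for simplicity)
    """
    matches = set()
    for pa_id, pa in peaks_a.items():
        for read_name, aln_a in reads_a.items():
            if not overlaps(pa, aln_a):
                continue
            aln_b = reads_b.get(read_name)  # same read, second assembly
            if aln_b is None:
                continue  # read did not align to assembly B
            for pb_id, pb in peaks_b.items():
                if overlaps(aln_b, pb):
                    matches.add((pa_id, pb_id))
    return matches


def precision_recall(matches, peaks_a, peaks_b):
    """Fraction of A peaks with >= 1 match, fraction of B peaks with >= 1 match."""
    matched_a = {a for a, _ in matches}
    matched_b = {b for _, b in matches}
    return len(matched_a) / len(peaks_a), len(matched_b) / len(peaks_b)


def classify_groups(matches):
    """Split the bipartite peak-peak graph into connected components and
    label each by how many A and B peaks it contains."""
    adj = defaultdict(set)
    for a, b in matches:
        adj[("A", a)].add(("B", b))
        adj[("B", b)].add(("A", a))
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        n_a = sum(1 for side, _ in comp if side == "A")
        n_b = len(comp) - n_a
        if n_a == 1 and n_b == 1:
            label = "1-to-1"
        elif n_a == 1 or n_b == 1:
            label = "1-to-many"
        else:
            label = "many-to-many"
        groups.append((label, sorted(comp)))
    return groups
```

The inner loops can be replaced by per-chromosome interval trees. Each overlap query then takes logarithmic time plus the number of hits instead of a linear scan. That speedup is what makes genome-scale peak sets tractable.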
DOI: https://doi.org/10.1145/3388440.3414907 | Published: 2020-09-21
Citations: 0
Multimodal Learning for Cardiovascular Risk Prediction using EHR Data
A. Bagheri, T. K. Groenhof, W. B. Veldhuis, P. A. Jong, F. Asselbergs, D. Oberski
Electronic health records (EHRs) contain structured and unstructured data of significant clinical and research value. Various machine learning approaches have been developed to employ information in EHRs for risk prediction. Most of these attempts, however, focus on structured EHR fields and lose the vast amount of information in unstructured text. Deep neural networks, on the other hand, have gained tremendous momentum in knowledge discovery from EHR texts, yet very few studies have used both free text and structured information in EHRs for clinical prediction. To exploit the information captured in EHRs, in this study we propose MI-BiLSTM, a multimodal framework based on bidirectional long short-term memory (BiLSTM) for cardiovascular risk prediction that integrates medical texts and structured clinical information. The MI-BiLSTM framework concatenates word embeddings from X-ray reports with classical clinical predictors from the Second Manifestations of ARTerial disease (SMART) study [1] before feeding them to a final fully connected neural network. In our experiments, we used the proposed framework to compare the performance of different deep neural network architectures on data from 5603 patients using 5-fold cross-validation. Using data from the SMART study, we demonstrate the clinical relevance of integrating text features and classical predictors for cardiovascular risk prediction in patients with manifest vascular disease or at high risk of cardiovascular disease. Our results show that the MI-BiLSTM framework using text data in addition to laboratory values outperforms deep learning models using only known clinical predictors. In the future, we will focus on extending our multimodal framework to import knowledge from available medical ontologies to enhance the quality of clinical decision making in risk prediction models.
An open-source implementation of the proposed framework is publicly available at https://github.com/bagheria/CardioRisk-TextMining
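The fusion step concatenates a text representation with structured predictors before a fully connected layer. It can be sketched as follows. To stay short and self-contained, mean pooling of token vectors stands in for the BiLSTM encoder. A single sigmoid unit stands in for the final network. Token vectors, clinical values and weights are hypothetical. This is not the MI-BiLSTM implementation from the linked repository.

```python
import math


def mean_pool(vectors: list[list[float]]) -> list[float]:
    """Average token vectors into one fixed-length text representation.
    (Stand-in for the BiLSTM encoder, which is not implemented here.)"""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def fuse_and_predict(token_vectors, clinical, weights, bias) -> float:
    """Concatenate the text representation with structured clinical
    predictors, then apply one fully connected sigmoid unit."""
    features = mean_pool(token_vectors) + list(clinical)
    if len(features) != len(weights):
        raise ValueError("weights must match the fused feature length")
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))
```

Concatenation keeps both modalities visible to the final layer. That layer can then learn how much weight to give report text versus laboratory values.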
DOI: https://doi.org/10.1145/3388440.3414924 | Published: 2020-08-27
Citations: 8
Ir-Man: An Information Retrieval Framework for Marine Animal Necropsy Analysis
Alexander F. B. Carmichael, Deepayan Bhowmik, J. Baily, A. Brownlow, G. Gunn, A. Reeves
This paper proposes Ir-Man (Information Retrieval for Marine Animal Necropsies), a framework for retrieving discrete information from marine mammal post-mortem reports for statistical analysis. When a marine mammal is reported dead after stranding in Scotland, the carcass is examined by the Scottish Marine Animal Strandings Scheme (SMASS) to establish the circumstances of the animal's death. This involves the creation of a "post-mortem" (or necropsy) report that systematically describes the body. These semi-structured reports record lesions (damage or abnormalities in anatomical regions) as well as other observations. Observations embedded within these texts are used to determine the cause of death. While the cause of death is recorded separately, many other descriptions may be of pathological and epidemiological significance when aggregated and analysed collectively. Because manual extraction of these descriptions is costly, time-consuming, and at times erroneous, an automated information retrieval mechanism is needed; this is a non-trivial task given the wide variety of possible descriptions, pathologies, and species. The Ir-Man framework consists of a new ontology, a lexicon of observations and anatomical terms, and an entity relation engine for information retrieval and statistics generation from a pool of necropsy reports. We demonstrate the effectiveness of our framework by creating a rule-based binary classifier for identifying bottlenose dolphin attacks (BDA) in harbour porpoise gross pathology reports, which achieved an accuracy of 83.4%.
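A rule-based binary classifier of the kind described can be as simple as counting lexicon hits per report. In the sketch below, the BDA term list and the two-hit threshold are hypothetical placeholders rather than Ir-Man's ontology or rules. Accuracy is the fraction of reports classified correctly.

```python
# Hypothetical lexicon for illustration only; these are NOT Ir-Man's terms.
BDA_TERMS = {"rake marks", "rib fractures", "blunt trauma", "haemorrhage"}


def count_hits(report: str, lexicon: set[str]) -> int:
    """Number of distinct lexicon terms that appear in the report text."""
    text = report.lower()
    return sum(1 for term in lexicon if term in text)


def classify_bda(report: str, lexicon: set[str] = BDA_TERMS,
                 min_hits: int = 2) -> bool:
    """Rule: flag a report as a suspected BDA case if it mentions at least
    min_hits distinct lexicon terms. No negation handling, so "no rib
    fractures" would still count as a hit."""
    return count_hits(report, lexicon) >= min_hits


def accuracy(predictions: list[bool], labels: list[bool]) -> float:
    """Fraction of reports whose predicted label equals the reference label."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)
```

A practical system would need at least negation handling and anatomical context, such as which lesion occurs where. That is the kind of structure an ontology and entity relation engine are meant to supply.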
DOI: https://doi.org/10.1145/3388440.3412417 | Published: 2020-08-07
Citations: 0
Cross-Global Attention Graph Kernel Network Prediction of Drug Prescription
Hao-Ren Yao, D. Chang, O. Frieder, Wendy Huang, I. Liang, C. Hung
We present an end-to-end, interpretable deep-learning architecture that learns a graph kernel to predict the outcome of chronic disease drug prescriptions. This is achieved through deep metric learning combined with a Support Vector Machine objective, using a graph representation of Electronic Health Records. We formulate the predictive model as a binary graph classification problem with an adaptively learned graph kernel. The kernel is obtained through novel cross-global attention node matching between patient graphs, which computes on multiple graphs simultaneously without generating training pairs or triplets. Results on the Taiwanese National Health Insurance Research Database demonstrate that our approach outperforms current state-of-the-art models in terms of both accuracy and interpretability.
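Cross-graph attention node matching can be illustrated generically. Each node in one patient graph attends over the nodes of the other graph. The two graphs are then scored by how closely each node resembles its attention summary. The sketch below uses fixed, hypothetical node embeddings and dot-product attention. It is not the authors' cross-global attention model, learned kernel or SVM objective.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot(u, v) / (nu * nv)


def attend(h_src, h_tgt):
    """For each source node, an attention-weighted average of all target
    nodes (dot-product scores, softmax over the target graph)."""
    dim = len(h_src[0])
    matched = []
    for h in h_src:
        weights = softmax([dot(h, t) for t in h_tgt])
        matched.append([sum(w * t[k] for w, t in zip(weights, h_tgt))
                        for k in range(dim)])
    return matched


def cross_attention_similarity(h1, h2):
    """Symmetric graph-graph score in [-1, 1]: mean cosine between each node
    and its cross-graph attention summary, averaged over both directions."""
    s12 = sum(cosine(h, m) for h, m in zip(h1, attend(h1, h2))) / len(h1)
    s21 = sum(cosine(h, m) for h, m in zip(h2, attend(h2, h1))) / len(h2)
    return 0.5 * (s12 + s21)
```

Because attention runs over all nodes of the other graph, graphs of different sizes can be compared directly, with no explicit pair or triplet construction.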
DOI: https://doi.org/10.1145/3388440.3412459 | Published: 2020-08-04
Citations: 3
Structural representations of DNA regulatory substrates can enhance sequence-based algorithms by associating functional sequence variants
Jan Zrimec
The nucleotide sequence representation of DNA can be inadequate for resolving protein-DNA binding sites and regulatory substrates, such as those involved in gene expression and horizontal gene transfer. Since sequence-like representations are algorithmically very useful, we fused over 60 currently available DNA physicochemical and conformational variables into compact structural representations that can encode anything from single DNA binding sites to whole regulatory regions. We find that the main structural components reflect key properties of protein-DNA interactions and can be condensed to the amount of information found in a single nucleotide position. The most accurate structural representations compress functional DNA sequence variants by 30% to 50%, as each instance encodes tens to thousands of sequences. We show that a structural distance function discriminates among groups of DNA substrates more accurately than nucleotide sequence-based metrics. As this opens up a variety of implementation possibilities, we develop and test a distance-based alignment algorithm, demonstrating the potential of using the structural representations to enhance sequence-based algorithms. Because most current bioinformatic methods are biased toward nucleotide sequence representations, considerable performance gains may still be achievable with such solutions.
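The sketch below is a toy version of the idea. It maps a sequence to a per-step structural profile and compares profiles by distance rather than by nucleotide identity. The G/C-count-per-dinucleotide scale is a deliberately crude, hypothetical stand-in for the 60+ physicochemical and conformational variables used in the study. Euclidean distance stands in for the paper's structural distance function.

```python
import math

# Hypothetical stand-in "structural" scale: number of G/C bases in each
# dinucleotide step. A crude placeholder, not one of the variables in the study.
GC_STEP_SCALE = {a + b: (a in "GC") + (b in "GC") for a in "ACGT" for b in "ACGT"}


def structural_profile(seq: str, scale: dict[str, float]) -> list[float]:
    """Map a DNA sequence to one value per overlapping dinucleotide step."""
    seq = seq.upper()
    return [scale[seq[i:i + 2]] for i in range(len(seq) - 1)]


def profile_distance(p: list[float], q: list[float]) -> float:
    """Euclidean distance between two equal-length structural profiles."""
    if len(p) != len(q):
        raise ValueError("profiles must have equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def hamming(s: str, t: str) -> int:
    """Nucleotide-level mismatch count, for comparison."""
    if len(s) != len(t):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(s.upper(), t.upper()))
```

`ACGT` and `AGCT` differ at two positions (Hamming distance 2), yet both map to the profile `[1, 2, 1]`, so their profile distance is 0. This is a toy example of how one structural representation can stand for several sequence variants.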
{"title":"Structural representations of DNA regulatory substrates can enhance sequence-based algorithms by associating functional sequence variants","authors":"Jan Zrimec","doi":"10.1145/3388440.3412482","DOIUrl":"https://doi.org/10.1145/3388440.3412482","url":null,"abstract":"The nucleotide sequence representation of DNA can be inadequate for resolving protein-DNA binding sites and regulatory substrates, such as those involved in gene expression and horizontal gene transfer. Considering that sequence-like representations are algorithmically very useful, here we fused over 60 currently available DNA physicochemical and conformational variables into compact structural representations that can encode single DNA binding sites to whole regulatory regions. We find that the main structural components reflect key properties of protein-DNA interactions and can be condensed to the amount of information found in a single nucleotide position. The most accurate structural representations compress functional DNA sequence variants by 30% to 50%, as each instance encodes from tens to thousands of sequences. We show that a structural distance function discriminates among groups of DNA substrates more accurately than nucleotide sequence-based metrics. As this opens up a variety of implementation possibilities, we develop and test a distance-based alignment algorithm, demonstrating the potential of using the structural representations to enhance sequence-based algorithms. Due to the bias of most current bioinformatic methods to nucleotide sequence representations, it is possible that considerable performance increases might still be achievable with such solutions.","PeriodicalId":411338,"journal":{"name":"Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127902809","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
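The Zrimec abstract describes encoding DNA as per-position vectors of physicochemical and conformational variables, comparing those encodings with a structural distance function, and aligning sequences by distance. The Python sketch below illustrates that general idea under explicit assumptions. The property table is a hypothetical toy table with two synthetic values per dinucleotide step; the paper fuses over 60 real variables, which are not reproduced here. The distance is plain Euclidean distance, and `best_offset` is a simple ungapped sliding-window match. The names `encode`, `structural_distance` and `best_offset` are illustrative only and are not taken from the paper, so this should not be read as the author's method.

```python
import math
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]

BASES = "ACGT"

# HYPOTHETICAL toy table: two made-up "structural" values per dinucleotide step.
# These numbers are synthetic placeholders, NOT measured physicochemical data.
TOY_TABLE: Dict[str, Vector] = {
    a + b: (float(i) + 0.5 * j, float(j) - 0.25 * i)
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
}


def encode(seq: str, table: Dict[str, Vector] = TOY_TABLE) -> List[Vector]:
    """Map a DNA sequence to one property vector per overlapping dinucleotide step."""
    seq = seq.upper()
    return [table[seq[k:k + 2]] for k in range(len(seq) - 1)]


def structural_distance(p: List[Vector], q: List[Vector]) -> float:
    """Euclidean distance between two equal-length structural profiles."""
    if len(p) != len(q):
        raise ValueError("profiles must have the same length")
    return math.sqrt(sum((x - y) ** 2 for u, v in zip(p, q) for x, y in zip(u, v)))


def best_offset(query: str, target: str,
                table: Dict[str, Vector] = TOY_TABLE) -> Tuple[int, float]:
    """Ungapped distance-based alignment: slide the query profile along the
    target profile and return (offset in dinucleotide steps, distance) for
    the smallest distance. Ties go to the earliest offset."""
    qp, tp = encode(query, table), encode(target, table)
    if len(qp) > len(tp):
        raise ValueError("query must not be longer than target")
    return min(
        ((off, structural_distance(qp, tp[off:off + len(qp)]))
         for off in range(len(tp) - len(qp) + 1)),
        key=lambda pair: pair[1],
    )
```

For example, `best_offset("GATC", "AAGATCAA")` returns `(2, 0.0)`, because the query's three dinucleotide steps (GA, AT, TC) appear unchanged starting at step 2 of the target. With real property scales, each variable would typically be standardized first so that no single scale dominates the Euclidean distance.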
A Preliminary Investigation in the Molecular Basis of Host Shutoff Mechanism in SARS-CoV
Niharika Pandala, C. Cole, D. McFarland, A. Nag, H. Valafar
The COVID-19 pandemic has demonstrated how effectively genomic sequencing technologies can establish the genetic sequence of a new virus. In contrast, it has also exposed the lack of computational approaches for rapidly understanding the molecular basis of the infection. Here we present an integrated approach to studying the nsp1 protein of SARS-CoV-1, which plays an essential role in maintaining the expression of viral proteins while disabling host protein expression, a process known as the host shutoff mechanism. We present three independent methods for evaluating two potential binding sites hypothesized to participate in nsp1-mediated host shutoff. We combined computed models of nsp1 with deep mining of all existing protein structures (using PDBMine) and binding-site recognition (using msTALI) to examine the two sites, consisting of residues 55-59 and 73-80. Based on our preliminary results, we conclude that residues 73-80 appear to be the region that facilitates the critical initial steps in nsp1 function. Given the 90% sequence identity between nsp1 from SARS-CoV-1 and SARS-CoV-2, we conjecture that SARS-CoV-2 nsp1 shares the same critical initiation step.
{"title":"A Preliminary Investigation in the Molecular Basis of Host Shutoff Mechanism in SARS-CoV","authors":"Niharika Pandala, C. Cole, D. McFarland, A. Nag, H. Valafar","doi":"10.1145/3388440.3412483","DOIUrl":"https://doi.org/10.1145/3388440.3412483","url":null,"abstract":"Recent events leading to the worldwide pandemic of COVID-19 have demonstrated the effective use of genomic sequencing technologies to establish the genetic sequence of this virus. In contrast, the COVID-19 pandemic has demonstrated the absence of computational approaches to understand the molecular basis of this infection rapidly. Here we present an integrated approach to the study of the nsp1 protein in SARS-CoV-1, which plays an essential role in maintaining the expression of viral proteins and further disabling the host protein expression, also known as the host shutoff mechanism. We present three independent methods of evaluating two potential binding sites speculated to participate in host shutoff by nsp1. We have combined results from computed models of nsp1, with deep mining of all existing protein structures (using PDBMine), and binding site recognition (using msTALI) to examine the two sites consisting of residues 55--59 and 73--80. Based on our preliminary results, we conclude that the residues 73--80 appear as the regions that facilitate the critical initial steps in the function of nsp1. Given the 90% sequence identity between nsp1 from SARS-CoV-1 and SARS-CoV-2, we conjecture the same critical initiation step in the function of SARS-CoV-2 nsp1.","PeriodicalId":411338,"journal":{"name":"Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133567970","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
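The nsp1 abstract's final conjecture depends on the two proteins being 90% identical in sequence. To clarify what a figure like this measures, the following minimal Python sketch computes percent identity over an already-aligned pair of sequences. The gap convention is an assumption: here identity is the number of identical columns divided by the number of columns where neither sequence has a gap. Other tools divide by the full alignment length or by the length of the shorter sequence, and those conventions give different numbers. The function name is hypothetical, and the example strings are toy sequences, not real nsp1 data.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over an existing pairwise alignment.

    Assumed convention: identical columns / columns where neither sequence
    has a gap character. Comparison is case-insensitive.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue.
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if x != gap and y != gap]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)
```

Under this convention, 90% identity means that 9 of every 10 ungapped aligned positions carry the same residue. For example, `percent_identity("ACDEFGHIKL", "ACDEFGHIKM")` returns `90.0`.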