2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM): latest papers

Feature selection based on functional group structure for microRNA expression data analysis
Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822525
Yang Yang, Tianyu Cao, Wei Kong
Feature selection methods have been widely used in gene expression analysis to identify differentially expressed genes and explore potential biomarkers for complex diseases. Although many studies have shown that incorporating feature structure information can greatly enhance the performance of feature selection algorithms, and genes naturally fall into groups with regard to common function and co-regulation, only a few gene expression studies have utilized these structural properties. As far as we know, there has been no such study on microRNA (miRNA) expression analysis, owing to the lack of available functional annotation for miRNAs. In this study, we focus on miRNA expression analysis because of its importance in diagnosis, prognosis prediction, and the detection of new therapeutic targets for complex diseases. MiRNAs tend to work in groups to play their regulatory roles, so miRNA expression data also has group structure. We use GO-based semantic similarity to infer miRNA functional groups and propose a new feature selection method that takes group structure into consideration, called MiRFFS (MiRNA Functional group-based Feature Selection). We also apply the group information to the sparse group Lasso method, and compare MiRFFS with the sparse group Lasso as well as some existing feature selection methods. The results on three miRNA microarray profiles of breast cancer show that MiRFFS can achieve a compact feature subset with high classification accuracy.
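As a rough illustration of the sparse group Lasso baseline mentioned in this abstract, the sketch below evaluates the commonly used sparse group Lasso penalty, lam * ((1 - alpha) * sum_g sqrt(p_g) * ||beta_g||_2 + alpha * ||beta||_1). It is not MiRFFS itself, and the function name, the example coefficients, and the grouping are hypothetical values chosen only for illustration.

```python
import math

def sparse_group_lasso_penalty(beta, groups, lam=1.0, alpha=0.5):
    """Penalty = lam * ((1 - alpha) * group term + alpha * L1 term).

    beta   : list of coefficients, one per feature (e.g. per miRNA)
    groups : list of index lists, one per feature group (assumed disjoint)
    """
    # L1 term: encourages sparsity of individual features.
    l1_term = sum(abs(b) for b in beta)
    # Group term: L2 norm of each group, weighted by sqrt(group size),
    # which encourages whole groups to be dropped together.
    group_term = 0.0
    for idx in groups:
        norm = math.sqrt(sum(beta[i] ** 2 for i in idx))
        group_term += math.sqrt(len(idx)) * norm
    return lam * ((1 - alpha) * group_term + alpha * l1_term)

# Hypothetical coefficients for five features in three functional groups.
beta = [3.0, 4.0, 0.0, 0.0, 1.0]
groups = [[0, 1], [2, 3], [4]]
penalty = sparse_group_lasso_penalty(beta, groups, lam=1.0, alpha=0.5)
print(round(penalty, 4))
```

Setting `alpha=1.0` reduces the penalty to the plain Lasso, and `alpha=0.0` to the group Lasso, which is why the group assignment matters for this baseline.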
Citations: 0
Identifying protein complexes via multi-network clustering
Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822594
Ou-Yang Le, Hong Yan, Xiao-Fei Zhang
The detection of protein complexes from protein-protein interaction (PPI) networks is an important step toward understanding the functional organization within cells. A great number of graph clustering algorithms have been proposed to undertake this task. Since PPI data collected by high-throughput technologies is quite noisy, simply applying graph clustering algorithms to PPI data is generally not adequate to achieve reliable prediction results. Behind protein interactions, there are protein domains that interact with each other. Jointly exploiting protein-protein interactions and domain-domain interactions (DDIs) has the potential to increase the accuracy of protein complex detection. However, traditional graph clustering algorithms focus on clustering proteins within a single PPI network, and cannot make use of information inherent in other heterogeneous networks. In this paper, we propose a novel generative model to perform multi-network clustering. Unlike previous protein complex detection algorithms that can only utilize the information within a single PPI network, our model is a flexible framework that can take into account PPIs, DDIs and domain-protein associations to achieve more consistent and reliable clustering results. Experimental results on real data demonstrate that our method performs much better than state-of-the-art protein complex detection techniques.
Citations: 1
Identifying amino acids sensitive to mutations using high-throughput rigidity analysis
Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822779
Michael Siderius, F. Jagodzinski
Understanding how an amino acid substitution affects a protein's stability can aid in the design of pharmaceutical drugs that aim to counter the deleterious effects caused by protein mutants. Unfortunately, performing mutation experiments on the physical protein is prohibitive in both time and cost. Thus an exhaustive analysis that systematically mutates all amino acids in the physical protein is infeasible. Computational methods have been developed over the years to predict the effects of mutations, but many of them are computationally intensive or depend on homology or experimental data that may not be available for the protein being studied. In this work we motivate and present a computational pipeline whose only input is a Protein Data Bank file containing the 3D coordinates of the atoms of a biomolecule. Our high-throughput approach uses our rMutant algorithm to exhaustively generate in silico mutants with amino acid substitutions to Glycine, Alanine, and Serine for all residues in a protein. We exploit the speed of a fast rigidity analysis approach to analyze our protein variants, and develop a Mutation Sensitivity (MuSe) Map to identify residues that are most sensitive to mutations. We present three case studies and show the degree to which a MuSe Map is able to identify those amino acids that are susceptible to the effects of mutations.
Citations: 0
DNA mapping using Processor-in-Memory architecture
Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822732
D. Lavenier, Jean-François Roy, David Furodet
This paper presents the implementation of a mapping algorithm on a new Processing-in-Memory (PIM) architecture developed by UPMEM. UPMEM's solution consists of adding processing units to the DRAM, to minimize data access time and maximize bandwidth, in order to drastically accelerate data-intensive algorithms. The technology developed by UPMEM makes it possible to combine 256 cores with 16 GBytes of DRAM on a standard DIMM module. A DNA mapping experiment on a human genome dataset shows that a speed-up of 25 can be obtained with UPMEM technology compared to fast mapping software such as BWA, Bowtie2 or NextGenMap running on 16 Intel threads. The experiments also highlight that data transfer from the storage device limits the performance of the implementation. The use of SSD drives can boost the speed-up to 80.
Citations: 18
ERDS-pe: A paired hidden Markov model for copy number variant detection from whole-exome sequencing data
Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822508
Renjie Tan, Jixuan Wang, Xiaoliang Wu, Guoqiang Wan, Rongjie Wang, Rui Ma, Zhijie Han, Wenyang Zhou, Shuilin Jin, Qinghua Jiang, Yadong Wang
Detecting copy number variants (CNVs) is an essential part of the variant calling process. Here, we describe a novel method, ERDS-pe, to detect CNVs from whole-exome sequencing (WES) data. ERDS-pe first employs principal component analysis to normalize WES data. Then, ERDS-pe incorporates the read depth signal and single-nucleotide variation information together as a hybrid signal into a paired hidden Markov model to infer CNVs from WES data. Experimental results on real human WES data show that ERDS-pe demonstrates higher sensitivity and provides comparable or even better specificity than other tools. ERDS-pe is publicly available at: https://github.com/microtan0902/erds-pe.
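To make the hidden Markov model idea in this abstract concrete, here is a minimal sketch of Viterbi decoding over normalized read-depth log2 ratios with three copy-number states. It is a generic single-chain HMM, not the paired HMM of ERDS-pe, and it ignores the single-nucleotide variation signal; the state means, standard deviation, transition probabilities, and example ratios are all assumptions made up for illustration.

```python
import math

# Assumed states and Gaussian emission means on the log2 depth-ratio scale:
# log2(1/2) = -1 for a heterozygous deletion, log2(3/2) ~ 0.58 for a duplication.
STATES = ["deletion", "normal", "duplication"]
MEANS = {"deletion": -1.0, "normal": 0.0, "duplication": 0.58}
SD = 0.3    # assumed emission standard deviation
STAY = 0.9  # assumed probability of remaining in the same state

def log_emission(state, x):
    """Log density of a Gaussian centred on the state's mean."""
    z = (x - MEANS[state]) / SD
    return -0.5 * z * z - math.log(SD * math.sqrt(2 * math.pi))

def log_transition(prev, cur):
    """Sticky transitions: stay with STAY, otherwise split the remainder evenly."""
    if prev == cur:
        return math.log(STAY)
    return math.log((1 - STAY) / (len(STATES) - 1))

def viterbi(observations):
    """Return the most likely state sequence for a list of log2 ratios."""
    if not observations:
        return []
    # Uniform initial distribution over states.
    score = {s: math.log(1 / len(STATES)) + log_emission(s, observations[0])
             for s in STATES}
    back = []
    for x in observations[1:]:
        new_score, pointers = {}, {}
        for cur in STATES:
            best_prev = max(STATES, key=lambda p: score[p] + log_transition(p, cur))
            new_score[cur] = (score[best_prev] + log_transition(best_prev, cur)
                              + log_emission(cur, x))
            pointers[cur] = best_prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

# Hypothetical per-exon log2 ratios with a run of three low-depth exons.
ratios = [0.05, -0.02, -0.95, -1.1, -1.0, 0.03, 0.0]
calls = viterbi(ratios)
print(calls)
```

The sticky transition probability is what turns noisy per-exon ratios into contiguous segments; a lower `STAY` value would make the decoder more willing to call short events.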
Citations: 2
Comparisons of linkage disequilibrium blocks of different populations at the sites of natural selection
Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822784
Sun-Ah Kim, Suh-Ryung Kim, Y. J. Yoo
Linkage disequilibrium (LD) structure is the main source of information for the study of population genetics and disease-gene association. In particular, analyzing extended long haplotypes carrying a derived allele and examining LD block patterns can provide evidence for positive selection. We investigated the LD block structure of East Asian, European, and African populations at previously reported sites of positive selection by comparing LD block construction results based on 1000 Genomes Project data. We confirmed that the differences in LD block size in the EDAR, LCT, PCDH15, and LARGE regions among different populations are consistent with previous reports. We found new evidence for positive selection in SLC30A19, PDE11A and BCAS3 in East Asian and European populations based on the LD block patterns.
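LD blocks are built from pairwise LD between variants. The abstract does not say which LD measure or block-construction algorithm the authors used, so the sketch below only shows the two standard pairwise statistics, D and r², computed from phased two-locus haplotypes. The haplotype encoding ("A"/"a" at the first locus, "B"/"b" at the second) and the function name are assumptions for illustration.

```python
def ld_stats(haplotypes):
    """Compute (D, r^2) for two biallelic loci from phased haplotypes.

    haplotypes : list of 2-character strings such as "AB", "Ab", "aB", "ab",
                 where the first character is the allele at locus 1 and the
                 second the allele at locus 2 (hypothetical encoding).
    """
    n = len(haplotypes)
    p_a = sum(h[0] == "A" for h in haplotypes) / n   # frequency of allele A
    p_b = sum(h[1] == "B" for h in haplotypes) / n   # frequency of allele B
    p_ab = sum(h == "AB" for h in haplotypes) / n    # frequency of haplotype AB
    d = p_ab - p_a * p_b                             # deviation from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    # r^2 is undefined when either locus is monomorphic.
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2

# Perfect association: allele A always travels with allele B.
print(ld_stats(["AB", "AB", "ab", "ab"]))
# No association: all four haplotypes equally frequent.
print(ld_stats(["AB", "Ab", "aB", "ab"]))
```

Runs of adjacent variants with consistently high pairwise LD are what block-construction methods group into an LD block, so longer blocks in one population than another at the same locus are the kind of pattern the abstract interprets as a signature of selection.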
Citations: 0
ILSES: Identification lysine succinylation-sites with ensemble classification
Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822530
Wenzheng Bao, Lin Zhu, De-shuang Huang
Lysine succinylation is one of the most important types of protein post-translational modification and is involved in many cellular processes and serious diseases. However, recognizing such sites with traditional experimental methods is time-consuming and laborious, and these methods can hardly meet the need to identify large numbers of succinylated sites quickly. In this work, several physicochemical properties of succinylated sites are extracted, such as the physicochemical properties of the amino acids. A flexible neural tree, employed as the classification model, is used to integrate the above-mentioned features into a novel lysine succinylation prediction framework named ILSES (identification lysine succinylation-sites with ensemble features classification). The method is able to combine diverse features to predict lysine succinylation with high accuracy and in real time.
Citations: 3
Multiple sequence alignment and reconstructing phylogenetic trees with Hadoop
Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822735
Q. Zou
Multiple sequence alignment (MSA) is the “Holy Grail” problem in computational biology, but bottlenecks arise in the massive MSA of homologous sequences. Most of the available state-of-the-art software tools cannot handle large-scale datasets, or they run rather slowly. The similarity of homologous DNA sequences is often ignored, and lack of parallelization is still a challenge for MSA research. Building phylogenetic trees for ultra-large sequence sets is also time-consuming, and MSA is a prerequisite step for phylogenetic reconstruction. With the development of parallel computation, we employed the Hadoop platform to solve these two computationally intensive problems. Trie trees and suffix trees were used to accelerate the alignment of multiple similar DNA sequences, reducing the expected time complexity from quadratic to linear. For phylogenetic tree reconstruction, clustering and multiple sequence alignment were executed in parallel, and the basic phylogenetic trees were built using the neighbour-joining method. Experiments on two large datasets, both larger than 1 GB, show that our software tool can outperform other common phylogenetic reconstruction tools. Furthermore, the data, software code, and web servers are all available at http://lab.malab.cn/soft/halign/ and http://lab.malab.cn/soft/HPtree/
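The abstract mentions trie trees for accelerating the alignment of similar DNA sequences. The sketch below shows one way a trie can be used for that purpose: fixed-length segments of a reference sequence are stored in a trie, and a second sequence is scanned to find where those segments occur exactly, giving anchor points for alignment. This is only an illustration of the data structure, not the authors' implementation; the segment length, the function names, and the example sequences are hypothetical.

```python
END = "$"  # marker key for "a segment ends here"; assumed absent from the DNA alphabet

def build_trie(segments):
    """Build a nested-dict trie; each terminal node stores the segment's index."""
    root = {}
    for seg_id, seg in enumerate(segments):
        node = root
        for ch in seg:
            node = node.setdefault(ch, {})
        node[END] = seg_id
    return root

def find_segments(trie, query):
    """Return (start position in query, segment index) for every exact match."""
    hits = []
    for start in range(len(query)):
        node = trie
        for pos in range(start, len(query)):
            node = node.get(query[pos])
            if node is None:
                break  # no stored segment continues with this character
            if END in node:
                hits.append((start, node[END]))
    return hits

# Split a hypothetical reference into non-overlapping segments of length 4
# (a trailing remainder shorter than 4 is dropped).
reference = "ACGTACGGTT"
k = 4
segments = [reference[i:i + k] for i in range(0, len(reference) - k + 1, k)]
trie = build_trie(segments)

hits = find_segments(trie, "TTACGTAACGG")
print(segments, hits)
```

Because highly similar sequences share most segments exactly, the matched anchors leave only short gaps between them to be aligned by a slower method, which is the intuition behind the speed-up the abstract reports for similar sequences.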
Citations: 3
Characteristic gene selection via L2,1-norm Sparse Principal Component Analysis
Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822796
Yao Lu, Ying-Lian Gao, Jin-Xing Liu, Chang-Gang Wen, Yaxuan Wang, Jiguo Yu
Sparse Principal Component Analysis (SPCA) is a method that obtains sparse loadings of the principal components (PCs) by formulating PCA as a regression-type optimization problem using the elastic net. However, the selected features differ for each PC and are generally independent. A new method named L2,1SPCA has been proposed to remove these defects; it replaces the elastic net with an L2,1-norm penalty. The performance of this method on gene expression data is still unknown, so we test it in this paper. Firstly, the method is applied to simulated data to obtain an optimal parameter. Secondly, the L2,1SPCA method is applied to gene expression data, namely the head and neck squamous carcinoma (HNSC) data. Thirdly, the characteristic genes are selected according to the PCs. The results show very low P-values and very high hit counts, which indicates that L2,1SPCA can obtain higher recognition accuracy and higher relevance to the genes. Finally, the experimental results demonstrate that L2,1SPCA works well and performs well on gene expression data.
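For readers unfamiliar with the L2,1-norm named in this abstract, the sketch below computes it for a small loading matrix: the L2 norm of each row is taken first, and the row norms are then summed. It assumes the convention that rows are genes and columns are PCs, and the matrix values and function names are hypothetical; it does not reproduce the authors' optimization procedure.

```python
import math

def row_norms(matrix):
    """Euclidean (L2) norm of each row."""
    return [math.sqrt(sum(v * v for v in row)) for row in matrix]

def l21_norm(matrix):
    """L2,1-norm: sum over rows of the row-wise L2 norm."""
    return sum(row_norms(matrix))

def selected_rows(matrix, tol=1e-12):
    """Indices of rows (genes, under the assumed convention) with non-zero loadings."""
    return [i for i, norm in enumerate(row_norms(matrix)) if norm > tol]

# Hypothetical loading matrix: 3 genes (rows) by 2 PCs (columns).
loadings = [
    [3.0, 4.0],  # gene 0 loads on both PCs
    [0.0, 0.0],  # gene 1 is dropped from every PC
    [1.0, 0.0],  # gene 2 loads on the first PC only
]
print(l21_norm(loadings), selected_rows(loadings))
```

Because the penalty acts on whole rows, shrinking it tends to zero out a gene across all PCs at once, which is how an L2,1 penalty yields one shared set of selected features instead of a different set per PC.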
Citations: 3
FPGA implementation of the coupled filtering method
Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822556
C. Zhang, Tianzhu Liang, P. Mok, Weichuan Yu
In ultrasound image analysis, speckle tracking methods are widely applied to study the elasticity of body tissue. However, “feature-motion decorrelation” remains a challenge for speckle tracking methods. Recently, a coupled filtering method was proposed to accurately estimate strain values when the tissue deformation is large. The major drawback of the new method is its high computational complexity: even the GPU-based program requires a few hours to finish the analysis. In this paper, we propose an FPGA-based implementation for further acceleration. We discuss the capability of FPGAs in handling the different image processing components of this method, and reformulate the algorithm to build a highly efficient pipeline on the FPGA. The final implementation on a Xilinx Virtex-7 FPGA is 15 times faster than the GPU implementation on two NVIDIA graphics cards (GeForce GTX 580).
Citations: 5