
2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM): Latest Publications

mAMBER: A CPU/MIC collaborated parallel framework for AMBER on Tianhe-2 supercomputer
Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822595
Shaoliang Peng, Xiaoyu Zhang, Yutong Lu, Xiangke Liao, Kai Lu, Canqun Yang, Jie Liu, Weiliang Zhu, Dongqing Wei
Molecular dynamics (MD) is a computer simulation method for studying the physical movements of atoms and molecules that provides detailed microscopic sampling at the molecular scale. With continuous efforts and improvements, MD simulation has gained popularity in materials science, biochemistry and biophysics, with various application areas and expanding data scale. Assisted Model Building with Energy Refinement (AMBER) is one of the most widely used software packages for conducting MD simulations. However, the speed of AMBER MD simulations for systems with millions of atoms at the microsecond scale still needs to be improved. In this paper, we propose a parallel acceleration strategy for AMBER on the Tianhe-2 supercomputer. The parallel optimization of AMBER is carried out on three different levels: fine-grained OpenMP parallelism on a single MIC, single-node CPU/MIC collaborative parallel optimization, and multi-node multi-MIC collaborative parallel acceleration. With these three levels of parallel acceleration, we achieved a maximum speedup of 25–33 times compared with the original program. Source Code: https://github.com/tianhe2/mAMBER
Citations: 4
Mathematical and computational analysis of CRISPR Cas9 sgRNA off-target homologies
Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822558
M. Zhou, Daisy Li, X. Huan, Joseph Manthey, E. Lioutikova, Hong Zhou
The true power of the genome editing mechanism known as the RNA-programmable CRISPR Cas9 endonuclease system lies in the fact that Cas9 can be guided to any locus complementary to a 20-nt RNA, the single guide RNA (sgRNA), to cleave double-stranded DNA, thereby allowing the introduction of desired mutations. Unfortunately, sgRNA is prone to off-target homologous attachment, which guides Cas9 to cleave DNA sequences at unwanted sites. Using the human genome and Streptococcus pyogenes Cas9 (SpCas9) as the example, this article analyzes the probabilities of off-target sites of sgRNAs and finds that for large genomes such as the human genome, off-target sites are nearly inevitable in sgRNA selection. Based on the mathematical analysis, the double nicking approach appears to be currently the only feasible solution for ensuring genome editing specificity. An effective computational algorithm for off-target homology searching is also implemented to confirm the mathematical analysis.
Citations: 3
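The claim in the CRISPR Cas9 abstract above, that off-target sites are nearly inevitable in large genomes, can be illustrated with a back-of-the-envelope calculation. The sketch below is not the authors' analysis: it assumes a genome of independent, uniformly distributed bases, counts both strands, requires an NGG PAM (probability 1/16 under that model), and takes roughly 3.1e9 bp as the human genome size. The function names are hypothetical.

```python
from math import comb

SGRNA_LENGTH = 20  # guide length stated in the abstract


def sequences_within(max_mismatches, length=SGRNA_LENGTH):
    """Count DNA sequences within the given Hamming distance of one guide."""
    # Choose i mismatched positions; each can take any of the 3 other bases.
    return sum(comb(length, i) * 3 ** i for i in range(max_mismatches + 1))


def expected_off_targets(genome_size, max_mismatches,
                         pam_prob=1 / 16, strands=2, length=SGRNA_LENGTH):
    """Expected number of chance near-matches under the uniform random model."""
    p_match = sequences_within(max_mismatches, length) / 4 ** length
    return strands * genome_size * pam_prob * p_match


if __name__ == "__main__":
    human_genome = 3.1e9  # approximate size in bp (assumption of this sketch)
    for k in range(5):
        print(k, expected_off_targets(human_genome, k))
```

Under these assumptions the expected count grows quickly with the number of tolerated mismatches, which is consistent in spirit with the abstract's conclusion. Real genomes are far from uniformly random, so the numbers are only indicative.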
3D tracking swimming fish school using a master view tracking first strategy
Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822572
Shuohong Wang, Xiang Liu, Jingwen Zhao, Ye Liu, Y. Chen
3D motion data of fish schools are more valuable than 2D data for behavioral and other research. This paper proposes a master-view-tracking-first strategy based on a novel master-slave camera setup. Fish are first tracked in 2D in the master view after being extracted via an eye-focused Gaussian Mixture Model (E-GMM) detector. Then 3D trajectories are reconstructed by associating the 2D tracking results in the master view with the detection results in the slave views, after fish in the slave views are localized using an eye-focused Gabor (E-Gabor) detector. Experiments on data sets with different fish densities demonstrate that the proposed method outperforms two state-of-the-art methods in terms of 5 evaluation metrics.
Citations: 11
An evaluation of data replication for bioinformatics workflows on NoSQL systems
Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822644
Iasmini Lima, Matheus Oliveira, Diego S. Kieckbusch, M. Holanda, M. E. Walter, Aleteia P. F. Araujo, M. Victorino, Waldeyr M. C. Silva, Sérgio Lifschitz
Many research projects in bioinformatics may be viewed as scientific workflows. Biologists often run the same workflow multiple times with different parameters in order to refine their data analysis. These executions generate a large volume of files in different formats, which need to be stored for future evaluation. New database models, such as NoSQL systems, can be considered for handling large volumes of data, particularly in distributed systems. This work presents an assessment of the impact of data replication on the execution of scientific workflows for two NoSQL database management systems: Cassandra and MongoDB.
Citations: 3
DASE: Condition-specific differential alternative splicing variants estimation method without reference genome sequence, and its application to non-model organisms
Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822540
Kouki Yonezawa, Tsukasa Mori, S. Shigeno, A. Ogura
Alternative splicing is a mechanism that produces gene expression diversity under the constraint of a limited number of genes, giving rise to spatiotemporal gene expression in many tissues and developmental processes in organisms. This mechanism is well studied in model organisms but not in non-model organisms, because the current standard method requires genomic sequences as well as fully annotated exon and intron information, which are not available for non-model organisms. However, it is essential to uncover the landscape of alternative splicing across organisms to understand its evolutionary impacts and roles. We developed a method for condition-specific alternative splicing estimation based on de novo transcriptome assembly, which should help expand knowledge of functional alternative splicing in non-model organisms. The software is available at https://github.com/koukiyonezawa/DASE.
Citations: 0
Wide line detection with water flow
Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822715
Yangyang Hu, Wenqiang Zhang, Hong Lu, Fufeng Li, Weifei Zhang
Line detection plays a vital role in visual analysis tasks such as Traditional Chinese Medicine (TCM) image analytics. However, most current methods ignore line thickness and perform poorly on lines of different widths. This paper proposes a novel line detection method based on the water flow method. Unlike most edge-based and region-based line detectors, the water flow method obtains the whole line response map by simply imitating the movement of water over the image, which is smoothed by a guided filter and viewed as a geomorphological map. In addition, this paper proposes an adaptive parameter selection method to make line detection more robust and accurate. Experimental results on tongue crack images demonstrate the effectiveness of the proposed method in comparison with existing line extraction methods.
Citations: 1
Is EEG causal to fNIRs?
Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822648
Borzou Alipourfard, Jean X. Gao, Olajide Babawale, Hanli Liu
Causality analysis of simultaneous measurements of the brain's electrical activity and its hemodynamic activity provides the opportunity to study the neural underpinnings of hemodynamic fluctuations. This multimodal analysis can also be used to extract valuable information about the location of the generators of various electrical events such as alpha rhythms or epileptiform activity. To the best of our knowledge, we are the first to propose a method to assess causality from EEG to the hemodynamic activity measured using functional near-infrared spectroscopy (fNIRs). The main challenge in studying causality in this setting arises from the low sampling rate of the fNIRs and the mixed-frequency nature of the data. Our method of analysis consists of two parts. Through a simple modification of Geweke's formulation of contamination, we first show that the low sampling frequency of the fNIRs does not cause contamination in estimating causality from EEG to fNIRs. We then apply a novel causality test that avoids down-sampling the EEG when measuring causality. The method of analysis proposed here can be generalized to study causality in other biomedical signal analysis applications and mixed-frequency settings.
Citations: 1
Sparse singular value decomposition-based feature extraction for identifying differentially expressed genes
Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822795
Jin-Xing Liu, Xiangzhen Kong, C. Zheng, J. Shang, Wei Zhang
Recently, feature extraction and dimensionality reduction have become fundamental tools for many data mining tasks, especially for processing high-dimensional data such as genome data. In this paper, a new feature extraction method based on sparse singular value decomposition (SSVD) is developed. The SSVD algorithm is applied to extract differentially expressed genes from two different genome datasets, both from The Cancer Genome Atlas (TCGA), and the extracted genes are then evaluated with tools based on Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis. As a gene extraction method, SSVD is also compared with existing feature extraction methods such as independent component analysis, p-norm robust feature extraction and sparse principal component analysis. The experimental GO analysis results show that the SSVD method outperforms the competing algorithms. The KEGG analysis results highlight genes that participate in pathways in cancer. The experiments indicate that SSVD is an effective feature selection method compared with the competing methods. The KEGG analysis results may provide a meaningful reference for further study by professionals in the field of biomedical science.
Citations: 3
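The SSVD abstract above does not spell out its formulation, so the following is only a generic sketch of the idea of a sparse rank-one decomposition used for gene selection. It assumes a samples-by-genes matrix, alternates a power-iteration step with soft thresholding of the gene loadings, and treats genes with nonzero loadings as selected. The function names, the penalty `lam` and the toy matrix are all hypothetical.

```python
import math


def soft_threshold(values, lam):
    """Shrink each entry toward zero by lam; small entries become zero."""
    return [math.copysign(max(abs(x) - lam, 0.0), x) for x in values]


def normalize(vec):
    """Scale a vector to unit length (an all-zero vector is returned as is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0.0 else [x / norm for x in vec]


def sparse_rank_one(X, lam, n_iter=100, tol=1e-9):
    """Alternate u = Xv / ||Xv|| and v = soft(X^T u, lam) / ||.|| until v settles."""
    n, p = len(X), len(X[0])
    v = [1.0 / math.sqrt(p)] * p  # dense, deterministic starting point
    u = [0.0] * n
    for _ in range(n_iter):
        u = normalize([sum(X[i][j] * v[j] for j in range(p)) for i in range(n)])
        z = [sum(X[i][j] * u[i] for i in range(n)) for j in range(p)]
        v_new = normalize(soft_threshold(z, lam))
        converged = max(abs(a - b) for a, b in zip(v, v_new)) < tol
        v = v_new
        if converged:
            break
    return u, v


def selected_genes(v):
    """Indices of genes (columns) with a nonzero sparse loading."""
    return [j for j, w in enumerate(v) if w != 0.0]


# Toy matrix: 4 samples x 5 genes; genes 0 and 3 separate the two sample groups.
X = [
    [2.0, 0.1, -0.1, 2.0, 0.0],
    [2.0, -0.1, 0.1, 2.0, 0.1],
    [-2.0, 0.1, 0.1, -2.0, -0.1],
    [-2.0, -0.1, -0.1, -2.0, 0.0],
]
u, v = sparse_rank_one(X, lam=0.5)
print(selected_genes(v))
```

The penalty controls how many genes survive: a larger `lam` zeroes more loadings. How the paper chooses its penalty and how many components it extracts is not stated in the abstract.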
2Path: A terpenoid metabolic network modeled as graph database
Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822709
Waldeyr M. C. Silva, Danilo Vilar, Daniel S. Souza, M. E. Walter, M. Brigido, M. Holanda
Terpenoids are involved in interactions as signals for intra- and inter-species communication, as signal molecules that attract pollinating insects, and as defenses against herbivores and microbes. Due to their chemical composition, many terpenoids have broad pharmacological applicability in medicine and biotechnology, in addition to important roles in ecology, industry and commerce. The biosynthesis of terpenes has been widely studied over the years, and it is well known that they can be synthesized via two metabolic pathways: the mevalonate pathway (MVA) and the non-mevalonate pathway (MEP). On the other hand, genome-scale reconstruction of metabolic networks faces many challenges, including organizing data storage and data modeling, in order to properly represent the complexity of systems biology. Recent NoSQL database paradigms have introduced new concepts of scalable storage and data queries. Among them are graph databases, which are versatile enough to cope with biological data. In this paper, we propose 2Path, a graph database model designed to represent terpenoid metabolic networks, with thousands of secondary metabolism reactions, such that it preserves important terpenoid biosynthesis characteristics.
Citations: 0
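To make the idea in the 2Path abstract above concrete, a metabolic network can be treated as a directed graph with compounds as nodes and reactions as labeled edges. The sketch below is a toy in-memory model, not the 2Path schema, which the abstract does not describe: it encodes the main steps of the mevalonate pathway and finds a route between two compounds with breadth-first search. The data layout and function names are hypothetical.

```python
from collections import deque

# (substrate, enzyme, product) triples for a simplified mevalonate pathway.
REACTIONS = [
    ("acetyl-CoA", "acetoacetyl-CoA thiolase", "acetoacetyl-CoA"),
    ("acetoacetyl-CoA", "HMG-CoA synthase", "HMG-CoA"),
    ("HMG-CoA", "HMG-CoA reductase", "mevalonate"),
    ("mevalonate", "mevalonate kinase", "mevalonate-5-phosphate"),
    ("mevalonate-5-phosphate", "phosphomevalonate kinase",
     "mevalonate-5-diphosphate"),
    ("mevalonate-5-diphosphate", "mevalonate diphosphate decarboxylase", "IPP"),
    ("IPP", "IPP isomerase", "DMAPP"),
]


def build_graph(reactions):
    """Map each compound to its outgoing (enzyme, product) edges."""
    graph = {}
    for substrate, enzyme, product in reactions:
        graph.setdefault(substrate, []).append((enzyme, product))
        graph.setdefault(product, [])
    return graph


def find_path(graph, start, goal):
    """Return the shortest compound-to-compound route, or None if unreachable."""
    if start not in graph or goal not in graph:
        return None
    previous = {start: None}
    queue = deque([start])
    while queue:
        compound = queue.popleft()
        if compound == goal:
            path = []
            while compound is not None:
                path.append(compound)
                compound = previous[compound]
            return path[::-1]
        for _enzyme, product in graph[compound]:
            if product not in previous:
                previous[product] = compound
                queue.append(product)
    return None


graph = build_graph(REACTIONS)
print(find_path(graph, "acetyl-CoA", "IPP"))
```

A graph database would express the same traversal as a path query over stored nodes and relationships; the in-memory dictionary here only stands in for that storage layer, and the isomerase step is modeled in one direction for simplicity.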
Integration of multiple heterogeneous omics data
Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822582
Chuanchao Zhang, Juan Liu, Qianqian Shi, Xiangtian Yu, T. Zeng, Luonan Chen
Integrating different genomic profiles to understand complex diseases in a multi-view manner is challenging. Computational methods are needed that preserve the useful information of each data type while correcting bias. Thus, we propose a novel framework, pattern fusion analysis (PFA), which fuses the local sample patterns into a global pattern of patients with respect to the underlying data by adaptively aligning the information in each type of biological data. In particular, PFA can adjust for the distinct data types and achieve a more robust sample pattern across different profiles. To validate the effectiveness of PFA, we tested it on various synthetic datasets and found that PFA captures the intrinsic clustering structure more effectively than state-of-the-art integrative methods such as moCluster, iClusterPlus and SNF. Moreover, in a case study on kidney cancer, PFA not only identified multi-way feature modules among previously known disease-associated genes, methylations and miRNAs, but also outperformed the other methods in cancer subtype identification and yielded effective clinical prognosis prediction. Overall, PFA not only provides new insights into the more holistic, systems-level sample pattern, but also offers a new way of selecting more informative types of biological data.
Citations: 5
Journal
2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)