2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM): Latest Articles

Machine learning approaches for the investigation of features beyond seed matches affecting miRNA binding
Pub Date : 2010-12-01 DOI: 10.1109/BIBM.2010.5706564
Cen Gao, Jing Li
MicroRNAs (miRNAs) are a type of noncoding RNA that regulates target mRNAs before the mRNAs are translated into proteins. Although it has been demonstrated that this regulation occurs through partial binding between the seed region of a miRNA and its targets, the mechanism is not fully understood. Some biological experiments have shown that even perfect base pairing in the seed region does not always guarantee down-regulation of the targets. It has been suspected that other characteristics of mRNAs may facilitate the regulation. An earlier study (1) identified five additional features beyond seed matching that seem to significantly affect repression. However, evolutionarily conserved targets show significantly more destabilization than nonconserved targets with the same score under these five features, which suggests that additional features remain to be discovered. This motivates our study to identify additional features that may differentiate down-regulated mRNAs (positive set) from mRNAs that are not down-regulated (negative set), provided both sets have perfect seed matches with miRNAs. Our first attempt, a search for sequence motifs around seed site regions that differ between the two sets, was not successful. We then construct a set of 18 sequence/structure features based on domain knowledge and evaluate them individually and jointly. By employing feature selection techniques in combination with several classification methods, we identify a subset of features that may facilitate the down-regulation of mRNAs. Our results can be incorporated into target prediction algorithms to further improve target specificity.
Citations: 0
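The abstract above restricts both the positive and the negative set to mRNAs with perfect seed matches. A minimal sketch of such a check is given below. It assumes the seed is miRNA nucleotides 2 to 8 (a common convention, not stated in the abstract) and considers Watson-Crick pairs only; the function names and the toy UTR fragment are hypothetical.

```python
from typing import List

# Illustrative sketch only: locate perfect seed-match sites in an mRNA sequence.
# Assumption: the seed is miRNA nucleotides 2-8 (1-based); the abstract does not
# specify which seed definition the authors used.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA string (Watson-Crick pairs only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_seed_sites(mirna: str, mrna: str, start: int = 2, end: int = 8) -> List[int]:
    """Return 0-based positions in `mrna` that pair perfectly with the miRNA seed."""
    seed = mirna[start - 1:end]        # 1-based inclusive positions start..end
    site = reverse_complement(seed)    # sequence the mRNA must contain
    positions = []
    pos = mrna.find(site)
    while pos != -1:
        positions.append(pos)
        pos = mrna.find(site, pos + 1)  # step by one so overlapping sites are found
    return positions


if __name__ == "__main__":
    mirna = "UGAGGUAGUAGGUUGUAUAGUU"   # example miRNA sequence (let-7a-like)
    utr = "AAACUACCUCAAA"              # toy 3' UTR fragment, made up for illustration
    print(find_seed_sites(mirna, utr))
```

A check like this only establishes the shared precondition; the features studied in the paper are meant to separate targets that all pass it.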
A supervised learning approach to the unsupervised clustering of genes
Pub Date : 2010-12-01 DOI: 10.1109/BIBM.2010.5706585
Andrew K. Rider, Geoffrey H. Siwo, S. Emrich, M. Ferdig, N. Chawla
Clustering is a common step in the analysis of microarray data. Microarrays enable simultaneous high-throughput measurement of the expression levels of genes. These data can be used to explore relationships between genes and can guide drug development and further research. A typical first step in the analysis of these data is to apply an agglomerative hierarchical clustering algorithm to the correlation between all gene pairs. While this simple approach has been successful, it fails to identify many genetic interactions that may be important for drug design and other applications. We present an approach to the clustering of expression data that uses known gene-gene interaction data to improve the results of commonly used clustering techniques. The approach creates an ensemble similarity measure that can be used as input to common clustering techniques and provides results with increased biological significance without altering the clustering approach itself.
Citations: 0
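The idea of an ensemble similarity measure, as described in the abstract above, can be sketched roughly as follows. This is not the authors' method: the abstract indicates that the combination is learned, whereas this sketch simply blends absolute Pearson correlation with a known-interaction indicator using a fixed, hypothetical `weight`. All names are illustrative.

```python
import math
from typing import Dict, FrozenSet, List, Set, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0                      # constant profile: treat as uncorrelated
    return cov / (sd_x * sd_y)


def ensemble_similarity(
    expr: Dict[str, List[float]],
    known_pairs: Set[FrozenSet[str]],
    weight: float = 0.5,                # hypothetical fixed weight, not learned
) -> Dict[Tuple[str, str], float]:
    """Blend expression correlation with prior interaction knowledge per gene pair."""
    genes = sorted(expr)
    sim = {}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            corr = abs(pearson(expr[g], expr[h]))
            known = 1.0 if frozenset((g, h)) in known_pairs else 0.0
            sim[(g, h)] = (1.0 - weight) * corr + weight * known
    return sim


if __name__ == "__main__":
    expr = {"a": [1.0, 2.0, 3.0], "b": [2.0, 4.0, 6.0], "c": [3.0, 1.0, 2.0]}
    known = {frozenset(("a", "c"))}
    for pair, value in ensemble_similarity(expr, known).items():
        print(pair, round(value, 3))
```

The resulting pairwise scores could be converted to distances (for example `1 - similarity`) and passed to any standard hierarchical clustering routine, which matches the abstract's point that the clustering step itself is left unchanged.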
Feature selection for graph kernels
Pub Date : 2010-12-01 DOI: 10.1109/BIBM.2010.5706643
Mehmet Tan, Faruk Polat, R. Alhajj
Graph classification is important for different scientific applications; it can be exploited in various problems related to bioinformatics and cheminformatics. There is an increasing need to classify small molecules, given their graphs, in order to predict properties such as activity, toxicity or mutagenicity. Using subtrees as the feature set for graph classification in kernel methods has been shown to perform well in classifying small molecules. It is also well known that feature selection can improve the performance of classifiers. However, most graph kernels are not selective in choosing which subtrees to include in the set of features. Instead, they use all subtrees with a certain property as their feature set. We argue that not all of these features are needed for effective classification. In this paper, we investigate the effect of selecting a subset of the subtrees as features for graph kernels, i.e., we try to identify and keep useful features; all the remaining subtrees are eliminated. A masking procedure, which boils down to feature selection, is proposed for classifying graphs. We conducted experiments on several molecule classification datasets; the results demonstrate the applicability and effectiveness of the proposed feature selection process.
Citations: 1
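One way to picture the masking idea from the abstract above is a kernel computed only over a retained subset of subtree features. The sketch below is an assumption-laden illustration, not the paper's procedure: each graph is represented by a hypothetical dictionary of subtree-pattern counts, and the mask is simply the set of pattern names to keep.

```python
from typing import Dict, Set


def masked_subtree_kernel(
    counts_a: Dict[str, int],
    counts_b: Dict[str, int],
    keep: Set[str],
) -> int:
    """Dot product of two subtree-count vectors restricted to the features in `keep`."""
    return sum(
        counts_a[feature] * counts_b[feature]
        for feature in keep
        if feature in counts_a and feature in counts_b
    )


if __name__ == "__main__":
    # Toy subtree-pattern counts for two molecules (labels are made up).
    mol_a = {"C-C": 2, "C-O": 1, "C-N": 3}
    mol_b = {"C-C": 1, "C-N": 2, "N-O": 4}
    all_features = set(mol_a) | set(mol_b)
    print(masked_subtree_kernel(mol_a, mol_b, all_features))  # unmasked kernel
    print(masked_subtree_kernel(mol_a, mol_b, {"C-C"}))       # masked kernel
```

Because the masked kernel is still a dot product over a fixed feature subset, it remains a valid kernel and can be used with any kernel classifier.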
Computational prediction of toxicity
Pub Date : 2010-12-01 DOI: 10.1109/BIBM.2010.5706653
Meenakshi Mishra, Hongliang Fei, Jun Huan
As the number of new chemicals developed and in use keeps growing every year, obtaining the toxicity profile of each chemical becomes a daunting challenge. To close this information gap, the EPA suggested that certain in vitro assays and computational methods, which predict toxicity-related information with much less time and cost than traditional in vivo methods, may be used. In this paper, we use computational techniques that combine results from certain in vitro assays applied to 309 chemicals (whose toxicity profiles are readily available) with the molecular descriptors and other computed physical-chemical properties of the chemicals to predict the toxicity caused by a chemical at a particular endpoint. The dataset is available online from the EPA TOXCAST group. We show that Random Forest and Naïve Bayes perform well on this dataset. We also show that using small and related trees in the random forest helps to further improve performance.
Citations: 25
Exploratory analysis of the BioAssay Network with implications to therapeutic discovery
Pub Date : 2010-12-01 DOI: 10.1109/BIBM.2010.5706630
Jintao Zhang, G. Lushington, Jun Huan
Despite intense investment growth and technology development, a bottleneck in drug discovery and development has been observed over the past decade. NIH started the Molecular Libraries Initiative (MLI) in 2004 to enlarge the pool of potential drug targets, especially from the “undruggable” part of the human genome, and of potential drug candidates drawn from much broader types of drug-like small molecules. In this paper we used the concepts of network biology to integrate MLI data with other biological databases such as DrugBank and UniHI, and evaluated the potential of MLI target proteins to be new drug targets. Our analysis provided some measures of the value of the MLI data as a resource for both basic chemical biology research and future therapeutic discovery.
Citations: 0
Probabilistic topic modeling for genomic data interpretation
Pub Date : 2010-12-01 DOI: 10.1109/BIBM.2010.5706554
Xin Chen, Xiaohua Hu, Xiajiong Shen, G. Rosen
Recently, the concept of a species containing both core and distributed genes, known as the supra- or pangenome theory, has been introduced. In this paper, we aim to develop a new method that can analyze the genome-level composition of DNA sequences in order to characterize a set of common genomic features shared by the same species and identify their functional roles. To this end, we first apply a composition-based approach that breaks DNA sequences down into sub-reads called ‘N-mers’ and represents the sequences by N-mer frequencies. Then, we introduce the Latent Dirichlet Allocation (LDA) model to study the genome-level statistical patterns (a.k.a. latent topics) of the ‘N-mer’ features. Each estimated latent topic represents a certain component of the whole genome. With the help of the BioJava toolkit, we access the gene region information of reference sequences from the NCBI database. We use our data mining framework to investigate two questions: 1) do strains within a species share similar core and distributed topics? and 2) do genes with similar functional roles contain similar latent topics? After studying the mutual information between latent topics and gene regions, we provide examples of each, where the BioCyc database is used to correlate pathway and reaction information with the genes. The examples demonstrate the effectiveness of the proposed method.
Citations: 31
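The composition-based step in the abstract above, representing a sequence by its N-mer frequencies, can be sketched with a sliding window. This is a minimal illustration under the assumption that N-mers overlap and are counted at every position; the function name and example sequence are hypothetical, and the LDA step is not shown.

```python
from collections import Counter
from typing import Dict


def nmer_frequencies(seq: str, n: int) -> Dict[str, float]:
    """Relative frequency of every overlapping N-mer in `seq` (case-insensitive)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    if total == 0:
        return {}                       # sequence shorter than n
    return {nmer: count / total for nmer, count in counts.items()}


if __name__ == "__main__":
    for nmer, freq in sorted(nmer_frequencies("ACGTAC", 2).items()):
        print(nmer, freq)
```

In a topic-modeling setting, each sequence would play the role of a document and each N-mer the role of a word, so the raw counts (rather than the normalized frequencies) would typically be the input to LDA.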
An Evolutionary Monte Carlo algorithm for identifying short adjacent repeats in multiple sequences
Pub Date : 2010-12-01 DOI: 10.1109/BIBM.2010.5706645
Jin Xu, Qiwei Li, Xiaodan Fan, V. Li, S. Li
The Evolutionary Monte Carlo (EMC) algorithm is an effective and powerful method for sampling from complicated distributions. The short adjacent repeats identification problem (SARIP), i.e., searching for a common sequence pattern in multiple DNA sequences, is considered one of the key challenges in the field of bioinformatics. A recently proposed Markov chain Monte Carlo (MCMC) algorithm has demonstrated its effectiveness in solving SARIP. However, high computation time and a tendency to become trapped in local optima hinder its wide application. In this paper, we apply EMC to parallelize the MCMC algorithm for solving SARIP. Our proposed EMC scheme is implemented on a parallel platform, and the simulation results show that, compared with the conventional MCMC algorithm, EMC not only improves the quality of the final solution but also reduces the computation time.
Citations: 1
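EMC, as referenced in the abstract above, generally runs a population of chains at different temperatures and lets them exchange states. The sketch below shows only the standard exchange acceptance rule, assuming each chain targets a Boltzmann-type distribution proportional to exp(-energy / temperature). The abstract does not describe the moves used for SARIP, so this is a generic illustration with hypothetical names.

```python
import math


def swap_acceptance(
    energy_i: float, energy_j: float, temp_i: float, temp_j: float
) -> float:
    """Metropolis probability of exchanging the states of chains i and j.

    Assumes chain k samples from a density proportional to exp(-energy / temp_k).
    """
    exponent = (energy_i - energy_j) * (1.0 / temp_i - 1.0 / temp_j)
    return 1.0 if exponent >= 0 else math.exp(exponent)


if __name__ == "__main__":
    # Cold chain (T=1) holds the worse state: the swap is always accepted.
    print(swap_acceptance(energy_i=2.0, energy_j=1.0, temp_i=1.0, temp_j=2.0))
    # Cold chain holds the better state: the swap is accepted only sometimes.
    print(swap_acceptance(energy_i=1.0, energy_j=2.0, temp_i=1.0, temp_j=2.0))
```

Because the chains interact only at exchange (and, in full EMC, crossover) steps, the per-chain updates between exchanges can run in parallel, which is consistent with the abstract's use of a parallel platform.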
A two-stage machine learning approach for pathway analysis
Pub Date : 2010-12-01 DOI: 10.1109/BIBM.2010.5706576
Wei Zhang, S. Emrich, Erliang Zeng
Analysis of gene expression data has emerged as an important approach for discovering active pathways related to biological phenotypes. Previous pathway analysis methods use all genes in a pathway to link it to a particular phenotype. Using only a subset of informative genes, however, could classify samples better. Here, we propose a two-stage machine learning approach for pathway analysis. In the first stage, informative genes that can represent a pathway are selected using feature selection methods. These “representative genes” are mostly associated with the phenotype of interest. In the second stage, pathways are ranked based on their “representative genes” using classification methods. We applied our two-stage approach to three gene expression datasets. The results indicate that our method outperforms methods that consider every gene in a pathway.
Citations: 8
A dynamic qualitative probabilistic network approach for extracting gene regulatory network motifs
Pub Date : 2010-12-01 DOI: 10.1109/BIBM.2010.5706595
Zina M. Ibrahim, A. Ngom, Ahmed Y. Tawfik
This paper extends our work on using qualitative probability to model the naturally occurring motifs of gene regulatory networks. Having shown in [16] that the qualitative relations defining QPN graphs map directly to the naturally occurring network motifs embedded in gene regulatory networks, this work is concerned with generalizing QPN constructs to create a high-level framework from which any regulatory network motif can be derived. Experimental results using time-series data from Saccharomyces cerevisiae show the effectiveness of our approach in providing a more accurate description of the regulatory motifs in the Saccharomyces cerevisiae gene regulatory network compared to our previous definitions.
Citations: 1
A non-parameter Ising model for network-based identification of differentially expressed genes in recurrent breast cancer patients
Pub Date : 2010-12-01 DOI: 10.1109/BIBM.2010.5706565
Xumeng Li, F. Feltus, Xiaoqian Sun, Zijun Wang, Feng Luo
Identification of genes and pathways involved in diseases and physiological conditions is a major task in systems biology. In this study, we develop a new non-parameter Ising model that integrates a protein-protein interaction network with microarray data to identify differentially expressed (DE) genes. We also propose a simulated annealing algorithm to find the optimal configuration of the Ising model. We test the Ising model on two breast cancer microarray data sets. The results show that more cancer-related differentially expressed subnetworks and genes are identified by the Ising model than by the Markov random field (MRF) model.
Citations: 1
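The Ising-model abstract above mentions a simulated annealing search for the optimal spin configuration. As a rough illustration only, the sketch below anneals a generic Ising energy on a small network. It is not the authors' non-parameter model: the energy function, the unit coupling between interacting genes, the per-gene `field` scores, the cooling schedule, and the function names `energy` and `anneal` are all assumptions made for this example.

```python
import math
import random


def energy(spins, edges, field):
    """Generic Ising energy (an assumption, not the paper's model):
    E = -sum over edges of s_i * s_j  -  sum over nodes of h_i * s_i."""
    pair = sum(spins[i] * spins[j] for i, j in edges)
    local = sum(field[n] * s for n, s in spins.items())
    return -pair - local


def anneal(edges, field, steps=5000, t_start=2.0, t_end=0.01, seed=0):
    """Single-spin-flip simulated annealing with geometric cooling.

    edges: list of (node, node) pairs, e.g. protein-protein interactions.
    field: dict mapping node -> hypothetical per-gene evidence score.
    Returns a dict mapping node -> spin (+1 or -1).
    """
    rng = random.Random(seed)
    nodes = sorted(field)
    neighbors = {n: [] for n in nodes}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # Start from a random configuration.
    spins = {n: rng.choice((-1, 1)) for n in nodes}

    for step in range(steps):
        # Temperature decays geometrically from t_start to t_end.
        frac = step / max(steps - 1, 1)
        temp = t_start * (t_end / t_start) ** frac

        node = rng.choice(nodes)
        # Energy change caused by flipping this one spin.
        local_sum = sum(spins[m] for m in neighbors[node]) + field[node]
        delta = 2 * spins[node] * local_sum

        # Metropolis rule: always accept downhill moves, sometimes uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            spins[node] = -spins[node]

    return spins


if __name__ == "__main__":
    toy_edges = [("g1", "g2"), ("g2", "g3"), ("g3", "g4")]
    toy_field = {"g1": 1.5, "g2": 0.5, "g3": -0.5, "g4": -1.5}
    state = anneal(toy_edges, toy_field)
    print(state, energy(state, toy_edges, toy_field))
```

In this toy setting a spin of +1 could be read as "differentially expressed" and -1 as "not differentially expressed", but that labelling is also an assumption of the sketch rather than something the abstract specifies.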