2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM): Latest Papers

Scalable, updatable predictive models for sequence data
Pub Date: 2010-12-01 DOI: 10.1109/BIBM.2010.5706652
Neeraj Koul, N. Bui, Vasant G Honavar
The emergence of data-rich domains has led to exponential growth in the size and number of data repositories, offering exciting opportunities to learn from the data using machine learning algorithms. In particular, sequence data is becoming available at a rapid rate. In many applications, the learning algorithm may not have direct access to the entire dataset for a variety of reasons, such as massive data size or bandwidth limitations. In such settings, there is a need for techniques that can learn predictive models (e.g., classifiers) from large datasets without direct access to the data. We describe an approach to learn from massive sequence datasets using statistical queries. Specifically, we show how Markov models and probabilistic suffix trees (PSTs) can be constructed from sequence databases that answer only a class of count queries. We analyze the query complexity (a measure of the number of queries needed) of constructing classifiers in such settings and outline some techniques to minimize it. We also show how some of the models can be updated in response to the addition or deletion of subsets of sequences in the underlying sequence database.
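The count-query setting can be illustrated with a minimal Python sketch. It assumes a hypothetical `CountOracle` that answers only substring count queries (the abstract does not specify the exact query class) and estimates a k-th order Markov model from those counts alone. This naive scheme issues |Σ|^(k+1) queries; the paper's techniques for reducing query complexity are not reproduced here.

```python
from collections import Counter
from itertools import product


class CountOracle:
    """Hypothetical database that answers only substring count queries.

    count(s) returns the total number of occurrences of s (len(s) <= max_len)
    across all stored sequences; the raw sequences are never exposed.
    """

    def __init__(self, sequences, max_len):
        self.max_len = max_len
        self._counts = Counter()
        for seq in sequences:
            self.add(seq)

    def _substrings(self, seq):
        for n in range(1, self.max_len + 1):
            for i in range(len(seq) - n + 1):
                yield seq[i:i + n]

    def add(self, seq):
        # Inserting a sequence only increments substring counts.
        self._counts.update(self._substrings(seq))

    def delete(self, seq):
        # Deleting a sequence only decrements substring counts.
        self._counts.subtract(self._substrings(seq))

    def count(self, s):
        return self._counts[s]


def markov_model(oracle, alphabet, k, pseudo=1.0):
    """Estimate P(next symbol | previous k symbols) using only count queries.

    Returns the conditional tables and the number of queries issued,
    which for this naive scheme is len(alphabet) ** (k + 1).
    """
    probs, queries = {}, 0
    for ctx in map("".join, product(alphabet, repeat=k)):
        counts = {a: oracle.count(ctx + a) for a in alphabet}
        queries += len(alphabet)
        total = sum(counts.values()) + pseudo * len(alphabet)  # Laplace smoothing
        probs[ctx] = {a: (c + pseudo) / total for a, c in counts.items()}
    return probs, queries
```

Because an update only changes counts, the model can be refreshed after `add` or `delete` by calling `markov_model` again, without rescanning raw sequences.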
Citations: 6
A possible mutation that enables H1N1 influenza A virus to escape antibody recognition
Pub Date: 2010-12-01 DOI: 10.1109/BIBM.2010.5706541
C. Su, S. D. Handoko, C. Kwoh, C. Schönbach, X. Li
The 2009 H1N1 influenza A pandemic caused global concern, having killed more than 18,000 people worldwide so far. Studies that found cross-neutralizing antibodies between the 1918 and 2009 pandemic influenza viruses suggest a basis for pre-existing immunity against the 2009 H1N1 virus in the elderly population. This cross-reactivity arises from conserved antigenic epitopes shared by the two pandemic viruses. However, evolutionary mutations can enable the virus to elude the human immune system, possibly rendering these antibodies ineffective. In our study, we found that a possible mutation in a B-cell epitope (sequence PNHDSNKG) could allow the virus to escape recognition by 1918 antibodies. This finding may therefore help guide further vaccine design against the 2009 H1N1 influenza A virus.
Citations: 3
An automatic procedure to search highly repetitive sequences in genome as fluorescence in situ hybridization probes and its application on Brachypodium distachyon
Pub Date: 2010-12-01 DOI: 10.1109/BIBM.2010.5706629
Qiwei Li, Tong Liang, Xiaodan Fan, Chunhui Xu, Weichang Yu, S. Li
Fluorescence in situ hybridization (FISH) is a powerful technique that localizes specific DNA sequences on chromosomes, with uses in physical and genetic map assembly, genetic counselling, species identification, and more. Highly repetitive sequences are considered suitable FISH probes because they avoid many potential problems of using unique sequences as probes. Their distinct chromosomal distributions also make them ideal for labelling purposes such as karyotyping. In this paper, we present an automatic computational procedure for searching a whole genome for highly repetitive sequences to use as FISH probes, as well as an experimental protocol for using them in FISH analysis. We successfully applied the method to the newly released genome of Brachypodium distachyon (Brachypodium) and obtained satisfactory FISH results.
Citations: 1
Represented indicator measurement and corpus distillation on focus species detection
Pub Date: 2010-12-01 DOI: 10.1109/BIBM.2010.5706647
Chih-Hsuan Wei, Hung-Yu Kao
In extracting information from the biomedical literature, disambiguating the names of domain-specific entities such as proteins is one of the most important issues. The entity ambiguity of highest dimension is the species with which an entity is associated. Furthermore, species disambiguation is one of the bottlenecks in inter-species gene name normalization. For improving species disambiguation, focus species detection remains a substantial challenge. This study presents a method that addresses this issue and evaluates it on all articles from the BioCreAtIvE I & II gene normalization (GN) tasks. Our method is robust for all types of articles, particularly those without explicit species entity information. Since our method requires a training corpus to build the indicator vector, we developed an iterative corpus distillation method to extend the corpus. In our experiments, the proposed method achieved accuracies of 85.64% and 84.32% without species entity information.
Citations: 2
Global analysis of miRNA target genes in colon rectal cancer
Pub Date: 2010-12-01 DOI: 10.1109/BIBM.2010.5706588
M. Pradhan, L. Ledford, Yogesh Pandit, M. Palakal
In this paper, we present a global analysis of colorectal cancer (CRC) genes and their associated miRNAs. Significant genes in colon cancer were obtained by mining the literature, and cancer-related miRNAs were obtained from miRBase. Five different features were analyzed to obtain a global gene-miRNA profile. By combining topological features with miRNA-gene associations and gene propensity measures, we identified a set of genes and modules that are significant in CRC. The proposed methodology identified 123 significant miRNA-gene modules that can be studied further to understand the disease and discover markers.
Citations: 2
Discovering negative correlated gene sets from integrative gene expression data for cancer prognosis
Pub Date: 2010-12-01 DOI: 10.1109/BIBM.2010.5706615
Tao Zeng, Xuan Guo, Juan Liu
Along with the emergence and development of translational biomedicine, more and more genetic information is being applied in clinical practice. Over the past decade, the discovery of genetic biomarkers for cancer prognosis has attracted increasing attention, and many methods have been developed. "Element" methods use one or two independent genes to judge the Boolean status of a disease. "Set" methods use general genetic biomarkers to classify patients into different risk groups as a whole. More advanced "sets" methods use a group of different gene sets as biomarkers. However, existing methods generally consider only positive correlations among genes and ignore negative correlations, even though negative regulation, negative feedback, and functional repression are important clues in cancer expression profiles. In this paper, we therefore propose to mine negatively correlated gene sets (NCGSs) from multiple datasets and use them, along with purely positively correlated gene sets, for prognosis classification. Exploratory experiments show an encouraging improvement in cancer prognosis accuracy with NCGSs.
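As a simplified illustration of negative-correlation mining, the sketch below finds gene pairs whose expression profiles have a Pearson correlation at or below a threshold, keeping only pairs that are negative in every dataset. The -0.8 threshold, the pairwise (rather than set-level) search, and the intersection rule for combining datasets are all assumptions, not the authors' NCGS algorithm.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    # Pearson correlation; returns 0.0 when either profile is constant.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def negative_pairs(expr, threshold=-0.8):
    """expr: dict gene -> expression values over the same samples."""
    pairs = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if r <= threshold:
            pairs.append((g1, g2, r))
    return pairs


def negative_across_datasets(datasets, threshold=-0.8):
    # Hypothetical integration rule: keep pairs negative in every dataset.
    common = None
    for expr in datasets:
        found = {(g1, g2) for g1, g2, _ in negative_pairs(expr, threshold)}
        common = found if common is None else common & found
    return common or set()
```

Pairs that survive could then seed larger gene sets, which is where a full method would go beyond this sketch.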
Citations: 1
A relevance-novelty combined model for genomics search result diversification
Pub Date: 2010-12-01 DOI: 10.1109/BIBM.2010.5706654
Xiaoshi Yin, Zhoujun Li, Xiangji Huang, Xiaohua Hu
Traditional retrieval models assume that the relevance of a document is independent of the relevance of other documents. However, this assumption may result in high redundancy and low diversity in a ranked list. To provide comprehensive and diverse answers that fulfill biologists' information needs, we propose a relevance-novelty combined model, named RelNov, based on the framework of an undirected graphical model. Experiments on the TREC 2006 and 2007 Genomics collections show that the proposed approach is effective in improving both the diversity and the relevance of ranked retrieval lists.
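RelNov itself is built on an undirected graphical model that the abstract does not describe in enough detail to reproduce. The relevance-versus-novelty trade-off it targets can instead be illustrated with the well-known Maximal Marginal Relevance (MMR) greedy re-ranking, swapped in here as a simpler stand-in; the `lam` weight and the similarity function are assumptions.

```python
def mmr_rerank(relevance, similarity, lam=0.7, k=None):
    """Greedy Maximal Marginal Relevance (MMR) re-ranking (not RelNov).

    relevance: dict mapping doc id -> relevance score.
    similarity: function (doc_a, doc_b) -> similarity in [0, 1].
    lam: trade-off; 1.0 ranks by relevance only, lower values favour novelty.
    """
    remaining = sorted(relevance)  # sorted for deterministic tie-breaking
    k = len(remaining) if k is None else k
    ranked = []
    while remaining and len(ranked) < k:
        def marginal(d):
            # Penalize a document by its similarity to the closest already-ranked one.
            redundancy = max((similarity(d, s) for s in ranked), default=0.0)
            return lam * relevance[d] - (1 - lam) * redundancy
        best = max(remaining, key=marginal)
        ranked.append(best)
        remaining.remove(best)
    return ranked
```

With two near-duplicate documents at the top, a lower `lam` pushes the second duplicate below a less relevant but novel document.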
Citations: 2
Gene cluster profile vectors: A novel method to infer functional coupling using both gene proximity and co-occurrence profiles
Pub Date: 2010-12-01 DOI: 10.1109/BIBM.2010.5706530
V. Pejaver, Sun Kim
Proximity-based methods and co-evolution-based phylogenetic profile methods have been used successfully to identify functionally related genes. Proximity-based methods are effective for physically clustered genes, while the phylogenetic profile method is effective for co-occurring gene sets. However, both methods produce many false positives and false negatives. In this paper, we propose the Gene Cluster Profile Vector (GCPV) method, which combines the two approaches by using phylogenetic profiles of whole gene clusters. Moreover, the GCPV method is currently the only method that allows relationships between gene clusters themselves to be characterized. The GCPV method groups together reasonably related operons in E. coli about 60% of the time. The method depends only minimally on the reference genome set used, and it outperforms the conventional phylogenetic profile method. Finally, we show that the method works well for predicted gene clusters from C. crescentus and can serve as an important tool not only for understanding gene function but also for elucidating the mechanisms of general biological processes.
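A rough sketch of the profile-vector idea follows. A gene's phylogenetic profile is its presence/absence vector across reference genomes; here, as an assumption, a cluster's profile is the fraction of its member genes present in each genome, and clusters are compared by Euclidean distance. The abstract does not state how GCPV actually aggregates or compares profiles, so these names and choices are hypothetical.

```python
from math import sqrt


def gene_profile(homolog_genomes, genomes):
    # Binary phylogenetic profile: 1 if the gene has a homolog in that genome.
    return [1 if g in homolog_genomes else 0 for g in genomes]


def cluster_profile(cluster, homologs, genomes):
    """Assumed aggregation: fraction of the cluster's genes present in each genome."""
    profiles = [gene_profile(homologs[gene], genomes) for gene in cluster]
    return [sum(column) / len(cluster) for column in zip(*profiles)]


def profile_distance(p, q):
    # Euclidean distance; smaller means more similar co-occurrence patterns.
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
```

Because whole clusters get a single vector, two clusters (for example, two operons) can be compared directly, which is the kind of cluster-to-cluster relationship the abstract describes.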
Citations: 0
A gene ranking method using text-mining for the identification of disease related genes
Pub Date: 2010-12-01 DOI: 10.1109/BIBM.2010.5706616
Hyungmin Lee, Miyoung Shin, Munpyo Hong
For the identification of significant genes involved in specific diseases, microarray gene expression profiles have been widely used to prioritize candidate genes. In this paper, we propose a new gene ranking method that uses gene-gene relations extracted from the literature together with gene expression scores obtained from microarrays. The gene-gene relations are extracted with a hybrid approach that combines syntactic analysis with co-occurrence-based approaches. Specifically, we syntactically parse the text and then, within each clause of a parsed sentence, treat co-occurring gene names as mutually related. The gene network derived from these gene-gene relations and the gene expression scores are both given as inputs to the GeneRank algorithm. To evaluate our approach, we conducted experiments with publicly available prostate cancer data. The results show that our method achieves higher precision and recall than the original GeneRank, which uses gene-gene relations built from Gene Ontology annotations. Furthermore, our hybrid approach to gene-gene relation extraction places truly disease-related genes in the top ranks more effectively than the existing, popular co-occurrence approach.
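The pipeline can be sketched in two steps: link gene names that co-occur within one clause, then run a GeneRank-style iteration r_j = (1 - d) * ex_j + d * sum_i w_ij * r_i / deg_i, where w is the gene network, deg_i is a node's degree, and ex holds normalized absolute expression scores. The sketch assumes clause segmentation is already done by a syntactic parser and uses d = 0.5 by default; the function names are hypothetical and this is not the authors' implementation.

```python
from itertools import combinations


def clause_cooccurrence_edges(clauses, gene_names):
    """Link gene names that co-occur within the same clause.

    clauses: list of token lists; clause segmentation by a parser is assumed upstream.
    """
    edges = set()
    for clause in clauses:
        genes = sorted({tok for tok in clause if tok in gene_names})
        edges.update(combinations(genes, 2))
    return edges


def generank(genes, edges, expr, d=0.5, max_iter=200, tol=1e-12):
    """GeneRank-style iteration: r_j = (1 - d) * ex_j + d * sum_i w_ij * r_i / deg_i."""
    neighbors = {g: set() for g in genes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    total = sum(abs(expr[g]) for g in genes) or 1.0
    ex = {g: abs(expr[g]) / total for g in genes}  # normalized expression evidence
    r = dict(ex)
    for _ in range(max_iter):
        # Each neighbor i has degree >= 1 (it is linked to j), so no division by zero.
        new = {
            j: (1 - d) * ex[j]
            + d * sum(r[i] / len(neighbors[i]) for i in neighbors[j])
            for j in genes
        }
        converged = max(abs(new[g] - r[g]) for g in genes) < tol
        r = new
        if converged:
            break
    ranking = sorted(genes, key=lambda g: (-r[g], g))
    return ranking, r
```

With equal expression scores, a gene linked to several literature neighbors rises above isolated genes, which is how the text-mined network reshapes a purely expression-based ranking.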
Citations: 4
GPU-based triangulation of the van der Waals surface
Pub Date: 2010-12-01 DOI: 10.1109/BIBM.2010.5706650
Sérgio Dias, A. Gomes
The problem addressed in this paper consists of triangulating the van der Waals surface without computing the geometric intersections of its atoms. Recall that the van der Waals surface is useful in computational molecular biology and biochemistry, for example to determine the volume occupied by a molecule and other important geometric properties. If every atom is represented by a ball, this amounts to computing the surface of the union of a set of balls. The novelty of our method lies in avoiding the computation of surface-surface intersections (SSI) between two or more balls.
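The authors' GPU triangulation is not described in enough detail to reproduce. A simpler CPU technique that likewise avoids explicit surface-surface intersections is Shrake-Rupley-style point sampling with zero probe radius: place roughly uniform points on each atom's sphere and discard any point strictly inside another ball. The surviving points sample the boundary of the union, and their fraction per atom gives an area estimate. The function names and the sample count `n` are assumptions, and the result is a point sample, not a triangulation.

```python
from math import cos, pi, sin, sqrt


def sphere_points(n):
    # Golden-spiral (Fibonacci) lattice: n roughly evenly spaced unit vectors.
    golden = pi * (3 - sqrt(5))
    pts = []
    for i in range(n):
        z = 1 - 2 * (i + 0.5) / n
        rad = sqrt(1 - z * z)
        pts.append((rad * cos(golden * i), rad * sin(golden * i), z))
    return pts


def is_buried(p, owner, atoms):
    # True if point p lies strictly inside any ball other than the one it was sampled on.
    return any(
        j != owner and sum((a - b) ** 2 for a, b in zip(p, c)) < r * r
        for j, (c, r) in enumerate(atoms)
    )


def vdw_surface(atoms, n=200):
    """atoms: list of ((x, y, z), radius). Returns (boundary sample points, area estimate)."""
    unit = sphere_points(n)
    points, area = [], 0.0
    for idx, (c, r) in enumerate(atoms):
        exposed = 0
        for u in unit:
            p = tuple(ci + r * ui for ci, ui in zip(c, u))
            if not is_buried(p, idx, atoms):
                points.append(p)
                exposed += 1
        area += 4 * pi * r * r * exposed / n  # exposed fraction of this sphere's area
    return points, area
```

For two unit atoms one radius apart, each loses a cap of area π, so the exposed area is 6π; with the atoms placed along the z-axis, the lattice (uniform in z) recovers this value up to rounding.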
Citations: 3