
In Silico Biology: Latest Articles

Combined classifier for unknown genome classification using chaos game representation features
Q2 Medicine Pub Date : 2010-02-15 DOI: 10.1145/1722024.1722065
Vrinda V. Nair, A. Nair
Classification of unknown genomes finds wide application in areas such as evolutionary studies, biodiversity research and forensic studies, which have lately been viewed from a renewed 'genomic' perspective. Only a few attempts in the literature focus on unknown genome identification, and the reported accuracies do not exceed 85%. Most works report classification into the major kingdoms only, without venturing further into their sub-classes. A novel combined technique of Chaos Game Representation (CGR) and machine learning is proposed, the former for feature extraction and the latter for subsequent sequence classification. Eight sub-categories of eukaryotic mitochondrial genomes from NCBI are used for the study. The sequences are first mapped into their Chaos Game Representation. Genomic features are extracted by computing the Frequency Chaos Game Representation (FCGR) matrix. An order-3 FCGR matrix, which consists of 64 elements, is considered here and acts as the feature descriptor for classification. The classification methods used are based on Difference Boosting Naïve Bayesian (DBNB), Artificial Neural Network (ANN) and Support Vector Machine (SVM) approaches. Accuracies of the individual methods are reported. Although the average accuracy is highest for the SVM-CGR combination, other methods achieve better accuracies for some categories. Hence a voting classifier combining all three methods is implemented. Accuracies of 100% were obtained for Vertebrata and Porifera, whereas Acoelomata, Cnidaria and Fungi were classified with accuracies above 90%. The accuracies obtained for Protostomia, Plant and Pseudocoelomata were 90%, 82% and 77%, respectively.
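As a rough illustration of the feature extraction and voting steps described in this abstract, the following Python sketch builds an order-3 FCGR matrix (8 x 8 = 64 elements) and combines three class predictions by majority vote. It is not the authors' implementation: the corner layout (A, C, G, T), the normalisation to relative frequencies, the skipping of windows with ambiguous bases, and the tie-breaking rule are all assumptions, and the names `fcgr` and `majority_vote` are hypothetical.

```python
from collections import Counter

import numpy as np

# Assumed CGR corner layout as (x, y) bits; the abstract does not state one.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(seq, k=3):
    """Return a 2^k x 2^k matrix of relative k-mer frequencies (FCGR sketch)."""
    size = 2 ** k
    grid = np.zeros((size, size), dtype=float)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if any(ch not in CORNERS for ch in kmer):
            continue  # assumption: skip windows containing ambiguous bases
        x = y = 0
        # In a CGR the most recent nucleotide selects the coarsest quadrant,
        # so it contributes the most significant bit of the cell index.
        for ch in reversed(kmer):
            cx, cy = CORNERS[ch]
            x = (x << 1) | cx
            y = (y << 1) | cy
        grid[y, x] += 1
    total = grid.sum()
    return grid / total if total else grid


def majority_vote(predictions):
    """Return the most frequent label; ties go to the earliest classifier (assumption)."""
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


features = fcgr("ACGTACGTTTGACA").ravel()  # 64-element feature vector
label = majority_vote(["Fungi", "Plant", "Fungi"])
```

The flattened matrix can be passed to any classifier as a 64-dimensional feature vector. Ordering the prediction list so that the classifier trusted most comes first makes the tie-break favour it.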
Citations: 9
Comparative analysis of microsatellite detecting software: a significant variation in results and influence of parameters
Q2 Medicine Pub Date : 2010-02-15 DOI: 10.1145/1722024.1722068
Suresh B. Mudunuri, A. A. Rao, S. Pallamsetty, H. Nagarajaram
Microsatellites are a unique type of repeat pattern found in the genome sequences of all known organisms, including bacteria and viruses. These repeats play an important role in genome evolution, are associated with various diseases, and have been used as molecular markers in DNA fingerprinting, population genetics, etc. Various bioinformatics tools have been developed for extracting microsatellites from DNA sequences. However, not all tools identify microsatellites with similar sensitivity, and hence studies on microsatellites can suffer from significant biases in results and interpretations depending on the tool used. To get a clear idea of the inherent limitations and biases in microsatellite extraction, especially under varying threshold values of program parameters, we carried out a comparative analysis of the performance of some widely used tools on test DNA sequences. We extracted imperfect microsatellites from three different sequences (E. coli bacterial genome, C. elegans Chromosome I and Drosophila Chromosome X) using the commonly used microsatellite extraction tools TRF, Sputnik, SciRoKoCo and IMEx with varying parameters, and analyzed the results. We observed significant variation in the number of microsatellites extracted by these tools even when used with default or suggested parameters. Relaxing parameter values leads to an increase in the number of repeats detected, but the differences among the results persist. In TRF, Sputnik and SciRoKoCo, the number of mismatches increases with the tract length of the repeat, indicating that the level of imperfection is not uniform across repeats. The four tools investigated in this study differ in their algorithms and in the parameters they use, and hence in the number of microsatellites detected. The score-based programs identify more divergent penta- and hexanucleotide repeats than IMEx. We therefore suggest that it is prudent either to alter parameters appropriately so as to detect as many microsatellites as possible and not miss any genuine repeat tracts, or to use more than one tool to obtain a good consensus. We also made a detailed survey of the available features of all microsatellite extraction tools. Apart from differences in algorithm, efficiency and parameters, the tools also differ largely in their features and flexibility.
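To make the role of threshold parameters concrete, here is a minimal Python sketch that finds perfect microsatellites with a regular expression. It does not reproduce the algorithm of TRF, Sputnik, SciRoKoCo or IMEx, and it ignores the imperfect repeats that the study focuses on. The function name and the defaults (unit length 1 to 6, at least 3 copies) are assumptions chosen for illustration only.

```python
import re


def find_perfect_microsatellites(seq, min_unit=1, max_unit=6, min_repeats=3):
    """Return sorted (start, end, unit, copies) tuples for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for unit_len in range(min_unit, max_unit + 1):
        # One captured unit followed by at least (min_repeats - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            # Skip units that are repeats of a shorter motif (e.g. "AA" is "A" x 2),
            # since those tracts are reported at the shorter unit length.
            if any(
                unit == unit[:d] * (unit_len // d)
                for d in range(1, unit_len)
                if unit_len % d == 0
            ):
                continue
            copies = (match.end() - match.start()) // unit_len
            hits.append((match.start(), match.end(), unit, copies))
    return sorted(hits)


strict = find_perfect_microsatellites("GGCACACACATT", min_repeats=5)
relaxed = find_perfect_microsatellites("GGCACACACATT", min_repeats=3)
```

With `min_repeats=5` the CA tract (four copies) is missed, while the relaxed threshold reports it. Even this toy detector shows how the repeat count depends on parameter choices.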
Citations: 8
Predicting protein-protein interactions using first principle methods and statistical scoring
Q2 Medicine Pub Date : 2010-02-15 DOI: 10.1145/1722024.1722038
M. Pradhan, P. Gandra, M. Palakal
Proteins are a combination of different PDB structures. To understand protein interactions, we propose a methodology that integrates first-principle parameters for protein interaction with the number of PDB structures defining these proteins. Annotating possibly interacting protein pairs with their Pfam and GO domains increases the strength of each interaction and can identify the important link between the two proteins. We propose a novel technique that predicts protein interactions by integrating a protein's physico-chemical properties and its number of PDB structures, using a sliding window algorithm to compute the optimal interaction score. The proposed method achieved ~94% true predictions on a dataset of known interacting proteins and 100% on a non-interacting dataset. The resulting prediction model was applied to an unknown protein dataset, where we identified novel interacting protein pairs with high relevance.
Citations: 1
Identifying the nature of the interface in protein-protein complexes
Q2 Medicine Pub Date : 2010-02-15 DOI: 10.1145/1722024.1722040
Pralay Mitra
Molecular recognition is critical to the proper self-assembly of biological macromolecules and to their function. Shape complementarity of the mutual recognition interfaces is one of the important factors guiding this interaction. The lock-and-key mechanism of enzyme-substrate binding is a classical hallmark of shape complementarity at work in biochemical reactions. Recognition principles between macromolecular entities, however, have been difficult to formulate. Sensitive surface complementarity recognition algorithms are computationally prohibitive, while the accuracy of heuristic methods is limited by the choice of proper biochemical information. This is a major drawback in understanding macromolecular recognition, which entails critical assessment of biochemical information involving large interacting interfaces. Here we mine data on a number of biochemical parameters to highlight their individual merits and demerits, and propose specific properties suitable for designing heuristic algorithms. The work is expected to find utility in bioinformatics algorithms for docking macromolecules and designing protein complex interfaces.
Citations: 2
DNA barcoding and microsatellite marker development for Nyctibatrachus major: the threatened amphibian species
Q2 Medicine Pub Date : 2010-02-15 DOI: 10.1145/1722024.1722029
K. Meenakshi, R. Remya, G. Sanil
Identifying species using molecular and bioinformatics tools has been at the center of ongoing discussions in the field of conservation genetics. The resolution of taxonomic uncertainties is a necessary step in distinguishing entities for conservation purposes. In an effort to help resolve this taxonomic uncertainty and to assess the genetic population structure of the taxon Nyctibatrachus major, we barcoded the COI gene for species identification and developed species-specific primers for microsatellite markers to assess population dynamics among Nyctibatrachus major populations. The current work is part of ongoing programs on the conservation genetics of the endemic fauna of the Western Ghats.
Citations: 5
Random walk ranking guided by disease association networks for lung cancer biomarker discovery
Q2 Medicine Pub Date : 2010-02-15 DOI: 10.1145/1722024.1722062
T. Huan, Xiaogang Wu, Zengliang Bai, J. Chen
The identification of candidate molecular entities involved in a specific disease has been a primary focus of cancer biomarker discovery. Prioritizing proteins from a disease-specific protein-protein interaction (PPI) network has become an efficient computational strategy for cancer biomarker discovery. Although some successful methods, such as the random walk ranking (RWR) algorithm, can exploit global network topology to prioritize proteins, this network-based computational strategy still needs more comprehensive prior knowledge, such as genome-wide association studies (GWAS), to improve its discovery capability. In this paper, we first analyzed genome-wide association loci for human diseases and built disease association networks (DANs), in which two diseases are associated if they share common genetic variants. We then assigned each node in a human PPI network a disease-specific weight based on knowledge from the DANs and text mining. Finally, we presented a seed-weighted random walk ranking (SW-RWR) method to prioritize biomarkers in the global human PPI network. We used a lung cancer case study to show that our ranking strategy has better accuracy and sensitivity in discovering potential clinically useful biomarkers than a similar network-based ranking method. This result suggests that close association among different diseases could play an important role in biomarker discovery.
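The abstract does not give the SW-RWR update rule, so the following Python sketch assumes the common random-walk-with-restart formulation, p(t+1) = (1 - r) W p(t) + r p(0), where W is the column-normalised adjacency matrix and p(0) holds the normalised seed weights. The function name, the restart probability of 0.3 and the toy network are hypothetical and serve only as an illustration.

```python
import numpy as np


def seed_weighted_rwr(adj, seed_weights, restart=0.3, tol=1e-10, max_iter=1000):
    """Rank nodes by random walk with restart from weighted seed nodes (sketch)."""
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # leave isolated nodes as all-zero columns
    transition = adj / col_sums  # column-stochastic transition matrix W

    p0 = np.asarray(seed_weights, dtype=float)
    p0 = p0 / p0.sum()  # disease-specific weights act as the restart distribution
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (transition @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


# Toy three-protein path network 0 - 1 - 2 with protein 0 as the only seed.
adjacency = [[0, 1, 0],
             [1, 0, 1],
             [0, 1, 0]]
scores = seed_weighted_rwr(adjacency, seed_weights=[1.0, 0.0, 0.0])
ranking = np.argsort(-scores)  # protein indices, highest score first
```

Proteins closer to heavily weighted seeds receive higher steady-state scores. Unequal seed weights shift the ranking toward the neighbourhoods of the more strongly weighted seeds.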
Citations: 0
A Gibbs sampling algorithm for motif discovery using a linear mixed model
Q2 Medicine Pub Date : 2010-02-15 DOI: 10.1145/1722024.1722053
Daming Lu
The identification of motifs in gene promoters is a critical step in delineating the genetic regulatory framework of an organism. In this paper, a new linear mixed model is introduced. The model combines the conventional Position Weight Matrix (PWM) model with a novel Mutual Information (MI) model. The PWM captures individual position frequencies, whereas MI reflects pairwise relations between positions. A training stage is carried out to determine the weight of each model. The trained model is then embedded into a Gibbs sampling algorithm for motif discovery. After a set of DNA sequences is analyzed with this program, the putative motifs obtained are compared with experimentally verified motifs and with the output of other popular motif-finding software. Results show that the new mixed model can improve motif discovery accuracy to some extent.
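The two ingredients named in this abstract can be sketched in Python as follows: per-position base frequencies (the basis of a PWM) and the mutual information between two motif positions, both estimated from a set of aligned sites. This is only an illustration under stated assumptions. The pseudocount of 1, the function names and the example sites are hypothetical, and the way the paper weights and mixes the two models inside the Gibbs sampler is not reproduced here.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def position_frequencies(sites, pseudocount=1.0):
    """Per-position base frequencies with a pseudocount (assumed value)."""
    total = len(sites) + pseudocount * len(ALPHABET)
    freqs = []
    for j in range(len(sites[0])):
        counts = Counter(site[j] for site in sites)
        freqs.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return freqs


def mutual_information(sites, i, j):
    """Mutual information in bits between motif positions i and j."""
    n = len(sites)
    p_i = Counter(site[i] for site in sites)
    p_j = Counter(site[j] for site in sites)
    p_ij = Counter((site[i], site[j]) for site in sites)
    mi = 0.0
    for (a, b), count in p_ij.items():
        joint = count / n
        mi += joint * math.log2(joint / ((p_i[a] / n) * (p_j[b] / n)))
    return mi


sites = ["ACGT", "ACGT", "TGGT", "TGGT"]  # toy aligned binding sites
freqs = position_frequencies(sites)
mi_01 = mutual_information(sites, 0, 1)  # positions 0 and 1 always co-vary
mi_02 = mutual_information(sites, 0, 2)  # position 2 is constant
```

In the toy sites, positions 0 and 1 change together, which a PWM alone cannot express because it treats positions independently. The pairwise term is what captures that dependency.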
Citations: 1
Algebraic approach to optimal clone selection applied in metagenomic projects
Q2 Medicine Pub Date : 2010-02-15 DOI: 10.1145/1722024.1722066
M. Cantão, L. V. de Araújo, E. G. Lemos, J. E. Ferreira
Due to the wide diversity of unknown organisms in the environment, 99% of them cannot be grown in traditional culture media in the laboratory. Metagenomics projects have therefore been proposed to study the microbial communities present in the environment using molecular techniques, especially sequencing. An accumulation of sequences produced by these projects is thus expected in the coming years. The sequences produced by genomics and metagenomics projects present several challenges for processing, storage and analysis, such as the search for clones containing genes of interest. This work presents OCI Metagenomics, which allows the rules of clone selection in metagenomic libraries to be defined and managed dynamically through an algebraic approach based on process algebra. Furthermore, a web interface was developed to allow researchers to easily create and execute their own rules for selecting clones in a genomic sequence database. The software was tested on a metagenomic cosmid library and was able to select clones containing genes of interest.
Citations: 4
Gene regulatory network from microarray data using dynamic neural fuzzy approach
Q2 Medicine Pub Date : 2010-02-15 DOI: 10.1145/1722024.1722044
S. Vineetha, C. Chandra Shekara Bhat, S. M. Idicula
The paper presents a multilayered dynamic neural fuzzy network (DNFN) that extracts regulatory relationships among genes and reconstructs the gene regulatory network from circulating plasma RNA data from colon cancer patients. The method combines the merits of connectionist and fuzzy approaches: it encodes the knowledge it learns as fuzzy rules and processes data following fuzzy reasoning principles. The dynamic aspect of gene regulation is taken into account through on-line learning of the fuzzy rules, while structural learning together with parameter learning forms a fast learning algorithm for building a small yet powerful dynamic neural fuzzy network. One of the main advantages of DNFN is that the hidden nodes need not be predetermined, since the network finds its optimal structure automatically and quickly. The knowledge inferred using this network may provide biological insights that can be used to design and interpret further experiments.
Citations: 4
Improving motif refinement using hybrid expectation maximization and random projection
Q2 Medicine Pub Date : 2010-02-15 DOI: 10.1145/1722024.1722048
H. S. Shashidhara, Prince Joseph, K. Srinivasa
The main goal of the motif finding problem is to detect novel, over-represented unknown signals in a set of sequences. Popular algorithms such as Expectation Maximization (EM) and Gibbs sampling are sensitive to their initial guesses and are known to converge very quickly to the nearest local maximum. A novel optimization framework systematically searches the neighborhood regions of the initial alignments to explore multiple local optimal solutions. This effective search is achieved by transforming the original optimization problem into its corresponding dynamical system and estimating the practical stability boundary of the local maximum. The work aims to implement the hybrid algorithm and to enhance it by trying different global methods and other techniques. Aggregation methods are then tried in place of projection methods.
Citations: 2
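The abstract above does not spell out how starting points are chosen, so the following is only a minimal sketch of the random projection idea named in the title, as it is commonly described: hash every l-mer on a subset of its positions and treat heavily populated buckets as candidate seeds for EM refinement. The function names, the toy sequences, the fixed `positions` tuple, and the threshold are all illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict


def project_lmers(sequences, l, positions):
    """Group every l-mer by the letters it carries at the chosen positions.

    `positions` would normally be drawn at random; it is passed in
    explicitly here (an assumption) so the example is reproducible.
    """
    buckets = defaultdict(list)
    for seq_idx, seq in enumerate(sequences):
        for start in range(len(seq) - l + 1):
            lmer = seq[start:start + l]
            # The bucket key is the l-mer projected onto the kept positions.
            key = "".join(lmer[p] for p in positions)
            buckets[key].append((seq_idx, start))
    return buckets


def enriched_buckets(buckets, threshold):
    """Keep buckets with at least `threshold` l-mers as candidate EM seeds."""
    return {key: hits for key, hits in buckets.items() if len(hits) >= threshold}


if __name__ == "__main__":
    toy_sequences = ["ACGTACGT", "TTACGTAA"]  # hypothetical input
    buckets = project_lmers(toy_sequences, l=4, positions=(0, 2))
    print(enriched_buckets(buckets, threshold=3))
```

Each surviving bucket lists (sequence index, start offset) pairs; those l-mers could be aligned to form an initial motif model, which a local method such as EM would then refine. Repeating the projection with different position subsets gives several distinct starting points, which is one way to reduce the dependence on a single initial guess.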