International Journal of Data Mining and Bioinformatics: Latest Articles

A method for extracting task-oriented information from biological text sources.

IF 0.3 Zone 4 (Biology) Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY

International Journal of Data Mining and Bioinformatics

Pub Date : 2015-01-01 DOI: 10.1504/ijdmb.2015.070072

Dhanasekaran Kuttiyapillai, R Rajeswari

A method for information extraction that processes unstructured data from a document collection is introduced. A dynamic programming technique, adopted to find relevant genes from the longest and most accurate sequences, is used to find matching sequences and to identify the effects of various factors. The proposed method can handle complex information sequences that carry different meanings in different situations, while eliminating irrelevant information. The text contents were pre-processed using a general-purpose method and then passed through an entity tagging component. Bottom-up scanning of key-value pairs improves content finding, generating sequences relevant to the testing task. This paper highlights a context-based method for extracting food safety information from articles, guideline documents and laboratory results. The graphical disease model verifies weak components using a development data set. This improves the accuracy of information retrieval in biological text analysis and reporting applications.
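
The abstract does not say which dynamic programming formulation the authors use. As a minimal sketch, assuming the matching step resembles a longest common subsequence (LCS) search over word tokens, the idea could look like the following. The function name and the two example sentences are hypothetical and are not taken from the paper.

```python
def longest_common_subsequence(a, b):
    """Return the longest common subsequence of two token lists."""
    rows, cols = len(a) + 1, len(b) + 1
    # table[i][j] holds the LCS length of a[:i] and b[:j].
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the bottom-right corner to recover the matched tokens.
    result = []
    i, j = len(a), len(b)
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            result.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return result[::-1]


# Hypothetical query pattern and document sentence (illustration only).
pattern = "salmonella detected in poultry samples".split()
sentence = "the lab report says salmonella was detected in raw poultry".split()
match = longest_common_subsequence(pattern, sentence)
print(match)
```

The longer the recovered subsequence relative to the pattern, the stronger the match; a real extraction pipeline would still need the tagging and context handling the abstract describes.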

Volume 12, issue 4, pages 387-99.

Citations: 5

Granular support vector machine to identify unknown structural classes of protein.

Pub Date : 2015-01-01 DOI: 10.1504/ijdmb.2015.070065

Rohayanti Hassan, Razib M Othman, Zuraini A Shah

To date, classifying structural class from local protein structure rather than the whole structure has been gaining widespread attention. The structural class is determined by the local composition or arrangement of secondary structure, yet threshold-based classification methods apply restrictive rules when determining these classes. As a consequence, the structural class of some structures remains unknown. To determine these unknown structural classes, we propose a fusion algorithm, abbreviated as GSVM-SigLpsSCPred (Granular Support Vector Machine--with Significant Local protein structure for Structural Class Prediction), which consists of two major components: an optimal local protein structure that represents the feature vector, and a granular support vector machine that predicts the unknown structural classes. The results highlight the performance of GSVM-SigLpsSCPred as an alternative computational method for low-identity sequences.

Citations: 0

A system biology approach for understanding the miRNA regulatory network in colon rectal cancer.

Pub Date : 2015-01-01 DOI: 10.1504/ijdmb.2015.066332

Meeta Pradhan, Kshithija Nagulapalli, Lakenvia Ledford, Yogesh Pandit, Mathew Palakal

In this paper we present a systems biology approach to understanding the miRNA regulatory network in colon rectal cancer. An initial set of significant genes in Colon Rectal Cancer (CRC) was obtained by mining relevant literature. An initial set of cancer-related miRNAs was obtained from three databases (miRBase, miRWalk and TargetScan) and a GEO microarray experiment. First-principles methods were then used to generate the global miRNA-gene network. Significant miRNAs and associated transcription factors in the global miRNA-gene network were identified using topological and sub-graph analyses. Eleven novel miRNAs were identified, and three of them, hsa-miR-630, hsa-miR-100 and hsa-miR-99a, were further analysed to elucidate their role in CRC. The proposed methodology made effective use of literature data and was able to show novel, significant miRNA-transcription associations in CRC.
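
The abstract names topological analysis without specifying the measures used. As a rough sketch, assuming degree centrality is one such measure, hub miRNAs in a miRNA-gene network could be ranked as follows. The node names and edges are placeholders invented for illustration; they are not interactions reported in the paper.

```python
from collections import defaultdict

# Hypothetical miRNA -> target edges (placeholders, not real interactions).
edges = [
    ("miR-A", "G1"), ("miR-A", "G2"), ("miR-A", "TF1"),
    ("miR-B", "G2"), ("miR-B", "TF1"),
    ("miR-C", "G3"),
]


def build_network(edge_list):
    """Build an undirected adjacency map from (miRNA, target) pairs."""
    adjacency = defaultdict(set)
    for mirna, target in edge_list:
        adjacency[mirna].add(target)
        adjacency[target].add(mirna)
    return adjacency


def degree_centrality(adjacency):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adjacency)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adjacency.items()}


network = build_network(edges)
centrality = degree_centrality(network)

# Rank miRNAs from most to least connected; ties are broken by name.
mirna_rank = sorted(
    (node for node in centrality if node.startswith("miR-")),
    key=lambda node: (-centrality[node], node),
)
print(mirna_rank)
```

Degree is only the simplest topological score; the sub-graph analysis mentioned in the abstract would look at motifs or modules rather than single nodes.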

Citations: 4

Discovery of phenotypic networks from genotypic association studies with application to obesity.

Pub Date : 2015-01-01 DOI: 10.1504/ijdmb.2015.069414

Christine W Duarte, Yann C Klimentidis, Jacqueline J Harris, Michelle Cardel, José R Fernández

Genome-wide Association Studies (GWAS) have discovered many risk variants for several obesity-related traits. However, before these discoveries can become clinically relevant, the molecular or physiological mechanisms of these risk variants need to be discovered. One strategy is to mine phenotypically rich data sources, such as those in dbGaP (database of Genotypes and Phenotypes), for hypothesis generation. Here we propose a technique that combines the power of existing Bayesian Network (BN) learning algorithms with the statistical rigour of Structural Equation Modelling (SEM) to produce an overall phenotypic network discovery system with optimal properties. We illustrate our method by analysing a candidate SNP data set from the AMERICO sample, a multi-ethnic cross-sectional cohort of roughly 300 children with detailed obesity-related phenotypes. We demonstrate our approach by showing genetic mechanisms for three obesity-related SNPs.

Volume 12, issue 2, pages 129-43.

Citations: 3

Sequence-based protein superfamily classification using computational intelligence techniques: a review.

Pub Date : 2015-01-01 DOI: 10.1504/ijdmb.2015.067957

Swati Vipsita, Santanu Kumar Rath

Protein superfamily classification deals with the problem of predicting the family membership of a newly discovered amino acid sequence. Although previous researchers have already developed many trivial alignment methods, the present trend demands the application of computational intelligence techniques. As biological databases grow exponentially in size, retrieving and inferring essential knowledge in the biological domain becomes a very cumbersome task. This problem can be handled with intelligent techniques because of their tolerance for imprecision, uncertainty, approximate reasoning and partial truth. This paper discusses the various global and local features extracted from full-length protein sequences that are used for the approximation and generalisation of the classifier. The various parameters used for evaluating classifier performance are also discussed. This review can therefore point current researchers in the right direction for improving on existing methods.

Volume 11, issue 4, pages 424-57.

Citations: 3

Discovering essential proteins based on PPI network and protein complex.

Pub Date : 2015-01-01 DOI: 10.1504/ijdmb.2015.068951

Jun Ren, Jianxin Wang, Min Li, Fangxiang Wu

Most computational methods for identifying essential proteins focus on the topological centrality of protein-protein interaction (PPI) networks. However, these methods have limitations, such as the difficulty of identifying essential proteins with low centrality values and poor performance on incomplete PPI networks. In this paper, protein complexes are shown to be an important factor in determining protein essentiality, and a new centrality measure, complex centrality, is proposed. The weighted average of complex centrality and subgraph centrality, called harmonic centrality (HC), is proposed to predict essential proteins. It combines PPI network topology with protein complex information and performs better than methods based on the PPI network alone. The improvement is greater when the PPI network is incomplete. Furthermore, a weighted PPI network is generated by integrating cellular localisation and biological process information into a PPI network. The performance of the HC measure improves by 5% on this weighted PPI network.
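
The abstract defines HC only as a weighted average of complex centrality and subgraph centrality. The sketch below works under several assumptions that are not from the paper: complex centrality is taken as the number of complexes containing a protein, subgraph centrality is approximated by a truncated series over closed walks, both scores are scaled by their maximum, and the weight `alpha` is a free parameter. The toy network and complexes are hypothetical.

```python
import math


def subgraph_centrality(nodes, edges, max_len=10):
    """Approximate sum over k of (A^k)[i][i] / k! for k = 0..max_len."""
    index = {node: i for i, node in enumerate(nodes)}
    size = len(nodes)
    adj = [[0] * size for _ in range(size)]
    for u, v in edges:
        adj[index[u]][index[v]] = 1
        adj[index[v]][index[u]] = 1
    # power starts as the identity matrix (A^0).
    power = [[1 if i == j else 0 for j in range(size)] for i in range(size)]
    scores = [1.0] * size  # contribution of the k = 0 term
    for k in range(1, max_len + 1):
        power = [
            [sum(power[i][m] * adj[m][j] for m in range(size)) for j in range(size)]
            for i in range(size)
        ]
        for i in range(size):
            scores[i] += power[i][i] / math.factorial(k)
    return dict(zip(nodes, scores))


def complex_centrality(nodes, complexes):
    """Assumed definition: number of complexes that contain each protein."""
    return {node: sum(node in members for members in complexes) for node in nodes}


def normalise(scores):
    """Scale scores to [0, 1] by dividing by the maximum value."""
    top = max(scores.values())
    return {k: (v / top if top else 0.0) for k, v in scores.items()}


def combined_centrality(nodes, edges, complexes, alpha=0.5):
    """Weighted average of the two normalised scores (alpha is an assumption)."""
    cc = normalise(complex_centrality(nodes, complexes))
    sc = normalise(subgraph_centrality(nodes, edges))
    return {n: alpha * cc[n] + (1 - alpha) * sc[n] for n in nodes}


# Hypothetical toy PPI network (a path) and two hypothetical complexes.
nodes = ["P1", "P2", "P3", "P4"]
edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P4")]
complexes = [{"P1", "P2"}, {"P1", "P4"}]

sc = subgraph_centrality(nodes, edges)
hc = combined_centrality(nodes, edges, complexes)
print(sorted(hc, key=hc.get, reverse=True))
```

In this toy case P1 has low topological centrality but belongs to two complexes, so the combined score lifts it above P3, which illustrates the kind of low-centrality protein the abstract says topology-only methods miss.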

Citations: 25

Metabolites production improvement by identifying minimal genomes and essential genes using flux balance analysis.

Pub Date : 2015-01-01 DOI: 10.1504/ijdmb.2015.068955

Abdul Hakim Mohamed Salleh, Mohd Saberi Mohamad, Safaai Deris, Rosli Md Illias

With advances in metabolic engineering technologies, the genome of host organisms can be reconstructed to achieve desired phenotypes. However, because of the complexity and size of genome-scale metabolic networks, significant components tend to be invisible. We propose a two-step approach to improve metabolite production. First, we find the essential genes and identify the minimal genome through a single-gene deletion process using Flux Balance Analysis (FBA); second, we identify the significant pathway for metabolite production using gene expression data. A genome-scale model of Saccharomyces cerevisiae for the production of vanillin and acetate is used to test this approach. The results show that the approach reliably finds essential genes, reduces genome size and identifies production pathways that can further optimise the production yield. The identified genes and pathways can be extended to other applications, especially strain optimisation.

Volume 12, issue 1, pages 85-99.

Citations: 0

To screen the effective software for analysing gene interactions from Kashin-Beck disease genome profiling pathway and network, according to the tool of GeneMANIA.

Pub Date : 2015-01-01 DOI: 10.1504/ijdmb.2015.068963

Sen Wang, Weizhuo Wang, Junjie Zhao, Feng Zhang, Shulan He, Xiong Guo

To identify the more effective software for pathway and network analysis of Kashin-Beck disease, gene microarrays, TranscriptomeBrowser, MetaCore and GeneMANIA were used for analysis. TranscriptomeBrowser screened three significant chondrocytic pathways and one network; MetaCore identified one significant pathway and one network. BAX, APAF1, CASP6, BCL2, VEGF, SOCS3, BAK, TGFBI, TNFAIP6, TNFRSF11B and THBS1 were significant genes, associated with the biological function of chondrocytes or cartilage, that appeared in the TranscriptomeBrowser or MetaCore results. The interactions between the significant genes and their adjacent genes were searched and classified in GeneMANIA. In the pathway analysis results, TranscriptomeBrowser is superior to MetaCore in capturing pathway and co-expression interactions, whereas MetaCore is superior to TranscriptomeBrowser in capturing physical interactions. In the network analysis results, TranscriptomeBrowser contains more co-localisation interaction information, whereas MetaCore contains more co-expression interaction information.

Volume 12, issue 1, pages 100-14.

Citations: 4

Modelling and structural characteristics analysis of gene networks for prostate cancer.

Pub Date : 2015-01-01 DOI: 10.1504/ijdmb.2015.068950

Yulin Zhang, Shudong Wang, Dazhi Meng

Analysing the structure of gene networks is an important way to understand the regulatory mechanisms of organisms at the molecular level. In this work, gene mutual information networks are constructed from gene expression profiles in prostate tissues with and without cancer. To contrast the structural differences between normal and diseased networks, curves of four structural parameters are given as the threshold changes. Threshold discrimination intervals and discrimination weights are then defined. A method of finding structural key genes with a significant degree difference is proposed. Finding these key genes will help biomedical scientists further research the pathogenesis of prostate cancer. Finally, a randomisation test is performed, and comparison with the results on real data shows that these structural parameters can distinguish the structures of normal and prostate cancer networks.
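
The abstract does not state how mutual information is estimated or which thresholds are used. As a minimal sketch, assuming expression values have already been discretised into low (0) and high (1) levels and that a single threshold is applied, the network construction and degree-difference step could look like this. The gene names, profiles and threshold are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(x, y):
    """Mutual information (in bits) between two discretised profiles."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def network_degrees(profiles, threshold):
    """Connect gene pairs whose MI reaches the threshold; return node degrees."""
    degree = {gene: 0 for gene in profiles}
    for g1, g2 in combinations(profiles, 2):
        if mutual_information(profiles[g1], profiles[g2]) >= threshold:
            degree[g1] += 1
            degree[g2] += 1
    return degree


# Hypothetical discretised expression profiles (0 = low, 1 = high).
normal = {"g1": [0, 0, 1, 1], "g2": [0, 0, 1, 1], "g3": [0, 1, 0, 1]}
tumour = {"g1": [0, 0, 1, 1], "g2": [0, 1, 0, 1], "g3": [0, 1, 0, 1]}

threshold = 0.5  # assumed value; the paper sweeps a range of thresholds
normal_degree = network_degrees(normal, threshold)
tumour_degree = network_degrees(tumour, threshold)

# Genes with a large absolute degree difference are candidate key genes.
degree_diff = {g: tumour_degree[g] - normal_degree[g] for g in normal}
print(degree_diff)
```

Repeating this over a range of thresholds would give per-threshold curves of a structural parameter, which is closer to the comparison the abstract describes.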

Citations: 0

Regularised extreme learning machine with misclassification cost and rejection cost for gene expression data classification.

Pub Date : 2015-01-01 DOI: 10.1504/ijdmb.2015.069657

Huijuan Lu, Shasha Wei, Zili Zhou, Yanzi Miao, Yi Lu

The main purpose of traditional classification algorithms in bioinformatics applications is to achieve better classification accuracy. However, these algorithms cannot meet the requirement of minimising the average misclassification cost. In this paper, a new cost-sensitive regularised extreme learning machine (CS-RELM) algorithm is proposed, which uses probability estimation and misclassification cost to reconstruct the classification results. By improving the classification accuracy on a small group of samples that carries a higher misclassification cost, the new CS-RELM can minimise the classification cost. A 'rejection cost' was integrated into the CS-RELM algorithm to further reduce the average misclassification cost. Using the Colon Tumour dataset and the SRBCT (Small Round Blue Cells Tumour) dataset, CS-RELM was compared with other algorithms such as extreme learning machine (ELM), cost-sensitive extreme learning machine, regularised extreme learning machine and cost-sensitive support vector machine (SVM). The experimental results show that CS-RELM with embedded rejection cost reduces the average cost of misclassification and makes more credible classification decisions than the other algorithms.
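
The abstract does not give the CS-RELM decision rule itself. The sketch below shows only the general minimum-expected-cost rule with a reject option, which is one common way to combine probability estimates, a misclassification cost matrix and a rejection cost. The class labels, probabilities and cost values are hypothetical, and the probabilities are assumed to come from some upstream classifier.

```python
def decide(probabilities, cost_matrix, rejection_cost):
    """Pick the label with the lowest expected cost, or reject the sample.

    probabilities: {label: estimated probability that it is the true label}
    cost_matrix:   cost_matrix[true_label][predicted_label]
    """
    expected = {
        predicted: sum(
            probabilities[true] * cost_matrix[true][predicted]
            for true in probabilities
        )
        for predicted in probabilities
    }
    best = min(expected, key=expected.get)
    # Rejecting is preferred when even the best label is costlier than rejecting.
    if expected[best] > rejection_cost:
        return "reject", rejection_cost
    return best, expected[best]


# Hypothetical costs: missing a tumour is ten times worse than a false alarm.
costs = {
    "tumour": {"tumour": 0.0, "normal": 10.0},
    "normal": {"tumour": 1.0, "normal": 0.0},
}
probs = {"tumour": 0.3, "normal": 0.7}

accepted = decide(probs, costs, rejection_cost=2.0)
rejected = decide(probs, costs, rejection_cost=0.5)
print(accepted, rejected)
```

Note that the sample is labelled as a tumour even though a normal label is more probable, because the asymmetric costs dominate; with a cheap enough rejection cost the sample is deferred instead.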

Volume 12, issue 3, pages 294-312.

Citations: 6
