
International Journal of Data Mining and Bioinformatics: Latest Articles

An efficient algorithm for updating regular expression indexes in RDF databases.
IF 0.3 Zone 4 (Biology) Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2015-01-01 DOI: 10.1504/ijdmb.2015.066767
Jinsoo Lee, Romans Kasperovics, Wook-Shin Han, Jeong-Hoon Lee, Min Soo Kim, Hune Cho

The Resource Description Framework (RDF) is widely used for sharing biomedical data, such as gene ontology or the online protein database UniProt. SPARQL is a native query language for RDF, featuring regular expressions in queries for which exact values are either irrelevant or unknown. The use of regular expression indexes in SPARQL query processing improves the performance of queries containing regular expressions by up to two orders of magnitude. In this study, we address the update operation for regular expression indexes in RDF databases. We identify major performance problems of straightforward index update algorithms and propose a new algorithm that utilises unique properties of regular expression indexes to increase performance. Our contributions can be summarised as follows: (1) we propose an efficient update algorithm for regular expression indexes in RDF databases, (2) we build a prototype system for the proposed algorithm in C++ and (3) we conduct extensive experiments demonstrating the improvement of our algorithm over the straightforward approaches by an order of magnitude.

International Journal of Data Mining and Bioinformatics, vol. 11, no. 2, pp. 205-22.
Citations: 1
Probabilistic partial least squares regression for quantitative analysis of Raman spectra.
IF 0.3 Zone 4 (Biology) Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2015-01-01 DOI: 10.1504/ijdmb.2015.066768
Shuo Li, James O Nyagilo, Digant P Dave, Wei Wang, Baoju Zhang, Jean Gao

With the latest developments in the Surface-Enhanced Raman Scattering (SERS) technique, quantitative analysis of Raman spectra has shown potential and a promising trend in in vivo molecular imaging. Partial Least Squares Regression (PLSR) is the state-of-the-art method, but it relies only on training samples, which makes it difficult to incorporate complex domain knowledge. Based on probabilistic Principal Component Analysis (PCA) and the idea of probabilistic curve fitting, we propose a probabilistic PLSR (PPLSR) model and an Expectation Maximisation (EM) algorithm for estimating its parameters. This model explains PLSR from a probabilistic viewpoint, describes its essential meaning and provides a foundation for developing future Bayesian nonparametric models. Two real Raman spectra datasets were used to evaluate this model, and experimental results show its effectiveness.

International Journal of Data Mining and Bioinformatics, vol. 11, no. 2, pp. 223-43.
Citations: 5
Predicting gene functions from multiple biological sources using novel ensemble methods.
IF 0.3 Zone 4 (Biology) Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2015-01-01 DOI: 10.1504/ijdmb.2015.069418
Chandan K Reddy, Mohammad S Aziz

The functional classification of genes plays a vital role in molecular biology. Detecting previously unknown roles of genes and their products in physiological and pathological processes is an important and challenging problem. In this work, information from several biological sources, such as comparative genome sequences, gene expression and protein interactions, is combined to obtain robust results in predicting gene functions. The information in such heterogeneous sources is often incomplete, and hence making maximum use of all the available information is a challenging problem. We propose an algorithm that improves the prediction performance of different models built on individual sources. We also develop a heterogeneous boosting framework that uses all the available information even if some sources do not provide any information about some of the genes. We demonstrate the superior performance of the proposed methods in terms of accuracy and F-measure compared to several imputation and integration schemes.

International Journal of Data Mining and Bioinformatics, vol. 12, no. 2, pp. 184-206.
Citations: 2
Co-decision matrix framework for name entity recognition in biomedical text.
IF 0.3 Zone 4 (Biology) Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2015-01-01 DOI: 10.1504/ijdmb.2015.067956
Haochang Wang, Yu Li

As a new branch of data mining and knowledge discovery, biomedical text mining is currently progressing rapidly. Biomedical named entity (BNE) recognition is a basic technique in biomedical knowledge discovery, and its performance directly affects further discovery and processing in biomedical texts. In this paper, we present an improved method based on a co-decision matrix framework for Biomedical Named Entity Recognition (BNER). The relationship between classifiers is exploited by using a co-decision matrix to exchange decision information among them. The experiments were carried out on the GENIA corpus, with a best F-score of 75.9%. Experimental results show that the proposed co-decision matrix framework can yield promising performance.

International Journal of Data Mining and Bioinformatics, vol. 11, no. 4, pp. 412-23.
Citations: 1
An integrated approach to identify protein complex based on best neighbour and modularity increment.
IF 0.3 Zone 4 (Biology) Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2015-01-01 DOI: 10.1504/ijdmb.2015.067973
Xianjun Shen, Yanli Zhao, Yanan Li, Yang Yi, Tingting He, Jincai Yang

To overcome the limitations of global modularity and the deficiencies of local modularity, we propose a hybrid modularity measure, Local-Global Quantification (LGQ), which considers global and local modularity together. LGQ adopts an adjustable module-feature parameter to control the balance between global detection capability and local search capability in a Protein-Protein Interaction (PPI) network. Furthermore, we develop a new protein complex mining algorithm called Best Neighbour and Local-Global Quantification (BN-LGQ), which integrates the best neighbour node and the modularity increment. BN-LGQ expands a protein complex by quickly searching for the best neighbour node of the current cluster and using the modularity increment as the metric that determines whether this node can join the cluster. The experimental results show that BN-LGQ predicts protein complexes more accurately and matches the reference protein complexes more closely than the MCL and MCODE algorithms. Moreover, BN-LGQ can effectively discover protein complexes with better biological significance in the PPI network.
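
The expansion loop described in this abstract (pick the best neighbour of the current cluster, admit it only if the modularity increment is positive) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not give the LGQ formula, so `local_modularity` below is a simple stand-in score (internal edges divided by internal plus boundary edges), and "best neighbour" is assumed to mean the outside node with the most links into the cluster. The function names and the toy graph `ppi` are hypothetical.

```python
def local_modularity(graph, cluster):
    """Stand-in score (NOT the paper's LGQ): internal / (internal + boundary) edges."""
    internal = boundary = 0
    for u in cluster:
        for v in graph[u]:
            if v in cluster:
                internal += 1  # every internal edge is counted from both ends
            else:
                boundary += 1
    internal //= 2
    total = internal + boundary
    return internal / total if total else 0.0


def best_neighbour(graph, cluster):
    """Assumed rule: the outside node with most links into the cluster (ties by name)."""
    counts = {}
    for u in cluster:
        for v in graph[u]:
            if v not in cluster:
                counts[v] = counts.get(v, 0) + 1
    if not counts:
        return None
    return min(counts, key=lambda v: (-counts[v], v))


def expand_cluster(graph, seed):
    """Grow a cluster from a seed while the modularity increment stays positive."""
    cluster = {seed}
    while True:
        candidate = best_neighbour(graph, cluster)
        if candidate is None:
            break
        gain = (local_modularity(graph, cluster | {candidate})
                - local_modularity(graph, cluster))
        if gain <= 0:
            break  # the best neighbour would not improve the score
        cluster.add(candidate)
    return cluster


# Toy network: two triangles (A-B-C and D-E-F) joined by the bridge C-D.
ppi = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
    "D": {"C", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"},
}
complex_from_a = expand_cluster(ppi, "A")
```

Starting from seed `A`, the sketch grows the cluster to the first triangle and stops at the bridge, because adding `D` would lower the stand-in score from 0.75 to about 0.67.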

International Journal of Data Mining and Bioinformatics, vol. 11, no. 4, pp. 458-73.
Citations: 0
Managing changes in distributed biomedical ontologies using hierarchical distributed graph transformation.
IF 0.3 Zone 4 (Biology) Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2015-01-01 DOI: 10.1504/ijdmb.2015.066334
Arash Shaban-Nejad, Volker Haarslev

The issue of ontology evolution and change management is inadequately addressed by available tools and algorithms, mostly due to the lack of suitable knowledge representation formalisms to deal with temporal abstract notations and the overreliance on human factors. Also, most current approaches have focused on changes within the internal structure of ontologies, while interactions with other existing ontologies have been widely neglected. In our research, after revealing and classifying some of the common alterations in a number of popular biomedical ontologies, we present a novel agent-based framework, Represent, Legitimate and Reproduce (RLR), to semi-automatically manage the evolution of bio-ontologies, with emphasis on the FungalWeb Ontology, with minimal human intervention. RLR assists and guides ontology engineers through the change management process in general and aids in tracking and representing the changes, particularly through the use of category theory and hierarchical graph transformation.

International Journal of Data Mining and Bioinformatics, vol. 11, no. 1, pp. 53-83.
Citations: 10
Translation of disease associated gene signatures across tissues.
IF 0.3 Zone 4 (Biology) Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2015-01-01 DOI: 10.1504/ijdmb.2015.067321
Adetayo Kasim, Ziv Shkedy, Dan Lin, Suzy Van Sanden, Josè Cortiñas Abrahantes, Hinrich W H Göhlmann, Luc Bijnens, Dani Yekutieli, Michael Camilleri, Jeroen Aerssens, Willem Talloen

It has recently been shown that disease associated gene signatures can be identified by profiling tissue other than the disease related tissue. In this paper, we investigate gene signatures for Irritable Bowel Syndrome (IBS) using gene expression profiling of both disease related tissue (colon) and surrogate tissue (rectum). Gene specific joint ANOVA models were used to investigate differentially expressed genes between the IBS patients and the healthy controls, taking into account both intra- and inter-tissue dependencies among expression levels of the same gene. Classification algorithms in combination with feature selection methods were used to investigate the predictive power of gene expression levels from the surrogate and the target tissues. We conclude based on the analyses that expression profiles of the colon and the rectum tissue could result in better predictive accuracy if the disease associated genes are known.

International Journal of Data Mining and Bioinformatics, vol. 11, no. 3, pp. 301-13.
Citations: 2
An ensemble method for reconstructing gene regulatory network with jackknife resampling and arithmetic mean fusion.
IF 0.3 Zone 4 (Biology) Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2015-01-01 DOI: 10.1504/ijdmb.2015.069658
Chen Zhou, Shao-Wu Zhang, Fei Liu

During the past decades, numerous computational approaches have been introduced for inferring gene regulatory networks (GRNs). The PCA-CMI approach achieves the highest precision on the benchmark GRN datasets; however, it does not recover meaningful edges that may have been deleted earlier in its iterative process. To overcome this disadvantage and enhance the precision and robustness of the inferred GRNs, we present an ensemble method, named JRAMF, that infers GRNs from gene expression data by adopting two strategies: resampling and arithmetic mean fusion. The jackknife resampling procedure was first employed to form a series of sub-datasets of gene expression data, then PCA-CMI was used to generate the corresponding sub-networks from the sub-datasets, and the final GRN was inferred by integrating these sub-networks with an arithmetic mean fusion strategy. The results show that JRAMF significantly outperforms the PCA-CMI method and achieves high and robust performance.
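
The three steps named in this abstract (jackknife sub-datasets, one sub-network per sub-dataset, arithmetic mean fusion) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the base learner here is a thresholded absolute Pearson correlation used purely as a stand-in for PCA-CMI, the jackknife is assumed to be leave-one-sample-out, and the final cut-off `vote` on the averaged edge scores is an assumption. All function and parameter names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def infer_network(expr, threshold):
    """Stand-in base learner (NOT PCA-CMI): edge i-j if |correlation| >= threshold.

    expr holds one row per gene, one column per sample.
    Returns a symmetric 0/1 adjacency matrix with an empty diagonal.
    """
    g = len(expr)
    adj = [[0] * g for _ in range(g)]
    for i in range(g):
        for j in range(i + 1, g):
            if abs(pearson(expr[i], expr[j])) >= threshold:
                adj[i][j] = adj[j][i] = 1
    return adj


def jackknife_mean_fusion(expr, threshold=0.8, vote=0.5):
    """Leave-one-sample-out resampling followed by arithmetic-mean fusion."""
    g, n = len(expr), len(expr[0])
    total = [[0.0] * g for _ in range(g)]
    for left_out in range(n):
        # Sub-dataset: every gene's profile with one sample removed.
        sub = [[v for s, v in enumerate(row) if s != left_out] for row in expr]
        adj = infer_network(sub, threshold)
        for i in range(g):
            for j in range(g):
                total[i][j] += adj[i][j]
    # Arithmetic mean of the n sub-networks, then an assumed cut-off for the final GRN.
    mean = [[total[i][j] / n for j in range(g)] for i in range(g)]
    final = [[1 if mean[i][j] >= vote else 0 for j in range(g)] for i in range(g)]
    return mean, final


# Toy data: gene 1 is an exact multiple of gene 0; gene 2 is only weakly related.
toy_expr = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [5, 1, 4, 2, 3]]
mean_scores, fused = jackknife_mean_fusion(toy_expr)
```

On the toy data, the edge between genes 0 and 1 appears in every sub-network, while the edge between genes 0 and 2 appears in only one of the five, so averaging suppresses it. Any real use would replace `infer_network` with the actual base method.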

International Journal of Data Mining and Bioinformatics, vol. 12, no. 3, pp. 328-42.
Citations: 7
Signal transduction in the activation of spermatozoa compared to other signalling pathways: a biological networks study.
IF 0.3 Zone 4 (Biology) Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2015-01-01 DOI: 10.1504/ijdmb.2015.068953
Nicola Bernabò, Mauro Mattioli, Barbara Barboni

In this paper we represent Spermatozoa Activation (SA), the process that leads male gametes to reach their fertilising ability, in sea urchin, Caenorhabditis elegans and human as biological networks, i.e. as networks of nodes (molecules) linked by edges (their interactions). We then compare them with networks representing ten pathways of relevant physio-pathological importance and with a computer-generated network. We find that the number of nodes and edges composing each network is not related to the number of published papers on each specific topic and that all the topological parameters examined are similar in all the networks, conferring on them a scale-free topology and small-world behaviour. In conclusion, SA topology, independently of the reproductive biology of the organism considered, is characterised, like other signalling networks, by robustness against random failure, controllability and efficiency in signal transmission.

International Journal of Data Mining and Bioinformatics, vol. 12, no. 1, pp. 59-69.
Citations: 13
Fuzzy-rough-neural-based f-information for gene selection and sample classification. 基于模糊粗糙神经的基因选择和样本分类f信息。
IF 0.3 Zone 4 Biology Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2015-01-01 DOI: 10.1504/ijdmb.2015.066333
P Ganesh Kumar, C Rani, D Mahibha, T Aruldoss Albert Victoire

The greatest restriction in estimating information measures for microarray data is the continuous nature of gene expression values. The traditional criterion function of f-information discretises the continuous gene expression values in order to calculate the probability function during gene selection. This discretisation strips the microarray data of biological meaning and results in poor classification accuracy. To overcome this difficulty, the concepts of fuzzy and rough sets are combined to redefine the criterion functions of f-information, which are then used to form candidate genes from which informative genes are selected using a neural network. The performance of the proposed Fuzzy-Rough-Neural-based f-Information (FRNf-I) is evaluated using ten gene expression datasets. Simulation results show that the proposed approach computes the f-information measure easily without discretisation. Statistical analysis of the test results shows that the proposed FRNf-I selects comparatively fewer genes and achieves higher classification accuracy than the other approaches reported in the literature.

Citations: 10