
International Journal of Data Mining and Bioinformatics: Latest Articles

DiffGRN: differential gene regulatory network analysis
IF 0.3 | Zone 4, Biology | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2018-09-27 | DOI: 10.1504/IJDMB.2018.10016325
Youngsoon Kim, Jie Hao, Yadu Gautam, T. Mersha, Mingon Kang
Identifying differential gene regulators that change significantly under disparate conditions is essential to understanding the complex biological mechanisms of a disease. Differential Network Analysis (DiNA) examines different biological processes based on gene regulatory networks, which represent regulatory interactions between genes with a graph model. Most DiNA studies construct gene regulatory networks from gene expression data using correlation-based inference because of its intuitive representation and simple implementation, but this approach does not represent causal or multivariate effects between genes. In this paper, we propose an approach named Differential Gene Regulatory Network (DiffGRN) that infers differential gene regulation between two groups. We infer the gene regulatory network of each group using Random LASSO and then identify differential gene regulations with the proposed significance test. The advantages of DiffGRN are that it captures the multivariate effects of genes that regulate a gene simultaneously, identifies the causality of gene regulations, and discovers differential gene regulators between regression-based gene regulatory networks. We assessed DiffGRN in simulation experiments, where it outperformed the current state-of-the-art correlation-based method, DINGO. DiffGRN was also applied to gene expression data in asthma. The DiNA of the asthma data revealed a number of gene regulations, such as ADAM12 and RELB, that have been reported in the biological literature.
Volume 20, Issue 4, pp. 362-379.
Citations: 15
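The two-step idea in the DiffGRN abstract above (fit a regression-based network per group, then test which regulations differ) can be illustrated with a deliberately simplified sketch. This is not the authors' implementation: ordinary least squares on a single regulator stands in for Random LASSO, and a generic permutation test stands in for the paper's significance test. The function names `ols_slope` and `slope_difference_pvalue` are hypothetical.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of a target gene y on a single regulator x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


def slope_difference_pvalue(group_a, group_b, n_perm=2000, seed=0):
    """Permutation p-value for |slope_A - slope_B|.

    Each group is a list of (regulator_expression, target_expression) pairs.
    Assumption: samples are exchangeable between groups under the null.
    """
    rng = random.Random(seed)

    def slope_gap(a, b):
        xa, ya = zip(*a)
        xb, yb = zip(*b)
        return abs(ols_slope(xa, ya) - ols_slope(xb, yb))

    observed = slope_gap(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign samples to the two groups at random
        if slope_gap(pooled[:len(group_a)], pooled[len(group_a):]) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)
```

A small p-value suggests that the regulator-to-target coefficient differs between the two groups. A multivariate version would regress each target on all candidate regulators at once, which is the setting the abstract describes.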
DiffGRN: differential gene regulatory network analysis.
IF 0.3 | Zone 4, Biology | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2018-01-01 | DOI: 10.1504/IJDMB.2018.094891
Youngsoon Kim, Jie Hao, Yadu Gautam, Tesfaye B Mersha, Mingon Kang

Identifying differential gene regulators that change significantly under disparate conditions is essential to understanding the complex biological mechanisms of a disease. Differential Network Analysis (DiNA) examines different biological processes based on gene regulatory networks, which represent regulatory interactions between genes with a graph model. Most DiNA studies construct gene regulatory networks from gene expression data using correlation-based inference because of its intuitive representation and simple implementation, but this approach does not represent causal or multivariate effects between genes. In this paper, we propose an approach named Differential Gene Regulatory Network (DiffGRN) that infers differential gene regulation between two groups. We infer the gene regulatory network of each group using Random LASSO and then identify differential gene regulations with the proposed significance test. The advantages of DiffGRN are that it captures the multivariate effects of genes that regulate a gene simultaneously, identifies the causality of gene regulations, and discovers differential gene regulators between regression-based gene regulatory networks. We assessed DiffGRN in simulation experiments, where it outperformed the current state-of-the-art correlation-based method, DINGO. DiffGRN was also applied to gene expression data in asthma. The DiNA of the asthma data revealed a number of gene regulations, such as ADAM12 and RELB, that have been reported in the biological literature.

Volume 20, Issue 4, pp. 362-379.
Citations: 0
Integration of multi-omics data for integrative gene regulatory network inference.
IF 0.3 | Zone 4, Biology | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2017-01-01 | Epub Date: 2017-10-03 | DOI: 10.1504/IJDMB.2017.10008266
Neda Zarayeneh, Euiseong Ko, Jung Hun Oh, Sang Suh, Chunyu Liu, Jean Gao, Donghyun Kim, Mingon Kang

Gene regulatory networks provide comprehensive insights into and an in-depth understanding of complex biological processes. In most research, the molecular interactions of gene regulatory networks are inferred from a single type of genomic data, e.g., gene expression data. However, gene expression is a product of sequential interactions of multiple biological processes, such as DNA sequence variations, copy number variations, histone modifications, transcription factors, and DNA methylations. Recent rapid advances in high-throughput omics technologies make it possible to measure multiple types of omics data, called 'multi-omics data', that represent these various biological processes. In this paper, we propose an Integrative Gene Regulatory Network inference method (iGRN) that incorporates multi-omics data and their interactions in gene regulatory networks. In addition to gene expression, copy number variations and DNA methylations were considered as multi-omics data. Intensive experiments were carried out with simulation data to assess iGRN's ability to infer the integrative gene regulatory network. In these experiments, iGRN showed better performance on model representation and interpretation than other integrative methods for gene regulatory network inference. iGRN was also applied to a human brain dataset of psychiatric disorders, and the biological network of psychiatric disorders was analysed.

Volume 18, Issue 3, pp. 223-239. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5771269/pdf/nihms912092.pdf
Citations: 23
The development of non-coding RNA ontology
IF 0.3 | Zone 4, Biology | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2016-01-01 | DOI: 10.1504/IJDMB.2016.077072
Jingshan Huang, K. Eilbeck, Barry Smith, J. Blake, D. Dou, Weili Huang, D. Natale, A. Ruttenberg, Jun Huan, Michael T. Zimmermann, Guoqian Jiang, Yu Lin, Bin Wu, Harrison J. Strachan, Nisansa de Silva, M. V. Kasukurthi, V. Jha, Y. He, Shaojie Zhang, Xiaowei Wang, Zixing Liu, G. Borchert, M. Tan
Identification of non-coding RNAs (ncRNAs) has been significantly improved over the past decade. On the other hand, semantic annotation of ncRNA data is facing critical challenges due to the lack of a comprehensive ontology to serve as common data elements and data exchange standards in the field. We developed the Non-Coding RNA Ontology (NCRO) to handle this situation. By providing a formally defined ncRNA controlled vocabulary, the NCRO aims to fill a specific and highly needed niche in semantic annotation of large amounts of ncRNA biological and clinical data.
Volume 15, Issue 3, pp. 214-232.
Citations: 11
Application of consensus string matching in the diagnosis of allelic heterogeneity involving transposition mutation
IF 0.3 | Zone 4, Biology | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2015-10-01 | DOI: 10.1504/IJDMB.2015.072756
F. Zohora, Mohammad Sohel Rahman
In this paper, an algorithm is proposed that, given two input DNA sequences, detects the existence of a common ancestor gene sequence under the non-overlapping transposition metric. We consider two cases: fixed-length transposition and all-length transposition. For the first, the algorithm has a time complexity of O(n^3), where n is the length of the input sequences. For all-length transposition, the theoretical worst-case time complexity of the algorithm is proven to be O(n^4). In practice, however, the worst-case and average-case time complexities for all-length transposition are found to be O(n^3) and O(n^2) respectively. This work is motivated by the diagnosis of unknown genetic diseases that show allelic heterogeneity, where a normal gene mutates in different orders, resulting in two different gene sequences that cause two different genetic diseases. The algorithm can also be useful in the study of breed-related heredity to determine the genetic spread of a defective gene in the population.
Volume 13, Issue 4, pp. 360-377.
Citations: 1
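The common-ancestor question in the abstract above can be made concrete with a brute-force sketch. It rests on two assumptions that the abstract does not confirm: a transposition is taken to mean swapping two adjacent, non-overlapping blocks, and each observed sequence is taken to be at most one transposition away from the ancestor. The paper's algorithm is more general and far more efficient; the names `single_transpositions` and `common_ancestors` are hypothetical.

```python
def single_transpositions(s, block_len=None):
    """All strings reachable from s by swapping two adjacent blocks.

    With block_len set, both blocks have that fixed length; otherwise every
    pair of block lengths is tried, giving O(n^3) candidate swaps.
    """
    n = len(s)
    results = set()
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n + 1):
                if block_len is not None and (j - i != block_len or k - j != block_len):
                    continue
                # s = prefix + X + Y + suffix  ->  prefix + Y + X + suffix
                results.add(s[:i] + s[j:k] + s[i:j] + s[k:])
    return results


def common_ancestors(a, b, block_len=None):
    """Sequences that could yield both a and b via at most one swap each.

    Swapping the two blocks back undoes a transposition, so the strings one
    swap away from a sequence are also its possible one-step ancestors.
    """
    candidates_a = single_transpositions(a, block_len) | {a}
    candidates_b = single_transpositions(b, block_len) | {b}
    return candidates_a & candidates_b
```

For example, `common_ancestors("GTACTGCA", "ACGTCATG")` contains `"ACGTTGCA"`, from which the first sequence arises by swapping the leading blocks `AC` and `GT` and the second by swapping the trailing blocks `TG` and `CA`.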
Genome-wide discovery of miRNAs using ensembles of machine learning algorithms and logistic regression
IF 0.3 | Zone 4, Biology | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2015-10-01 | DOI: 10.1504/IJDMB.2015.072755
Benjamin Ulfenborg, K. Klinga-Levan, B. Olsson
In silico prediction of novel miRNAs from genomic sequences remains a challenging problem. This study presents a genome-wide miRNA discovery software package called GenoScan and evaluates two hairpin classification methods. These methods, one ensemble-based and one using logistic regression, were benchmarked along with 15 published methods. In addition, the sequence-folding step is addressed by investigating the impact of secondary structure prediction methods and the choice of input sequence length on prediction performance. The accuracy of both the secondary structure predictions and the miRNA predictions is evaluated. In the benchmark of hairpin classification methods, the regression model achieved the highest classification accuracy. Of the structure prediction methods evaluated, ContextFold achieved the highest agreement between predicted and experimentally determined structures. However, both the choice of secondary structure prediction method and the input sequence length had limited impact on hairpin classification performance.
Volume 13, Issue 4, pp. 338-359.
Citations: 2
In silico identification and functional annotation of yeast E3 ubiquitin ligase Rsp5 substrates
IF 0.3 | Zone 4, Biology | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2015-10-01 | DOI: 10.1504/IJDMB.2015.072754
Xiaofeng Song, Lizhen Hu, P. Han, Xuejiang Guo, J. Sha
Rsp5, an E3 ligase conserved from yeast to mammals, plays a key role in diverse processes in yeast. However, many Rsp5 substrates remain unclear. We therefore propose an in silico method to recognise new substrates of Rsp5. To investigate the molecular determinants that affect the interaction between Rsp5 and its substrates, we systematically analysed many features that may correlate with Rsp5 substrate recognition. The PPxY motif, transmembrane regions, disordered regions and N-linked glycosylation were found to be the most important features for substrate recognition. We constructed an SVM-based classifier to recognise Rsp5 substrates, which obtained 81.5% sensitivity and 74.1% specificity on average across ten independent test datasets. We also applied the model to the whole yeast proteome and identified ~66 new Rsp5 substrates. Functional annotation reveals that half of these novel substrates function, as Rsp5-interacting proteins, in cell processes that involve Rsp5.
Volume 13, Issue 4, pp. 321-337.
Citations: 0
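Two ingredients of the Rsp5 abstract above lend themselves to a short illustration: scanning a protein sequence for a PPxY motif (two prolines, any residue, then tyrosine) and computing sensitivity and specificity from predictions. The sketch below is only that. The paper's classifier is an SVM over several features, whereas this example uses the motif alone as a rule, and the function names and the example sequences in the usage note are hypothetical.

```python
import re

# PPxY: proline, proline, any single residue, tyrosine (one-letter codes).
PPXY = re.compile(r"PP.Y")


def has_ppxy(sequence):
    """True if the protein sequence contains at least one PPxY motif."""
    return PPXY.search(sequence) is not None


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (truthy = substrate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    return tp / (tp + fn), tn / (tn + fp)
```

As a usage example, `has_ppxy("MKPPAYLL")` is true and `has_ppxy("MKPAPYLL")` is false. Feeding motif-based predictions and known labels to `sensitivity_specificity` gives the same two metrics the abstract reports for the SVM.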
Weighted fusion regularisation and predicting microbial interactions with vector autoregressive model
IF 0.3 | Zone 4, Biology | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2015-10-01 | DOI: 10.1504/IJDMB.2015.072757
Yan Wang, Tingting He, Xingpeng Jiang, Jie Yuan, Xianjun Shen
In this paper, we develop a novel regularisation method for MVAR via weighted fusion, which takes the correlation among variables into account. In theory, we discuss the grouping effect of weighted fusion regularisation for linear models. Using a probabilistic method, we show that coefficients corresponding to highly correlated predictors have small differences. A quantitative estimate for such small differences is given regardless of the signs of the coefficients. The estimate is further improved by taking the empirical approximation error into account when the model fits the data well. We then apply the proposed model to several time series datasets, in particular a time series dataset of human gut microbiomes. The experimental results indicate that the new approach performs better than several other VAR-based models, and we also demonstrate its capability to extract relevant microbial interactions.
Volume 13, Issue 4, pp. 378-394.
Citations: 1
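The grouping effect described in the abstract above (highly correlated predictors receive similar coefficients, regardless of sign) can be seen in a small sketch of a weighted fusion style penalty. The abstract does not give the penalty's exact form, so everything here is an assumption for illustration: the pairwise weight is the absolute correlation, the sign of the correlation aligns the two coefficients, and `weighted_fusion_penalty`, `lam1` and `lam2` are hypothetical names.

```python
def weighted_fusion_penalty(beta, corr, lam1, lam2):
    """L1 term plus a correlation-weighted pairwise fusion term.

    beta : list of regression coefficients
    corr : square matrix (list of lists) of predictor correlations
    Assumed form: lam1 * sum|b_i| + lam2 * sum_{i<j} |r_ij| * (b_i - s_ij * b_j)^2,
    with s_ij = sign(r_ij).
    """
    p = len(beta)
    l1_term = sum(abs(b) for b in beta)
    fusion_term = 0.0
    for i in range(p):
        for j in range(i + 1, p):
            r = corr[i][j]
            sign = 1.0 if r >= 0 else -1.0
            # Strongly correlated pairs are pushed towards equal magnitudes.
            fusion_term += abs(r) * (beta[i] - sign * beta[j]) ** 2
    return lam1 * l1_term + lam2 * fusion_term
```

With correlation 0.9 between two predictors, coefficients `[1, 1]` incur no fusion cost while `[1, -1]` are penalised heavily. With correlation -0.9 the situation reverses, which is one way to read the abstract's remark that the estimate holds regardless of the signs of the coefficients.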
Learning multiple distributed prototypes of semantic categories for named entity recognition
IF 0.3 | Zone 4, Biology | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2015-10-01 | DOI: 10.1504/IJDMB.2015.072766
Aron Henriksson
The scarcity of large labelled datasets comprising clinical text that can be exploited within the paradigm of supervised machine learning creates barriers for the secondary use of data from electronic health records. It is therefore important to develop capabilities to leverage the large amounts of unlabelled data that, indeed, tend to be readily available. One technique utilises distributional semantics to create word representations in a wholly unsupervised manner and uses existing training data to learn prototypical representations of predefined semantic categories. Features describing whether a given word belongs to a certain category are then provided to the learning algorithm. It has been shown that using multiple distributional semantic models, each employing a different word order strategy, can lead to enhanced predictive performance. Here, another hyperparameter is also varied--the size of the context window--and an experimental investigation shows that this leads to further performance gains.
International Journal of Data Mining and Bioinformatics, vol. 13, no. 4, pp. 395-411. Published 2015-10-01.
Citations: 10
Sequence based human leukocyte antigen gene prediction using informative physicochemical properties
IF 0.3. Zone 4, Biology. Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY. Pub Date: 2015-09-01. DOI: 10.1504/IJDMB.2015.072072
W. Shoombuatong, Panuwat Mekha, Jeerayut Chaijaruwanich
Prediction of different classes within the human leukocyte antigen (HLA) gene family can provide insight into the human immune system and its response to viral pathogens. It is therefore desirable to develop a method for predicting HLA gene class that is more efficient and more easily interpretable than existing methods. We investigated the HLA gene prediction problem as follows: (a) establishing a dataset (HLA262) such that the sequence identity of the complete HLA dataset was reduced to 30%; (b) proposing a feature set of informative physicochemical properties that, combined with an SVM (named HLAPred), achieves high accuracy and sensitivity (90.04% and 82.99%, respectively) compared with existing methods; and (c) analysing the informative physicochemical properties to understand the physicochemical properties and molecular mechanisms of the HLA gene family.
International Journal of Data Mining and Bioinformatics, vol. 13, no. 3, pp. 211-224. Published 2015-09-01.
Citations: 10