Pub Date: 2010-11-01 DOI: 10.1109/GENSIPS.2010.5719682
Xiaoning Qian, E. Dougherty
Sensitivity analysis is a critical yet challenging problem for understanding complex systems. In genomic signal processing, it has been recognized that many biological systems are asymptotically stable. The sensitivity regarding the structural and dynamical uncertainty of network models may provide a deep understanding of the robustness, adaptability, and controllability of biological processes. We focus on the Boolean network model, as it has been shown to be able to capture the switching behavior of many biological processes by appropriate modeling of multivariate nonlinear relationships among genes. We study two different sensitivity measures for the Boolean network model, one directly related to individual predictor Boolean functions and the other to long-term network dynamics. Although there is some correlation between the measures, our study shows that these different sensitivities characterize different aspects of network behavior, so that their application depends on how they relate to specific translational goals.
{"title":"A comparative study on sensitivities of Boolean networks","authors":"Xiaoning Qian, E. Dougherty","doi":"10.1109/GENSIPS.2010.5719682","DOIUrl":"https://doi.org/10.1109/GENSIPS.2010.5719682","url":null,"PeriodicalId":388703,"journal":{"name":"2010 IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129527599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
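The two kinds of sensitivity contrasted in the abstract above can be illustrated with a small sketch. The abstract does not give the paper's definitions, so both measures here are assumptions: the function-level measure is the textbook average sensitivity of a Boolean function, and the long-term measure is simply the fraction of initial states that end in a different attractor after one predictor function is changed. The three-gene network is hypothetical.

```python
from itertools import product

def average_sensitivity(f, k):
    """Mean number of single-input flips that change f's output, over all 2^k inputs."""
    total = 0
    for x in product((0, 1), repeat=k):
        for i in range(k):
            y = list(x)
            y[i] ^= 1                      # flip input i
            total += f(x) != f(tuple(y))   # counts 1 when the output changes
    return total / 2 ** k

def attractor_of(state, step):
    """Iterate a deterministic network until a state repeats; return the cycle reached."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
    return frozenset(seen[seen.index(state):])

def attractor_shift(step_a, step_b, n):
    """Fraction of initial states whose attractor differs between two networks."""
    states = list(product((0, 1), repeat=n))
    changed = sum(attractor_of(s, step_a) != attractor_of(s, step_b) for s in states)
    return changed / len(states)

# Hypothetical three-gene network (not taken from the paper).
def step_original(s):
    a, b, c = s
    return (b & c, a | c, a ^ b)

# Same network with the first predictor function changed from AND to OR.
def step_perturbed(s):
    a, b, c = s
    return (b | c, a | c, a ^ b)

and_sens = average_sensitivity(lambda x: x[0] & x[1], 2)
xor_sens = average_sensitivity(lambda x: x[0] ^ x[1], 2)
shift = attractor_shift(step_original, step_perturbed, 3)
print(and_sens, xor_sens, shift)
```

A function can score high on the first measure while a change to it barely moves the attractors, which is one way the two notions can disagree.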
Pub Date: 2010-11-01 DOI: 10.1109/GENSIPS.2010.5719687
Koon-Kiu Yan, M. Gerstein
The study of the hierarchical organization of complex networks provides a more intuitive picture of the regulatory interactions in various complex systems, both biological and technological. In the first part of the talk, I will introduce the integrated regulatory network based on the systematic integration of various high-throughput datasets from the modENCODE project. The network consists of three major types of regulation: TF→gene, TF→miRNA and miRNA→gene. I will examine the topological structures of the network, with emphasis on its hierarchical organization. In the second part of the talk, I will present the hierarchical organization of the E. coli transcriptional regulatory network and the call graph of the Linux operating system. The effects on the robustness of the systems and insights from evolution will be discussed.
{"title":"Hierarchical analysis of regulatory networks and cross-disciplinary comparison with the Linux call graph","authors":"Koon-Kiu Yan, M. Gerstein","doi":"10.1109/GENSIPS.2010.5719687","DOIUrl":"https://doi.org/10.1109/GENSIPS.2010.5719687","url":null,"PeriodicalId":388703,"journal":{"name":"2010 IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS)","volume":"126 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122959495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2010-11-01 DOI: 10.1109/GENSIPS.2010.5719670
N. Banerjee, A. Janevski, S. Kamalakaran, V. Varadan, R. Lucito, N. Dimitrova
Ovarian cancer is the leading cause of death among gynecological cancers. Carboplatinum-based therapy is the standard treatment choice for ovarian cancer. However, a majority of patients develop resistance to carboplatinum fairly rapidly; hence, there is a clinical need for early predictors of carboplatinum resistance. While there are a few indicative gene markers, they have poor sensitivity and specificity in predicting response. It is essential that multiple high-throughput molecular profiling modalities are integrated and investigated to provide a full picture of the ongoing processes. Here, we propose a methodology to identify central players in platinum resistance from a list of stratifying genes using a data-driven approach. We have used the correlation of DNA methylation and gene expression data and applied network-based features to identify the influence of DNA methylation on gene expression. This provides interpretive analysis and is complementary to biological pathway-enrichment approaches. We suggest that our method, based on network structure properties, adds a useful layer to multi-modal evidence integration to focus on the key processes and interactions in resistance mechanisms.
{"title":"Pathway and network analysis probing epigenetic influences on chemosensitivity in ovarian cancer","authors":"N. Banerjee, A. Janevski, S. Kamalakaran, V. Varadan, R. Lucito, N. Dimitrova","doi":"10.1109/GENSIPS.2010.5719670","DOIUrl":"https://doi.org/10.1109/GENSIPS.2010.5719670","url":null,"PeriodicalId":388703,"journal":{"name":"2010 IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115280467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2010-11-01 DOI: 10.1109/GENSIPS.2010.5719690
Chen Zhao, I. Ivanov, Manasvi Shah, L. Davidson, R. Chapkin, E. Dougherty
For the first time, we studied the applicability of a conditioning-based model to a heterogeneous data set composed of expression values for microRNA, total mRNA and polysomal mRNA resulting from experiments in two dietary contexts. The results suggest that some of the microRNAs are likely to be involved in the regulation of a large set of genes and not just their putative targets. Furthermore, the regulatory activities of intestinal microRNA appear to depend on the sub-cellular location of the mRNA.
{"title":"Conditioning-based model for the regulatory activities of microRNAs in specific dietary contexts","authors":"Chen Zhao, I. Ivanov, Manasvi Shah, L. Davidson, R. Chapkin, E. Dougherty","doi":"10.1109/GENSIPS.2010.5719690","DOIUrl":"https://doi.org/10.1109/GENSIPS.2010.5719690","url":null,"PeriodicalId":388703,"journal":{"name":"2010 IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133428670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2010-11-01 DOI: 10.1109/GENSIPS.2010.5719685
Liming Wang, D. Schonfeld
Processing biological data sequences by mapping them into numerical signals is a commonly used technique. Operators such as de-noising filters, smoothing filters and certain algorithms may be applied iteratively. Little is known about the consistency of analysis results across different mapping strategies in this situation. Meanwhile, because of errors and noise in data acquisition, the stability of analysis results should never be neglected. In this paper, we provide a method for analyzing the consistency between different mappings under iteration of an operator. We define different concepts of mapping equivalence. We show the necessary and sufficient condition for consistency under iteration of an affine operator. We present a few theoretical results on equivalent mappings based on the concepts of Fatou and Julia sets. We give the definition of stability under iteration of an operator and show that the stability issue can be viewed as a special case of mapping equivalence. We also establish the connection of stability to Fatou and Julia sets. Finally, we present experimental results on the human gene AD169 sequence and the rhodopsin gene sequence under one of the widely used mappings and illustrate the equivalent mapping for a smoothing filter.
{"title":"Dynamics, stability and consistency in representation of genomic sequences","authors":"Liming Wang, D. Schonfeld","doi":"10.1109/GENSIPS.2010.5719685","DOIUrl":"https://doi.org/10.1109/GENSIPS.2010.5719685","url":null,"PeriodicalId":388703,"journal":{"name":"2010 IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133358699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
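As a rough illustration of what consistency between mappings under an iterated operator can look like, the sketch below encodes one short sequence with two numerical mappings and applies a moving-average smoothing filter repeatedly. Everything here is an assumption made for illustration: the two mapping tables, the filter width and the toy sequence are hypothetical, and the check (that two mappings related by x -> 2x + 1 stay related in the same way after filtering) is only one simple notion of consistency, not the paper's definition of mapping equivalence.

```python
def encode(seq, table):
    """Map each nucleotide to a number using the given table."""
    return [table[ch] for ch in seq]

def smooth(signal, width=3):
    """Moving average over full windows only, so no boundary padding is involved."""
    return [sum(signal[i:i + width]) / width
            for i in range(len(signal) - width + 1)]

def iterate(signal, times, width=3):
    """Apply the smoothing filter repeatedly."""
    for _ in range(times):
        signal = smooth(signal, width)
    return signal

# Two hypothetical mappings; the second equals 2 * first + 1.
map_a = {"A": 0, "C": 1, "G": 2, "T": 3}
map_b = {"A": 1, "C": 3, "G": 5, "T": 7}

seq = "ACGTTGCAACGT"  # toy sequence, not real data
out_a = iterate(encode(seq, map_a), times=3)
out_b = iterate(encode(seq, map_b), times=3)

# The filter weights sum to 1, so the affine relation survives every iteration.
consistent = all(abs(b - (2 * a + 1)) < 1e-9 for a, b in zip(out_a, out_b))
print(len(out_a), consistent)
```

A nonlinear operator, such as a threshold or a median filter combined with a non-monotone remapping, would generally not preserve such a relation, which is the kind of situation where the choice of mapping starts to matter.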
Pub Date: 2010-11-01 DOI: 10.1109/GENSIPS.2010.5719668
W. Ashlock, S. Datta
Retroviruses have important roles to play in medicine, evolution, and biology. A key step towards understanding the effect of retroviruses on hosts is identifying them in the host genome. Detecting retroviruses using sequence alignment is difficult because they are very diverse and have high mutation rates. We propose a fast, accurate algorithm for detecting retroviruses that uses supervised machine learning and three sets of features. One set of novel features identifies the characteristic reading frame structure of retroviruses. The other two sets include features that have been used by other researchers for exon finding. Our algorithm distinguishes retroviral genomes from non-coding sequences, and endogenous retroviruses from non-coding sequences and from genes, with high accuracy. It also distinguishes endogenous retroviruses from intact retroviral genomes and lentiviruses from other retroviruses, again with high accuracy.
{"title":"Fast algorithms for recognizing retroviruses","authors":"W. Ashlock, S. Datta","doi":"10.1109/GENSIPS.2010.5719668","DOIUrl":"https://doi.org/10.1109/GENSIPS.2010.5719668","url":null,"PeriodicalId":388703,"journal":{"name":"2010 IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130000012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2010-11-01 DOI: 10.1109/GENSIPS.2010.5719679
Jia Meng, Jianqiu Zhang, Yidong Chen, Yufei Huang
The problem of uncovering transcriptional regulation by transcription factors (TFs) based on microarray data is considered. A novel Bayesian sparse correlated rectified factor model (BSCRFM), coupled with its iterated conditional mode (ICM) solution, is proposed. BSCRFM models the unknown TF protein-level activity, the correlated regulations between TFs, and the sparse nature of TF-regulated genes, and it admits prior knowledge from existing databases regarding TF-regulated target genes. An efficient ICM algorithm is developed, and a context-specific transcriptional regulatory network specific to the experimental condition of the microarray data can be obtained. The proposed model and the ICM algorithm are evaluated on simulated systems, and the results demonstrate the validity and effectiveness of the proposed approach. The proposed model is also applied to breast cancer microarray data, and a TF-regulated network regarding ER status is obtained.
{"title":"An iterated conditional mode solution for Bayesian factor modeling of transcriptional regulatory networks","authors":"Jia Meng, Jianqiu Zhang, Yidong Chen, Yufei Huang","doi":"10.1109/GENSIPS.2010.5719679","DOIUrl":"https://doi.org/10.1109/GENSIPS.2010.5719679","url":null,"PeriodicalId":388703,"journal":{"name":"2010 IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133668696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2010-11-01 DOI: 10.1109/GENSIPS.2010.5719680
B. Nazer, R. Nowak
This paper develops theoretical bounds on the number of experiments required to infer which genes are active in a particular biological process. The standard approach is to perform many experiments, each with a single gene suppressed or knocked down. However, certain effects are not revealed by single-gene knockouts and are only observed when two or more genes are suppressed simultaneously. Here, we propose a framework for identifying such interactions without resorting to an exhaustive pairwise search. We exploit the inherent sparsity of the problem, which stems from the fact that very few gene pairs are likely to be active. We model the biological process by a multilinear function with unknown coefficients and develop a compressed sensing framework for inferring the coefficients. Our main result is that if at most S genes or gene pairs are active out of N total, then approximately S^2 log N measurements suffice to identify the significant active components.
{"title":"Efficient designs for multiple gene knockdown experiments","authors":"B. Nazer, R. Nowak","doi":"10.1109/GENSIPS.2010.5719680","DOIUrl":"https://doi.org/10.1109/GENSIPS.2010.5719680","url":null,"PeriodicalId":388703,"journal":{"name":"2010 IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121026827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
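To give a feel for the scale of the saving described in the abstract above, here is a back-of-the-envelope sketch. It is not the paper's construction: the multilinear model below (single-gene terms plus pairwise products), the natural logarithm, and the omission of all constants in the order-of-magnitude bound are assumptions, and the function names and example coefficients are hypothetical.

```python
import math

def exhaustive_experiments(n):
    """Experiments needed to test every single gene and every gene pair once."""
    return n + n * (n - 1) // 2

def sparse_bound(s, n):
    """Order-of-magnitude count S^2 * log N (natural log, constants omitted)."""
    return math.ceil(s ** 2 * math.log(n))

def multilinear_response(x, single, pair):
    """Response of a multilinear model to a knockdown pattern x.

    x[i] is 1 if gene i is suppressed, else 0; `single` maps gene index to
    coefficient and `pair` maps (i, j) index pairs to coefficients.
    """
    value = sum(c * x[i] for i, c in single.items())
    value += sum(c * x[i] * x[j] for (i, j), c in pair.items())
    return value

# A pairwise effect that no single-gene knockdown can reveal.
pair_only = {(0, 1): 2.0}
single_knockdown = multilinear_response((1, 0, 0), {}, pair_only)
double_knockdown = multilinear_response((1, 1, 0), {}, pair_only)

full = exhaustive_experiments(1000)
sparse = sparse_bound(10, 1000)
print(single_knockdown, double_knockdown, full, sparse)
```

With these illustrative numbers, the exhaustive design grows quadratically in the number of genes, while the sparse bound grows only logarithmically in it.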
Pub Date: 2010-11-01 DOI: 10.1109/GENSIPS.2010.5719671
N. Bhardwaj, M. Gerstein
Gene regulatory networks have been shown to share some common aspects with commonplace social governance structures such as hierarchies. Thus, we can get some intuition into their organization by arranging them into well-known hierarchical layouts. Here we study a wide range of regulatory networks (transcriptional, modification and phosphorylation) in a hierarchical context for five evolutionarily diverse species. We specify three levels of regulators (top, middle and bottom), which collectively regulate the non-regulator targets lying in the lowest, fourth level, and we define quantities for nodes, levels and entire networks that measure their degree of collaboration and their autocratic or democratic character. Overall, we show that in all the networks studied, the middle level has the highest collaborative propensity and that co-regulatory partnerships occur most frequently amongst mid-level regulators, an observation that has parallels in efficient corporate settings where middle managers need to interact most to ensure organizational effectiveness. Then, to study dynamic effects, we superimpose the phenotypic effects of tampering with nodes and edges directly onto the hierarchies. We reconstruct modified hierarchies reflecting changes in the level of regulators within the hierarchy upon deletions or insertions of nodes or edges. Overall, we find that rewiring events that affect upper levels have a more dramatic effect on cell proliferation rate and survival than do those involving lower levels. We also investigate other features connected to the importance of upper-level regulators: expression divergence, back-up copies and expression level.
{"title":"Dynamic and static analysis of transcriptional regulatory networks in a hierarchical context","authors":"N. Bhardwaj, M. Gerstein","doi":"10.1109/GENSIPS.2010.5719671","DOIUrl":"https://doi.org/10.1109/GENSIPS.2010.5719671","url":null,"PeriodicalId":388703,"journal":{"name":"2010 IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133589476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2010-11-01 DOI: 10.1109/GENSIPS.2010.5719673
Ting-Ju Chen, U. Braga-Neto
Estimation of the classification error and of the coefficient of determination (CoD) is a fundamental issue in discrete prediction problems. Analytical expressions for exact performance metrics of non-randomized error estimators and CoD estimators have been derived in previous publications by the authors. However, computation of these expressions becomes problematic as the sample size or predictor complexity increases, particularly in the case of second moments. Thus, fast and accurate approximations are desirable. In this paper, we make approximations to the variances of resubstitution and leave-one-out error estimators and CoD estimators. Our results show that these approximations not only are quite accurate but also reduce computation time tremendously.
{"title":"Approximate expressions for the variances of non-randomized error estimators and CoD estimators for the discrete histogram rule","authors":"Ting-Ju Chen, U. Braga-Neto","doi":"10.1109/GENSIPS.2010.5719673","DOIUrl":"https://doi.org/10.1109/GENSIPS.2010.5719673","url":null,"PeriodicalId":388703,"journal":{"name":"2010 IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121123030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
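For readers unfamiliar with the two error estimators named in the abstract above, the following minimal sketch computes them for a discrete histogram rule on a toy sample. It shows the estimators themselves, not the variance approximations the paper derives. The tie-breaking convention (ties go to class 0), the function names and the sample data are assumptions made for illustration.

```python
from collections import Counter

def histogram_rule(samples):
    """Train a discrete histogram classifier: majority label per bin, ties to class 0."""
    counts = Counter(samples)  # (bin, label) -> number of training points
    def classify(b):
        return 1 if counts[(b, 1)] > counts[(b, 0)] else 0
    return classify

def resubstitution_error(samples):
    """Error rate of the classifier on the same data it was trained on."""
    classify = histogram_rule(samples)
    return sum(classify(b) != y for b, y in samples) / len(samples)

def leave_one_out_error(samples):
    """Error rate when each point is classified by a rule trained without it."""
    errors = 0
    for i, (b, y) in enumerate(samples):
        classify = histogram_rule(samples[:i] + samples[i + 1:])
        errors += classify(b) != y
    return errors / len(samples)

# Toy sample of (bin, label) pairs; not data from the paper.
data = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 0)]
resub = resubstitution_error(data)
loo = leave_one_out_error(data)
print(resub, loo)
```

On this sample resubstitution gives 1/3 while leave-one-out gives 2/3, a small reminder that the two estimators can differ substantially when bins hold few points.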