Since the ENCODE project published its final results in a series of articles in 2012, there has been no consensus on what its implications are. ENCODE's central and most controversial claim was that there is essentially no junk DNA: most sections of the human genome believed to be «junk» are functional. This claim was met with many reservations. If researchers disagree about whether there is junk DNA, they first have to agree on a concept of function and on how function, given a particular definition, can be discovered. The ENCODE debate centered on a notion of function that assumes a strong dichotomy between evolutionary and non-evolutionary functions and causes, a dichotomy prevalent in the Modern Evolutionary Synthesis. Contrary to how the debate is typically portrayed, both sides share a commitment to this distinction. The distinction is, however, much debated in alternative approaches to evolutionary theory, such as the Extended Evolutionary Synthesis (EES). We show that because the ENCODE debate is grounded in a particular notion of function, it is unclear how it connects to broader debates about which evolutionary framework is correct. Furthermore, we show that arguments brought forward in the controversy, particularly arguments from mathematical population genetics, are deeply embedded in their disciplinary contexts and reflect substantive assumptions about the evolution of genomes. With this article, we aim to provide an anatomy of the ENCODE debate that offers a new perspective on the notions of function both sides employed, and to situate the debate within wider debates about the forces operating in evolution.
Title: A third way to the selected effect/causal role distinction in the great ENCODE debate. Authors: Ehud Lamm, Sophie Juliane Veigl. Journal: Theoretical Biology Forum, 116(1-2), pp. 53-74. Pub Date: 2023-07-01 DOI: 10.19272/202311402004
Pub Date: 2022-01-01 DOI: 10.1007/978-981-19-4979-1
M. Nakamaru
Title: Trust and Credit in Organizations and Institutions. Journal: Theoretical Biology Forum, 138(1).
Pub Date: 2022-01-01 DOI: 10.1007/978-981-19-6016-1
H. Seno
Title: A Primer on Population Dynamics Modeling. Journal: Theoretical Biology Forum, 149(1).
Pub Date: 2022-01-01 DOI: 10.1007/978-981-19-8257-6
Makoto Sato
Title: Getting Started in Mathematical Life Sciences. Journal: Theoretical Biology Forum, 21(1).
Longitudinal functional data are increasingly common in the health domain. The motivating dataset for this paper comprises H-NMR spectra of kidney transplant patients [8]. Our aim is to cluster patients into clinical outcome subgroups that reflect the success of the transplantation. The NMR spectra of each patient at each time point are functional data, and the data are collected longitudinally at up to nine different time points. Existing methods handle functional data collected at a single time point, but not longitudinal functional data collected on a grid of time points subject to missingness. We therefore first apply a method to extract the same number of functional features for each subject. Next, we propose a novel nonparametric clustering method for multivariate functional data. We applied the proposed clustering method to the kidney transplant dataset, both to a subset of the raw data with only two time points and to the extracted functional features. The proposed method appeared to achieve better clustering performance on the extracted functional features than on the subset of raw data. A simulation study was performed to further evaluate the method. Its design mimicked the kidney transplant dataset but with a larger sample size, and scenarios with different levels of noise were considered. The simulation study showed the accuracy of the proposed method.
Title: Nonparametric clustering for longitudinal functional data with the application to H-NMR spectra of kidney transplant patients. Longitudinal functional data clustering. Authors: Minzhen Xie, Haiyan Liu, Jeanine Houwing-Duistermaat. Journal: Theoretical Biology Forum, 114(1-2), pp. 15-28. Pub Date: 2021-01-01 DOI: 10.19272/202111401003
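The abstract does not spell out the clustering algorithm, so the sketch below is only a generic stand-in, not the authors' method. It assumes each subject's extracted functional features are several curves sampled on a shared grid, measures dissimilarity with an L2 distance summed over the curves (trapezoid rule), and partitions subjects with a simple k-medoids loop. The distance, the k-medoids choice, the toy data and all function names are hypothetical.

```python
def trapezoid(values, grid):
    """Trapezoid-rule integral of sampled values over an increasing grid."""
    return sum(
        (grid[i + 1] - grid[i]) * (values[i] + values[i + 1]) / 2
        for i in range(len(grid) - 1)
    )


def l2_distance(a, b, grid):
    """L2 distance between two multivariate functional observations.

    Each observation is a list of curves (one per functional feature), all
    sampled on `grid`; integrated squared differences are summed over curves.
    """
    total = 0.0
    for curve_a, curve_b in zip(a, b):
        squared = [(x - y) ** 2 for x, y in zip(curve_a, curve_b)]
        total += trapezoid(squared, grid)
    return total ** 0.5


def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix.

    Uses a deterministic farthest-point start from subject 0; returns (labels, medoids).
    """
    n = len(dist)
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(dist[i][m] for m in medoids)))
    for _ in range(max_iter):
        # Assign each subject to its nearest medoid.
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
        # Move each medoid to the member with the smallest total within-cluster distance.
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels, medoids


# Toy data: six hypothetical subjects, each with two curves on a five-point grid.
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
subjects = [
    [[0.0] * 5, [0.1] * 5],
    [[0.1] * 5, [0.0] * 5],
    [[0.05] * 5, [0.05] * 5],
    [[1.0] * 5, [1.1] * 5],
    [[1.1] * 5, [0.9] * 5],
    [[0.95] * 5, [1.0] * 5],
]
dist = [[l2_distance(a, b, grid) for b in subjects] for a in subjects]
labels, medoids = k_medoids(dist, k=2)
print(labels)
```

Because k-medoids works only from the distance matrix, it makes no distributional assumption about the curves; trying a different functional distance means changing only `l2_distance`.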
Annah Mwikali Muli, A. Gusnanto, Jeanine Houwing-Duistermaat
In survival analysis, the effect of a covariate on the outcome is usually reported as a hazard ratio. However, hazard ratios are hard to interpret, so here we consider differences in survival probabilities instead. Data on twins are attractive because many observed and unobserved factors are controlled or matched. To model the correlation between twins, some authors have proposed survival models with frailties or random effects. However, the estimates may be biased if the frailty distribution is misspecified. Frailties are often assumed to follow a gamma distribution. To safeguard against the impact of misspecifying this distribution, we consider a flexible non-parametric baseline hazard in addition to a parametric one. We apply this methodology to the TwinsUK cohort to predict the probability of experiencing a fracture in the next five or ten years, given participants' bone mineral density (BMD) and frailty index. The models with parametric and non-parametric baseline hazards yield very similar estimates of survival probabilities, so a parametric baseline hazard is generally preferred. We find that BMD is a significant predictor in the model, whereas the frailty index is not. Low BMD leads to a higher probability of fracture; e.g., over 10 years, the probability of fracture is 21% for the low-BMD group, 16% for the medium-BMD group and 8% for the high-BMD group.
Title: Use of shared gamma frailty model in analysis of survival data in twins. Journal: Theoretical Biology Forum, 114(1-2), pp. 45-58. Pub Date: 2021-01-01 DOI: 10.19272/202111402005
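To make the reported quantity (probability of fracture within five or ten years) concrete: under a gamma frailty with mean 1 and variance θ, integrating the frailty out of the conditional survival exp(-Z H0(t) exp(βx)) gives the marginal survival S(t | x) = (1 + θ H0(t) exp(βx))^(-1/θ), and the fracture probability is 1 - S(t | x). The sketch below evaluates this with a Weibull baseline cumulative hazard H0(t) = (t / scale)^shape. The Weibull form, the BMD coding and every parameter value are hypothetical placeholders, not estimates from the TwinsUK analysis.

```python
import math


def weibull_cum_hazard(t, scale, shape):
    """Baseline cumulative hazard H0(t) = (t / scale) ** shape (assumed Weibull form)."""
    return (t / scale) ** shape


def marginal_survival(t, x_beta, theta, scale, shape):
    """Population-averaged survival under gamma frailty (mean 1, variance theta).

    S(t | x) = (1 + theta * H0(t) * exp(x_beta)) ** (-1 / theta); theta = 0 is
    treated as the no-frailty limit exp(-H0(t) * exp(x_beta)).
    """
    h = weibull_cum_hazard(t, scale, shape) * math.exp(x_beta)
    if theta == 0:
        return math.exp(-h)
    return (1.0 + theta * h) ** (-1.0 / theta)


def fracture_probability(t, x_beta, theta, scale, shape):
    """Probability of a first fracture by time t (years)."""
    return 1.0 - marginal_survival(t, x_beta, theta, scale, shape)


# Hypothetical linear predictors for three BMD groups, high BMD as reference.
groups = {"low BMD": 1.0, "medium BMD": 0.6, "high BMD": 0.0}
for name, lp in groups.items():
    p5 = fracture_probability(5, lp, theta=0.5, scale=60.0, shape=1.2)
    p10 = fracture_probability(10, lp, theta=0.5, scale=60.0, shape=1.2)
    print(f"{name}: 5-year {p5:.3f}, 10-year {p10:.3f}")
```

Swapping the Weibull function for a step-function (non-parametric) cumulative hazard leaves the rest of the calculation unchanged, which is one way to compare parametric and non-parametric baselines on the probability scale.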
A. Fuady, S. el Bouhaddani, H. Uh, Jeanine Houwing-Duistermaat
Multiple technologies exist that measure the same omics data set but are based on different aspects of the molecules. In practice, studies use different technologies and therefore have different biomarkers. An example is the glycan age index, a biomarker for biological age, which is constructed from three different IgG glycans measured by ultra-performance liquid chromatography (UPLC). A second technology is liquid chromatography-mass spectrometry (LC-MS). To estimate the effect of a biomarker on an outcome variable, two issues need to be addressed. First, a measurement error model is needed to map one technology to the other using a calibration study. Here, we consider two approaches: one based on the chemical properties of the two technologies and one based on estimating this relationship using O2PLS. Second, the use of an approximation of the biomarker in the main study needs to be taken into account by means of a regression calibration method. The performance of the two approaches is studied via simulations. The methods are used to estimate the relationship between glycan age and menopause, using data from two cohorts, Korcula and Vis. In conclusion, (1) both measurement error models give similar results and suggest an association between the glycan age index and menopause status; (2) the chemical mapping approach outperforms O2PLS when the measurement error variance is low, while O2PLS works better when it is larger; (3) statistical efficiency is lost because adding irrelevant information increases the noise level.
Title: Estimation of the effect of surrogate multi-omic biomarkers. Journal: Theoretical Biology Forum, 114(1-2), pp. 59-73. Pub Date: 2021-01-01 DOI: 10.19272/202111402006
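Regression calibration, the second step described in the abstract, replaces the unobserved reference measurement X in the main study with its predicted value E[X | W], estimated in a calibration study where both X and the surrogate W are measured. The sketch below shows the idea for a single biomarker with a linear calibration model and a linear outcome model on simulated data. The paper itself uses a binary outcome (menopause status) and multi-omic mappings (chemical mapping or O2PLS), which this toy example does not reproduce; all names and simulation settings are assumptions.

```python
import random


def ols(x, y):
    """Intercept and slope of the least-squares line y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


rng = random.Random(1)
TRUE_EFFECT = 2.0  # hypothetical effect of X on the outcome

# Calibration study: reference measurement X and surrogate W both observed.
cal_x = [rng.gauss(0, 1) for _ in range(200)]
cal_w = [0.5 + 0.8 * x + rng.gauss(0, 0.6) for x in cal_x]
# Calibration model in the direction needed for prediction: E[X | W] = g0 + g1 * W.
g0, g1 = ols(cal_w, cal_x)

# Main study: only the surrogate W and the outcome Y are observed.
main_x = [rng.gauss(0, 1) for _ in range(2000)]
main_w = [0.5 + 0.8 * x + rng.gauss(0, 0.6) for x in main_x]
main_y = [1.0 + TRUE_EFFECT * x + rng.gauss(0, 1) for x in main_x]

_, naive = ols(main_w, main_y)           # treats W as if it were X
x_hat = [g0 + g1 * w for w in main_w]    # regression-calibrated biomarker
_, calibrated = ols(x_hat, main_y)

print(f"naive slope {naive:.2f}, calibrated slope {calibrated:.2f}, truth {TRUE_EFFECT}")
```

With these settings the naive slope is attenuated towards about 1.6, while dividing by the estimated calibration slope g1 recovers a value near 2; the uncertainty in g1 from the calibration study still propagates into the final estimate.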
Sonia Dembowska, Alejandro F Frangi, Jeanine Houwing-Duistermaat, Haiyan Liu
The use of statistical methods to predict outcomes from high-dimensional datasets is becoming increasingly popular in medicine for forecasting and monitoring patient health. Our work is motivated by a longitudinal dataset containing 1H NMR spectra of metabolites from 18 patients undergoing kidney transplantation, together with their graft outcomes, which fall into one of three categories: acute rejection, delayed graft function and primary function. We propose a functional partial least squares (FPLS) model that extends existing PLS methods for longitudinally measured scalar omics datasets to longitudinally measured functional datasets. We designed an iterative algorithm to link multiple time points and then applied the proposed method to the data from the kidney transplant patients. Finally, we compared the AUC of our method with the AUCs of univariate methods that use information from only one time point. Our method appeared to outperform these existing methods. A simulation study that mimicked the kidney transplant dataset, but with a larger sample size, was performed to evaluate the new method on larger datasets, under scenarios that vary in how difficult it is to distinguish the two groups. The three-time-point model appeared to perform better than any of the single-time-point models, with average AUCs of 0.909 and 0.811 respectively.
Title: Multivariate functional partial least squares for classification using longitudinal data. Journal: Theoretical Biology Forum, 114(1-2), pp. 75-88. Pub Date: 2021-01-01 DOI: 10.19272/202111402007
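As a rough illustration of the two ingredients in this comparison, the sketch below (i) concatenates discretised spectra from several time points into one feature vector per patient and computes a single PLS component for a binary outcome, whose weight vector is proportional to X^T y for centred X and y, and (ii) scores patients on that component and computes the AUC as the Mann-Whitney probability that a randomly chosen positive outscores a randomly chosen negative. This is ordinary one-component PLS on stacked data, not the authors' functional PLS or their iterative algorithm; the data, outcome coding and function names are hypothetical.

```python
def column_means(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def pls1_weights(x_rows, y):
    """First PLS weight vector w = X^T y / ||X^T y|| after centring X and y."""
    means = column_means(x_rows)
    y_mean = sum(y) / len(y)
    w = [
        sum((r[j] - means[j]) * (yi - y_mean) for r, yi in zip(x_rows, y))
        for j in range(len(means))
    ]
    norm = sum(v * v for v in w) ** 0.5
    return [v / norm for v in w], means


def component_scores(x_rows, w, means):
    """Projection t = (x - mean) . w for each patient."""
    return [sum((r[j] - means[j]) * w[j] for j in range(len(w))) for r in x_rows]


def auc(scores, labels):
    """Mann-Whitney AUC: P(score_pos > score_neg), ties counted as 1/2."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical discretised spectra (3 points) at two time points for six patients.
time1 = [[0.9, 0.1, 0.4], [1.1, 0.0, 0.5], [0.2, 0.1, 0.4],
         [0.1, 0.2, 0.6], [1.0, 0.3, 0.5], [0.3, 0.0, 0.5]]
time2 = [[0.8, 0.2, 0.1], [1.2, 0.1, 0.2], [0.1, 0.2, 0.1],
         [0.3, 0.1, 0.0], [0.7, 0.2, 0.2], [0.2, 0.3, 0.1]]
outcome = [1, 1, 0, 0, 1, 0]  # hypothetical binary coding of graft outcome

x = [a + b for a, b in zip(time1, time2)]  # concatenate time points per patient
w, means = pls1_weights(x, outcome)
t = component_scores(x, w, means)
print(f"in-sample AUC: {auc(t, outcome):.3f}")
```

An in-sample AUC like this one is optimistic; a fair comparison between multi-time-point and single-time-point models would compute the AUC on held-out patients, for example by cross-validation.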
I. Budimir, C. Sala, M. G. Bacalini, P. Garagnani, G. Castellani
DNA methylation studies usually focus on groups of CpG sites. Neighbouring CpG sites are analyzed together because of their group behaviour. However, this approach ignores possible interactions between more distant CpG sites. In this work, we investigate the complete methylation correlation structure of chromosome 21. Two data sets were used for the correlation analysis: a smaller data set with methylation measurements from Down syndrome patients and their family members, and a larger data set of healthy subjects. This allowed us to examine the general properties of the methylation correlation structure as well as its modifications in the presence of an extra copy of the chromosome. We observed that CpG sites work in small, highly correlated groups. While some groups coincided with CpG islands, other groups contained CpG sites scattered across the whole chromosome. Groups of highly correlated CpG sites were preserved in Down syndrome. Moreover, the methylome of a Down syndrome patient showed newly formed correlations between CpG sites, suggesting that the methylation correlation structure in Down syndrome is stronger than in an unaffected individual.
Title: DNA methylation correlation structure of chromosome 21 in Down syndrome. Journal: Theoretical Biology Forum, 114(1-2), pp. 89-113. Pub Date: 2021-01-01 DOI: 10.19272/202111402008
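One simple way to turn a chromosome-wide correlation matrix into groups of highly correlated CpG sites, including groups whose members are far apart on the chromosome, is to link every pair of sites whose absolute Pearson correlation across samples exceeds a threshold and take the connected components of the resulting graph. The sketch below does this in plain Python. The 0.8 threshold, the toy methylation values, the site names and the function names are assumptions, and the paper's actual grouping procedure may differ.

```python
from statistics import mean, pstdev


def pearson(a, b):
    """Pearson correlation of two equal-length value lists (0.0 if either is constant)."""
    ma, mb = mean(a), mean(b)
    sa, sb = pstdev(a), pstdev(b)
    if sa == 0 or sb == 0:
        return 0.0
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)
    return cov / (sa * sb)


def correlated_groups(sites, threshold=0.8):
    """Connected components of the graph linking sites with |r| > threshold.

    `sites` maps a CpG identifier to its methylation values across the same
    samples. Genomic position is ignored, so distant sites can share a group.
    """
    names = list(sites)
    neighbours = {name: [] for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(sites[a], sites[b])) > threshold:
                neighbours[a].append(b)
                neighbours[b].append(a)
    groups, seen = [], set()
    for start in names:
        if start in seen:
            continue
        # Depth-first search collects every site reachable from `start`.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nb in neighbours[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        groups.append(sorted(component))
    return groups


# Hypothetical methylation values for five CpG sites across five samples.
sites = {
    "site_1": [0.1, 0.2, 0.3, 0.4, 0.5],
    "site_2": [0.12, 0.21, 0.33, 0.41, 0.52],
    "site_3": [0.9, 0.1, 0.8, 0.2, 0.7],
    "site_4": [0.5, 0.45, 0.4, 0.35, 0.3],
    "site_5": [0.85, 0.15, 0.75, 0.25, 0.7],
}
groups = correlated_groups(sites)
print(groups)
```

Running the same grouping separately on two data sets (for instance, unaffected and Down syndrome samples) and comparing the edge sets is one way to look for correlations that appear only in one condition.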