Pub Date: 2021-10-13 DOI: 10.1109/CIBCB49929.2021.9562947
Giang V. Trinh, K. Hiraishi
Attractor detection in Asynchronous Boolean Networks (ABNs) is very challenging due to the high complexity of the state transition graph of an ABN. Recently, an efficient method (called FVS-ARBN) has been proposed for exactly finding the attractors of an ABN. FVS-ARBN uses a Feedback Vertex Set (FVS) to obtain a candidate set of states, then filters this set by checking reachability in the ABN. This method gives promising results; however, it still needs to be improved to handle larger networks. In this paper, we propose a new method (named iFVS-ABN) that includes two improvements to FVS-ARBN. First, we propose a reasonable combination of multiple existing techniques to efficiently check reachability in ABNs. Second, we formally state and prove a relation between a Negative Feedback Vertex Set (NFVS) and the dynamics of an ABN. Based on this relation, we propose to use an NFVS instead of an FVS to obtain the candidate set of states. Experimental results show that the two improvements are effective and that the improved method outperforms the original one.
Title: An Improved Method for Finding Attractors of Large-Scale Asynchronous Boolean Networks. In: 2021 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB).
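To make the notion of an ABN attractor concrete, the following is a minimal brute-force sketch and not the FVS-ARBN or iFVS-ABN method described above. It assumes a hypothetical three-node network (the update functions in `UPDATE_FNS` are invented for illustration), enumerates the full asynchronous state transition graph, and reports the attractors as the sets of states that can never be left. Exhaustive enumeration like this is only feasible for tiny networks, which is the limitation that candidate-set methods aim to avoid.

```python
from itertools import product


def successors(state, update_fns):
    """Asynchronous update: each successor changes exactly one node."""
    result = []
    for i, f in enumerate(update_fns):
        new_value = f(state)
        if new_value != state[i]:
            result.append(state[:i] + (new_value,) + state[i + 1:])
    return result


def reachable(state, update_fns):
    """All states reachable from `state`, including `state` itself."""
    seen = {state}
    stack = [state]
    while stack:
        for nxt in successors(stack.pop(), update_fns):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return frozenset(seen)


def attractors(update_fns):
    """Attractors are terminal strongly connected components: sets of
    states in which every member reaches exactly the same set."""
    n = len(update_fns)
    reach = {s: reachable(s, update_fns) for s in product((0, 1), repeat=n)}
    return {r for r in reach.values() if all(reach[t] == r for t in r)}


# Hypothetical network, for illustration only:
#   x0 keeps its value, x1 = x0 AND NOT x2, x2 = x1
UPDATE_FNS = [
    lambda s: s[0],
    lambda s: int(s[0] and not s[2]),
    lambda s: s[1],
]

found = attractors(UPDATE_FNS)
for attractor in sorted(found, key=len):
    print(sorted(attractor))
```

In this toy network the negative feedback loop between x1 and x2 yields a cyclic attractor when x0 is on, and a fixed point when x0 is off.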
Pub Date: 2021-10-13 DOI: 10.1109/CIBCB49929.2021.9562876
Guangyao Chen, James Sargant, S. Houghten, T. K. Collins
A multi-objective genetic algorithm is applied to the problem of identifying genes associated with Alzheimer's disease. The input to the genetic algorithm is a set of centrality measures obtained by merging various biological evidence types into a complex network, based on a set of 11 genes already known to be associated with this disease. In terms of leave-one-out validation, the strongest results are obtained using betweenness, with ranking showing that better results are sometimes obtained by including either stress or load with betweenness. The overall ranking of the genes across all runs is examined and suggests some genes worthy of further study with respect to their link to this disease. The methodology is also evaluated with respect to robustness by modifying the original network by a range of percentages, and applying the methodology to these variations. The results show that the methodology returns very similar results under these circumstances.
Title: Identification of Genes Associated with Alzheimer's Disease using Evolutionary Computation. In: 2021 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB).
Pub Date: 2021-10-13 DOI: 10.1109/CIBCB49929.2021.9562940
N. Ramadhanti, W. Kusuma, I. Batubara, R. Heryanto
The COVID-19 pandemic has been on the rise since the beginning of 2020. In Indonesia, the first case was identified on 3 March 2020, and cases peaked around the end of January 2021. Although the current number of COVID-19 cases is below that peak, daily positive cases have increased from around 2600 to 6300 over the last month. This trend is urging people to take better care of their health. One alternative that Indonesians use to maintain and improve their health is herbal medicine. Indonesia is one of the countries with a rich variety of herbal species, and eucalyptus is one herbal plant with many benefits. Even before the pandemic, eucalyptus oil was in daily use by many in Indonesia. In this study, we predict which compounds in eucalyptus interact with proteins of the SARS-CoV-2 virus using a machine learning method, namely Random Forest. This is an application of drug repurposing, a drug-discovery approach that uses existing drug-target interaction data to build a model that predicts interactions between compounds and targets that have not yet been identified. Applying this method, we predicted that some compounds found in eucalyptus, such as alpha-terpinene and 1,8-cineole, might interact with SARS-CoV-2 proteins; thus, eucalyptus can be used as a preventive measure.
Title: Random Forest to Predict Eucalyptus as a Potential Herb in Preventing Covid19. In: 2021 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB).
Pub Date: 2021-10-13 DOI: 10.1109/CIBCB49929.2021.9562943
Sumaiya Amin, S. Houghten, J. Hughes
How best to apply vaccines to a population is an open problem. It is trivial to derive intuitive strategies, but until tested, their efficacy is not known. This problem is particularly challenging when considering the dynamics of social contact networks and their changes over time. A system for automatically discovering tested vaccination strategies with evolutionary computation has been improved to include additional graph metrics and to generate vaccination strategies for dynamic graphs, which are expected of real social networks within communities. The system's ability to generate effective strategies was demonstrated, along with a comparison of the strategies developed when fit to a static graph versus a dynamic graph. It was observed that the additional computational resources required to generate strategies on a dynamic graph may not be necessary, as strategies developed for static graphs performed similarly well; however, the authors are careful to acknowledge that results may differ significantly when adjusting the system's many parameters.
Title: Vaccinating a Population is a Changing Programming Problem. In: 2021 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB).
Pub Date: 2021-10-13 DOI: 10.1109/CIBCB49929.2021.9562773
S. Spolaor, D. Papetti, P. Cazzaniga, D. Besozzi, M. S. Nobile
Combination therapies represent one of the most effective strategies for inducing cancer cell death and reducing the risk of developing drug resistance. The identification of putative novel drug combinations, which typically requires expensive and time-consuming lab experiments, can be supported by the synergistic use of mathematical models and multi-objective optimization algorithms. The computational approach makes it possible to automatically search for potential therapeutic combinations and to test their effectiveness in silico, thus reducing the cost in time and money and driving the experiments toward the most promising therapies. In this work, we couple dynamic fuzzy modeling of cancer cells with different multi-objective optimization algorithms, and we compare their performance in identifying drug target combinations. Specifically, we perform batches of optimizations with 3 and 4 objective functions defined to achieve a desired behavior of the system (e.g., maximize apoptosis while minimizing necrosis and survival), and we compare the quality of the solutions included in the Pareto fronts. Our results show that both the choice of the multi-objective algorithm and the formulation of the optimization problem have an impact on the identified solutions, highlighting the strengths as well as the limitations of this approach.
Title: A comparison of multi-objective optimization algorithms to identify drug target combinations. In: 2021 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB).
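The Pareto fronts compared in this abstract can be illustrated with a small non-dominated filter. This is a sketch under stated assumptions, not the authors' optimization pipeline: the objective vectors below are hypothetical, and apoptosis is negated so that all three objectives are minimized.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` on every objective and strictly
    better on at least one (all objectives are minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical objective vectors: (-apoptosis, necrosis, survival)
candidates = [
    (-0.9, 0.2, 0.1),
    (-0.7, 0.1, 0.2),
    (-0.6, 0.3, 0.3),
    (-0.9, 0.3, 0.1),
]

front = pareto_front(candidates)
print(front)
```

The first two candidates trade apoptosis against necrosis, so neither dominates the other and both stay on the front; the last two are dominated by the first.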
Pub Date: 2021-10-13 DOI: 10.1109/CIBCB49929.2021.9562959
Yanhua Xu, D. Wojtczak
The rapid mutation of the influenza virus threatens public health. Reassortment among viruses with different hosts can lead to a fatal pandemic. However, it is difficult to detect the original host of the virus during or after an outbreak, as influenza viruses can circulate between different species. Therefore, early and rapid detection of the viral host would help reduce the further spread of the virus. We use various machine learning models with features derived from the position-specific scoring matrix (PSSM) and features learned from word embedding and word encoding to infer the origin host of viruses. The results show that the PSSM-based model reaches an MCC of around 95% and an F1 of around 96%. The model with word embedding obtains an MCC of around 96% and an F1 of around 97%.
Title: Predicting Influenza A Viral Host Using PSSM and Word Embeddings. In: 2021 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB).
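For readers unfamiliar with the two metrics quoted in this abstract, here is a short sketch of how F1 and the Matthews correlation coefficient (MCC) are computed from a confusion matrix. It assumes the binary case for simplicity (host prediction in the paper may well be multi-class), and the counts passed in are made up for illustration.

```python
import math


def f1_and_mcc(tp, fp, fn, tn):
    """Return (F1, MCC) for a binary confusion matrix.

    Undefined ratios (zero denominators) are reported as 0.0,
    which is a common convention rather than a universal rule.
    """
    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else 0.0

    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    mcc = (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0
    return f1, mcc


# Hypothetical counts, not taken from the paper.
f1, mcc = f1_and_mcc(tp=90, fp=10, fn=5, tn=95)
print(f"F1 = {f1:.3f}, MCC = {mcc:.3f}")
```

Unlike F1, MCC also uses the true negatives, so it is less easily inflated when one class dominates the data.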
Pub Date: 2021-10-13 DOI: 10.1109/CIBCB49929.2021.9562815
V. Mayya, Sowmya S Kamath, V. Sugumaran
Effective code assignment for patient clinical records in a hospital plays a significant role in the process of standardizing medical records, mainly for streamlining clinical care delivery, billing, and managing insurance claims. The current practice employed is manual coding, usually carried out by trained medical coders, making the process subjective, error-prone, inexact, and time-consuming. To alleviate this cost-intensive process, intelligent coding systems built on patients' structured electronic medical records are critical. Classification of medical diagnostic codes, like ICD-10, is widely employed to categorize patients' clinical conditions and associated diagnoses. In this work, we present a neural model, $\mathcal{LAJA}$, built on Label Attention Transformer Architectures for automatic assignment of ICD-10 codes. Our work is benchmarked on the CodiEsp dataset, a dataset for automatic clinical coding systems for multilingual medical documents, used in the eHealth CLEF 2020 Multilingual Information Extraction Shared Task. The experimental results reveal that the proposed $\mathcal{LAJA}$ variants outperform their basic BERT counterparts by 33-49% in terms of standard metrics like precision, recall, F1-score and mean average precision. The label attention mechanism also enables direct extraction of textual evidence in medical documents that maps to the clinical ICD-10 diagnostic codes.
Title: $\mathcal{LAJA}$ - Label Attention Transformer Architectures for ICD-10 Coding of Unstructured Clinical Notes. In: 2021 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB).
Pub Date: 2021-10-13 DOI: 10.1109/CIBCB49929.2021.9562906
Gulustan Dogan, Sinem Sena Ertas, Iremnaz Cay
Using smartphone sensors to recognize human activity may be advantageous due to the abundant volume of data that can be obtained. In this paper, we propose a sensor-data-based deep learning approach for recognizing human activity. Our proposed recognition method uses linear accelerometer (LAcc), gyroscope (Gyr), and magnetometer (Mag) sensors to perceive eight transportation and locomotion activities: Still, Walk, Run, Bike, Bus, Car, Train, and Subway. In this study, data from three participants in the Sussex-Huawei Locomotion (SHL) Dataset are used to recognize the physical activities of the users. Fast Fourier Transform (FFT) spectrograms generated from the three axes of the LAcc, Gyr, and Mag sensor data are used as input data for our proposed Convolutional Neural Network (CNN) model. Experimental results on the task of human activity recognition demonstrate the effectiveness of our proposed user-independent approach over that of competitive baselines.
Title: Human Activity Recognition Using Convolutional Neural Networks. In: 2021 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB).
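The spectrogram input described in this abstract can be sketched as follows. This is a simplified illustration, not the authors' preprocessing: a plain O(n²) discrete Fourier transform stands in for the FFT so that the example needs only the standard library, no window function is applied, and the window length, hop size, and synthetic signal are all assumptions.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitudes of the non-negative frequency bins of one frame."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(signal, window=32, hop=16):
    """Slice the signal into overlapping frames and transform each one.
    Rows are time frames, columns are frequency bins."""
    starts = range(0, len(signal) - window + 1, hop)
    return [dft_magnitudes(signal[i:i + window]) for i in starts]


# Synthetic single-axis trace: a sine with 4 cycles per 32-sample window.
signal = [math.sin(2 * math.pi * 4 * t / 32) for t in range(128)]

spec = spectrogram(signal)
peak_bins = [max(range(len(row)), key=row.__getitem__) for row in spec]
print(len(spec), "frames;", "peak bin per frame:", peak_bins)
```

In practice one such time-frequency image would be produced per sensor axis and stacked as input channels for the CNN; a library FFT (for example `numpy.fft.rfft`) would replace the slow transform used here.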
Pub Date: 2021-10-13 DOI: 10.1109/CIBCB49929.2021.9562958
Jaskaran Gill, M. Chetty, Adrian B. R. Shatte, J. Hallinan
Reverse engineering of gene regulatory networks from temporal gene expression data is an active area of research. Among the plethora of modelling techniques under investigation is the decoupled S-system model, which attempts to capture the non-linearity of biological systems in detail. For this model, the number of parameters to be estimated is large even when the network is of small or medium scale. Thus, the inference process poses a significant computational burden. In this paper, we propose: (1) a novel population initialization technique, Dynamically Regulated Prediction Initialization (DRPI), which utilises prior knowledge of biological gene expression data to create a feedback loop that produces dynamically regulated, high-quality individuals for the initial population; (2) an adaptive fitness function; and (3) a method for the maintenance of population diversity. The aim of this work is to reduce the computational complexity of the inference algorithm and so speed up the entire process of reverse engineering. The performance of the proposed algorithm was evaluated against a benchmark dataset and compared with other methods from earlier work. The experimental results show that we succeeded in achieving higher accuracy with fewer fitness evaluations, considerably reducing the computational burden of the inference process.
Title: Dynamically Regulated Initialization for S-system Modelling of Genetic Networks. In: 2021 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB).
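The abstract does not write out the S-system model, so for context the standard form from the S-system literature is given here; the symbols are the conventional ones and are an assumption about the notation this paper uses. For a network of $N$ genes with expression levels $X_1, \dots, X_N$:

```latex
\frac{dX_i}{dt}
  = \alpha_i \prod_{j=1}^{N} X_j^{\,g_{ij}}
  - \beta_i \prod_{j=1}^{N} X_j^{\,h_{ij}},
\qquad i = 1, \dots, N
```

Here $\alpha_i, \beta_i \ge 0$ are rate constants and $g_{ij}$, $h_{ij}$ are the kinetic orders with which gene $j$ influences the production and degradation of gene $i$. Each equation carries $2(N+1)$ parameters, so the full model has $2N(N+1)$; in the decoupled form each gene's equation is fitted separately against the observed time series, which is why parameter estimation remains costly even for small or medium networks.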
Pub Date: 2021-10-13 DOI: 10.1109/CIBCB49929.2021.9562849
Jaspreet Singh, Jaswinder Singh, K. Paliwal, Andrew Busch, Yaoqi Zhou
Protein secondary structure prediction has been a long-standing problem in computational biology. Recent advances in deep contextual learning have brought three-state prediction performance closer to the theoretical limit of 88–90%. Here, we show that a large training set with a 95% sequence identity cutoff can improve secondary structure prediction, even for unrelated test sequences (<25% sequence identity), compared with a non-redundant training set with a 25% sequence identity cutoff. Three-state prediction edges closer to an accuracy of 87%, and eight-state prediction to 76%. The resulting model, called SPOT-1D2, is freely available to academic users at https://github.com/jas-preet/SPOT-1D2.
Title: SPOT-1D2: Improving Protein Secondary Structure Prediction using High Sequence Identity Training Set and an Ensemble of Recurrent and Residual-convolutional Neural Networks. In: 2021 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB).
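The three-state and eight-state accuracies quoted in this abstract are per-residue agreement rates. The sketch below shows how they relate, assuming the commonly used reduction of the eight DSSP classes to helix, strand, and coil; whether SPOT-1D2 uses exactly this mapping is an assumption, and the two label strings are invented for illustration.

```python
# Common 8-to-3 state reduction (an assumption about this paper):
# helices H/G/I -> H, strands E/B -> E, everything else -> C (coil).
EIGHT_TO_THREE = {
    "H": "H", "G": "H", "I": "H",
    "E": "E", "B": "E",
    "T": "C", "S": "C", "C": "C",
}


def to_three_state(ss8):
    """Collapse an eight-state label string to three states."""
    return "".join(EIGHT_TO_THREE[label] for label in ss8)


def accuracy(predicted, observed):
    """Fraction of residues whose predicted label matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("label strings must have the same length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical labels for a 16-residue fragment.
observed_ss8 = "CHHHHGGGTEEEEBSC"
predicted_ss8 = "CHHHHHHHTEEEECSC"

q8 = accuracy(predicted_ss8, observed_ss8)
q3 = accuracy(to_three_state(predicted_ss8), to_three_state(observed_ss8))
print(f"Q8 = {q8:.4f}, Q3 = {q3:.4f}")
```

Confusions within a merged class (here G predicted as H) count as errors in eight-state scoring but not in three-state scoring, which is one reason three-state accuracy is consistently the higher of the two.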