Microbiome data have been obtained relatively easily in recent years, and various methods for analyzing microbiome data are currently being proposed. Latent Dirichlet allocation (LDA) models, which are frequently used to extract latent topics from words in documents, have also been proposed to extract information on microbial communities from microbiome data. To extract microbiome topics associated with a subject’s attributes, LDA models that utilize supervisory information, including LDA with Dirichlet multinomial regression (DMR topic model) or the supervised topic model (SLDA), can be applied. Further, a Bayesian nonparametric model is often used to automatically decide the number of latent classes for a latent variable model. An LDA can also be extended to a Bayesian nonparametric model using the hierarchical Dirichlet process. Although a Bayesian nonparametric DMR topic model has been previously proposed, it uses a normalized gamma process for generating the topic distribution, and it is unknown whether the number of topics can be automatically decided from data. It is expected that the total number of topics (with relatively large proportions) can be restricted to a smaller value using the stick-breaking process for generating the topic distribution. Therefore, we propose a Bayesian nonparametric DMR topic model using a stick-breaking process and have compared it to existing models using two sets of real microbiome data. The results showed that the proposed model could extract topics that were more associated with attributes of a subject than existing methods, and it could automatically decide the number of topics from the data.
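The stick-breaking construction mentioned in this abstract can be illustrated with a small truncated sketch. It is not the author's model: the DMR covariates are left out entirely, and the function name `stick_breaking_weights`, the concentration `alpha`, and the truncation level are assumptions chosen for illustration. It only shows why most of the probability mass tends to fall on a few topics.

```python
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Truncated stick-breaking: v_k ~ Beta(1, alpha), pi_k = v_k * prod_{j<k} (1 - v_j).

    alpha and truncation are illustrative assumptions, not values from the paper.
    """
    weights = []
    remaining = 1.0  # length of the stick that has not been broken off yet
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    weights.append(remaining)  # the last piece takes what is left, so the weights sum to 1
    return weights


rng = random.Random(0)
weights = stick_breaking_weights(alpha=1.0, truncation=20, rng=rng)
large_topics = [w for w in weights if w > 0.01]
print(len(weights), round(sum(weights), 6), len(large_topics))
```

With a small `alpha`, early breaks remove large pieces of the stick, so only a few weights remain non-negligible; this is the behaviour the abstract relies on when it expects the number of topics with large proportions to be restricted.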
{"title":"A Bayesian Nonparametric Topic Model for Microbiome Data Using Subject Attributes","authors":"T. Okui","doi":"10.2197/ipsjtbio.13.1","DOIUrl":"https://doi.org/10.2197/ipsjtbio.13.1","url":null,"abstract":"Microbiome data have been obtained relatively easily in recent years, and various methods for analyzing microbiome data are currently being proposed. Latent Dirichlet allocation (LDA) models, which are frequently used to extract latent topics from words in documents, have also been proposed to extract information on microbial communities from microbiome data. To extract microbiome topics associated with a subject’s attributes, LDA models that utilize supervisory information, including LDA with Dirichlet multinomial regression (DMR topic model) or the supervised topic model (SLDA), can be applied. Further, a Bayesian nonparametric model is often used to automatically decide the number of latent classes for a latent variable model. An LDA can also be extended to a Bayesian nonparametric model using the hierarchical Dirichlet process. Although a Bayesian nonparametric DMR topic model has been previously proposed, it uses a normalized gamma process for generating the topic distribution, and it is unknown whether the number of topics can be automatically decided from data. It is expected that the total number of topics (with relatively large proportions) can be restricted to a smaller value using the stick-breaking process for generating the topic distribution. Therefore, we propose a Bayesian nonparametric DMR topic model using a stick-breaking process and have compared it to existing models using two sets of real microbiome data. 
The results showed that the proposed model could extract topics that were more associated with attributes of a subject than existing methods, and it could automatically decide the number of topics from the data.","PeriodicalId":38959,"journal":{"name":"IPSJ Transactions on Bioinformatics","volume":"142 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68500944","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Advancements in technology have recently made it possible to obtain various types of biometric information from humans, enabling studies on estimation of human conditions in medicine, automobile safety, marketing, and other areas. These studies have particularly pointed to eye movement as an effective indicator of human conditions, and research on its applications is actively being pursued. The devices now widely used for measuring eye movements are based on the video-oculography (VOG) method, wherein the direction of gaze is estimated by processing eye images obtained through a camera. Applying convolutional neural networks (ConvNet) to the processing of eye images has been shown to enable accurate and robust gaze estimation. Conventional image processing, however, is premised on execution using a personal computer, making it difficult to carry out real-time gaze estimation using ConvNet, which involves the use of a large number of parameters, in a small arithmetic unit. Also, detecting eye movement events, such as blinking and saccadic movements, from the inferred gaze direction sequence for particular purposes requires the use of a separate algorithm. We therefore propose a new eye image processing method that batch-processes gaze estimation and event detection from end to end using an independently designed lightweight ConvNet. This paper discusses the structure of the proposed lightweight ConvNet, the methods for learning and evaluation used, and the proposed method’s ability to simultaneously detect gaze direction and event occurrence using a smaller memory and at lower computational complexity than conventional methods.
{"title":"Lightweight Convolutional Neural Network for Image Processing Method for Gaze Estimation and Eye Movement Event Detection","authors":"Joshua Emoto, Y. Hirata","doi":"10.2197/ipsjtbio.13.7","DOIUrl":"https://doi.org/10.2197/ipsjtbio.13.7","url":null,"abstract":"Advancements in technology have recently made it possible to obtain various types of biometric information from humans, enabling studies on estimation of human conditions in medicine, automobile safety, marketing, and other areas. These studies have particularly pointed to eye movement as an effective indicator of human conditions, and research on its applications is actively being pursued. The devices now widely used for measuring eye movements are based on the video-oculography (VOG) method, wherein the direction of gaze is estimated by processing eye images obtained through a camera. Applying convolutional neural networks (ConvNet) to the processing of eye images has been shown to enable accurate and robust gaze estimation. Conventional image processing, however, is premised on execution using a personal computer, making it difficult to carry out real-time gaze estimation using ConvNet, which involves the use of a large number of parameters, in a small arithmetic unit. Also, detecting eye movement events, such as blinking and saccadic movements, from the inferred gaze direction sequence for particular purposes requires the use of a separate algorithm. We therefore propose a new eye image processing method that batch-processes gaze estimation and event detection from end to end using an independently designed lightweight ConvNet. 
This paper discusses the structure of the proposed lightweight ConvNet, the methods for learning and evaluation used, and the proposed method’s ability to simultaneously detect gaze direction and event occurrence using a smaller memory and at lower computational complexity than conventional methods.","PeriodicalId":38959,"journal":{"name":"IPSJ Transactions on Bioinformatics","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2197/ipsjtbio.13.7","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68501082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Background: Metabolic engineering strategies enabling the production of specific target metabolites by host strains can be identified in silico through the use of metabolic network analysis such as flux balance analysis. This type of metabolic redesign is based on the computation of reactions that should be deleted from the original network representing the metabolism of the host strain to enable the production of the target metabolites while still ensuring its growth (the concept of growth coupling). In this context, it is important to use algorithms that can identify growth-coupled reaction deletions for any metabolic network topology and any potential target metabolite. A recent method using a strong growth coupling assumption has been shown to be able to identify such computational redesign for nearly all metabolites included in the genome-scale metabolic models of Escherichia coli and Saccharomyces cerevisiae when cultivated under aerobic conditions. However, this approach enables the computational redesign of S. cerevisiae for only 3.9% of all metabolites under anaerobic conditions. Therefore, it is necessary to develop algorithms that perform well under various culture conditions.
Results: The author developed an algorithm that could calculate the reaction deletions that achieve the coupling of growth and production for 91.3% of metabolites in genome-scale models of S. cerevisiae under anaerobic conditions. Computational experiments showed that the proposed algorithm is also efficient for aerobic conditions and for Escherichia coli. In these analyses, the least target production rates were evaluated using flux variability analysis when multiple fluxes yield the highest growth rate. To demonstrate the feasibility of the coupling, the author derived appropriate reaction deletions using the new algorithm for target production in which the search space was divided into small cubes (CubeProd).
Conclusions: The author developed a novel algorithm, CubeProd, to demonstrate that growth coupling is possible for most metabolites in S. cerevisiae under anaerobic conditions. This may imply that growth coupling is possible by reaction deletions for most target metabolites in any genome-scale constraint-based metabolic network. The developed software, CubeProd, implemented in MATLAB, and the obtained reaction deletion strategies are freely available.
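The idea of growth coupling, and of evaluating the least target production rate among flux distributions that reach the highest growth rate, can be shown on a toy network. This is only a sketch under stated assumptions: the three-reaction network, the uptake limit, and the function `worst_case_production` are hypothetical, and a brute-force enumeration over integer fluxes stands in for the linear programs used in flux balance and flux variability analysis. It is not the CubeProd algorithm.

```python
from itertools import product

UPTAKE_LIMIT = 10  # hypothetical upper bound on substrate uptake


def worst_case_production(deleted):
    """Return (max growth, least target production at max growth) for a toy network.

    Toy reactions (all hypothetical):
      r1: A -> B + P   (forms the target P as a by-product)
      r2: A -> B       (bypasses the target)
      growth: B -> biomass
    Steady state forces uptake = r1 + r2 = growth, and P export = r1.
    """
    feasible = []
    for r1, r2 in product(range(UPTAKE_LIMIT + 1), repeat=2):
        if "r1" in deleted and r1 > 0:
            continue  # a deleted reaction cannot carry flux
        if "r2" in deleted and r2 > 0:
            continue
        if r1 + r2 > UPTAKE_LIMIT:
            continue  # uptake bound violated
        feasible.append((r1 + r2, r1))  # (growth rate, production rate of P)
    max_growth = max(growth for growth, _ in feasible)
    least_production = min(p for growth, p in feasible if growth == max_growth)
    return max_growth, least_production


wild_type = worst_case_production(deleted=set())
knockout = worst_case_production(deleted={"r2"})
print("wild type:", wild_type)
print("r2 deleted:", knockout)
```

In the wild type, the cell can reach maximal growth through `r2` alone, so the least production is zero; deleting `r2` leaves only the route that also forms the target, which is the coupling the abstract describes.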
{"title":"Efficient Reaction Deletion Algorithms for Redesign of Constraint-based Metabolic Networks for Metabolite Production with Weak Coupling","authors":"Takeyuki Tamura","doi":"10.21203/rs.2.24108/v1","DOIUrl":"https://doi.org/10.21203/rs.2.24108/v1","url":null,"abstract":"Background: Metabolic engineering strategies enabling the production of specific target metabolites by host strains can be identified in silico through the use of metabolic network analysis such as flux balance analysis. This type of metabolic redesign is based on the computation of reactions that should be deleted from the original network representing the metabolism of the host strain to enable the production of the target metabolites while still ensuring its growth (the concept of growth coupling). In this context, it is important to use algorithms that can identify growth-coupled reaction deletions for any metabolic network topology and any potential target metabolite. A recent method using a strong growth coupling assumption has been shown to be able to identify such computational redesign for nearly all metabolites included in the genome-scale metabolic models of Escherichia coli and Saccharomyces cerevisiae when cultivated under aerobic conditions. However, this approach enables the computational redesign of S. cerevisiae for only 3.9% of all metabolites under anaerobic conditions. Therefore, it is necessary to develop algorithms that perform well under various culture conditions. Results: The author developed an algorithm that could calculate the reaction deletions that achieve the coupling of growth and production for 91.3% of metabolites in genome-scale models of S. cerevisiae under anaerobic conditions. Computational experiments showed that the proposed algorithm is also efficient for aerobic conditions and for Escherichia coli. 
In these analyses, the least target production rates were evaluated using flux variability analysis when multiple fluxes yield the highest growth rate. To demonstrate the feasibility of the coupling, the author derived appropriate reaction deletions using the new algorithm for target production in which the search space was divided into small cubes (CubeProd). Conclusions: The author developed a novel algorithm, CubeProd, to demonstrate that growth coupling is possible for most metabolites in S. cerevisiae under anaerobic conditions. This may imply that growth coupling is possible by reaction deletions for most target metabolites in any genome-scale constraint-based metabolic network. The developed software, CubeProd, implemented in MATLAB, and the obtained reaction deletion strategies are freely available.","PeriodicalId":38959,"journal":{"name":"IPSJ Transactions on Bioinformatics","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43081102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bacterial whole-genome sequences have recently become widely available via innovative and rapid progress in technologies such as high-throughput sequencing and computing. Genomes of environmental microorganisms have also been sequenced, and their number is expected to increase in the future. Typically, phylogenetic analysis is performed after genome sequencing of such organisms. 16S rRNA is a standard locus for the phylogenetic analysis of prokaryotes. However, 16S rRNA phylogenetic trees are not always reliable because of out-paralogs and horizontal gene transfer. To overcome this problem, multiple genes (or proteins) should be employed. Therefore, we developed “Genome Identifier,” which can be used for constructing a concatenated phylogenetic tree in the form of a species tree by predicting genes from newly sequenced genomic data and collecting homologous sequences from other species.
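The concatenation step behind a concatenated phylogenetic tree can be sketched as follows. This is an illustrative assumption about how per-gene alignments might be joined into one supermatrix, not the Genome Identifier implementation; gene prediction, homolog collection, and tree inference are not shown, and the function name and toy sequences are hypothetical.

```python
def concatenate_alignments(gene_alignments):
    """Join per-gene alignments (dicts of species -> aligned sequence) into one alignment.

    Species missing from a gene are padded with gap characters so that every
    concatenated sequence has the same length.
    """
    species = sorted({name for alignment in gene_alignments for name in alignment})
    concatenated = {name: "" for name in species}
    for alignment in gene_alignments:
        lengths = {len(seq) for seq in alignment.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one gene alignment must have equal length")
        length = lengths.pop()
        for name in species:
            concatenated[name] += alignment.get(name, "-" * length)
    return concatenated


# Hypothetical toy alignments for two genes; sp3 lacks the first gene.
gene_a = {"sp1": "ATG-", "sp2": "ATGC"}
gene_b = {"sp1": "GG", "sp2": "GA", "sp3": "GA"}
supermatrix = concatenate_alignments([gene_a, gene_b])
for name, sequence in supermatrix.items():
    print(name, sequence)
```

The resulting alignment could then be passed to any tree-building program; padding with gaps is one common convention for missing genes, chosen here for simplicity.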
{"title":"Genome Identifier: A Tool for Phylogenetic Analysis of Microbial Genomes","authors":"Yukinari Shimoyama, Tokumasa Horiike","doi":"10.2197/IPSJTBIO.12.17","DOIUrl":"https://doi.org/10.2197/IPSJTBIO.12.17","url":null,"abstract":"Bacterial whole-genome sequences have recently become widely available via innovative and rapid progress in technologies such as high-throughput sequencing and computing. Genomes of environmental microorganisms have also been sequenced, and their number is expected to increase in the future. Typically, phylogenetic analysis is performed after genome sequencing of such organisms. 16S rRNA is a standard locus for the phylogenetic analysis of prokaryotes. However, 16S rRNA phylogenetic trees are not always reliable because of out-paralogs and horizontal gene transfer. To overcome this problem, multiple genes (or proteins) should be employed. Therefore, we developed “Genome Identifier,” which can be used for constructing a concatenated phylogenetic tree in the form of a species tree by predicting genes from newly sequenced genomic data and collecting homologous sequences from other species.","PeriodicalId":38959,"journal":{"name":"IPSJ Transactions on Bioinformatics","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2197/IPSJTBIO.12.17","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68501171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Information Entropy-based Method to Detect microRNA Regulatory Module","authors":"Yi Yang, Yan Song, Buwen Cao","doi":"10.2197/IPSJTBIO.12.1","DOIUrl":"https://doi.org/10.2197/IPSJTBIO.12.1","url":null,"abstract":"","PeriodicalId":38959,"journal":{"name":"IPSJ Transactions on Bioinformatics","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2197/IPSJTBIO.12.1","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68501158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chun Fang, Yoshitaka Moriwaki, Caihong Li, K. Shimizu
{"title":"Prediction of Antifungal Peptides by Deep Learning with Character Embedding","authors":"Chun Fang, Yoshitaka Moriwaki, Caihong Li, K. Shimizu","doi":"10.2197/IPSJTBIO.12.21","DOIUrl":"https://doi.org/10.2197/IPSJTBIO.12.21","url":null,"abstract":"","PeriodicalId":38959,"journal":{"name":"IPSJ Transactions on Bioinformatics","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2197/IPSJTBIO.12.21","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68501182","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, I propose two novel methods for extracting synchronously fluctuated genes (SFGs) from transcriptome data. Variability and synchrony in biological signals are generally considered to be associated with the system’s stability in some sense. However, a standard method for extracting SFGs from transcriptome data with high reproducibility has not been established. Here, I propose two novel methods for extracting SFGs. The first method has two steps: selection of remarkably fluctuated genes and extraction of synchronized gene clusters. The other method is based on principal component analysis. It has been confirmed that the two methods have high extraction performance for artificial data and a moderate level of reproducibility for real data. The proposed methods will help to extract candidate genes related to stability and homeostasis in living organisms.
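The two-step structure of the first method (select strongly fluctuating genes, then extract synchronized clusters) can be sketched as below. The abstract does not give the actual criteria, so everything here is an assumption: variance is used as the fluctuation measure, Pearson correlation as the synchrony measure, clusters are connected components of a thresholded correlation graph, and the thresholds and gene names are hypothetical.

```python
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def extract_sfg_clusters(expression, var_threshold, corr_threshold):
    """Step 1: keep genes whose variance exceeds var_threshold (assumed criterion).
    Step 2: group kept genes linked by correlation above corr_threshold."""
    fluctuating = [g for g, values in expression.items() if pvariance(values) > var_threshold]
    unassigned = set(fluctuating)
    clusters = []
    for gene in fluctuating:
        if gene not in unassigned:
            continue
        unassigned.discard(gene)
        cluster, stack = [gene], [gene]
        while stack:  # grow one connected component
            current = stack.pop()
            linked = [g for g in unassigned
                      if pearson(expression[current], expression[g]) > corr_threshold]
            for g in linked:
                unassigned.discard(g)
                cluster.append(g)
                stack.append(g)
        if len(cluster) > 1:  # a single gene is not a synchronized cluster
            clusters.append(sorted(cluster))
    return clusters


# Hypothetical expression profiles over five samples.
expression = {
    "g1": [1, 5, 2, 6, 1],
    "g2": [2, 10, 4, 12, 2],       # proportional to g1
    "g3": [3, 3.1, 2.9, 3, 3],     # nearly constant, removed in step 1
    "g4": [6, 1, 5, 2, 6],         # fluctuates, but opposite to g1
    "g5": [1, 6, 2, 5, 1],         # similar to g1
}
clusters = extract_sfg_clusters(expression, var_threshold=1.0, corr_threshold=0.9)
print(clusters)
```

Only positively correlated genes are grouped in this sketch; whether anti-correlated genes should count as synchronized is a design choice that the abstract leaves open.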
{"title":"Two Novel Methods for Extracting Synchronously Fluctuated Genes","authors":"Makito Oku","doi":"10.2197/IPSJTBIO.12.9","DOIUrl":"https://doi.org/10.2197/IPSJTBIO.12.9","url":null,"abstract":"In this paper, I propose two novel methods for extracting synchronously fluctuated genes (SFGs) from transcriptome data. Variability and synchrony in biological signals are generally considered to be associated with the system’s stability in some sense. However, a standard method for extracting SFGs from transcriptome data with high reproducibility has not been established. Here, I propose two novel methods for extracting SFGs. The first method has two steps: selection of remarkably fluctuated genes and extraction of synchronized gene clusters. The other method is based on principal component analysis. It has been confirmed that the two methods have high extraction performance for artificial data and a moderate level of reproducibility for real data. The proposed methods will help to extract candidate genes related to stability and homeostasis in living organisms.","PeriodicalId":38959,"journal":{"name":"IPSJ Transactions on Bioinformatics","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68501315","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kinetic modeling is a powerful tool to understand how a biochemical system behaves as a whole. To develop a realistic and predictive model, kinetic parameters need to be estimated so that a model fits experimental data. However, parameter estimation remains a major bottleneck in kinetic modeling. To accelerate parameter estimation, we developed a C library for real-coded genetic algorithms (libRCGA). In libRCGA, two real-coded genetic algorithms (RCGAs), viz. the Unimodal Normal Distribution Crossover with Minimal Generation Gap (UNDX/MGG) and the Real-coded Ensemble Crossover star with Just Generation Gap (REX star/JGG), are implemented in C and parallelized with the Message Passing Interface (MPI). We designed libRCGA to take advantage of high-performance computing environments and thus to significantly accelerate parameter estimation. A constrained optimization formulation is useful for constructing a realistic kinetic model that satisfies several biological constraints. libRCGA employs stochastic ranking to efficiently solve constrained optimization problems. In the present paper, we demonstrate the performance of libRCGA through benchmark problems and in realistic parameter estimation problems. libRCGA is freely available for academic usage at http://kurata21.bio.kyutech.ac.jp/maeda/index.html.
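Stochastic ranking, which the abstract names as the constraint-handling technique, can be sketched as a bubble-sort-like procedure that compares neighbours by objective value with probability `pf` (or whenever both are feasible) and by constraint violation otherwise. The Python code below is an illustration only and does not reflect the libRCGA C API; the function name, the value of `pf`, and the toy population are assumptions.

```python
import random


def stochastic_ranking(objectives, penalties, pf, rng):
    """Return individual indices ordered from best to worst by stochastic ranking.

    objectives: values to minimize; penalties: constraint violations (0 means feasible);
    pf: probability of comparing by objective when at least one neighbour is infeasible.
    """
    order = list(range(len(objectives)))
    for _ in range(len(order)):
        swapped = False
        for j in range(len(order) - 1):
            a, b = order[j], order[j + 1]
            both_feasible = penalties[a] == 0 and penalties[b] == 0
            if both_feasible or rng.random() < pf:
                wrong_order = objectives[a] > objectives[b]
            else:
                wrong_order = penalties[a] > penalties[b]
            if wrong_order:
                order[j], order[j + 1] = b, a
                swapped = True
        if not swapped:
            break  # a full sweep without swaps means the ranking has settled
    return order


# Hypothetical population of four individuals.
objectives = [5.0, 1.0, 0.0, 3.0]
penalties = [0.0, 0.0, 2.0, 1.0]
strict = stochastic_ranking(objectives, penalties, pf=0.0, rng=random.Random(1))
mixed = stochastic_ranking(objectives, penalties, pf=0.45, rng=random.Random(1))
print(strict, mixed)
```

With `pf=0.0` the procedure reduces to ranking feasible individuals first (by objective) and infeasible ones by violation; a value somewhat below 0.5 lets promising infeasible individuals survive occasionally, which is the point of the method.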
{"title":"libRCGA: a C library for real-coded genetic algorithms for rapid parameter estimation of kinetic models","authors":"Kazuhiro Maeda, F. Boogerd, K. Kurata","doi":"10.2197/IPSJTBIO.11.31","DOIUrl":"https://doi.org/10.2197/IPSJTBIO.11.31","url":null,"abstract":"Kinetic modeling is a powerful tool to understand how a biochemical system behaves as a whole. To develop a realistic and predictive model, kinetic parameters need to be estimated so that a model fits experimental data. However, parameter estimation remains a major bottleneck in kinetic modeling. To accelerate parameter estimation, we developed a C library for real-coded genetic algorithms (libRCGA). In libRCGA, two real-coded genetic algorithms (RCGAs), viz. the Unimodal Normal Distribution Crossover with Minimal Generation Gap (UNDX/MGG) and the Real-coded Ensemble Crossover star with Just Generation Gap (REX star/JGG), are implemented in C and parallelized with the Message Passing Interface (MPI). We designed libRCGA to take advantage of high-performance computing environments and thus to significantly accelerate parameter estimation. A constrained optimization formulation is useful for constructing a realistic kinetic model that satisfies several biological constraints. libRCGA employs stochastic ranking to efficiently solve constrained optimization problems. In the present paper, we demonstrate the performance of libRCGA through benchmark problems and in realistic parameter estimation problems. 
libRCGA is freely available for academic usage at http://kurata21.bio.kyutech.ac.jp/maeda/index.html.","PeriodicalId":38959,"journal":{"name":"IPSJ Transactions on Bioinformatics","volume":"11 1","pages":"31-40"},"PeriodicalIF":0.0,"publicationDate":"2018-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2197/IPSJTBIO.11.31","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43574389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Studies on computation of pathways connecting two metabolites have been reported. However, they did not intend to find pathways containing cycling, although there are biologically important cycles such as the citric acid cycle (CAC). Whilst computation of pathways connecting two atoms, single-atom tracing, would contribute to finding pathways which include those containing cycling, it produces too many pathways to examine. The present article proposes a strategy to select pathways from those obtained by single-atom tracing, where coexistence of reactions on each pathway, specifically coexistence of a reaction and its reverse reaction forming a futile cycle together or reactions regulated in a reciprocal manner, is checked to select pathways based on biochemical meaning of the pathway. Using this strategy, 121 pathways were selected from a total of 7876 pathways from carbon atoms of glucose to CO2 in a model network of carbohydrate metabolism. The selected pathways included pathways using reactions or metabolites of the CAC or the pentose phosphate pathway multiple times. These results indicate that the proposed strategy can contribute to listing a limited number of pathways which include those containing cycling as possibly biochemically meaningful pathways.
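The selection strategy can be sketched on a toy graph: enumerate reaction sequences from a source atom position to a target, then discard those in which a reaction coexists with its reverse. This is a hedged illustration, not the author's procedure: the network, the reaction names, the step limit, and the assumption that pathways containing such a pair are rejected are all hypothetical, and reciprocally regulated reactions are not modeled.

```python
def enumerate_paths(edges, source, target, max_steps):
    """Depth-first enumeration of reaction sequences leading from source to target.

    edges: list of (reaction, start_node, end_node). Nodes may be revisited, so
    cycling pathways are found; max_steps bounds the search.
    """
    paths = []

    def walk(node, used):
        if node == target:
            paths.append(list(used))
            return
        if len(used) == max_steps:
            return
        for reaction, start, end in edges:
            if start == node:
                used.append(reaction)
                walk(end, used)
                used.pop()

    walk(source, [])
    return paths


def has_futile_pair(path, reverse_of):
    """True if the path uses both a reaction and its reverse reaction."""
    used = set(path)
    return any(reverse_of.get(reaction) in used for reaction in path)


# Hypothetical toy network; node names stand for atom positions in metabolites.
edges = [
    ("R1", "glc", "A"),
    ("R2", "A", "B"),
    ("R2rev", "B", "A"),
    ("R3", "B", "CO2"),
    ("R4", "A", "CO2"),
]
reverse_of = {"R2": "R2rev", "R2rev": "R2"}

all_paths = enumerate_paths(edges, "glc", "CO2", max_steps=5)
selected = [p for p in all_paths if not has_futile_pair(p, reverse_of)]
print(len(all_paths), selected)
```

Even in this tiny network, half of the enumerated pathways are removed by the futile-cycle check, which mirrors on a small scale the reduction from 7876 to 121 pathways reported in the abstract.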
{"title":"Single-atom Tracing in a Model Network of Carbohydrate Metabolism and Pathway Selection","authors":"J. Ohta","doi":"10.2197/IPSJTBIO.11.1","DOIUrl":"https://doi.org/10.2197/IPSJTBIO.11.1","url":null,"abstract":"Studies on computation of pathways connecting two metabolites have been reported. However, they did not intend to find pathways containing cycling, although there are biologically important cycles such as the citric acid cycle (CAC). Whilst computation of pathways connecting two atoms, single-atom tracing, would contribute to finding pathways which include those containing cycling, it produces too many pathways to examine. The present article proposes a strategy to select pathways from those obtained by single-atom tracing, where coexistence of reactions on each pathway, specifically coexistence of a reaction and its reverse reaction forming a futile cycle together or reactions regulated in a reciprocal manner, is checked to select pathways based on biochemical meaning of the pathway. Using this strategy, 121 pathways were selected from a total of 7876 pathways from carbon atoms of glucose to CO2 in a model network of carbohydrate metabolism. The selected pathways included pathways using reactions or metabolites of the CAC or the pentose phosphate pathway multiple times. 
These results indicate that the proposed strategy can contribute to listing a limited number of pathways which include those containing cycling as possibly biochemically meaningful pathways.","PeriodicalId":38959,"journal":{"name":"IPSJ Transactions on Bioinformatics","volume":"11 1","pages":"1-13"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43591855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}