Open Bioinformatics Journal: Latest Articles

Transcriptional Regulation in the G1-S Cell Cycle Stage in Fungi: Insights through Computational Analysis
Q3 Computer Science Pub Date : 2012-09-07 DOI: 10.2174/1875036201206010043
V. Martyanov, R. H. Gross
The transcription factor complexes Mlu1-box binding factor (MBF) and Swi4/6 cell cycle box binding factor (SBF) regulate the cell cycle in Saccharomyces cerevisiae. They activate hundreds of genes and are responsible for normal cell cycle progression from G1 to S phase. We investigated the conservation of MBF and SBF binding sites during fungal evolution. Orthologs of S. cerevisiae targets of these transcription factors were identified in 37 fungal species, and their upstream regions were analyzed for putative transcription factor binding sites. Both groups displayed enrichment in specific putative regulatory DNA sequences in their upstream regions and showed different preferred upstream motif locations, variable patterns of evolutionary conservation of the motifs, and enrichment in unique biological functions for the regulated genes. The results indicate that, despite the high sequence similarity of the upstream DNA motifs putatively associated with G1-S transcriptional regulation by MBF and SBF, there are important differences in upstream sequence features that may help differentiate the two seemingly similar regulatory modes. Incorporating upstream motif sequence comparison, positional distribution and evolutionary variability of the motif can complement functional information about the roles of the respective gene products and help elucidate transcriptional regulatory pathways and functions.
Open Bioinformatics Journal 6(1): 43-54. Journal Article. https://doi.org/10.2174/1875036201206010043
Citations: 0
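The abstract above describes scanning upstream regions for putative binding-site motifs and comparing where those motifs tend to sit. A minimal sketch of that idea follows. It is not the authors' pipeline: the motif string, the function names and the 100-base bin size are assumptions made for illustration. The hexamer ACGCGT is used only because it is the MluI restriction site; the abstract does not state the motif models the study used.

```python
import re

# Illustrative motif only (assumption): ACGCGT is the MluI restriction site.
# It is its own reverse complement, so scanning one strand is enough here.
MOTIF = "ACGCGT"


def motif_positions(upstream, motif=MOTIF):
    """Return motif start positions relative to the gene start.

    `upstream` is the sequence immediately 5' of the start codon, so its
    last base is position -1. A lookahead is used so overlapping hits count.
    """
    seq = upstream.upper()
    pattern = re.compile("(?=" + re.escape(motif) + ")")
    return [m.start() - len(seq) for m in pattern.finditer(seq)]


def position_histogram(upstreams, motif=MOTIF, bin_size=100):
    """Count motif hits per distance bin over many upstream regions.

    Bin 0 holds hits that start within `bin_size` bases of the gene start,
    bin 1 the next `bin_size` bases, and so on.
    """
    hist = {}
    for seq in upstreams:
        for pos in motif_positions(seq, motif):
            b = (-pos - 1) // bin_size
            hist[b] = hist.get(b, 0) + 1
    return hist
```

Comparing the histograms obtained for two target-gene sets (for example, putative MBF targets and putative SBF targets) would give a rough view of the "preferred upstream motif locations" mentioned in the abstract.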
A Cloud Computing System to Quickly Implement New Microarray Data Pre-processing Methods
Q3 Computer Science Pub Date : 2012-08-31 DOI: 10.2174/1875036201206010037
Dajie Luo, Prithish Banerjee, E. Harner, J. Mobley, Dongquan Chen
Background: Pre-processing, including normalization of raw microarray data, is crucial to microarray-related data analysis. It takes time and effort to build newly developed algorithms into commercial software or locally developed systems. While most new algorithms emerge in the form of sharable R packages, it can be difficult for many biologists to apply them as soon as they are available. Currently, we rely on statisticians and experienced programmers to develop and implement code to access those R packages. Therefore, we need a robust procedure to quickly implement pre-processing methods as they appear. The newly emerging cloud computing concept has directed us toward a new way of providing an easily accessible service to biologists without requiring them to have any programming knowledge in R. Results: Based on our earlier Java-based software tool JavaStat, we developed an internet-based application prototype to upload data and carry out pre-processing applications that include normalization, statistical analyses and plots. More importantly, R packages, e.g., for newly developed normalization methods and the GC-robust multichip algorithm (RMA) for exon arrays, can be easily incorporated into the system with limited input from a biologist or a programmer. The data are stored in the cloud and the R code runs on the server. Conclusion: The newly emerged cloud computing concept gives us a new way to provide an easily accessible and up-to-date service to biologists, as evidenced by our JavaStat system, which incorporates new pre-processing packages as they appear. Users can access the application with a newly incorporated module through the Web. We expect this and other similar systems to greatly decrease turnaround time and improve the accessibility of newly developed R models for pre-processing algorithms.
Open Bioinformatics Journal 6(1): 37-42. Journal Article. https://doi.org/10.2174/1875036201206010037
Citations: 1
Self-organizing Approach for the Human Gut Meta-genome
Q3 Computer Science Pub Date : 2012-07-18 DOI: 10.2174/1875036201206010028
Jianfeng Zhu, Songgang Li, Wei-Mou Zheng
We extend the self-organizing approach for annotation of a bacterial genome to the analysis of raw sequencing data of the human gut metagenome without sequence assembly. The original approach divides the genomic sequence of a bacterium into non-overlapping segments of equal length and assigns to each segment one of seven 'phases': one for the noncoding regions, three for the direct coding regions to indicate the three possible codon positions of the segment starting site, and three for the reverse coding regions. The noncoding phase and the six coding phases are described by two frequency tables of the 64 triplet types, or 'codon usages'. A set of codon usages can be used to update the phase assignment and vice versa. After an initialization of the phase assignment or of the codon usage tables, iteration leads to a convergent phase assignment that gives an annotation of the genome. In extending the approach to a metagenome, we consider a mixture model of a number of categories of genomes. The Illumina Genome Analyzer sequencing data of the total DNA from faecal samples are then examined to understand the diversity of the human gut microbiome.
Open Bioinformatics Journal 6(1): 28-36. Journal Article. https://doi.org/10.2174/1875036201206010028
Citations: 0
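The alternating update described in the abstract above (codon-usage tables update the phase assignment, and the assignment updates the tables) can be sketched as a hard-assignment iteration. This is a rough illustration under stated assumptions, not the authors' method: the scoring rule (mean log-frequency per triplet), the pseudocount, the cyclic initialization, the segment length and all function names are choices made here. The mixture model for metagenomes is not covered.

```python
import math
from itertools import product

TRIPLETS = ["".join(t) for t in product("ACGT", repeat=3)]
TRIPLET_SET = set(TRIPLETS)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def read_triplets(segment, phase):
    """Triplets seen under a phase: 0 noncoding, 1-3 direct, 4-6 reverse."""
    if phase >= 4:
        segment = segment.translate(COMPLEMENT)[::-1]  # reverse complement
    frame = 0 if phase == 0 else (phase - 1) % 3
    trips = [segment[i:i + 3] for i in range(frame, len(segment) - 2, 3)]
    return [t for t in trips if t in TRIPLET_SET]  # skip N and other symbols


def estimate_tables(segments, phases, pseudocount=1.0):
    """Two log-frequency tables: one noncoding, one shared by coding phases."""
    counts = {"noncoding": dict.fromkeys(TRIPLETS, pseudocount),
              "coding": dict.fromkeys(TRIPLETS, pseudocount)}
    for segment, phase in zip(segments, phases):
        table = counts["noncoding" if phase == 0 else "coding"]
        for t in read_triplets(segment, phase):
            table[t] += 1
    tables = {}
    for name, table in counts.items():
        total = sum(table.values())
        tables[name] = {t: math.log(c / total) for t, c in table.items()}
    return tables


def assign_phases(segments, tables):
    """Give each segment the phase with the highest mean log-frequency."""
    phases = []
    for segment in segments:
        scores = []
        for phase in range(7):
            trips = read_triplets(segment, phase)
            table = tables["noncoding" if phase == 0 else "coding"]
            scores.append(sum(table[t] for t in trips) / len(trips)
                          if trips else -math.inf)
        phases.append(scores.index(max(scores)))
    return phases


def self_organize(sequence, seg_len=90, init_phases=None, max_iter=50):
    """Alternate table estimation and phase assignment until stable."""
    segments = [sequence[i:i + seg_len]
                for i in range(0, len(sequence) - seg_len + 1, seg_len)]
    # Assumed initialization: cycle through the seven phases.
    phases = init_phases or [i % 7 for i in range(len(segments))]
    for _ in range(max_iter):
        new_phases = assign_phases(segments, estimate_tables(segments, phases))
        if new_phases == phases:
            break
        phases = new_phases
    return phases
```

The loop stops when the assignment no longer changes or after `max_iter` rounds; convergence within that limit is not guaranteed by this sketch.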
Characterizing Protein Shape by a Volume Distribution Asymmetry Index
Q3 Computer Science Pub Date : 2012-05-09 DOI: 10.2174/1875036201206010020
Nicole C. Arrigo, P. Paci, L. Paola, D. Santoni, M. Ruvo, A. Giuliani, F. Castiglione
A fully quantitative shape index relying upon the asymmetry of the mass distribution of protein molecules along the three space dimensions is proposed. Multidimensional statistical analysis, based on principal component extraction and subsequent linear discriminant analysis, showed the presence of three major 'attractor forms' roughly corresponding to rod-like, discoidal and spherical shapes. This classification of protein shapes was in turn demonstrated to be strictly connected with topological features of proteins, as emerging from complex network invariants of their contact maps.
Open Bioinformatics Journal 6(1): 20-27. Journal Article. https://doi.org/10.2174/1875036201206010020
Citations: 3
ProMode-Oligomer: Database of Normal Mode Analysis in Dihedral Angle Space for a Full-Atom System of Oligomeric Proteins
Q3 Computer Science Pub Date : 2012-02-21 DOI: 10.2174/1875036201206010009
H. Wako, S. Endo
The database ProMode-Oligomer (http://promode.socs.waseda.ac.jp/promode_oligomer) was constructed by collecting normal-mode-analysis (NMA) results for oligomeric proteins including protein-protein complexes. As in the ProMode database developed earlier for monomers and individual subunits of oligomers (Bioinformatics vol. 20, pp. 2035-2043, 2004), NMA was performed for a full-atom system using dihedral angles as independent variables, and we released the results (fluctuations of atoms, fluctuations of dihedral angles, correlations between atomic fluctuations, etc.). The vibrating oligomer is visualized by animation in an interactive molecular viewer for each of the 20 lowest-frequency normal modes. In addition, displacement vectors of constituent atoms for each normal mode were decomposed into two characteristic motions in individual subunits, i.e., internal and external (deformation and rigid-body movements of the individual subunits, respectively), and then the mutual movements of the subunits and the movement of atoms around the interface regions were investigated. These results released in ProMode-Oligomer are useful for characterizing oligomeric proteins from a dynamic point of view. The analyses are illustrated with immunoglobulin light- and heavy-chain variable domains bound to lysozyme and to a 12-residue peptide.
Open Bioinformatics Journal 6(1): 9-19. Journal Article. https://doi.org/10.2174/1875036201206010009
Citations: 2
The Miyazawa-Jernigan Contact Energies Revisited
Q3 Computer Science Pub Date : 2012-01-24 DOI: 10.2174/1875036201206010001
H. Zeng, Ke-Song Liu, Wei Zheng
The Miyazawa-Jernigan (MJ) contact potential for globular proteins is a widely used knowledge-based potential. It is well known that MJ's contact energies mainly come from one-body terms. Working directly in the framework of the MJ energy for a protein, we derive the one-body term based on a probabilistic model and compare the term with several hydrophobicity scales of amino acids. This derivation is based on a set of native structures, and the only structural information used in the analysis is the contact number of each residue. Contact numbers strongly correlate with the layers of a protein when it is viewed as an ellipsoid. Using an entropic clustering approach, we obtain two coarse-grained states by maximizing the mutual information between coordination numbers and residue types, and find their differences in the two-body correction. A contact definition using sidechain centers roughly estimated from Cα atoms results in no significant changes.
Open Bioinformatics Journal 6(1): 1-8. Journal Article. https://doi.org/10.2174/1875036201206010001
Citations: 7
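The abstract above obtains two coarse-grained states by maximizing the mutual information between coordination (contact) numbers and residue types. A small sketch of that criterion is given below. It is a simplification and not the authors' entropic clustering: it assumes the two states are "low" and "high" contact number separated by a single threshold, and the function names and the `(residue_type, contact_number)` input layout are hypothetical.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Mutual information I(X;Y) in bits from a list of (x, y) observations."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def best_two_state_split(observations):
    """Find the contact-number threshold that maximizes mutual information.

    `observations` is a list of (residue_type, contact_number) pairs.
    A residue is in the 'high' state when contact_number >= threshold.
    Returns (threshold, mutual_information_in_bits).
    """
    # Skip the smallest value: it would put every residue in one state.
    thresholds = sorted({cn for _, cn in observations})[1:]
    best = (None, 0.0)
    for t in thresholds:
        pairs = [(res, cn >= t) for res, cn in observations]
        mi = mutual_information(pairs)
        if mi > best[1]:
            best = (t, mi)
    return best
```

With a toy input in which hydrophobic residues have many contacts and charged residues few, the selected threshold separates the two groups:

```python
toy = [("L", 8), ("L", 9), ("I", 8), ("K", 2), ("K", 3), ("E", 2)]
threshold, mi = best_two_state_split(toy)  # threshold is 8 for this toy set
```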
Yeast Gene Function Prediction from Different Data Sources: An Empirical Comparison
Q3 Computer Science Pub Date : 2011-06-09 DOI: 10.2174/1875036201105010069
Y. Liu
Different data sources have been used to learn gene function. Whereas combining heterogeneous data sets to infer gene function has been widely studied, there has been no empirical comparison to determine the relative effectiveness or usefulness of different types of data for gene function prediction. In this paper, we report a comparative study of yeast gene function prediction using different data sources, namely microarray data, phylogenetic data, literature text data, and a combination of these three data sources. Our results showed that text data outperformed microarray data and phylogenetic data in gene function prediction (p < 0.05). The combined data led to decreased prediction performance relative to text data. In addition, we showed that feature selection did not improve the prediction performance of support vector machines.
Open Bioinformatics Journal 5(1): 69-76. Journal Article. https://doi.org/10.2174/1875036201105010069
Citations: 1
Analysis of the Local Sequences of Folding Sites in β Sandwich Proteins with Inter-Residue Average Distance Statistics
Q3 Computer Science Pub Date : 2011-04-20 DOI: 10.2174/1875036201105010059
Y. Ishizuka, T. Kikuchi
The sequences of azurin and titin, β sandwich proteins, are analyzed based on inter-residue average distance statistics. A kind of predicted contact map based on inter-residue average distance statistics (Average Distance Map, ADM) is used to pinpoint possible compact regions in the two proteins. We compare the predicted compact regions with the positions of the residues with experimentally high φ values reported for these proteins in the literature. The results reveal that the regions predicted by ADMs correspond to the positions of residues with high φ values. Furthermore, we perform random sampling of 3D conformations using these protein sequences with a potential derived from inter-residue average distance statistics. It is demonstrated that the residues with the highest contact frequency during the simulations qualitatively correspond to the residues with the highest φ values for these proteins. Importantly, analysis with inter-residue average distance statistics predicts the properties of the folding processes of the β sandwich proteins starting from sequence information alone.
Open Bioinformatics Journal 5(1): 59-68. Journal Article. https://doi.org/10.2174/1875036201105010059
Citations: 10
GraProStr - Graphs of Protein Structures: A Tool for Constructing the Graphs and Generating Graph Parameters for Protein Structures
Q3 Computer Science Pub Date : 2011-02-11 DOI: 10.2174/1875036201105010053
M. Vijayabaskar, V. Niranjan, S. Vishveshwara
Protein structures can be represented as graphs/networks by defining the amino acids as nodes and the noncovalent interactions as connections (edges). An analysis of such a graph provides valuable insights into the global structural properties, function, folding, and stability of proteins. Here we have created a web tool, GraProStr, to generate protein structure networks and analyze network parameters. Protein side-chain based, Cα/Cβ backbone based or protein-ligand graphs/networks can be generated using GraProStr. The well-tested tool is now made available to the scientific community for the first time. GraProStr is available online and can be accessed from http://vishgraph.mbu.iisc.ernet.in/GraProStr/index.html using any internet browser (best viewed in Mozilla Firefox version 3.6). The web tool is written using Perl CGI and served using Apache Webserver. With its customizable definitions of protein structure networks and well-defined network parameters, GraProStr can be a very useful tool for both theoretical and experimental elucidation of protein structures.
Open Bioinformatics Journal 5(1): 53-58. Journal Article. https://doi.org/10.2174/1875036201105010053
Citations: 20
Emergent Principles in Gene Expression Dynamics
Q3 Computer Science Pub Date : 2011-02-02 DOI: 10.2174/1875036201105010034
J. Nacher, T. Ochiai
Rapid advances in the processing of genome-wide gene expression data have given us a first glimpse of some fundamental laws and principles of intracellular organization and have allowed us to investigate its complex regulatory architecture. However, the identification of commonalities in the dynamical processes that take place in networks has not advanced at the same pace. In particular, the coupling between dynamics and structural features remains largely unexplored. Here, we review several works that have addressed the problem of uncovering gene expression dynamics and principles using microarray time series data under different environmental conditions and disease states, as well as the emergence of criticality in gene expression systems, using information theory. We also describe efforts to characterize gene networks using transcriptional dynamics information. Combining the emergent principles uncovered in the transcriptional organization with dynamic information may help to reconstruct, characterize, and complete gene networks. We also discuss several methods based on simulations of a series of enzyme-catalyzed reaction routes and Markov processes, as well as the combination of complex network properties with stochastic theory.
Citations: 1