Advances in Bioinformatics: latest articles, page 6

Evaluation of Bioinformatic Programmes for the Analysis of Variants within Splice Site Consensus Regions.

Q1 Biochemistry, Genetics and Molecular Biology

Advances in Bioinformatics

Pub Date : 2016-01-01 Epub Date: 2016-05-24 DOI: 10.1155/2016/5614058

Rongying Tang, Debra O Prosser, Donald R Love

The increasing diagnostic use of gene sequencing has led to an expanding dataset of novel variants that lie within consensus splice junctions. The challenge for diagnostic laboratories is to evaluate these variants in order to determine whether they affect splicing or are merely benign. A common evaluation strategy is in silico analysis, for which a number of programmes are available online; however, there are currently no consensus guidelines on the selection of programmes or on protocols for interpreting the prediction results. Using a collection of 222 pathogenic mutations and 50 benign polymorphisms, we evaluated the sensitivity and specificity of four in silico programmes in predicting the effect of each variant on splicing. The programmes comprised Human Splice Finder (HSF), Max Entropy Scan (MES), NNSplice, and ASSP. The MES and ASSP programmes gave the highest performance based on receiver operating characteristic (ROC) curve analysis, with an optimal cut-off of a 10% reduction in score. The study also showed that the sensitivity of prediction is affected by the level of conservation of individual positions, with in silico predictions for variants at positions -4 and +7 within consensus splice sites being largely uninformative.
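The 10% cut-off above can be illustrated with a minimal sketch. It assumes that "score reduction" means the relative drop of a variant's splice-site score from the wild-type score (the abstract does not give the formula), and the helper names are hypothetical. Because MaxEntScan-style scores are log-odds values that can be zero or negative, a percentage drop is only defined here for a positive wild-type score; the sketch raises an error otherwise.

```python
def score_reduction(wild_type: float, variant: float) -> float:
    """Percentage drop of the variant splice-site score relative to wild type (assumed definition)."""
    if wild_type <= 0:
        # Log-odds scores can be <= 0; a relative drop is not meaningful then.
        raise ValueError("wild-type score must be positive")
    return 100.0 * (wild_type - variant) / wild_type


def predicts_splicing_effect(wild_type: float, variant: float, cutoff_pct: float = 10.0) -> bool:
    """Flag a variant as splice-affecting if its score drops by at least cutoff_pct percent."""
    return score_reduction(wild_type, variant) >= cutoff_pct


def sensitivity_specificity(predicted, is_pathogenic):
    """Sensitivity = TP/(TP+FN) on pathogenic variants; specificity = TN/(TN+FP) on benign ones."""
    pairs = list(zip(predicted, is_pathogenic))
    tp = sum(1 for p, y in pairs if p and y)
    fn = sum(1 for p, y in pairs if not p and y)
    tn = sum(1 for p, y in pairs if not p and not y)
    fp = sum(1 for p, y in pairs if p and not y)
    return tp / (tp + fn), tn / (tn + fp)
```

Sweeping `cutoff_pct` over a range and plotting sensitivity against 1 - specificity for each value gives the points of an ROC curve, from which an optimal cut-off can be chosen.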

Citations: 47

Multiphase Simulated Annealing Based on Boltzmann and Bose-Einstein Distribution Applied to Protein Folding Problem.

Q1 Biochemistry, Genetics and Molecular Biology

Advances in Bioinformatics

Pub Date : 2016-01-01 Epub Date: 2016-06-20 DOI: 10.1155/2016/7357123

Juan Frausto-Solis, Ernesto Liñán-García, Juan Paulo Sánchez-Hernández, J Javier González-Barbosa, Carlos González-Flores, Guadalupe Castilla-Valdez

A new hybrid Multiphase Simulated Annealing Algorithm using Boltzmann and Bose-Einstein distributions (MPSABBE) is proposed. MPSABBE was designed for solving Protein Folding Problem (PFP) instances. This new approach has four phases: (i) Multiquenching Phase (MQP), (ii) Boltzmann Annealing Phase (BAP), (iii) Bose-Einstein Annealing Phase (BEAP), and (iv) Dynamical Equilibrium Phase (DEP). BAP and BEAP are simulated annealing search procedures based on the Boltzmann and Bose-Einstein distributions, respectively. DEP is also a simulated annealing search procedure, applied at the final temperature of the fourth phase; it can be seen as a second Bose-Einstein phase. MQP is a search process that ranges from extremely high to high temperatures, applies a very fast cooling process, and is not very restrictive in accepting new solutions. In contrast, BAP and BEAP range from high to low and from low to very low temperatures, respectively, and are more restrictive in accepting new solutions. DEP uses a particular heuristic that detects stochastic equilibrium by applying a least squares method during its execution. MPSABBE parameters are tuned with an analytical method that considers the maximal and minimal deterioration of problem instances. MPSABBE was tested on several PFP instances, showing that using both distributions performs better than using only the Boltzmann distribution, as in classical SA.
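The phases above differ mainly in their temperature range and acceptance rule. The sketch below contrasts a Boltzmann (Metropolis) rule with a Bose-Einstein-style rule inside a two-phase annealing loop on a toy one-dimensional energy. The Bose-Einstein form 1/(exp(dE/T) - 1) capped at 1, the geometric cooling schedule, the temperature ranges, and all function names are assumptions for illustration, not MPSABBE's published parameters; the multiquenching and dynamical-equilibrium phases are omitted.

```python
import math
import random


def boltzmann_accept(delta, temperature):
    # Metropolis rule: improvements always accepted, worse moves with exp(-delta/T).
    if delta <= 0:
        return 1.0
    return math.exp(-delta / temperature)


def bose_einstein_accept(delta, temperature):
    # ASSUMED Bose-Einstein-style rule 1 / (exp(delta/T) - 1), capped at 1.
    if delta <= 0:
        return 1.0
    x = delta / temperature
    if x > 700.0:  # expm1 would overflow; the probability is effectively zero
        return 0.0
    return min(1.0, 1.0 / math.expm1(x))


def geometric_schedule(t_start, t_end, alpha):
    # Temperatures t_start, t_start*alpha, ... while still above t_end.
    temps, t = [], t_start
    while t > t_end:
        temps.append(t)
        t *= alpha
    return temps


def anneal_phase(state, energy, neighbour, temps, accept, rng, steps_per_temp=50):
    # One annealing phase; returns the final state plus the best state seen.
    current_e = energy(state)
    best, best_e = state, current_e
    for t in temps:
        for _ in range(steps_per_temp):
            candidate = neighbour(state, rng)
            cand_e = energy(candidate)
            if rng.random() < accept(cand_e - current_e, t):
                state, current_e = candidate, cand_e
                if current_e < best_e:
                    best, best_e = state, current_e
    return state, best, best_e


def two_phase_anneal(start, energy, neighbour, seed=0):
    rng = random.Random(seed)
    # Boltzmann phase: high -> low temperatures (hypothetical range).
    state, best_a, e_a = anneal_phase(start, energy, neighbour,
                                      geometric_schedule(100.0, 1.0, 0.9), boltzmann_accept, rng)
    # Bose-Einstein phase: low -> very low temperatures, continuing from the final state.
    _, best_b, e_b = anneal_phase(state, energy, neighbour,
                                  geometric_schedule(1.0, 0.01, 0.9), bose_einstein_accept, rng)
    return (best_a, e_a) if e_a <= e_b else (best_b, e_b)


best, best_e = two_phase_anneal(10, lambda x: (x - 3) ** 2,
                                lambda x, rng: x + rng.choice((-1, 1)))
print(best, best_e)
```

Under this assumed form, a worse move is always at least as likely to be accepted as under the Boltzmann rule at the same temperature, since 1/(e^x - 1) > e^(-x) for x > 0.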

Citations: 5

Ebolavirus Database: Gene and Protein Information Resource for Ebolaviruses.

Q1 Biochemistry, Genetics and Molecular Biology

Advances in Bioinformatics

Pub Date : 2016-01-01 Epub Date: 2016-04-14 DOI: 10.1155/2016/1673284

Rayapadi G Swetha, Sudha Ramaiah, Anand Anbarasu, Kanagaraj Sekar

Ebola Virus Disease (EVD) is a life-threatening haemorrhagic fever in humans. Even though there are many reports on EVD, the protein precursor functions and virulence factors of ebolaviruses remain poorly understood. Comparative analyses of Ebolavirus genomes will help in the identification of these important features. This prompted us to develop the Ebolavirus Database (EDB), in which we have provided links to various tools that will aid researchers in locating important regions in both the genomes and proteomes of Ebolavirus. Genomic analyses of ebolaviruses will provide important clues for locating the essential and core functional genes. The aim of EDB is to act as an integrated resource for ebolaviruses, and we strongly believe that the database will be a useful tool for clinicians, microbiologists, health care workers, and bioscience researchers.

Citations: 4

Expressing Redundancy among Linear-Epitope Sequence Data Based on Residue-Level Physicochemical Similarity in the Context of Antigenic Cross-Reaction.

Q1 Biochemistry, Genetics and Molecular Biology

Advances in Bioinformatics

Pub Date : 2016-01-01 Epub Date: 2016-05-04 DOI: 10.1155/2016/1276594

Salvador Eugenio C Caoili

Epitope-based design of vaccines, immunotherapeutics, and immunodiagnostics is complicated by structural changes that radically alter immunological outcomes. This is obscured by expressing redundancy among linear-epitope data as fractional sequence-alignment identity, which fails to account for potentially drastic loss of binding affinity due to single-residue substitutions even where these might be considered conservative in the context of classical sequence analysis. From the perspective of immune function based on molecular recognition of epitopes, functional redundancy of epitope data (FRED) thus may be defined in a biologically more meaningful way based on residue-level physicochemical similarity in the context of antigenic cross-reaction, with functional similarity between epitopes expressed as the Shannon information entropy for differential epitope binding. Such similarity may be estimated in terms of structural differences between an immunogen epitope and an antigen epitope with reference to an idealized binding site of high complementarity to the immunogen epitope, by analogy between protein folding and ligand-receptor binding; but this underestimates potential for cross-reactivity, suggesting that epitope-binding site complementarity is typically suboptimal as regards immunologic specificity. The apparently suboptimal complementarity may reflect a tradeoff to attain optimal immune function that favors generation of immune-system components each having potential for cross-reactivity with a variety of epitopes.

Citations: 8

Random versus Deterministic Descent in RNA Energy Landscape Analysis.

Q1 Biochemistry, Genetics and Molecular Biology

Advances in Bioinformatics

Pub Date : 2016-01-01 Epub Date: 2016-03-02 DOI: 10.1155/2016/9654921

Luke Day, Ouala Abdelhadi Ep Souki, Andreas A Albrecht, Kathleen Steinhöfel

Identifying sets of metastable conformations is a major research topic in RNA energy landscape analysis, and several methods have recently been proposed for finding local minima in landscapes spawned by RNA secondary structures. An important and time-critical component of such methods is steepest, or gradient, descent in the attraction basins of local minima. We analyse the speed-up achievable by randomised descent in attraction basins for large sample sets, with sizes on the order of ~10^6. While the gain for each individual sample might be marginal, the overall run-time improvement can be significant. Moreover, for the two nongradient methods, analysed on partial energy landscapes induced by ten different RNA sequences, we found that the number of observed local minima was on average 7.3% and 3.5% larger, respectively. The run-time improvement is approximately 16.6% and 6.8% on average over the ten partial energy landscapes. For the large sample size we selected for the descent procedures, the coverage of local minima is very high up to the energy range from which the samples were randomly drawn from the partial energy landscapes; that is, the difference from the total set of local minima lies mainly in the upper region of the energy landscapes.
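The contrast between gradient and randomised descent can be illustrated on a toy landscape. This minimal sketch assumes that randomised descent means visiting neighbours in random order and moving to the first lower one, which avoids evaluating every neighbour at each step. The random grid landscape stands in for RNA secondary structures and their move sets, and all names are hypothetical. Only neighbour energy evaluations are counted.

```python
import random


def steepest_descent(start, neighbours, energy):
    """Gradient-style descent: scan every neighbour, move to the lowest one if it improves."""
    current, evals = start, 0
    while True:
        best_next, best_e = None, energy(current)
        for n in neighbours(current):
            evals += 1
            e = energy(n)
            if e < best_e:
                best_next, best_e = n, e
        if best_next is None:  # no lower neighbour: local minimum reached
            return current, evals
        current = best_next


def randomised_descent(start, neighbours, energy, rng):
    """Assumed randomised variant: shuffle neighbours, take the first strictly lower one."""
    current, evals = start, 0
    while True:
        e_cur = energy(current)
        candidates = list(neighbours(current))
        rng.shuffle(candidates)
        for n in candidates:
            evals += 1
            if energy(n) < e_cur:
                current = n
                break
        else:  # loop ended without a move: local minimum reached
            return current, evals


def grid_landscape(size, seed=0):
    """Toy landscape: random energies on a size x size grid with 4-neighbourhoods."""
    rng = random.Random(seed)
    energies = {(i, j): rng.random() for i in range(size) for j in range(size)}

    def neighbours(p):
        i, j = p
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            q = (i + di, j + dj)
            if q in energies:
                yield q

    return energies, neighbours


energies, neighbours = grid_landscape(30)
rng = random.Random(1)
starts = [(rng.randrange(30), rng.randrange(30)) for _ in range(200)]
steep = [steepest_descent(s, neighbours, energies.__getitem__) for s in starts]
rand = [randomised_descent(s, neighbours, energies.__getitem__, rng) for s in starts]
print("evaluations:", sum(e for _, e in steep), "vs", sum(e for _, e in rand))
print("distinct minima:", len({m for m, _ in steep}), "vs", len({m for m, _ in rand}))
```

Both procedures terminate at a local minimum because the energy strictly decreases at every move; how much the randomised variant saves, and whether it reaches more distinct minima, depends on the landscape, so the toy numbers are only illustrative.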

Citations: 1

Agent-Based Deterministic Modeling of the Bone Marrow Homeostasis.

Q1 Biochemistry, Genetics and Molecular Biology

Advances in Bioinformatics

Pub Date : 2016-01-01 Epub Date: 2016-06-02 DOI: 10.1155/2016/8054219

Manish Kurhekar, Umesh Deshpande

Modeling of stem cells not only describes but also predicts how a stem cell's environment can control its fate. The first stem cell populations discovered were hematopoietic stem cells (HSCs). In this paper, we present a deterministic model of bone marrow (which hosts HSCs) that is consistent with several qualitative biological observations. This model incorporates stem cell death (apoptosis) after a certain number of cell divisions and also demonstrates that a single HSC can potentially populate the entire bone marrow. It also demonstrates that a sufficient number of differentiated cells (RBCs, WBCs, etc.) is produced. We prove that our model of bone marrow is biologically consistent and that it overcomes the biological feasibility limitations of previously reported models. The major contribution of our model is the flexibility it allows in choosing model parameters, which permits several different simulations to be carried out in silico without affecting the homeostatic properties of the model. We have also performed an agent-based simulation of the proposed bone marrow model, and we include the parameter details and the results obtained from the simulation. The agent-based simulation program for the proposed model is available on a publicly accessible website.

Citations: 0

A Support Vector Machine Classification of Thyroid Bioptic Specimens Using MALDI-MSI Data.

Q1 Biochemistry, Genetics and Molecular Biology

Advances in Bioinformatics

Pub Date : 2016-01-01 Epub Date: 2016-05-17 DOI: 10.1155/2016/3791214

Manuel Galli, Italo Zoppis, Gabriele De Sio, Clizia Chinello, Fabio Pagni, Fulvio Magni, Giancarlo Mauri

Biomarkers able to characterise and predict multifactorial diseases are still among the most important targets of all "omics" investigations. In this context, Matrix-Assisted Laser Desorption/Ionisation-Mass Spectrometry Imaging (MALDI-MSI) has gained considerable attention in recent years, but it has also produced a huge amount of complex data to be processed and interpreted. For this reason, computational and machine learning procedures for biomarker discovery are important tools, both to reduce data dimensionality and to provide predictive markers for specific diseases. For instance, protein and genetic markers that support thyroid lesion diagnosis would have a deep impact on society, given the high proportion of indeterminate reports (THY3), whose patients are generally treated as if they had malignant disease. In this paper we show how an accurate classification of thyroid bioptic specimens can be obtained by applying a state-of-the-art machine learning approach (Support Vector Machines) to MALDI-MSI data, together with a wrapper feature selection algorithm (recursive feature elimination). The model provides accurate discrimination using only 20 of 144 features, which improves model performance, reliability, and computational efficiency. Finally, tissue areas rather than average proteomic profiles are classified, highlighting potentially discriminating areas of clinical interest.
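The combination of a linear SVM with recursive feature elimination (SVM-RFE) can be sketched with scikit-learn. This is a minimal illustration, not the authors' pipeline: it assumes scikit-learn is installed, replaces the MALDI-MSI spectra with a synthetic dataset of 200 spectra by 144 features, and uses a linear kernel because RFE ranks features by the SVM weights. Only the 144-to-20 reduction comes from the abstract; the other parameters are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for MALDI-MSI data: 200 spectra x 144 peak intensities (assumed).
X, y = make_classification(n_samples=200, n_features=144, n_informative=10,
                           n_redundant=10, random_state=0)


def build_svm_rfe(n_features=20):
    """Scale peaks, keep n_features via linear-SVM RFE, then classify with a linear SVM."""
    return make_pipeline(
        StandardScaler(),
        # RFE refits the linear SVM and drops the lowest-magnitude-weight feature each round.
        RFE(SVC(kernel="linear"), n_features_to_select=n_features, step=1),
        SVC(kernel="linear"),
    )


pipe = build_svm_rfe()
# Selection sits inside the pipeline, so each CV fold selects features on its own
# training split only, which avoids an optimistic bias in the accuracy estimate.
scores = cross_val_score(pipe, X, y, cv=5)
pipe.fit(X, y)
selected = pipe.named_steps["rfe"].get_support(indices=True)
print(f"CV accuracy {scores.mean():.2f}; selected feature indices: {selected.tolist()}")
```

Keeping RFE inside the pipeline is the main design choice here: selecting features on the full dataset before cross-validation would leak information from the test folds.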

Citations: 17

FullSSR: Microsatellite Finder and Primer Designer.

Q1 Biochemistry, Genetics and Molecular Biology

Advances in Bioinformatics

Pub Date : 2016-01-01 Epub Date: 2016-06-06 DOI: 10.1155/2016/6040124

Sebastián Metz, Juan Manuel Cabrera, Eva Rueda, Federico Giri, Patricia Amavet

Microsatellites are genomic sequences composed of tandem repeats of short nucleotide motifs and are widely used as molecular markers in population genetics. FullSSR is a new bioinformatic tool for microsatellite (SSR) locus detection and primer design using genomic data from NGS assays. The software was tested with 2000 sequences from the Oryza sativa shotgun sequencing project in the National Center for Biotechnology Information (NCBI) Trace Archive, and with partial genome sequences obtained with ROCHE 454® from Caiman latirostris, Salvator merianae, Aegla platensis, and Zilchiopsis collastinensis. FullSSR performance was compared with that of other similar SSR search programs. The results of this kind of approach depend on the parameters set by the user. In addition, results can be affected by the analyzed sequences because of differences among genomes. FullSSR simplifies SSR detection and primer design on large data sets. The command-line interface of FullSSR was intended for use as part of a genomic analysis pipeline; however, it can also be used as a stand-alone program because the results are easily interpreted by a nonexpert user.
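The SSR detection step described above can be illustrated with a regular-expression scan for tandem repeats of 1 to 6 nt motifs. This is a minimal sketch, not FullSSR's implementation: the minimum repeat counts are illustrative values rather than FullSSR's defaults, motifs that are themselves repeats of a shorter unit (such as "AA" or "ATAT") are skipped so each locus is reported at its shortest period, and primer design is out of scope. All names are hypothetical.

```python
import re
from typing import NamedTuple


class SSR(NamedTuple):
    start: int    # 0-based start position in the sequence
    motif: str    # repeat unit
    repeats: int  # number of tandem copies


# Illustrative minimum repeat counts per motif length (not FullSSR's defaults).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_composite(motif):
    """True if the motif is a repeat of a shorter unit, e.g. 'AA' or 'ATAT'."""
    k = len(motif)
    return any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return perfect SSRs as (start, motif, repeats), sorted by position."""
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-mer of unambiguous bases followed by at least n-1 identical copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_composite(motif):
                continue  # reported under its shorter period instead
            hits.append(SSR(m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


seq = "GGC" + "AG" * 8 + "TTC" + "A" * 12 + "GC"
print(find_ssrs(seq))
```

Because `finditer` returns non-overlapping matches, the reported phase of a repeat (for example "AG" versus "GA") depends on where the scan first enters it; a production tool would also handle imperfect and compound repeats.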

Citations: 16
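The core task that FullSSR automates, locating tandem repeats of short nucleotide motifs, can be sketched in a few lines of Python. This is a minimal sketch, not FullSSR's code: it assumes perfect (exact) repeats only, scans motif lengths 1 to 6, and uses illustrative minimum copy numbers. The names `find_ssrs`, `is_primitive`, `SSR` and the `MIN_REPEATS` thresholds are hypothetical, and primer design, the other half of what FullSSR does, is not shown.

```python
import re
from typing import Dict, List, NamedTuple


class SSR(NamedTuple):
    start: int    # 0-based start of the repeat run in the input sequence
    motif: str    # repeat unit, in the phase where the run was found
    repeats: int  # number of exact tandem copies


# Illustrative minimum copy numbers per motif length (assumed, not FullSSR's defaults).
MIN_REPEATS: Dict[int, int] = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """True unless the motif is itself a repeat of a shorter unit (e.g. 'ATAT')."""
    return (motif + motif).find(motif, 1) == len(motif)


def find_ssrs(seq: str, min_repeats: Dict[int, int] = MIN_REPEATS) -> List[SSR]:
    """Return perfect microsatellites in seq, sorted by start position."""
    seq = seq.upper()
    hits: List[SSR] = []
    for k, n in sorted(min_repeats.items()):
        # A k-mer followed by at least n - 1 further exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs such as 'AA' or 'ATAT'; with these thresholds,
            # such runs are already reported at the shorter motif length.
            if is_primitive(motif):
                hits.append(SSR(m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


example = "GG" + "AT" * 7 + "C" + "A" * 12 + "T" + "CAG" * 5 + "T"
for ssr in find_ssrs(example):
    print(ssr)
# SSR(start=2, motif='AT', repeats=7)
# SSR(start=17, motif='A', repeats=12)
# SSR(start=30, motif='CAG', repeats=5)
```

Because each regex scan runs left to right without overlaps, the reported phase of a repeat (for example `CAG` versus `AGC`) depends on where the run begins, and imperfect or compound repeats are not detected; a real tool would handle these cases and let the user set the thresholds, as the FullSSR abstract notes.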

Bioinformatics Approach for Prediction of Functional Coding/Noncoding Simple Polymorphisms (SNPs/Indels) in Human BRAF Gene.

Q1 Biochemistry, Genetics and Molecular Biology

Advances in Bioinformatics

Pub Date : 2016-01-01 Epub Date: 2016-07-10 DOI: 10.1155/2016/2632917

Mohamed M Hassan, Shaza E Omer, Rahma M Khalf-Allah, Razaz Y Mustafa, Isra S Ali, Sofia B Mohamed

This study analyzed single-nucleotide polymorphisms and small insertions/deletions (SNPs/indels) in the coding and noncoding regions of the human (Homo sapiens) BRAF gene. Variant data were obtained from the SNP database (dbSNP) as of its last update in November 2015. Multiple bioinformatics tools were used to identify SNPs and indels predicted to affect protein function, structure, and expression. Among the coding polymorphisms, 111 SNPs were predicted to be highly damaging and six others less damaging. In the UTRs, five SNPs and one indel were predicted to alter microRNA binding sites (3' UTR), whereas no SNP or indel was predicted to functionally alter transcription factor binding sites (5' UTR). For the 5'/3' splice sites, one SNP within the 5' splice site and one indel in the 3' splice site showed potential to alter splicing. In conclusion, the functional SNPs and indels identified here could lead to gene alteration, which may contribute directly or indirectly to the occurrence of many diseases.


Citations: 14
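The BRAF study groups variants by where they fall in the gene: coding sequence, 5' UTR, 3' UTR, or 5'/3' splice sites. A minimal sketch of that first sorting step is shown below. It is not the authors' pipeline, which used a range of existing bioinformatics tools; it assumes a single plus-strand transcript with hypothetical 1-based exon and CDS coordinates, and it treats only the first and last two intronic bases of each intron as the splice site (the `splice_window` default is an assumption). The function `classify_variant` is hypothetical.

```python
from typing import List, Tuple

# (start, end) of an exon: 1-based, inclusive genomic coordinates on the plus strand.
Exon = Tuple[int, int]


def classify_variant(pos: int, exons: List[Exon], cds_start: int, cds_end: int,
                     splice_window: int = 2) -> str:
    """Assign a variant position to a coarse region class (coding, UTR, splice site, ...)."""
    exons = sorted(exons)
    for start, end in exons:
        if start <= pos <= end:
            # Exonic: split by the CDS boundaries (plus-strand assumption).
            if pos < cds_start:
                return "5' UTR"
            if pos > cds_end:
                return "3' UTR"
            return "coding"
    # Intronic positions: check the donor and acceptor ends of each intron.
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        if prev_end < pos < next_start:
            if pos - prev_end <= splice_window:
                return "5' splice site"  # donor side, first intronic bases
            if next_start - pos <= splice_window:
                return "3' splice site"  # acceptor side, last intronic bases
            return "intronic"
    return "outside transcript"


# Hypothetical two-exon transcript with the CDS spanning 150-350.
exons = [(100, 200), (300, 400)]
for p in (120, 160, 201, 250, 299, 351):
    print(p, classify_variant(p, exons, cds_start=150, cds_end=350))
```

A gene on the minus strand would need its coordinates mirrored before this check, and real splice-site consensus regions extend beyond the invariant intronic dinucleotides, so a wider window (or a dedicated splice predictor) would be needed in practice.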

Structural Dynamics of Human Argonaute2 and Its Interaction with siRNAs Designed to Target Mutant tdp43.

Q1 Biochemistry, Genetics and Molecular Biology

Advances in Bioinformatics

Pub Date : 2016-01-01 Epub Date: 2016-03-06 DOI: 10.1155/2016/8792814

Vishwambhar Bhandare, Amutha Ramaswamy

The human Argonaute2 protein (Ago2) is a key player in the RNA interference pathway, and small RNA recognition by Ago2 is the crucial step in siRNA-mediated gene silencing. The present study describes, for the first time, the structural and functional dynamics of human Ago2 and the mechanism of its interaction with a set of seven siRNAs. During a 25 ns simulation, human Ago2 adopts two conformations, "open" and "closed". The PAZ domain, which anchors the 3' end of the siRNA guide strand, is observed to be a highly flexible region. The Ago2-siRNA interaction was analyzed using a set of siRNAs (targeting positions 128, 251, 341, 383, 537, 1113, and 1115 of the mRNA) designed against tdp43 mutants that cause amyotrophic lateral sclerosis (ALS); the simulations revealed stable and strong recognition of the siRNAs by Ago2. Among the siRNAs studied, siRNA341 was identified as a potent siRNA for recognition by Ago2 and could therefore be explored further as a candidate for targeting mutant tdp43 in the treatment of ALS patients.



Citations: 20
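The Ago2 abstract identifies each siRNA by a position on the target mRNA (128, 251, 341, and so on). The guide strand loaded into Ago2 base-pairs antiparallel with its target, so it can be derived as the reverse complement of the targeted window. The sketch below assumes that each position is the 1-based first nucleotide of a 19-nt target window, which the abstract does not state; the function `guide_strand` is hypothetical, no tdp43 sequence is included, and 3' overhangs, strand selection, and off-target screening are ignored.

```python
# Watson-Crick pairs in the RNA alphabet.
COMPLEMENT = {"A": "U", "C": "G", "G": "C", "U": "A"}


def guide_strand(mrna: str, position: int, length: int = 19) -> str:
    """Guide (antisense) strand, 5'->3', for the window starting at 1-based `position`.

    Hypothetical helper: assumes `position` marks the first target nucleotide.
    """
    mrna = mrna.upper().replace("T", "U")  # accept DNA or RNA input
    start = position - 1
    if start < 0 or start + length > len(mrna):
        raise ValueError("target window falls outside the mRNA")
    target = mrna[start:start + length]
    # Reverse complement: the guide pairs antiparallel with the target.
    return "".join(COMPLEMENT[base] for base in reversed(target))


print(guide_strand("AUGCAUGC", 1, length=4))  # GCAU
```

With a real transcript, calling `guide_strand(mrna, 341)` would give the 19-nt guide for a window starting at position 341 under the stated assumption; the short `length=4` call above only illustrates the reverse-complement rule.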