SOV_refine:进一步细化了片段重叠评分的定义及其对蛋白质结构相似性的意义。

Q2 Decision Sciences Source Code for Biology and Medicine Pub Date : 2018-04-20 eCollection Date: 2018-01-01 DOI:10.1186/s13029-018-0068-7
Tong Liu, Zheng Wang
{"title":"SOV_refine:进一步细化了片段重叠评分的定义及其对蛋白质结构相似性的意义。","authors":"Tong Liu,&nbsp;Zheng Wang","doi":"10.1186/s13029-018-0068-7","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>The segment overlap score (SOV) has been used to evaluate the predicted protein secondary structures, a sequence composed of helix (H), strand (E), and coil (C), by comparing it with the native or reference secondary structures, another sequence of H, E, and C. SOV's advantage is that it can consider the size of continuous overlapping segments and assign extra allowance to longer continuous overlapping segments instead of only judging from the percentage of overlapping individual positions as Q3 score does. However, we have found a drawback from its previous definition, that is, it cannot ensure increasing allowance assignment when more residues in a segment are further predicted accurately.</p><p><strong>Results: </strong>A new way of assigning allowance has been designed, which keeps all the advantages of the previous SOV score definitions and ensures that the amount of allowance assigned is incremental when more elements in a segment are predicted accurately. Furthermore, our improved SOV has achieved a higher correlation with the quality of protein models measured by GDT-TS score and TM-score, indicating its better abilities to evaluate tertiary structure quality at the secondary structure level. We analyzed the statistical significance of SOV scores and found the threshold values for distinguishing two protein structures (SOV_refine  > 0.19) and indicating whether two proteins are under the same CATH fold (SOV_refine > 0.94 and > 0.90 for three- and eight-state secondary structures respectively). We provided another two example applications, which are when used as a machine learning feature for protein model quality assessment and comparing different definitions of topologically associating domains. We proved that our newly defined SOV score resulted in better performance.</p><p><strong>Conclusions: </strong>The SOV score can be widely used in bioinformatics research and other fields that need to compare two sequences of letters in which continuous segments have important meanings. We also generalized the previous SOV definitions so that it can work for sequences composed of more than three states (e.g., it can work for the eight-state definition of protein secondary structures). A standalone software package has been implemented in Perl with source code released. The software can be downloaded from http://dna.cs.miami.edu/SOV/.</p>","PeriodicalId":35052,"journal":{"name":"Source Code for Biology and Medicine","volume":"13 ","pages":"1"},"PeriodicalIF":0.0000,"publicationDate":"2018-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/s13029-018-0068-7","citationCount":"16","resultStr":"{\"title\":\"SOV_refine: A further refined definition of segment overlap score and its significance for protein structure similarity.\",\"authors\":\"Tong Liu,&nbsp;Zheng Wang\",\"doi\":\"10.1186/s13029-018-0068-7\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>The segment overlap score (SOV) has been used to evaluate the predicted protein secondary structures, a sequence composed of helix (H), strand (E), and coil (C), by comparing it with the native or reference secondary structures, another sequence of H, E, and C. SOV's advantage is that it can consider the size of continuous overlapping segments and assign extra allowance to longer continuous overlapping segments instead of only judging from the percentage of overlapping individual positions as Q3 score does. However, we have found a drawback from its previous definition, that is, it cannot ensure increasing allowance assignment when more residues in a segment are further predicted accurately.</p><p><strong>Results: </strong>A new way of assigning allowance has been designed, which keeps all the advantages of the previous SOV score definitions and ensures that the amount of allowance assigned is incremental when more elements in a segment are predicted accurately. Furthermore, our improved SOV has achieved a higher correlation with the quality of protein models measured by GDT-TS score and TM-score, indicating its better abilities to evaluate tertiary structure quality at the secondary structure level. We analyzed the statistical significance of SOV scores and found the threshold values for distinguishing two protein structures (SOV_refine  > 0.19) and indicating whether two proteins are under the same CATH fold (SOV_refine > 0.94 and > 0.90 for three- and eight-state secondary structures respectively). We provided another two example applications, which are when used as a machine learning feature for protein model quality assessment and comparing different definitions of topologically associating domains. We proved that our newly defined SOV score resulted in better performance.</p><p><strong>Conclusions: </strong>The SOV score can be widely used in bioinformatics research and other fields that need to compare two sequences of letters in which continuous segments have important meanings. We also generalized the previous SOV definitions so that it can work for sequences composed of more than three states (e.g., it can work for the eight-state definition of protein secondary structures). A standalone software package has been implemented in Perl with source code released. The software can be downloaded from http://dna.cs.miami.edu/SOV/.</p>\",\"PeriodicalId\":35052,\"journal\":{\"name\":\"Source Code for Biology and Medicine\",\"volume\":\"13 \",\"pages\":\"1\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-04-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://sci-hub-pdf.com/10.1186/s13029-018-0068-7\",\"citationCount\":\"16\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Source Code for Biology and Medicine\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1186/s13029-018-0068-7\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2018/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q2\",\"JCRName\":\"Decision Sciences\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Source Code for Biology and Medicine","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1186/s13029-018-0068-7","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2018/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"Decision Sciences","Score":null,"Total":0}
引用次数: 16

摘要

背景:片段重叠评分(SOV)被用来评估预测的蛋白质二级结构,由螺旋(H),链(E)和线圈(C)组成的序列,通过将其与天然或参考二级结构,另一个序列H, E,SOV的优点在于它可以考虑连续重叠段的大小,并对较长的连续重叠段给予额外的补偿,而不是像Q3得分那样只从重叠的个体位置百分比来判断。然而,我们从之前的定义中发现了一个缺点,即当进一步准确预测段中更多残数时,不能保证增加余量分配。结果:设计了一种新的允许度分配方法,既保留了以往SOV评分定义的所有优点,又保证了当一个片段中的元素被准确预测时,允许度的分配是递增的。此外,我们改进的SOV与GDT-TS评分和tm评分测量的蛋白质模型质量具有更高的相关性,表明其在二级结构水平上评价三级结构质量的能力更好。我们分析了SOV评分的统计学意义,找到了区分两种蛋白质结构的阈值(SOV_refine > 0.19)和指示两种蛋白质是否处于同一CATH折叠下的阈值(三态二级结构SOV_refine > 0.94,八态二级结构SOV_refine > 0.90)。我们提供了另外两个示例应用程序,它们被用作蛋白质模型质量评估和比较拓扑关联域的不同定义的机器学习功能。我们证明了我们新定义的SOV分数带来了更好的性能。结论:SOV评分可广泛应用于生物信息学研究等需要比较两个字母序列,其中连续片段具有重要意义的领域。我们还推广了以前的SOV定义,使其可以适用于由三个以上状态组成的序列(例如,它可以适用于蛋白质二级结构的八状态定义)。一个独立的软件包已经用Perl实现,并发布了源代码。该软件可从http://dna.cs.miami.edu/SOV/下载。
本文章由计算机程序翻译,如有差异,请以英文原文为准。

摘要图片

摘要图片

摘要图片

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
SOV_refine: A further refined definition of segment overlap score and its significance for protein structure similarity.

Background: The segment overlap score (SOV) has been used to evaluate the predicted protein secondary structures, a sequence composed of helix (H), strand (E), and coil (C), by comparing it with the native or reference secondary structures, another sequence of H, E, and C. SOV's advantage is that it can consider the size of continuous overlapping segments and assign extra allowance to longer continuous overlapping segments instead of only judging from the percentage of overlapping individual positions as Q3 score does. However, we have found a drawback from its previous definition, that is, it cannot ensure increasing allowance assignment when more residues in a segment are further predicted accurately.

Results: A new way of assigning allowance has been designed, which keeps all the advantages of the previous SOV score definitions and ensures that the amount of allowance assigned is incremental when more elements in a segment are predicted accurately. Furthermore, our improved SOV has achieved a higher correlation with the quality of protein models measured by GDT-TS score and TM-score, indicating its better abilities to evaluate tertiary structure quality at the secondary structure level. We analyzed the statistical significance of SOV scores and found the threshold values for distinguishing two protein structures (SOV_refine  > 0.19) and indicating whether two proteins are under the same CATH fold (SOV_refine > 0.94 and > 0.90 for three- and eight-state secondary structures respectively). We provided another two example applications, which are when used as a machine learning feature for protein model quality assessment and comparing different definitions of topologically associating domains. We proved that our newly defined SOV score resulted in better performance.

Conclusions: The SOV score can be widely used in bioinformatics research and other fields that need to compare two sequences of letters in which continuous segments have important meanings. We also generalized the previous SOV definitions so that it can work for sequences composed of more than three states (e.g., it can work for the eight-state definition of protein secondary structures). A standalone software package has been implemented in Perl with source code released. The software can be downloaded from http://dna.cs.miami.edu/SOV/.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Source Code for Biology and Medicine
Source Code for Biology and Medicine Decision Sciences-Information Systems and Management
自引率
0.00%
发文量
0
期刊介绍: Source Code for Biology and Medicine is a peer-reviewed open access, online journal that publishes articles on source code employed over a wide range of applications in biology and medicine. The journal"s aim is to publish source code for distribution and use in the public domain in order to advance biological and medical research. Through this dissemination, it may be possible to shorten the time required for solving certain computational problems for which there is limited source code availability or resources.
期刊最新文献
2DKD: a toolkit for content-based local image search. Computing and graphing probability values of pearson distributions: a SAS/IML macro. iPBAvizu: a PyMOL plugin for an efficient 3D protein structure superimposition approach Social support for collaboration and group awareness in life science research teams. MZPAQ: a FASTQ data compression tool.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1