The bad and the good of trends in model building and refinement for sparse-data regions: pernicious forms of overfitting versus good new tools and predictions.

Acta Crystallographica. Section D, Structural Biology (IF 2.6; Zone 4, Biology; Q2, Biochemical Research Methods). Pub Date: 2023-12-01. Epub Date: 2023-11-03. DOI: 10.1107/S2059798323008847
Jane S Richardson, Christopher J Williams, Vincent B Chen, Michael G Prisant, David C Richardson
Pages 1071-1078. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10833350/pdf/

Abstract

Model building and refinement, and the validation of their correctness, are very effective and reliable at local resolutions better than about 2.5 Å for both crystallography and cryo-EM. However, at local resolutions worse than 2.5 Å both the procedures and their validation break down and do not ensure reliably correct models. This is because in the broad density at lower resolution, critical features such as protein backbone carbonyl O atoms are not just less accurate but are not seen at all, and so peptide orientations are frequently wrongly fitted by 90-180°. This puts both backbone and side chains into the wrong local energy minimum, and they are then worsened rather than improved by further refinement into a valid but incorrect rotamer or Ramachandran region. On the positive side, new tools are being developed to locate this type of pernicious error in PDB depositions, such as CaBLAM, EMRinger, Pperp diagnosis of ribose puckers, and peptide flips in PDB-REDO, while interactive modeling in Coot or ISOLDE can help to fix many of them. Another positive trend is that artificial intelligence predictions such as those made by AlphaFold2 contribute additional evidence from large multiple sequence alignments, and in high-confidence parts they provide quite good starting models for loops, termini or whole domains with otherwise ambiguous density.
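
The 90-180° peptide misorientation described above can be illustrated with a minimal sketch. It compares the backbone carbonyl C→O direction of each residue between two versions of a model (for example, before and after rebuilding) and flags residues where the directions differ by at least a threshold. This is an illustration only, not the method used by CaBLAM, PDB-REDO or any other tool named in the abstract. The function names, the dictionary input format and the 90° default threshold are assumptions made for this example, and the two models are assumed to be already superposed in the same coordinate frame.

```python
import math


def carbonyl_flip_angle(c_a, o_a, c_b, o_b):
    """Angle in degrees between the C->O vectors of one peptide in two models.

    Each argument is an (x, y, z) tuple in Å. Both models are assumed to be
    in the same coordinate frame (already superposed).
    """
    va = [o - c for c, o in zip(c_a, o_a)]
    vb = [o - c for c, o in zip(c_b, o_b)]
    dot = sum(x * y for x, y in zip(va, vb))
    norm_a = math.sqrt(sum(x * x for x in va))
    norm_b = math.sqrt(sum(x * x for x in vb))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cos_angle))


def flag_peptide_flips(model_a, model_b, threshold_deg=90.0):
    """Return [(residue_id, angle)] for residues whose carbonyl direction
    differs by at least threshold_deg between the two models.

    model_a and model_b are hypothetical inputs: dicts mapping a residue id
    to a pair (C coordinates, O coordinates). Only residues present in both
    models are compared.
    """
    flagged = []
    for resid in sorted(set(model_a) & set(model_b)):
        c_a, o_a = model_a[resid]
        c_b, o_b = model_b[resid]
        angle = carbonyl_flip_angle(c_a, o_a, c_b, o_b)
        if angle >= threshold_deg:
            flagged.append((resid, angle))
    return flagged


# Toy coordinates (not from any real structure): residue 11 has its
# carbonyl pointing the opposite way in the second model.
original = {
    10: ((0.0, 0.0, 0.0), (1.23, 0.0, 0.0)),
    11: ((3.8, 0.0, 0.0), (3.8, 1.23, 0.0)),
}
rebuilt = {
    10: ((0.0, 0.0, 0.0), (1.20, 0.2, 0.0)),
    11: ((3.8, 0.0, 0.0), (3.8, -1.23, 0.0)),
}

for resid, angle in flag_peptide_flips(original, rebuilt):
    print(f"residue {resid}: carbonyl direction differs by {angle:.0f} degrees")
```

A comparison like this only shows where two models disagree. Deciding which orientation is correct still requires the density and the validation evidence discussed in the abstract.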
