PLoS Computational Biology: Latest Articles

iCircDA-NEAE: Accelerated attribute network embedding and dynamic convolutional autoencoder for circRNA-disease associations prediction.
IF 4.3, Zone 2 (Biology), Pub Date: 2023-08-31, eCollection Date: 2023-08-01, DOI: 10.1371/journal.pcbi.1011344
Lin Yuan, Jiawang Zhao, Zhen Shen, Qinhu Zhang, Yushui Geng, Chun-Hou Zheng, De-Shuang Huang

Accumulating evidence suggests that circRNAs play crucial roles in human diseases. CircRNA-disease association prediction is extremely helpful in understanding pathogenesis, diagnosis, and prevention, as well as in identifying relevant biomarkers. During the past few years, a large number of deep learning (DL) based methods have been proposed for predicting circRNA-disease associations and have achieved impressive prediction performance. However, these methods have two main drawbacks. First, they underutilize the biometric information in the data. Second, the features they extract do not adequately represent the association characteristics between circRNAs and diseases. In this study, we developed a novel deep learning model, named iCircDA-NEAE, to predict circRNA-disease associations. In particular, we use disease semantic similarity, Gaussian interaction profile kernel, circRNA expression profile similarity, and Jaccard similarity simultaneously for the first time, and extract hidden features based on accelerated attribute network embedding (AANE) and dynamic convolutional autoencoder (DCAE). Experimental results on the circR2Disease dataset show that iCircDA-NEAE significantly outperforms other competing methods. Besides, 16 of the top 20 circRNA-disease pairs with the highest prediction scores were validated by relevant literature. Furthermore, we observe that iCircDA-NEAE can effectively predict new potential circRNA-disease associations.
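Two of the similarity measures named in the abstract above, Jaccard similarity and the Gaussian interaction profile (GIP) kernel, can be illustrated on a toy binary association matrix. The sketch below is not the authors' implementation: the matrix `A`, the function names, and the bandwidth convention (`gamma_prime` divided by the mean squared row norm, a common choice in the association-prediction literature) are all assumptions made for illustration.

```python
import math


def jaccard(a, b):
    """Jaccard similarity of two binary association profiles."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of a binary matrix.

    The bandwidth is gamma_prime divided by the mean squared norm of the rows
    (an assumed, commonly used normalisation).
    """
    n = len(profiles)
    mean_sq = sum(sum(x * x for x in row) for row in profiles) / n
    gamma = gamma_prime / mean_sq if mean_sq else 0.0
    return [
        [
            math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(profiles[i], profiles[j])))
            for j in range(n)
        ]
        for i in range(n)
    ]


# Hypothetical circRNA x disease association matrix (rows are circRNAs).
A = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 1],
]

K = gip_kernel(A)                            # GIP similarity between circRNAs
J = [[jaccard(r, s) for s in A] for r in A]  # Jaccard similarity between circRNAs
```

Both matrices are symmetric with ones on the diagonal, so either can serve as one similarity view among several when views are later combined.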

PLoS Computational Biology, vol. 19, no. 8, e1011344. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10470932/pdf/
Citations: 0
Phylogenetic inference of the emergence of sequence modules and protein-protein interactions in the ADAMTS-TSL family.
IF 4.3, Zone 2 (Biology), Pub Date: 2023-08-31, eCollection Date: 2023-08-01, DOI: 10.1371/journal.pcbi.1011404
Olivier Dennler, François Coste, Samuel Blanquart, Catherine Belleannée, Nathalie Théret

Numerous computational methods based on sequences or structures have been developed for the characterization of protein function, but they remain unsatisfactory for dealing with the multiple functions of multi-domain protein families. Here we propose an original approach based on 1) the detection of conserved sequence modules using partial local multiple alignment, 2) the phylogenetic inference of species/genes/modules/functions evolutionary histories, and 3) the identification of co-appearances of modules and functions. Applying our framework to the multidomain ADAMTS-TSL family, which includes ADAMTS (A Disintegrin-like and Metalloproteinase with ThromboSpondin motif) and ADAMTS-like proteins, across nine species including human, we identify 45 sequence module signatures that are associated with the occurrence of 278 Protein-Protein Interactions in ancestral genes. Some of these signatures are supported by published experimental data and the others provide new insights (e.g. ADAMTS-5). The module signatures of ADAMTS ancestors notably highlight the dual variability of the propeptide and ancillary regions, suggesting the importance of these two regions in the specialization of ADAMTS during evolution. Our analyses further indicate convergent interactions of ADAMTS with COMP and CCN2 proteins. Overall, our study provides 186 sequence module signatures that discriminate distinct subgroups of ADAMTS and ADAMTSL and that may result from selective pressures on novel functions and phenotypes.

PLoS Computational Biology, vol. 19, no. 8, e1011404. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10499240/pdf/
Citations: 0
Automatic extraction of actin networks in plants.
IF 4.3, Zone 2 (Biology), Pub Date: 2023-08-30, eCollection Date: 2023-08-01, DOI: 10.1371/journal.pcbi.1011407
Jordan Hembrow, Michael J Deeks, David M Richards

The actin cytoskeleton is essential in eukaryotes, not least in the plant kingdom where it plays key roles in cell expansion, cell division, environmental responses and pathogen defence. Yet, the precise structure-function relationships of properties of the actin network in plants are still to be unravelled, including details of how the network configuration depends upon cell type, tissue type and developmental stage. Part of the problem lies in the difficulty of extracting high-quality, quantitative measures of actin network features from microscopy data. To address this problem, we have developed DRAGoN, a novel image analysis algorithm that can automatically extract the actin network across a range of cell types, providing seventeen different quantitative measures that describe the network at a local level. Using this algorithm, we then studied a number of cases in Arabidopsis thaliana, including several different tissues, a variety of actin-affected mutants, and cells responding to powdery mildew. In many cases we found statistically-significant differences in actin network properties. In addition to these results, our algorithm is designed to be easily adaptable to other tissues, mutants and plants, and so will be a valuable asset for the study and future biological engineering of the actin cytoskeleton in globally-important crops.

PLoS Computational Biology, vol. 19, no. 8, e1011407. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10497154/pdf/
Citations: 0
Steady-state approximations for Hodgkin-Huxley cell models: Reduction of order for uterine smooth muscle cell model.
IF 4.3, Zone 2 (Biology), Pub Date: 2023-08-30, eCollection Date: 2023-08-01, DOI: 10.1371/journal.pcbi.1011359
Shawn A Means, Mathias W Roesler, Amy S Garrett, Leo Cheng, Alys R Clark

Multi-scale mathematical bioelectrical models of organs such as the uterus, stomach or heart present challenges both for accuracy and computational tractability. These multi-scale models are typically founded on models of biological cells derived from the classic Hodgkin-Huxley (HH) formalism. Ion channel behaviour is tracked with dynamical variables representing activation or inactivation of currents that relax to steady-state dependencies on cellular membrane voltage. Timescales for relaxation may be orders of magnitude faster than companion ion channel variables or phenomena of physiological interest for the entire cell (such as bursting sequences of action potentials) or the entire organ (such as electromechanical coordination). Exploiting these time scales with steady-state approximations for relatively fast-acting systems is a well-known but often overlooked approach, as evidenced by recent published models. We thus investigate the feasibility of an extensive reduction of order for an HH-type cell model with steady-state approximations to the full dynamical activation and inactivation ion channel variables. Our effort utilises a published comprehensive uterine smooth muscle cell model that encompasses 19 ordinary differential equations and 105 formulations overall. The numerous ion channel submodels in the published model exhibit relaxation times ranging from order 10^-1 to 10^5 milliseconds. Substitution of the faster dynamic variables with steady-state formulations demonstrates both an accurate reproduction of the full model and substantial improvements in time-to-solve, for the test cases performed. Our demonstration here of an effective and relatively straightforward reduction method underlines the particular importance of considering time scales for model simplification before embarking on large-scale computations or parameter sweeps. As a preliminary complement to more intensive reduction-of-order methods such as parameter sensitivity and bifurcation analysis, this approach can rapidly and accurately improve computational tractability for challenging multi-scale organ modelling efforts.
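The steady-state idea described in this abstract can be sketched with a single HH-style gating variable obeying dm/dt = (m_inf(V) - m) / tau. The example below is a minimal illustration under assumed values, not the uterine smooth muscle cell model from the paper: the sigmoid `m_inf`, the slowly varying prescribed `voltage`, and both time constants are hypothetical. It measures how far the dynamic gate strays from its steady-state curve, which is the error incurred by replacing the differential equation with m = m_inf(V).

```python
import math


def m_inf(v):
    """Hypothetical steady-state activation curve (sigmoid in membrane voltage, mV)."""
    return 1.0 / (1.0 + math.exp(-(v + 40.0) / 5.0))


def voltage(t):
    """Prescribed slow voltage oscillation (mV) with a 1000 ms period; an assumption."""
    return -60.0 + 40.0 * math.sin(2.0 * math.pi * t / 1000.0)


def max_gap(tau, dt=0.02, t_end=1000.0):
    """Integrate dm/dt = (m_inf(V) - m) / tau with forward Euler and return the
    largest distance between the dynamic gate m and m_inf(V(t))."""
    m = m_inf(voltage(0.0))
    worst = 0.0
    for k in range(round(t_end / dt)):
        t = k * dt
        m += dt * (m_inf(voltage(t)) - m) / tau
        worst = max(worst, abs(m - m_inf(voltage(t + dt))))
    return worst


fast_gap = max_gap(tau=0.1)    # gate much faster than the voltage: approximation is tight
slow_gap = max_gap(tau=100.0)  # gate comparable to the voltage time scale: approximation fails
```

With these assumed numbers the fast gate stays within about one percent of `m_inf(V)`, so its equation could be dropped, whereas the slow gate lags visibly and should remain dynamic. Comparing each relaxation time with the time scale of interest before simplifying is the point the abstract makes.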

PLoS Computational Biology, vol. 19, no. 8, e1011359. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10468033/pdf/
Citations: 1
Modulation of antigen discrimination by duration of immune contacts in a kinetic proofreading model of T cell activation with extreme statistics.
IF 4.3, Zone 2 (Biology), Pub Date: 2023-08-30, eCollection Date: 2023-08-01, DOI: 10.1371/journal.pcbi.1011216
Jonathan Morgan, Alan E Lindsay

T cells form transient cell-to-cell contacts with antigen presenting cells (APCs) to facilitate surface interrogation by membrane bound T cell receptors (TCRs). Upon recognition of molecular signatures (antigen) of pathogen, T cells may initiate an adaptive immune response. The duration of the T cell/APC contact is observed to vary widely, yet it is unclear what constructive role, if any, such variations might play in immune signaling. Modeling efforts describing antigen discrimination often focus on steady-state approximations and do not account for the transient nature of cellular contacts. Within the framework of a kinetic proofreading (KP) mechanism, we develop a stochastic First Receptor Activation Model (FRAM) describing the likelihood that a productive immune signal is produced before the expiry of the contact. Through the use of extreme statistics, we characterize the probability that the first TCR triggering is induced by a rare agonist antigen and not by an abundant self-antigen. We show that defining positive immune outcomes as resilience to extreme statistics and sensitivity to rare events mitigates classic tradeoffs associated with KP. By choosing a sufficient number of KP steps, our model is able to yield single agonist sensitivity whilst remaining non-reactive to large populations of self-antigen, even when self and agonist antigen are similar in dissociation rate to the TCR but differ greatly in expression. Additionally, our model achieves high levels of accuracy even when encounters with agonist-positive APCs are rare. Finally, we discuss potential biological costs associated with high classification accuracy, particularly in challenging T cell environments.
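The role of the number of KP steps can be illustrated with the textbook kinetic proofreading calculation, which is far simpler than the stochastic FRAM developed in the paper. In this sketch, a bound ligand advances through each step at rate `k_p` and unbinds at rate `k_off`, so it completes `n_steps` steps with probability (k_p / (k_p + k_off))^n_steps. All rates, copy numbers, and function names are hypothetical, and contact duration is ignored.

```python
def completion_probability(k_p, k_off, n_steps):
    """Probability that a bound ligand finishes n_steps proofreading steps
    before unbinding (classic kinetic proofreading)."""
    return (k_p / (k_p + k_off)) ** n_steps


def min_steps_for_selectivity(k_p, k_off_agonist, k_off_self,
                              n_agonist, n_self, max_steps=200):
    """Smallest number of steps at which the expected number of completed
    agonist bindings exceeds that of self bindings, or None if never reached."""
    for n in range(1, max_steps + 1):
        agonist = n_agonist * completion_probability(k_p, k_off_agonist, n)
        self_ = n_self * completion_probability(k_p, k_off_self, n)
        if agonist > self_:
            return n
    return None


# Assumed values: one agonist ligand against 10,000 self ligands,
# with the agonist unbinding ten times more slowly.
steps = min_steps_for_selectivity(k_p=1.0, k_off_agonist=0.1, k_off_self=1.0,
                                  n_agonist=1, n_self=10_000)
agonist_success = completion_probability(1.0, 0.1, steps)
```

Under these assumptions 16 steps are needed, at which point the single agonist itself completes the chain only about 22% of the time. That loss of sensitivity is the classic KP tradeoff the abstract refers to.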

PLoS Computational Biology, vol. 19, no. 8, e1011216. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10497171/pdf/
Citations: 0
Scoring epidemiological forecasts on transformed scales.
IF 4.3, Zone 2 (Biology), Pub Date: 2023-08-29, eCollection Date: 2023-08-01, DOI: 10.1371/journal.pcbi.1011393
Nikos I Bosse, Sam Abbott, Anne Cori, Edwin van Leeuwen, Johannes Bracher, Sebastian Funk

Forecast evaluation is essential for the development of predictive epidemic models and can inform their use for public health decision-making. Common scores to evaluate epidemiological forecasts are the Continuous Ranked Probability Score (CRPS) and the Weighted Interval Score (WIS), which can be seen as measures of the absolute distance between the forecast distribution and the observation. However, applying these scores directly to predicted and observed incidence counts may not be the most appropriate due to the exponential nature of epidemic processes and the varying magnitudes of observed values across space and time. In this paper, we argue that transforming counts before applying scores such as the CRPS or WIS can effectively mitigate these difficulties and yield epidemiologically meaningful and easily interpretable results. Using the CRPS on log-transformed values as an example, we list three attractive properties: Firstly, it can be interpreted as a probabilistic version of a relative error. Secondly, it reflects how well models predicted the time-varying epidemic growth rate. And lastly, using arguments on variance-stabilizing transformations, it can be shown that under the assumption of a quadratic mean-variance relationship, the logarithmic transformation leads to expected CRPS values which are independent of the order of magnitude of the predicted quantity. Applying a transformation of log(x + 1) to data and forecasts from the European COVID-19 Forecast Hub, we find that it changes model rankings regardless of stratification by forecast date, location or target types. Situations in which models missed the beginning of upward swings are more strongly emphasised while failing to predict a downturn following a peak is less severely penalised when scoring transformed forecasts as opposed to untransformed ones. We conclude that appropriate transformations, of which the natural logarithm is only one particularly attractive option, should be considered when assessing the performance of different models in the context of infectious disease incidence.
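The effect of scoring on the log(x + 1) scale can be seen with a sample-based CRPS estimate, CRPS ≈ mean|X − y| − ½·mean|X − X′|, where X and X′ are forecast samples and y is the observation. The sketch below is an illustration only and is not taken from the paper or its software. The forecast samples and observations are invented, and the two "locations" differ only by a factor of 100 in magnitude.

```python
import math


def crps_from_samples(samples, observed):
    """Sample-based CRPS estimate: mean |X - y| - 0.5 * mean |X - X'|."""
    n = len(samples)
    term_obs = sum(abs(s - observed) for s in samples) / n
    term_spread = sum(abs(a - b) for a in samples for b in samples) / (2 * n * n)
    return term_obs - term_spread


def crps_log1p(samples, observed):
    """CRPS computed after applying log(x + 1) to forecast samples and observation."""
    return crps_from_samples([math.log1p(s) for s in samples], math.log1p(observed))


# Hypothetical forecasts for a small and a large location with the same relative error.
small_forecast, small_obs = [80, 90, 100, 110, 120], 130
large_forecast, large_obs = [s * 100 for s in small_forecast], small_obs * 100

natural_small = crps_from_samples(small_forecast, small_obs)
natural_large = crps_from_samples(large_forecast, large_obs)
log_small = crps_log1p(small_forecast, small_obs)
log_large = crps_log1p(large_forecast, large_obs)
```

On the natural scale the large location's score is 100 times the small one's, so it would dominate any average across locations. On the log(x + 1) scale the two scores are nearly identical, matching the relative-error interpretation given in the abstract.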

PLoS Computational Biology, vol. 19, no. 8, e1011393. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10495027/pdf/
Citations: 0
Prediction of DNA Methylation based on Multi-dimensional feature encoding and double convolutional fully connected convolutional neural network.
IF 4.3, Zone 2 (Biology), Pub Date: 2023-08-28, eCollection Date: 2023-08-01, DOI: 10.1371/journal.pcbi.1011370
Wenxing Hu, Lixin Guan, Mengshan Li
DNA methylation is critically important to the regulation of gene expression because it affects the stability of DNA and changes the structure of chromosomes. Identifying DNA methylation modification sites lays a solid basis for gaining more insight into their biological functions. Existing machine learning-based methods of predicting DNA methylation have not fully exploited the hidden multidimensional information in DNA gene sequences, which significantly limits the prediction accuracy of the models. Besides, most models have been built for a single methylation type. To address these issues, this study proposes a deep learning-based method for DNA methylation site prediction, termed the MEDCNN model. The MEDCNN model extracts feature information from gene sequences in three dimensions (i.e., positional information, biological information, and chemical information). Moreover, the proposed method employs a convolutional neural network with two convolutional layers and two fully connected layers, and iteratively updates its parameters by gradient descent on a cross-entropy loss function to increase prediction accuracy. Besides, the MEDCNN model can predict different types of DNA methylation sites. The experimental results indicate that the deep learning method based on multi-dimensional encoding outperformed single encoding methods, and that the MEDCNN model was highly applicable and outperformed existing models in predicting DNA methylation across different species. These findings suggest that the MEDCNN model can be effective in predicting DNA methylation sites.
PLoS Computational Biology, vol. 19, no. 8, e1011370. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10461834/pdf/
Citations: 0
VIRify: An integrated detection, annotation and taxonomic classification pipeline using virus-specific protein profile hidden Markov models.
IF 4.3 Region 2 (Biology) Pub Date: 2023-08-28 eCollection Date: 2023-08-01 DOI: 10.1371/journal.pcbi.1011422
Guillermo Rangel-Pineros, Alexandre Almeida, Martin Beracochea, Ekaterina Sakharova, Manja Marz, Alejandro Reyes Muñoz, Martin Hölzer, Robert D Finn

The study of viral communities has revealed the enormous diversity and impact these biological entities have on various ecosystems. These observations have sparked widespread interest in developing computational strategies that support the comprehensive characterisation of viral communities based on sequencing data. Here we introduce VIRify, a new computational pipeline designed to provide a user-friendly and accurate functional and taxonomic characterisation of viral communities. VIRify identifies viral contigs and prophages from metagenomic assemblies and annotates them using a collection of viral profile hidden Markov models (HMMs). These include our manually-curated profile HMMs, which serve as specific taxonomic markers for a wide range of prokaryotic and eukaryotic viral taxa and are thus used to reliably classify viral contigs. We tested VIRify on assemblies from two microbial mock communities, a large metagenomics study, and a collection of publicly available viral genomic sequences from the human gut. The results showed that VIRify could identify sequences from both prokaryotic and eukaryotic viruses, and provided taxonomic classifications from the genus to the family rank with an average accuracy of 86.6%. In addition, VIRify allowed the detection and taxonomic classification of a range of prokaryotic and eukaryotic viruses present in 243 marine metagenomic assemblies. Finally, the use of VIRify led to a large expansion in the number of taxonomically classified human gut viral sequences and the improvement of outdated and shallow taxonomic classifications. Overall, we demonstrate that VIRify is a novel and powerful resource that offers an enhanced capability to detect a broad range of viral contigs and taxonomically classify them.

PLoS Computational Biology, vol. 19, no. 8, e1011422. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10491390/pdf/
Citations: 0
Accelerating Bayesian inference of dependency between mixed-type biological traits.
IF 4.3 Region 2 (Biology) Pub Date: 2023-08-28 eCollection Date: 2023-08-01 DOI: 10.1371/journal.pcbi.1011419
Zhenyu Zhang, Akihiko Nishimura, Nídia S Trovão, Joshua L Cherry, Andrew J Holbrook, Xiang Ji, Philippe Lemey, Marc A Suchard

Inferring dependencies between mixed-type biological traits while accounting for evolutionary relationships between specimens is of great scientific interest yet remains infeasible when trait and specimen counts grow large. The state-of-the-art approach uses a phylogenetic multivariate probit model to accommodate binary and continuous traits via a latent variable framework, and utilizes an efficient bouncy particle sampler (BPS) to tackle the computational bottleneck: integrating many latent variables from a high-dimensional truncated normal distribution. This approach breaks down as the number of specimens grows and fails to reliably characterize conditional dependencies between traits. Here, we propose an inference pipeline for phylogenetic probit models that greatly outperforms BPS. The novelty lies in 1) a combination of the recent Zigzag Hamiltonian Monte Carlo (Zigzag-HMC) with linear-time gradient evaluations and 2) a joint sampling scheme for highly correlated latent variables and correlation matrix elements. In an application exploring HIV-1 evolution from 535 viruses, the inference requires joint sampling from an 11,235-dimensional truncated normal and a 24-dimensional covariance matrix. Our method yields a 5-fold speedup compared to BPS and makes it possible to learn partial correlations between candidate viral mutations and virulence. Computational speedup now enables us to tackle even larger problems: we study the evolution of influenza H1N1 glycosylations on around 900 viruses. For broader applicability, we extend the phylogenetic probit model to incorporate categorical traits, and demonstrate its use to study Aquilegia flower and pollinator co-evolution.

PLoS Computational Biology, vol. 19, no. 8, e1011419. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10491301/pdf/
Citations: 0
Neural network models for influenza forecasting with associated uncertainty using Web search activity trends.
IF 4.3 Region 2 (Biology) Pub Date: 2023-08-28 eCollection Date: 2023-08-01 DOI: 10.1371/journal.pcbi.1011392
Michael Morris, Peter Hayes, Ingemar J Cox, Vasileios Lampos

Influenza affects millions of people every year. It causes a considerable number of medical visits and hospitalisations as well as hundreds of thousands of deaths. Forecasting influenza prevalence with good accuracy can significantly help public health agencies to react in a timely manner to seasonal or novel strain epidemics. Although significant progress has been made, influenza forecasting remains a challenging modelling task. In this paper, we propose a methodological framework that improves over the state-of-the-art forecasting accuracy of influenza-like illness (ILI) rates in the United States. We achieve this by using Web search activity time series in conjunction with historical ILI rates as observations for training neural network (NN) architectures. The proposed models incorporate Bayesian layers to produce associated uncertainty intervals to their forecast estimates, positioning themselves as legitimate complementary solutions to more conventional approaches. The best performing NN, referred to as the iterative recurrent neural network (IRNN) architecture, reduces mean absolute error by 10.3% and improves skill by 17.1% on average in nowcasting and forecasting tasks across 4 consecutive flu seasons.
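For readers unfamiliar with the headline metric, the sketch below shows how a relative reduction in mean absolute error (MAE) between a baseline forecast and a candidate forecast can be computed. The ILI values are made-up toy numbers, not data from the paper, and the function names are hypothetical; the abstract's "skill" score is not defined there, so it is not reproduced here.

```python
# Illustrative sketch only: toy numbers, not results from the paper.


def mean_absolute_error(actual, predicted):
    """Average absolute difference between paired observations and forecasts."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def relative_reduction(baseline_error, model_error):
    """Fraction by which model_error is lower than baseline_error."""
    if baseline_error == 0:
        raise ValueError("baseline error must be non-zero")
    return (baseline_error - model_error) / baseline_error


# Toy weekly ILI rates (percent of visits); values are invented for the example.
observed = [2.0, 3.0, 4.0, 5.0]
baseline_forecast = [2.5, 3.5, 3.0, 5.5]
candidate_forecast = [2.25, 3.25, 3.5, 5.25]

mae_baseline = mean_absolute_error(observed, baseline_forecast)
mae_candidate = mean_absolute_error(observed, candidate_forecast)
reduction = relative_reduction(mae_baseline, mae_candidate)

print(f"baseline MAE:  {mae_baseline:.4f}")
print(f"candidate MAE: {mae_candidate:.4f}")
print(f"relative MAE reduction: {reduction:.1%}")
```

A figure such as the abstract's 10.3% would correspond to `reduction` averaged over tasks and seasons, under the assumption that the reduction is measured relative to a baseline model's MAE.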

PLoS Computational Biology, vol. 19, no. 8, e1011392. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10491400/pdf/
Citations: 0