BioMedInformatics: Latest Articles, Page 2

Machine Learning for Extraction of Image Features Associated with Progression of Geographic Atrophy

BioMedInformatics

Pub Date : 2024-07-02 DOI: 10.3390/biomedinformatics4030089

J. Arslan, Kurt Benke

Background: Several studies have investigated various features and models in order to understand the growth and progression of the ocular disease geographic atrophy (GA). Commonly assessed features include age, sex, smoking, alcohol consumption, sedentary lifestyle, hypertension, and diabetes. There have been inconsistencies regarding which features correlate with GA progression. Chief amongst these inconsistencies is whether the investigated features are readily available for analysis across various ophthalmic institutions. Methods: In this study, we focused our attention on the association of fundus autofluorescence (FAF) imaging features and GA progression. Our method included feature extraction using radiomic processes and feature ranking by machine learning incorporating the algorithm XGBoost to determine the best-ranked features. This led to the development of an image-based linear mixed-effects model, which was designed to account for slope change based on within-subject variability and inter-eye correlation. Metrics used to assess the linear mixed-effects model included marginal and conditional R², Pearson’s correlation coefficient (r), root mean square error (RMSE), mean error (ME), mean absolute error (MAE), mean absolute deviation (MAD), the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), and log-likelihood. Results: We developed a linear mixed-effects model with 15 image-based features. The model results were as follows: R² = 0.96, r = 0.981, RMSE = 1.32, ME = −7.3 × 10⁻¹⁵, MAE = 0.94, MAD = 0.999, AIC = 2084.93, BIC = 2169.97, and log-likelihood = −1022.46. Conclusions: The advantage of our method is that it relies on the inherent properties of the image itself, rather than the availability of clinical or demographic data. Thus, the image features discovered in this study are universally and readily available across the board.
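The goodness-of-fit metrics named in this abstract follow standard definitions, which can be sketched in plain Python. The function names, the sign convention for ME (predicted minus observed), and the toy numbers below are illustrative assumptions, not the authors' code or data; the abstract does not report the parameter count k or the sample size n.

```python
import math

def error_metrics(observed, predicted):
    """Return ME, MAE, RMSE and Pearson's r for two equal-length sequences."""
    n = len(observed)
    # Assumed convention: residual = predicted - observed.
    residuals = [p - o for o, p in zip(observed, predicted)]
    me = sum(residuals) / n
    mae = sum(abs(e) for e in residuals) / n
    rmse = math.sqrt(sum(e * e for e in residuals) / n)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(var_o * var_p)
    return {"ME": me, "MAE": mae, "RMSE": rmse, "r": r}

def aic(log_likelihood, k):
    """Akaike Information Criterion for k estimated parameters."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian Information Criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * log_likelihood

# Toy data (made up for illustration only).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]
metrics = error_metrics(observed, predicted)
```

Under the standard AIC formula, the reported log-likelihood of −1022.46 with k = 20 gives 2084.92, which agrees with the reported AIC of 2084.93 to within rounding. That is an inference from the formula, not a parameter count stated in the abstract.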

Citations: 0

Advancing DNA Language Models through Motif-Oriented Pre-Training with MoDNA

BioMedInformatics

Pub Date : 2024-06-12 DOI: 10.3390/biomedinformatics4020085

Weizhi An, Yuzhi Guo, Yatao Bian, Hehuan Ma, Jinyu Yang, Chunyuan Li, Junzhou Huang

Acquiring meaningful representations of gene expression is essential for the accurate prediction of downstream regulatory tasks, such as identifying promoters and transcription factor binding sites. However, the current dependency on supervised learning, constrained by the limited availability of labeled genomic data, impedes the ability to develop robust predictive models with broad generalization capabilities. In response, recent advancements have pivoted towards the application of self-supervised training for DNA sequence modeling, enabling the adaptation of pre-trained genomic representations to a variety of downstream tasks. Departing from the straightforward application of masked language learning techniques to DNA sequences, approaches such as MoDNA enrich genome language modeling with prior biological knowledge. In this study, we advance DNA language models by utilizing the Motif-oriented DNA (MoDNA) pre-training framework, which is established for self-supervised learning at the pre-training stage and is flexible enough for application across different downstream tasks. MoDNA distinguishes itself by efficiently learning semantic-level genomic representations from an extensive corpus of unlabeled genome data, offering a significant improvement in computational efficiency over previous approaches. The framework is pre-trained on a comprehensive human genome dataset and fine-tuned for targeted downstream tasks. Our enhanced analysis and evaluation in promoter prediction and transcription factor binding site prediction have further validated MoDNA’s exceptional capabilities, emphasizing its contribution to advancements in genomic predictive modeling.
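The "straightforward" masked language learning that this abstract contrasts MoDNA with can be illustrated with a minimal sketch: split a DNA string into overlapping k-mers and hide a random subset for the model to recover. This shows only the generic masking setup; it does not implement MoDNA's motif-oriented objectives, and the function names, the `[MASK]` token, and the masking rate are assumptions made for illustration.

```python
import random

MASK = "[MASK]"

def kmer_tokens(sequence, k=6):
    """Split a DNA string into overlapping k-mers (stride 1)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

def mask_tokens(tokens, rate=0.15, seed=0):
    """Replace a random subset of tokens with MASK.

    Returns the masked input list and a dict mapping each masked
    position to the original token the model should predict.
    """
    if not tokens:
        return [], {}
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * rate))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    inputs = [MASK if i in positions else t for i, t in enumerate(tokens)]
    targets = {i: tokens[i] for i in positions}
    return inputs, targets

# Toy sequence (made up for illustration only).
tokens = kmer_tokens("ACGTACGTTAGC", k=3)
inputs, targets = mask_tokens(tokens, rate=0.2, seed=42)
```

Because neighbouring k-mers overlap, a single masked token can largely be read off its neighbours, so practical DNA language models typically mask contiguous spans rather than isolated positions.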

Citations: 0

Utilizing Immunoinformatics for mRNA Vaccine Design against Influenza D Virus

BioMedInformatics

Pub Date : 2024-06-12 DOI: 10.3390/biomedinformatics4020086

E. K. Oladipo, Stephen Feranmi Adeyemo, M. Akinboade, Temitope Michael Akinleye, Kehinde Favour Siyanbola, Precious Ayomide Adeogun, Victor Michael Ogunfidodo, Christiana Adewumi Adekunle, Olubunmi Ayobami Elutade, Esther Eghogho Omoathebu, Blessing Oluwatunmise Taiwo, Elizabeth Olawumi Akindiya, Lucy Ochola, H. Onyeaka

Background: Influenza D Virus (IDV) presents a possible threat to animal and human health, necessitating the development of effective vaccines. Although no human illness linked to IDV has been reported, the possibility of human susceptibility to infection remains uncertain. Hence, there is a need for an animal vaccine to be designed. Such a vaccine will contribute to preventing and controlling IDV outbreaks and developing effective countermeasures against this emerging pathogen. This study, therefore, aimed to design an mRNA vaccine construct against IDV using immunoinformatic methods and evaluate its potential efficacy. Methods: A comprehensive methodology involving epitope prediction, vaccine construction, and structural analysis was employed. Viral sequences from six continents were collected and analyzed. A total of 88 Hemagglutinin Esterase Fusion (HEF) sequences from IDV isolates were obtained, of which 76 were identified as antigenic. Different bioinformatics tools were used to identify preferred CTL, HTL, and B-cell epitopes. The epitopes underwent thorough analysis, and those that can induce a lasting immunological response were selected for the construction. Results: The vaccine prototype comprised nine epitopes, an adjuvant, an MHC I-targeting domain (MITD), a Kozak sequence, 3′ UTR, 5′ UTR, and specific linkers. The mRNA vaccine construct exhibited antigenicity, non-toxicity, and non-allergenicity, with favourable physicochemical properties. The secondary and tertiary structure analyses revealed a stable and accurate vaccine construct. Molecular docking simulations also demonstrated strong binding affinity with toll-like receptors. Conclusions: The study provides a promising framework for developing an effective mRNA vaccine against IDV, highlighting its potential for mitigating the global impact of this viral infection. Further experimental studies are needed to confirm the vaccine’s efficacy and safety.
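The construction step described above, joining selected epitopes with linkers between an adjuvant and the MITD, can be sketched as simple string assembly. Everything concrete here is an assumption for illustration: the abstract does not name its linkers or list its epitopes, so the linker strings (EAAAK, AAY, GPGPG, KK) are ones commonly seen in multi-epitope designs, and the angle-bracket tokens are placeholders rather than real sequences.

```python
def assemble_orf(adjuvant, groups, mitd, adjuvant_linker="EAAAK"):
    """Assemble a hypothetical multi-epitope open reading frame.

    groups is a list of (linker, [epitopes]); each epitope after the first
    is preceded by the linker of the group it belongs to.
    """
    pieces = []
    for linker, epitopes in groups:
        for epitope in epitopes:
            if pieces:
                pieces.append(linker)
            pieces.append(epitope)
    return adjuvant + adjuvant_linker + "".join(pieces) + mitd

# Placeholder tokens stand in for real amino-acid sequences (hypothetical).
groups = [
    ("AAY", ["<CTL1>", "<CTL2>"]),   # cytotoxic T-lymphocyte epitopes
    ("GPGPG", ["<HTL1>"]),           # helper T-lymphocyte epitopes
    ("KK", ["<B1>"]),                # B-cell epitopes
]
orf = assemble_orf("<ADJUVANT>", groups, "<MITD>")

# Assumed 5'-to-3' order of the mRNA-level elements named in the abstract.
mrna_layout = ("5' UTR", "Kozak sequence", "ORF", "3' UTR")
```

For the placeholder input above, the assembled string reads `<ADJUVANT>EAAAK<CTL1>AAY<CTL2>GPGPG<HTL1>KK<B1><MITD>`, which makes the position of each linker easy to inspect before real sequences are substituted.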

Citations: 0

Calibrating Glucose Sensors at the Edge: A Stress Generation Model for Tiny ML Drift Compensation

BioMedInformatics

Pub Date : 2024-06-09 DOI: 10.3390/biomedinformatics4020083

Anna Sabatini, Costanza Cenerini, Luca Vollero, D. Pau

Background: Continuous glucose monitoring (CGM) systems offer the advantage of noninvasive monitoring and continuous data on glucose fluctuations. This study introduces a new model that generates synthetic but realistic databases by integrating physiological variables and sensor attributes into a dataset generation model, which in turn enables the design of improved CGM systems. Methods: The presented approach uses a combination of physiological data and sensor characteristics to construct a model that considers the impact of these variables on the accuracy of CGM measurements. A dataset of 500 sensor responses over a 15-day period is generated and analyzed using machine learning algorithms (random forest regressor and support vector regressor). Results: The random forest and support vector regression models achieved Mean Absolute Errors (MAEs) of 16.13 mg/dL and 16.22 mg/dL, respectively. In contrast, models trained solely on single sensor outputs recorded an average MAE of 11.01 ± 5.12 mg/dL. These findings demonstrate the variable impact of integrating multiple data sources on the predictive accuracy of CGM systems, as well as the complexity of the dataset. Conclusions: This approach provides a foundation for developing more precise algorithms and introduces its initial application on tiny microcontroller units (MCUs). More research is recommended to refine these models and validate their effectiveness in clinical settings.
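The MAE figures above, including the per-sensor average reported as a mean with a spread, can be illustrated with a toy drift simulation. The linear sensitivity-loss model, the function names, and all numbers below are assumptions made for illustration; the abstract does not describe the authors' generation model in this detail, and reading the "±" value as a standard deviation is likewise an assumption.

```python
import statistics

def mean_absolute_error(reference, readings):
    """Average absolute difference between reference and sensor values."""
    return sum(abs(r - x) for r, x in zip(reference, readings)) / len(reference)

def drifted_readings(reference, sensitivity_loss_per_day, samples_per_day):
    """Apply a linear sensitivity decay to reference glucose values (toy model)."""
    readings = []
    for i, glucose in enumerate(reference):
        day = i / samples_per_day
        readings.append(glucose * (1.0 - sensitivity_loss_per_day * day))
    return readings

# Toy reference trace in mg/dL: one sample per day for four days (made up).
reference = [100.0, 100.0, 100.0, 100.0]
sensors = {
    "sensor_a": drifted_readings(reference, 0.01, samples_per_day=1),
    "sensor_b": drifted_readings(reference, 0.03, samples_per_day=1),
}

# Per-sensor MAE, then mean and sample standard deviation across sensors.
per_sensor_mae = {name: mean_absolute_error(reference, r) for name, r in sensors.items()}
mean_mae = statistics.mean(list(per_sensor_mae.values()))
sd_mae = statistics.stdev(list(per_sensor_mae.values()))
```

In this toy setup the two sensors drift at different rates, so their MAEs differ and the across-sensor spread is non-zero, which is the kind of variability a "mean ± spread" summary is meant to convey.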

Citations: 0

Understanding the Molecular Actions of Spike Glycoprotein in SARS-CoV-2 and Issues of a Novel Therapeutic Strategy for the COVID-19 Vaccine

BioMedInformatics

Pub Date : 2024-06-09 DOI: 10.3390/biomedinformatics4020084

Y. Matsuzaka, R. Yashiro

In vaccine development, many use the spike protein (S protein), which has multiple “spike-like” structures protruding from the spherical structure of the coronavirus, as an antigen. However, there are concerns about its effectiveness and toxicity. When S protein is used in a vaccine, its ability to attack viruses may be weak, and its effectiveness in eliciting immunity will only last for a short period of time. Moreover, it may cause “antibody-dependent immune enhancement”, which can enhance infections. In addition, the three-dimensional (3D) structure of epitopes is essential for functional analysis and structure-based vaccine design. Additionally, during viral infection, large amounts of extracellular vesicles (EVs) are secreted from infected cells, which function as a communication network between cells and coordinate the response to infection. Under conditions where SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) molecular vaccination produces overwhelming SARS-CoV-2 spike glycoprotein, a significant proportion of the overproduced intracellular spike glycoprotein is transported via EVs. Therefore, it will be important to understand the infection mechanisms of SARS-CoV-2 via EV-dependent and EV-independent uptake into cells and to model the infection processes based on 3D structural features at interaction sites.

Citations: 0

Anomaly Detection and Artificial Intelligence Identified the Pathogenic Role of Apoptosis and RELB Proto-Oncogene, NF-kB Subunit in Diffuse Large B-Cell Lymphoma

BioMedInformatics

Pub Date : 2024-06-07 DOI: 10.3390/biomedinformatics4020081

J. Carreras, R. Hamoudi

Background: Diffuse large B-cell lymphoma (DLBCL) is one of the most frequent lymphomas. DLBCL is phenotypically, genetically, and clinically heterogeneous. Aim: We aim to identify new prognostic markers. Methods: We performed anomaly detection analysis, other artificial intelligence techniques, and conventional statistics using gene expression data of 414 patients from the Lymphoma/Leukemia Molecular Profiling Project (GSE10846), and immunohistochemistry in 10 reactive tonsils and 30 DLBCL cases. Results: First, an unsupervised anomaly detection analysis pinpointed outliers (anomalies) in the series, and 12 genes were identified: DPM2, TRAPPC1, HYAL2, TRIM35, NUDT18, TMEM219, CHCHD10, IGFBP7, LAMTOR2, ZNF688, UBL7, and RELB, which belonged to the apoptosis, MAPK, MTOR, and NF-kB pathways. Second, these 12 genes were used to predict overall survival using machine learning, artificial neural networks, and conventional statistics. In a multivariate Cox regression analysis, high expressions of HYAL2 and UBL7 were correlated with poor overall survival, whereas TRAPPC1, IGFBP7, and RELB were correlated with good overall survival (p < 0.01). As a single marker and only in RCHOP-like treated cases, the prognostic value of RELB was confirmed using GSEA analysis and Kaplan–Meier with log-rank test and validated in the TCGA and GSE57611 datasets. Anomaly detection analysis was successfully tested in the GSE31312 and GSE117556 datasets. Using immunohistochemistry, RELB was positive in B-lymphocytes and macrophage/dendritic-like cells, and correlation with HLA DP-DR, SIRPA, CD85A (LILRB3), PD-L1, MARCO, and TOX was explored. Conclusions: Anomaly detection and other bioinformatic techniques successfully predicted the prognosis of DLBCL, and high RELB was associated with a favorable prognosis.
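The Kaplan–Meier analysis mentioned in the results rests on a product-limit estimator, sketched here in plain Python. This shows only how a survival curve is estimated from follow-up times and event flags; it does not reproduce the authors' pipeline, the log-rank comparison, or any of their data, and the function name and toy inputs are assumptions.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time for each patient
    events: 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set.
        at_risk -= removed
    return curve

# Toy cohort (made up): five patients, two censored.
curve = kaplan_meier(times=[2, 3, 3, 5, 8], events=[1, 1, 0, 1, 0])
```

To compare groups such as high versus low RELB expression, one curve would be estimated per group and the difference assessed with a log-rank test, which this sketch leaves out.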

Citations: 0

Abdominal MRI Unconditional Synthesis with Medical Assessment

BioMedInformatics

Pub Date : 2024-06-07 DOI: 10.3390/biomedinformatics4020082

Bernardo Gonçalves, Mariana Silva, Luísa Vieira, Pedro Vieira

Current computer vision models require a significant amount of annotated data to improve their performance in a particular task. However, obtaining the required annotated data is challenging, especially in medicine. Hence, data augmentation techniques play a crucial role. In recent years, generative models have been used to create artificial medical images, which have shown promising results. This study aimed to use a state-of-the-art generative model, StyleGAN3, to generate realistic synthetic abdominal magnetic resonance images. These images were evaluated using quantitative metrics and qualitative assessments by medical professionals. For this purpose, an abdominal MRI dataset acquired at Garcia da Horta Hospital in Almada, Portugal, was used. A subset containing only axial gadolinium-enhanced slices was used to train the model. The obtained Fréchet inception distance value (12.89) aligned with the state of the art, and a medical expert confirmed the significant realism and quality of the images. However, specific issues were identified in the generated images, such as texture variations, visual artefacts and anatomical inconsistencies. Despite these, this work demonstrated that StyleGAN3 is a viable solution to synthesise realistic medical imaging data, particularly in abdominal imaging.
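For context on the reported value of 12.89, the Fréchet inception distance is conventionally defined as the Fréchet distance between two Gaussians fitted to Inception-network feature embeddings of the real and generated images. The symbols below are notation chosen here, not taken from the paper: the mean and covariance of the real-image features are written with subscript r, those of the generated-image features with subscript g.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Lower values indicate closer feature statistics, and the distance is zero when the two fitted Gaussians coincide. Because the score depends on the feature extractor and the sample sizes used, values are best compared only between evaluations run under the same protocol.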

Citations: 0

Physiological Data Augmentation for Eye Movement Gaze in Deep Learning

BioMedInformatics

Pub Date : 2024-06-06 DOI: 10.3390/biomedinformatics4020080

Alae Eddine El Hmimdi, Zoï Kapoula

This study addresses the challenges posed by limited annotated medical data in eye movement AI analysis by introducing a novel physiologically based gaze data augmentation library. Unlike traditional augmentation methods, which may introduce artifacts and alter pathological features in medical datasets, the proposed library emulates natural head movements during gaze data collection. This approach enhances sample diversity without compromising authenticity. The library was evaluated on both CNN and hybrid architectures using distinct datasets, demonstrating its effectiveness in regularizing the training process and improving generalization. Particularly noteworthy is a macro F1 score of up to 79% when models were trained using the proposed augmentation (EMULATE) with the three HTCE variants. This pioneering approach leverages domain-specific knowledge to contribute to the robustness and authenticity of deep learning models in the medical domain.
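
The macro F1 score reported above is the unweighted mean of the per-class F1 scores, so small classes count as much as large ones. A minimal sketch of the metric follows; the function name and the toy labels are illustrative assumptions and have no connection to the study's data.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy labels (assumption): class 0 has F1 = 6/7, class 1 has F1 = 4/5.
    y_true = [0, 0, 0, 0, 1, 1]
    y_pred = [0, 0, 0, 1, 1, 1]
    print(macro_f1(y_true, y_pred))
```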

Citations: 0

Unlocking the Future of Drug Development: Generative AI, Digital Twins, and Beyond

BioMedInformatics

Pub Date : 2024-06-06 DOI: 10.3390/biomedinformatics4020079

Zamara Mariam, Sarfaraz K. Niazi, Matthias Magoola

This article delves into the intersection of generative AI and digital twins within drug discovery, exploring their synergistic potential to revolutionize pharmaceutical research and development. Through various examples, we illuminate how generative AI algorithms, capable of simulating vast chemical spaces and predicting molecular properties, are increasingly integrated with digital twins of biological systems to expedite drug discovery. By harnessing the power of computational models and machine learning, researchers can design novel compounds tailored to specific targets, optimize drug candidates, and simulate their behavior within virtual biological environments. This paradigm shift offers unprecedented opportunities for accelerating drug development, reducing costs, and, ultimately, improving patient outcomes. As we navigate this rapidly evolving landscape, collaboration between interdisciplinary teams and continued innovation will be paramount in realizing the promise of generative AI and digital twins in advancing drug discovery.

Citations: 0

A Study on the Effects of Cementless Total Knee Arthroplasty Implants’ Surface Morphology via Finite Element Analysis

BioMedInformatics

Pub Date : 2024-06-03 DOI: 10.3390/biomedinformatics4020078

Peter J. Hunt, Mohammad N. Noori, S. Hazelwood, N. Noori, Wael A. Altabey

Total knee arthroplasty (TKA) is one of the most commonly performed orthopedic surgeries, with nearly one million performed in 2020 in the United States alone. Changing patient demographics, predominantly indicated by increases in younger, more active, and more obese patients undergoing TKA, pose a challenge to orthopedic surgeons as these factors present a greater risk of long-term complications. Historically, cemented TKA has been the gold standard for fixation, but long-term aseptic loosening continues to be a risk for cemented implants. Cementless TKA, which relies on the surface morphology of a porous coating for biologic fixation of implant to bone, may provide improved long-term survivorship compared with cement. The quality of this bond is dependent on an interference fit and the roughness, or coefficient of friction, between the implant and the bone. Stress shielding is a measure of the difference in the stress experienced by implanted bone versus surrounding native bone. A finite element model (FEM) can be used to quantify and better understand stress shielding in order to better evaluate and optimize implant design. In this study, a FEM was constructed to investigate how the surface coating of cementless implants (coefficient of friction) and the location of the coating application affected the stress-shielding response in the tibia. It was determined that the stress distribution in the native tibia surrounding a cementless TKA implant was dependent on the coefficient of friction applied at the tip of the implant’s stem. Materials with lower friction coefficients applied to the stem tip resulted in higher compressive stress experienced by implanted bone, and more favorable overall stress-shielding responses.
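
The stress difference described above is often summarised as a relative reduction with respect to native bone. The abstract does not state which formula the authors used, so the definition below is an assumption made for illustration, and the function name and stress values are hypothetical.

```python
def stress_shielding_ratio(native_stress, implanted_stress):
    """Relative stress reduction in implanted bone versus native bone.

    Assumed definition: (native - implanted) / native.
    0 means no shielding; values near 1 mean the bone carries almost no load.
    """
    if native_stress == 0:
        raise ValueError("native_stress must be non-zero")
    return (native_stress - implanted_stress) / native_stress


if __name__ == "__main__":
    # Hypothetical compressive stress magnitudes in MPa, not taken from the study.
    print(stress_shielding_ratio(native_stress=10.0, implanted_stress=7.5))
```

Under this definition, a higher compressive stress in the implanted bone gives a smaller ratio, which corresponds to the more favorable stress-shielding response described in the abstract.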

Citations: 0