Latest Articles in Interdisciplinary Sciences: Computational Life Sciences (Page 5)

Interpretable Multimodal Molecular Language Model for Drug-Target Interaction Prediction.

IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Interdisciplinary Sciences: Computational Life Sciences

Pub Date : 2026-01-22 DOI: 10.1007/s12539-025-00808-1

Hui Yu, Qingyong Wang, Xiaobo Zhou, Lichuan Gu

Citations: 0

Generative Adversarial Networks Based on Fine-Grained Image Recognition for the Progression Prediction of Progressive Mild Cognitive Impairment.

IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Interdisciplinary Sciences: Computational Life Sciences

Pub Date : 2026-01-13 DOI: 10.1007/s12539-025-00800-9

Changsong Shen, Fangxiang Wu, Bo Liao, Jinsheng Wang, Qingbo Li

Progressive mild cognitive impairment (pMCI) often develops into Alzheimer's disease (AD), whereas stable mild cognitive impairment (sMCI) remains cognitively unchanged. Therefore, early identification of pMCI based on multimodal neuroimaging data (e.g., MRI, PET) is clinically valuable. However, limited multimodal data reduces complementary information across modalities and degrades prediction performance. Existing generative adversarial networks (GANs) often overlook local information when synthesizing cross-modal neuroimages, leading to suboptimal image quality. Motivated by these shortcomings, we propose a generative adversarial network (FGGAN) based on fine-grained image recognition for cross-modal image synthesis and pMCI progression prediction. FGGAN comprises a GAN, a feature depth extraction (FDE) module, and a classifier module. The GAN synthesizes high-quality missing modality data by leveraging local and global cues from the input image, while extracting multimodal feature representations. The FDE refines semantic features to improve feature adaptation for the classifier, which predicts pMCI progression from fused multimodal features. Results from the ADNI dataset indicate that FGGAN achieves superior performance in image synthesis quality and disease classification.

Citations: 0

Advanced Multi-Level Bidirectional Attention Network for Retinal Vessel Segmentation.

IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Interdisciplinary Sciences: Computational Life Sciences

Pub Date : 2025-12-12 DOI: 10.1007/s12539-025-00793-5

Zhendi Ma, Xiaobo Li, Yuxin Zhao, Jiahao Wang, Zhongmei Han, Hui Wang

Retinal vessel segmentation is crucial for clinical diagnosis because retinal fundus images carry rich morphological information. Although neural networks perform well, issues such as feature loss during encoding and insufficient context fusion in skip connections remain. The complex curvature of small vessels and uneven background brightness further complicate pathological image segmentation. To address these problems, this paper proposes a multi-level bidirectional attention aggregation network. The encoder uses a Partial Encoder Block (PEB) to reduce the feature loss caused by traditional convolution. A Dynamic Direction Attention Module (DDAM) in the skip connections enhances anisotropic geometric representation, preserving fine vessel details and contextual information. Additionally, a Multi-Feature Fusion Module (MFFM) fuses multi-level features, retaining details while suppressing background noise. Experiments on the DRIVE, STARE, and CHASEDB1 datasets demonstrate the network's effectiveness. On DRIVE, AUC, F1-score, and sensitivity improved by 0.19%, 0.43%, and 1.17%, respectively. On STARE, AUC, F1-score, and sensitivity rose by 0.26%, 2.95%, and 2.07%, respectively. On CHASEDB1, AUC, F1-score, and specificity increased by 0.2%, 1.12%, and 0.44%, respectively. The results show that the proposed network outperforms existing methods in segmentation performance.

Citations: 0

EASNet: Edge-aware Segmentation Network for Skin Lesion Segmentation with Boundary-aware and Frequency Attention Mechanisms.

IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Interdisciplinary Sciences: Computational Life Sciences

Pub Date : 2025-12-12 DOI: 10.1007/s12539-025-00796-2

Junwei Yu, Yuhe Xia, Jianping Li, Nan Liu, Haoze Li, Weiya Shi

Cutaneous malignancies represent one of the most common cancers globally, with consistently rising incidence rates driving demand for enhanced diagnostic methodologies. While dermoscopy delivers high-resolution image data, existing CNN (convolutional neural network)-based approaches display constrained perception abilities when handling complex lesion boundaries and frequency-domain features. To overcome these constraints, we introduce EASNet, an innovative edge-aware segmentation network that combines frequency-domain insights with explicit boundary modeling. EASNet leverages discrete cosine transform (DCT) and discrete wavelet transform (DWT) to acquire multi-scale frequency information, maintaining global structures alongside precise boundary details. Furthermore, a boundary-driven criss-cross (BDCC) attention component strengthens spatial dependency learning, and a hybrid loss mechanism guarantees accurate boundary supervision throughout training. Extensive experiments on ISIC2017 and ISIC2018 datasets reveal that EASNet attains competitive performance in segmentation precision, boundary clarity, and positional consistency. This work pushes forward dermatological image analysis, offering a dependable tool for precise clinical assessment and therapeutic strategy development.
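
The abstract's point that wavelet coefficients keep global structure alongside boundary detail can be illustrated with a minimal sketch. It assumes a one-level orthonormal Haar wavelet on a 1D signal; the abstract does not say which wavelet family or decomposition depth EASNet uses, and the intensity profile below is hypothetical.

```python
import math


def haar_dwt_1d(signal):
    """One level of the orthonormal Haar wavelet transform.

    Returns (approx, detail): scaled pairwise sums (low-frequency content)
    and scaled pairwise differences (high-frequency content such as edges).
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt_1d(approx, detail):
    """Inverse of haar_dwt_1d: rebuilds the original samples."""
    scale = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * scale)
        out.append((a - d) * scale)
    return out


# Hypothetical intensity profile with a single step edge (illustration only).
profile = [0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.9]
approx, detail = haar_dwt_1d(profile)
restored = haar_idwt_1d(approx, detail)
```

In this sketch the detail coefficients are zero wherever a pair of samples is flat and non-zero only for the pair that straddles the step, while the approximation coefficients follow the overall intensity level. A 2D image version would apply the same step along rows and then columns.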

Citations: 0

ResNet-Powered Multi-Class Identification of Sequence Patterns for Genome Replication Timing Analysis.

IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Interdisciplinary Sciences: Computational Life Sciences

Pub Date : 2025-12-12 DOI: 10.1007/s12539-025-00797-1

Zhen-Ning Yin, Yu-Hao Zeng, Feng Gao

The precise regulation of DNA replication timing (RT) relies on deciphering sequence patterns. Although significant advances have been made in identifying sequence patterns associated with replication timing, there are still few computational pipelines designed for accurate RT prediction. In this study, we propose a deep learning-based framework, named RT-Predictor, leveraging a residual network (ResNet) to classify sequence patterns associated with RT across the human genome into four distinct domains: early replication domain (ERD), down transition zone (DTZ), late replication domain (LRD), and up transition zone (UTZ). Using solely DNA sequence patterns, the model achieves an accuracy of 74.58%, a Matthews correlation coefficient (MCC) of 0.6612, an F1-score of 0.7458, and a Recall of 0.7457, demonstrating its ability to resolve complex DNA replication timing patterns. By incorporating positional and frequency-based features derived from DNA sequences, we extract a comprehensive set of 384 features that effectively characterize replication dynamics. Genome-wide RT prediction reveals that replication origins (ORIs) predominantly initiate replication during the early S-phase, potentially linking specific sequence patterns to DNA damage repair mechanisms. These findings demonstrate the power of deep learning in decoding the regulatory significance of sequence patterns in replication timing and provide critical insights into the molecular basis of genomic stability and its disruption in diseases, particularly cancer.
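
The four reported metrics can all be derived from a single confusion matrix over the four domains. The sketch below assumes macro averaging for recall and F1 and the standard multi-class generalisation of the Matthews correlation coefficient; the abstract does not state which averaging scheme was used, and the matrix values are hypothetical, not the paper's results.

```python
import math


def multiclass_metrics(confusion):
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(k))
    true_counts = [sum(confusion[i]) for i in range(k)]
    pred_counts = [sum(confusion[i][j] for i in range(k)) for j in range(k)]

    recalls, f1_scores = [], []
    for c in range(k):
        tp = confusion[c][c]
        recall = tp / true_counts[c] if true_counts[c] else 0.0
        precision = tp / pred_counts[c] if pred_counts[c] else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
        recalls.append(recall)

    # Multi-class MCC: covariance between true and predicted labels,
    # normalised so that 1 is perfect and 0 is chance-level agreement.
    numerator = correct * total - sum(t * p for t, p in zip(true_counts, pred_counts))
    denominator = math.sqrt(
        (total ** 2 - sum(p * p for p in pred_counts))
        * (total ** 2 - sum(t * t for t in true_counts))
    )
    return {
        "accuracy": correct / total,
        "macro_recall": sum(recalls) / k,
        "macro_f1": sum(f1_scores) / k,
        "mcc": numerator / denominator if denominator else 0.0,
    }


# Hypothetical counts; rows and columns are ordered ERD, DTZ, LRD, UTZ.
example = [
    [80, 10, 2, 8],
    [12, 70, 10, 8],
    [3, 9, 82, 6],
    [9, 8, 7, 76],
]
scores = multiclass_metrics(example)
```

With balanced classes, accuracy and macro recall coincide, which is one reason the MCC is often reported next to them: it also penalises predictions that are skewed toward particular classes.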

Citations: 0

Unveiling Putative Functions of Burkholderia pseudomallei K96243 Hypothetical Proteins Via High-Throughput Characterization of Structural Similarities.

IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Interdisciplinary Sciences: Computational Life Sciences

Pub Date : 2025-12-12 DOI: 10.1007/s12539-025-00792-6

Syed Abuthakir Mohamed Husain, Su Datt Lam, Mohd Firdaus-Raih, Sheila Nathan, Nor Azlan Nor Muhammad, Chyan Leong Ng

Burkholderia pseudomallei (BP) infections claim tens of thousands of lives worldwide every year. The bacterium's distinctive characteristics include antibiotic resistance, virulence, and the ability to survive in stressful environments. Sequencing and annotation of the B. pseudomallei genome reveal that about 25% of the genes encode hypothetical proteins (HPs). As such, characterising the HPs could shed light on the mechanisms that contribute to these characteristics. Over the last decade, genome sequencing and annotation technologies have advanced drastically. Furthermore, artificial intelligence programs such as AlphaFold2 (AF2) and RoseTTAFold2 (RF2), which can predict 3D protein structures with high accuracy, are now available. Taking advantage of these tools, this study aimed to re-annotate the HPs encoded within the BP genome. To achieve this, we retrieved 1869 HPs from the Burkholderia Genome Database and cross-referenced them with UniProt. After filtering, 419 remained hypothetical. These were analysed using BLASTp for sequence homologs and antibiotic resistance proteins, followed by 3D structure prediction using AF2 and RF2 and a structural homolog search using Foldseek. This study successfully annotated 209 HPs, leaving only 210 proteins (3.7% of BP coding sequences) still classified as 'hypothetical'. The functions of the predicted HPs were further analysed using structure comparison and active site analysis. The annotated protein list includes fifteen antibiotic resistance proteins, five haem oxygenase-like fold proteins involved in biofilm formation, host pathogenesis, and antibacterial activity, along with five essential proteins. These proteins represent promising drug targets for developing new antibiotics against melioidosis. Nonetheless, experimental validation will be necessary to characterize the predicted protein functions.

Citations: 0

Dear-OMG: An Omics-General Compression Method for Genomics, Proteomics and Metabolomics Data.

IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Interdisciplinary Sciences: Computational Life Sciences

Pub Date : 2025-12-04 DOI: 10.1007/s12539-025-00786-4

Qingzu He, Xiang Li, Huan Guo, Yulin Li, Jianwei Shuai

As high-throughput omics technologies continue to advance, researchers are facing the challenge of a rapid surge in proteomics, metabolomics, and genomics data. This growth not only necessitates more disk space and network bandwidth, but also complicates data sharing and subsequent analysis. To address this challenge and enhance the analytical efficiency of downstream software, we propose Dear-OMG, a unified, compact, flexible, and high-performance metadata storage solution. Dear-OMG introduces a novel file storage structure and utilizes the Elias-Fano encoding algorithm to compress and store proteomics, genomics, and metabolomics metadata into the unified OMG format. The OMG format not only demonstrates remarkably high compression and decompression speeds, but also enables parallel random access to any data block. Test results reveal that, compared to the commonly used proteomics formats of mzXML and mzML, the OMG format achieves an 80% reduction in storage space, a 90% decrease in conversion time, and approximately a tenfold speed improvement with support for parallel random access. Dear-OMG is freely available at https://github.com/jianweishuai/Dear-OMG .
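
The Elias-Fano encoding named in the abstract compresses a sorted list of integers by splitting each value into low bits, stored verbatim, and high bits, stored in unary. The following is a minimal textbook sketch of that idea, not Dear-OMG's actual file layout, which the abstract does not describe; the function names and the sample offsets are hypothetical.

```python
def elias_fano_encode(values, universe):
    """Encode a non-decreasing list of integers drawn from [0, universe)."""
    n = len(values)
    if n == 0:
        return 0, [], []
    if any(values[i] > values[i + 1] for i in range(n - 1)):
        raise ValueError("values must be sorted in non-decreasing order")
    if values[0] < 0 or values[-1] >= universe:
        raise ValueError("values must lie in [0, universe)")

    # Number of low bits per element: floor(log2(universe / n)), at least 0.
    low_bits = max(0, (universe // n).bit_length() - 1)
    mask = (1 << low_bits) - 1
    lows = [v & mask for v in values]

    # High parts in unary: element i sets the bit at position high_i + i.
    highs = [0] * (n + (universe >> low_bits) + 1)
    for i, v in enumerate(values):
        highs[(v >> low_bits) + i] = 1
    return low_bits, lows, highs


def elias_fano_decode(low_bits, lows, highs):
    """Rebuild the original list from the low-bit array and unary high bits."""
    values = []
    zeros_seen = 0  # zeros before a set bit equal that element's high part
    for bit in highs:
        if bit:
            values.append((zeros_seen << low_bits) | lows[len(values)])
        else:
            zeros_seen += 1
    return values


# Hypothetical sorted offsets standing in for indexed positions in a data file.
offsets = [3, 4, 7, 13, 14, 15, 21, 43]
low_bits, lows, highs = elias_fano_encode(offsets, universe=64)
decoded = elias_fano_decode(low_bits, lows, highs)
encoded_size_bits = len(offsets) * low_bits + len(highs)
```

The decoder here scans the whole bit vector for simplicity. Practical implementations add a select structure over the high bits so that any single element can be read without decoding its predecessors, which is the property that makes this family of encodings compatible with random access.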

Citations: 0

GN-Net: A Geometric and Neighborhood-Aware Network for Predicting Intracranial Aneurysm Rupture Risk and Assisting Clinical Decision-Making.

IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Interdisciplinary Sciences: Computational Life Sciences

Pub Date : 2025-12-02 DOI: 10.1007/s12539-025-00785-5

Jiafeng Zhou, Peiying Li, Yongchang Liu, Shikui Tu, Bing Zhao, Jieqing Wan, Yongchun Chen, Yunjun Yang, Lei Xu

Citations: 0

DCPPS: Prediction of Kinase-Specific Phosphorylation Sites Using Dynamic Embedding and Cross-Representation Interaction.

IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Interdisciplinary Sciences: Computational Life Sciences

Pub Date : 2025-12-01 Epub Date: 2025-06-11 DOI: 10.1007/s12539-025-00731-5

Mengya Liu, Xin Wang, Zhan-Li Sun, Xiao Yang, Xia Chen

Substrate-specific kinases catalyze the addition of phosphate groups to specific amino acids, resulting in kinase-specific phosphorylation, which participates in various signaling pathways and regulatory processes. Computational methods for predicting these sites can accelerate protein function research, disease exploration, and drug development. Existing approaches typically rely on global and local sequences to extract predictive features but often neglect position information and critical feature interactions, which are essential for effective feature representation. In this work, we propose a novel kinase-specific phosphorylation site prediction model, DCPPS, that leverages dynamic embedding encoding and interaction between global and local representations. Specifically, to enrich sequence position information and strengthen features, we construct a dynamic embedding encoding (DEE) that captures amino acid semantics and the positional information of upstream and downstream amino acids, dynamically optimizing feature embeddings. To address the lack of in-depth feature interaction between local and global information, we design a cross-representation interaction unit (CRIU) that facilitates in-depth mining and complementary improvement of potential connections between multi-source features. Results of kinase-specific phosphorylation experiments and multiple extended experiments show that DCPPS has better predictive performance and scalability. Further ablation studies demonstrate that incorporating global protein information, DEE, and CRIU markedly enhances phosphorylation site prediction accuracy, particularly in mitigating class imbalance.

Pages: 1056-1073

Citations: 0

iALP: Identification of Allergenic Proteins Based on Large Language Model and Gate Linear Unit.

IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Interdisciplinary Sciences: Computational Life Sciences

Pub Date : 2025-12-01 Epub Date: 2025-07-13 DOI: 10.1007/s12539-025-00734-2

Bing Zhang, Jianping Zhao, Yannan Bin, Junfeng Xia

The rising incidence of allergic disorders has emerged as a pressing public health issue worldwide, underscoring the need for intensified research and efficacious intervention measures. Accurate identification of allergenic proteins (ALPs) is essential for preventing allergic reactions and mitigating health risks at the individual level. Although machine learning and deep learning techniques have been widely applied to ALP identification, existing methods are often limited in capturing the complex features of ALPs. In response, we introduce a novel method, iALP, which leverages the large language model ProtT5 and the gate linear unit (GLU) for ALP identification. The features from ProtT5 enable an in-depth analysis of the complex characteristics of ALPs, while the GLU captures the intricate nonlinear features hidden within these proteins. The results demonstrate that iALP achieves an accuracy and F1-score of 0.957 on the test set. Furthermore, it outperforms the leading predictors on a separate dataset. We also provide a detailed discussion of model performance on protein sequences shorter than 100 amino acids. We hope that iALP will facilitate accurate ALP prediction, thereby supporting effective allergy symptom prevention and the implementation of allergen prevention and treatment strategies. The GitHub repository at https://github.com/xialab-ahu/iALP.git hosts the iALP source code and datasets for prediction tasks.
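
To make the gating idea concrete, here is a minimal sketch of a gated linear unit in its common form, GLU(x) = (W x + b) * sigmoid(V x + c), applied to a fixed-length feature vector. It is an illustration only: the abstract does not say how iALP wires its GLU, so the layer shapes, the helper names (`linear`, `glu`), and the toy 2-dimensional input standing in for a ProtT5 embedding are assumptions.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # one row of weights per output unit


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Affine map: one dot product plus bias per output unit."""
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]


def glu(x: Vector, w_value: Matrix, b_value: Vector,
        w_gate: Matrix, b_gate: Vector) -> Vector:
    """Gated linear unit: value branch scaled elementwise by a sigmoid gate."""
    values = linear(x, w_value, b_value)
    gates = [sigmoid(g) for g in linear(x, w_gate, b_gate)]
    return [v * g for v, g in zip(values, gates)]


if __name__ == "__main__":
    x = [1.0, 2.0]                       # toy stand-in for an embedding
    identity = [[1.0, 0.0], [0.0, 1.0]]  # value branch passes x through
    zeros = [[0.0, 0.0], [0.0, 0.0]]     # gate pre-activation is 0 everywhere
    print(glu(x, identity, [0.0, 0.0], zeros, [0.0, 0.0]))
```

With the gate weights set to zero, every gate equals sigmoid(0) = 0.5, so each feature passes through at half strength; in a trained model the gate weights are learned, which lets the unit pass or suppress individual features depending on the input.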


Pages: 860-872. Publication type: Journal Article.

Citations: 0