Generative Adversarial Networks Based on Fine-Grained Image Recognition for the Progression Prediction of Progressive Mild Cognitive Impairment
Pub Date : 2026-01-13 DOI: 10.1007/s12539-025-00800-9
Changsong Shen, Fangxiang Wu, Bo Liao, Jinsheng Wang, Qingbo Li
Progressive mild cognitive impairment (pMCI) often develops into Alzheimer's disease (AD), whereas stable mild cognitive impairment (sMCI) remains cognitively unchanged. Therefore, early identification of pMCI based on multimodal neuroimaging data (e.g., MRI, PET) is clinically valuable. However, incomplete multimodal data reduces the complementary information available across modalities and degrades prediction performance. Existing generative adversarial networks (GANs) often overlook local information when synthesizing cross-modal neuroimages, leading to suboptimal image quality. Motivated by these shortcomings, we propose a generative adversarial network (FGGAN) based on fine-grained image recognition for cross-modal image synthesis and pMCI progression prediction. FGGAN comprises a GAN, a feature depth extraction (FDE) module, and a classifier module. The GAN synthesizes high-quality data for the missing modality by leveraging local and global cues from the input image while extracting multimodal feature representations. The FDE module refines semantic features to improve feature adaptation for the classifier, which predicts pMCI progression from the fused multimodal features. Results on the ADNI dataset indicate that FGGAN achieves superior performance in image synthesis quality and disease classification.
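As a rough illustration of the fuse-and-classify step described in the abstract, the sketch below concatenates refined features from a real modality with features from a (possibly synthesized) second modality before classification. The module name, feature sizes, and wiring are assumptions for illustration, not the authors' implementation.

```python
# Minimal PyTorch sketch: refine per-modality features and classify the fused vector.
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    def __init__(self, feat_dim=256, num_classes=2):
        super().__init__()
        # stand-in for a feature-refinement stage such as the paper's FDE module
        self.refine = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU())
        self.head = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, mri_feat, pet_feat):
        # pet_feat could come from a generator when the real PET scan is missing
        fused = torch.cat([self.refine(mri_feat), self.refine(pet_feat)], dim=-1)
        return self.head(fused)

mri = torch.randn(4, 256)           # features extracted from real MRI
pet = torch.randn(4, 256)           # features from real or synthesized PET
logits = FusionClassifier()(mri, pet)
print(logits.shape)                 # torch.Size([4, 2]) -> pMCI vs. sMCI logits
```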
{"title":"Generative Adversarial Networks Based on Fine-Grained Image Recognition for the Progression Prediction of Progressive Mild Cognitive Impairment.","authors":"Changsong Shen, Fangxiang Wu, Bo Liao, Jinsheng Wang, Qingbo Li","doi":"10.1007/s12539-025-00800-9","DOIUrl":"https://doi.org/10.1007/s12539-025-00800-9","url":null,"abstract":"<p><p>Progressive mild cognitive impairment (pMCI) often develops into Alzheimer's disease (AD), whereas stable mild cognitive impairment (sMCI) remains cognitively unchanged. Therefore, early identification of pMCI based on multimodal neuroimaging data (e.g., MRI, PET) is clinically valuable. However, limited multimodal data reduces complementary information across modalities and degrades prediction performance. Existing generative adversarial networks (GANs) often overlook local information when synthesizing cross-modal neuroimages, leading to suboptimal image quality. Motivated by these shortcomings, we propose a generative adversarial network (FGGAN) based on fine-grained image recognition for cross-modal image synthesis and pMCI progression prediction. FGGAN comprises a GAN, a feature depth extraction (FDE) module, and a classifier module. The GAN synthesizes high-quality missing modality data by leveraging local and global cues from the input image, while extracting multimodal feature representations. The FDE refines semantic features to improve feature adaptation for the classifier, which predicts pMCI progression from fused multimodal features. Results from the ADNI dataset indicate that FGGAN achieves superior performance in image synthesis quality and disease classification.</p>","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":""},"PeriodicalIF":3.9,"publicationDate":"2026-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145966354","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Advanced Multi-Level Bidirectional Attention Network for Retinal Vessel Segmentation
Pub Date : 2025-12-12 DOI: 10.1007/s12539-025-00793-5
Zhendi Ma, Xiaobo Li, Yuxin Zhao, Jiahao Wang, Zhongmei Han, Hui Wang
Retinal vessel segmentation is crucial for clinical diagnosis due to the rich morphological information in retinal fundus images. Although neural networks perform well, issues such as feature loss during encoding and insufficient context fusion in skip connections remain. The complex curvature of small vessels and uneven background brightness further complicate pathological image segmentation. To address these problems, this paper proposes a multi-level bidirectional attention aggregation network. The encoder incorporates a Partial Encoder Block (PEB) to reduce the feature loss of traditional convolution. A Dynamic Direction Attention Module (DDAM) is introduced in the skip connections to enhance anisotropic geometric representation, preserving fine vessel details and contextual information. Additionally, a Multi-Feature Fusion Module (MFFM) fuses multi-level features, retaining details while suppressing background noise. Experiments on the DRIVE, STARE, and CHASEDB1 datasets demonstrate the network's effectiveness. On DRIVE, AUC, F1-score, and sensitivity improved by 0.19%, 0.43%, and 1.17%, respectively. On STARE, AUC, F1-score, and sensitivity rose by 0.26%, 2.95%, and 2.07%, respectively. On CHASEDB1, AUC, F1-score, and specificity increased by 0.2%, 1.12%, and 0.44%, respectively. The results show that the proposed network outperforms existing methods in segmentation performance.
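One common way to realize a "partial" encoder convolution of the kind the PEB suggests is to convolve only a subset of channels and pass the remainder through unchanged. The sketch below shows that idea; the split ratio and block layout are assumptions, not the paper's exact PEB design.

```python
# Hedged PyTorch sketch of a partial convolution block: convolve a fraction of
# the channels, keep the rest as an identity path, then re-concatenate.
import torch
import torch.nn as nn

class PartialConvBlock(nn.Module):
    def __init__(self, channels, conv_ratio=0.25):
        super().__init__()
        self.n_conv = max(1, int(channels * conv_ratio))   # channels to convolve
        self.conv = nn.Conv2d(self.n_conv, self.n_conv, kernel_size=3, padding=1)

    def forward(self, x):
        conv_part, identity_part = torch.split(
            x, [self.n_conv, x.shape[1] - self.n_conv], dim=1)
        return torch.cat([self.conv(conv_part), identity_part], dim=1)

x = torch.randn(1, 64, 48, 48)          # a fundus-image feature map
print(PartialConvBlock(64)(x).shape)    # torch.Size([1, 64, 48, 48])
```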
{"title":"Advanced Multi-Level Bidirectional Attention Network for Retinal Vessel Segmentation.","authors":"Zhendi Ma, Xiaobo Li, Yuxin Zhao, Jiahao Wang, Zhongmei Han, Hui Wang","doi":"10.1007/s12539-025-00793-5","DOIUrl":"https://doi.org/10.1007/s12539-025-00793-5","url":null,"abstract":"<p><p>Retinal vessel segmentation is crucial for clinical diagnosis due to the rich morphological information in retinal fundus images. Although neural networks perform well, issues like feature loss during encoding and insufficient context fusion in skip connections remain. The complex curvature of small vessels and uneven background brightness further complicate pathological image segmentation. To address these problems, this paper proposes a multi-level bidirectional attention aggregation network. The encoder proposes a Partial Encoder Block (PEB) to reduce feature loss from traditional convolution. A Dynamic Direction Attention Module (DDAM) is proposed in the skip connection to enhance anisotropic geometric representation, preserving fine vessel details and contextual information. Additionally, a Multi-Feature Fusion Module (MFFM) is proposed to fuse multi-level features, retaining details while suppressing background noise. Experiments on DRIVE, STARE, and CHASEDB1 datasets demonstrate the network's effectiveness. On DRIVE, AUC, F1-score, and Sensitivity improved by 0.19%, 0.43%, and 1.17%, respectively. On STARE, AUC, F1, and sensitivity rose by 0.26%, 2.95%, and 2.07%, respectively. On CHASEDB1, AUC, F1-score, and specificity increased by 0.2%, 1.12%, and 0.44%, respectively. Results show the proposed network outperforms existing methods in segmentation performance.</p>","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":""},"PeriodicalIF":3.9,"publicationDate":"2025-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145742169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
EASNet: Edge-aware Segmentation Network for Skin Lesion Segmentation with Boundary-aware and Frequency Attention Mechanisms
Pub Date : 2025-12-12 DOI: 10.1007/s12539-025-00796-2
Junwei Yu, Yuhe Xia, Jianping Li, Nan Liu, Haoze Li, Weiya Shi
Cutaneous malignancies represent one of the most common cancers globally, with consistently rising incidence rates driving demand for enhanced diagnostic methodologies. While dermoscopy delivers high-resolution image data, existing CNN (convolutional neural network)-based approaches display constrained perception abilities when handling complex lesion boundaries and frequency-domain features. To overcome these constraints, we introduce EASNet, an innovative edge-aware segmentation network that combines frequency-domain insights with explicit boundary modeling. EASNet leverages discrete cosine transform (DCT) and discrete wavelet transform (DWT) to acquire multi-scale frequency information, maintaining global structures alongside precise boundary details. Furthermore, a boundary-driven criss-cross (BDCC) attention component strengthens spatial dependency learning, and a hybrid loss mechanism guarantees accurate boundary supervision throughout training. Extensive experiments on ISIC2017 and ISIC2018 datasets reveal that EASNet attains competitive performance in segmentation precision, boundary clarity, and positional consistency. This work pushes forward dermatological image analysis, offering a dependable tool for precise clinical assessment and therapeutic strategy development.
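The sketch below illustrates the kind of frequency-domain cues the abstract mentions: a 2-D DCT for global spectral structure and a one-level 2-D DWT for localized detail sub-bands. The library calls are standard (SciPy, PyWavelets), but the wavelet choice and how EASNet actually feeds these coefficients into the network are assumptions.

```python
# Extract global (DCT) and localized (DWT) frequency features from one image channel.
import numpy as np
import pywt
from scipy.fft import dctn

image = np.random.rand(256, 256).astype(np.float32)   # stand-in dermoscopy channel

dct_coeffs = dctn(image, norm='ortho')                 # global frequency content
cA, (cH, cV, cD) = pywt.dwt2(image, 'haar')            # approximation + detail bands

# A low-frequency DCT block and the three detail sub-bands could then be supplied
# to the encoder as extra channels alongside the RGB input.
print(dct_coeffs[:8, :8].shape, cA.shape, cH.shape)    # (8, 8) (128, 128) (128, 128)
```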
{"title":"EASNet: Edge-aware Segmentation Network for Skin Lesion Segmentation with Boundary-aware and Frequency Attention Mechanisms.","authors":"Junwei Yu, Yuhe Xia, Jianping Li, Nan Liu, Haoze Li, Weiya Shi","doi":"10.1007/s12539-025-00796-2","DOIUrl":"https://doi.org/10.1007/s12539-025-00796-2","url":null,"abstract":"<p><p>Cutaneous malignancies represent one of the most common cancers globally, with consistently rising incidence rates driving demand for enhanced diagnostic methodologies. While dermoscopy delivers high-resolution image data, existing CNN (convolutional neural network)-based approaches display constrained perception abilities when handling complex lesion boundaries and frequency-domain features. To overcome these constraints, we introduce EASNet, an innovative edge-aware segmentation network that combines frequency-domain insights with explicit boundary modeling. EASNet leverages discrete cosine transform (DCT) and discrete wavelet transform (DWT) to acquire multi-scale frequency information, maintaining global structures alongside precise boundary details. Furthermore, a boundary-driven criss-cross (BDCC) attention component strengthens spatial dependency learning, and a hybrid loss mechanism guarantees accurate boundary supervision throughout training. Extensive experiments on ISIC2017 and ISIC2018 datasets reveal that EASNet attains competitive performance in segmentation precision, boundary clarity, and positional consistency. This work pushes forward dermatological image analysis, offering a dependable tool for precise clinical assessment and therapeutic strategy development.</p>","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":""},"PeriodicalIF":3.9,"publicationDate":"2025-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145742338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ResNet-Powered Multi-Class Identification of Sequence Patterns for Genome Replication Timing Analysis
Pub Date : 2025-12-12 DOI: 10.1007/s12539-025-00797-1
Zhen-Ning Yin, Yu-Hao Zeng, Feng Gao
The precise regulation of DNA replication timing (RT) relies on deciphering sequence patterns. Although significant advances have been made in identifying sequence patterns associated with replication timing, there are still few computational pipelines designed for accurate RT prediction. In this study, we propose a deep learning-based framework, named RT-Predictor, leveraging a residual network (ResNet) to classify sequence patterns associated with RT across the human genome into four distinct domains: early replication domain (ERD), down transition zone (DTZ), late replication domain (LRD), and up transition zone (UTZ). Using solely DNA sequence patterns, the model achieves an accuracy of 74.58%, a Matthews correlation coefficient (MCC) of 0.6612, an F1-score of 0.7458, and a Recall of 0.7457, demonstrating its ability to resolve complex DNA replication timing patterns. By incorporating positional and frequency-based features derived from DNA sequences, we extract a comprehensive set of 384 features that effectively characterize replication dynamics. Genome-wide RT prediction reveals that replication origins (ORIs) predominantly initiate replication during the early S-phase, potentially linking specific sequence patterns to DNA damage repair mechanisms. These findings demonstrate the power of deep learning in decoding the regulatory significance of sequence patterns in replication timing and provide critical insights into the molecular basis of genomic stability and its disruption in diseases, particularly cancer.
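Frequency-based sequence features of the kind the study combines with positional features can be as simple as normalized k-mer counts over a genomic window. The sketch below uses k = 3 (64 features) purely for illustration; the exact 384-feature recipe of RT-Predictor is not specified in the abstract.

```python
# Compute normalized k-mer frequencies for a DNA window (illustrative k = 3).
from itertools import product

def kmer_frequencies(seq, k=3):
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:              # skips windows containing N or other symbols
            counts[kmer] += 1
            total += 1
    return [counts[km] / total if total else 0.0 for km in kmers]

window = "ACGTACGTTTGACCA" * 20          # stand-in for a replication-timing window
features = kmer_frequencies(window)
print(len(features), round(sum(features), 3))   # 64 1.0
```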
{"title":"ResNet-Powered Multi-Class Identification of Sequence Patterns for Genome Replication Timing Analysis.","authors":"Zhen-Ning Yin, Yu-Hao Zeng, Feng Gao","doi":"10.1007/s12539-025-00797-1","DOIUrl":"https://doi.org/10.1007/s12539-025-00797-1","url":null,"abstract":"<p><p>The precise regulation of DNA replication timing (RT) relies on deciphering sequence patterns. Although significant advances have been made in identifying sequence patterns associated with replication timing, there are still few computational pipelines designed for accurate RT prediction. In this study, we propose a deep learning-based framework, named RT-Predictor, leveraging a residual network (ResNet) to classify sequence patterns associated with RT across the human genome into four distinct domains: early replication domain (ERD), down transition zone (DTZ), late replication domain (LRD), and up transition zone (UTZ). Using solely DNA sequence patterns, the model achieves an accuracy of 74.58%, a Matthews correlation coefficient (MCC) of 0.6612, an F1-score of 0.7458, and a Recall of 0.7457, demonstrating its ability to resolve complex DNA replication timing patterns. By incorporating positional and frequency-based features derived from DNA sequences, we extract a comprehensive set of 384 features that effectively characterize replication dynamics. Genome-wide RT prediction reveals that replication origins (ORIs) predominantly initiate replication during the early S-phase, potentially linking specific sequence patterns to DNA damage repair mechanisms. These findings demonstrate the power of deep learning in decoding the regulatory significance of sequence patterns in replication timing and provide critical insights into the molecular basis of genomic stability and its disruption in diseases, particularly cancer.</p>","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":""},"PeriodicalIF":3.9,"publicationDate":"2025-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145742254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Unveiling Putative Functions of Burkholderia pseudomallei K96243 Hypothetical Proteins Via High-Throughput Characterization of Structural Similarities
Pub Date : 2025-12-12 DOI: 10.1007/s12539-025-00792-6
Syed Abuthakir Mohamed Husain, Su Datt Lam, Mohd Firdaus-Raih, Sheila Nathan, Nor Azlan Nor Muhammad, Chyan Leong Ng
Burkholderia pseudomallei (BP) infections claim tens of thousands of lives worldwide every year. The bacterium's distinctive characteristics include antibiotic resistance, virulence, and the ability to survive in stressful environments. Genome sequencing and annotation of B. pseudomallei reveal that about 25% of its genes encode hypothetical proteins (HPs). As such, characterising the HPs could shed light on the mechanisms that contribute to the above characteristics. Over the last decade, genome sequencing and annotation technologies have advanced drastically. Furthermore, artificial intelligence programs such as AlphaFold2 (AF2) and RoseTTAFold2 (RF2), which can predict 3D protein structures with high accuracy, are now available. Taking advantage of these tools, this study aimed to re-annotate the HPs encoded within the BP genome. To achieve this, we retrieved 1869 HPs from the Burkholderia Genome Database and cross-referenced them with UniProt. After filtering, 419 remained hypothetical. These were analysed using BLASTp for sequence homologs and antibiotic resistance proteins, followed by 3D structure prediction using AF2 and RF2, and a structural homolog search using Foldseek. This study successfully annotated 209 HPs, leaving only 210 proteins (3.7% of BP coding sequences) classified as 'hypothetical'. The functions of the predicted HPs were further analysed using structure comparison and active site analysis. The annotated protein list includes fifteen antibiotic resistance proteins, five haem oxygenase-like fold proteins involved in biofilm formation, host pathogenesis, and antibacterial activity, along with five essential proteins. These proteins represent promising drug targets for developing new antibiotics against melioidosis. Nonetheless, experimental validation will be necessary to characterise the predicted protein functions.
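A small sketch of the sequence-homolog filtering step: parse tabular BLASTp output (the standard -outfmt 6 column order) and keep hypothetical proteins whose best hit passes simple identity and E-value cut-offs. The file name and thresholds are illustrative assumptions, not the study's exact criteria.

```python
# Keep the best-scoring significant BLASTp hit per hypothetical protein.
import csv

FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def confident_hits(blast_tsv, min_pident=30.0, max_evalue=1e-5):
    annotated = {}
    with open(blast_tsv) as fh:
        for row in csv.DictReader(fh, fieldnames=FIELDS, delimiter="\t"):
            if float(row["pident"]) >= min_pident and float(row["evalue"]) <= max_evalue:
                best = annotated.get(row["qseqid"])
                if best is None or float(row["bitscore"]) > float(best["bitscore"]):
                    annotated[row["qseqid"]] = row
    return annotated

# hits = confident_hits("bp_hypothetical_vs_nr.tsv")   # hypothetical file name
# print(len(hits), "HPs with a confident sequence homolog")
```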
{"title":"Unveiling Putative Functions of Burkholderia pseudomallei K96243 Hypothetical Proteins Via High-Throughput Characterization of Structural Similarities.","authors":"Syed Abuthakir Mohamed Husain, Su Datt Lam, Mohd Firdaus-Raih, Sheila Nathan, Nor Azlan Nor Muhammad, Chyan Leong Ng","doi":"10.1007/s12539-025-00792-6","DOIUrl":"https://doi.org/10.1007/s12539-025-00792-6","url":null,"abstract":"<p><p>Burkholderia pseudomallei (BP) infections claims tens of thousands of lives worldwide every year. The bacterium's distinctive characteristics include antibiotic resistance, virulence and ability to survive in stressful environments. The B. pseudomallei genome sequencing and annotation reveal that about 25% of the genes encode hypothetical proteins (HPs). As such, characterising the HPs could shed light on the mechanisms that contribute to the above characteristics. Over the last decade, genome sequencing and annotation technologies have advanced drastically. Furthermore, artificial intelligence programs such as AlphaFold2 (AF2), RoseTTAFold2 (RF2), which can predict 3D protein structures with high accuracy, are also available. Taking advantage of the available tools, this study aimed to re-annotate HPs that are encoded within the BP genome. To achieve this, we retrieved 1869 HPs from the Burkholderia Genome Database, then cross-referenced with UniProt. After filtering, 419 remain hypothetical. These were analysed using BLASTp for sequence homologs and antibiotic resistance proteins, followed by 3D structure prediction using AF2 and RF2, and structural homolog search using Foldseek. This study successfully annotated 209 HPs with only 210 proteins (3.7% of BP coding sequences) still classified as 'hypothetical'. The functions of the predicted HPs were further analysed using structure comparison and active site analysis. The annotated protein list includes fifteen antibiotic resistance proteins, five haem oxygenase-like fold proteins involved in biofilm formation, host pathogenesis, and antibacterial activity, along with five essential proteins. These proteins represent promising drug targets for developing new antibiotics against melioidosis. Nonetheless, experimental validation will be necessary to characterize the predicted protein functions.</p>","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":""},"PeriodicalIF":3.9,"publicationDate":"2025-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145742336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dear-OMG: An Omics-General Compression Method for Genomics, Proteomics and Metabolomics Data
Pub Date : 2025-12-04 DOI: 10.1007/s12539-025-00786-4
Qingzu He, Xiang Li, Huan Guo, Yulin Li, Jianwei Shuai
As high-throughput omics technologies continue to advance, researchers are facing the challenge of a rapid surge in proteomics, metabolomics, and genomics data. This growth not only necessitates more disk space and network bandwidth, but also complicates data sharing and subsequent analysis. To address this challenge and enhance the analytical efficiency of downstream software, we propose Dear-OMG, a unified, compact, flexible, and high-performance metadata storage solution. Dear-OMG introduces a novel file storage structure and utilizes the Elias-Fano encoding algorithm to compress and store proteomics, genomics, and metabolomics metadata into the unified OMG format. The OMG format not only demonstrates remarkably high compression and decompression speeds, but also enables parallel random access to any data block. Test results reveal that, compared to the commonly used proteomics formats of mzXML and mzML, the OMG format achieves an 80% reduction in storage space, a 90% decrease in conversion time, and approximately a tenfold speed improvement with support for parallel random access. Dear-OMG is freely available at https://github.com/jianweishuai/Dear-OMG .
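For readers unfamiliar with Elias-Fano coding, the toy sketch below shows the core idea on a sorted integer list: each value is split into explicitly stored low bits and a unary-coded high part. It uses plain Python lists instead of packed bitvectors and is in no way Dear-OMG's actual implementation.

```python
# Toy Elias-Fano encode/decode for a monotonically non-decreasing integer sequence.
import math

def ef_encode(sorted_vals, universe):
    n = len(sorted_vals)
    l = max(0, int(math.floor(math.log2(universe / n))))   # low-bit width
    low_mask = (1 << l) - 1
    lows = [v & low_mask for v in sorted_vals]              # explicit low bits
    highs, prev_high = [], 0
    for v in sorted_vals:
        high = v >> l
        highs.extend([0] * (high - prev_high) + [1])         # gap zeros, then a 1
        prev_high = high
    return l, lows, highs

def ef_decode(l, lows, highs):
    vals, high, i = [], 0, 0
    for bit in highs:
        if bit == 0:
            high += 1
        else:
            vals.append((high << l) | lows[i])
            i += 1
    return vals

peaks = [3, 4, 7, 13, 14, 15, 21, 43]        # e.g. sorted spectrum bin indices
l, lows, highs = ef_encode(peaks, universe=64)
assert ef_decode(l, lows, highs) == peaks
print(l, len(highs))                          # low-bit width, upper-bitvector length
```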
{"title":"Dear-OMG: An Omics-General Compression Method for Genomics, Proteomics and Metabolomics Data.","authors":"Qingzu He, Xiang Li, Huan Guo, Yulin Li, Jianwei Shuai","doi":"10.1007/s12539-025-00786-4","DOIUrl":"https://doi.org/10.1007/s12539-025-00786-4","url":null,"abstract":"<p><p>As high-throughput omics technologies continue to advance, researchers are facing the challenge of a rapid surge in proteomics, metabolomics, and genomics data. This growth not only necessitates more disk space and network bandwidth, but also complicates data sharing and subsequent analysis. To address this challenge and enhance the analytical efficiency of downstream software, we propose Dear-OMG, a unified, compact, flexible, and high-performance metadata storage solution. Dear-OMG introduces a novel file storage structure and utilizes the Elias-Fano encoding algorithm to compress and store proteomics, genomics, and metabolomics metadata into the unified OMG format. The OMG format not only demonstrates remarkably high compression and decompression speeds, but also enables parallel random access to any data block. Test results reveal that, compared to the commonly used proteomics formats of mzXML and mzML, the OMG format achieves an 80% reduction in storage space, a 90% decrease in conversion time, and approximately a tenfold speed improvement with support for parallel random access. Dear-OMG is freely available at https://github.com/jianweishuai/Dear-OMG .</p>","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":""},"PeriodicalIF":3.9,"publicationDate":"2025-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145677573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DCPPS: Prediction of Kinase-Specific Phosphorylation Sites Using Dynamic Embedding and Cross-Representation Interaction
Pub Date : 2025-12-01 DOI: 10.1007/s12539-025-00731-5
Mengya Liu, Xin Wang, Zhan-Li Sun, Xiao Yang, Xia Chen
Substrate-specific kinases catalyze the addition of phosphate groups to specific amino acids, resulting in kinase-specific phosphorylation. This process participates in various signaling pathways and regulatory processes. Computational methods for predicting such sites can accelerate research on protein function, disease mechanisms, and drug development. Existing approaches typically rely on global and local sequences to extract predictive features but often neglect position information and critical feature interactions, which are essential for effective feature representation. In this work, we propose a novel kinase-specific phosphorylation site prediction model, DCPPS, by leveraging dynamic embedding encoding and interaction between global and local representations. Specifically, to enrich sequence position information and strengthen features, we construct a dynamic embedding encoding (DEE) to capture amino acid semantics and the positional information of upstream and downstream amino acids, dynamically optimizing feature embeddings. Considering the lack of in-depth feature interaction between local and global information, we design a cross-representation interaction unit (CRIU) to facilitate in-depth mining and complementary improvement of potential connections between multi-source features. Results on kinase-specific phosphorylation and multiple extended experiments show that DCPPS has better predictive performance and scalability. Further ablation studies demonstrate that incorporating global protein information, DEE, and CRIU markedly enhances phosphorylation site prediction accuracy, particularly in mitigating class imbalance.
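A generic way to let local (site-window) and global (whole-protein) representations interact in depth is bidirectional cross-attention, sketched below. This is a stand-in for the paper's CRIU; the dimensions, sequence lengths, and fusion rule are assumptions.

```python
# Hedged PyTorch sketch: local features attend to global features and vice versa,
# then both enriched views are pooled and concatenated.
import torch
import torch.nn as nn

class CrossInteraction(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.local_to_global = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.global_to_local = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, local_feat, global_feat):
        l_enriched, _ = self.local_to_global(local_feat, global_feat, global_feat)
        g_enriched, _ = self.global_to_local(global_feat, local_feat, local_feat)
        return torch.cat([l_enriched.mean(dim=1), g_enriched.mean(dim=1)], dim=-1)

local_feat = torch.randn(2, 33, 128)    # 33-residue window around a candidate site
global_feat = torch.randn(2, 500, 128)  # whole-protein representation
print(CrossInteraction()(local_feat, global_feat).shape)   # torch.Size([2, 256])
```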
{"title":"DCPPS: Prediction of Kinase-Specific Phosphorylation Sites Using Dynamic Embedding and Cross-Representation Interaction.","authors":"Mengya Liu, Xin Wang, Zhan-Li Sun, Xiao Yang, Xia Chen","doi":"10.1007/s12539-025-00731-5","DOIUrl":"10.1007/s12539-025-00731-5","url":null,"abstract":"<p><p>Substrate-specific kinases catalyze addition of phosphate groups to specific amino acids, resulting in kinase-specific phosphorylation. It participates in various signaling pathways and regulation processes. The relevant computational methods can accelerate study of protein function research, disease exploration, and drug development. Existing approaches typically rely on global and local sequences to extract predictive features but often neglect position information and critical feature interaction, which is essential for effective feature representation. In this work, we propose a novel kinase-specific phosphorylation site prediction model, DCPPS, by leveraging dynamic embedding encoding and interaction between global and local representations. Specifically, to enrich sequence position information and strengthen features, we construct a dynamic embedding encoding (DEE) to capture amino acid semantics and positional information of upstream and downstream amino acids, dynamically optimizing feature embeddings. Considering the lack of in-depth feature interaction between local and global information, we design a cross-representation interaction unit (CRIU) to facilitate in-depth mining and complementary improvement of potential connections between multi-source features. Results of kinase-specific phosphorylation and multiple extended experiments show that DCPPS has better predictive performance and scalability. Further ablation studies demonstrate that incorporating global protein information, DEE, and CRIU markedly enhances phosphorylation site prediction accuracy, particularly in mitigating class imbalance.</p>","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"1056-1073"},"PeriodicalIF":3.9,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144274753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
iALP: Identification of Allergenic Proteins Based on Large Language Model and Gate Linear Unit
Pub Date : 2025-12-01 DOI: 10.1007/s12539-025-00734-2
Bing Zhang, Jianping Zhao, Yannan Bin, Junfeng Xia
The rising incidence of allergic disorders has emerged as a pressing public health issue worldwide, underscoring the need for intensified research and efficacious intervention measures. Accurate identification of allergenic proteins (ALPs) is essential for preventing allergic reactions and mitigating health risks at an individual level. Although machine learning and deep learning techniques have been widely applied in ALP identification, existing methods often have limitations in capturing their complex features. In response, we introduce a novel method, iALP, which leverages the large language model ProtT5 and the gated linear unit (GLU) for highly effective ALP identification. The advanced features from ProtT5 enable an in-depth analysis of the complex characteristics of ALPs, while the GLU captures the intricate nonlinear features hidden within these proteins. The results demonstrate that iALP achieves an impressive accuracy and F1-score of 0.957 on the test set. Furthermore, it demonstrates superior performance compared to the leading predictors on a separate dataset. We also provide a detailed discussion of model performance on protein sequences shorter than 100 amino acids. We hope that iALP will facilitate accurate ALP prediction, thereby supporting effective allergy symptom prevention and the implementation of allergen prevention and treatment strategies. The iALP source codes and datasets for prediction tasks can be accessed from the GitHub repository located at https://github.com/xialab-ahu/iALP.git .
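The sketch below shows a minimal gated linear unit classification head over per-protein embeddings. The 1024-dimensional input matches ProtT5's embedding size, but loading ProtT5 itself is omitted, and the layer sizes are illustrative rather than iALP's actual configuration.

```python
# GLU head over pooled protein-language-model embeddings -> allergenicity probability.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GLUHead(nn.Module):
    def __init__(self, in_dim=1024, hidden=256):
        super().__init__()
        self.proj = nn.Linear(in_dim, 2 * hidden)   # doubled width: value + gate halves
        self.out = nn.Linear(hidden, 1)

    def forward(self, emb):
        h = F.glu(self.proj(emb), dim=-1)            # value * sigmoid(gate)
        return torch.sigmoid(self.out(h))            # P(allergenic)

emb = torch.randn(8, 1024)      # mean-pooled ProtT5 embeddings for 8 proteins
print(GLUHead()(emb).shape)     # torch.Size([8, 1])
```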
{"title":"iALP: Identification of Allergenic Proteins Based on Large Language Model and Gate Linear Unit.","authors":"Bing Zhang, Jianping Zhao, Yannan Bin, Junfeng Xia","doi":"10.1007/s12539-025-00734-2","DOIUrl":"10.1007/s12539-025-00734-2","url":null,"abstract":"<p><p>The rising incidence of allergic disorders has emerged as a pressing public health issue worldwide, underscoring the need for intensified research and efficacious intervention measures. Accurate identification of allergenic proteins (ALPs) is essential in preventing allergic reactions and mitigating health risks at an individual level. Although machine learning and deep learning techniques have been widely applied in ALP identification, existing methods often have limitations in capturing their complex features. In response, we introduce a novel method iALP, which leverages a large language model ProtT5 and the gate linear unit (GLU) for ALP identification with high efficacy. The advanced features in ProtT5 enable an in-depth analysis of the complex characteristics of ALPs, while GLU captures the intricate nonlinear features hidden within these proteins. The results demonstrate that iALP achieves an impressive accuracy and F1-score of 0.957 on the test set. Furthermore, it demonstrates superior performance compared to the leading predictors in a separate dataset. We also provide a detailed discussion of the model performance with protein sequences shorter than 100 amino acids. We hope that iALP will facilitate accurate ALP prediction, thereby supporting effective allergy symptom prevention and the implementation of allergen prevention and treatment strategies. The iALP source codes and datasets for prediction tasks can be accessed from the GitHub repository located at https://github.com/xialab-ahu/iALP.git .</p>","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"860-872"},"PeriodicalIF":3.9,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144617414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DNA Methylation Recognition Using Hybrid Deep Learning with Dual Nucleotide Visualization Fusion Feature Encoding
Pub Date : 2025-12-01 Epub Date: 2025-07-16 DOI: 10.1007/s12539-025-00737-z
Li Tan, Li Mengshan, Li Yelin, Zhu Jihong, Guan Lixin
While many machine and deep learning methods have been developed for predicting different types of DNA methylation, common feature encoding methods have not fully extracted the potential information in DNA sequences, which limits the prediction accuracy of the models. Furthermore, many methods focus solely on a single type of methylation, necessitating the development of robust universal predictors. Therefore, this study proposes a novel and efficient method for DNA methylation prediction, named DeepDNA-DNVFF. For sequence encoding, a new dual nucleotide visual fusion feature encoding (DNVFF) method is proposed by improving and integrating two-dimensional DNA visualization techniques. The hybrid deep learning model used in DeepDNA-DNVFF integrates CNN, BiLSTM, and an attention mechanism to enhance the model's ability to capture long-range dependencies. The results show that, compared with traditional encoding methods, DNVFF can more effectively extract the latent feature information from DNA sequences. Compared to other existing advanced methods, DeepDNA-DNVFF outperformed the state-of-the-art method on 10 of 17 species datasets, with the best Matthews correlation coefficient approximately 1.24% higher. DeepDNA-DNVFF effectively predicts DNA methylation sites, offering valuable insights for researchers to understand gene regulatory mechanisms and identify potential disease biomarkers.
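The hybrid CNN + BiLSTM + attention pattern the abstract describes can be sketched as below, operating on an encoded DNA window. Channel counts, window length, and the pooling scheme are assumptions, and the DNVFF encoding itself is not shown.

```python
# Hedged PyTorch sketch: convolution for local motifs, BiLSTM for long-range context,
# attention pooling over positions, sigmoid output for methylation probability.
import torch
import torch.nn as nn

class CnnBiLstmAttn(nn.Module):
    def __init__(self, in_channels=4, seq_len=41):
        super().__init__()
        self.conv = nn.Conv1d(in_channels, 32, kernel_size=5, padding=2)
        self.lstm = nn.LSTM(32, 64, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(128, 1)            # per-position attention scores
        self.head = nn.Linear(128, 1)

    def forward(self, x):                         # x: (batch, channels, seq_len)
        h = torch.relu(self.conv(x)).transpose(1, 2)    # (batch, seq_len, 32)
        h, _ = self.lstm(h)                              # (batch, seq_len, 128)
        w = torch.softmax(self.attn(h), dim=1)           # attention over positions
        pooled = (w * h).sum(dim=1)
        return torch.sigmoid(self.head(pooled))          # P(methylated)

x = torch.randn(16, 4, 41)        # 16 encoded 41-nt windows (one-hot-style channels)
print(CnnBiLstmAttn()(x).shape)   # torch.Size([16, 1])
```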
{"title":"DNA Methylation Recognition Using Hybrid Deep Learning with Dual Nucleotide Visualization Fusion Feature Encoding.","authors":"Li Tan, Li Mengshan, Li Yelin, Zhu Jihong, Guan Lixin","doi":"10.1007/s12539-025-00737-z","DOIUrl":"10.1007/s12539-025-00737-z","url":null,"abstract":"<p><p>While many machine and deep learning methods have been developed for predicting different types of DNA methylation, common feature encoding methods have not fully extracted the potential information in DNA sequences, influencing the prediction accuracy of the models. Furthermore, many methods focus solely on a single type of methylation, necessitating the development of robust universal predictors. Therefore, this study proposes a novel and efficient method for DNA methylation prediction, named DeepDNA-DNVFF. For sequence encoding, a new dual nucleotide visual fusion feature encoding (DNVFF) method is proposed by improving and integrating two-dimensional DNA visualization techniques. The hybrid deep learning model used in DeepDNA-DNVFF integrates CNN, BiLSTM, and an attention mechanism to enhance the model's ability to capture long-range dependencies. The results show that compared with traditional encoding methods, DNVFF can more effectively extract the latent feature information from DNA sequences. Compared to other existing advanced methods, DeepDNA-DNVFF excelled beyond the state-of-the-art method in 10 out of 17 species datasets, with the best Matthews correlation coefficient approximately 1.24% higher. DeepDNA-DNVFF effectively predicts DNA methylation sites, offering valuable insights for researchers to understand gene regulatory mechanisms and identify potential disease biomarkers.</p>","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"873-891"},"PeriodicalIF":3.9,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144649380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}