Predicting drug-target interactions (DTIs) is of great importance for drug discovery and development. With the rapid development of biological and chemical technologies, computational methods for DTI prediction have become a promising approach. However, few of these methods address the cold-start problem, because they rely on existing interaction information to support their modeling. Consequently, they cannot effectively predict DTIs for new drugs or targets that have limited interaction data. To fill this gap, we propose MGDTI (short for Meta-learning-based Graph Transformer for Drug-Target Interaction prediction), a graph transformer method based on meta-learning. Technically, we employ drug-drug similarity and target-target similarity as additional information to mitigate the scarcity of interactions. In addition, we train MGDTI via meta-learning so that it adapts to cold-start tasks. Moreover, we employ a graph transformer to prevent over-smoothing by capturing long-range dependencies. Extensive results on the benchmark dataset demonstrate that MGDTI is effective for DTI prediction under cold-start scenarios.
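As a rough illustration of how similarity information can compensate for missing interactions, the sketch below links each drug to its k most similar drugs, so that a cold-start drug with no known targets still has neighbors in the graph. This is a minimal sketch under stated assumptions, not the MGDTI implementation: the function name `top_k_similarity_edges`, the top-k rule, and the toy similarity matrix are all hypothetical.

```python
from typing import Dict, List


def top_k_similarity_edges(similarity: List[List[float]], k: int) -> Dict[int, List[int]]:
    """For each node, return the indices of its k most similar other nodes.

    Hypothetical helper: the top-k rule is an assumption made for illustration.
    """
    edges: Dict[int, List[int]] = {}
    for i, row in enumerate(similarity):
        # Exclude the node itself, then rank the rest by descending similarity
        # (ties are broken by the lower index).
        candidates = [j for j in range(len(row)) if j != i]
        candidates.sort(key=lambda j: (-row[j], j))
        edges[i] = candidates[:k]
    return edges


# Hypothetical 4-drug similarity matrix; treat drug 3 as the cold-start drug.
drug_similarity = [
    [1.0, 0.8, 0.1, 0.7],
    [0.8, 1.0, 0.2, 0.3],
    [0.1, 0.2, 1.0, 0.6],
    [0.7, 0.3, 0.6, 1.0],
]

neighbors = top_k_similarity_edges(drug_similarity, k=2)
print(neighbors[3])  # similarity neighbors of the cold-start drug
```

The same helper could be applied to a target-target similarity matrix; how the resulting edges are weighted or combined with known interactions is left open here.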
Therapeutic antibodies have emerged as a promising treatment option for a wide range of diseases. However, the light chain of antibodies can potentially induce amyloidosis, a condition characterized by protein misfolding and aggregation, posing a significant safety concern. Therefore, it is crucial to assess the amyloidogenic risk of therapeutic antibodies during the early stages of drug development. In this study, we introduce AB-Amy 2.0, a new computational model with enhanced performance for assessing the light chain amyloidogenic risk of therapeutic antibodies. By employing embeddings from pretrained protein language models (PLMs), AB-Amy 2.0 achieves higher accuracy in amyloidogenic risk prediction than models built on traditional features, offering a crucial tool for early-stage identification of antibodies with low aggregation propensity. AB-Amy 2.0 was trained on antiBERTy embeddings with a support vector machine (SVM) classifier. On an independent test dataset, the model achieved a sensitivity, specificity, accuracy (ACC), Matthews correlation coefficient (MCC) and area under the curve (AUC) of 93.47%, 89.23%, 91.92%, 0.8261 and 0.9739, respectively. These results highlight the effectiveness and robustness of AB-Amy 2.0 in predicting light chain amyloidogenic risk. To facilitate user-friendly access, we have developed an online web server (http://i.uestc.edu.cn/AB-Amy2) and a command line tool (https://github.com/zzyywww/ABAmy2). These resources enable broader application of the model and promise to support the development of safer therapeutic antibodies.
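For readers unfamiliar with the reported metrics, the following sketch shows how sensitivity, specificity, ACC and MCC are derived from a binary confusion matrix. The counts used here are hypothetical and are not taken from the AB-Amy 2.0 test set; AUC is omitted because it requires the classifier's continuous scores rather than counts.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Compute standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    # Matthews correlation coefficient; defined as 0 when the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only (not the paper's data).
metrics = binary_metrics(tp=90, fn=10, tn=80, fp=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

MCC is informative alongside ACC because it accounts for all four cells of the confusion matrix, which matters when the two classes are imbalanced.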
The identification of positive selection has been framed as a classification task, with Convolutional Neural Networks (CNNs) already outperforming summary statistics and likelihood-based approaches in accuracy. Many CNN-based methods rearrange the pixels of images representing raw genomic data as a preprocessing step to improve classification accuracy, yet the efficacy of these pixel-rearrangement techniques remains inadequately examined, particularly in the presence of confounding factors such as population bottlenecks, migration and recombination hotspots. We introduce a set of pixel-rearrangement algorithms aimed at enhancing CNN classification accuracy in detecting selective sweeps, and we use them to assess the performance of four CNN models for selective sweep detection. Our findings show that the judicious application of rearrangement algorithms notably enhances the overall classification accuracy of a CNN across various datasets simulating confounding factors. We observed that sorting the columns of the genomic matrices has a greater impact on CNN performance than rearranging the sequences. To some extent, these rearrangement algorithms are also more robust to misspecified demographic models than the default preprocessing algorithm suggested by the respective authors of each CNN architecture. We provide the data rearrangement algorithms as a distinct package available for download at: https://github.com/Zhaohq96/Genetic-data-rearrangement.
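To make the distinction between column sorting and sequence (row) rearrangement concrete, here is a minimal sketch on a binary genomic matrix whose rows are sequences and whose columns are sites. The sorting keys (derived-allele count per column, derived-allele count per row) are assumptions chosen for illustration; the algorithms in the linked package may use different criteria.

```python
from typing import List

Matrix = List[List[int]]  # rows = sequences, columns = sites, entries 0/1


def sort_columns_by_frequency(matrix: Matrix) -> Matrix:
    """Reorder columns by descending derived-allele count (assumed key)."""
    if not matrix:
        return []
    n_cols = len(matrix[0])
    col_sums = [sum(row[c] for row in matrix) for c in range(n_cols)]
    # Ties keep their original left-to-right order.
    order = sorted(range(n_cols), key=lambda c: (-col_sums[c], c))
    return [[row[c] for c in order] for row in matrix]


def sort_rows_by_count(matrix: Matrix) -> Matrix:
    """Reorder sequences by descending number of derived alleles (assumed key)."""
    # sorted() is stable, so tied rows keep their original order.
    return sorted(matrix, key=lambda row: -sum(row))


# Hypothetical 3-sequence, 4-site alignment.
alignment = [
    [0, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
]

by_columns = sort_columns_by_frequency(alignment)
by_rows = sort_rows_by_count(alignment)
print(by_columns)
print(by_rows)
```

Column sorting discards the physical order of sites, whereas row sorting only changes which sequences sit next to each other, so the two operations present quite different images to a CNN.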
Recent developments in spatial transcriptomics (ST) technology have markedly enhanced the capacity to comprehensively characterize gene expression patterns within tissue microenvironments while preserving spatial context. However, identifying spatial domains at the single-cell level remains a significant challenge in elucidating biological processes. To address this, we developed SpaInGNN (spatial integration using graph neural network), a graph neural network (GNN) framework that delineates spatial domains by integrating spatial location data, histological information, and gene expression profiles into low-dimensional latent embeddings. To fully leverage spatial coordinate data, SpaInGNN first pre-clusters the gene expression profiles and then refines the graph constructed from spatial locations by incorporating both tissue image distance and Euclidean distance. This refined graph is then embedded using a self-supervised GNN, which minimizes self-reconfiguration loss. By applying SpaInGNN to refined graphs across multiple consecutive tissue slices, this study also mitigates the impact of batch effects in data analysis. The proposed method demonstrates substantial improvements in the accuracy of spatial domain recognition, providing a more faithful representation of tissue organization in both mouse olfactory bulb and human lateral prefrontal cortex samples.
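The graph refinement step can be pictured with the sketch below, which builds a k-nearest-neighbor graph over spots using a blend of Euclidean distance and image-feature distance, and keeps only edges between spots that share a pre-cluster label. This is one possible reading of the abstract, not the SpaInGNN implementation: the linear weighting `alpha`, the rule that drops cross-cluster edges, the function name `refine_spatial_graph`, and the toy data are all assumptions.

```python
import math
from typing import Dict, List, Sequence


def refine_spatial_graph(
    coords: Sequence[Sequence[float]],
    image_features: Sequence[Sequence[float]],
    clusters: Sequence[int],
    k: int,
    alpha: float = 0.5,
) -> Dict[int, List[int]]:
    """Build a kNN graph from a blended distance, restricted to same-cluster spots.

    Hypothetical sketch: the blend `alpha * spatial + (1 - alpha) * image`
    and the same-cluster restriction are illustrative assumptions.
    """
    n = len(coords)
    graph: Dict[int, List[int]] = {}
    for i in range(n):
        scored = []
        for j in range(n):
            # Skip self-loops and spots assigned to a different pre-cluster.
            if j == i or clusters[j] != clusters[i]:
                continue
            d_space = math.dist(coords[i], coords[j])
            d_image = math.dist(image_features[i], image_features[j])
            scored.append((alpha * d_space + (1 - alpha) * d_image, j))
        scored.sort()  # smallest blended distance first
        graph[i] = [j for _, j in scored[:k]]
    return graph


# Hypothetical spots: 2D coordinates, 1D image features, pre-cluster labels.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
image_features = [(0.1,), (0.2,), (0.9,), (0.1,)]
clusters = [0, 0, 0, 1]

graph = refine_spatial_graph(coords, image_features, clusters, k=1)
print(graph)
```

In this toy example, spots 1 and 2 are equally close to spot 0 in space, and the image distance breaks the tie in favor of spot 1, which is the kind of adjustment that adding histological information is meant to provide.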