Yue-Chao Li, Zhu-Hong You, Chang-Qing Yu, Lei Wang, Lun Hu, Peng-Wei Hu, Yan Qiao, Xin-Fei Wang, Yu-An Huang
Recently, the role of competing endogenous RNAs in regulating gene expression through the interaction of microRNAs has been closely associated with the expression of circular RNAs (circRNAs) in various biological processes such as reproduction and apoptosis. While the number of confirmed circRNA-miRNA interactions (CMIs) continues to increase, the conventional in vitro approaches for discovery are expensive, labor intensive, and time consuming. Therefore, there is an urgent need for effective prediction of potential CMIs through appropriate data modeling and prediction based on known information. In this study, we proposed a novel model, called DeepCMI, that utilizes multi-source information on circRNA/miRNA to predict potential CMIs. Comprehensive evaluations on the CMI-9905 and CMI-9589 datasets demonstrated that DeepCMI successfully infers potential CMIs. Specifically, DeepCMI achieved AUC values of 90.54% and 94.8% on the CMI-9905 and CMI-9589 datasets, respectively. These results suggest that DeepCMI is an effective model for predicting potential CMIs and has the potential to significantly reduce the need for downstream in vitro studies. To facilitate the use of our trained model and data, we have constructed a computational platform, which is available at http://120.77.11.78/DeepCMI/. The source code and datasets used in this work are available at https://github.com/LiYuechao1998/DeepCMI.
DeepCMI: a graph-based model for accurate prediction of circRNA-miRNA interactions with multiple information. Briefings in Functional Genomics, pp. 276-285. Published 2024-05-15. DOI: 10.1093/bfgp/elad030
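The AUC figures quoted in the DeepCMI abstract are areas under the ROC curve. As a reminder of what that metric measures, here is a minimal sketch that computes AUC as the fraction of (positive, negative) pairs ranked correctly, with ties counted as half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions and are not taken from the DeepCMI code or data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    labels: iterable of 0/1 ground-truth values (1 = known interaction).
    scores: iterable of predicted scores, same length as labels.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy example (made-up scores): three of the four positive/negative pairs are ordered correctly.
print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1]))
```

The pairwise form is quadratic in the number of examples, so it is only meant to make the definition concrete; a sorted-rank implementation would be preferable on datasets of realistic size.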
Yizhi Cui, Hongzhi Liu, Yutong Ming, Zheng Zhang, Li Liu, Ruijun Liu
G-quadruplex (G4), a non-canonical deoxyribonucleic acid structure, is widely distributed in the genome and involved in various biological processes. In vivo, high-throughput sequencing has indicated that G4s are significantly enriched at functional regions in a cell-type-specific manner. Because experimental methods are time-consuming and laborious, computational prediction of G4s is needed. Recently, G4 CUT&Tag has been developed to generate higher-resolution sequencing data than ChIP-seq, which provides more accurate training samples for model construction. In this paper, we present a new dataset construction method based on G4 CUT&Tag sequencing data and a prediction model based on XGBoost, a gradient-boosting machine learning method. The results show that our model performs well within and across cell types. Furthermore, sequence analysis indicates that the formation of the G4 structure is greatly affected by the flanking sequences, and that the GC content of G4 flanking sequences is higher than that of non-G4 flanking sequences. Moreover, we identified G4 motifs in the high-resolution dataset, among which we found several motifs for known transcription factors (TFs), such as SP2 and BPC. These TFs may directly or indirectly affect the formation of the G4 structure.
Prediction of strand-specific and cell-type-specific G-quadruplexes based on high-resolution CUT&Tag data. Briefings in Functional Genomics, pp. 265-275. Published 2024-05-15. DOI: 10.1093/bfgp/elad024
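The G4 abstract compares the GC content of sequences flanking G4 and non-G4 regions. The sketch below shows one simple way such a feature could be computed. The function names, the half-open `[start, end)` coordinate convention and the default flank width of 50 bp are assumptions made for illustration; the abstract does not state the window size or implementation the authors used.

```python
def gc_content(seq):
    """Fraction of G/C among unambiguous bases (A, C, G, T); 0.0 if none are present."""
    s = seq.upper()
    counted = sum(s.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return (s.count("G") + s.count("C")) / counted


def flank_gc(sequence, start, end, flank=50):
    """GC content of the bases flanking the region [start, end).

    flank is the number of bases taken on each side (assumed value, not from the paper).
    Flanks are truncated at the ends of the sequence.
    """
    left = sequence[max(0, start - flank):start]
    right = sequence[end:end + flank]
    return gc_content(left + right)


# Toy example: a region at positions 4..8 with GC-rich flanks of width 4.
print(flank_gc("GCGCAAAACCGG", 4, 8, flank=4))
```

Computing this value for G4 and non-G4 regions and comparing the two distributions would reproduce the kind of contrast the abstract describes.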
Ting He, Zhipeng Gao, Ling Lin, Xu Zhang, Quan Zou
Esophageal cancer (ESCA) has a poor prognosis. Long non-coding RNAs (lncRNAs) affect cell proliferation. However, the prognostic role of N6-methyladenosine (m6A)-associated lncRNAs (m6A-lncRNAs) in ESCA remains unknown. Univariate Cox analysis was applied to identify prognosis-related m6A-lncRNAs, based on which the samples were clustered. Wilcoxon rank and Chi-square tests were used to compare clinical traits, survival, pathway activity and immune infiltration between clusters; overall survival, clinical traits (N stage), tumor-infiltrating immune cells and pathway activity differed significantly. Using a least absolute shrinkage and selection operator Cox proportional hazards (Lasso-Cox) model, five m6A-lncRNAs were selected to construct the prognostic signature (m6A-lncSig) and risk score. To investigate the link between risk score and clinical traits or the immune microenvironment, the Chi-square test and Spearman correlation analysis were used. Risk score was associated with N stage, tumor stage, cluster membership, M2 macrophages, naive B cells and resting memory CD4 T cells. Risk score and tumor stage were independent prognostic variables, and the constructed nomogram model had high accuracy in predicting prognosis. The obtained m6A-lncSig could serve as a potential prognostic biomarker for ESCA patients. This study offers a theoretical foundation for clinical diagnosis and prognosis of ESCA.
Prognostic signature analysis and survival prediction of esophageal cancer based on N6-methyladenosine associated lncRNAs. Briefings in Functional Genomics, pp. 239-248. Published 2024-05-15. DOI: 10.1093/bfgp/elad028
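A Lasso-Cox signature such as the m6A-lncSig above is usually applied as a linear risk score: each selected lncRNA's expression is multiplied by its fitted Cox coefficient and the products are summed. The sketch below illustrates that arithmetic only. The lncRNA names and coefficients are hypothetical placeholders, and the median split into high-risk and low-risk groups is a common convention assumed here; the abstract reports neither the coefficients nor the cutoff.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Linear predictor: sum of coefficient * expression over the signature genes."""
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)


def stratify(scores):
    """Label each patient 'high' or 'low' relative to the median score (assumed cutoff)."""
    cut = median(scores.values())
    return {pid: ("high" if s > cut else "low") for pid, s in scores.items()}


# Hypothetical two-gene signature and three patients, for illustration only.
coefs = {"lncA": 0.5, "lncB": -0.2}
patients = {
    "P1": {"lncA": 2.0, "lncB": 1.0},
    "P2": {"lncA": 0.5, "lncB": 3.0},
    "P3": {"lncA": 1.0, "lncB": 1.0},
}
scores = {pid: risk_score(expr, coefs) for pid, expr in patients.items()}
print(stratify(scores))
```

In a real analysis the coefficients would come from the fitted Lasso-Cox model, and the resulting groups would then be compared with survival curves and the clinical variables listed in the abstract.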
Dayu Tan, Haijun Jiang, Haitao Li, Ying Xie, Yansen Su
The precise identification of drug-protein interactions (DPIs) can significantly speed up the drug discovery process. Bioassay methods are too time-consuming and expensive to screen every drug-protein pair. Machine-learning-based methods cannot accurately predict a large number of DPIs. Compared with traditional computing methods, deep learning methods need less domain knowledge and have strong data learning ability. In this study, we construct a DPI prediction model based on dual channel neural networks with an efficient path attention mechanism, called DCA-DPI. The drug molecular graph and protein sequence are the inputs of the model. A residual graph neural network and a residual convolution network learn the feature representations of the drug and protein, respectively, yielding the feature vector of the drug and the hidden vectors of the protein. To get a more accurate protein feature vector, a weighted sum of the protein hidden vectors is computed using the neural attention mechanism. Finally, the drug and protein vectors are concatenated and fed into a fully connected layer for classification. To evaluate the performance of DCA-DPI, three widely used public datasets, Human, C. elegans and DUD-E, are used in the experiments. The evaluation metrics obtained are superior to those of other relevant methods. Experiments show that our model is efficient for DPI prediction.
Prediction of drug-protein interaction based on dual channel neural networks with attention mechanism. Briefings in Functional Genomics, pp. 286-294. Published 2024-05-15. DOI: 10.1093/bfgp/elad037
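The attention step in the DCA-DPI abstract (a weighted sum of protein hidden vectors, followed by concatenation with the drug vector) can be illustrated with a small pure-Python sketch. Scoring each hidden vector by its dot product with the drug vector is an assumption made here for simplicity; the abstract does not specify the scoring function, and the function names below are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(drug_vec, protein_hidden):
    """Return (pooled protein vector, attention weights).

    protein_hidden is a list of hidden vectors, one per sequence position.
    Each position is scored by a dot product with drug_vec (assumed scoring rule).
    """
    scores = [sum(d * h for d, h in zip(drug_vec, hid)) for hid in protein_hidden]
    weights = softmax(scores)
    dim = len(protein_hidden[0])
    pooled = [sum(w * hid[i] for w, hid in zip(weights, protein_hidden)) for i in range(dim)]
    return pooled, weights


drug = [1.0, 0.0]
hidden = [[2.0, 0.0], [0.0, 1.0], [0.0, 3.0]]
protein_vec, weights = attend(drug, hidden)
pair_features = drug + protein_vec  # concatenation passed on to the classifier
print(weights, pair_features)
```

Here the first position receives the largest weight because it aligns best with the drug vector, so the pooled protein vector is dominated by that position before concatenation.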
Archana Mathur, Nikhilanand Arya, Kitsuchart Pasupa, Sriparna Saha, Sudeepa Roy Dey, Snehanshu Saha
We present a survey of the current state of the art in breast cancer detection and prognosis. We analyze the evolution of Artificial Intelligence-based approaches from using only uni-modal information to using multi-modal information for detection, and how this paradigm shift improves the efficacy of detection, consistent with clinical observations. We conclude that interpretable AI-based predictions and the ability to handle class imbalance should be considered priorities.
Breast cancer prognosis through the use of multi-modal classifiers: current state of the art and the way forward. Briefings in Functional Genomics. Published 2024-05-01. DOI: 10.1093/bfgp/elae015
Schizosaccharomyces pombe is a commonly utilized model organism for studying various aspects of eukaryotic cell physiology. One reason for its widespread use as an experimental system is the ease of genetic manipulations, leveraging the natural homology-targeted repair mechanism to accurately modify the genome. We conducted a study to assess the feasibility and efficiency of directly introducing exogenous genes into the fission yeast S. pombe using Polymerase Chain Reaction (PCR) with short-homology flanking sequences. Specifically, we amplified the NatMX6 gene (which provides resistance to nourseothricin) using PCR with oligonucleotides that had short flanking regions of 20 bp, 40 bp, 60 bp and 80 bp to the target gene. By using this purified PCR product, we successfully introduced the NatMX6 gene at position 171 385 on chromosome III in S. pombe. We have made a simple modification to the transformation procedure, resulting in a significant increase in transformation efficiency by at least 5-fold. The success rate of gene integration at the target position varied between 20% and 50% depending on the length of the flanking regions. Additionally, we discovered that the addition of dimethyl sulfoxide and boiled carrier DNA increased the number of transformants by ~60- and 3-fold, respectively. Furthermore, we found that the removal of the pku70+ gene improved the transformation efficiency to ~5% and reduced the formation of small background colonies. Overall, our results demonstrate that with this modified method, even very short stretches of homologous regions (as short as 20 bp) can be used to effectively target genes at a high frequency in S. pombe. This finding greatly facilitates the introduction of exogenous genes in this organism.
Short-homology-mediated PCR-based method for gene introduction in the fission yeast Schizosaccharomyces pombe. Cai-Xia Zhang, Ying-Chun Hou. Briefings in Functional Genomics. Published 2024-04-26. DOI: 10.1093/bfgp/elae016
Generative molecular models generate novel molecules with desired properties by searching chemical space. Traditional combinatorial optimization methods, such as genetic algorithms, have demonstrated superior performance in various molecular optimization tasks. However, these methods do not utilize docking simulation to inform the design process, depend heavily on the quality and quantity of available data, and require additional structural optimization before their outputs can become candidate drugs. To address these limitations, we propose a novel model named DockingGA that combines Transformer neural networks and genetic algorithms to generate molecules with better binding affinity for specific targets. To generate high-quality molecules, we chose the Self-referencing Chemical Structure Strings to represent the molecule and optimize the binding affinity of the molecules to different targets. Compared to other baseline models, DockingGA proves to be the optimal model in all docking results for the top 1, 10 and 100 molecules, while maintaining 100% novelty. Furthermore, the distribution of physicochemical properties demonstrates the ability of DockingGA to generate molecules with favorable and appropriate properties. This innovation creates new opportunities for the application of generative models in practical drug discovery.
DockingGA: enhancing targeted molecule generation using transformer neural network and genetic algorithm with docking simulation. Changnan Gao, Wenjie Bao, Shuang Wang, Jianyang Zheng, Lulu Wang, Yongqi Ren, Linfang Jiao, Jianmin Wang, Xun Wang. Briefings in Functional Genomics. Published 2024-04-07. DOI: 10.1093/bfgp/elae011
With advances in sequencing technology and genomics research, long noncoding RNAs (lncRNAs) have been found to be extensively involved in eukaryotic epigenetic, transcriptional, and post-transcriptional regulatory processes. They therefore play crucial roles in the body’s normal physiology and various disease outcomes. At present, a large amount of uncharacterized lncRNA sequencing data remains to be explored. As the era of artificial intelligence progresses, deep learning-based prediction models for lncRNAs provide valuable insights for researchers, substantially reducing the time and costs associated with trial and error and facilitating the identification of disease-relevant lncRNAs for prognosis analysis and targeted drug development. However, most lncRNA researchers lack awareness of the latest advancements in deep learning models and of how to select and apply models in functional research on lncRNAs. Thus, we elucidate the concept of deep learning models; explore several prevalent deep learning algorithms and their data preferences; review recent studies from the past 5 years with exemplary predictive performance across diverse prediction functions; critically analyze and discuss the merits and limitations of current deep learning models and solutions; and propose prospects based on cutting-edge advancements in lncRNA research.
A comprehensive survey on deep learning-based identification and predicting the interaction mechanism of long non-coding RNAs. Biyu Diao, Jin Luo, Yu Guo. Briefings in Functional Genomics. Published 2024-04-05. DOI: 10.1093/bfgp/elae010
Analysis of cell-cell communication (CCC) in the tumor micro-environment helps decipher the underlying mechanism of cancer progression and drug tolerance. Currently, single-cell RNA-Seq data are available on a large scale, providing an unprecedented opportunity to predict cellular communications. There have been many achievements and applications in inferring cell-cell communication based on the known interactions between molecules, such as ligands, receptors and extracellular matrix. However, the prior information is not quite adequate and only involves a fraction of cellular communications, producing many false-positive or false-negative results. To this end, we propose an improved hierarchical variational autoencoder (HiVAE) based model to fully use single-cell RNA-seq data for automatically estimating CCC. Specifically, the HiVAE model is used to learn the potential representation of cells on known ligand-receptor genes and all genes in single-cell RNA-seq data, respectively, which are then utilized for cascade integration. Subsequently, transfer entropy is employed to measure the transmission of information flow between two cells based on the learned representations, which are regarded as directed communication relationships. Experiments are conducted on single-cell RNA-seq data of the human skin disease dataset and the melanoma dataset, respectively. Results show that the HiVAE model is effective in learning cell representations, and transfer entropy could be used to estimate the communication scores between cell types.
{"title":"An improved hierarchical variational autoencoder for cell-cell communication estimation using single-cell RNA-seq data.","authors":"Shuhui Liu, Yupei Zhang, Jiajie Peng, Xuequn Shang","doi":"10.1093/bfgp/elac056","DOIUrl":"10.1093/bfgp/elac056","url":null,"PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":" ","pages":"118-127"},"PeriodicalIF":4.0,"publicationDate":"2024-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9222533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
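The transfer entropy step mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not say how transfer entropy is computed from the learned representations, so everything here is an assumption for illustration: each input is treated as an ordered sequence of values, values are discretized into equal-width bins, a history length of 1 is used, and the function names `discretize` and `transfer_entropy` are hypothetical. It is a plain plug-in estimator, not the authors' implementation.

```python
import math
import random
from collections import Counter


def discretize(values, bins):
    """Map real values to integer bin indices using equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    # min(...) keeps the maximum value inside the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


def transfer_entropy(source, target, bins=4):
    """Plug-in estimate of TE(source -> target) in bits, history length 1.

    TE = sum p(y_next, y_prev, x_prev)
             * log2( p(y_next | y_prev, x_prev) / p(y_next | y_prev) )
    """
    if len(source) != len(target) or len(source) < 2:
        raise ValueError("source and target must have equal length >= 2")
    x = discretize(source, bins)
    y = discretize(target, bins)
    triples = list(zip(y[1:], y[:-1], x[:-1]))  # (y_next, y_prev, x_prev)
    n = len(triples)
    c_full = Counter(triples)
    c_prev = Counter((yp, xp) for _, yp, xp in triples)
    c_self = Counter((yn, yp) for yn, yp, _ in triples)
    c_y = Counter(yp for _, yp, _ in triples)
    te = 0.0
    for (yn, yp, xp), count in c_full.items():
        p_full = count / c_prev[(yp, xp)]    # p(y_next | y_prev, x_prev)
        p_self = c_self[(yn, yp)] / c_y[yp]  # p(y_next | y_prev)
        te += (count / n) * math.log2(p_full / p_self)
    return te


# Toy data (hypothetical): y is x delayed by one step, so information
# flows from x to y but not from y to x.
random.seed(0)
x = [float(random.randint(0, 1)) for _ in range(2000)]
y = [x[-1]] + x[:-1]
te_xy = transfer_entropy(x, y, bins=2)
te_yx = transfer_entropy(y, x, bins=2)
print(f"TE(x -> y) = {te_xy:.3f} bits, TE(y -> x) = {te_yx:.3f} bits")
```

The asymmetry between the two directions is what makes transfer entropy suitable for scoring directed relationships; a symmetric measure such as mutual information would not distinguish sender from receiver.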
Liangrui Ren, Jun Wang, Wei Li, Maozu Guo, Guoxian Yu
Determining cell types from single-cell transcriptomic data is fundamental for downstream analysis. However, cell clustering and data imputation still face computational challenges due to the high dropout rate, sparsity and dimensionality of single-cell data. Although some deep learning-based solutions have been proposed to handle these challenges, they still cannot effectively leverage gene attribute information and cell topology to obtain a consistent clustering. In this paper, we present scDeepFC, a deep information fusion-based method for single-cell data clustering and imputation. Specifically, scDeepFC uses a deep auto-encoder (DAE) network and a deep graph convolution network to embed high-dimensional gene attribute information and high-order cell-cell topological information into separate low-dimensional representations, and then fuses them through a deep information fusion network to generate a more comprehensive and accurate consensus representation. In addition, scDeepFC integrates the zero-inflated negative binomial (ZINB) model into the DAE to model dropout events. By jointly optimizing the ZINB loss and the cell graph reconstruction loss, scDeepFC generates a salient embedding representation for clustering cells and imputing missing data. Extensive experiments on real single-cell datasets show that scDeepFC outperforms other popular single-cell analysis methods. Both gene attribute information and cell topology information improve cell clustering.
{"title":"Single-cell RNA-seq data clustering by deep information fusion.","authors":"Liangrui Ren, Jun Wang, Wei Li, Maozu Guo, Guoxian Yu","doi":"10.1093/bfgp/elad017","DOIUrl":"10.1093/bfgp/elad017","url":null,"PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":" ","pages":"128-137"},"PeriodicalIF":4.0,"publicationDate":"2024-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9489133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
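The ZINB loss named in the abstract above can be sketched for a single count as follows. The abstract does not give the loss in closed form, so this uses the common parametrization as an assumption: mean `mu`, dispersion `theta`, and dropout (zero-inflation) probability `pi`. The function names `nb_log_pmf` and `zinb_nll` are hypothetical, and in a real model these parameters would be produced per gene and per cell by the decoder network rather than passed in by hand.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial distribution
    with mean mu > 0 and dispersion theta > 0."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_nll(x, mu, theta, pi):
    """Negative log-likelihood of one count under a zero-inflated
    negative binomial, with dropout probability 0 <= pi < 1."""
    log_nb = nb_log_pmf(x, mu, theta)
    if x == 0:
        # A zero is either a dropout or a genuine zero count from the NB.
        return -math.log(pi + (1.0 - pi) * math.exp(log_nb))
    # A positive count can only come from the NB component.
    return -(math.log(1.0 - pi) + log_nb)


# Toy usage with hypothetical parameters: average the loss over a few counts.
counts = [0, 0, 3, 1, 0, 7]
loss = sum(zinb_nll(c, mu=2.0, theta=1.5, pi=0.3) for c in counts) / len(counts)
print(f"mean ZINB negative log-likelihood: {loss:.4f}")
```

The positive-count branch stays in log space so that very unlikely counts do not underflow to zero before the logarithm is taken.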