SeqBMC: Single-cell data processing using iterative block matrix completion algorithm based on matrix factorisation
Gong Lejun, Yu Like, Wei Xinyi, Zhou Shehai, Xu Shuhua
IET Systems Biology, Volume 19, Issue 1 (journal article, published 2025-02-12)
DOI: 10.1049/syb2.70003
https://onlinelibrary.wiley.com/doi/10.1049/syb2.70003
Citations: 0
Abstract
With the development of high-throughput sequencing technology, the analysis of single-cell RNA sequencing (scRNA-seq) data has become a major focus of current research, and the analysis and processing of the preprocessed gene expression matrix for downstream tasks is an active research topic. This paper proposes SeqBMC, an iterative block matrix completion algorithm based on matrix factorisation, which completes the missing values that technical limitations of sequencing introduce into the gene expression matrix. SeqBMC partitions the matrix into blocks according to gene frequency, completes each smaller block with a matrix factorisation algorithm, and retains the biological zeros that may exist in each block. Experimental results show that completion significantly improves classification performance on the completed gene expression matrix, reaching an F1 score of 86.81%, which aids the recognition of cell types in sequencing data. Moreover, the method relies only on machine learning and requires little biology-specific prior knowledge while still performing well. Compared with ALRA, SeqBMC improves accuracy by 5.47% and F1 score by 5.03%, indicating that it has significant advantages for matrix completion of scRNA-seq data.
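The abstract names three steps (gene-frequency blocking, per-block matrix factorisation, retention of biological zeros) but not their details, so what follows is only a minimal NumPy sketch under stated assumptions, not the authors' SeqBMC implementation. It assumes a non-negative cells-by-genes matrix in which every zero is a candidate missing value; "gene frequency" is taken to mean the fraction of cells in which a gene is detected; genes are sorted by that frequency and split into `n_blocks` contiguous blocks; each block is completed by a low-rank factorisation fitted with alternating ridge least squares on the observed entries only (a swapped-in choice of factorisation); and, as a hypothetical biological-zero rule, an imputed value is reset to zero when it falls below `zero_frac` times the smallest observed value of that gene. The function names `als_complete` and `block_mf_complete` and all of their parameters are illustrative.

```python
import numpy as np


def als_complete(X, mask, rank=2, reg=0.1, n_iter=100, seed=0):
    """Fit X ~ U @ V.T by alternating ridge least squares, using only the
    entries where `mask` is True; return the dense reconstruction U @ V.T.
    (Assumed factorisation method; the paper's exact choice is not given.)"""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = X.shape
    U = rng.normal(size=(n_rows, rank))
    V = rng.normal(size=(n_cols, rank))
    ridge = reg * np.eye(rank)
    for _ in range(n_iter):
        for i in range(n_rows):  # update each cell factor from its observed genes
            Vo = V[mask[i]]
            U[i] = np.linalg.solve(Vo.T @ Vo + ridge, Vo.T @ X[i, mask[i]])
        for j in range(n_cols):  # update each gene factor from its observed cells
            Uo = U[mask[:, j]]
            V[j] = np.linalg.solve(Uo.T @ Uo + ridge, Uo.T @ X[mask[:, j], j])
    return U @ V.T


def block_mf_complete(X, n_blocks=2, rank=2, zero_frac=0.5, **als_kwargs):
    """Hypothetical block-wise completion of a non-negative cells x genes matrix."""
    X = np.asarray(X, dtype=float)
    observed = X > 0                      # assumption: every zero is a candidate dropout
    freq = observed.mean(axis=0)          # assumed "gene frequency": detection rate per gene
    order = np.argsort(-freq, kind="stable")
    completed = X.copy()
    for cols in np.array_split(order, n_blocks):  # contiguous blocks of similar frequency
        if cols.size == 0:
            continue
        block, mask = X[:, cols], observed[:, cols]
        est = als_complete(block, mask, rank=min(rank, *block.shape), **als_kwargs)
        # Hypothetical biological-zero rule: per-gene threshold is a fraction of the
        # smallest observed value; genes never detected get an infinite threshold
        # and therefore stay all-zero.
        min_obs = np.where(mask, block, np.inf).min(axis=0)
        keep_zero = est < zero_frac * min_obs
        # Observed values are kept unchanged; only zeros are filled or kept as zero.
        completed[:, cols] = np.where(mask, block, np.where(keep_zero, 0.0, est))
    return completed
```

For example, `block_mf_complete(counts, n_blocks=4, rank=5)` (with `counts` a hypothetical matrix) returns a matrix of the same shape in which observed values are unchanged and every output value is non-negative. Blocking keeps each factorisation small, but a block needs enough genes per cell for the low-rank model to carry information: in this sketch, a cell with no observed entries inside a block receives no imputed values in that block.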
About the journal:
IET Systems Biology covers intra- and inter-cellular dynamics, using systems- and signal-oriented approaches. Papers that analyse genomic data in order to identify variables and basic relationships between them are considered if the results provide a basis for mathematical modelling and simulation of cellular dynamics. Manuscripts on molecular and cell biological studies are encouraged if the aim is a systems approach to dynamic interactions within and between cells.
The scope includes the following topics:
Genomics, transcriptomics, proteomics, metabolomics, cells, tissue and the physiome; molecular and cellular interaction, gene, cell and protein function; networks and pathways; metabolism and cell signalling; dynamics, regulation and control; systems, signals, and information; experimental data analysis; mathematical modelling, simulation and theoretical analysis; biological modelling, simulation, prediction and control; methodologies, databases, tools and algorithms for modelling and simulation; modelling, analysis and control of biological networks; synthetic biology and bioengineering based on systems biology.