Unveiling Protein Corona Composition: Predicting with Resampling Embedding and Machine Learning
Rong Liao, Yan Zhuang, Xiangfeng Li, Ke Chen, Xingming Wang, Cong Feng, Guangfu Yin, Xiangdong Zhu, Jiangli Lin, Xingdong Zhang
Regenerative Biomaterials (JCR Q1, Materials Science, Biomaterials; impact factor 5.6). Published 2023-12-12.
DOI: 10.1093/rb/rbad082 (https://doi.org/10.1093/rb/rbad082)
Citations: 0
Abstract
Biomaterials with surface nanostructures can enhance protein secretion and stimulate tissue regeneration. When nanoparticles (NPs) enter a living system, they rapidly interact with proteins in body fluids, forming a protein corona (PC). Accurate prediction of PC composition is critical for analyzing the osteoinductivity of biomaterials and for guiding the reverse design of NPs, but it remains a significant challenge. Several machine learning (ML) models, such as random forest (RF), have been used for PC prediction, yet they often neglect the extreme values in the abundance region of PC adsorption, and their accuracy is limited by the imbalanced data distribution. In this study, resampling embedding was introduced to address the imbalanced distribution of PC data. Various ML models were evaluated, and the RF model was ultimately selected for prediction, yielding good coefficient of determination (R²) and root-mean-square error (RMSE) values. Ablation experiments showed that the proposed method achieved an R² of 0.68, an improvement of approximately 10%, and an RMSE of 0.90, a reduction of approximately 10%. Furthermore, in validation against label-free quantification of four NPs, hydroxyapatite (HA), titanium dioxide (TiO2), silicon dioxide (SiO2) and silver (Ag), the model achieved an R² above 0.70 using random oversampling. Feature analysis further revealed that PC composition is most strongly influenced by incubation plasma concentration, polydispersity index (PDI) and surface modification.
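The abstract names random oversampling as the resampling step and reports R² and RMSE, but gives no implementation details. The following is a minimal sketch of one common way to oversample an imbalanced regression target, not the authors' method: the function names, the fixed `threshold` that defines "rare" high-abundance samples, and the toy data are all assumptions made for illustration. In practice the resampled set would be used only for training a regressor such as a random forest, with the metrics computed on untouched held-out data.

```python
import math
import random


def random_oversample(X, y, threshold, seed=0):
    """Duplicate rare samples (target >= threshold) at random until they
    are as numerous as the common ones. The threshold rule is an
    illustrative assumption, not taken from the paper."""
    rng = random.Random(seed)
    rare = [i for i, t in enumerate(y) if t >= threshold]
    common = [i for i, t in enumerate(y) if t < threshold]
    if not rare or len(rare) >= len(common):
        return list(X), list(y)  # nothing to balance
    # Draw rare indices with replacement to fill the gap.
    extra = [rng.choice(rare) for _ in range(len(common) - len(rare))]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy data (hypothetical): five low-abundance samples, one extreme value.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.9]]
y = [0.1, 0.2, 0.1, 0.3, 0.2, 5.0]
X_res, y_res = random_oversample(X, y, threshold=1.0)
print(len(y_res), sum(t >= 1.0 for t in y_res))  # prints: 10 5
```

Oversampling before the train/test split would leak duplicated samples into the evaluation set and inflate R², which is why the sketch treats resampling as a training-only step.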
About the journal:
Regenerative Biomaterials is an international, interdisciplinary, peer-reviewed journal publishing the latest advances in biomaterials and regenerative medicine. The journal provides a forum for original research papers, reviews, clinical case reports, and commentaries on topics relevant to the development of advanced regenerative biomaterials, including novel regenerative technologies and therapeutic approaches for the regeneration and repair of damaged tissues and organs. The interactions of biomaterials with cells and tissues, especially stem cells, are a particular focus.