Nuclear magnetic resonance (NMR), valued for its nondestructive nature and high reliability, is one of the principal analytical platforms in metabolomics. Over the past few decades, NMR metabolomics has been widely used in human and environmental health studies. However, NMR metabolomics data processing remains challenging because of the complexity of the data. Although automated approaches have been explored, their reliability and accuracy are still limited. One limitation is that the same peak is not cross-evaluated across all samples during raw data processing. In this study, we developed a new approach that applies machine learning models to evaluate peak quality across all samples in an NMR metabolomics study. For each peak (potential metabolite), the approach combines the automatically selected potential peaks from all samples into a new spectrum, which provides an overview of all samples and helps ensure overall data quality for downstream statistical analysis. Two machine learning approaches, Support Vector Machine Discriminant Analysis (SVMDA) and Extreme Gradient Boosting Discriminant Analysis (XGBDA), demonstrated high prediction rates in identifying high-quality peaks. In addition, the raw data conversion resolution was tested to optimize the performance of each machine learning approach, and XGBDA showed better tolerance to data resolution. These results indicate that machine learning approaches such as SVMDA and XGBDA can be used to identify high-quality peaks generated through automated peak picking, helping ensure data quality in metabolomics studies. Our study paves the way for automated data processing in future NMR metabolomics research.
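The idea of combining one candidate peak from every sample into a single new spectrum can be illustrated with a minimal sketch. The abstract does not specify how the combination is performed, so everything below is an assumption made for illustration: the fixed chemical-shift window around the peak, the linear resampling to a common number of points (a stand-in for the "data resolution" setting), and the hypothetical function names `extract_window`, `resample`, and `composite_spectrum`. The resulting vector is the kind of input that could then be labeled and passed to a classifier such as an SVM or gradient-boosted model; that step is not shown here.

```python
from typing import List, Sequence


def extract_window(ppm: Sequence[float], intensity: Sequence[float],
                   center: float, half_width: float) -> List[float]:
    """Return intensities whose chemical shift lies within center +/- half_width.

    Assumption: every sample shares the same ppm axis.
    """
    return [y for x, y in zip(ppm, intensity) if abs(x - center) <= half_width]


def resample(segment: Sequence[float], n_points: int) -> List[float]:
    """Linearly interpolate a segment onto n_points evenly spaced positions."""
    if n_points < 2 or len(segment) < 2:
        raise ValueError("need at least two points to resample")
    step = (len(segment) - 1) / (n_points - 1)
    out = []
    for i in range(n_points):
        pos = i * step
        lo = min(int(pos), len(segment) - 2)  # left neighbor, clamped at the end
        frac = pos - lo
        out.append(segment[lo] * (1 - frac) + segment[lo + 1] * frac)
    return out


def composite_spectrum(ppm: Sequence[float], spectra: Sequence[Sequence[float]],
                       center: float, half_width: float,
                       n_points: int) -> List[float]:
    """Concatenate the same peak window from every sample into one vector.

    Hypothetical helper: one composite per candidate peak, with samples
    placed side by side so the peak can be judged across the whole study.
    """
    composite: List[float] = []
    for intensity in spectra:
        window = extract_window(ppm, intensity, center, half_width)
        composite.extend(resample(window, n_points))
    return composite


if __name__ == "__main__":
    # Two toy samples sharing one ppm axis, with a peak near 1.02 ppm.
    ppm = [1.00, 1.01, 1.02, 1.03, 1.04]
    spectra = [[0.0, 1.0, 4.0, 1.0, 0.0], [0.0, 2.0, 8.0, 2.0, 0.0]]
    print(composite_spectrum(ppm, spectra, center=1.02, half_width=0.015, n_points=5))
```

Resampling every window to the same length keeps the composite a fixed-size feature vector regardless of the original digital resolution, which is one plausible way to compare classifier performance across resolutions.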
