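Explore the Characteristics of Age, BMI and Blood Composition of Breast Cancer Patients Based on Multivariate Statistical Analysis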

Applied and Computational Mathematics | IF 4.6 | Zone 2 (Mathematics) | JCR Q1 (Mathematics, Applied) | Pub Date: 2020-08-22 | DOI: 10.11648/j.acm.20200904.15

R. Dong


Citations: 0

Abstract



This paper analyzes breast cancer detection data through a series of analyses and tests, studying the statistical regularities of multiple subjects and multiple correlated indicators. First, univariate and multivariate diagnostics were performed on the data. In studying the correlations between variables, HOMA was found to have a clear positive linear correlation with blood insulin content. Notably, some patients with breast cancer show a high degree of insulin resistance and high blood insulin content, a feature not found in the samples without breast cancer. Next, one-way analysis of variance indicated significant differences in blood test results, age, and BMI between samples with different health conditions. Principal component analysis was then used to reduce the dimension of the data. In this study, the differences in age, BMI, and blood component content between the two groups with different health conditions can be summarized by two independent principal components. The MCP-1 (monocyte chemoattractant protein 1) coefficient in principal component 1 is large in absolute value, reflecting the blood composition characteristics of the sample; the loadings of glucose and leptin in principal component 2 are large, reflecting similar results. Then, assuming a factor model with m = 3 and using both the maximum likelihood method and the principal component method, the original data and the factor-rotated data were re-analyzed, reducing the variables to 3 factors. The maximum likelihood method was used to estimate the rotated factor solution. The first factor is an insulin resistance factor attributed to the insulin and HOMA indicators, the second is a body fatness factor attributed to BMI and leptin, and the third reflects the glucose content in the blood.
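The positive linear correlation between HOMA and insulin reported in the abstract can be illustrated with a minimal sketch. It assumes the common HOMA-IR approximation, glucose (mg/dL) × insulin (µU/mL) / 405; the abstract does not state which formula the paper used. The function names `homa_ir` and `pearson_r` and all sample values are hypothetical and are not data from the paper.

```python
import math


def homa_ir(glucose_mg_dl, insulin_uU_ml):
    """HOMA-IR under the assumed approximation glucose * insulin / 405."""
    return glucose_mg_dl * insulin_uU_ml / 405.0


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up illustrative measurements (NOT taken from the paper).
glucose = [78.0, 85.0, 92.0, 101.0, 110.0, 134.0]   # mg/dL
insulin = [2.9, 4.1, 5.6, 8.2, 14.0, 24.5]          # µU/mL

homa = [homa_ir(g, i) for g, i in zip(glucose, insulin)]
r = pearson_r(insulin, homa)
print(f"Pearson r(insulin, HOMA) = {r:.3f}")
```

Under this assumed definition HOMA is a product that includes insulin, so a strong positive correlation between the two is expected whenever glucose varies much less than insulin across samples.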
Finally, discriminant analysis with different misclassification costs gives an APER (apparent error rate) of 0.1638 and an EAER (expected actual error rate) of 0.1872. The probability of misclassifying a patient with breast cancer as not having breast cancer is 0.09375, a low misclassification rate that suggests the model established in this paper is effective.
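To show how an apparent error rate and a cost-weighted error are computed from a confusion matrix, here is a small sketch. The abstract gives neither the group sizes nor the confusion counts, so the counts below are assumptions, chosen only because they are arithmetically consistent with the reported 0.1638 and 0.09375. The cost values and function names are also hypothetical.

```python
def apparent_error_rate(n1_correct, n1_missed, n2_correct, n2_missed):
    """APER: fraction of training observations the rule misclassifies."""
    total = n1_correct + n1_missed + n2_correct + n2_missed
    return (n1_missed + n2_missed) / total


def expected_cost(p_miss_1, p_miss_2, prior_1, prior_2, cost_1, cost_2):
    """Expected cost of misclassification for two groups:
    cost_1 * P(miss | group 1) * prior_1 + cost_2 * P(miss | group 2) * prior_2.
    """
    return cost_1 * p_miss_1 * prior_1 + cost_2 * p_miss_2 * prior_2


# ASSUMED counts (not reported in the abstract), for illustration only.
patients_correct, patients_missed = 58, 6     # group 1: breast cancer
controls_correct, controls_missed = 39, 13    # group 2: no breast cancer

n_patients = patients_correct + patients_missed
n_controls = controls_correct + controls_missed
n_total = n_patients + n_controls

aper = apparent_error_rate(patients_correct, patients_missed,
                           controls_correct, controls_missed)
patient_miss_rate = patients_missed / n_patients
control_miss_rate = controls_missed / n_controls

# ASSUMED costs: missing a patient is taken to be twice as costly.
ecm = expected_cost(patient_miss_rate, control_miss_rate,
                    n_patients / n_total, n_controls / n_total,
                    cost_1=2.0, cost_2=1.0)

print(f"APER = {aper:.4f}")
print(f"P(patient classified as healthy) = {patient_miss_rate}")
print(f"Expected cost under assumed costs = {ecm:.4f}")
```

The APER is computed on the same data used to build the rule, so it tends to be optimistic; this is in line with the abstract reporting a larger EAER (0.1872) than APER (0.1638).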


Source journal

Applied and Computational Mathematics (Mathematics: Applied Mathematics)

CiteScore: 8.80

Self-citation rate: 5.00%

Review time: 6 months

About the journal: Applied and Computational Mathematics (ISSN Online: 2328-5613, ISSN Print: 2328-5605) is a journal focused on applied and computational mathematics. It is driven by the computational revolution and emphasizes innovative applied mathematics with potential for real-world applicability and practicality. The journal serves a broad audience of applied mathematicians and scientists interested in the advancement of mathematical principles and the practical aspects of computational mathematics. Researchers from various disciplines can benefit from the range of topics covered in ACM. All research articles undergo peer review, which includes an initial screening by the editors and anonymous evaluation by expert reviewers.