Author: Gerald Stanley Zavorsky
Journal: Data in Brief, volume 57, Article 111062
Publication date: 2024-10-23
Publication type: Journal Article
DOI: 10.1016/j.dib.2024.111062
URL: https://www.sciencedirect.com/science/article/pii/S2352340924010242
Journal impact factor: 1.0 (JCR Q3, Multidisciplinary Sciences)
A refined spirometry dataset for comparing segmented (piecewise) linear models to that of GAMLSS
Generalized Additive Models for Location, Scale, and Shape (GAMLSS) are widely used for developing spirometric reference equations but are often complex, requiring additional spline tables. This study explores the potential of Segmented (piecewise) Linear Regression as an alternative, comparing its predictive accuracy with that of GAMLSS and examining the agreement between the two methods. Spirometry data from nearly 16,600 patients in the NHANES 2007-2012 dataset, graded “A” or “B” for acceptability, were analyzed. The dataset includes both nominal and scalar variables. Reference equations for forced expiratory volume in 1 s (FEV1), forced vital capacity (FVC), and their ratio (FEV1/FVC) were generated using GAMLSS (FEV1, FVC, FEV1/FVC), Segmented Linear Regression (FEV1, FVC), and multiple linear regression (FEV1/FVC). K-fold cross-validation was employed to compare prediction accuracy, using root-mean-square error (RMSE) and correlation coefficients. Agreement in classifying spirometric patterns (i.e., airway obstruction, restrictive spirometry pattern, mixed obstructive and restrictive disorder) was evaluated with the kappa statistic. This study uniquely compares the models by incorporating the lower limit of normal (LLN), defined by fitted z-scores of -1.645 or -1.96. The dataset is publicly available in SPSS (.sav) and .csv formats through the Mendeley Data repository.
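The evaluation quantities named in the abstract (RMSE, the LLN at a chosen z-score, pattern classification, and the kappa statistic) can be illustrated with a minimal Python sketch. This is not the author's code. It assumes normally distributed residuals with a known spread, so that LLN = predicted + z × SD; a GAMLSS fit would instead derive the LLN from its fitted distribution parameters, which is not reproduced here. The function names, the classification rule (both values compared with their LLN), and the example numbers are all hypothetical.

```python
import math
from collections import Counter


def lln(predicted, sd, z=-1.645):
    """Lower limit of normal, ASSUMING normal residuals with spread `sd`."""
    return predicted + z * sd


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def classify(ratio, fvc, ratio_lln, fvc_lln):
    """Label a spirometric pattern (assumed rule: compare each value with its LLN)."""
    low_ratio = ratio < ratio_lln
    low_fvc = fvc < fvc_lln
    if low_ratio and low_fvc:
        return "mixed"
    if low_ratio:
        return "obstruction"
    if low_fvc:
        return "restrictive pattern"
    return "normal"


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two label lists, corrected for chance."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / n ** 2
    if expected == 1.0:
        return 1.0  # both lists use one identical label throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical labels produced by two reference-equation methods.
    method_1 = ["normal", "normal", "obstruction", "obstruction"]
    method_2 = ["normal", "normal", "obstruction", "normal"]
    print("kappa:", cohen_kappa(method_1, method_2))
    print("LLN at z = -1.96:", lln(4.0, 0.5, z=-1.96))
```

In a K-fold comparison, `rmse` would be computed on each held-out fold for each model and then averaged, while `cohen_kappa` would be applied to the pattern labels that the two sets of reference equations assign to the same patients.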
About the journal:
Data in Brief provides a way for researchers to easily share and reuse each other's datasets by publishing data articles that:
- Thoroughly describe your data, facilitating reproducibility.
- Make your data, which is often buried in supplementary material, easier to find.
- Increase traffic towards associated research articles and data, leading to more citations.
- Open up doors for new collaborations.
Because you never know what data will be useful to someone else, Data in Brief welcomes submissions that describe data from all research areas.