
Annals of Data Science: Latest Articles

Bayesian Learning of Personalized Longitudinal Biomarker Trajectory
Q1 Decision Sciences Pub Date : 2023-08-01 DOI: 10.1007/s40745-023-00486-0
Shouhao Zhou, Xuelin Huang, Chan Shen, Hagop M. Kantarjian

This work concerns effective personalized prediction of longitudinal biomarker trajectories, motivated by a study of targeted cancer therapy in patients with chronic myeloid leukemia (CML). Continuous monitoring of residual disease with an established biomarker is a key component of CML management, because it enables early prediction of disease relapse. However, longitudinal biomarker measurements follow highly heterogeneous trajectories across subjects (patients), with varied shapes and patterns. The trajectory is believed to be clinically related to the development of treatment resistance, but little is known about the underlying mechanism. To address this challenge, we propose a novel Bayesian approach to modeling the distribution of subject-specific longitudinal trajectories. It uses flexible Bayesian learning to accommodate complex patterns of change over time and non-linear covariate effects, and it allows real-time prediction for both in-sample and out-of-sample subjects. The resulting information can support clinical decision-making and thereby enhance personalized treatment management in precision medicine.
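
The authors' model is considerably more flexible than anything that fits here. The sketch below is a deliberately simplified stand-in: a conjugate Gaussian update of one subject's linear trend (intercept, slope), assuming a population-level prior and a known measurement noise variance. All function names are hypothetical. It shows the two uses named in the abstract. A subject with measurements gets a personalized posterior. A new (out-of-sample) subject with no measurements falls back on the population prior.

```python
# Hypothetical stand-in: conjugate Gaussian update of a subject's linear
# trajectory (intercept, slope). This is NOT the authors' flexible Bayesian model.

def inv2(m):
    """Invert a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def subject_posterior(times, values, prior_mean, prior_cov, noise_var):
    """Posterior mean and covariance of (intercept, slope) for one subject.

    The population prior N(prior_mean, prior_cov) is updated with the subject's
    own measurements, assuming y = b0 + b1 * t + Gaussian noise.
    """
    prior_prec = inv2(prior_cov)
    # Sufficient statistics X'X and X'y for design rows [1, t].
    xtx = [[len(times), sum(times)],
           [sum(times), sum(t * t for t in times)]]
    xty = [sum(values), sum(t * y for t, y in zip(times, values))]
    post_prec = [[prior_prec[i][j] + xtx[i][j] / noise_var for j in range(2)]
                 for i in range(2)]
    post_cov = inv2(post_prec)
    rhs = [prior_prec[i][0] * prior_mean[0] + prior_prec[i][1] * prior_mean[1]
           + xty[i] / noise_var for i in range(2)]
    post_mean = [post_cov[i][0] * rhs[0] + post_cov[i][1] * rhs[1]
                 for i in range(2)]
    return post_mean, post_cov


def predict(post_mean, post_cov, t, noise_var):
    """Posterior predictive mean and variance of a new measurement at time t."""
    x = (1.0, t)
    mean = post_mean[0] + post_mean[1] * t
    var = sum(x[i] * post_cov[i][j] * x[j] for i in range(2) for j in range(2))
    return mean, var + noise_var
```

Calling `subject_posterior([], [], prior_mean, prior_cov, noise_var)` returns the prior unchanged, which corresponds to the out-of-sample case. Each new measurement only changes the sufficient statistics, so the update is cheap enough to run in real time.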

Annals of Data Science, vol. 11, issue 3, pp. 1031-1050
Citations: 0
Applications of Reliability Test Plan for Logistic Rayleigh Distributed Quality Characteristic
Q1 Decision Sciences Pub Date : 2023-07-19 DOI: 10.1007/s40745-023-00473-5
Mahendra Saha, Harsh Tripathi, Anju Devi, Pratibha Pareek

In this article, a reliability test plan under a time-truncated life test is considered for the logistic Rayleigh distribution (LRD). The statistical properties and significance of the LRD are briefly discussed. The median is used as the quality characteristic of the proposed reliability test plan: the larger the median, the better the quality of the lot. Minimum sample sizes are tabulated for different specified levels of the consumer’s risk. Operating characteristic (OC) values are also tabulated for the chosen settings, and the pattern of the OC values is discussed. The sample sizes of the present plan are compared with those of some other reliability test plans. As an illustration, the performance of the proposed plan for the LRD is shown through real-life examples.
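
Time-truncated acceptance plans of this kind usually rest on a binomial model. You test n items until time t0 and accept the lot if at most c of them fail. The minimum sample size is then the smallest n whose acceptance probability, at the failure probability p implied by the specified median, does not exceed the consumer's risk. The sketch below assumes that binomial setup and takes p as an input. Computing p from the LRD CDF at t0 is not reproduced here, and the function names are hypothetical.

```python
from math import comb


def acceptance_probability(n, c, p):
    """Probability of accepting the lot: at most c of n items fail before t0.

    Failures are modeled as Binomial(n, p), where p is the probability that an
    item fails before the truncation time t0.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(c + 1))


def minimum_sample_size(c, p, consumer_risk, n_max=10_000):
    """Smallest n whose acceptance probability is at most the consumer's risk."""
    for n in range(c + 1, n_max + 1):
        if acceptance_probability(n, c, p) <= consumer_risk:
            return n
    raise ValueError("no sample size up to n_max meets the consumer's risk")
```

For example, with c = 0, p = 0.5 and a consumer's risk of 0.1, the result is n = 4. At n = 3 the acceptance probability is 0.5^3 = 0.125, which is too high, while at n = 4 it is 0.5^4 = 0.0625. Evaluating `acceptance_probability` over a range of p values gives the OC curve for a fixed (n, c).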

Annals of Data Science, vol. 11, issue 5, pp. 1687-1703
Citations: 0
Patient Questionnaires Based Parkinson’s Disease Classification Using Artificial Neural Network
Q1 Decision Sciences Pub Date : 2023-07-15 DOI: 10.1007/s40745-023-00482-4
Tarakashar Das, Sabrina Mobassirin, Syed Md. Minhaz Hossain, Aka Das, Anik Sen, Khaleque Md. Aashiq Kamal, Kaushik Deb

Parkinson’s disease (PD) is one of the most prevalent and harmful neurodegenerative conditions. Even today, PD diagnosis and monitoring remain costly and inconvenient. The unprecedented progress of artificial intelligence algorithms creates an opportunity to develop a cost-effective system for diagnosing PD at an earlier stage. No cure has been established yet, but earlier diagnosis helps patients lead a better life. The three symptom categories most responsible for PD are probably tremor, rigidity, and bradykinesia. We therefore investigate the 53 unique features of the Parkinson’s Progression Markers Initiative dataset to determine the significant symptoms, including those in these three major categories. Because feature selection is integral to developing a generalized model, we study models both with and without feature selection. Four feature selection methods are used: low variance filter, Wilcoxon rank-sum test, principal component analysis, and chi-square test. For classification we use machine learning, ensemble learning, and artificial neural networks (ANN). Experimental evidence shows that not all symptoms are equally important, but no symptom can be completely eliminated. Using all the features, our proposed ANN model attains the best results: 99.51% mean accuracy, 98.17% mean specificity, 0.9830 mean Kappa score, 0.99 mean AUC, and 99.70% mean F1-score. A comparison with recent publications demonstrates the efficiency of the suggested technique on diverse data modalities. Finally, we establish a trade-off between classification time and accuracy.
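
Of the four feature selection methods listed, the low variance filter is the simplest. It drops any feature whose values barely change across samples. The sketch below is a minimal version that assumes features arrive as a list of numeric rows and uses population variance against a user-chosen threshold. The threshold and function name are illustrative, not taken from the paper.

```python
from statistics import pvariance


def low_variance_filter(rows, feature_names, threshold):
    """Return the names of features whose population variance exceeds threshold.

    rows: list of samples, each a list of numeric feature values.
    """
    columns = list(zip(*rows))  # transpose samples -> feature columns
    return [name for name, column in zip(feature_names, columns)
            if pvariance(column) > threshold]
```

Variance depends on scale. Questionnaire items scored on different ranges would normally need to be put on a common scale before a single threshold is meaningful.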

Annals of Data Science, vol. 11, issue 5, pp. 1821-1864. Open-access PDF: https://link.springer.com/content/pdf/10.1007/s40745-023-00482-4.pdf
Citations: 0
A New Class of Distribution Over Bounded Support and Its Associated Regression Model
Q1 Decision Sciences Pub Date : 2023-07-11 DOI: 10.1007/s40745-023-00483-3
Ishfaq S. Ahmad, Rameesa Jan, Poonam Nirwan, Peer Bilal Ahmad

In this paper, a new two-parameter distribution on the bounded support (0,1) is introduced and studied in detail. Several statistical properties are discussed: concavity, the hazard rate function, mean residual life, moments, and the quantile function. The method of moments and maximum likelihood estimation are used to estimate the unknown parameters of the proposed model. The finite-sample performance of these estimation methods is evaluated through a Monte Carlo simulation study. When applied to real data sets, the proposed distribution fits better than many known two-parameter distributions on the unit interval. Moreover, a new regression model is introduced as an alternative to existing unit-interval regression models.
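
The abstract does not give the density of the new distribution. As a stand-in, the sketch below uses the familiar beta distribution on (0,1) to show the two generic steps the paper describes. The first is the method-of-moments estimation of two parameters. The second is a small Monte Carlo study of finite-sample bias. The beta model, the function names and the simulation settings are all assumptions for illustration. For the paper's distribution, the moment equations would differ.

```python
import random
from statistics import fmean, pvariance


def beta_method_of_moments(sample):
    """Method-of-moments estimates (alpha, beta) for a Beta distribution on (0, 1)."""
    m = fmean(sample)
    v = pvariance(sample, mu=m)
    common = m * (1 - m) / v - 1  # estimate of alpha + beta
    return m * common, (1 - m) * common


def monte_carlo_bias(alpha, beta, n, reps, seed=0):
    """Average bias of the estimates over `reps` simulated samples of size n."""
    rng = random.Random(seed)
    total_a = total_b = 0.0
    for _ in range(reps):
        sample = [rng.betavariate(alpha, beta) for _ in range(n)]
        a_hat, b_hat = beta_method_of_moments(sample)
        total_a += a_hat
        total_b += b_hat
    return total_a / reps - alpha, total_b / reps - beta
```

Repeating `monte_carlo_bias` for increasing n (and adding a mean-squared-error accumulator) reproduces the usual shape of a finite-sample simulation table.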

Annals of Data Science, vol. 11, issue 2, pp. 549-569
Citations: 0
Inception-UDet: An Improved U-Net Architecture for Brain Tumor Segmentation
Q1 Decision Sciences Pub Date : 2023-07-01 DOI: 10.1007/s40745-023-00480-6
Ilyasse Aboussaleh, Jamal Riffi, Adnane Mohamed Mahraz, Hamid Tairi

Brain tumor segmentation is an important field and a sensitive task in tumor diagnosis. Research in this area helps specialists locate a tumor so that it can be treated at an early stage. Numerous deep learning methods have been proposed, including symmetric U-Net architectures, which have shown strong results in medical imaging, particularly in brain tumor segmentation. In this paper, we propose an improved U-Net architecture called Inception U-Det, inspired by U-Det. It uses an inception block in place of the convolution block in the bi-directional feature pyramid network (Bi-FPN) that forms the skip connections of U-Det. We also compare the proposed approach with three well-known medical image segmentation architectures: U-Net, DC-Unet, and U-Det. Several segmentation metrics are computed for all these methods on the publicly available BraTS datasets. The results are promising in terms of accuracy, Dice similarity coefficient (DSC), and intersection over union (IoU). In particular, the proposed method achieves a DSC of 87.9%, 85.5%, and 83.9% on BraTS2020, BraTS2018, and BraTS2017, respectively. These values are computed on the best fold of the fourfold cross-validation used in this approach.
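
The two overlap metrics reported above, DSC and IoU, are straightforward to compute from binary masks. The sketch below is a minimal version for 2-D masks given as nested lists. Treating two empty masks as perfect agreement is a convention assumed here; evaluation pipelines differ on this point.

```python
def dice_and_iou(pred, target):
    """Dice similarity coefficient and IoU for two same-shaped binary 2-D masks."""
    p = [bool(v) for row in pred for v in row]
    t = [bool(v) for row in target for v in row]
    if len(p) != len(t):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(a and b for a, b in zip(p, t))
    union = sum(a or b for a, b in zip(p, t))
    total = sum(p) + sum(t)
    if total == 0:  # both masks empty: treated as perfect agreement
        return 1.0, 1.0
    return 2 * intersection / total, intersection / union
```

For a single mask pair, Dice = 2·IoU / (1 + IoU), so the two metrics rank predictions identically. Averaged over many cases, their rankings can differ.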

Annals of Data Science, vol. 11, issue 3, pp. 831-853
Citations: 0
Stable Variable Selection for High-Dimensional Genomic Data with Strong Correlations
Q1 Decision Sciences Pub Date : 2023-06-29 DOI: 10.1007/s40745-023-00481-5
Reetika Sarkar, Sithija Manage, Xiaoli Gao

High-dimensional genomic data often exhibit strong correlations. These correlations make the estimates from commonly used regularization approaches, such as the Lasso and MCP, unstable and inconsistent. In this paper, we perform a comparative study of regularization approaches for variable selection under different correlation structures. We also propose a two-stage procedure, named rPGBS, to achieve stable variable selection in various strongly correlated settings. The approach repeatedly runs a two-stage hierarchical procedure consisting of random pseudo-group clustering followed by bi-level variable selection. Extensive simulation studies and analyses of real high-dimensional genomic datasets demonstrate the advantage of rPGBS over several of the most widely used regularization methods. In particular, rPGBS selects variables more stably across a variety of correlation settings than two recent methods designed for variable selection under strong correlations: Precision Lasso (Wang et al. in Bioinformatics 35:1181–1187, 2019) and Whitening Lasso (Zhu et al. in Bioinformatics 37:2238–2244, 2021). Moreover, rPGBS is shown to be computationally efficient across various settings.
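
The abstract says rPGBS is run repeatedly, but it does not say how the repeated runs are combined. A common way to summarize repeated selection runs, assumed here and not necessarily the authors' rule, is to keep the variables whose selection frequency exceeds a threshold. Stability across runs can then be quantified by the average pairwise Jaccard similarity of the selected sets. The threshold of 0.6 and the function names are hypothetical.

```python
from itertools import combinations


def selection_frequencies(runs, n_features):
    """Fraction of runs in which each feature index was selected."""
    counts = [0] * n_features
    for selected in runs:
        for j in set(selected):
            counts[j] += 1
    return [c / len(runs) for c in counts]


def stable_set(runs, n_features, threshold=0.6):
    """Features selected in at least `threshold` of the runs."""
    freqs = selection_frequencies(runs, n_features)
    return {j for j, f in enumerate(freqs) if f >= threshold}


def mean_pairwise_jaccard(runs):
    """Average Jaccard similarity between the selected sets of every pair of runs."""
    pairs = list(combinations(runs, 2))
    if not pairs:
        return 1.0
    total = 0.0
    for a, b in pairs:
        a, b = set(a), set(b)
        union = a | b
        total += len(a & b) / len(union) if union else 1.0
    return total / len(pairs)
```

A Jaccard value near 1 means the runs agree on which variables matter. Strong correlation typically pushes plain Lasso runs toward low values, because the Lasso picks somewhat arbitrarily among correlated predictors.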

Annals of Data Science, vol. 11, issue 4, pp. 1139-1164
Citations: 0
On Progressively Censored Generalized X-Exponential Distribution: (Non) Bayesian Estimation with an Application to Bladder Cancer Data
Q1 Decision Sciences Pub Date : 2023-06-15 DOI: 10.1007/s40745-023-00477-1
Kousik Maiti, Suchandan Kayal, Aditi Kar Gangopadhyay

This article addresses estimation of the parameters and reliability characteristics of a generalized X-Exponential distribution based on a progressive type-II censored sample. The maximum likelihood estimates (MLEs) are obtained, and their existence and uniqueness are studied. Bayes estimates are obtained under the squared error and entropy loss functions and are computed with the Markov chain Monte Carlo method. Bootstrap-t and bootstrap-p methods are used to compute interval estimates. A simulation study is then performed to compare the performance of the proposed estimates. Finally, a real-life dataset is analysed for illustration.
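
Simulation studies for progressive type-II censoring usually start by generating censored samples. The sketch below uses the Balakrishnan-Sandhu algorithm, which produces a progressively censored U(0,1) sample that is then mapped through an inverse CDF. Because the generalized X-Exponential CDF is not given in the abstract, the plain exponential distribution is swapped in. Its closed-form MLE under this censoring scheme is λ̂ = m / Σ(1 + R_i)x_i. The function names and the exponential choice are assumptions for illustration. For the paper's model you would substitute its inverse CDF and a numerical MLE.

```python
import math
import random


def progressive_type2_uniform(removals, rng):
    """Balakrishnan-Sandhu algorithm: progressively type-II censored U(0,1) sample.

    removals[i] is R_{i+1}, the number of surviving units withdrawn at the
    (i+1)-th observed failure; the total sample size is m + sum(removals).
    """
    m = len(removals)
    v = []
    for i in range(1, m + 1):
        w = 1.0 - rng.random()  # uniform on (0, 1], avoids 0 ** x
        exponent = i + sum(removals[m - i:])  # i + R_{m-i+1} + ... + R_m
        v.append(w ** (1.0 / exponent))
    u, prod = [], 1.0
    for i in range(1, m + 1):
        prod *= v[m - i]  # multiply in V_{m-i+1}
        u.append(1.0 - prod)
    return u


def censored_exponential_sample(removals, rate, rng):
    """Map the uniform sample through the exponential inverse CDF."""
    return [-math.log(1.0 - u) / rate for u in progressive_type2_uniform(removals, rng)]


def exponential_mle(failures, removals):
    """MLE of the exponential rate: m / sum((1 + R_i) * x_i)."""
    return len(failures) / sum((1 + r) * x for x, r in zip(failures, removals))
```

Wrapping these calls in a loop over replications gives the bias and MSE tables typical of such simulation studies. Resampling the censored data inside the loop is the starting point for the bootstrap-t and bootstrap-p intervals.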

Annals of Data Science, vol. 11, issue 5, pp. 1761-1798
Citations: 0
A Survey on Differential Privacy for Medical Data Analysis
Q1 Decision Sciences Pub Date : 2023-06-10 DOI: 10.1007/s40745-023-00475-3
WeiKang Liu, Yanchun Zhang, Hong Yang, Qinxue Meng

Machine learning methods promote the sustainable development of wise information technology of medicine (WITMED), and diverse medical data bring high value and convenience to medical analysis. However, the use of medical data also carries a hard-to-avoid risk of privacy leakage, especially in correlation analysis or data sharing among multiple institutions. Data security and privacy preservation have recently become essential in secure and private medical data analysis, where many differential privacy strategies are applied to medical data publishing and mining. In this paper, we survey research on applications of differential privacy to medical data analysis. We discuss the necessity of medical privacy preservation, the advantages of differential privacy, and its applications to typical medical data such as genomic data and wearable device data. Finally, we discuss the challenges and potential future research directions for differential privacy in medical applications.
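
The basic building block behind many of the differential privacy strategies surveyed above is the Laplace mechanism. It adds noise with scale sensitivity / ε to a numeric query result. The sketch below applies it to a counting query, such as the number of patients with a given diagnosis in a cohort (a hypothetical example). A counting query has sensitivity 1, because adding or removing one patient changes the count by at most 1. The noise is drawn as the difference of two exponential variables, which is Laplace-distributed.

```python
import random


def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """Release true_value + Laplace(0, sensitivity / epsilon) noise.

    This gives epsilon-differential privacy for a query whose output changes by
    at most `sensitivity` when one individual's record is added or removed.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = sensitivity / epsilon
    # The difference of two i.i.d. Exponential(rate = 1/scale) draws is Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_value + noise
```

Smaller ε means stronger privacy and larger noise. With ε = 0.5 and sensitivity 1, the released count is off by 2 on average in absolute value. Repeated releases consume privacy budget, which is one of the practical challenges such surveys discuss.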

Annals of Data Science, vol. 11, issue 2, pp. 733-747
Citations: 0
Naïve Bayes Classifier Model for Detecting Spam Mails
Q1 Decision Sciences Pub Date : 2023-06-09 DOI: 10.1007/s40745-023-00479-z
Shrawan Kumar, Kavita Gupta, Manya Gupta

In this paper, the Naive Bayes classifier, a machine learning algorithm, is applied to the Kaggle spam mails dataset to classify emails as spam or ham (legitimate mail). The dataset has two main attributes: type and text. The target variable "Type" has two levels: ham and spam. The text variable contains the messages to be classified. Results are obtained with two different Laplace smoothing values, and the decision maker chooses between the resulting error tolerances for ham and spam messages. The R computing environment is used for data analysis.
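
The study itself uses R. The Python sketch below is a minimal multinomial Naive Bayes with an additive (Laplace) smoothing constant, included to show why the choice of Laplace value matters. The whitespace tokenizer, the toy messages in the usage check and the function names are assumptions for illustration, not the paper's pipeline.

```python
import math
from collections import Counter


def train_nb(docs, labels, laplace=1.0):
    """Count words per class; `laplace` is the additive smoothing constant."""
    word_counts = {}
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()
        vocab.update(tokens)
        word_counts.setdefault(label, Counter()).update(tokens)
    return {"vocab": vocab, "word_counts": word_counts,
            "class_counts": Counter(labels), "n_docs": len(labels),
            "laplace": laplace}


def predict_nb(model, text):
    """Return the class with the highest log posterior score for `text`."""
    vocab_size = len(model["vocab"])
    alpha = model["laplace"]
    best_label, best_score = None, -math.inf
    for label, n_label in model["class_counts"].items():
        counts = model["word_counts"][label]
        total = sum(counts.values())
        score = math.log(n_label / model["n_docs"])  # log prior
        for token in text.lower().split():
            if token not in model["vocab"]:
                continue  # words never seen in training are ignored
            numerator = counts[token] + alpha
            if numerator == 0:  # possible only when laplace == 0
                score = -math.inf
                break
            score += math.log(numerator / (total + alpha * vocab_size))
        if best_label is None or score > best_score:
            best_label, best_score = label, score
    return best_label
```

With `laplace=0`, a single word never seen in a class gives that class zero probability. A positive constant keeps every class in play, and its size shifts how readily borderline messages are labeled spam.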

Annals of Data Science, vol. 11, issue 6, pp. 1887-1897
Citations: 0
Artificial Intelligence Algorithms for Collaborative Book Recommender Systems
Q1 Decision Sciences | Pub Date: 2023-06-08 | DOI: 10.1007/s40745-023-00474-4
Clemens Tegetmeier, Arne Johannssen, Nataliya Chukhrova

Book recommender systems give users personalized book recommendations based on their previous searches or purchases. Online book sales have become increasingly important in recent years. As a result, artificial intelligence (AI) algorithms are needed to recommend suitable books to users and to encourage purchasing decisions in both the short and the long run. In this paper, we consider AI algorithms for so-called collaborative book recommender systems, in particular the matrix factorization algorithm trained with stochastic gradient descent and the book-based k-nearest-neighbor algorithm. We perform a comprehensive case study on the Book-Crossing benchmark data set. We implement several variants of both algorithms to predict unknown book ratings and to recommend to each user the books with the highest predicted ratings. The study evaluates how well the implemented methods recommend books, using selected evaluation metrics for AI algorithms.
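
As a rough illustration of the first algorithm named above, the sketch below factorizes a sparse user-book rating matrix with stochastic gradient descent. It is a simplified toy version and not the authors' implementation. Its assumptions are a global mean plus k latent factors, no user or book bias terms, and made-up hyperparameters and ratings on a 1-10 scale. The function names `train_mf_sgd`, `predict_rating` and `rmse` are hypothetical.

```python
import math
import random


def train_mf_sgd(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=300, seed=0):
    """Fit r_ui ~ mu + p_u . q_i on the observed ratings with plain SGD.

    ratings: list of (user_index, item_index, rating) triples (observed entries only).
    mu is the global mean rating; P and Q hold k latent factors per user / item.
    """
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)  # visit the observed ratings in a new random order each epoch
        for u, i, r in data:
            err = r - (mu + sum(pf * qf for pf, qf in zip(P[u], Q[i])))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Step against the gradient of err^2 + reg * (|p_u|^2 + |q_i|^2);
                # the constant factor 2 is folded into the learning rate.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return mu, P, Q


def predict_rating(model, u, i):
    mu, P, Q = model
    return mu + sum(pf * qf for pf, qf in zip(P[u], Q[i]))


def rmse(model, ratings):
    sq = sum((r - predict_rating(model, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(sq / len(ratings))


# Made-up ratings: users 0 and 1 like books 0 and 1, users 2 and 3 like books 2 and 3.
ratings = [(0, 0, 9), (0, 1, 8), (0, 2, 2), (1, 0, 8), (1, 1, 9), (1, 3, 1),
           (2, 0, 2), (2, 2, 9), (2, 3, 8), (3, 1, 1), (3, 2, 8), (3, 3, 9)]
model = train_mf_sgd(ratings, n_users=4, n_items=4)
print("training RMSE:", round(rmse(model, ratings), 3))
print("predicted rating, user 0 / book 3:", round(predict_rating(model, 0, 3), 2))
```

Books that a user has not rated would then be ranked by `predict_rating`. This matches the abstract's idea of recommending the books with the highest predicted ratings.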

Annals of Data Science, 11(5), 1705-1739. Published 2023-06-08. DOI: 10.1007/s40745-023-00474-4. Open-access PDF: https://link.springer.com/content/pdf/10.1007/s40745-023-00474-4.pdf
Citations: 0
Journal: Annals of Data Science