Journal of Classification: Latest Articles

Computing Finite Mixture Estimators in the Tails

IF 2; Zone 4, Computer Science; Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

Journal of Classification

Pub Date : 2023-04-13 DOI: 10.1007/s00357-023-09433-3

Marilena Furno

Citations: 0

Local and Overall Deviance R-Squared Measures for Mixtures of Generalized Linear Models

IF 2; Zone 4, Computer Science; Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

Journal of Classification

Pub Date : 2023-04-04 DOI: 10.1007/s00357-023-09432-4

Roberto Di Mari, Salvatore Ingrassia, Antonio Punzo

In generalized linear models (GLMs), measures of lack of fit are typically defined as the deviance between two nested models, and a deviance-based R² is commonly used to evaluate the fit. In this paper, we extend deviance measures to mixtures of GLMs, whose parameters are estimated by maximum likelihood (ML) via the EM algorithm. Such measures are defined both locally, i.e., at the cluster level, and globally, i.e., with reference to the whole sample. At the cluster level, we propose a normalized two-term decomposition of the local deviance into explained and unexplained local deviances. At the sample level, we introduce an additive normalized decomposition of the total deviance into three terms, each of which evaluates a different aspect of the fitted model: (1) the cluster separation on the dependent variable, (2) the proportion of the total deviance explained by the fitted model, and (3) the proportion of the total deviance that remains unexplained. We use the local and global decompositions to define, respectively, local and overall deviance R² measures for mixtures of GLMs, which we illustrate (for Gaussian, Poisson and binomial responses) by means of a simulation study. The proposed fit measures are then used to assess and interpret clusters of COVID-19 spread in Italy at two time points.
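As background for the abstract above, the following is a minimal sketch of the ordinary deviance-based R² for a single Poisson GLM, computed as one minus the ratio of the fitted-model deviance to the null-model deviance. It does not implement the paper's local or overall measures for mixtures, whose exact decomposition is not given on this page. The function names and the fitted means `mu_hat` are hypothetical and serve only as illustration.

```python
import math

def poisson_deviance(y, mu):
    """Poisson deviance: 2 * sum(y*log(y/mu) - (y - mu)), taking 0*log(0) as 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi - mi)
    return 2.0 * total

def deviance_r2(y, mu_fitted):
    """Deviance R^2 = 1 - D(fitted) / D(null); the null model predicts the sample mean."""
    y_bar = sum(y) / len(y)
    d_null = poisson_deviance(y, [y_bar] * len(y))
    d_fit = poisson_deviance(y, mu_fitted)
    return 1.0 - d_fit / d_null

# Hypothetical counts and fitted means (assumed values, not from the paper).
y = [0, 1, 3, 4, 8]
mu_hat = [0.5, 1.2, 2.5, 4.5, 7.3]
print(round(deviance_r2(y, mu_hat), 3))
```

A perfect fit gives a value of 1 and the intercept-only fit gives 0, which is the single-model behavior that the paper's local and overall measures generalize to mixtures.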

Citations: 0

Characteristics of Distance Matrices Based on Euclidean, Manhattan and Hausdorff Coefficients

IF 2; Zone 4, Computer Science; Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

Journal of Classification

Pub Date : 2023-04-03 DOI: 10.1007/s00357-023-09435-1

J. T. Temple, R. Bateman

Citations: 2

Finding the Proverbial Needle: Improving Minority Class Identification Under Extreme Class Imbalance

IF 2; Zone 4, Computer Science; Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

Journal of Classification

Pub Date : 2023-02-23 DOI: 10.1007/s00357-023-09431-5

Trent Geisler, Herman Ray, Ying Xie

Citations: 0

Classification Trees with Mismeasured Responses

IF 2; Zone 4, Computer Science; Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

Journal of Classification

Pub Date : 2023-02-16 DOI: 10.1007/s00357-023-09430-6

L. Diao, Grace Y. Yi

Citations: 0

Uncertainty Diagnostics of Binomial Regression Trees for Ordered Rating Data

IF 2; Zone 4, Computer Science; Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

Journal of Classification

Pub Date : 2023-01-21 DOI: 10.1007/s00357-022-09429-5

R. Simone

Citations: 0

DDCAL: Evenly Distributing Data into Low Variance Clusters Based on Iterative Feature Scaling

IF 2; Zone 4, Computer Science; Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

Journal of Classification

Pub Date : 2023-01-01 DOI: 10.1007/s00357-022-09428-6

Marian Lux, Stefanie Rinderle-Ma

This work studies the problem of clustering one-dimensional data points such that they are evenly distributed over a given number of low-variance clusters. One application is the visualization of data on choropleth maps or on business process models without over-emphasizing outliers, which enables the detection and differentiation of smaller clusters. The problem is tackled with a heuristic algorithm called DDCAL (1d distribution cluster algorithm), which is based on iterative feature scaling and generates stable clustering results. The effectiveness of the DDCAL algorithm is shown on 5 artificial data sets with different distributions and 4 real-world data sets reflecting different use cases. Moreover, the results obtained by DDCAL on these data sets are compared to those of 11 existing clustering algorithms. The application of the DDCAL algorithm is illustrated through the visualization of pandemic and population data on choropleth maps as well as process mining results on process models.
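To make the two goals named in the abstract concrete (evenly sized clusters and low within-cluster variance), here is a small sketch that compares two standard baselines on one-dimensional data: equal-width binning and equal-frequency binning. This is not the DDCAL algorithm, whose steps are not described on this page. The function names and the sample data are assumptions chosen for illustration.

```python
from statistics import pvariance

def equal_width_bins(values, k):
    """Split the value range into k intervals of equal width (assumes max > min)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    bins = [[] for _ in range(k)]
    for v in values:
        # The maximum value would land in bin k, so clamp it into the last bin.
        idx = min(int((v - lo) / width), k - 1)
        bins[idx].append(v)
    return bins

def equal_frequency_bins(values, k):
    """Split the sorted values into k consecutive groups of near-equal size."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[i * n // k:(i + 1) * n // k] for i in range(k)]

def summarize(bins):
    """Return (size, population variance) per bin; empty bins report variance 0."""
    return [(len(b), pvariance(b) if b else 0.0) for b in bins]

# Hypothetical data with two outliers (50 and 100).
data = [1, 2, 2, 3, 3, 4, 5, 6, 50, 100]
print(summarize(equal_width_bins(data, 3)))
print(summarize(equal_frequency_bins(data, 3)))
```

On data like this, equal-width binning lets the outliers dominate and packs most points into one bin, while equal-frequency binning balances the sizes but puts the outliers into a high-variance bin. The abstract describes DDCAL as a heuristic aimed at both criteria together.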

Citations: 1

A Semi-parametric Density Estimation with Application in Clustering

IF 2; Zone 4, Computer Science; Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

Journal of Classification

Pub Date : 2022-12-14 DOI: 10.1007/s00357-022-09425-9

M. Salehi, A. Bekker, M. Arashi

Citations: 0

Merging Components in Linear Gaussian Cluster-Weighted Models

IF 2; Zone 4, Computer Science; Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

Journal of Classification

Pub Date : 2022-12-07 DOI: 10.1007/s00357-022-09424-w

Sangkon Oh, Byungtae Seo

Citations: 1

Imputation Strategies for Clustering Mixed-Type Data with Missing Values

IF 2; Zone 4, Computer Science; Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

Journal of Classification

Pub Date : 2022-11-26 DOI: 10.1007/s00357-022-09422-y

Rabea Aschenbruck, G. Szepannek, A. Wilhelm

Citations: 2
