Association Between Nominal Categorical Variables: New Measure Formulation Based on Metric Distances and Value Validity

Journal of Statistical Theory and Practice (IF 0.6, Q4, Statistics & Probability) | Pub Date: 2023-09-27 | DOI: 10.1007/s42519-023-00344-5
Tarald O. Kvålseth

Abstract

When dealing with nominal categorical data, it is often desirable to know the degree of association or dependence between the categorical variables. While there is literally no limit to the number of alternative association measures that have been proposed over the years, they all yield greatly varying, contradictory, and unreliable results because they lack an important property: value validity. After discussing the value-validity property, this paper introduces a new measure of association (dependence) based on the mean Euclidean distance between probability distributions, one of which is the distribution under independence. Both the asymmetric form, for when one variable can be considered the explanatory (independent) variable and the other the response (dependent) variable, and the symmetric form of the measure are introduced. Particular emphasis is given to the important 2 × 2 case, in which each variable has two categories, but the general case of any number of categories is also covered. Besides having the value-validity property, the new measure has all the prerequisites of a good association measure. Comparisons are made with the well-known Goodman–Kruskal lambda and tau measures. A statistical inference procedure for the new measure is also derived, and numerical examples are provided.
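For readers unfamiliar with the two comparison measures named in the abstract, the following is a minimal sketch of the standard asymmetric Goodman–Kruskal lambda and tau for a two-way table of counts. It is not the paper's new measure, whose formula the abstract does not give. The function names `gk_lambda` and `gk_tau`, the helper `_normalize`, and the convention that rows are the explanatory variable and columns the response are assumptions made for this illustration.

```python
def _normalize(table):
    """Convert a table of counts into joint probabilities (hypothetical helper)."""
    total = sum(sum(row) for row in table)
    if total <= 0:
        raise ValueError("table must contain a positive total count")
    return [[cell / total for cell in row] for row in table]


def gk_lambda(table):
    """Goodman-Kruskal lambda for predicting the column variable from the row variable.

    Proportional reduction in the probability of misclassification when the
    modal column is guessed within each row instead of overall.
    """
    p = _normalize(table)
    col_marginals = [sum(col) for col in zip(*p)]
    best_without_rows = max(col_marginals)        # largest column marginal
    best_with_rows = sum(max(row) for row in p)   # sum of row-wise maxima
    if best_without_rows >= 1.0:
        raise ValueError("lambda is undefined when one column holds all the mass")
    return (best_with_rows - best_without_rows) / (1.0 - best_without_rows)


def gk_tau(table):
    """Goodman-Kruskal tau for predicting the column variable from the row variable.

    Proportional reduction in Gini-type variation of the column variable
    once the row category is known.
    """
    p = _normalize(table)
    row_marginals = [sum(row) for row in p]
    col_marginals = [sum(col) for col in zip(*p)]
    sum_sq_cols = sum(q * q for q in col_marginals)
    if sum_sq_cols >= 1.0:
        raise ValueError("tau is undefined when one column holds all the mass")
    # Sum over cells of p_ij^2 / p_i+, skipping empty rows.
    conditional = sum(
        cell * cell / r
        for row, r in zip(p, row_marginals)
        if r > 0
        for cell in row
    )
    return (conditional - sum_sq_cols) / (1.0 - sum_sq_cols)


if __name__ == "__main__":
    # Illustrative counts only, not data from the paper.
    example = [[45, 5], [30, 20]]
    print("lambda:", gk_lambda(example))
    print("tau:", gk_tau(example))
```

For the illustrative table `[[45, 5], [30, 20]]`, lambda is 0 because the same column is modal in both rows, while tau is about 0.12. Two established measures can therefore assign quite different values to the same table, which is the kind of disagreement the abstract attributes to a lack of value validity.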