Efficient Analysis of Overdispersed Data Using an Accurate Computation of the Dirichlet Multinomial Distribution

IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2024-10-31. DOI: 10.1109/TPAMI.2024.3489645

Sherenaz Al-Haj Baddar, Alessandro Languasco, Mauro Migliardi

Vol. 47, No. 2, pp. 1181-1189. Full text: https://ieeexplore.ieee.org/document/10740644/


Abstract

Modeling count data with suitable statistical distributions has been instrumental in analyzing the patterns the data convey. However, failing to address critical aspects such as overdispersion jeopardizes the effectiveness of such an analysis. In this paper, overdispersed count data are modeled using the Dirichlet Multinomial (DM) distribution, whose likelihood is maximized with a fixed-point iteration algorithm. The DM parameters are estimated while comparing two procedures that address the well-known computational difficulties of evaluating the DM log-likelihood: the recent Languasco-Migliardi (LM) procedure and the Yu-Shaw (YS) procedure. Experiments were conducted on multiple datasets from different domains, spanning polls, images, and IoT network traffic. All of them showed the superiority of the LM procedure: it estimated the DM parameters at the designated level of accuracy in every experiment, whereas the YS procedure failed to produce sufficiently accurate results (or any results at all) in several experiments. Moreover, the LM procedure achieved a speedup over YS ranging from 2-fold to 20-fold.
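To make the estimation task concrete, the following is a minimal Python sketch of a DM log-likelihood and a fixed-point update for its parameters. It is an illustration under stated assumptions, not the paper's method: the abstract does not say which fixed-point scheme is used, so the sketch assumes the widely used Minka-style update, and it evaluates the log-gamma and digamma differences by direct summation over integer counts rather than with the LM or YS procedures. The names `rising_log`, `harmonic_shift`, `dm_loglik`, and `fit_dm` are hypothetical.

```python
import math

def rising_log(a, n):
    """lgamma(a + n) - lgamma(a) for integer n >= 0, by direct summation."""
    return sum(math.log(a + j) for j in range(n))

def harmonic_shift(a, n):
    """digamma(a + n) - digamma(a) for integer n >= 0, by direct summation."""
    return sum(1.0 / (a + j) for j in range(n))

def dm_loglik(counts, alpha):
    """DM log-likelihood of count vectors (rows of `counts`) given `alpha`."""
    A = sum(alpha)
    total = 0.0
    for row in counts:
        n = sum(row)
        # log multinomial coefficient: log n! - sum_k log x_k!
        total += math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in row)
        # log Gamma(A) - log Gamma(A + n)
        total -= rising_log(A, n)
        # sum_k [log Gamma(alpha_k + x_k) - log Gamma(alpha_k)]
        total += sum(rising_log(a, x) for a, x in zip(alpha, row))
    return total

def fit_dm(counts, alpha0, tol=1e-10, max_iter=10000):
    """Assumed Minka-style fixed-point iteration for the DM parameters."""
    alpha = list(alpha0)
    for _ in range(max_iter):
        A = sum(alpha)
        denom = sum(harmonic_shift(A, sum(row)) for row in counts)
        new = [
            a * sum(harmonic_shift(a, row[k]) for row in counts) / denom
            for k, a in enumerate(alpha)
        ]
        if max(abs(x - y) for x, y in zip(new, alpha)) < tol:
            return new
        alpha = new
    return alpha

# Toy, made-up counts (4 observations, 3 categories), for illustration only.
counts = [[10, 0, 2], [1, 8, 3], [0, 1, 11], [5, 5, 2]]
alpha_hat = fit_dm(counts, [1.0, 1.0, 1.0])
```

The direct sums cost time proportional to the counts themselves, which is one reason dedicated procedures for evaluating the DM log-likelihood are of interest when counts are large; the sketch does not attempt to reproduce their accuracy or speed.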
