Split-Apply-Combine with Dynamic Grouping

Mark P. J. van der Loo

arXiv - STAT - Computation, 2024-06-14. https://doi.org/arxiv-2406.09887

Partitioning a data set by one or more of its attributes and computing an
aggregate for each part is one of the most common operations in data analyses.
There are use cases where the partitioning is determined dynamically by
collapsing smaller subsets into larger ones, to ensure sufficient support for
the computed aggregate. These use cases are not supported by existing software
that implements split-apply-combine operations. This paper presents the
\texttt{R} package \texttt{accumulate} that offers convenient interfaces for
defining grouped aggregation where the grouping itself is dynamically
determined, based on user-defined conditions on subsets, and a user-defined
subset collapsing scheme. The underlying algorithm is also formally described
and analyzed.
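
The idea of dynamically determined grouping can be illustrated with a small sketch. The code below is written in Python and is not the interface of the \texttt{accumulate} package; the names `dynamic_aggregate`, `min_size`, `levels`, and the toy records are hypothetical and chosen only for illustration. It assumes a hierarchical collapsing scheme given as a list of key functions ordered from finest to coarsest, and a user-defined test that decides whether a subset has sufficient support.

```python
from statistics import mean


def min_size(n):
    """Return a test that accepts a subset with at least n records (hypothetical helper)."""
    return lambda subset: len(subset) >= n


def dynamic_aggregate(records, levels, test, fun, value):
    """Aggregate `value` per finest-level group, collapsing until `test` passes.

    `levels` is a list of key functions ordered from finest to coarsest.
    Returns {finest_key: (level_used, aggregate)}, or None for a group
    where no level in the scheme passes the test.
    """
    finest = levels[0]
    result = {}
    for rec in records:
        target = finest(rec)
        if target in result:
            continue  # this group has already been handled
        result[target] = None
        for depth, key in enumerate(levels):
            # All records sharing this record's key at the current level.
            subset = [r for r in records if key(r) == key(rec)]
            if test(subset):
                result[target] = (depth, fun([r[value] for r in subset]))
                break
    return result


records = [
    {"code": "A1", "y": 10}, {"code": "A1", "y": 20}, {"code": "A1", "y": 30},
    {"code": "A2", "y": 40},
    {"code": "B1", "y": 100},
]

# Collapsing scheme: full code, then its first letter, then the whole data set.
levels = [lambda r: r["code"], lambda r: r["code"][0], lambda r: "all"]

result = dynamic_aggregate(records, levels, min_size(3), mean, "y")
for group, outcome in result.items():
    print(group, outcome)
```

In this toy example, group A1 has three records and is aggregated on its own. Group A2 has a single record, so it is collapsed into all records whose code starts with A. Group B1 fails the test at both finer levels and falls back to the whole data set. Each result records the collapsing level that was used, so the support behind every aggregate stays traceable.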