CLADAG 2019 Special Issue: Selected Papers on Classification and Data Analysis

F. Greselin, T. B. Murphy, G. C. Porzio, D. Vistocco
Statistical Analysis and Data Mining: The ASA Data Science Journal. Published 2021-06-16. DOI: 10.1002/sam.11533 (https://doi.org/10.1002/sam.11533)

Abstract

This special issue of Statistical Analysis and Data Mining collects papers presented at the 12th Scientific Meeting of the Classification and Data Analysis Group (CLADAG) of the Italian Statistical Society (SIS), held in Cassino, Italy, 11–13 September 2019. The CLADAG group, founded in 1997, promotes advanced methodological research in multivariate statistics, with a particular focus on Data Analysis and Classification. CLADAG is a member of the International Federation of Classification Societies (IFCS). It organizes a biennial international scientific meeting and schools related to classification and data analysis, publishes a newsletter, and cooperates with other member societies of the IFCS in the organization of their conferences. Founded in 1985, the IFCS is a federation of national, regional, and linguistically-based classification societies aimed at promoting classification research. Previous CLADAG meetings were held in Pescara (1997), Roma (1999), Palermo (2001), Bologna (2003), Parma (2005), Macerata (2007), Catania (2009), Pavia (2011), Modena and Reggio Emilia (2013), Cagliari (2015), and Milano (2017).

The best papers from the conference were submitted to this special issue, and six of them were selected for publication following a blind peer-review process. The manuscripts deal with different data analysis issues: mixtures of distributions, compositional data analysis, Markov chains for web usability, survival analysis, and applications to high-throughput, eye-tracking, and insurance transaction data.

The paper by Jiří Dvořák et al. (available in Stat Anal Data Min: The ASA Data Sci Journal. 2020;13:548–564) introduces the Clover plot, an easy-to-understand graphical tool that facilitates the appropriate choice of a classifier to be employed in supervised classification. It combines four complementary classifiers: the depth–depth plot, the bagdistance plot, an approach based on the illumination, and the classical diagnostic plot based on Mahalanobis distances. It borrows strengths from all these methodologies, contrasts them, and allows interpretations about the structure of the data.

The paper by S.X. Lee et al. proposes a parallelization strategy for the Expectation–Maximization (EM) algorithm, with a special focus on the estimation of finite mixtures of flexible distributions such as the canonical fundamental skew t distribution (CFUST). The parallel implementation of the EM algorithm is suitable for single-threaded and multi-threaded processors as well as for single-machine and multiple-node systems.

The EM algorithm is also discussed in the paper by L. Scrucca, which provides a fast and efficient Modal EM algorithm for identifying the modes of a density estimated through a finite mixture of Gaussian distributions with parsimonious component covariance structures. The proposed approach is based on an iterative procedure aimed at identifying the local maxima, exploiting features of the underlying Gaussian mixture model.

Motivated by applications in high-throughput compositional data analysis, the paper by N. Štefelová et al. proposes a data-driven weighting strategy to enhance marker identification through PLS regression with compositional predictors. The weighting strategy draws on the correlation structure between the response variable and pairwise log-ratios. Its practical relevance is illustrated through an analysis of metabolite signals associated with the emission of greenhouse gases from cattle.

The paper by G. Zammarchi et al. uses Markov chains to analyse the web usability of a University website using eye-tracking methodology. With the aim of improving its usability, the paper compares the performance of high school and University students in terms of time to completion, number of fixations, and difficulty ratio across ten different tasks.

D. Zapletal instead uses data from a commercial insurance company in the Czech Republic to compare the efficacy of several survival analysis models within an insurance transaction framework. The ability to identify relevant explanatory variables through the Cox proportional hazards model and some competing risk models (i.e., the cause-specific and the sub-distribution hazard models) is assessed on a large data set consisting of more than 200,000 individuals.

In brief, this special issue is in line with the CLADAG goal of supporting the interchange of ideas in Classification and Data Analysis. We strongly believe it well represents the scientific characteristics of the
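As an illustration of the mode-finding idea summarized above for Gaussian mixtures, the following is a minimal sketch of a generic modal EM fixed-point iteration. It is not taken from the paper by L. Scrucca and does not use parsimonious covariance structures; the function names `gaussian_pdf` and `modal_em`, the tolerance, and the toy mixture are assumptions made for illustration only. Starting from a point x, the E-step computes the posterior probability of each component at x, and the M-step moves x to the precision-weighted average of the component means.

```python
import numpy as np


def gaussian_pdf(x, mean, cov):
    """Density of a multivariate normal N(mean, cov) evaluated at point x."""
    d = x.shape[0]
    diff = x - mean
    quad = diff @ np.linalg.solve(cov, diff)
    log_det = np.linalg.slogdet(cov)[1]
    return np.exp(-0.5 * (quad + log_det + d * np.log(2.0 * np.pi)))


def modal_em(x0, weights, means, covs, tol=1e-10, max_iter=1000):
    """Climb from x0 to a local maximum (mode) of a Gaussian mixture density.

    Hypothetical helper: a plain (non-log-space) implementation, so it
    assumes x0 is not so far from every component that all densities
    underflow to zero.
    """
    x = np.asarray(x0, dtype=float)
    precisions = [np.linalg.inv(c) for c in covs]
    for _ in range(max_iter):
        # E-step: posterior probability of each component at the current point.
        dens = np.array([w * gaussian_pdf(x, m, c)
                         for w, m, c in zip(weights, means, covs)])
        post = dens / dens.sum()
        # M-step: maximize sum_k post_k * log N(x | mean_k, cov_k) over x;
        # the solution is a precision-weighted average of the component means.
        a = sum(p * prec for p, prec in zip(post, precisions))
        b = sum(p * prec @ m for p, prec, m in zip(post, precisions, means))
        x_new = np.linalg.solve(a, b)
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x


# Toy mixture (assumed values): two well-separated bivariate components.
weights = [0.5, 0.5]
means = [np.array([0.0, 0.0]), np.array([5.0, 5.0])]
covs = [np.eye(2), np.eye(2)]

mode_a = modal_em([0.5, -0.3], weights, means, covs)
mode_b = modal_em([4.0, 6.0], weights, means, covs)
```

Running the iteration from every observation and merging the points it converges to is one common way to enumerate the modes of the fitted density; with the well-separated toy mixture above, each start converges to a point very close to the nearest component mean.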