Pub Date : 2024-08-05 DOI: 10.1016/j.jmva.2024.105355
Alexander Petersen
Estimation of the mean and covariance parameters for functional data is a critical task, with local linear smoothing being a popular choice. In recent years, many scientific domains have been producing multivariate functional data for which $p$, the number of curves per subject, is often much larger than the sample size $n$. In this setting of high-dimensional functional data, much of the developed methodology relies on preliminary estimates of the unknown mean functions and the auto- and cross-covariance functions. This paper investigates the convergence rates of local linear estimators in terms of the maximal error across components and pairs of components for mean and covariance functions, respectively, in both $L^2$ and uniform metrics. The local linear estimators utilize a generic weighting scheme that can adjust for differing numbers of discrete observations $N_{ij}$ across curves $j$ and subjects $i$, where the $N_{ij}$ vary with $n$. Particular attention is given to the equal weight per observation (OBS) and equal weight per subject (SUBJ) weighting schemes. The theoretical results utilize novel applications of concentration inequalities for functional data and demonstrate that, similar to univariate functional data, the order of the $N_{ij}$ relative to $p$ and $n$ divides high-dimensional functional data into three regimes (sparse, dense, and ultra-dense), with the high-dimensional parametric convergence rate of $(\log(p)/n)^{1/2}$ being attainable in the latter two.
Title: Mean and covariance estimation for discretely observed high-dimensional functional data: Rates of convergence and division of observational regimes. Journal of Multivariate Analysis, Volume 204, Article 105355.
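To make the OBS and SUBJ weighting schemes in the abstract above concrete, the following is a minimal sketch of a weighted local linear mean estimator for a single component curve. It assumes the convention common in the univariate literature (OBS gives each observation weight 1/sum_i N_i, SUBJ gives each observation of subject i weight 1/(n N_i)) and an Epanechnikov kernel; the function names and the kernel choice are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def obs_weights(times):
    """Equal weight per observation: 1 / (total number of observations)."""
    n_obs = np.array([len(t) for t in times])
    return np.full(len(times), 1.0 / n_obs.sum())

def subj_weights(times):
    """Equal weight per subject: 1 / (n * N_i) for each observation of subject i."""
    n_obs = np.array([len(t) for t in times])
    return 1.0 / (len(times) * n_obs)

def local_linear_mean(t0, times, values, weights, h):
    """Weighted local linear estimate of the mean function at t0.

    times, values: lists with one 1-D array per subject.
    weights: one weight per subject, applied to each of its observations.
    h: bandwidth (assumed given; bandwidth selection is not covered here).
    """
    t = np.concatenate(times)
    y = np.concatenate(values)
    w = np.concatenate([np.full(len(ti), wi) for ti, wi in zip(times, weights)])
    u = (t - t0) / h
    kern = np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)  # Epanechnikov kernel
    kw = w * kern
    # Weighted least squares fit of y on (1, t - t0); the intercept is the estimate.
    design = np.column_stack([np.ones_like(t), t - t0])
    gram = design.T @ (kw[:, None] * design)
    rhs = design.T @ (kw * y)
    return np.linalg.solve(gram, rhs)[0]
```

Under either scheme the weights sum to one over all observations; the schemes differ only in how much influence densely observed subjects receive relative to sparsely observed ones.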
Pub Date : 2024-07-25 DOI: 10.1016/j.jmva.2024.105347
Mikael Escobar-Bach, Salima Helali
In this paper, we consider the problem of dependent censoring models with a positive probability that the times of failure are equal. In this context, we propose a Marshall–Olkin type model and study some properties of the associated survival copula in its application to censored data. We also introduce estimators for the marginal distributions and the joint survival probabilities under different schemes and show their asymptotic normality under appropriate conditions. Finally, we evaluate the finite-sample performance of our approach through a small simulation study with synthetic data and real data applications.
Title: Dependent censoring with simultaneous death times based on the Generalized Marshall–Olkin model. Journal of Multivariate Analysis, Volume 204, Article 105347. Open-access PDF: https://www.sciencedirect.com/science/article/pii/S0047259X2400054X/pdfft?md5=b9e7bd9d7773367d73bd57f13743392a&pid=1-s2.0-S0047259X2400054X-main.pdf
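The positive probability of equal failure times mentioned in the abstract above can be illustrated with the classical bivariate exponential Marshall–Olkin shock construction. This is a sketch of the textbook model only, not of the generalized model or the estimators studied in the paper; the function names and rate parameters are hypothetical.

```python
import numpy as np

def simulate_marshall_olkin(n, lam1, lam2, lam12, seed=0):
    """Draw n pairs (X, Y) from the classical Marshall-Olkin shock model.

    X = min(E1, E12) and Y = min(E2, E12), where E1, E2, E12 are independent
    exponential shocks with rates lam1, lam2, lam12. The common shock E12
    hits both components at once, which produces ties X == Y.
    """
    rng = np.random.default_rng(seed)
    e1 = rng.exponential(1.0 / lam1, n)
    e2 = rng.exponential(1.0 / lam2, n)
    e12 = rng.exponential(1.0 / lam12, n)
    return np.minimum(e1, e12), np.minimum(e2, e12)

def tie_probability(lam1, lam2, lam12):
    """P(X == Y): the probability that the common shock arrives first."""
    return lam12 / (lam1 + lam2 + lam12)
```

Comparing `np.mean(x == y)` from a large simulated sample with `tie_probability` shows the singular component that ordinary copula models with a density cannot reproduce.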
Pub Date : 2024-07-20 DOI: 10.1016/j.jmva.2024.105354
Jingyi Wang, Tianming Zhu, Jin-Ting Zhang
Testing the equality of the covariance matrices of two high-dimensional samples is a fundamental inference problem in statistics. Several tests have been proposed, but they are either too liberal or too conservative when the required assumptions are not satisfied, which shows that they are not always applicable in real data analysis. To overcome this difficulty, a normal-reference test is proposed and studied in this paper. It is shown that under some regularity conditions and the null hypothesis, the proposed test statistic and a chi-squared-type mixture have the same limiting distribution. It is then justified to approximate the null distribution of the proposed test statistic using that of the chi-squared-type mixture. The distribution of the chi-squared-type mixture can be well approximated using a three-cumulant matched chi-squared approximation with its approximation parameters consistently estimated from the data. The asymptotic power of the proposed test under a local alternative is also established. Simulation studies and a real data example demonstrate that the proposed test works well in general scenarios and substantially outperforms the existing competitors in terms of size control.
Title: Two-sample test for high-dimensional covariance matrices: A normal-reference approach. Journal of Multivariate Analysis, Volume 204, Article 105354.
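The three-cumulant matched chi-squared approximation named in the abstract above can be sketched for a generic mixture T = sum_r lam_r A_r, with A_r independent chi-squared variables on one degree of freedom. The sketch assumes the mixture coefficients lam_r are known and the mixture is uncentered; in the paper the approximation parameters are estimated from the data, which is not shown here, and the function name is hypothetical.

```python
import numpy as np

def three_cumulant_match(coefs):
    """Match the first three cumulants of T = sum_r lam_r * chi2_1
    to those of beta0 + beta1 * chi2_d.

    Cumulants of T:      K1 = sum lam, K2 = 2 sum lam^2, K3 = 8 sum lam^3.
    Cumulants of target: beta0 + beta1*d, 2*beta1^2*d, 8*beta1^3*d.
    Solving the three equations gives the expressions below.
    """
    lam = np.asarray(coefs, dtype=float)
    k1 = lam.sum()
    k2 = 2.0 * np.sum(lam**2)
    k3 = 8.0 * np.sum(lam**3)
    beta1 = k3 / (4.0 * k2)
    d = 8.0 * k2**3 / k3**2
    beta0 = k1 - 2.0 * k2**2 / k3
    return beta0, beta1, d
```

As a sanity check, equal coefficients `[2, 2, 2]` describe exactly 2 times a chi-squared variable on 3 degrees of freedom, and the function returns beta0 = 0, beta1 = 2, d = 3.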
Pub Date : 2024-07-17 DOI: 10.1016/j.jmva.2024.105346
Yu Zhao, Dan Cheng, Armin Schwartzman
We study power approximation formulas for peak detection using Gaussian random field theory. The approximation, based on the expected number of local maxima above the threshold $u$, $E[M_u]$, is proved to work well under three asymptotic scenarios: small domain, large threshold, and sharp signal. An adjusted version of $E[M_u]$ is also proposed to improve accuracy when the expected number of local maxima $E[M_{-\infty}]$ exceeds 1. Cheng and Schwartzman (2018) developed explicit formulas for $E[M_u]$ of smooth isotropic Gaussian random fields with zero mean. In this paper, these formulas are extended to allow for rotationally symmetric mean functions, making them applicable not only for power calculations but also for other areas of application that involve non-centered Gaussian random fields. We also apply our formulas to 2D and 3D simulated datasets, where the 3D data are induced by a group analysis of fMRI data from the Human Connectome Project, to measure performance in a realistic setting.
Title: An approximation to peak detection power using Gaussian random field theory. Journal of Multivariate Analysis, Volume 204, Article 105346.
Pub Date : 2024-07-09 DOI: 10.1016/j.jmva.2024.105345
Yiping Yang, Chuanqin Luo, Weiming Yang
In this study, we address the selection of both fixed and random effects in partial linear mixed effects models. By combining B-spline and QR decomposition techniques, we propose a double-penalized likelihood procedure for both estimating and selecting these effects. Furthermore, we introduce an orthogonality-based method to estimate the non-parametric component, ensuring that the fixed and random effects are separated without any mutual interference. The asymptotic properties of the resulting estimators are investigated under mild conditions. Simulation studies are conducted to evaluate the finite sample performance of the proposed method. Finally, we demonstrate the practical applicability of our methodology by analyzing a real data set.
Title: Double penalized variable selection for high-dimensional partial linear mixed effects models. Journal of Multivariate Analysis, Volume 204, Article 105345.
Pub Date : 2024-06-28 DOI: 10.1016/j.jmva.2024.105344
Šárka Hudecová, Miroslav Šiman
The article proposes and justifies an optimal rank-based portmanteau test of multivariate elliptical strict white noise against multivariate serial dependence. It is based on new stochastic hyperplane-based ranks that are simpler and easier to compute than other usable hyperplane-based competitors and still share with them many good properties such as their distribution-free nature, affine invariance, efficiency, robustness and weak moment assumptions. The finite-sample performance of the portmanteau test is illustrated empirically in a small Monte Carlo simulation study.
Title: Stochastic hyperplane-based ranks and their use in multivariate portmanteau tests. Journal of Multivariate Analysis, Volume 204, Article 105344.
Pub Date : 2024-06-19 DOI: 10.1016/j.jmva.2024.105343
Ping Yu, Xinyuan Song, Jiang Du
Recent research and substantive studies have shown growing interest in expectile regression (ER) procedures. Similar to quantile regression, ER with respect to different expectile levels can provide a comprehensive picture of the conditional distribution of a response variable given predictors. This study proposes three composite-type ER estimators to improve estimation accuracy. The proposed ER estimators include the composite estimator, which minimizes the composite expectile objective function across expectiles; the weighted expectile average estimator, which takes the weighted average of expectile-specific estimators; and the weighted composite estimator, which minimizes the weighted composite expectile objective function across expectiles. Under certain regularity conditions, we derive the convergence rate of the slope function, obtain the mean squared prediction error, and establish the asymptotic normality of the slope vector. Simulations are conducted to assess the empirical performances of various estimators. An application to the analysis of capital bike share data is presented. The numerical evidence endorses our theoretical results and confirms the superiority of the composite-type ER estimators to the conventional least squares and single ER estimators.
Title: Composite expectile estimation in partial functional linear regression model. Journal of Multivariate Analysis, Volume 203, Article 105343.
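For readers unfamiliar with expectiles, the building block behind the estimators in the abstract above is the asymmetric squared loss, which weights positive residuals by tau and negative residuals by 1 - tau. The following minimal sketch computes a scalar sample expectile by iteratively reweighted averaging; it covers only the intercept-only case, not the partial functional linear model or the composite estimators of the paper, and the function name is hypothetical.

```python
import numpy as np

def expectile(y, tau, tol=1e-10, max_iter=200):
    """Sample expectile of y at level tau in (0, 1).

    Minimizes sum_i |tau - 1{y_i <= e}| * (y_i - e)^2 over e. The
    first-order condition says e is a weighted mean of y with weights
    tau (above e) and 1 - tau (at or below e), so we iterate that
    fixed-point map until it stabilizes.
    """
    y = np.asarray(y, dtype=float)
    e = y.mean()  # the tau = 0.5 expectile is the ordinary mean
    for _ in range(max_iter):
        w = np.where(y > e, tau, 1.0 - tau)
        e_new = np.sum(w * y) / np.sum(w)
        if abs(e_new - e) < tol:
            return e_new
        e = e_new
    return e
```

Evaluating `expectile` over a grid of levels gives the kind of level-by-level summary that composite-type estimators then pool across levels.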
Pub Date : 2024-06-12 DOI: 10.1016/j.jmva.2024.105342
El Mehdi Issouani, Patrice Bertail, Emmanuelle Gautherat
We obtain exponential inequalities for regularized Hotelling’s $T_n^2$ statistics that take into account the potential high-dimensional aspects of the problem. We explore the finite sample properties of the tail of these statistics by deriving exponential bounds for symmetric distributions and also for general distributions under weak moment assumptions (we never assume exponential moments). For this, we use a penalized estimator of the covariance matrix and propose an optimal choice for the penalty coefficient.
Title: Exponential bounds for regularized Hotelling’s $T^2$ statistic in high dimension. Journal of Multivariate Analysis, Volume 203, Article 105342. Open-access PDF: https://www.sciencedirect.com/science/article/pii/S0047259X24000496/pdfft?md5=cd918fc00e938bad85311ad3c899e4a8&pid=1-s2.0-S0047259X24000496-main.pdf
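As a rough illustration of what a regularized Hotelling statistic looks like, the sketch below uses a ridge-type penalty, replacing the sample covariance S by S + rho I before inversion. The exact form of the penalized covariance estimator and the optimal penalty coefficient proposed in the paper are not given in the abstract above, so both the ridge form and the parameter name `rho` are assumptions made for illustration.

```python
import numpy as np

def regularized_hotelling_t2(x, mu0, rho):
    """Ridge-regularized one-sample Hotelling statistic.

    x: (n, p) data matrix with p >= 2; mu0: hypothesized mean (length p);
    rho: nonnegative penalty. Returns
    n * (xbar - mu0)' (S + rho * I)^{-1} (xbar - mu0).
    With rho > 0 the matrix is invertible even when p > n.
    """
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    diff = x.mean(axis=0) - np.asarray(mu0, dtype=float)
    s_reg = np.cov(x, rowvar=False) + rho * np.eye(p)
    return n * diff @ np.linalg.solve(s_reg, diff)
```

Setting `rho = 0` with n > p recovers the classical statistic, while larger values of `rho` shrink the statistic, which is why the choice of penalty matters for tail bounds.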
Pub Date : 2024-06-10 DOI: 10.1016/j.jmva.2024.105341
Kanti V. Mardia
Sir Ronald Aylmer Fisher opened many new areas in Multivariate Analysis, and the one we consider here is discriminant analysis. Several papers by Fisher and others followed from his seminal paper in 1936 where he coined the name discrimination function. Historically, his four papers on discriminant analysis during 1936–1940 connect to the contemporaneous pioneering work of Hotelling and Mahalanobis. We revisit the famous iris data which Fisher used in his 1936 paper and, in particular, test the hypothesis of multivariate normality for the data which he assumed. Fisher constructed his genetic discriminant motivated by this application and we provide a deeper insight into this construction; as far as we know, it has not been well understood. We also indicate how the subject has developed along with the computer revolution, noting newer methods to carry out discriminant analysis, such as kernel classifiers, classification trees, support vector machines, neural networks, and deep learning. Overall, with computational power, the whole subject of Multivariate Analysis has changed its emphasis, but the impact of Fisher’s pioneering work continues as an integral part of supervised learning in Artificial Intelligence (AI).
Title: Fisher’s pioneering work on discriminant analysis and its impact on Artificial Intelligence. Journal of Multivariate Analysis, Volume 203, Article 105341.
Pub Date : 2024-06-06 DOI: 10.1016/j.jmva.2024.105340
Pavel Krupskii, Raphaël Huser
In this paper, we introduce a new class of models for spatial data obtained from max-convolution processes based on indicator kernels with random shape. We show that these models have appealing dependence properties including tail dependence at short distances and independence at long distances. We further consider max-convolutions between such processes and processes with tail independence, in order to separately control the bulk and tail dependence behaviors, and to increase flexibility of the model at longer distances, in particular, to capture intermediate tail dependence. We show how parameters can be estimated using a weighted pairwise likelihood approach, and we conduct an extensive simulation study to show that the proposed inference approach is feasible in relatively high dimensions and it yields accurate parameter estimates in most cases. We apply the proposed methodology to analyze daily temperature maxima measured at 100 monitoring stations in the state of Oklahoma, US. Our results indicate that our proposed model provides a good fit to the data, and that it captures both the bulk and the tail dependence structures accurately.
Title: Max-convolution processes with random shape indicator kernels. Journal of Multivariate Analysis, Volume 203, Article 105340. Open-access PDF: https://www.sciencedirect.com/science/article/pii/S0047259X24000472/pdfft?md5=6e148f4b405bc0c38b2fef0ced10dc6b&pid=1-s2.0-S0047259X24000472-main.pdf