Interest in communication-avoiding orthogonalization schemes for high-performance computing has been growing recently. This manuscript addresses open questions about the numerical stability of various block classical Gram-Schmidt variants that have been proposed in the past few years. An abstract framework is employed, the flexibility of which allows for new rigorous bounds on the loss of orthogonality in these variants. We first analyze a generalization of (reorthogonalized) block classical Gram-Schmidt and show that a “strong” intrablock orthogonalization routine is only needed for the very first block in order to maintain orthogonality on the level of the unit roundoff. In particular, this “strong” first step does not have to be a reorthogonalized QR itself and subsequent steps can use less stable QR variants, thus keeping the overall communication costs low.
Then, using this variant, which has four synchronization points per block column, we remove the synchronization points one at a time and analyze how each alteration affects the stability of the resulting method. Our analysis shows that the variant requiring only one synchronization per block column, equivalent to a variant previously proposed in the literature, cannot be guaranteed to be stable in practice, as stability begins to degrade with the first reduction of synchronization points. As a negative result, we conclude that this particular block algorithm should be avoided in practice.
Our analysis of block methods also provides new, more positive theoretical results for the single-column case. In particular, it is proven that DCGS2 from (Bielich et al., 2022 [5]) and CGS-2 from (Świrydowicz et al., 2021 [10]) are as stable as Householder QR. Numerical examples from the BlockStab toolbox are included throughout, to help compare variants and illustrate the effects of different choices of intraorthogonalization subroutines.
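To make the four synchronization points of reorthogonalized block classical Gram-Schmidt concrete, the following is a minimal NumPy sketch of one way such a method could be organized. It is not taken from the BlockStab toolbox: the function names `intra_qr`, `bcgs_reorth`, `make_test_matrix`, and `loss_of_orthogonality` are hypothetical, and the intrablock routine is assumed to be NumPy's Householder-based `np.linalg.qr` for every block, not the mixed choice of routines analyzed in the manuscript. The comments mark where a distributed implementation would presumably need a global reduction.

```python
import numpy as np


def intra_qr(W):
    """Intrablock orthogonalization (assumption: Householder QR from NumPy)."""
    return np.linalg.qr(W)  # reduced QR: Q is m-by-s, R is s-by-s


def bcgs_reorth(X, s):
    """Sketch of reorthogonalized block classical Gram-Schmidt, block size s."""
    m, n = X.shape
    assert n % s == 0, "this sketch assumes n is a multiple of the block size"
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    # First block: only an intrablock orthogonalization is needed.
    Q[:, :s], R[:s, :s] = intra_qr(X[:, :s])
    for k in range(s, n, s):
        cols = slice(k, k + s)
        Qk = Q[:, :k]
        # First pass: project out previous blocks, then orthogonalize.
        S1 = Qk.T @ X[:, cols]        # sync point 1 (block inner product)
        U, T1 = intra_qr(X[:, cols] - Qk @ S1)   # sync point 2 (intrablock QR)
        # Second pass: repeat the projection on the intermediate basis.
        S2 = Qk.T @ U                 # sync point 3 (block inner product)
        Q[:, cols], T2 = intra_qr(U - Qk @ S2)   # sync point 4 (intrablock QR)
        # Combine both passes so that X[:, cols] = Q[:, :k+s] @ R[:k+s, cols].
        R[:k, cols] = S1 + S2 @ T1
        R[cols, cols] = T2 @ T1
    return Q, R


def make_test_matrix(m, n, cond, seed=0):
    """Random m-by-n matrix with prescribed 2-norm condition number."""
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((m, n)))
    V, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return U @ np.diag(np.logspace(0, -np.log10(cond), n)) @ V.T


def loss_of_orthogonality(Q):
    """Spectral norm of I - Q^T Q, the quantity bounded in stability analyses."""
    return np.linalg.norm(np.eye(Q.shape[1]) - Q.T @ Q, 2)


if __name__ == "__main__":
    X = make_test_matrix(200, 40, cond=1e8)
    Q, R = bcgs_reorth(X, s=8)
    print("loss of orthogonality:", loss_of_orthogonality(Q))
    print("relative residual:", np.linalg.norm(X - Q @ R) / np.linalg.norm(X))
```

Running the script prints the loss of orthogonality and the relative residual for a test matrix with condition number 1e8. Swapping `intra_qr` for a less stable routine on the later blocks, or merging the marked synchronization points, is the kind of experiment the abstract describes; this sketch only covers the four-synchronization starting point.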