This article presents a novel approach to clustering and feature selection for categorical time series via interpretable frequency-domain features. A distance measure is introduced based on the spectral envelope and optimal scalings, which parsimoniously characterize prominent cyclical patterns in categorical time series. Using this distance, partitional clustering algorithms are introduced for accurately clustering categorical time series. These adaptive procedures simultaneously select the features that distinguish clusters and provide fuzzy membership when a time series exhibits similarities to multiple clusters. Clustering consistency of the proposed methods is investigated, and simulation studies demonstrate clustering accuracy under various underlying group structures. The proposed methods are used to cluster sleep stage time series from patients with sleep disorders to identify particular oscillatory patterns associated with sleep disruption.
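The abstract does not state the distance formula. The following is a minimal NumPy sketch of the ingredients it names, under stated assumptions. Categories 0..k-1 are coded as (k-1)-dimensional indicator vectors, with the last category as reference. The spectral density matrix f(ω) is estimated by a smoothed periodogram. The spectral envelope is λ(ω) = max over β of β' Re f(ω) β / β' V β, where V is the indicator variance matrix, and the maximizing β is the optimal scaling. The smoothing weights and all function names are illustrative. `envelope_distance` in particular is a hypothetical stand-in (squared envelope differences plus differences of the sign-invariant projections ββ'), not the article's measure.

```python
import numpy as np


def indicator_matrix(x, k):
    """Code categories 0..k-1 as (k-1)-dim indicators; category k-1 is the reference."""
    x = np.asarray(x)
    return np.stack([(x == c).astype(float) for c in range(k - 1)], axis=1)


def spectral_envelope(x, k, weights=(0.25, 0.5, 0.25)):
    """Return frequencies j/n (j = 1..n//2), envelope values, and optimal scalings."""
    y = indicator_matrix(x, k)
    n = y.shape[0]
    y = y - y.mean(axis=0)
    v = y.T @ y / n                                    # variance matrix V of the indicators
    d = np.fft.fft(y, axis=0) / np.sqrt(n)             # DFT of each indicator series
    pgram = np.einsum("ja,jb->jab", d, d.conj()).real  # real part of periodogram matrices
    m = len(weights) // 2
    # Smooth across neighboring frequencies (circularly) to estimate Re f(omega).
    f = sum(w * np.roll(pgram, m - i, axis=0) for i, w in enumerate(weights))
    evals, evecs = np.linalg.eigh(v)
    v_inv_half = evecs @ np.diag(evals ** -0.5) @ evecs.T
    js = np.arange(1, n // 2 + 1)
    lam = np.empty(len(js))
    beta = np.empty((len(js), k - 1))
    for idx, j in enumerate(js):
        # max_beta beta' Re f beta / beta' V beta becomes a symmetric eigenproblem
        vals, vecs = np.linalg.eigh(v_inv_half @ f[j] @ v_inv_half)
        lam[idx] = vals[-1]
        beta[idx] = v_inv_half @ vecs[:, -1]           # optimal scaling, beta' V beta = 1
    return js / n, lam, beta


def envelope_distance(x1, x2, k, weights=(0.25, 0.5, 0.25)):
    """Hypothetical distance: mean squared envelope difference plus mean squared
    difference of beta beta' (invariant to the sign of each scaling)."""
    _, lam1, b1 = spectral_envelope(x1, k, weights)
    _, lam2, b2 = spectral_envelope(x2, k, weights)
    if lam1.shape != lam2.shape:
        raise ValueError("series must have the same length")
    p1 = np.einsum("ja,jb->jab", b1, b1)
    p2 = np.einsum("ja,jb->jab", b2, b2)
    return np.mean((lam1 - lam2) ** 2) + np.mean(np.sum((p1 - p2) ** 2, axis=(1, 2)))


x = np.tile([0, 0, 1, 1, 2, 2], 100)  # a clean period-6 categorical series
freqs, lam, beta = spectral_envelope(x, k=3)
print(freqs[np.argmax(lam)])           # frequency of the dominant cycle
print(envelope_distance(x, np.tile([0, 1, 2], 200), k=3))
```

For the period-6 series the envelope peaks at frequency 1/6 for every choice of scaling. A series with a different dominant cycle therefore lies at positive distance. Such a distance could then be plugged into a partitional method such as k-medoids.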
In this article, we propose a general framework for learning optimal treatment rules for patients with type 2 diabetes (T2D) from electronic health records (EHRs). We first propose a joint modeling approach to characterize patients' pretreatment conditions using longitudinal markers from EHRs. The estimation accounts for informative measurement times using inverse-intensity weighting. The latent processes predicted by the joint model are used to divide patients into a finite number of subgroups, within each of which patients share similar health profiles in EHRs. Within each subgroup, we estimate optimal individualized treatment rules by extending a matched learning method to multicategory treatments with a one-versus-one approach. Each pairwise matched learning problem is solved by a weighted support vector machine fit to matched pairs of patients. We apply our method to estimate optimal treatment rules for T2D patients in a large sample of EHRs from the Ohio State University Wexner Medical Center. We demonstrate that our method can select the optimal treatment from four classes of drugs and achieves better control of glycated hemoglobin than any one-size-fits-all rule.
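A simplified NumPy sketch of the one-versus-one matched learning step within a single patient subgroup follows. The following are assumptions, not details from the article:

- each patient is matched to the nearest Euclidean neighbor receiving the other treatment;
- a matched pair is labeled by which treatment had the larger outcome and weighted by the absolute outcome difference;
- the weighted SVM is linear and fit by subgradient descent;
- a larger outcome is better, so for glycated hemoglobin one would pass its negative.

All function names are hypothetical, and the joint model, inverse-intensity weighting, and subgrouping steps are omitted.

```python
import itertools

import numpy as np


def fit_weighted_svm(X, y, weights, lam=1e-3, lr=0.5, n_iter=2000):
    """Linear SVM with per-pair weights, fit by subgradient descent on
    lam/2 ||beta||^2 + sum_i w_i max(0, 1 - y_i (x_i . beta + b)), w summing to 1."""
    w = weights / weights.sum()
    beta, b = np.zeros(X.shape[1]), 0.0
    for t in range(n_iter):
        active = y * (X @ beta + b) < 1              # pairs violating the margin
        coef = w[active] * y[active]
        step = lr / np.sqrt(t + 1)
        beta = beta - step * (lam * beta - coef @ X[active])
        b = b + step * coef.sum()
    return beta, b


def matched_pairs(Xa, Ya, Xb, Yb):
    """Match every patient to the nearest (Euclidean) patient on the other treatment.
    Label +1 when treatment a had the larger outcome; weight = |outcome difference|."""
    feats, labels, weights = [], [], []
    for Xs, Ys, Xo, Yo, sign in ((Xa, Ya, Xb, Yb, 1.0), (Xb, Yb, Xa, Ya, -1.0)):
        dist = ((Xs[:, None, :] - Xo[None, :, :]) ** 2).sum(axis=2)
        nn = dist.argmin(axis=1)
        diff = sign * (Ys - Yo[nn])                   # > 0 means treatment a did better
        feats.append(Xs)
        labels.append(np.where(diff > 0, 1.0, -1.0))
        weights.append(np.abs(diff))
    return np.vstack(feats), np.concatenate(labels), np.concatenate(weights)


def fit_one_vs_one(X, A, Y):
    """Fit one matched-learning rule for every pair of treatments."""
    rules = {}
    for a, b in itertools.combinations(np.unique(A).tolist(), 2):
        Xp, yp, wp = matched_pairs(X[A == a], Y[A == a], X[A == b], Y[A == b])
        rules[(a, b)] = fit_weighted_svm(Xp, yp, wp)
    return rules


def recommend(rules, X, n_treatments):
    """Majority vote over pairwise rules; ties go to the lower treatment index."""
    votes = np.zeros((X.shape[0], n_treatments), dtype=int)
    for (a, b), (beta, b0) in rules.items():
        prefer_a = X @ beta + b0 > 0
        votes[prefer_a, a] += 1
        votes[~prefer_a, b] += 1
    return votes.argmax(axis=1)


# Synthetic check: treatment 0 helps when X[:, 0] is large, treatment 1 when it is
# small, and treatment 2 never does (larger outcome = better).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
A = rng.integers(0, 3, size=300)
Y = np.where(A == 0, 2 * X[:, 0], np.where(A == 1, -2 * X[:, 0], -1.0))
Y = Y + rng.normal(scale=0.1, size=300)
rules = fit_one_vs_one(X, A, Y)
print(recommend(rules, np.array([[2.0, 0.0], [-2.0, 0.0]]), n_treatments=3))
```

On this synthetic data the pairwise rules should recommend treatment 0 when the first covariate is 2 and treatment 1 when it is -2. Voting over the three pairwise rules is what turns binary matched learning into a multicategory rule. In practice an off-the-shelf SVM solver that accepts sample weights would replace the hand-written subgradient loop.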

