Finding Optimal Alphabet for Encoding Daily Continuous Glucose Monitoring Time Series Into Compressed Text
Authors: Tobore Igbe, Boris Kovatchev
Journal: Journal of Diabetes Science and Technology (impact factor 4.1; JCR Q2, Endocrinology & Metabolism)
Published: 2025-03-20
DOI: https://doi.org/10.1177/19322968251323913
Citation count: 0
Abstract
Background: The emergence of continuous glucose monitoring (CGM) devices has not only revolutionized diabetes management but has also opened new avenues for research. This article presents a novel approach to encoding a CGM daily profile into a CGM string and CGM text that compress the data while preserving clinical metric information.
Methods: Eight alphabets were defined to represent glucose ranges. For each alphabet, the Akaike information criterion (AIC) was derived from the encoding error and the compression ratio was estimated, in order to determine the optimal alphabet for encoding the CGM daily profile. The analysis used data from six distinct studies with different treatment modalities, covering individuals with type 1 diabetes (T1D), with type 2 diabetes (T2D), and without diabetes. The data set was divided into 70% for training and 30% for validation.
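The alphabet comparison described in the Methods can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' procedure: the equal-width bins over 40 to 400 mg/dL, the bin-midpoint reconstruction, the least-squares form of the AIC with the alphabet size taken as the parameter count, and the run-length compression are all hypothetical choices, and every function name is invented for the example.

```python
import math
from itertools import groupby


def make_edges(n_letters, lo=40.0, hi=400.0):
    """Equal-width bin edges over [lo, hi] mg/dL (assumed scheme, not the paper's)."""
    width = (hi - lo) / n_letters
    return [lo + i * width for i in range(n_letters + 1)]


def encode(values, edges):
    """Map each glucose reading to a letter A, B, C, ... by the bin it falls in."""
    n = len(edges) - 1
    width = edges[1] - edges[0]
    letters = []
    for v in values:
        clipped = min(max(v, edges[0]), edges[-1])
        idx = min(int((clipped - edges[0]) / width), n - 1)
        letters.append(chr(ord("A") + idx))
    return "".join(letters)


def decode(text, edges):
    """Reconstruct each reading as the midpoint of its letter's bin."""
    out = []
    for c in text:
        idx = ord(c) - ord("A")
        out.append((edges[idx] + edges[idx + 1]) / 2)
    return out


def aic(values, recon, k):
    """Least-squares AIC, n * ln(RSS / n) + 2k, with k assumed to be the alphabet size."""
    n = len(values)
    rss = sum((v - r) ** 2 for v, r in zip(values, recon))
    return n * math.log(max(rss, 1e-12) / n) + 2 * k


def run_length(text):
    """Run-length encode a CGM string, e.g. 'AAAB' becomes 'A3B1'."""
    return "".join(f"{c}{len(list(g))}" for c, g in groupby(text))


def compression_ratio(values, text):
    """Characters in a comma-separated raw profile divided by characters in the RLE text."""
    raw = len(",".join(str(v) for v in values))
    return raw / len(run_length(text))


# Toy profile in mg/dL (made-up numbers, not study data).
profile = [100, 105, 110, 180, 250, 240, 120]
for n_letters in (3, 5, 9):
    edges = make_edges(n_letters)
    text = encode(profile, edges)
    score = aic(profile, decode(text, edges), n_letters)
    print(n_letters, text, round(score, 2), round(compression_ratio(profile, text), 2))
```

On a real data set the loop would run over every candidate alphabet on the training split, and the alphabet with the lowest AIC would then be checked on the validation split.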
Results: On the training data, a 9-letter alphabet was optimal for encoding daily CGM profiles for T1D or T2D, yielding the lowest AIC score and thus the least information loss. For individuals without diabetes, fewer letters were needed, which is to be expected given the lower variation of the data. Further testing showed that the 9-letter alphabet closely approximated the coefficient of variation, with Pearson correlations between 0.945 and 0.965.
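The coefficient-of-variation check can be illustrated in the same spirit. The sketch below computes the CV of each daily profile before and after quantization and then the Pearson correlation between the two sets of CVs. The binning (9 equal-width bins over 40 to 400 mg/dL, reconstructed at bin midpoints), the function names, and the toy profiles are assumptions made for the example; the abstract does not specify how the authors computed the comparison.

```python
import math
import statistics


def quantize(values, n_letters=9, lo=40.0, hi=400.0):
    """Snap each reading to the midpoint of its equal-width bin (assumed binning)."""
    width = (hi - lo) / n_letters
    out = []
    for v in values:
        clipped = min(max(v, lo), hi)
        idx = min(int((clipped - lo) / width), n_letters - 1)
        out.append(lo + (idx + 0.5) * width)
    return out


def cv(values):
    """Coefficient of variation in percent: 100 * sample SD / mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy daily profiles in mg/dL (made-up numbers, not study data).
days = [
    [95, 100, 110, 105, 98, 102],
    [80, 150, 220, 260, 180, 110],
    [60, 90, 300, 340, 200, 70],
    [120, 130, 170, 190, 150, 125],
]
cv_raw = [cv(d) for d in days]
cv_enc = [cv(quantize(d)) for d in days]
r = pearson(cv_raw, cv_enc)
print(round(r, 3))
```

A correlation close to 1 means the encoded profiles rank days by glucose variability in nearly the same way as the raw readings do, which is the property the reported range of 0.945 to 0.965 describes.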
Conclusion: Encoding CGM data into text could enhance the classification of CGM profiles and enable the use of well-established search engines with CGM data. Other potential applications include predictive modeling, anomaly detection, indexing, trend analysis, or future generative artificial intelligence applications for diabetes research and clinical practice.
About the journal:
The Journal of Diabetes Science and Technology (JDST) is a bi-monthly, peer-reviewed scientific journal published by the Diabetes Technology Society. JDST covers scientific and clinical aspects of diabetes technology including glucose monitoring, insulin and metabolic peptide delivery, the artificial pancreas, digital health, precision medicine, social media, cybersecurity, software for modeling, physiologic monitoring, technology for managing obesity, and diagnostic tests of glycation. The journal also covers the development and use of mobile applications and wireless communication, as well as bioengineered tools such as MEMS, new biomaterials, and nanotechnology to develop new sensors. Articles in JDST cover both basic research and clinical applications of technologies being developed to help people with diabetes.