How to compare algorithms for automated insulin delivery using different sensors?

Diabetes, Obesity & Metabolism (IF 5.7; CAS Zone 2, Medicine; JCR Q1, Endocrinology & Metabolism) | Pub Date: 2025-02-05 | DOI: 10.1111/dom.16234

Klavs Würgler Hansen DMS

Diabetes, Obesity & Metabolism 2025; 27(5): 2319-2321. Full text: https://dom-pubs.onlinelibrary.wiley.com/doi/10.1111/dom.16234


Abstract

Caution has recently been advised against comparing automated insulin delivery (AID) systems that use different sensors.^{1, 2} While the same two AID systems, Tandem Control IQ (CoIQ) and Minimed 780G (MM780G), have shown clear differences in time in range (TIR) outcomes, studies report no clinically relevant difference in HbA1c^{3, 4} or only minimal changes in HbA1c and TIR.^{5, 6} A simplified overview of these four studies is presented in Table 1. Of note, all TIR values are numerically higher in persons using MM780G (by 2.4 to 7.0 percentage points), but all HbA1c values are minimally lower in persons using CoIQ (by about 0.3 percentage points). Three studies^{3, 4, 6} provide information on sensor mean glucose or glucose management indicator (GMI)^{7} over a 2-week period and on HbA1c, which can be translated into estimated plasma glucose using the ADAG (A1c-derived average glucose) equation.^{8} Although this indirect approach contains several pitfalls, the result may indicate differences in sensor performance. Specifically, the Dexcom sensor connected to CoIQ appears to measure glucose levels higher than the Guardian sensor connected to the MM780G. This observation aligns with a recent small-scale study in which persons used both sensors simultaneously, reporting a slight positive bias for the Dexcom sensor and a negative bias for the Guardian sensor when compared to self-monitored blood glucose.^{9}
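The HbA1c-based cross-check described above can be sketched in a few lines. This is a minimal illustration, not the analysis used in the cited studies. It assumes the published ADAG regression (estimated average glucose in mmol/L = 1.59 × HbA1c in % − 2.59) and the published GMI regression (GMI in % = 3.31 + 0.02392 × mean glucose in mg/dL). The function names and the example values are hypothetical.

```python
MGDL_PER_MMOLL = 18.016  # glucose: 1 mmol/L is about 18.016 mg/dL


def adag_mean_glucose_mmol(hba1c_percent: float) -> float:
    """Estimated average glucose (mmol/L) from HbA1c (%), ADAG regression."""
    return 1.59 * hba1c_percent - 2.59


def gmi_percent(sensor_mean_mmol: float) -> float:
    """Glucose management indicator (%) from sensor mean glucose (mmol/L)."""
    return 3.31 + 0.02392 * sensor_mean_mmol * MGDL_PER_MMOLL


def sensor_offset_mmol(sensor_mean_mmol: float, hba1c_percent: float) -> float:
    """Sensor mean glucose minus the HbA1c-derived estimate (mmol/L)."""
    return sensor_mean_mmol - adag_mean_glucose_mmol(hba1c_percent)


# Hypothetical group-level values, for illustration only.
hba1c = 7.0        # %
sensor_mean = 9.0  # mmol/L
print(f"ADAG estimate: {adag_mean_glucose_mmol(hba1c):.2f} mmol/L")
print(f"GMI: {gmi_percent(sensor_mean):.2f} %")
print(f"Offset: {sensor_offset_mmol(sensor_mean, hba1c):+.2f} mmol/L")
```

A positive offset would suggest that the sensor reads higher than the HbA1c-derived estimate, with the caveat that the HbA1c to glucose relationship itself varies between individuals.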

This discrepancy between sensor-driven and sensor-independent glycaemic metrics indicates the influence of varying study populations and sensor performance across brands. However, does this necessarily imply that the insulin delivery algorithm and continuous glucose monitoring (CGM) data hold no significance?

Consider a scenario in which an insulin pump using algorithm 1 with sensor A is compared with an insulin pump using algorithm 2 with sensor B. Suppose users of algorithm 1 achieve a TIR of 68% and a sensor-based mean glucose of 9.0 mmol/L, while a comparable group of users of algorithm 2 achieves a TIR of 75% and a mean glucose of 8.5 mmol/L, yet with a similar HbA1c. In such cases, TIR and sensor mean glucose metrics may be misleading from a clinical perspective, as sensor A likely measures glucose levels higher than sensor B. If algorithm 1 receives glucose input from sensor B, this is a new product with unknown results requiring new validation.
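To see how a sensor offset alone can move TIR, the sketch below applies two hypothetical constant offsets to one made-up glucose trace and recomputes TIR using the conventional 3.9 to 10.0 mmol/L target range. The trace, the offsets and the function name are illustrative assumptions. Real sensor error is not a constant offset, and in a closed loop the algorithm would also react to the biased reading.

```python
def time_in_range(readings, low=3.9, high=10.0):
    """Percentage of readings within [low, high] mmol/L."""
    in_range = sum(1 for g in readings if low <= g <= high)
    return 100.0 * in_range / len(readings)


# Made-up "true" glucose values (mmol/L), for illustration only.
true_glucose = [5.0, 6.5, 8.0, 9.6, 9.8, 10.4, 11.5, 7.2, 4.1, 3.7]

# Hypothetical constant offsets: sensor A reads high, sensor B slightly low.
sensor_a = [g + 0.5 for g in true_glucose]
sensor_b = [g - 0.1 for g in true_glucose]

print("TIR, true glucose:", time_in_range(true_glucose))
print("TIR, sensor A:", time_in_range(sensor_a))
print("TIR, sensor B:", time_in_range(sensor_b))
```

With these made-up numbers the reported TIR differs between the two sensors although the underlying glucose trace is identical, which is the sense in which sensor-based metrics can mislead when compared across sensors.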

The goal of developing algorithms for insulin delivery is to enhance safety, minimize the risk of hypoglycaemia, ensure patient adherence and increase TIR, guided by input from the specific sensor. In this context, algorithm 2 is more effective than algorithm 1. A 7-percentage-point difference in TIR could translate into a meaningful difference in the risk of diabetic complications.^{10} The apparent ‘rescue’ of algorithm 1 due to differences in sensor performance fails to capture the fundamental intellectual and clinical objectives underlying algorithm design.

Sensor performance has traditionally been reported as the mean absolute relative difference (MARD) between a sensor reading and a reference glucose concentration from a blood sample. There is a growing understanding that a similar and low MARD value does not guarantee the absence of clinically relevant differences between sensors.^{2, 11} One of the drawbacks of MARD is that it provides no information about the direction of any deviation from the reference value.^{12}
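The directional blindness of MARD can be shown with a small sketch. It computes MARD alongside a signed mean relative difference for two hypothetical sensors, one reading 10% above and one 10% below the same reference values. All numbers and function names are illustrative assumptions, not data from any real sensor.

```python
def mard(sensor, reference):
    """Mean absolute relative difference, in percent."""
    pairs = list(zip(sensor, reference))
    return 100.0 * sum(abs(s - r) / r for s, r in pairs) / len(pairs)


def mean_relative_difference(sensor, reference):
    """Signed mean relative difference (bias), in percent."""
    pairs = list(zip(sensor, reference))
    return 100.0 * sum((s - r) / r for s, r in pairs) / len(pairs)


# Made-up reference glucose values (mmol/L), for illustration only.
reference = [5.0, 8.0, 10.0, 12.0]
sensor_high = [r * 1.10 for r in reference]  # hypothetical: reads 10% high
sensor_low = [r * 0.90 for r in reference]   # hypothetical: reads 10% low

for name, sensor in [("high", sensor_high), ("low", sensor_low)]:
    print(name,
          round(mard(sensor, reference), 2),
          round(mean_relative_difference(sensor, reference), 2))
```

Both hypothetical sensors yield the same MARD of about 10%, while the signed difference separates them at about +10% and −10%, so two sensors with matching MARD can still disagree systematically with each other.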

To reliably evaluate the clinical performance of different AID systems based solely on CGM data, an internationally accepted standardization of glucose sensors is essential, with adherence mandated for manufacturers.^{13-15} However, this standardization process is complicated by the lack of a definitive reference value for interstitial glucose and the inherent difficulties in estimating plasma glucose from interstitial glucose under varying physiological conditions.^{13}

While we fully agree that sensor standardization is essential, and HbA1c is still needed for clinical assessment, we believe that insulin delivery algorithms per se can still be evaluated effectively with sensor-based data despite differences in sensor performance. In other words, the insulin delivery algorithm, as a data engineering product, can be evaluated in its own right from the sensor data that served as input during its development and optimisation process.

KWH has received grants for an investigator-initiated study from Abbott Diabetes Care and Novo Nordisk.

