Clustering polycystic ovary syndrome laboratory results extracted from a large internet forum with machine learning

Rebecca H.K. Emanuel, Paul D. Docherty, Helen Lunt, Rua Murray, Rebecca E. Campbell
Intelligence-based medicine, volume 9, Article 100135. Published 2024-01-01.
DOI: 10.1016/j.ibmed.2024.100135
Full text: https://www.sciencedirect.com/science/article/pii/S2666521224000024

Abstract

Background

Polycystic ovary syndrome (PCOS) is reported to affect between 4% and 21% of reproductive-aged people with ovaries. It is a heterogeneous condition that lacks established phenotypes covering the range of reproductive and metabolic features seen in PCOS. Because of these features, patients may undergo a variety of relevant laboratory tests. Previous work gathered laboratory test results from a PCOS-specific forum hosted on the website Reddit.

Objectives

In this paper, laboratory results and body mass index (BMI) values posted on the PCOS Reddit forum were clustered to demonstrate the usefulness of the forum for PCOS research and to validate existing PCOS phenotypes or discover other appropriate phenotypes.

Methods and results

Over 1500 sets of PCOS-related Reddit laboratory test results and BMIs were clustered using nearest-neighbour imputation and K-means clustering. However, only non-imputed data were included in the final clusters. Kernel density estimation plots were used to display the distinct clusters. The clustered test results suggested the existence of distinct metabolic and reproductive phenotypes, as well as a group displaying mild features of both types of dysregulation and a group skewed towards normal results. It was also possible to subdivide the groups further, separating distinct hypothyroid groups within the mixed dysregulation group and insulin-resistant and diabetes-like groups within the metabolic group.
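
The abstract does not give implementation details, so the following is only a minimal sketch of how nearest-neighbour imputation followed by K-means might be arranged, with cluster summaries drawn from non-imputed values only. Everything in it is an assumption made for illustration: the three columns (BMI, testosterone, fasting insulin), the synthetic values, the z-scoring step, the single-nearest-donor rule, the deterministic farthest-first initialisation, and k = 2. It is not the authors' pipeline.

```python
import math

def standardise(rows):
    """Z-score each column using observed values only; None stays None."""
    out_cols = []
    for col in zip(*rows):
        seen = [v for v in col if v is not None]
        mu = sum(seen) / len(seen)
        sd = math.sqrt(sum((v - mu) ** 2 for v in seen) / len(seen)) or 1.0
        out_cols.append([None if v is None else (v - mu) / sd for v in col])
    return [list(r) for r in zip(*out_cols)]

def nn_impute(rows):
    """Fill each None with the value from the nearest row that observed it."""
    def dist(a, b):
        # Mean squared difference over features observed in both rows.
        shared = [(x - y) ** 2 for x, y in zip(a, b)
                  if x is not None and y is not None]
        return sum(shared) / len(shared) if shared else math.inf
    filled = []
    for row in rows:
        new = list(row)
        for j, v in enumerate(row):
            if v is None:
                # Assumes at least one other row observed feature j.
                donors = [r for r in rows if r is not row and r[j] is not None]
                new[j] = min(donors, key=lambda r: dist(row, r))[j]
        filled.append(new)
    return filled

def farthest_first(points, k):
    """Deterministic initial centres: start at points[0], then repeatedly
    add the point farthest from all centres chosen so far."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points,
                           key=lambda p: min(math.dist(p, c) for c in centres)))
    return centres

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns one cluster label per point."""
    centres = [list(c) for c in farthest_first(points, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centres[c]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centre if a cluster empties
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def observed_means(raw_rows, labels, k):
    """Per-cluster column means computed from non-imputed values only."""
    summary = []
    for c in range(k):
        members = [r for r, l in zip(raw_rows, labels) if l == c]
        row = []
        for col in zip(*members):
            seen = [v for v in col if v is not None]
            row.append(sum(seen) / len(seen) if seen else None)
        summary.append(row)
    return summary

# Synthetic, made-up rows: [BMI, testosterone, fasting insulin]; None = not posted.
raw = [
    [22.0, 2.9, 6.0],
    [23.5, None, 5.5],
    [21.0, 3.1, None],
    [35.0, 1.4, 22.0],
    [36.5, 1.6, None],
    [None, 1.5, 24.0],
]

labels = kmeans(nn_impute(standardise(raw)), k=2)
summary = observed_means(raw, labels, k=2)
print(labels)
print(summary)
```

Imputed values here serve only to let every row take part in the distance calculations; `observed_means` then goes back to the raw rows, so a filled-in value never contributes to a reported cluster statistic. For real data, library implementations such as scikit-learn's `KNNImputer` and `KMeans` would be the more usual choice than hand-written loops.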

Conclusions

This research further validates the usefulness of exploring alternative data sources in the age of the internet and machine learning. The Reddit clusters reinforced the existing notion that people with PCOS can be separated into a primarily metabolic pathology group, a primarily reproductive pathology group and an in-between group with pathology in both domains.
