Analysis of 2019 Ohio Disease Intervention Specialist Records for Syphilis Cases Using Clustering Algorithms.

Sexually Transmitted Diseases (IF 2.4; Zone 4, Medicine; JCR Q3, Infectious Diseases). Pub Date: 2025-03-01. Epub Date: 2024-10-31. DOI: 10.1097/OLQ.0000000000002091
Payal Chakraborty, Xia Ning, Mary McNeill, David M Kline, Abigail B Shoben, William C Miller, Abigail Norris Turner
Pages: 146-153. Publication type: Journal Article. Open access: No.
Citations: 0

Abstract

Background: Developments in natural language processing and unsupervised machine learning methodologies (e.g., clustering) have given researchers new tools to analyze both structured and unstructured health data. We applied these methods to 2019 Ohio disease intervention specialist (DIS) syphilis records, to determine whether these methods can uncover novel patterns of co-occurrence of individual characteristics, risk factors, and clinical characteristics of syphilis that are not yet reported in the literature.

Methods: The 2019 disease intervention specialist syphilis records (n = 1996) contain both structured data (categorical and numerical variables) and unstructured notes. In the structured data, we examined case demographics, syphilis risk factors, and clinical characteristics of syphilis. For the unstructured text, we applied TF-IDF (term frequency multiplied by inverse document frequency) weights, a common way to convert text into numerical representations. We performed agglomerative clustering with cosine similarity using the CLUTO software.
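The text-processing steps named in the Methods can be illustrated with a minimal, standard-library sketch. It is not the CLUTO implementation and does not reproduce the authors' pipeline: the `tf * log(N / df)` weighting is one common TF-IDF variant, average linkage is an assumed merge criterion, and the four notes are invented placeholders rather than DIS records.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def agglomerate(vectors, k):
    """Merge the two most similar clusters (average linkage) until k remain."""
    clusters = [[i] for i in range(len(vectors))]

    def avg_sim(a, b):
        total = sum(cosine(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        pairs = (
            (x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        x, y = max(pairs, key=lambda p: avg_sim(clusters[p[0]], clusters[p[1]]))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]

# Invented placeholder notes, for illustration only.
notes = [
    "partner anonymous met online",
    "anonymous partner met at bar",
    "no interview patient not located",
    "patient not located no interview",
]
vectors = tfidf_vectors(notes)
clusters = agglomerate(vectors, k=2)
print(clusters)
```

In this toy input the last two notes contain the same words in a different order, so their TF-IDF vectors are identical and they merge first. The first two notes share three terms and merge next.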

Results: The cluster analysis yielded 6 clusters of syphilis cases based on patterns in the structured and unstructured data. The average internal similarities were much higher than the average external similarities, indicating that the clusters were well formed. The factors underlying 3 of the clusters related to patterns of missing data. The factors underlying the other 3 clusters were sexual behaviors and partnerships. Notably, 1 of these 3 consisted of individuals who reported oral sex with male or anonymous partners while intoxicated, and another consisted mainly of males who have sex with females.
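The comparison of internal and external similarity can be made concrete with a small sketch. It assumes one common definition: internal similarity is the mean pairwise cosine similarity among members of a cluster, and external similarity is the mean cosine similarity between members and non-members. CLUTO's exact computation may differ, and the vectors and labels below are invented.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def internal_external(vectors, labels, cluster):
    """Return (internal, external) average similarity for one cluster."""
    inside = [v for v, lab in zip(vectors, labels) if lab == cluster]
    outside = [v for v, lab in zip(vectors, labels) if lab != cluster]
    within = [
        cosine(inside[i], inside[j])
        for i in range(len(inside))
        for j in range(i + 1, len(inside))
    ]
    across = [cosine(u, v) for u in inside for v in outside]
    # Assumed conventions: a singleton cluster has internal similarity 1.0,
    # and a cluster with no outside points has external similarity 0.0.
    isim = sum(within) / len(within) if within else 1.0
    esim = sum(across) / len(across) if across else 0.0
    return isim, esim

# Invented two-dimensional vectors and cluster labels, for illustration only.
vectors = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [0.6, 0.8]]
labels = [0, 0, 1, 1]
isim, esim = internal_external(vectors, labels, cluster=0)
print(round(isim, 2), round(esim, 2))
```

A cluster whose internal value is well above its external value is compact relative to the rest of the data. As the Conclusions note, that says nothing about whether the grouping is epidemiologically informative.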

Conclusions: Our analysis resulted in clusters that were well formed mathematically but did not reveal epidemiological information about syphilis risk factors or transmission that was not already known.

Source journal: Sexually Transmitted Diseases (Medicine, Infectious Diseases)
CiteScore: 4.00
Self-citation rate: 16.10%
Articles published: 289
Review time: 3-8 weeks
Journal description: Sexually Transmitted Diseases, the official journal of the American Sexually Transmitted Diseases Association, publishes peer-reviewed, original articles on clinical, laboratory, immunologic, epidemiologic, behavioral, public health, and historical topics pertaining to sexually transmitted diseases and related fields. Reports from the CDC and NIH provide up-to-the-minute information. A highly respected editorial board is composed of prominent scientists who are leaders in this rapidly changing field. Included in each issue are studies and developments from around the world.
Latest articles from this journal
L-Serovar Rectal Chlamydia trachomatis in Patients Who Were Male-Assigned at Birth Attending Two Sexual Health Clinics, Baltimore, Maryland 2009-2016.
Another Tool for the Sexual Health Toolkit: US Health Care Provider Knowledge and Attitudes About Doxycycline Postexposure Prophylaxis to Prevent Bacterial Sexually Transmitted Infections Among Men Who Have Sex With Men.
Analysis of 2019 Ohio Disease Intervention Specialist Records for Syphilis Cases Using Clustering Algorithms.
Qualitatively Assessing ChatGPT Responses to Frequently Asked Questions Regarding Sexually Transmitted Diseases.
Gonorrhea and Early Syphilis Treatment Practices Among Community Health Care Providers in Baltimore City, Maryland.