Analysis of 2019 Ohio Disease Intervention Specialist (DIS) Records for Syphilis Cases Using Clustering Algorithms.

Sexually Transmitted Diseases (IF 2.4; Zone 4, Medicine; JCR Q3, Infectious Diseases). Pub Date: 2024-10-31. DOI: 10.1097/OLQ.0000000000002091
Payal Chakraborty, Xia Ning, Mary McNeill, David M Kline, Abigail B Shoben, William C Miller, Abigail Norris Turner

Abstract

Background: Developments in natural language processing (NLP) and unsupervised machine learning methodologies (e.g., clustering) have given researchers new tools to analyze both structured and unstructured health data. We applied these methods to 2019 Ohio disease intervention specialist (DIS) syphilis records to determine whether they can uncover novel patterns of co-occurrence among individual characteristics, risk factors, and clinical characteristics of syphilis that are not yet reported in the literature.

Methods: The 2019 DIS syphilis records (n=1,996) contain both structured data (categorical and numerical variables) and unstructured notes. In the structured data, we examined case demographics, syphilis risk factors, and clinical characteristics of syphilis. For the unstructured text, we applied TF-IDF (term frequency multiplied by inverse document frequency) weights, a common way to convert text into numerical representations. We performed agglomerative clustering with cosine similarity using the CLUTO software.
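The text-processing steps described in the Methods can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' pipeline: the study used the CLUTO software, whereas the code below is a plain-Python stand-in that assumes whitespace tokenization, the textbook weight tf × log(N/df), and average-linkage merging. The function names (`tfidf_vectors`, `cosine`, `agglomerative`) and the example notes are hypothetical and are not taken from the DIS records.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, scaled to unit length."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # TF-IDF: term frequency multiplied by inverse document frequency.
        vec = {t: count * math.log(n_docs / df[t]) for t, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors

def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors (a dot product)."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())

def agglomerative(vectors, k):
    """Repeatedly merge the two clusters with the highest average pairwise
    cosine similarity until k clusters remain (average linkage, assumed)."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sims = [cosine(vectors[i], vectors[j])
                        for i in clusters[a] for j in clusters[b]]
                avg = sum(sims) / len(sims)
                if best is None or avg > best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Invented toy notes for illustration only, NOT real DIS records.
notes = [
    "patient reports anonymous partners",
    "patient reports anonymous partners met online",
    "unable to locate patient phone disconnected",
    "phone disconnected unable to locate",
]
vectors = tfidf_vectors(notes)
clusters = agglomerative(vectors, k=2)
print(clusters)
```

On these toy notes the two partner-related notes end up in one cluster and the two locating-related notes in the other. Structured variables could be appended to each vector as additional features, but how the study combined them with the text is not stated in the abstract.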

Results: The cluster analysis yielded six clusters of syphilis cases based on patterns in the structured and unstructured data. The average internal similarities were much higher than the average external similarities, indicating that the clusters were well formed. Three of the clusters were driven by patterns of missing data; the other three were driven by sexual behaviors and partnerships. Notably, one of these latter three consisted of individuals who reported oral sex with male or anonymous partners while intoxicated, and another consisted mainly of males who have sex with females.
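The comparison of internal and external similarity reported above can be illustrated with a small sketch. It assumes that internal similarity is the mean pairwise cosine similarity among members of a cluster and that external similarity is the mean cosine similarity between members and all non-members. CLUTO's exact definitions may differ in detail, and the function name `internal_external_similarity` and the two-dimensional vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def internal_external_similarity(vectors, labels, cluster):
    """Return (internal, external) mean cosine similarity for one cluster.

    Either value is None when there are no pairs to average, e.g. a
    single-member cluster has no internal pairs.
    """
    inside = [v for v, lab in zip(vectors, labels) if lab == cluster]
    outside = [v for v, lab in zip(vectors, labels) if lab != cluster]
    internal = [cosine(inside[i], inside[j])
                for i in range(len(inside))
                for j in range(i + 1, len(inside))]
    external = [cosine(u, v) for u in inside for v in outside]
    isim = sum(internal) / len(internal) if internal else None
    esim = sum(external) / len(external) if external else None
    return isim, esim

# Invented two-dimensional vectors for illustration only.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]
isim, esim = internal_external_similarity(vectors, labels, cluster=0)
print(f"internal={isim:.3f} external={esim:.3f}")
```

A cluster whose internal value is well above its external value is compact and well separated in the feature space. As the abstract notes, that is a statement about the geometry of the clusters, not about whether they are epidemiologically informative.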

Conclusions: Our analysis produced clusters that were mathematically well formed but did not reveal novel epidemiological information about syphilis risk factors or transmission.
