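Semantic search helper: A tool based on the use of embeddings in multi-item questionnaires as a harmonization opportunity for merging large datasets - A feasibility study.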

European Psychiatry (IF 7.2; CAS Zone 2, Medicine; JCR Q1, Psychiatry) | Pub Date: 2025-01-20 | DOI: 10.1192/j.eurpsy.2024.1808
Karl Gottfried, Karina Janson, Nathalie E Holz, Olaf Reis, Johannes Kornhuber, Anna Eichler, Tobias Banaschewski, Frauke Nees
Citations: 0

Abstract

Semantic search helper: A tool based on the use of embeddings in multi-item questionnaires as a harmonization opportunity for merging large datasets - A feasibility study.

Background: Recent advances in natural language processing (NLP), particularly in language processing methods, have opened new avenues in semantic data analysis. A promising application of NLP is data harmonization in questionnaire-based cohort studies, where it can be used as an additional method, specifically when only different instruments are available for one construct as well as for the evaluation of potentially new construct-constellations. The present article therefore explores embedding models' potential to detect opportunities for semantic harmonization.

Methods: Using models like SBERT and OpenAI's ADA, we developed a prototype application ("Semantic Search Helper") to facilitate the harmonization process of detecting semantically similar items within extensive health-related datasets. The approach's feasibility and applicability were evaluated through a use case analysis involving data from four large cohort studies with heterogeneous data obtained with a different set of instruments for common constructs.

Results: With the prototype, we effectively identified potential harmonization pairs, which significantly reduced manual evaluation efforts. Expert ratings of semantic similarity candidates showed high agreement with model-generated pairs, confirming the validity of our approach.

Conclusions: This study demonstrates the potential of embeddings in matching semantic similarity as a promising add-on tool to assist harmonization processes of multiplex data sets and instruments but with similar content, within and across studies.
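
The pair-detection step described under Methods can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not describe the prototype's code, so the function names (`toy_embed`, `cosine`, `candidate_pairs`), the similarity threshold, and the questionnaire items are all assumptions made here for illustration. To stay self-contained, a bag-of-words vector stands in for a real sentence embedding. It only captures word overlap, whereas a model such as SBERT would also be expected to capture paraphrases that share few words.

```python
import math
import re
from collections import Counter
from itertools import product


def toy_embed(text):
    """Stand-in embedding: lowercase bag-of-words counts (NOT SBERT or ADA)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def candidate_pairs(items_a, items_b, threshold=0.5, embed=toy_embed):
    """Return cross-study item pairs with similarity >= threshold, best first."""
    vecs_a = [(item, embed(item)) for item in items_a]
    vecs_b = [(item, embed(item)) for item in items_b]
    scored = [
        (cosine(va, vb), a, b)
        for (a, va), (b, vb) in product(vecs_a, vecs_b)
    ]
    return sorted((s for s in scored if s[0] >= threshold), reverse=True)


# Hypothetical questionnaire items, invented for illustration only.
study_a = ["I feel sad most of the time", "I have trouble falling asleep"]
study_b = ["Most of the time I feel sad", "I enjoy meeting new people"]

pairs = candidate_pairs(study_a, study_b)
for score, a, b in pairs:
    print(f"{score:.2f}  {a!r} <-> {b!r}")
```

In this sketch, only pairs at or above the threshold are kept as harmonization candidates for manual expert review. Passing a different `embed` function and a suitable similarity measure would be one way to swap in a real embedding model.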

Source journal: European Psychiatry (Medicine, Psychiatry)
CiteScore: 8.50
Self-citation rate: 3.80%
Articles published: 2338
Review time: 4.5 weeks
About the journal: European Psychiatry, the official journal of the European Psychiatric Association, is dedicated to sharing cutting-edge research and policy updates and to fostering dialogue among clinicians, researchers, and patient advocates in the fields of psychiatry, mental health, behavioral science, and neuroscience. This peer-reviewed, Open Access journal strives to publish the latest advancements across various mental health issues, including diagnostic and treatment breakthroughs, as well as advancements in understanding the biological foundations of mental, behavioral, and cognitive functions in both clinical and general population studies.