Patient Identification and Tumor Identification Management: Quality Program in a Cancer Multicentric Clinical Data Warehouse.

Cancer Informatics (IF 2.4, Q2, Mathematical & Computational Biology) | Pub Date: 2023-01-01 | DOI: 10.1177/11769351231172609
Karine Pallier, Olivier Prot, Simone Naldi, Francisco Silva, Thierry Denis, Olivier Giry, Sophie Leobon, Elise Deluche, Nicole Tubiana-Mathieu
Citations: 0

Abstract

Patient Identification and Tumor Identification Management: Quality Program in a Cancer Multicentric Clinical Data Warehouse.

Background: The Regional Basis of Solid Tumor (RBST), a clinical data warehouse, centralizes information related to cancer patient care in 5 health establishments in 2 French departments.

Purpose: To develop algorithms matching heterogeneous data to "real" patients and "real" tumors with respect to patient identification (PI) and tumor identification (TI).

Methods: A Neo4j graph database, programmed in Java, was used to build the RBST with data from ~20,000 patients. The PI algorithm, which uses the Levenshtein distance, was based on the regulatory criteria for identifying a patient. A TI algorithm was built on 6 characteristics: tumor location, laterality, date of diagnosis, histology, primary status, and metastatic status. Given the heterogeneous nature and semantics of the collected data, repositories (organ, synonym, and histology repositories) had to be created. The TI algorithm used the Dice coefficient to match tumors.
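The two similarity measures named in the Methods can be sketched as follows. This is a minimal illustration in Python, not the authors' Java/Neo4j implementation. The abstract does not say which items the Dice coefficient compares, so computing it over character bigrams is an assumption made here, and the function names are hypothetical.

```python
from __future__ import annotations


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, or substitutions needed to turn `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string on the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def bigrams(text: str) -> set[str]:
    """Set of adjacent character pairs (assumed tokenization, not from the paper)."""
    lowered = text.lower()
    return {lowered[i:i + 2] for i in range(len(lowered) - 1)}


def dice(a: str, b: str) -> float:
    """Dice coefficient 2|X ∩ Y| / (|X| + |Y|) over the two bigram sets."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 1.0  # two empty sets are treated as identical
    return 2 * len(x & y) / (len(x) + len(y))
```

For example, `levenshtein("Dupont", "Dupond")` is 1 (a single substitution), and `dice("night", "nacht")` is 0.25 because the two words share one bigram out of four each.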

Results: Patients matched if there was complete agreement of the given name, surname, sex, and day, month, and year of birth. These parameters were assigned weights of 28%, 28%, 21%, and 23% (with 18% for year, 2.5% for month, and 2.5% for day), respectively. The algorithm had a sensitivity of 99.69% (95% confidence interval [CI] [98.89%, 99.96%]) and a specificity of 100% (95% CI [99.72%, 100%]). The TI algorithm used the repositories; weights were assigned to the diagnosis date and associated organ (37.5% each), laterality (16%), histology (5%), and metastatic status (4%). This algorithm had a sensitivity of 71% (95% CI [62.68%, 78.25%]) and a specificity of 100% (95% CI [94.31%, 100%]).
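The weights reported above can be read as a weighted agreement score, sketched below in Python. Only the weight values come from the abstract. The field names, the `weighted_score` helper, and the idea of per-field agreement values between 0 and 1 are assumptions for illustration; the abstract does not describe how partial agreement is scored or which threshold is applied.

```python
from __future__ import annotations

# Weights in percentage points, as reported in the abstract.
PI_WEIGHTS = {"given_name": 28.0, "surname": 28.0, "sex": 21.0,
              "birth_year": 18.0, "birth_month": 2.5, "birth_day": 2.5}
TI_WEIGHTS = {"diagnosis_date": 37.5, "organ": 37.5, "laterality": 16.0,
              "histology": 5.0, "metastatic_status": 4.0}


def weighted_score(agreement: dict[str, float],
                   weights: dict[str, float]) -> float:
    """Sum of weight * agreement over all fields.

    `agreement` maps a field name to a value in [0, 1] (1 = identical).
    Fields missing from `agreement` contribute 0 (an assumption).
    """
    return sum(weight * agreement.get(field, 0.0)
               for field, weight in weights.items())


# Hypothetical pair of patient records that differ only in day of birth.
patient_agreement = {field: 1.0 for field in PI_WEIGHTS}
patient_agreement["birth_day"] = 0.0

# Hypothetical pair of tumor records that differ only in histology.
tumor_agreement = {field: 1.0 for field in TI_WEIGHTS}
tumor_agreement["histology"] = 0.0

print(weighted_score(patient_agreement, PI_WEIGHTS))  # 97.5
print(weighted_score(tumor_agreement, TI_WEIGHTS))    # 95.0
```

Both weight sets sum to 100, so a score of 100 corresponds to the complete agreement that the abstract requires for a patient match.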

Conclusion: The RBST encompasses 2 quality controls: PI and TI. It facilitates the implementation of transversal structuring and assessments of the performance of the provided care.

Source journal
Cancer Informatics (Medicine-Oncology)
CiteScore: 3.00
Self-citation rate: 5.00%
Articles published: 30
Review time: 8 weeks
Journal description: The field of cancer research relies on advances in many other disciplines, including omics technology, mass spectrometry, radio imaging, computer science, and biostatistics. Cancer Informatics provides open access to peer-reviewed, high-quality manuscripts reporting bioinformatics analysis of molecular genetics and/or clinical data pertaining to cancer, emphasizing the use of machine learning, artificial intelligence, statistical algorithms, advanced imaging techniques, data visualization, and high-throughput technologies. As the leading journal dedicated exclusively to reporting the use of computational methods in cancer research and practice, Cancer Informatics leverages methodological improvements in systems biology, genomics, proteomics, metabolomics, and molecular biochemistry into the fields of cancer detection, treatment, classification, risk-prediction, prevention, outcome, and modeling.
Latest articles from this journal
Detecting the Tumor Prognostic Factors From the YTH Domain Family Through Integrative Pan-Cancer Analysis.
Unveiling Recurrence Patterns: Analyzing Predictive Risk Factors for Breast Cancer Recurrence after Surgery.
Understanding the Biological Basis of Polygenic Risk Scores and Disparities in Prostate Cancer: A Comprehensive Genomic Analysis.
Machine Learning for Dynamic Prognostication of Patients With Hepatocellular Carcinoma Using Time-Series Data: Survival Path Versus Dynamic-DeepHit HCC Model.
Advancements and Challenges in the Image-Based Diagnosis of Lung and Colon Cancer: A Comprehensive Review.