PROSurvival: A Technical Case Report on Creating and Publishing a Dataset for Federated Learning on Survival Prediction of Prostate Cancer Patients
Tingyan Xu, Timo Wolters, Johannes Lotz, Tom Bisson, Tim-Rasmus Kiehl, Nadine Flinner, Norman Zerbe, Marco Eichelberg
Studies in Health Technology and Informatics, vol. 321, pp. 220-224, 22 November 2024. DOI: 10.3233/SHTI241096
Abstract
The PROSurvival project aims to improve the prediction of recurrence-free survival in prostate cancer by applying federated machine learning to whole slide images combined with selected clinical data. Both the image and the clinical data will be aggregated into an anonymized dataset that complies with the General Data Protection Regulation (GDPR) and is published according to the FAIR principles (findable, accessible, interoperable, reusable). The image data will use the DICOM standard. For the accompanying clinical data, however, no human-readable, compact and flexible standard has been defined yet. Most existing standards can be extended, though with varying degrees of modification; we chose the German oncological basic dataset (oBDS) as a starting point and modified it to add missing data points and to remove mandatory items that do not apply to our dataset. Clinical and survival data from clinic-specific spreadsheets were converted into this modified standard, with processing carried out on-site at each clinic to preserve data privacy. For publication, both image and clinical data are anonymized using established methods. The key challenges were anonymizing the clinical data and identifying research repositories that meet all of our requirements. Each clinic had to coordinate the publication with its responsible data protection officers, and the approval processes differed between clinics because the individual German federal states interpret the legal regulations differently. The newly established German Health Data Utilization Act is expected to simplify future data sharing in a responsible and powerful way.
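The conversion from clinic-specific spreadsheets into a common format can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual pipeline: the German column headers, the `CLINIC_A_COLUMNS` mapping, the `HarmonizedRecord` fields and the sample data are all hypothetical and do not correspond to real oBDS items. As one commonly used de-identification step (not necessarily the one used in PROSurvival), it drops absolute dates in favour of the time from surgery to recurrence or last follow-up, and replaces the clinic's patient ID with a random pseudonym. Removing identifiers and dates alone does not guarantee anonymity in the GDPR sense; the sketch only shows the kind of transformation involved.

```python
import csv
import io
import secrets
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical mapping from one clinic's spreadsheet headers to target fields.
# Every clinic would need its own mapping onto the shared target format.
CLINIC_A_COLUMNS = {
    "patient_id": "Pat-ID",
    "surgery_date": "OP-Datum",
    "recurrence_date": "Rezidiv-Datum",
    "last_followup": "Letzte Nachsorge",
    "gleason": "Gleason-Score",
}

# Hypothetical two-patient export used for illustration only.
SAMPLE_CSV = (
    "Pat-ID,OP-Datum,Rezidiv-Datum,Letzte Nachsorge,Gleason-Score\n"
    "4711,2015-03-02,2018-09-14,,7\n"
    "4712,2016-01-20,,2021-01-20,6\n"
)

DAYS_PER_MONTH = 365.25 / 12  # average month length in days


@dataclass
class HarmonizedRecord:
    """Hypothetical, heavily reduced target record (not real oBDS items)."""
    pseudonym: str          # random ID with no stored link to the clinic ID
    gleason_score: int
    recurrence: bool        # event indicator for recurrence-free survival
    months_to_event: float  # surgery to recurrence, or censored at follow-up


def parse_date(value: str) -> Optional[date]:
    """Parse an ISO 8601 date (YYYY-MM-DD); an empty cell yields None."""
    value = value.strip()
    return date.fromisoformat(value) if value else None


def to_record(row: dict, columns: dict) -> HarmonizedRecord:
    surgery = parse_date(row[columns["surgery_date"]])
    recurrence = parse_date(row[columns["recurrence_date"]])
    followup = parse_date(row[columns["last_followup"]])
    if surgery is None:
        raise ValueError("surgery date is required to compute survival time")
    # Event if a recurrence date exists, otherwise censored at last follow-up.
    end = recurrence if recurrence is not None else followup
    if end is None:
        raise ValueError("need a recurrence or a last follow-up date")
    # Only the relative interval is kept; absolute dates and the clinic's
    # patient ID (columns["patient_id"]) are deliberately not copied over.
    months = round((end - surgery).days / DAYS_PER_MONTH, 1)
    return HarmonizedRecord(
        pseudonym=secrets.token_hex(8),
        gleason_score=int(row[columns["gleason"]]),
        recurrence=recurrence is not None,
        months_to_event=months,
    )


def convert(csv_text: str, columns: dict) -> list:
    """Convert one clinic's CSV export into harmonized records."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [to_record(row, columns) for row in reader]


if __name__ == "__main__":
    for rec in convert(SAMPLE_CSV, CLINIC_A_COLUMNS):
        print(rec.recurrence, rec.months_to_event, rec.gleason_score)
    # Prints:
    # True 42.4 7
    # False 60.0 6
```

Keeping the per-clinic header mapping separate from the conversion logic means that adding another clinic only requires a new mapping dictionary; this split is a design choice of the sketch, not something the abstract specifies.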