Extensible Query Framework for Unstructured Medical Data -- A Big Data Approach
Sarmad Istephan, Mohammad-Reza Siadat
2015 IEEE International Conference on Data Mining Workshop (ICDMW), published 2015-11-14
DOI: 10.1109/ICDMW.2015.67 (https://doi.org/10.1109/ICDMW.2015.67)
Citations: 9
Abstract
With the ever-increasing volume of medical image scans, an extensible framework for mining such unstructured data is critical. Such a framework would give medical researchers flexibility in validating and testing hypotheses. Important characteristics of this type of framework include accuracy, efficiency, and extensibility. The objective of this work is to build an initial implementation of such a framework within a big data paradigm. To this end, a clinical data warehouse was built for the structured data, and a set of modules was created to analyze the unstructured content. The framework contains built-in modules but also allows users to import their own, making it extensible. Furthermore, the framework runs the modules in a Hadoop cluster, gaining efficiency from the distributed computing capability of the big data approach. To test the framework, simulated data for 1,000 patients, along with their hippocampal images, were created. The results show that, using a built-in module, the framework accurately returned all 15 patients who had undergone hippocampal resection and whose hippocampus ipsilateral to surgery was less than 20% of the size of the hippocampus contralateral to surgery. In addition, the framework allowed the user to run a different module on the previous output to further analyze the unstructured data. Finally, the framework also enabled the user to import a new module. This study is a step toward demonstrating the feasibility of such a framework for handling unstructured medical data in an accurate, efficient, and extensible manner.
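The query described in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' Hadoop implementation: the record fields, the module registry, and the names `register`, `run_pipeline`, `resected`, and `small_ipsilateral` are all hypothetical, and the hippocampal volumes are assumed to have already been extracted from the images. The sketch shows only the three behaviors the abstract names: a built-in filter module, chaining a module on the previous module's output, and registering a user-supplied module.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class PatientRecord:
    """Hypothetical record: structured fields plus image-derived volumes."""
    patient_id: int
    had_resection: bool
    ipsilateral_volume: float    # hippocampus on the surgery side
    contralateral_volume: float  # hippocampus on the opposite side


Module = Callable[[Iterable[PatientRecord]], List[PatientRecord]]

# Registry standing in for "built-in modules plus user-imported ones" (assumption).
MODULES: Dict[str, Module] = {}


def register(name: str) -> Callable[[Module], Module]:
    """Add a module to the registry under the given name."""
    def decorator(func: Module) -> Module:
        MODULES[name] = func
        return func
    return decorator


@register("resected")
def resected(records: Iterable[PatientRecord]) -> List[PatientRecord]:
    """Keep only patients who had a hippocampal resection."""
    return [r for r in records if r.had_resection]


@register("small_ipsilateral")
def small_ipsilateral(records: Iterable[PatientRecord],
                      threshold: float = 0.20) -> List[PatientRecord]:
    """Keep patients whose ipsilateral/contralateral volume ratio is below threshold."""
    return [
        r for r in records
        if r.contralateral_volume > 0
        and r.ipsilateral_volume / r.contralateral_volume < threshold
    ]


def run_pipeline(records: Iterable[PatientRecord],
                 module_names: List[str]) -> List[PatientRecord]:
    """Run modules in order, feeding each one the previous module's output."""
    result = list(records)
    for name in module_names:
        result = MODULES[name](result)
    return result


patients = [
    PatientRecord(1, True, 0.4, 3.0),   # resected, ratio about 0.13: match
    PatientRecord(2, True, 2.5, 3.0),   # resected, ratio about 0.83: no match
    PatientRecord(3, False, 0.3, 3.0),  # small ratio but no resection: no match
]
matches = run_pipeline(patients, ["resected", "small_ipsilateral"])
print([p.patient_id for p in matches])  # prints [1]
```

In a distributed setting, each module would plausibly run as a job over partitions of the patient set rather than over an in-memory list; the registry pattern is just one simple way to model user-imported modules.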