Apache Pig Programming for Processing the Big Medical Data of Patients with Distributed Environment
S. S. Aravinth, M. Ramesh Kumar, R. Ranganathan, P. M, M. Sasikala
2022 International Conference on Edge Computing and Applications (ICECAA), published 13 October 2022
DOI: 10.1109/ICECAA55415.2022.9936555
Abstract
Every day, all business sectors generate and use huge amounts of unstructured and semi-structured data. Such data are difficult to store and to process for use in decision-making systems. Medical data, clinical data and patient history data in particular must be accessed quickly to arrive at workable solutions. A fast, reliable and fault-tolerant programming framework is therefore needed [1].

Apache Pig is a high-level, widely adopted platform for running MapReduce tasks on a Hadoop cluster when dealing with unstructured data. Pig scripts are written in the Pig Latin language and compiled into MapReduce jobs that operate on data stored in the Hadoop Distributed File System (HDFS); Pig itself is implemented in Java.

In this work, the medical history data of patients are processed. Existing approaches, such as Oracle SQL queries and MongoDB, take considerably longer to process these records. The Pig-based implementation closes this gap and produces results more efficiently.

The data dictionary of the dataset used in this implementation contains seven fields, which are processed in phases. In each phase of the analysis, different fields are selected for further processing and interpretation. Relational operators are used to identify the relationships among these fields, and functions are then applied to segregate the dataset more quickly. Two types of functions are used: math functions and eval (evaluation) functions [2].
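The phased pipeline described above (load the seven-field records, apply relational operators, then apply eval and math functions) can be illustrated with a small sketch. This is not the authors' code: it is a plain-Python simulation of Pig's FILTER, GROUP and FOREACH ... GENERATE semantics, with the equivalent Pig Latin shown in comments. The abstract does not name the seven fields, so the schema (patient_id, name, age, gender, diagnosis, stay_days, bill_amount), the file name patients.csv and the toy records are all hypothetical.

```python
import math
from collections import defaultdict
from typing import NamedTuple

# Equivalent Pig Latin (hypothetical schema and file name):
#   patients = LOAD 'patients.csv' USING PigStorage(',')
#       AS (patient_id:int, name:chararray, age:int, gender:chararray,
#           diagnosis:chararray, stay_days:int, bill_amount:double);
#   adults   = FILTER patients BY age >= 18;
#   by_diag  = GROUP adults BY diagnosis;
#   stats    = FOREACH by_diag GENERATE group, COUNT(adults),
#                  ROUND(AVG(adults.bill_amount)), MAX(adults.stay_days);


class Patient(NamedTuple):
    """Hypothetical seven-field patient record (the paper's fields are not given)."""
    patient_id: int
    name: str
    age: int
    gender: str
    diagnosis: str
    stay_days: int
    bill_amount: float


def load(lines):
    """Rough analogue of LOAD ... USING PigStorage(',') AS (...)."""
    relation = []
    for line in lines:
        pid, name, age, gender, diag, days, bill = line.strip().split(",")
        relation.append(Patient(int(pid), name, int(age), gender, diag,
                                int(days), float(bill)))
    return relation


def pig_filter(relation, predicate):
    """Relational operator FILTER: keep tuples satisfying the predicate."""
    return [t for t in relation if predicate(t)]


def pig_group(relation, key):
    """Relational operator GROUP: map each group key to its bag of tuples."""
    groups = defaultdict(list)
    for t in relation:
        groups[key(t)].append(t)
    return dict(groups)


def pig_round(x):
    """Math function ROUND: rounds half up like Java's Math.round,
    unlike Python's round(), which rounds half to even."""
    return math.floor(x + 0.5)


def summarize(groups):
    """FOREACH ... GENERATE with eval functions COUNT, AVG, MAX and math ROUND."""
    result = []
    for diagnosis, bag in sorted(groups.items()):
        count = len(bag)                                    # COUNT(bag)
        avg_bill = sum(t.bill_amount for t in bag) / count  # AVG(bag.bill_amount)
        max_stay = max(t.stay_days for t in bag)            # MAX(bag.stay_days)
        result.append((diagnosis, count, pig_round(avg_bill), max_stay))
    return result


# Invented toy records for illustration only, not data from the paper.
SAMPLE = [
    "1,patient_a,54,F,diabetes,4,1200.0",
    "2,patient_b,67,M,cardiac,9,5400.5",
    "3,patient_c,15,F,asthma,2,300.0",
    "4,patient_d,49,M,diabetes,6,1801.0",
    "5,patient_e,72,F,cardiac,12,7000.0",
]

if __name__ == "__main__":
    patients = load(SAMPLE)
    adults = pig_filter(patients, lambda t: t.age >= 18)
    by_diagnosis = pig_group(adults, lambda t: t.diagnosis)
    for row in summarize(by_diagnosis):
        print(row)
```

Running the script prints ('cardiac', 2, 6200, 12) and ('diabetes', 2, 1501, 6). The diabetes average of 1500.5 shows why ROUND is reimplemented rather than delegated to Python's round(), which would return 1500; on a real cluster, Pig would distribute the same GROUP and FOREACH steps across MapReduce tasks instead of running them in one process.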