Apache Pig Programming for Processing the Big Medical Data of Patients with Distributed Environment

2022 International Conference on Edge Computing and Applications (ICECAA) Pub Date : 2022-10-13 DOI:10.1109/ICECAA55415.2022.9936555

S. S. Aravinth, M. Ramesh Kumar, R. Ranganathan, P. M, M. Sasikala


Citations: 0

Abstract

Every day, a huge amount of unstructured and semi-structured data is used across all business sectors. Such data are difficult to store and process for use in decision-making systems. Medical data, clinical data, and patient history data in particular must be accessed quickly to reach a feasible solution. This calls for a high-speed, reliable, and fault-tolerant programming framework [1]. Apache Pig is a widely accepted high-level programming language for executing MapReduce tasks on a Hadoop cluster when dealing with unstructured data. It works on top of the Hadoop Distributed File System (HDFS) and is written in Java. In this work, the medical history data of patients are processed. Existing approaches, such as Oracle SQL queries and MongoDB-based processing, have been very slow at processing these records; the Pig-based approach closes this gap and produces results efficiently. The data dictionary of the dataset has 7 fields, which are processed in a phased approach. In each phase of the analysis, different fields are considered for further processing and interpretation. Relational operators are used to identify the relationships among the dataset fields. Functions are then applied to segregate the given dataset more quickly. Two types of functions are used: math functions and evaluation functions [2].
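The abstract does not list the seven fields or the scripts used, so the following is only a minimal plain-Python sketch of the kind of dataflow it describes: a relational step (filter, then group) followed by evaluation-style functions (count, average) and a math-style function (rounding). The field names, sample rows, and helper functions are all hypothetical; in Pig itself these steps would be written with its relational operators and built-in functions and run over HDFS rather than in memory.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical 7-field patient records. Field names and values are
# illustrative assumptions, not the paper's actual data dictionary.
FIELDS = ("patient_id", "name", "age", "gender", "diagnosis", "visits", "bill")
ROWS = [
    (1, "A", 34, "F", "diabetes", 3, 1200.40),
    (2, "B", 61, "M", "hypertension", 5, 830.75),
    (3, "C", 47, "F", "diabetes", 2, 990.10),
    (4, "D", 15, "M", "asthma", 1, 310.00),
    (5, "E", 72, "F", "hypertension", 4, 1505.25),
]


def load(rows):
    """Attach field names to raw tuples (loosely like loading with a schema)."""
    return [dict(zip(FIELDS, row)) for row in rows]


def filter_by(records, predicate):
    """Keep only records matching the predicate (a filter-style relational step)."""
    return [rec for rec in records if predicate(rec)]


def group_by(records, key):
    """Collect records into buckets by one field (a group-style relational step)."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)
    return dict(groups)


def summarize(groups):
    """Per group: count and average (evaluation-style), rounded (math-style)."""
    return {
        key: {
            "count": len(members),
            "avg_bill": round(mean(m["bill"] for m in members), 2),
        }
        for key, members in groups.items()
    }


# Phase 1: restrict to adult patients. Phase 2: summarize bills per diagnosis.
adults = filter_by(load(ROWS), lambda rec: rec["age"] >= 18)
summary = summarize(group_by(adults, "diagnosis"))

for diagnosis, stats in sorted(summary.items()):
    print(diagnosis, stats)
```

Each phase consumes the output of the previous one, which mirrors the phased, field-by-field analysis the abstract mentions; a distributed engine would run the same steps in parallel across partitions of the data.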


