Apache Pig Programming for Processing the Big Medical Data of Patients with Distributed Environment

S. S. Aravinth, M. Ramesh Kumar, R. Ranganathan, P. M, M. Sasikala
Published in: 2022 International Conference on Edge Computing and Applications (ICECAA)
Publication date: 2022-10-13
DOI: 10.1109/ICECAA55415.2022.9936555 (https://doi.org/10.1109/ICECAA55415.2022.9936555)
Citations: 0

Abstract

Every day, huge amounts of unstructured and semi-structured data are used across all business sectors. Such data are difficult to store and process for use in decision-making systems. Medical data, clinical data, and patient history data in particular must be accessed quickly to arrive at a feasible solution. A high-speed, reliable, and fault-tolerant programming framework is therefore needed [1]. Apache Pig is a widely adopted high-level programming language for executing MapReduce tasks on a Hadoop cluster when dealing with unstructured data. It works on the Hadoop Distributed File System (HDFS) and is implemented in Java. In this work, the medical history data of patients are processed. Existing approaches, such as Oracle SQL queries and MongoDB-based processing, have been slow to process these records. The Pig programming language closes this gap and produces results efficiently. The data dictionary of the dataset used here has 7 fields, which are processed in a phased approach. In each phase of the analysis, different fields are considered for further processing and interpretation. Relational operators are used to identify the relationships among the dataset fields. Functions are then applied to segregate the given dataset faster. Two types of functions are used: math functions and evaluation functions [2].
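The phased processing described above (a relational step followed by evaluation and math functions) can be illustrated with a minimal plain-Python sketch. It is not the authors' Pig script: the seven field names, the sample records, and the helper names `group_by` and `summarize` are all hypothetical, chosen only to mirror what Pig's GROUP ... BY and FOREACH ... GENERATE with COUNT, AVG, and ROUND would do on a small in-memory dataset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical 7-field schema; the abstract does not name the actual fields.
FIELDS = ("patient_id", "name", "age", "gender", "diagnosis", "city", "bill")

# Made-up sample records, for illustration only.
RECORDS = [
    (1, "P1", 34, "F", "diabetes", "X", 1200.40),
    (2, "P2", 61, "M", "hypertension", "Y", 800.00),
    (3, "P3", 47, "F", "diabetes", "Z", 1500.75),
    (4, "P4", 29, "M", "asthma", "X", 400.25),
]


def group_by(rows, field):
    """Relational step, analogous to Pig's GROUP ... BY."""
    idx = FIELDS.index(field)
    groups = defaultdict(list)
    for row in rows:
        groups[row[idx]].append(row)
    return dict(groups)


def summarize(groups, field):
    """Per-group count and rounded average of one numeric field.

    Analogous to FOREACH ... GENERATE with the evaluation functions
    COUNT and AVG and the math function ROUND.
    """
    idx = FIELDS.index(field)
    return {
        key: (len(rows), round(mean(r[idx] for r in rows)))
        for key, rows in groups.items()
    }


summary = summarize(group_by(RECORDS, "diagnosis"), "bill")
for diagnosis, (count, avg_bill) in sorted(summary.items()):
    print(diagnosis, count, avg_bill)
```

On a real cluster, Pig would compile the equivalent statements into MapReduce jobs over HDFS; this sketch only shows the data flow on one machine.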