
Latest publications from the 2016 International Workshop on Big Data and Information Security (IWBIS)

Enhancing query performance of library information systems using NoSQL DBMS: Case study on library information systems of Universitas Indonesia
Pub Date : 2016-10-01 DOI: 10.1109/IWBIS.2016.7872887
Hermansyah, Y. Ruldeviyani, R. F. Aji
Library Automation and Digital Archive (Lontar) is a library information system developed by Universitas Indonesia and used by its main library. The rapid growth of the library's collections will soon leave the query performance of the current SQL DBMS, MySQL, too slow to satisfy users, so it needs to be complemented by a NoSQL database, an emerging technology developed specifically for managing big data. The goal of this research is to implement and analyze the use of a NoSQL database to improve Lontar's query performance. MongoDB is selected as the NoSQL DBMS, and the results show that MongoDB is significantly faster than MySQL.
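A minimal sketch of the kind of side-by-side query timing this abstract describes, assuming local MySQL and MongoDB instances; the database name `lontar`, the `catalog` table/collection, credentials, and the keyword query are hypothetical stand-ins, not details taken from the paper.

```python
# Hedged sketch: time an equivalent catalogue lookup on MySQL and MongoDB.
# Hosts, credentials, database/table names, and the query are illustrative only.
import time

import mysql.connector           # pip install mysql-connector-python
from pymongo import MongoClient  # pip install pymongo

def time_mysql(title_keyword: str) -> float:
    conn = mysql.connector.connect(host="localhost", user="root",
                                   password="secret", database="lontar")
    cur = conn.cursor()
    start = time.perf_counter()
    cur.execute("SELECT id, title FROM catalog WHERE title LIKE %s",
                (f"%{title_keyword}%",))
    cur.fetchall()
    elapsed = time.perf_counter() - start
    cur.close(); conn.close()
    return elapsed

def time_mongo(title_keyword: str) -> float:
    client = MongoClient("mongodb://localhost:27017/")
    coll = client["lontar"]["catalog"]
    start = time.perf_counter()
    list(coll.find({"title": {"$regex": title_keyword, "$options": "i"}},
                   {"_id": 1, "title": 1}))
    elapsed = time.perf_counter() - start
    client.close()
    return elapsed

if __name__ == "__main__":
    print("MySQL  :", time_mysql("data"), "s")
    print("MongoDB:", time_mongo("data"), "s")
```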
Citations: 4
Medical warning system based on Internet of Things using fog computing
Pub Date : 2016-10-01 DOI: 10.1109/IWBIS.2016.7872884
I. Azimi, A. Anzanpour, A. Rahmani, P. Liljeberg, T. Salakoski
Remote patient monitoring is essential for many patients suffering from acute diseases such as various heart conditions. Continuous health monitoring can provide medical services that take the patient's current medical state into account and predict or detect potentially critical situations early. In this regard, the Internet of Things, as a multidisciplinary paradigm, can have a profound impact. However, current IoT-based systems may have difficulty providing continuous, real-time patient monitoring due to issues in data analytics. In this paper, we introduce a new IoT-based approach that offers smart medical warnings in personalized patient monitoring. The proposed approach uses a local computing paradigm enabled by machine learning algorithms and automates the management of system components in the computing layer. The proposed system is evaluated via a case study of continuous patient monitoring to detect patient deterioration early through arrhythmia in the ECG signal.
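The paper's fog-layer analytics are not specified in this abstract; the sketch below only illustrates the general idea of scoring incoming vital-sign windows on a gateway node and raising a warning locally instead of waiting for the cloud. The window size, threshold, and the toy scoring rule are assumptions for illustration.

```python
# Hedged sketch of a fog-gateway loop: score each incoming window of vital signs
# locally and raise an alert without a round trip to the cloud. All thresholds,
# window sizes, and the toy scoring rule are illustrative assumptions.
from collections import deque
from statistics import mean, pstdev

WINDOW = 30  # number of most recent heart-rate samples kept on the gateway

def anomaly_score(hr_window) -> float:
    """Toy score: how far the latest sample drifts from the window mean."""
    mu, sigma = mean(hr_window), pstdev(hr_window) or 1.0
    return abs(hr_window[-1] - mu) / sigma

def fog_monitor(sample_stream, threshold=3.0):
    window = deque(maxlen=WINDOW)
    for hr in sample_stream:          # e.g. beats per minute from a wearable sensor
        window.append(hr)
        if len(window) == WINDOW and anomaly_score(window) > threshold:
            yield {"alert": "abnormal deviation, escalate to clinician", "hr": hr}

if __name__ == "__main__":
    stream = [72, 74, 73, 75, 71] * 10 + [150]   # synthetic data with one spike
    for event in fog_monitor(stream):
        print(event)
```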
Citations: 39
Comparative study of lightweight secure multiroute communication system in low cost wireless sensor network for CO2 monitoring
Pub Date : 2016-10-01 DOI: 10.1109/IWBIS.2016.7872904
Novian Habibie, Rindra Wiska, A. Nugraha, A. Wibisono, P. Mursanto, W. S. Nugroho, S. Yazid
A Wireless Sensor Network (WSN) is a system used to conduct remote monitoring over a wide area. It consists of sensor nodes, i.e. sampling points, which communicate with each other to pass their data to a central node for recapitulation or to transmit it to a data center. Because of this, the communication system is crucial for a WSN. However, a WSN may be deployed in an environment far from ideal conditions. Placed in an unattended area with long distances between nodes, a WSN is very vulnerable to security threats. To overcome this, a good combination of communication protocol and encryption algorithm is needed so that the WSN can gather accurate and representative data at high transmission speed. This research focused on finding that combination for our own low-cost sensor node for CO2 monitoring. Two routing protocols (AODV and TARP) and several encryption algorithms (AES, ChaCha, and Speck) were tested to determine which combination gives the best result. The combination of the AODV routing protocol and the Speck encryption algorithm gave the best performance.
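A minimal sketch of the kind of cipher micro-benchmark such a comparison involves, using PyCryptodome for AES and ChaCha20. Speck is not shipped with PyCryptodome and is omitted here; the payload size, mode choice, and iteration count are arbitrary assumptions, not the paper's setup.

```python
# Hedged sketch: micro-benchmark two of the ciphers the paper compares (AES and
# ChaCha20) on a sensor-sized payload. Speck would need a separate implementation.
import time
from Crypto.Cipher import AES, ChaCha20   # pip install pycryptodome
from Crypto.Random import get_random_bytes

PAYLOAD = get_random_bytes(64)   # stand-in for one CO2 reading packet from a node
ROUNDS = 10_000

def bench_aes() -> float:
    key = get_random_bytes(16)
    start = time.perf_counter()
    for _ in range(ROUNDS):
        cipher = AES.new(key, AES.MODE_CTR)   # fresh nonce per packet
        cipher.encrypt(PAYLOAD)
    return time.perf_counter() - start

def bench_chacha() -> float:
    key = get_random_bytes(32)
    start = time.perf_counter()
    for _ in range(ROUNDS):
        cipher = ChaCha20.new(key=key)        # fresh nonce per packet
        cipher.encrypt(PAYLOAD)
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"AES-CTR : {bench_aes():.3f} s for {ROUNDS} packets")
    print(f"ChaCha20: {bench_chacha():.3f} s for {ROUNDS} packets")
```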
Citations: 1
A review of big data analytics in the biomedical field
Pub Date : 2016-10-01 DOI: 10.1109/IWBIS.2016.7872886
W. Jatmiko, D. M. S. Arsa, H. A. Wisesa, Grafiks Jati, M. A. Ma'sum
In recent years, the volume of data in the world has risen dramatically. Biomedical data are data recorded from a living being and used to help analyze and diagnose a certain illness. Like many other types of data, the volume of biomedical data has also risen in the last couple of years, and conventional processing techniques are not adequate for handling it. In this paper, we discuss several approaches to processing large amounts of biomedical data. We also discuss several varieties of biomedical data and the challenges faced when processing them at large scale. In addition, we propose an integrated Telehealth system that combines Tele-ECG, Tele-USG, and existing biomedical applications. The system will be implemented on a big data framework, and Telehealth development can then proceed through the phases we propose: developing the end-to-end user system, implementing it on the big data framework, and finally moving to clinical practice. The proposed framework can be used for high-standard biomedical systems.
Citations: 8
Big data compression using spiht in Hadoop: A case study in multi-lead ECG signals
Pub Date : 2016-10-01 DOI: 10.1109/IWBIS.2016.7872902
G. Jati, Ilham Kusuma, M. Hilman, W. Jatmiko
Compression remains a main concern in big data frameworks. The performance of a big data system depends on the speed of data transfer: compressed data speeds up transfers across the network and also saves storage space. Hadoop, the most common big data framework, provides several compression methods, but they are mostly general-purpose, and their performance still has to be optimized, especially for biomedical records such as ECG data. We propose Set Partitioning in Hierarchical Trees (SPIHT) for big data compression, with ECG signal data as the case study. In this paper, the compression runs in the Hadoop framework. The proposed method has stages such as reading the input signal, mapping the input signal, SPIHT coding, and reducing the bit-stream; the compression produces compressed data for the intermediate (Map) output and the final (Reduce) output. The experiment uses ECG data to measure compression performance. The proposed method achieves a percentage root-mean-square difference (PRD) of about 1.0 and, compared to the existing method, a better compression ratio (CR) at a competitive, though longer, compression time. Thus, the proposed method performs better than other methods, especially for the ECG dataset.
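The SPIHT codec and the Map/Reduce wiring are not reproduced here; the sketch below only shows the two quality metrics the abstract reports, compression ratio (CR) and the standard PRD formula, applied to placeholder signals and byte counts.

```python
# Hedged sketch of the two metrics the abstract reports: compression ratio (CR)
# and percentage root-mean-square difference (PRD). `original`, `reconstructed`,
# and the byte sizes are placeholders for whatever the Map/Reduce stages produce.
import numpy as np

def compression_ratio(original_bytes: int, compressed_bytes: int) -> float:
    return original_bytes / compressed_bytes

def prd(original: np.ndarray, reconstructed: np.ndarray) -> float:
    """PRD = 100 * sqrt( sum((x - x_hat)^2) / sum(x^2) )."""
    num = np.sum((original - reconstructed) ** 2)
    den = np.sum(original ** 2)
    return 100.0 * np.sqrt(num / den)

if __name__ == "__main__":
    x = np.sin(np.linspace(0, 8 * np.pi, 4000))        # toy stand-in for one ECG lead
    x_hat = x + np.random.normal(0, 0.005, x.shape)    # toy "decoded" signal
    print("CR :", compression_ratio(original_bytes=8000, compressed_bytes=1000))
    print("PRD:", prd(x, x_hat))
```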
Citations: 3
Design of intelligent k-means based on spark for big data clustering
Pub Date : 2016-10-01 DOI: 10.1109/IWBIS.2016.7872895
Ilham Kusuma, M. A. Ma'sum, Novian Habibie, W. Jatmiko, H. Suhartanto
The growth of data has brought us to the big data era, in which the amount of data can no longer be processed in a conventional environment. Many computational environments have been developed to process big data; one of them is Hadoop, with its distributed file system and MapReduce framework. Spark is a newer framework that can be combined with Hadoop and run on top of it. In this paper, we design an intelligent k-means based on Spark for big data clustering. Our design uses batches of data instead of the original Resilient Distributed Dataset (RDD). We compare our design with an implementation that uses the original RDD of the data. The experimental results show that the implementation using batches of data is faster than the implementation using the original RDD.
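A sketch of the comparison the abstract describes, assuming Spark's Python MLlib API: clustering one full RDD versus feeding the same points in smaller batches and reusing the previous centres as the seed for the next batch. The authors' "intelligent k-means" details, the batch count, and the synthetic data are assumptions, not the paper's design.

```python
# Hedged sketch: k-means on one full RDD vs. batched sub-RDDs with warm-started
# centres. Data, k, batch count, and iteration budgets are illustrative only.
import time
import numpy as np
from pyspark import SparkContext
from pyspark.mllib.clustering import KMeans

sc = SparkContext(appName="kmeans-batch-sketch")
points = np.random.rand(100_000, 3)          # synthetic 3-D points
K, BATCHES = 5, 4

# Baseline: one clustering job over the whole RDD.
full_rdd = sc.parallelize(points.tolist()).cache()
t0 = time.perf_counter()
full_model = KMeans.train(full_rdd, K, maxIterations=20)
print("full RDD :", time.perf_counter() - t0, "s")

# Batched: cluster each slice, seeding each run with the previous model.
t0 = time.perf_counter()
prev_model = None
for batch in np.array_split(points, BATCHES):
    rdd = sc.parallelize(batch.tolist())
    prev_model = KMeans.train(rdd, K, maxIterations=5, initialModel=prev_model)
print("batched  :", time.perf_counter() - t0, "s")
print("final centres:", prev_model.clusterCenters)
sc.stop()
```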
Citations: 21
Adaptive range in FIMT-DD tree for large data streams
Pub Date : 2016-10-01 DOI: 10.1109/IWBIS.2016.7872898
H. Wisesa, M. A. Ma'sum, A. Wibisono
The number of vehicles on public roads has increased drastically over the years. This has caused several problems, one of the most common being traffic jams. Several studies have tried to solve this problem, for example by using real-time video with computer vision, wireless sensor networks, and traffic data prediction. In this study, we propose a modification of Fast Incremental Model Trees with Drift Detection (FIMT-DD) to predict traffic flow from a large traffic data set provided by the Government of the United Kingdom. Our experimental results on large datasets show that the proposed method predicts traffic flow more accurately than the conventional FIMT-DD algorithm.
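FIMT-DD itself is a MOA/Java algorithm and is not reproduced here; the sketch below only shows the prequential (test-then-train) loop in which such incremental regressors are normally evaluated on a traffic stream, with a plain online linear model standing in for the tree. Features, learning rate, and the synthetic stream are assumptions.

```python
# Hedged sketch of prequential (test-then-train) evaluation on a data stream,
# the setting in which FIMT-DD-style incremental model trees are typically run.
import numpy as np

class OnlineLinearRegressor:
    """Toy incremental regressor standing in for an incremental model tree."""
    def __init__(self, n_features: int, lr: float = 0.01):
        self.w = np.zeros(n_features)
        self.b = 0.0
        self.lr = lr

    def predict(self, x: np.ndarray) -> float:
        return float(self.w @ x + self.b)

    def learn_one(self, x: np.ndarray, y: float) -> None:
        err = self.predict(x) - y          # gradient of the squared error
        self.w -= self.lr * err * x
        self.b -= self.lr * err

def prequential_mae(stream, model) -> float:
    abs_err, n = 0.0, 0
    for x, y in stream:                    # test first ...
        abs_err += abs(model.predict(x) - y)
        n += 1
        model.learn_one(x, y)              # ... then train on the same example
    return abs_err / n

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((5000, 3))              # e.g. hour, lane count, weather index
    y = 40 * X[:, 0] + 10 * X[:, 1] + rng.normal(0, 1, 5000)   # toy flow counts
    print("prequential MAE:", prequential_mae(zip(X, y), OnlineLinearRegressor(3)))
```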
Citations: 0
Processing big data with decision trees: A case study in large traffic data
Pub Date : 2016-10-01 DOI: 10.1109/IWBIS.2016.7872899
H. Wisesa, M. A. Ma'sum, P. Mursanto, A. Febrian
This paper compares approaches to processing large traffic data using decision trees. The experiment was run with three classifier tools that are very popular and widely used in the community: the WEKA classifiers, the MOA (Massive Online Analysis) classifiers, and Spark MLlib running on Hadoop infrastructure. We tested the traffic data using decision trees because they are among the best methods for regression on large data. The experimental results show that the WEKA classifier fails to handle a dataset with a large number of instances, whereas MOA successfully regresses the dataset as a data stream. The Spark MLlib decision tree algorithm also regresses the traffic data quickly with fairly good accuracy.
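A sketch of the third setup's API usage pattern, assuming Spark's Python MLlib decision-tree regressor; the traffic features, the tiny synthetic dataset, and the tree parameters are placeholders, not the paper's configuration.

```python
# Hedged sketch: decision-tree regression on traffic-like data with Spark MLlib.
import numpy as np
from pyspark import SparkContext
from pyspark.mllib.regression import LabeledPoint
from pyspark.mllib.tree import DecisionTree

sc = SparkContext(appName="traffic-dt-sketch")
rng = np.random.default_rng(0)

# Toy rows: features [hour_of_day, day_of_week, road_length_km] -> vehicle count.
rows = [LabeledPoint(200 + 30 * h, [h, d, l])
        for h, d, l in zip(rng.integers(0, 24, 1000),
                           rng.integers(0, 7, 1000),
                           rng.random(1000))]
train_rdd, test_rdd = sc.parallelize(rows).randomSplit([0.8, 0.2], seed=42)

model = DecisionTree.trainRegressor(train_rdd, categoricalFeaturesInfo={},
                                    impurity="variance", maxDepth=8, maxBins=32)

# Score on the held-out split: pair true labels with predictions and average.
labels = test_rdd.map(lambda p: p.label)
preds = model.predict(test_rdd.map(lambda p: p.features))
mae = labels.zip(preds).map(lambda lp: abs(lp[0] - lp[1])).mean()
print("held-out MAE:", mae)
sc.stop()
```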
Citations: 3
Developing recommender systems for personalized email with big data
Pub Date : 2016-10-01 DOI: 10.1109/IWBIS.2016.7872893
A. A. Gunawan, Tania, Derwin Suhartono
Recommender systems are nowadays widely used in the e-commerce industry to boost sales. One of the popular algorithms in recommender systems is collaborative filtering. The fundamental assumption behind this algorithm is that other users' opinions can be filtered and aggregated in such a way as to provide a plausible prediction of the target user's preferences. In this paper, we develop a recommender system using big data from an e-commerce company and deliver the recommendations through personalized email. To address this problem, we propose a user-based collaborative filter built on the company's dataset and employ several similarity functions: Euclidean distance, cosine, Pearson correlation, and the Tanimoto coefficient. The experimental results show that (i) user responses to the given recommendations are positive according to a user perception survey, and (ii) the Tanimoto coefficient with 10 neighbors performs best on RMSE, precision, and recall evaluated against the ground-truth dataset.
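A sketch of the four user-user similarity measures the abstract lists, computed over two users' ratings of co-rated items. The rating dictionaries are toy data, and the paper's own weighting scheme and neighbourhood size (10) are not reproduced.

```python
# Hedged sketch: the four similarity measures named in the abstract, applied to
# toy rating dictionaries keyed by item id.
import math

def _corated(a: dict, b: dict):
    items = sorted(set(a) & set(b))
    return [a[i] for i in items], [b[i] for i in items]

def euclidean_similarity(a, b):
    x, y = _corated(a, b)
    dist = math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))
    return 1.0 / (1.0 + dist)                      # map distance into (0, 1]

def cosine_similarity(a, b):
    x, y = _corated(a, b)
    num = sum(xi * yi for xi, yi in zip(x, y))
    den = math.sqrt(sum(xi * xi for xi in x)) * math.sqrt(sum(yi * yi for yi in y))
    return num / den if den else 0.0

def pearson_similarity(a, b):
    x, y = _corated(a, b)
    if not x:
        return 0.0
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = math.sqrt(sum((xi - mx) ** 2 for xi in x) * sum((yi - my) ** 2 for yi in y))
    return num / den if den else 0.0

def tanimoto_similarity(a, b):
    shared = len(set(a) & set(b))                  # item-overlap form of Tanimoto
    return shared / (len(a) + len(b) - shared)

if __name__ == "__main__":
    u1 = {"item1": 5, "item2": 3, "item3": 4}
    u2 = {"item1": 4, "item2": 2, "item4": 5}
    for f in (euclidean_similarity, cosine_similarity,
              pearson_similarity, tanimoto_similarity):
        print(f.__name__, round(f(u1, u2), 3))
```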
Citations: 7
Parallel rules based classifier using DNA strand displacement for multiple molecular markers detection
Pub Date : 2016-10-01 DOI: 10.1109/IWBIS.2016.7872901
A. Wibowo, S. Adhy, R. Kusumaningrum, H. A. Wibawa, K. Sekiyama
The detection of molecular markers such as micro ribonucleic acid (miRNA) expression levels in cells is essential for the diagnosis of diseases, especially cancer. Recently, huge numbers of molecular markers have been extracted from cells by single-molecule detection methods, which require an excessive amount of detection processing and time. Since DNA computing can interact with biological nucleic acids and offers massively parallel processing, we propose a parallel rule-based classifier that uses DNA strand displacement reactions to detect multiple molecular markers. Two DNA reaction systems, a parallel sensing system and an encoding system for the IF-THEN rules, are developed in our proposed model. In this paper, we compare the classification results of the proposed method with those of the AND-gate method. The simulation results show that our method is a possible alternative for programmable, fast detection and diagnosis over large numbers of molecular markers.
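The DNA strand displacement chemistry itself cannot be reproduced in ordinary code; the sketch below only contrasts the two classification logics the abstract compares, a set of parallel IF-THEN rules over marker levels versus a single AND gate. Marker names, thresholds, and the sample are invented for illustration.

```python
# Hedged sketch of the classification logic only, not the DNA chemistry:
# parallel IF-THEN rules vs. a single AND gate over marker expression levels.

RULES = [
    # (marker, direction, threshold, vote weight) -- all hypothetical values
    ("miR-21",   "high", 0.7, 1),
    ("miR-125b", "low",  0.3, 1),
    ("miR-10b",  "high", 0.6, 1),
]

def rule_based(sample: dict) -> bool:
    """Fire each rule independently (in parallel, in the molecular version)."""
    votes = 0
    for marker, direction, threshold, weight in RULES:
        level = sample.get(marker, 0.0)
        fired = level >= threshold if direction == "high" else level <= threshold
        votes += weight * fired
    return votes >= 2          # majority of rules must fire

def and_gate(sample: dict) -> bool:
    """Single AND gate: every condition must hold simultaneously."""
    return all((sample.get(m, 0.0) >= t) if d == "high" else (sample.get(m, 0.0) <= t)
               for m, d, t, _ in RULES)

if __name__ == "__main__":
    cell = {"miR-21": 0.9, "miR-125b": 0.5, "miR-10b": 0.8}
    print("rule-based verdict:", rule_based(cell))   # True: 2 of 3 rules fire
    print("AND-gate verdict  :", and_gate(cell))     # False: miR-125b is not low
```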
Citations: 0