
2017 International Conference on Data and Software Engineering (ICoDSE): Latest Publications

Hybrid recommender system using random walk with restart for social tagging system
Pub Date : 2017-12-06 DOI: 10.1109/ICODSE.2017.8285875
Arif Wijonarko, Dade Nurjanah, D. S. Kusumo
Social Tagging Systems (STS) are very popular web applications: millions of people join them and actively share their content. This enormous number of users floods STS with content and tags in an unrestrained way, threatening the systems' ability to retrieve relevant content and share information. Recommender Systems (RS) are a well-known and successful means of overcoming the information overload problem by filtering relevant content from non-relevant content. Besides managing folksonomy information, STS also handle the social network information of their users. Both kinds of information can be used by an RS to generate good recommendations for its users. This work proposes an enhancement of an existing hybrid recommender system that incorporates social network information into the input of the hybrid recommender. The recommendation generation process combines Random Walk with Restart (RWR) with Content-Based Filtering (CBF) and Collaborative Filtering (CF). Several parameters are introduced to control the weight contribution of each method. A comprehensive experiment on real-world open data sets from two domains, social bookmarking (Delicious.com) and music sharing (Last.fm), was conducted to test the proposed hybrid recommender system. The results show that it improves accuracy compared to existing methods: the proposed hybrid achieves 24.4% more than RWR on the Delicious dataset and 53.85% more than CBF on the Last.fm dataset.
Citations: 3
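
As a concrete illustration of the Random Walk with Restart component described in the abstract above, the sketch below computes RWR scores on a toy user-item-tag graph and blends them with placeholder content-based and collaborative scores via weight parameters. The adjacency matrix, the weights alpha and beta, and the placeholder score vectors are illustrative assumptions, not the authors' data or implementation.

```python
import numpy as np

def random_walk_with_restart(adj, seed, restart_prob=0.15, tol=1e-8, max_iter=1000):
    """Stationary RWR scores: p = (1 - c) * W p + c * e, with W column-normalized."""
    n = adj.shape[0]
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0      # avoid division by zero for isolated nodes
    W = adj / col_sums                 # column-normalized transition matrix
    e = np.zeros(n)
    e[seed] = 1.0                      # restart vector concentrated on the query node
    p = e.copy()
    for _ in range(max_iter):
        p_next = (1 - restart_prob) * W @ p + restart_prob * e
        if np.abs(p_next - p).sum() < tol:
            break
        p = p_next
    return p

# Toy graph over {user0, item1, item2, tag3}; edges are illustrative only.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [1, 1, 1, 0]], dtype=float)
rwr_scores = random_walk_with_restart(A, seed=0)

# A weighted hybrid in the spirit of the paper: blend RWR with CBF/CF scores.
alpha, beta = 0.5, 0.3                        # weight parameters (illustrative values)
cbf_scores = np.array([0.0, 0.9, 0.2, 0.1])   # placeholder content-based scores
cf_scores = np.array([0.0, 0.4, 0.7, 0.0])    # placeholder collaborative scores
hybrid = alpha * rwr_scores + beta * cbf_scores + (1 - alpha - beta) * cf_scores
print(np.argsort(-hybrid))                    # ranking of nodes for the seed user
```
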
A statistical and rule-based spelling and grammar checker for Indonesian text
Pub Date : 2017-11-01 DOI: 10.1109/ICODSE.2017.8285846
Asanilta Fahda, A. Purwarianti
Spelling and grammar checkers are widely used tools that help detect and correct various writing errors. However, there are currently no proofreading systems capable of checking both spelling and grammar errors in Indonesian text. This paper proposes an Indonesian spelling and grammar checker prototype that combines rules and statistical methods. The rule matcher module currently uses 38 rules that detect, correct, and explain common errors in punctuation, word choice, and spelling. The spelling checker module examines every word using a dictionary trie to find misspellings, with Damerau-Levenshtein distance neighbors as correction candidates. Morphological analysis is also applied to certain word forms. A bigram/co-occurrence Hidden Markov Model is used for ranking and selecting the candidates. The grammar checker uses a trigram language model over tokens, POS tags, or phrase chunks to identify sentences with incorrect structures. By experiment, the co-occurrence HMM with an emission probability weight coefficient of 0.95 was selected as the most suitable model for the spelling checker. For the grammar checker, the phrase chunk model, which normalizes by chunk length and uses a threshold score of −0.4, gave the best results. The document evaluation of this system showed an overall accuracy of 83.18%. The prototype is implemented as a web application.
Citations: 14
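
The spelling checker described above generates correction candidates from Damerau-Levenshtein distance neighbors; the sketch below shows the restricted Damerau-Levenshtein distance and a naive candidate lookup against a tiny word list. The dictionary words and the distance threshold of 1 are illustrative assumptions; the paper's actual candidate generation uses a dictionary trie and an HMM ranker.

```python
def damerau_levenshtein(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein distance: insertion, deletion, substitution,
    and transposition of two adjacent characters each cost 1."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[m][n]

# Candidate generation in the spirit of the paper: dictionary words within distance 1.
dictionary = ["makan", "minum", "makna", "main"]   # illustrative word list
word = "makanx"
candidates = [w for w in dictionary if damerau_levenshtein(word, w) <= 1]
print(candidates)   # ['makan']
```
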
Utility function based-mixed integer nonlinear programming (MINLP) problem model of information service pricing schemes
Pub Date : 2017-11-01 DOI: 10.1109/ICODSE.2017.8285892
Robinson Sitepu, F. Puspita, Shintya Apriliyani
The development of the internet in this era of globalization has been rapid, and the need for internet access has become practically unlimited. A utility function, as one measure of internet usage, is usually associated with the level of satisfaction a user gets from an information service, particularly in relation to maximizing profit. Three internet pricing schemes are considered: flat fee, usage based, and two-part tariff, modeled with a Cobb-Douglas utility function that includes monitoring cost and marginal cost. The pricing schemes are formulated as nonlinear optimization problems and solved with LINGO 13.0 to obtain optimal solutions. When the marginal and monitoring costs of the Cobb-Douglas utility function are taken into account, the optimal solution for each offered service is obtained with either the usage-based or the two-part tariff pricing scheme when compared with the flat-fee scheme. Offering the network under a two-part tariff scheme is the best option for the provider. The results show that by applying a two-part tariff scheme, providers can maximize revenue for both homogeneous and heterogeneous consumers.
Citations: 5
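
The pricing models above build on a Cobb-Douglas utility function solved as a nonlinear program (with LINGO 13.0 in the paper). As a rough illustration only, the sketch below maximizes a plain Cobb-Douglas utility under a budget constraint with SciPy; the elasticities, prices, and budget are invented values, and the code does not reproduce the paper's MINLP model with monitoring and marginal costs.

```python
from scipy.optimize import minimize

# Cobb-Douglas utility U(x1, x2) = x1**a * x2**b; maximize it by minimizing -U.
a, b = 0.6, 0.4                    # illustrative elasticities
p1, p2, budget = 2.0, 3.0, 30.0    # illustrative prices and budget

def neg_utility(x):
    x1, x2 = x
    return -(x1 ** a) * (x2 ** b)

# Budget constraint p1*x1 + p2*x2 <= budget, with strictly positive consumption.
constraints = [{"type": "ineq", "fun": lambda x: budget - p1 * x[0] - p2 * x[1]}]
bounds = [(1e-6, None), (1e-6, None)]

result = minimize(neg_utility, x0=[1.0, 1.0], bounds=bounds, constraints=constraints)
x1_opt, x2_opt = result.x
print(x1_opt, x2_opt)
# The analytic Cobb-Douglas optimum spends the fractions a and b of the budget:
# x1* = a*budget/p1 = 9, x2* = b*budget/p2 = 4, which the solver should approximate.
```
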
A generic tool for modeling and simulation of fire propagation using cellular automata
Pub Date : 2017-11-01 DOI: 10.1109/ICODSE.2017.8285870
Taufiqurrahman, Saiful Akbar
In spite of their simplicity, Cellular Automata (CA) have great potential for modeling various natural phenomena. CA receive widespread interest among researchers from diverse fields who want to learn and use them in their application domains. Researchers usually develop both the CA model and the platform for simulating it for each specific problem domain. This is inefficient, and some researchers do not know how to code the simulation program. Researchers should be able to focus on developing the model without also having to build the platform that simulates it. In this research, we attempt to develop a tool for modeling and simulating CA that helps develop various CA-based models. As a starting point, the tool is applied to modeling and simulating fire propagation. A literature study of related research on fire propagation models was conducted to extract their generic aspects, which were then implemented in a software artifact as a generic tool. Finally, the tool was tested by implementing rules from the related research. The test results show that, with the same construction, the tool can implement various rules and fire propagation models.
Citations: 0
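
To show the kind of rule a cellular-automata fire model encodes, the sketch below implements a classic forest-fire style CA in which a tree ignites when a 4-neighbor is burning and a burning cell burns out in the next step. The states, neighborhood, and grid are illustrative; the paper's generic tool is meant to accept such rules as input rather than hard-code one.

```python
import numpy as np

EMPTY, TREE, BURNING = 0, 1, 2

def step(grid):
    """One CA step: a tree ignites if any 4-neighbor is burning; burning cells burn out."""
    new = grid.copy()
    rows, cols = grid.shape
    for r in range(rows):
        for c in range(cols):
            if grid[r, c] == BURNING:
                new[r, c] = EMPTY                       # burned out
            elif grid[r, c] == TREE:
                neighbors = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
                if any(0 <= i < rows and 0 <= j < cols and grid[i, j] == BURNING
                       for i, j in neighbors):
                    new[r, c] = BURNING                 # fire spreads to the tree
    return new

# Illustrative run: a 5x5 forest with a fire started in the middle cell.
grid = np.full((5, 5), TREE)
grid[2, 2] = BURNING
for t in range(4):
    grid = step(grid)
print(grid)
```
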
Traffic speed prediction from GPS data of taxi trip using support vector regression
Pub Date : 2017-11-01 DOI: 10.1109/ICODSE.2017.8285869
Dwina Satrinia, G. Saptawati
Traffic congestion prediction is one solution to the congestion problem. In this paper, we propose the development of a system that can predict traffic speed using historical GPS data from taxi trips in Bandung city. The GPS data from taxi trips in Bandung do not include speed, and the locations detected by the GPS devices are sometimes inaccurate, so additional steps are required in the data preprocessing phase. We propose using map matching with topological information in the preprocessing phase. Map matching produces a new trajectory that corresponds to the road network. From the new trajectories we then calculate the speed of each road segment. To predict future traffic speed we use the Support Vector Regression (SVR) method. The results of this study indicate that map matching helps obtain more accurate traffic speeds and that SVR performs well in predicting traffic speed.
Citations: 6
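
After map matching yields per-segment speeds, the paper trains Support Vector Regression to predict future speed. The sketch below shows a minimal scikit-learn SVR pipeline on made-up per-segment features; the feature layout (hour, day, previous-slot speed), the hyperparameters, and the data are illustrative assumptions rather than the paper's setup.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Illustrative training data: one row per (segment, time slot) observation.
# Features: [hour_of_day, day_of_week, speed_in_previous_slot]; target: speed (km/h).
X = np.array([[7, 1, 35.0], [8, 1, 22.0], [9, 1, 28.0],
              [7, 2, 36.0], [8, 2, 20.0], [9, 2, 27.0]])
y = np.array([22.0, 18.0, 30.0, 21.0, 17.0, 31.0])

# RBF-kernel SVR with feature scaling, a common default setup for speed regression.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.5))
model.fit(X, y)

# Predict the speed for hour 8 on day 3, given 25 km/h in the previous slot.
print(model.predict([[8, 3, 25.0]]))
```
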
Content-based clustering and visualization of social media text messages
Pub Date : 2017-11-01 DOI: 10.1109/ICODSE.2017.8285856
S. A. Barnard, S. M. Chung, Vincent A. Schmidt
Although Twitter has been around for more than ten years, crisis management agencies and first response personnel are not able to fully use the information this type of data provides during a crisis or a natural disaster. This paper presents a tool that automatically clusters geotagged text data based on their content, rather than only by time and location, and displays the clusters and their locations on the map. It allows at-a-glance information to be displayed throughout the evolution of a crisis. For accurate clustering, we used the silhouette coefficient to determine the number of clusters automatically. To visualize the topics (i.e., frequent words) within each cluster, we used a word cloud. Our experiments demonstrated that the performance of this tool is very scalable. The tool could easily be used by first response and official management personnel to quickly determine when a crisis is occurring, where it is concentrated, and what resources to deploy to best stabilize the situation.
Citations: 3
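
The clustering step above picks the number of clusters automatically using the silhouette coefficient. A minimal scikit-learn sketch of that idea follows: TF-IDF vectors for a handful of invented messages, K-means over a small range of k, and the k with the best silhouette score kept. The messages, the k range, and the use of K-means are illustrative assumptions, not the paper's exact pipeline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

messages = [
    "flooding reported downtown, roads closed",
    "river flooding near the bridge, evacuations underway",
    "power outage on the east side after the storm",
    "outage affecting traffic lights across the east side",
    "shelter open at the high school for flood evacuees",
    "crews restoring power to east side neighborhoods",
]

# Represent messages by their TF-IDF content, not by time or location.
X = TfidfVectorizer(stop_words="english").fit_transform(messages)

# Pick the cluster count with the best silhouette coefficient.
best_k, best_score, best_model = None, -1.0, None
for k in range(2, 5):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    score = silhouette_score(X, km.labels_)
    if score > best_score:
        best_k, best_score, best_model = k, score, km

print(best_k, best_score)
print(best_model.labels_)
```
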
Anomaly detection using random forest: A performance revisited
Pub Date : 2017-11-01 DOI: 10.1109/ICODSE.2017.8285847
Rifkie Primartha, Bayu Adhi Tama
Intruders have become more and more sophisticated, so a deterrence mechanism such as an intrusion detection system (IDS) is pivotal in information security management. An IDS aims to capture and repel any malicious activity in the network before it can cause harmful destruction. An IDS relies on a well-trained classification model so that it can identify the presence of attacks effectively. This paper compares the performance of IDS using a random forest classifier with respect to two performance measures, accuracy and false alarm rate. Three public intrusion data sets, NSL-KDD, UNSW-NB15, and GPRS, are employed in the experiment. Furthermore, different ensemble tree sizes are considered, while the other best learning parameters are obtained using a grid search. Our experimental results prove the superiority of the random forest model for IDS, as it significantly outperforms a similar ensemble (random tree + naive Bayes tree) and single classifiers (naive Bayes and a neural network) under k-fold cross-validation.
Citations: 86
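
The evaluation above grid-searches the ensemble size and scores the random forest with k-fold cross-validation. The sketch below reproduces that workflow in scikit-learn on a synthetic two-class dataset standing in for an intrusion data set; the tree-count grid, fold counts, and data are illustrative assumptions, not the paper's experimental settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

# Synthetic stand-in for an intrusion dataset (the paper uses NSL-KDD, UNSW-NB15, GPRS).
X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           weights=[0.8, 0.2], random_state=0)

# Grid-search the ensemble size (tree count), as in the paper's parameter search.
grid = GridSearchCV(RandomForestClassifier(random_state=0),
                    param_grid={"n_estimators": [50, 100, 200]},
                    cv=5, scoring="accuracy")
grid.fit(X, y)
best_rf = grid.best_estimator_

# k-fold cross-validated accuracy of the tuned forest.
scores = cross_val_score(best_rf, X, y, cv=10, scoring="accuracy")
print(grid.best_params_, scores.mean())
```
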
Spatio-temporal mining to identify potential traffic congestion based on transportation mode
Pub Date : 2017-11-01 DOI: 10.1109/ICODSE.2017.8285857
Irrevaldy, G. Saptawati
The increasing development of a city creates density that can lead to traffic congestion. In recent years, smartphones and other devices with GPS (Global Positioning System) features have become very common in everyday activities. Previous work built an architecture that could infer transportation mode from GPS data. In this paper, we propose extending that work to detect potential traffic congestion based on transportation mode, with the help of city spatial data. The data mining architecture is divided into three phases. In the first phase, we build a classification model that is used to obtain transportation mode information from GPS data. In the second phase, we extract spatial data, divide the area into grids, and divide time into several interval groups. In the last phase, we use the first-phase result as a dataset for the DBSCAN (density-based spatial clustering of applications with noise) clustering algorithm, run for each time interval group, to determine which grid areas have traffic congestion potential. From this architecture, we introduce a new term, cluster overlay, which identifies the potential traffic congestion level in certain areas.
Citations: 4
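
In the last phase described above, DBSCAN is run separately for each time interval group to find dense areas with congestion potential. The sketch below groups toy GPS points by hour and clusters each group with scikit-learn's DBSCAN; the coordinates, eps, and min_samples values are illustrative assumptions rather than the paper's parameters.

```python
import numpy as np
from collections import defaultdict
from sklearn.cluster import DBSCAN

# Illustrative GPS records: (hour_of_day, x, y), with x and y already projected to meters.
records = np.array([
    [8, 100, 100], [8, 105, 98], [8, 102, 103], [8, 900, 900],
    [17, 500, 500], [17, 505, 495], [17, 498, 502], [17, 100, 100],
])

# Group points by time interval (here: hour), then cluster each group with DBSCAN.
by_hour = defaultdict(list)
for hour, x, y in records:
    by_hour[int(hour)].append((x, y))

for hour, points in sorted(by_hour.items()):
    labels = DBSCAN(eps=20.0, min_samples=3).fit_predict(np.array(points))
    dense = set(labels) - {-1}          # -1 marks noise, i.e. no congestion potential
    print(f"hour {hour}: {len(dense)} dense cluster(s), labels {labels.tolist()}")
```
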
Identification process relationship of process model discovery based on workflow-net
Pub Date : 2017-11-01 DOI: 10.1109/ICODSE.2017.8285876
Ferdi Rahmadi, G. Saptawati
While web services are now commonly used to integrate business processes in an organization, analyzing those business processes becomes more complicated because of the distributed nature of services. Processes in different services interact with each other, and those processes have relations. To help analyze the business process in a service, process mining techniques can be used. However, process models discovered by process mining still use a traditional process model, the workflow net, which has limitations in describing the interacting processes that occur in web services. Given these limitations, this paper proposes a method for identifying interactions and process relations. The experiment shows that the port component in a proclet is capable of modeling process interactions between web services and can be implemented in process mining.
Citations: 1
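
Since the discovered process models above are expressed as workflow nets, the sketch below shows the basic Petri-net machinery such a net relies on: transitions with input and output places, a marking, and a firing rule, replayed over a three-step toy case. The net itself is invented for illustration and does not model the proclet port components the paper proposes.

```python
# A workflow net as a tiny Petri net: transitions with input/output places,
# and a marking (tokens per place). The net below is illustrative only.
transitions = {
    "register": {"in": ["start"], "out": ["p1"]},
    "check":    {"in": ["p1"],    "out": ["p2"]},
    "archive":  {"in": ["p2"],    "out": ["end"]},
}

def enabled(marking, t):
    """A transition is enabled if every input place holds at least one token."""
    return all(marking.get(p, 0) >= 1 for p in transitions[t]["in"])

def fire(marking, t):
    """Fire an enabled transition: consume one token per input place, produce one per output place."""
    assert enabled(marking, t), f"transition {t} is not enabled"
    new = dict(marking)
    for p in transitions[t]["in"]:
        new[p] -= 1
    for p in transitions[t]["out"]:
        new[p] = new.get(p, 0) + 1
    return new

# Replay one case from the initial marking (a single token on 'start') to 'end'.
marking = {"start": 1}
for t in ["register", "check", "archive"]:
    marking = fire(marking, t)
print(marking)   # {'start': 0, 'p1': 0, 'p2': 0, 'end': 1}
```
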
Evaluation of greedy perimeter stateless routing protocol on vehicular ad hoc network in Palembang city
Pub Date : 2017-11-01 DOI: 10.1109/ICODSE.2017.8285873
R. F. Malik, Muhammad Sulkhan Nurfatih, H. Ubaya, Rido Zulfahmi, E. Sodikin
Vehicular Ad Hoc Network (VANET) is a Mobile Ad Hoc Network (MANET) concept in which vehicles act as nodes in the network. VANET uses mobile vehicles in an ad hoc wireless network, and as such it is a wireless network that requires a protocol-specific routing implementation. In this paper, we analyze the correlation of queue length and queue time in the position-based routing protocol Greedy Perimeter Stateless Routing (GPSR). The simulation uses Network Simulator 3 (NS-3) and Simulation of Urban Mobility (SUMO). The scenarios vary node count and velocity in the urban environment of Palembang. We propose queue parameters (queue length and queue time) for the GPSR protocol in order to produce better performance. As a result, increasing the existing GPSR attributes (a queue length of 96 bytes and a queue time of 45 seconds) improved GPSR performance in terms of PDR, throughput, and packet loss, while for end-to-end delay these parameter values yield minimum performance.
Citations: 6
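
GPSR forwards packets greedily to the neighbor geographically closest to the destination and falls back to perimeter mode when no neighbor makes progress. The sketch below shows only that greedy next-hop choice in Python with invented node positions; it is not the NS-3 GPSR implementation or the queue-parameter modification evaluated in the paper.

```python
import math

def distance(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def greedy_next_hop(current, neighbors, destination):
    """GPSR greedy forwarding: pick the neighbor strictly closer to the destination
    than the current node; return None to signal a fallback to perimeter mode."""
    best, best_dist = None, distance(current, destination)
    for node_id, pos in neighbors.items():
        d = distance(pos, destination)
        if d < best_dist:
            best, best_dist = node_id, d
    return best

# Illustrative positions (x, y) in meters for a current vehicle and its neighbors.
current = (0.0, 0.0)
neighbors = {"v1": (50.0, 10.0), "v2": (30.0, 80.0), "v3": (-20.0, 5.0)}
destination = (200.0, 40.0)

print(greedy_next_hop(current, neighbors, destination))  # 'v1' lies closest to the destination
```
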