Vimala Balakrishnan, Pravin Kumar Selvanayagam, L. Yin
Sentiment and emotion analyses provide a quick and easy way to infer users' perceptions of products, services, topics and events, rendering them useful to businesses and government bodies for effective decision making. In this paper, we describe the outcomes of sentiment and emotion analyses performed on a mobile payment app, Boost, which is available in the Google Play Store. A total of 2463 text reviews were gathered; after pre-processing, 1054 of these reviews were annotated and used for sentiment and emotion analyses. Four supervised learning algorithms, namely Support Vector Machine, Naïve Bayes, Decision Tree and Random Forest, were compared using Python. Accuracy and F1 scores indicate that Random Forest outperformed all the other algorithms for both sentiment and emotion analyses. The vast majority of the negative reviews were found to express anger, whereas joy was observed in the positive reviews.
Vimala Balakrishnan, Pravin Kumar Selvanayagam, L. Yin. "Sentiment and Emotion Analyses for Malaysian Mobile Digital Payment Applications." In Proceedings of the 2020 4th International Conference on Compute and Data Analysis, March 2020. DOI: 10.1145/3388142.3388144
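The four-classifier comparison described above can be sketched as follows. This is an illustrative scikit-learn pipeline under our own assumptions, not the paper's code; the toy reviews and labels are invented placeholders standing in for the annotated Boost dataset.

```python
# Sketch: compare SVM, Naive Bayes, Decision Tree and Random Forest
# on TF-IDF features, scored by F1 as in the study. Toy data only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

reviews = ["great app, easy to pay", "crashes every time I top up",
           "fast and reliable", "terrible support, lost my money",
           "love the cashback", "cannot log in, very angry"] * 5
labels = [1, 0, 1, 0, 1, 0] * 5  # 1 = positive, 0 = negative

X = TfidfVectorizer().fit_transform(reviews)
models = {
    "SVM": LinearSVC(),
    "Naive Bayes": MultinomialNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, labels, cv=3, scoring="f1")
    print(f"{name}: mean F1 = {scores.mean():.2f}")
```

A real replication would fit the vectorizer inside each cross-validation fold to avoid leakage; it is fit once here only for brevity.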
The 2019 World Malaria Report confirms that Africa continues to bear the burden of malaria morbidity. The continent accounted for over 93% of the global malaria incidence reported in 2018. Despite numerous multi-level and consultative efforts to combat this epidemic, malaria continues to claim thousands of human lives, especially those of children under 5 years of age. Since malaria is preventable and treatable, one solution for reducing the number of deaths is to implement an effective malaria outbreak early warning system that can forecast malaria outbreaks long before they occur, so that policymakers can put mitigation measures in place. Tapping into the success of machine learning algorithms in predicting disease outbreaks, we present a malaria outbreak prediction system anchored on the well-established correlation between certain climatic conditions and the breeding environment of the malaria-carrying vector (the mosquito). Historical datasets on climate and malaria incidence are used to train nine machine learning algorithms, and the four best-performing ones are identified based on classification accuracy and computational performance. Preceding the models' development, reliability and correlation analyses were carried out on the data, followed by a reduction of the dimensionality of the feature space of the two datasets. Given the power of deep learning in handling selectivity variance, the malaria predictor system was developed based on the deep learning algorithm. Further, the system was evaluated using the Simulator function in RapidMiner, and the accuracy of the predictions was assessed using an independent dataset that was not used in the models' development. With a prediction accuracy of up to 99%, this system has the potential to contribute to the fight against the malaria epidemic in Africa and elsewhere in the world.
M. Masinde. "Africa's Malaria Epidemic Predictor: Application of Machine Learning on Malaria Incidence and Climate Data." In Proceedings of the 2020 4th International Conference on Compute and Data Analysis, March 2020. DOI: 10.1145/3388142.3388158
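The "train several algorithms, keep the best performers" step described above can be sketched as below. The candidate models, the synthetic climate features (rainfall, temperature, humidity) and the toy outbreak rule are all our own stand-ins; the paper used nine algorithms and real historical datasets in RapidMiner.

```python
# Sketch: rank candidate classifiers on (synthetic) climate features
# by cross-validated accuracy, mimicking the model-selection step.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))  # rainfall, temperature, humidity (standardised)
y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)  # toy rule: wet + humid -> outbreak

candidates = {
    "logreg": LogisticRegression(),
    "knn": KNeighborsClassifier(),
    "rf": RandomForestClassifier(random_state=0),
    "mlp": MLPClassifier(max_iter=500, random_state=0),
}
ranking = sorted(((cross_val_score(m, X, y, cv=5).mean(), name)
                  for name, m in candidates.items()), reverse=True)
for acc, name in ranking:
    print(f"{name}: {acc:.2f}")
```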
Jose G. Garcia, Elizabeth R. Villota, C. B. Castañón
In this paper we present an approach to sports analysis using deep learning techniques. As part of a current project, volleyball's basic reception technique has been divided into temporal phases. We performed an evaluation on our own labelled dataset, consisting of 14,814 frames from 69 videos depicting the desired reception technique. A model based on the YOLO algorithm was trained to locate the player region and trim the frames. Two time-fusion methods over the frames were proposed and evaluated with CNN models built on the ResNet architectures and trained via transfer learning. The results show that our best model classified frames into their corresponding phases with an accuracy of 92.21%. The RGB merging method presented in this paper also slightly improves the performance of the models. Furthermore, the models were able to learn the temporality of the phases, as the mistakes they made occurred between consecutive phases.
Jose G. Garcia, Elizabeth R. Villota, C. B. Castañón. "An Approach to Temporal Phase Classification on Videos of the Volleyball's Basic Reception Technique." In Proceedings of the 2020 4th International Conference on Compute and Data Analysis, March 2020. DOI: 10.1145/3388142.3388150
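One plausible reading of the RGB merging idea above is to pack three consecutive grayscale frames into the R, G and B channels of a single image, so a 2D CNN sees short-term motion. The sketch below is our interpretation, not the paper's implementation.

```python
# Sketch of a time-fusion scheme: merge each run of 3 consecutive
# HxW grayscale frames into one HxWx3 "RGB" image.
import numpy as np

def fuse_frames(frames):
    """Stack each run of 3 consecutive HxW frames into one HxWx3 image."""
    fused = []
    for i in range(len(frames) - 2):
        fused.append(np.stack(frames[i:i + 3], axis=-1))
    return fused

frames = [np.full((4, 4), t, dtype=np.uint8) for t in range(5)]
merged = fuse_frames(frames)
print(len(merged), merged[0].shape)  # 3 fused images, each (4, 4, 3)
```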
Virtual Reality (VR) is a technology that lets the user interact with a simulated environment, whether of the real world or an imaginary one. This paper details an application built with the Unity game engine to showcase and demonstrate the capabilities of VR. These include: movement of the player, demonstrating the ability to move the player model around the play area using the controllers; in-game head tracking, demonstrating VR head-tracking technology by moving the in-game camera to match the user's head position in real life; and in-game hands that mimic the positioning of the user's real hands relative to the body, used to interact with objects within the virtual space.
Jacob Stec, S. Shanmugam. "A Study and Implementation of Virtual Reality and its Capabilities." In Proceedings of the 2020 4th International Conference on Compute and Data Analysis, March 2020. DOI: 10.1145/3388142.3388173
Amal Almansour, Ghada Alsaeedi, Haifaa Almazroui, Huda Almuflehi
The ever-increasing popularity of Online Social Network (OSN) sites for posting and sharing photos and videos has led to unprecedented concerns about privacy violations. Available OSN sites offer a limited degree of privacy protection. Most solutions focus on conditional access control, allowing users to control who can access the shared photos and videos. This research study attempts to address this issue by examining the scenario in which a user shares a photo or video containing individuals other than himself/herself (public-level photos and videos). For privacy preservation, the proposed system supports automated human face recognition and filtering for public-level photos and videos. Our approach takes the content of a photo into account and uses face filtering as a strategy to increase privacy while still allowing users to share photos. First, the proposed system automatically identifies a person's face frame in a digital image or video. Next, it compares the detected face features to the face vectors stored in the application database. After the face recognition step is completed, the system filters out all unknown persons in the image. A Convolutional Neural Network (CNN) is used for the face detection step, while deep learning facial embedding algorithms are used for recognition; both show high accuracy and can be executed in real time. For face filtering, a Gaussian blur is applied to faces, as it is a very fast real-time algorithm that lets the user control the degree of blurring.
Based on the results obtained after testing the system on three different datasets, we conclude that our system can detect and recognize faces in photos and videos, achieving 91.3% accuracy with the improved Convolutional Neural Network (CNN) for face detection and 96.154% accuracy with K-Nearest Neighbor (KNN) for face recognition on the I-Privacy dataset.
Amal Almansour, Ghada Alsaeedi, Haifaa Almazroui, Huda Almuflehi. "I-Privacy Photo: Face Recognition and Filtering." In Proceedings of the 2020 4th International Conference on Compute and Data Analysis, March 2020. DOI: 10.1145/3388142.3388161
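The filtering step above, blurring only the faces that were not matched to a known identity, can be sketched as below. The image, bounding boxes and sigma are synthetic placeholders; a real pipeline would obtain boxes from the detector and identities from the recognizer.

```python
# Sketch: Gaussian-blur the bounding boxes of unrecognized faces.
import numpy as np
from scipy.ndimage import gaussian_filter

def blur_unknown_faces(image, boxes, sigma=5.0):
    """Return a copy with each (top, left, bottom, right) box blurred."""
    out = image.copy()
    for top, left, bottom, right in boxes:
        region = out[top:bottom, left:right]
        out[top:bottom, left:right] = gaussian_filter(region, sigma=sigma)
    return out

img = np.zeros((100, 100))
img[40:60, 40:60] = 1.0  # a sharp synthetic "face"
blurred = blur_unknown_faces(img, [(30, 30, 70, 70)])
print(img.max(), float(blurred.max()))  # the blurred peak drops below 1.0
```

Increasing `sigma` strengthens the blur, which matches the paper's note that the user can control the blurring degree.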
Phishing websites are fraudulent sites that impersonate a trusted party to gain access to sensitive information of an individual or organization. Traditionally, phishing website detection is done through blacklist databases. However, with the rapid development of global networking and communication technologies there are now innumerable websites, and classifying them with traditional methods has become difficult since new websites are created every second. In this paper, we propose a real-time anti-phishing system. In the first step, we extract the lexical and host-based properties of a website. In the second step, we combine URL (Uniform Resource Locator) features, NLP features and host-based properties to train machine learning and deep learning models. Our detection model detects phishing URLs with a detection rate of 94.89%.
Surya Srikar Sirigineedi, Jayesh Soni, Himanshu Upadhyay. "Learning-based models to detect runtime phishing activities using URLs." In Proceedings of the 2020 4th International Conference on Compute and Data Analysis, March 2020. DOI: 10.1145/3388142.3388170
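The lexical-feature extraction in the first step can be sketched with the standard library alone. The particular feature set below (length, dots, `@` symbol, IP-address host, hyphens, scheme) is a common illustrative choice, not the paper's exact feature list.

```python
# Sketch: extract simple lexical features from a URL for a
# phishing classifier's input vector.
from urllib.parse import urlparse

def lexical_features(url):
    parsed = urlparse(url)
    host = parsed.netloc
    return {
        "url_length": len(url),
        "num_dots": host.count("."),
        "has_at_symbol": "@" in url,
        "has_ip_host": host.replace(".", "").isdigit(),
        "num_hyphens": host.count("-"),
        "uses_https": parsed.scheme == "https",
    }

print(lexical_features("http://192.168.0.1/login-secure@evil"))
```

These dictionaries would then be vectorized and fed, together with host-based and NLP features, to the learning models.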
For a large volume of data, clustering algorithms are of significant importance for categorizing and analyzing the data. Accordingly, choosing the optimal number of clusters (K) is essential, but it is also a tricky problem in big data analysis. More important still is determining the best K automatically and efficiently, which is the main issue in clustering algorithms. Indeed, balancing the quality and the efficiency of the clustering algorithm while determining K is a trade-off that we primarily aim to overcome. K-Means remains one of the most popular clustering algorithms, but it has the shortcoming that K must be pre-set. We introduce a new process with fewer K-Means runs, which selects the most promising time to run the K-Means algorithm. To achieve this goal, we apply Bisecting K-Means and a different splitting measure, which together determine the number of clusters automatically and efficiently while maintaining clustering quality for large sets of high-dimensional data. We carried out experimental studies on different datasets and found that our procedure has the flexibility to choose different criteria for determining the optimal K for each of them. Experiments indicate higher efficiency, through decreased computational cost, compared with the Ray & Turi method or with using the K-Means algorithm alone.
Z. Safari, Khalid T. Mursi, Yu Zhuang. "Fast Automatic Determination of Cluster Numbers for High Dimensional Big Data." In Proceedings of the 2020 4th International Conference on Compute and Data Analysis, March 2020. DOI: 10.1145/3388142.3388164
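For context, the naive baseline the paper improves on looks like the sketch below: run K-Means once per candidate K and score each run (here with the silhouette index, one common choice), which is exactly the repeated-K-Means cost that a bisecting strategy tries to avoid. The dataset and scoring criterion are our own illustrative choices.

```python
# Baseline sketch: pick K by running K-Means for every candidate K
# and keeping the K with the best silhouette score.
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
best_k = max(scores, key=scores.get)
print("best K:", best_k)
```

The cost of this loop grows with the candidate range and dataset size, which motivates running K-Means only at the "most promising" moments as the paper proposes.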
Due to the spread of technology and the World Wide Web, online social media has invaded every home in the world; hence, the analysis of such networks has become an important, yet challenging, area of study for researchers. One of the most interesting problems in social network analysis is identifying influential users, who are important actors in online social networks. In this paper, we identify influential users on several trending hashtags, using data collected between December 2015 and March 2016. Association Rule Learning was employed to identify influential users from the collected hashtags. To investigate why users were detected as influential, different Influence Measures were defined. The results of this study indicate the effectiveness of Association Rule Learning for identifying influential users, as well as for detecting the most effective Influence Measures for those users.
Islam Elkabani, Layal Abu Daher, R. Zantout. "Use of FP-Growth Algorithm in Identifying Influential Users on Twitter Hashtags." In Proceedings of the 2020 4th International Conference on Compute and Data Analysis, March 2020. DOI: 10.1145/3388142.3388148
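The frequent-pattern idea behind the paper can be illustrated compactly. For brevity the sketch counts itemsets Apriori-style rather than building the FP-tree that FP-Growth uses, but the output, frequently co-occurring users in hashtag "transactions", is the kind of pattern the study mines; the usernames and support threshold are invented.

```python
# Sketch: find user pairs that co-occur in tweets at least
# min_support times (Apriori-style counting, not an FP-tree).
from itertools import combinations
from collections import Counter

tweets = [  # each "transaction": users appearing together (toy data)
    {"alice", "bob"}, {"alice", "bob", "carol"}, {"alice", "carol"},
    {"bob", "carol"}, {"alice", "bob"},
]
min_support = 2
pair_counts = Counter()
for t in tweets:
    for pair in combinations(sorted(t), 2):
        pair_counts[pair] += 1

frequent = {p: c for p, c in pair_counts.items() if c >= min_support}
for (a, b), c in sorted(frequent.items()):
    print(f"{a} <-> {b} (support {c})")
```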
Action Rule mining is a method to extract actionable patterns from datasets. Classification rules help predict an object's class, whereas Action Rules are actionable knowledge providing suggestions on how an object's state or class can be changed to a more desirable state to benefit the user. In the internet era, digital data is widespread and growing so tremendously that it is necessary to develop systems that process the data much faster. The Action Rule mining literature involves two major frameworks: the Rule-Based method, where the extraction of Action Rules depends on a pre-processing step of classification rule discovery, and the Object-Based method, which extracts Action Rules directly from the database without the use of classification rules. The Object-Based method extracts Action Rules in an Apriori-like manner using frequent action sets. Since this method is iterative, it takes longer to process huge datasets. In this work we propose a novel hybrid approach to generate the complete set of Action Rules by combining the Rule-Based and Object-Based methods. Our results show a significant improvement: the existing algorithm fails to complete on the Twitter dataset, whereas the proposed hybrid approach completes execution and produces Action Rules in less than 500 seconds on a cluster.
Jaishree Ranganathan, Sagar Sharma, A. Tzacheva. "Hybrid Scalable Action Rule: Rule Based and Object Based." In Proceedings of the 2020 4th International Conference on Compute and Data Analysis, March 2020. DOI: 10.1145/3388142.3388143
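The core idea of an Action Rule, as described above, can be made concrete with a toy example: pair a classification rule predicting the undesirable class with one predicting the desirable class that agrees on the stable attributes, and emit the flexible-attribute changes. All attribute and class names here are invented for illustration.

```python
# Toy sketch of forming one Action Rule from two classification rules.
rules = [  # (conditions, decision)
    ({"plan": "basic", "region": "east", "support_calls": "many"}, "churn"),
    ({"plan": "premium", "region": "east", "support_calls": "few"}, "stay"),
]
stable = {"region"}  # attributes that cannot be changed

(cond_bad, _), (cond_good, _) = rules
if all(cond_bad[a] == cond_good[a] for a in stable):
    changes = {a: (cond_bad[a], cond_good[a])
               for a in cond_bad
               if a not in stable and cond_bad[a] != cond_good[a]}
    print("To move churn -> stay, change:", changes)
```

The Rule-Based framework mines such rule pairs after classification-rule discovery; the Object-Based framework builds the same kind of action sets directly from the data.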
Akalanka Mailewa Dissanayaka, S. Mengel, L. Gittner, H. Khan
A Vulnerability Management system is a disciplined, programmatic approach to discovering and mitigating vulnerabilities in a system. While securing systems against data exploitation and theft, Vulnerability Management works as a cyclical practice of identifying, assessing, prioritizing, remediating, and mitigating security weaknesses. In this approach, root cause analysis is conducted to find solutions for problematic areas in policy, process, and standards, including configuration standards. Three major reasons make Vulnerability Assessment and Management a vital part of IT risk management: (1) persistent threats - attacks exploiting security vulnerabilities for financial gain and criminal agendas continue to dominate headlines; (2) regulations - many government and industry regulations, such as the Health Insurance Portability and Accountability Act (HIPAA) and Sarbanes-Oxley (SOX), mandate rigorous vulnerability management practices; and (3) risk management - mature organizations treat vulnerability assessment and management as a key risk management component [1]. Thus, as opposed to a reactive, technology-oriented approach, a well-organized and well-executed Vulnerability Management system is proactive and business-oriented. This research first collects all the vulnerabilities associated with the Data Analytic Framework Implemented with MongoDB on Linux Containers (LXCs), using a vulnerability analysis testbed with seven different analysis tools. Thereafter, it prioritizes all the vulnerabilities as "Low", "Medium", or "High" according to their severity level. Then, it discovers and analyzes the root causes of fifteen vulnerabilities of differing severities. Finally, according to each vulnerability's root cause, this research proposes security techniques to avoid or mitigate those vulnerabilities in the current system.
{"title":"Vulnerability Prioritization, Root Cause Analysis, and Mitigation of Secure Data Analytic Framework Implemented with MongoDB on Singularity Linux Containers","authors":"Akalanka Mailewa Dissanayaka, S. Mengel, L. Gittner, H. Khan","doi":"10.1145/3388142.3388168","DOIUrl":"https://doi.org/10.1145/3388142.3388168","url":null,"abstract":"A Vulnerability Management system is a disciplined, programmatic approach to discovering and mitigating vulnerabilities in a system. While securing systems from data exploitation and theft, Vulnerability Management works as a cyclical practice of identifying, assessing, prioritizing, remediating, and mitigating security weaknesses. In this approach, root cause analysis is conducted to find solutions for problematic areas in policy, process, and standards, including configuration standards. Three major reasons make Vulnerability Assessment and Management a vital part of IT risk management: 1. Persistent Threats - attacks exploiting security vulnerabilities for financial gain and criminal agendas continue to dominate headlines; 2. Regulations - many government and industry regulations, such as the Health Insurance Portability and Accountability Act (HIPAA) and Sarbanes-Oxley (SOX), mandate rigorous vulnerability management practices; and 3. Risk Management - mature organizations treat vulnerability assessment and management as a key risk management component [1]. Thus, as opposed to a reactive, technology-oriented approach, a well-organized and well-executed Vulnerability Management system is proactive and business-oriented. This research first collects all vulnerabilities associated with the Data Analytic Framework Implemented with MongoDB on Linux Containers (LXCs) using a vulnerability analysis testbed with seven different analysis tools. It then prioritizes the vulnerabilities as \"Low\", \"Medium\", or \"High\" according to their severity level, and discovers and analyzes the root causes of fifteen vulnerabilities of varying severity. Finally, for each vulnerability root cause, the research proposes security techniques to avoid or mitigate those vulnerabilities in the current system.","PeriodicalId":409298,"journal":{"name":"Proceedings of the 2020 the 4th International Conference on Compute and Data Analysis","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130840930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}