In this paper, we propose HS-Squeezer, an improved variant of the HOT SAX algorithm for efficient discord detection in static time series. HS-Squeezer uses clustering rather than an augmented trie to arrange the two ordering heuristics of HOT SAX. We also introduce HS-Squeezer-Stream, which applies HS-Squeezer within a framework for detecting local discords in streaming time series. Experimental results show that HS-Squeezer detects discords of the same quality as HOT SAX in much shorter run time, and that HS-Squeezer-Stream responds quickly when processing time series streams while still detecting quality local discords.
Pham Minh Chau, B. Duc, D. T. Anh. "Discord Discovery in Streaming Time Series based on an Improved HOT SAX Algorithm." Proceedings of the 9th International Symposium on Information and Communication Technology, 2018-12-06. https://doi.org/10.1145/3287921.3287929
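For orientation, the baseline that HOT SAX-style methods accelerate can be sketched as a brute-force search: a discord is the subsequence whose nearest non-overlapping neighbour is farthest away. The sketch below is not HS-Squeezer or HOT SAX itself. The function names, the use of z-normalization, and the Euclidean distance are assumptions made for illustration.

```python
import math

def znorm(seq):
    """Z-normalize a subsequence; constant subsequences map to all zeros."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / n)
    if std < 1e-12:
        return [0.0] * n
    return [(x - mean) / std for x in seq]

def euclid(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def brute_force_discord(series, m):
    """Return (start index, distance) of the length-m discord.

    Quadratic in the number of subsequences; heuristic methods reorder
    these two loops so that most inner iterations can be abandoned early.
    """
    subs = [znorm(series[i:i + m]) for i in range(len(series) - m + 1)]
    best_dist, best_loc = -1.0, None
    for i, p in enumerate(subs):
        nearest = math.inf
        for j, q in enumerate(subs):
            if abs(i - j) >= m:  # skip trivial (overlapping) matches
                nearest = min(nearest, euclid(p, q))
        if nearest != math.inf and nearest > best_dist:
            best_dist, best_loc = nearest, i
    return best_loc, best_dist
```

On a periodic series with a short distorted stretch, the returned index should fall on a window that overlaps the distortion.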
Microarray gene expression data are known to hold keys to fundamental problems in cancer diagnosis and drug discovery. However, classifying gene expression data is difficult because these data combine a very high-dimensional feature space with a small sample size. We investigate random ensembles of oblique decision stumps (RODS) based on the linear support vector machine (SVM), which are suited to classifying very-high-dimensional microarray gene expression data. Our classification algorithms, called Bag-RODS and Boost-RODS, learn multiple oblique decision stumps by bagging and boosting, respectively, to form an ensemble of classifiers that is more accurate than a single model. Numerical test results on 50 very-high-dimensional microarray gene expression datasets from the Kent Ridge Biomedical and Array Expression repositories show that the proposed algorithms are more accurate than state-of-the-art classification models, including k nearest neighbors (kNN), SVM, decision trees, and ensembles of decision trees such as random forests, bagging, and AdaBoost.
Phuoc-Hai Huynh, Van Hoa Nguyen, Thanh-Nghi Do. "Random ensemble oblique decision stumps for classifying gene expression data." Proceedings of the 9th International Symposium on Information and Communication Technology, 2018-12-06. https://doi.org/10.1145/3287921.3287987
Yohei Yamauchi, Ryo Nishide, Yumi Takaki, C. Ohta, K. Oyama, T. Ohkawa
In the management of beef cattle, it is important to detect unusual conditions such as estrus and disease as early as possible. Methods proposed so far detect estrus from information such as the amount of activity measured by a pedometer. In this paper, we propose a method that tracks cattle status by focusing on distinctive changes in communities over time. The method may discover and handle cases that previous activity-based methods did not find. To extract communities, we use the tendency of cattle within a community to synchronize their behaviors. Walking speed is calculated from position information obtained from GPS collars, and the cattle's behaviors are classified. We quantify how long the cattle's behaviors are synchronized and build a graph that captures the relationships between cattle. We then extract communities from the graph and analyze how the communities change over time. Focusing on community size, we found cases in which changes in the cattle's condition, especially estrus, coincided with dynamic changes in the communities.
Yohei Yamauchi, Ryo Nishide, Yumi Takaki, C. Ohta, K. Oyama, T. Ohkawa. "Cattle Community Extraction Using the Interactions Based on Synchronous Behavior." Proceedings of the 9th International Symposium on Information and Communication Technology, 2018-12-06. https://doi.org/10.1145/3287921.3287941
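The pipeline described in this abstract (classified behaviors, synchronization duration, graph, communities) can be illustrated with a small sketch. Everything concrete here is an assumption: behaviors are given as one label per time step, synchronization is counted as the number of steps with equal labels, and communities are taken to be connected components of the thresholded graph. The abstract does not say which community extraction method the authors use.

```python
from itertools import combinations

def sync_counts(behaviors):
    """Count, for each pair of animals, the time steps with equal behavior labels."""
    return {
        (a, b): sum(x == y for x, y in zip(behaviors[a], behaviors[b]))
        for a, b in combinations(sorted(behaviors), 2)
    }

def communities(behaviors, min_sync):
    """Link pairs synchronized for at least min_sync steps, then
    return the connected components as sorted lists of names."""
    adj = {name: set() for name in behaviors}
    for (a, b), count in sync_counts(behaviors).items():
        if count >= min_sync:
            adj[a].add(b)
            adj[b].add(a)
    seen, groups = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, group = [start], []
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.append(node)
            stack.extend(adj[node] - seen)
        groups.append(sorted(group))
    return groups

# Hypothetical labels: r = resting, w = walking, g = grazing.
herd = {"A": "rrwwgg", "B": "rrwwgr", "C": "wgrrww", "D": "wgrrwg"}
print(communities(herd, min_sync=4))
```

Recomputing the communities over successive time windows and comparing their sizes would give the kind of time series of community changes that the abstract analyzes.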
Jongwoo Choi, Sungjune Chang, Joon-Hak Bang, J. Park, Hae Ryong Lee
The recent convergence of ICT and biotechnology has led to a growing number of areas in which machines take over tasks that people used to do. Small medical electronic devices can check a person's health condition with a simple test and determine whether bio-signals are abnormal, so that the person can be advised to seek treatment at a hospital. The role of such health screening devices is not to diagnose disease precisely but to check bio-signals roughly. Conventional health screening devices take a blood sample to measure the amount of a specific component in the blood, but invasive blood sampling is painful and burdensome for the patient. Breath analysis, being non-invasive, offers a more comfortable and easier screening method than conventional techniques. However, it is difficult for people to use because of complex breath sampling procedures, large system size, and the sensitive characteristics of gas sensors. We designed a smartphone-sized miniaturized electronic nose system and built a database system to derive novel rules from data from multiple sensors. We tested the electronic nose system on actual diabetic patients and confirmed the possibility of distinguishing the disease. Once large amounts of data are collected, various artificial intelligence algorithms will be applied to find more accurate health screening methods.
Jongwoo Choi, Sungjune Chang, Joon-Hak Bang, J. Park, Hae Ryong Lee. "The Miniaturized IoT Electronic Nose Device and Sensor Data Collection System for Health Screening by Volatile Organic Compounds Detection from Exhaled Breath." Proceedings of the 9th International Symposium on Information and Communication Technology, 2018-12-06. https://doi.org/10.1145/3287921.3287943
Tu Le, Duc-Tan Lam, Dinh-Phong Vo, A. Yoshitaka, H. Le
On days when the surface is covered by thick clouds, images acquired by optical satellites usually suffer from missing information and become unusable, because nothing under the cloud cover can be seen. Many methods have been proposed to recover the missing data, but they recover the image only from one or more reference images, and they mostly select similar parts or corresponding pixels to restore the damaged original. This research proposes a new approach to recovering damaged images that aims to exploit periodic weather patterns. The main idea is to combine prediction and reconstruction techniques. For prediction, a time series of consecutive images is used to predict the next image. The predicted image is then used as the reference image for the reconstruction process.
Tu Le, Duc-Tan Lam, Dinh-Phong Vo, A. Yoshitaka, H. Le. "Recover Water Bodies in Multi-spectral Satellite Images with Deep Neural Nets." Proceedings of the 9th International Symposium on Information and Communication Technology, 2018-12-06. https://doi.org/10.1145/3287921.3287969
Huynh Thi Thanh Binh, Tran The Anh, D. Son, P. Duc, B. Nguyen
Recently, the Internet of Things (IoT) has grown steadily, generating a tremendous amount of data and putting pressure on cloud computing infrastructures. Fog computing architecture has been proposed as the next generation of cloud computing to meet the requirements of IoT networks. One of the big challenges of fog computing is resource management and operating functions such as task scheduling, which must guarantee a high-performance and cost-effective service. We propose TCaS, an evolutionary algorithm for scheduling Bag-of-Tasks applications in a cloud-fog computing environment. By scheduling the tasks in this distributed system, our approach aims to achieve the optimal tradeoff between execution time and operating cost. We verify our proposal through extensive simulation with data sets of various sizes. The experimental results show that our scheduling algorithm outperforms the Bee Life Algorithm (BLA) by 38.6% in time-cost tradeoff, performs much better than BLA in execution time in particular, and at the same time satisfies the user's requirements.
Huynh Thi Thanh Binh, Tran The Anh, D. Son, P. Duc, B. Nguyen. "An Evolutionary Algorithm for Solving Task Scheduling Problem in Cloud-Fog Computing Environment." Proceedings of the 9th International Symposium on Information and Communication Technology, 2018-12-06. https://doi.org/10.1145/3287921.3287984
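To make the time-cost tradeoff concrete, the sketch below scores a task-to-node assignment by a weighted sum of makespan and monetary cost, then improves it with a very small evolutionary loop. This is not TCaS. The encoding, the weight `alpha`, the one-point crossover, the single-gene mutation, and all parameter names are assumptions for illustration, and a realistic formulation would normalize time and cost before combining them.

```python
import random

def makespan_and_cost(assign, lengths, speeds, rates):
    """assign[k] is the index of the node that runs task k."""
    busy = [0.0] * len(speeds)
    cost = 0.0
    for task, node in enumerate(assign):
        t = lengths[task] / speeds[node]  # execution time on that node
        busy[node] += t
        cost += t * rates[node]           # price per time unit on that node
    return max(busy), cost

def fitness(assign, lengths, speeds, rates, alpha=0.5):
    """Weighted sum of makespan and cost; lower is better (alpha is hypothetical)."""
    span, cost = makespan_and_cost(assign, lengths, speeds, rates)
    return alpha * span + (1 - alpha) * cost

def evolve(lengths, speeds, rates, pop_size=20, generations=50, seed=0):
    """Return (best assignment, its fitness, best fitness per generation)."""
    rng = random.Random(seed)
    n_tasks, n_nodes = len(lengths), len(speeds)

    def score(a):
        return fitness(a, lengths, speeds, rates)

    pop = [[rng.randrange(n_nodes) for _ in range(n_tasks)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=score)
        history.append(score(pop[0]))
        parents = pop[:pop_size // 2]  # elitism: the better half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_tasks)  # one-point crossover
            child = a[:cut] + b[cut:]
            child[rng.randrange(n_tasks)] = rng.randrange(n_nodes)  # mutation
            children.append(child)
        pop = parents + children
    best = min(pop, key=score)
    return best, score(best), history
```

Because the better half of each generation is carried over unchanged, the best fitness in `history` can never get worse from one generation to the next.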
Evaluating the security level of Web-Portals is an urgent need, but it has not yet received enough attention. A quantitative method is a key factor in security level evaluation. The formal model of the standard ISO/IEC 15408 and of some other security standards cannot be applied directly to Web-Portals because the model is too general and abstract. The well-regarded OWASP (Open Web Application Security Project) model provides many best practices for Web applications, but it is still not sufficient for quantitative evaluation and is hard to apply when comparing the security levels of different Web applications. This paper proposes a model and a quantitative method, based on the standard ISO/IEC 15408, for evaluating the security levels of Web-Portals; the method is highly feasible in practice.
Dang-Hai Hoang, P. T. Nga. "Evaluating the security levels of the Web-Portals based on the standard ISO/IEC 15408." Proceedings of the 9th International Symposium on Information and Communication Technology, 2018-12-06. https://doi.org/10.1145/3287921.3287985
In recent years, Android has consistently captured roughly 80% of worldwide smartphone volume. Because of its popularity and openness, the Android OS has become the platform most targeted by mobile malware. Such malware can cause considerable damage to Android devices, such as data loss or sabotage of hardware. Because of its predictive capability, machine learning is a good approach to coping with the rapidly increasing number of new malware samples. In this paper, we propose the Neural Network for Android Detection of Malware (NADM). NADM performs an analysis process to gather features of Android applications. These data are then converted into joint vector spaces, which serve as input to the training part of the deep learning process. Our classifier model can achieve high accuracy and has been applied in sProtect [15] on Google Play.
Nguyen Viet Duc, P. T. Giang. "NADM: Neural Network for Android Detection Malware." Proceedings of the 9th International Symposium on Information and Communication Technology, 2018-12-06. https://doi.org/10.1145/3287921.3287977
In our previous work, we introduced a hybrid fingerprint matcher consisting of two stages: a local minutiae matching stage and a consolidation stage. To improve the accuracy of the former stage, in this paper we propose characterizing each minutia by an additional feature that represents how well it can be distinguished from other minutiae in the fingerprint. Using the discriminability of each minutia when calculating the local similarity score between two minutiae significantly improves the performance of the local matching stage. The resulting gain in the accuracy of the whole matching algorithm over the previous work, 0.33% in EER and 0.51% in FMR1000, now makes our matcher rank 2nd on the FVC2002-DB2A leaderboard.
Nghia Duong, Minh Nguyen, Hieu Quang, Hoang Manh Cuong. "An Improved Fingerprint Matching Algorithm Using Low Discriminative Region." Proceedings of the 9th International Symposium on Information and Communication Technology, 2018-12-06. https://doi.org/10.1145/3287921.3287986
Online newspapers are gradually replacing traditional ones, and the variety of articles they publish motivates the need to capture hot topics, giving Internet users a shortcut to the hot news. A hot topic reflects people's concerns in real life and has a big impact not only on the community but also on business. In this paper, we propose a novel topic detection approach that applies Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) to the Vector Space Model (VSM) to address the challenge of noisy data, and the Pearson product-moment correlation coefficient (PMCC) to high-ranking keywords to identify the topics behind the keywords. The proposed approach is evaluated on a dataset of ten thousand articles, and the experimental results are competitive in terms of precision with other state-of-the-art methods.
T. Cao, Tat-Huy Tran, Thanh-Thuy Luu. "Hot Topic Detection on Newspaper." Proceedings of the 9th International Symposium on Information and Communication Technology, 2018-12-06. https://doi.org/10.1145/3287921.3287965
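The PMCC step mentioned in this abstract can be illustrated in isolation. The sketch below computes the Pearson coefficient between the occurrence counts of two keywords and lists the keyword pairs whose counts move together. How the authors build the count vectors and which threshold they use is not stated, so the per-period counts, the keyword names, and the 0.8 threshold are assumptions.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / (sx * sy)

def correlated_pairs(counts, threshold=0.8):
    """Return keyword pairs whose count series correlate at or above threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(counts), 2)
        if pearson(counts[a], counts[b]) >= threshold
    ]

# Hypothetical keyword counts over four time periods.
counts = {
    "storm": [5, 0, 3, 0],
    "flood": [4, 0, 2, 0],
    "election": [0, 6, 0, 5],
}
print(correlated_pairs(counts))
```

Keywords that end up in the same highly correlated group are candidates for belonging to one underlying topic.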