Pub Date: 2019-11-01 DOI: 10.1109/ISKE47853.2019.9170295
Jintao Zhang, Duyu Liu, Wei Xiang
With the increasing complexity and importance of network applications, the security requirements for network protocols keep rising. Fuzzing, an important testing technique for discovering undisclosed vulnerabilities, tests the security of network protocols by generating large amounts of data and injecting them into the target software; it can find many important vulnerabilities, such as denial of service, buffer overflows, and format string bugs. Manually generated test cases can be better tailored to the target under test, but manual fuzzing requires an accurate understanding of network protocol details and tedious work to construct a large number of test data sets, resulting in limited coverage and poor effectiveness. To solve this problem, this paper first investigates the types of vulnerabilities and summarizes fuzzing strategies, and then constructs a fuzzer based on an existing framework that adopts a mutation strategy to construct malformed network packets, which are sent to the target under test. The results show that this method is more efficient than manual analysis in vulnerability mining, which provides a good foundation for improving the security of network protocols.
Title: Network Protocol Automatic Vulnerability Mining Technology Based on Fuzzing. Published in: 2019 IEEE 14th International Conference on Intelligent Systems and Knowledge Engineering (ISKE).
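The mutation strategy mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not name the framework or the exact mutation operators, so the operators below (bit flip, insert, delete, overwrite) and the function names `mutate` and `generate_cases` are assumptions made for illustration. The sketch only generates malformed byte strings from a seed packet; it does not send anything over the network.

```python
import random
from typing import List


def mutate(packet: bytes, rng: random.Random, n_mutations: int = 3) -> bytes:
    """Return a malformed copy of `packet` (hypothetical byte-level mutator)."""
    data = bytearray(packet)
    for _ in range(n_mutations):
        op = rng.choice(("flip", "insert", "delete", "overwrite"))
        if op == "flip" and data:
            # Flip one random bit of one random byte.
            i = rng.randrange(len(data))
            data[i] ^= 1 << rng.randrange(8)
        elif op == "insert":
            # Insert a run of a boundary-like byte (0x25 is '%').
            i = rng.randrange(len(data) + 1)
            filler = bytes([rng.choice((0x00, 0xFF, 0x7F, 0x25))])
            data[i:i] = filler * rng.randint(1, 64)
        elif op == "delete" and len(data) > 1:
            # Drop one byte, keeping the packet non-empty.
            del data[rng.randrange(len(data))]
        elif op == "overwrite" and data:
            # Replace one byte with a random value.
            data[rng.randrange(len(data))] = rng.randrange(256)
    return bytes(data)


def generate_cases(seed_packet: bytes, count: int, seed: int = 0) -> List[bytes]:
    """Generate `count` mutated test cases, reproducibly for a given seed."""
    rng = random.Random(seed)
    return [mutate(seed_packet, rng) for _ in range(count)]
```

Seeding the random generator keeps a run reproducible, so a test case that triggers a crash can be regenerated later from the same seed packet and seed value.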
Pub Date: 2019-11-01 DOI: 10.1109/ISKE47853.2019.9170426
Muqeet Ahmad, Jie Hu, Mushtaq Ahmad, Zaid Al-Huda, Faisal Khurshid
Mobile wireless sensor networks (MWSNs) face many challenges in the age of the Internet of Things. Sensor mobility and communication consume significant energy and thus reduce the lifetime of the network. There are various techniques to improve an MWSN’s lifetime, one of which is clustering. Clustering-based routing protocols for MWSNs improve energy efficiency and extend network lifetime. In this work, we use two multi-criteria decision-making (MCDM) methods, the fuzzy Technique for Order Preference by Similarity to Ideal Solution (TOPSIS) and the fuzzy Analytic Hierarchy Process (AHP), to select optimal cluster leaders (CLs) with respect to five criteria: link reliability, connectivity, remaining energy, distance from the base station (BS), and speed of sensor nodes. These methods perform CL selection and hence can significantly improve the network lifetime. Fuzzy TOPSIS and fuzzy AHP are compared not only with each other but also with the optimized zone-based energy-efficient routing protocol (OZEEP) and the Enhanced Cluster Based Routing Protocol (ECBR). Our results show that fuzzy TOPSIS-based optimal CL selection not only increases the lifetime of the network but also conserves energy with minimum overhead.
Title: Optimal Cluster Leader Selection Using MCDM Methods in MWSN: A Comparative Study
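To make the ranking idea behind TOPSIS concrete, here is a minimal sketch of the classical (crisp) TOPSIS procedure applied to the five criteria listed in the abstract. It is a simplification: the paper uses fuzzy TOPSIS, whereas this version works on plain numbers. The function name `topsis`, the example weights, and the choice of which criteria count as benefits (link reliability, connectivity, remaining energy) or costs (distance from BS, speed) are assumptions for illustration.

```python
import math


def topsis(matrix, weights, benefit):
    """Score alternatives with classical TOPSIS; a higher score is better.

    matrix[i][j] is the value of criterion j for candidate node i,
    weights[j] is the criterion weight, and benefit[j] is True when
    larger values of criterion j are preferable.
    """
    n_crit = len(weights)
    # Vector-normalize each criterion column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0
             for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)]
         for row in matrix]
    # Ideal and anti-ideal points depend on the criterion direction.
    ideal = [max(r[j] for r in v) if benefit[j] else min(r[j] for r in v)
             for j in range(n_crit)]
    anti = [min(r[j] for r in v) if benefit[j] else max(r[j] for r in v)
            for j in range(n_crit)]
    scores = []
    for r in v:
        d_pos = math.dist(r, ideal)   # distance to the ideal solution
        d_neg = math.dist(r, anti)    # distance to the anti-ideal solution
        total = d_pos + d_neg
        scores.append(d_neg / total if total else 0.0)
    return scores


# Columns: link reliability, connectivity, remaining energy,
# distance from BS, speed (hypothetical values for three nodes).
nodes = [
    [0.9, 5, 0.8, 10, 1],
    [0.5, 3, 0.4, 30, 3],
    [0.7, 4, 0.6, 20, 2],
]
weights = [0.2, 0.2, 0.3, 0.2, 0.1]
benefit = [True, True, True, False, False]
scores = topsis(nodes, weights, benefit)
leader = scores.index(max(scores))
```

The node with the highest closeness score would be chosen as cluster leader. In this toy input the first node is best on every criterion, so it coincides with the ideal solution.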
Pub Date: 2019-11-01 DOI: 10.1109/ISKE47853.2019.9170335
Bin Zhang, Jie Lu, Guangquan Zhang
Machine learning in evolving environments faces challenges due to concept drift. Most concept drift adaptation methods focus on modifying the model. In this paper, a method called Drift Adaptation via Joint Distribution Alignment (DAJDA) is proposed. DAJDA applies a linear transformation to the drifted instances instead of modifying the model. Instances are transformed into a common feature space, reducing the discrepancy between the distributions before and after the drift. Experimental studies show that DAJDA can improve the performance of learning models under concept drift.
Title: Drift Adaptation via Joint Distribution Alignment
Pub Date: 2019-11-01 DOI: 10.1109/ISKE47853.2019.9170364
Kingsley Nketia Acheampong, Wenhong Tian
Neural sequence-to-sequence (seq2seq) grammatical error correction (GEC) models are usually computationally expensive both in training and in translation inference. They also tend to generalize poorly and perform weakly because error-corrected data is limited, and are thus incapable of correcting grammar effectively. In this work, we propose using neural cascading strategies to enhance the effectiveness of neural seq2seq GEC models, inspired by the post-editing processes of neural machine translation (NMT). The findings of our experiments show that adapting cascading techniques in low-resource NMT models yields performance comparable to that of high-resource NMT models. We extensively exploit and evaluate multiple cascading learning strategies and establish best practices toward improving neural seq2seq GEC models.
Title: Toward End-to-End Neural Cascading Strategies for Grammatical Error Correction
The motif co-occurrence matrix (MCM) is one of the commonly used image feature descriptors. However, MCM has two shortcomings: it does not satisfy translation invariance, and different sub-blocks can be represented by the same motif. To overcome these two shortcomings, an image retrieval method based on a blocked motif co-occurrence matrix (BMCM) is proposed. BMCM first divides the image into five regions and then extracts the quantized HSV color histogram feature, the MCM feature, and the local binary pattern feature from each of the five regions. Considering that different attributes and contents of an image are described by different features, this paper achieves image retrieval through a weighted fusion of the above three features. Experimental results on the Corel 1k standard image library show that the proposed method has higher precision and lower computational complexity than the MCM, BCTF, and MCMCM algorithms.
Title: Image Retrieval Based on Block Motif Co-Occurrence Matrix. Authors: Yuan-ting Yan, Meili Yang, Shi-bo Zhang, Yanping Zhang. Pub Date: 2019-11-01 DOI: 10.1109/ISKE47853.2019.9170384
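Of the three features named in the abstract, the local binary pattern (LBP) is the easiest to show in a few lines. The sketch below computes the basic 3x3 LBP code and its histogram for a grayscale image stored as a list of rows. The neighbor ordering, the `>=` threshold convention, and the function names `lbp_code` and `lbp_histogram` are assumptions for illustration; the abstract does not state which LBP variant the paper uses.

```python
def lbp_code(img, r, c):
    """Basic 3x3 LBP code of the pixel at (r, c); needs a full neighborhood."""
    center = img[r][c]
    # Clockwise neighbor order starting at the top-left pixel (a convention).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        # Set the bit when the neighbor is at least as bright as the center.
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist


flat = [[5] * 4 for _ in range(4)]          # uniform 4x4 patch
peak = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]    # single bright pixel
flat_hist = lbp_histogram(flat)
peak_code = lbp_code(peak, 1, 1)
```

Computing one such histogram per image region would give a per-region texture descriptor that can be combined with the other features through weighted fusion.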
Pub Date: 2019-11-01 DOI: 10.1109/ISKE47853.2019.9170273
Yihan Wang, Fusheng Yu, W. Homenda, A. Jastrzębska, Xiao Wang
In a fuzzy cognitive map-based forecasting model, causal relationships (represented with a weight matrix) are constant. This may hinder the applicability of such a model. In this paper, we propose an adaptive fuzzy cognitive map-based forecasting model. Unlike existing models, the proposed model consists of a collection of fuzzy cognitive maps. The maps are constructed according to the clustering results of the so-called premises covering an entire time series. Subsequently, we use an optimization algorithm to train the parameters of each fuzzy cognitive map individually. The proposed model construction procedure allows forming fuzzy cognitive maps that are more flexible and, thus, suitable for forecasting long time series. In experimental studies on synthetic and real time series, the proposed model performed very well in comparison with the original fuzzy cognitive map-based forecasting model and two other forecasting models.
Title: A New Adaptive Fuzzy Cognitive Map-Based Forecasting Model for Time Series
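The role of the constant weight matrix can be seen in a minimal sketch of one fuzzy cognitive map update step. This uses a common formulation (a sigmoid applied to the weighted sum of concept activations, without a memory term); the abstract does not give the paper's exact update rule, so the formula, the indexing convention `weights[j][i]` (influence of concept j on concept i), and the function names are assumptions.

```python
import math


def fcm_step(state, weights):
    """One FCM update: new_i = sigmoid(sum_j weights[j][i] * state[j])."""
    n = len(state)
    return [
        1.0 / (1.0 + math.exp(-sum(weights[j][i] * state[j] for j in range(n))))
        for i in range(n)
    ]


def fcm_forecast(state, weights, steps):
    """Iterate the map `steps` times with a fixed weight matrix."""
    trajectory = [list(state)]
    for _ in range(steps):
        trajectory.append(fcm_step(trajectory[-1], weights))
    return trajectory


# Two concepts; concept 0 positively influences concept 1 (hypothetical weights).
weights = [[0.0, 2.0],
           [0.0, 0.0]]
trajectory = fcm_forecast([1.0, 0.0], weights, steps=3)
```

Because `weights` never changes during iteration, a single map applies the same causal structure to every part of a series, which is the limitation that a collection of maps is meant to relax.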
Pub Date: 2019-11-01 DOI: 10.1109/ISKE47853.2019.9170268
Fei Teng, Meng Bai, Tian-Jie Li
Associating genes with diseases is a fundamental challenge in human health, with applications in understanding disease properties and developing precision medicine. Over the past decades, the number of biomedical articles, which contain a great number of gene-disease associations (GDAs), has grown explosively. Association extraction requires a highly accurate annotated corpus, but manual labeling is time-consuming and labor-intensive. This paper proposes a distant supervision-based method to automatically label a corpus for GDA extraction. Compared with the manually annotated gold corpus, the automatically labeled corpus is much larger and of better quality. It improves the performance of state-of-the-art extraction models, with an AUC of 0.96 and an F1 of 90%. To the best of our knowledge, this is the first study of automatic GDA labeling in the field of precision medicine. We extracted GDAs using new corpora from 115,261 PubMed abstracts about 29 lung cancers, and discovered 296 new genes/proteins related to lung cancers. These findings indicate new directions for drug design.
Title: Automatic Labeling for Gene-Disease Associations through Distant Supervision
Pub Date: 2019-11-01 DOI: 10.1109/ISKE47853.2019.9170279
Jinghong Wang, Jiateng Yang, S. Shi
With the advent of the era of big data, complex network community detection has become an important research direction. Among similarity-based community detection methods, the GN algorithm is attractive for its accuracy but has high time complexity. To overcome the inefficiency of GN, this paper presents a semi-supervised GN algorithm based on node similarity. It takes full advantage of a priori information, namely known nodes and cannot-link constraints, combined with the similarity information between nodes, and is validated on artificial and real networks. It is shown that the proposed algorithm reduces the GN algorithm's time complexity and improves its efficiency.
Title: Semi-Supervised Community Discovery Algorithm Based on Node Similarity
Pub Date: 2019-11-01 DOI: 10.1109/ISKE47853.2019.9170405
Bo Fu, Ruizi Wang, Yi Li, Chengdi Xing
We introduce an effective technique to restore images corrupted by additive Gaussian noise and impulse salt-and-pepper noise. In this work, a three-step non-local directional-guided filter is set up. We begin by identifying salt-and-pepper noise, estimating the intensity of the mixed noise, and preliminarily removing and repairing it with a maximum likelihood estimator. Afterwards, we use a set of discrete total variation (TV) models to mine potential directional information and generate a set of directional-guided templates. Finally, we build a non-local directional-guided filter to restore lost details. Experimental results verify that the proposed algorithm achieves the best denoising performance compared with some typical methods. Its advantage is greater under high-intensity noise.
Title: Non-Local Directional-Guided Filter for Impulse-Gaussian Mixed Noise Image Denoising
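The first step, identifying and preliminarily repairing salt-and-pepper pixels, can be sketched in a much simpler form than the paper's. The version below flags pixels at the extreme intensity values as impulse noise and replaces each with the median of its non-flagged 3x3 neighbors. This median repair is a stand-in technique chosen for brevity, not the maximum likelihood estimator described in the abstract, and the function name `repair_salt_pepper` and the 0/255 thresholds are assumptions.

```python
from statistics import median


def repair_salt_pepper(img, low=0, high=255):
    """Replace extreme-valued pixels with the median of clean 3x3 neighbors."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for r in range(h):
        for c in range(w):
            if img[r][c] not in (low, high):
                continue  # pixel is not flagged as impulse noise
            clean = [
                img[rr][cc]
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, c - 1), min(w, c + 2))
                if (rr, cc) != (r, c) and img[rr][cc] not in (low, high)
            ]
            if clean:
                # Leave the pixel unchanged when every neighbor is also noisy.
                out[r][c] = int(round(median(clean)))
    return out


noisy = [
    [100, 100, 100],
    [100, 255, 100],
    [100,   0, 100],
]
repaired = repair_salt_pepper(noisy)
```

Excluding flagged neighbors from the median matters at high noise density, where a plain median filter would often pick another corrupted value.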
Pub Date: 2019-11-01 DOI: 10.1109/ISKE47853.2019.9170436
Bassoma Diallo, Jie Hu, Tianrui Li, G. Khan, Chunyan Ji
Many works have applied multi-view clustering algorithms to document clustering. One challenging problem in document clustering is the similarity metric. Existing multi-view document clustering methods widely use two measures: cosine similarity and Euclidean distance (ED). The first does not consider the magnitudes of the two vectors. The second cannot distinguish between pairs of vectors that share the same ED. In this paper, we propose a multi-view document clustering scheme that overcomes these drawbacks by calculating the heterogeneity between documents with the same ED while taking their magnitudes into consideration. The experimental results show that the proposed similarity function measures the similarity between documents more accurately than the existing metrics, and that the proposed document clustering scheme outperforms several state-of-the-art algorithms.
Title: Concept-Enhanced Multi-view Clustering of Document Data
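The two drawbacks described in the abstract are easy to reproduce numerically. The sketch below only illustrates the problem with toy two-dimensional vectors; it does not implement the paper's proposed similarity function, which the abstract does not specify. The vectors and function names are chosen for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between vectors a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Drawback 1: cosine similarity ignores magnitude.
short_doc, long_doc = [1, 1], [10, 10]
cos_same_direction = cosine_similarity(short_doc, long_doc)

# Drawback 2: two different documents at the same Euclidean distance
# from a query are indistinguishable by ED alone.
query = [1, 0]
doc_x, doc_y = [2, 0], [1, 1]
ed_x = euclidean_distance(query, doc_x)
ed_y = euclidean_distance(query, doc_y)
cos_x = cosine_similarity(query, doc_x)
cos_y = cosine_similarity(query, doc_y)
```

Here `short_doc` and `long_doc` differ tenfold in magnitude yet have a cosine similarity of about 1, while `doc_x` and `doc_y` lie at the same distance from `query` although only `doc_x` points in the same direction.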