Pub Date: 2022-09-22 | DOI: 10.15625/1813-9663/38/3/17220
Chi Cuong Nguyen, Long Giang Nguyen, Giang Son Tran
Lung cancer is one of the most serious cancer-related diseases in Vietnam and worldwide. Early detection of lung nodules can help increase the survival rate of lung cancer patients. Computer-aided diagnosis (CAD) systems have been proposed in the literature for the early detection of lung nodules. However, most current CAD systems focus on building high-quality machine learning models for a fixed dataset rather than taking into account dataset properties, which are very important for lung cancer diagnosis. In this paper, we follow a data-centric approach to lung nodule detection and propose a data-centric method to improve the detection performance of lung nodules on CT scans. Our method takes dataset-specific features (nodule sizes and aspect ratios) into account when training detection models and adds training data from a local Vietnamese hospital. We evaluate our method on three widely used object detection networks (Faster R-CNN, YOLOv3, and RetinaNet). The experimental results show that the proposed method improves the detection sensitivity of these models by up to 4.24%.
Title: DATA-CENTRIC DEEP LEARNING METHOD FOR PULMONARY NODULE DETECTION | Journal: Journal of Computer Science and Cybernetics
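The abstract does not say how nodule sizes and aspect ratios enter training. One common way to make a detector's anchors dataset-specific is to cluster ground-truth box sizes with the distance 1 - IoU, as in YOLO-style anchor selection. The sketch below assumes that technique (it is not necessarily the authors' procedure); the function names and the (width, height) values are hypothetical.

```python
import random
from typing import List, Tuple

WH = Tuple[float, float]  # (width, height) of a ground-truth box, in pixels

def iou_wh(a: WH, b: WH) -> float:
    """IoU of two boxes compared by size only (both aligned at the same corner)."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union

def kmeans_anchors(boxes: List[WH], k: int, iters: int = 100, seed: int = 0) -> List[WH]:
    """k-means over box sizes with distance 1 - IoU; returns anchors sorted by area."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)
    for _ in range(iters):
        clusters: List[List[WH]] = [[] for _ in range(k)]
        for box in boxes:
            # highest IoU == lowest (1 - IoU) distance
            best = max(range(k), key=lambda i: iou_wh(box, centroids[i]))
            clusters[best].append(box)
        new_centroids = []
        for i, members in enumerate(clusters):
            if not members:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
                continue
            w = sum(m[0] for m in members) / len(members)
            h = sum(m[1] for m in members) / len(members)
            new_centroids.append((w, h))
        if new_centroids == centroids:  # assignments stopped changing
            break
        centroids = new_centroids
    return sorted(centroids, key=lambda c: c[0] * c[1])
```

For example, `kmeans_anchors([(4, 4), (5, 5), (4, 5), (30, 30), (32, 28), (29, 31)], k=2)` separates the small and large nodules; the aspect ratio of each anchor is then `h / w`.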
Pub Date: 2022-09-22 | DOI: 10.15625/1813-9663/38/3/17094
Nguyen Thi Thanh Van, Vu Van Quang, Nguyen Cong Luong
This work investigates the security of model updates in a collaborative (federated) learning network by using covert communication (CC). The CC scheme uses jamming signals, and multiple friendly jammers (FJs) are deployed to offer jamming services to the model owner, i.e., a base station (BS). To enable the BS to select the best FJ, i.e., the lowest-cost FJ, a truthful auction is adopted. A problem is then formulated to optimize the jamming power, transmission power, and local accuracy, with the objective of minimizing the training latency subject to the security performance requirement and the budget of the BS. To solve this non-convex problem, we adopt a successive convex approximation (SCA) algorithm. The simulation results reveal several notable findings; for example, the truthful auction reduces the jamming cost of the BS as the number of FJs increases.
Title: JOINT POWER COST AND LATENCY MINIMIZATION FOR SECURE COLLABORATIVE LEARNING SYSTEM | Journal: Journal of Computer Science and Cybernetics
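The abstract names a truthful auction but not its rule. A standard truthful single-item procurement mechanism is the reverse second-price (Vickrey) auction: the lowest bidder wins and is paid the second-lowest bid, so reporting one's true cost is a dominant strategy. The sketch assumes each FJ submits one cost bid and the BS hires exactly one jammer; `select_jammer`, the bids, and the budget check are illustrative, not the paper's mechanism.

```python
from typing import Dict, Optional, Tuple

def select_jammer(bids: Dict[str, float], budget: float) -> Optional[Tuple[str, float]]:
    """Reverse second-price auction: return (winner, payment), or None if no deal."""
    if len(bids) < 2:
        return None  # a second price needs at least two bidders
    ranked = sorted(bids.items(), key=lambda kv: kv[1])  # cheapest first
    winner = ranked[0][0]
    payment = ranked[1][1]  # winner is paid the second-lowest bid, not its own
    if payment > budget:
        return None  # the BS cannot afford the clearing price
    return winner, payment
```

With bids `{"FJ1": 5.0, "FJ2": 3.0, "FJ3": 4.0}` and a budget of 10, FJ2 wins and is paid 4.0. As more FJs bid, the second-lowest bid tends to fall, which is one intuition for the decreasing jamming cost reported above.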
Pub Date: 2022-06-23 | DOI: 10.15625/1813-9663/38/2/16786
Hai-Minh Nguyen, Van Thanh The, T. Lang
Semantic extraction for images is a topical problem with applications in many semantic search systems. In this paper, we propose a semantic image retrieval method that first finds a set of images similar to the input image and then queries the semantics of these images on an ontology through a visual words vector. The objects in each image are extracted and classified by Mask R-CNN and stored in a cluster graph, from which the semantics of the image are extracted. Images similar to the query image are retrieved from the cluster graph; then, the k-NN algorithm is applied to find the visual words vector, which serves as the basis for querying the semantics of the query image on the ontology with a SPARQL query. Based on the proposed method, an experimental system was built and evaluated on two large image datasets, MIRFLICKR-25K and MS COCO. The experimental results are compared with recently published works on the same datasets to demonstrate the effectiveness of the proposed method: the proposed semantic image retrieval method achieves an accuracy of 0.897 on MIRFLICKR-25K and 0.833 on MS COCO.
Title: A METHOD OF SEMANTIC-BASED IMAGE RETRIEVAL USING GRAPH CUT | Journal: Journal of Computer Science and Cybernetics
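The abstract does not specify how k-NN produces the visual words vector. As an assumption, the sketch below treats each similar image as a feature vector paired with a binary visual-words vector (one entry per object class detected by Mask R-CNN) and takes a majority vote over the k nearest neighbours. The function name, feature values, class layout, and the majority rule are hypothetical, and the SPARQL step is not shown.

```python
import math
from typing import List, Sequence

def knn_visual_words(query: Sequence[float], features: List[Sequence[float]],
                     word_vectors: List[Sequence[int]], k: int = 3) -> List[int]:
    """Majority vote of the visual-words vectors of the k nearest similar images."""
    dists = [math.dist(query, f) for f in features]  # Euclidean distance (Python 3.8+)
    nearest = sorted(range(len(features)), key=lambda i: dists[i])[:k]
    n_words = len(word_vectors[0])
    votes = [sum(word_vectors[i][j] for i in nearest) for j in range(n_words)]
    # keep a visual word if more than half of the neighbours contain it
    return [1 if 2 * v > len(nearest) else 0 for v in votes]
```

The resulting vector could then be mapped to class names and turned into an ontology query.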
Pub Date: 2022-06-23 | DOI: 10.15625/1813-9663/38/2/16125
N. D. Hieu, N. C. Ho, Phạm Đình Phong, Vũ Như Lân, Phạm Hoàng Hiệp
Instead of directly handling fuzzy sets assigned to linguistic (L-) labels on the basis of the developers' intuition, this study follows the hedge algebras (HA-) approach to time series forecasting, in which the linguistic time series forecasting model was first proposed and examined in 2020. This model handles the word set of the declared forecasting L-variable directly; hence, the term linguistic time series (LTS) is used instead of fuzzy time series (FTS). Rather than using a limited number of fuzzy sets, this study views the L-variable under consideration as the human linguistic counterpart of the numeric forecasting variable. Hence, its word domain is potentially infinite, which allows the HA formalism to be exploited to increase the accuracy of LTS forecasting results. Because the proposed forecasting model handles L-words directly, the LTS and its L-relationship groups, constructed from the numeric time series, can be regarded as human knowledge about the variation of the given time series, which is helpful for the human-machine interface. The study shows that the proposed formalism handles LTS forecasting models more easily and increases their performance compared with FTS forecasting models as the number of words grows.
Title: SCALABLE HUMAN KNOWLEDGE ABOUT NUMERIC TIME SERIES VARIATION AND ITS ROLE IN IMPROVING FORECASTING RESULTS | Journal: Journal of Computer Science and Cybernetics
Pub Date: 2022-06-23 | DOI: 10.15625/1813-9663/38/2/16880
Cao Thi Luyen, Nguyen Kim Sao, Le Quang Hoa, Ta Minh Thanh
Pixel value ordering (PVO) is considered an effective reversible data hiding method owing to its high embedding capacity and good imperceptibility. In this paper, we propose a new reversible data hiding scheme based on the ordering of four sorted pixels. Three bits are embedded into each four-pixel sub-block without changing the pixel order, whereas the original PVO method embeds only 2 bits. When the payload is smaller than the embedding capacity, flatter blocks are prioritized for embedding to improve image quality. To measure the flatness of a block, we use the 12 neighboring pixels of the current block, and only blocks with satisfactory flatness are used for embedding. The proposed scheme achieves both high capacity and good imperceptibility. Experimental results also show that it outperforms several widely used PVO-based schemes in terms of both image quality and embedding capacity.
Title: AN EFFICIENT REVERSIBLE DATA HIDING BASED ON IMPROVED PIXEL VALUE ORDERING METHOD | Journal: Journal of Computer Science and Cybernetics
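For reference, the baseline the abstract compares against can be sketched as classic PVO embedding, which hides at most one bit at the block maximum and one at the block minimum (2 bits per block). This is a minimal sketch of that baseline only, not the proposed 3-bit scheme; flatness selection and the overflow/underflow location map for 0/255 pixels are omitted, and the function names are hypothetical.

```python
from typing import List, Tuple

def _order(block: List[int]) -> List[int]:
    # Indices sorted by pixel value; Python's sort is stable, so ties keep position order.
    return sorted(range(len(block)), key=lambda i: block[i])

def pvo_embed(block: List[int], bits: List[int]) -> Tuple[List[int], int]:
    """Embed up to two bits (max side first, then min side) into a block of >= 3 pixels.
    Returns the marked block and the number of bits consumed. When bits run out, an
    expandable pixel is left unchanged, which the decoder reads as a padding 0."""
    assert len(block) >= 3
    out, order, used = list(block), _order(block), 0
    hi, hi2 = order[-1], order[-2]
    d_max = block[hi] - block[hi2]
    if d_max == 1:  # expandable: the bit is encoded as +0 or +1
        if used < len(bits):
            out[hi] += bits[used]
            used += 1
    elif d_max > 1:  # shiftable: move by 1 so the decoder can tell it apart
        out[hi] += 1
    lo, lo2 = order[0], order[1]
    d_min = block[lo2] - block[lo]
    if d_min == 1:
        if used < len(bits):
            out[lo] -= bits[used]
            used += 1
    elif d_min > 1:
        out[lo] -= 1
    return out, used

def pvo_extract(marked: List[int]) -> Tuple[List[int], List[int]]:
    """Recover the original block and the embedded bits (max side first)."""
    orig, order, bits = list(marked), _order(marked), []
    hi, hi2 = order[-1], order[-2]
    d = marked[hi] - marked[hi2]
    if d in (1, 2):
        bits.append(d - 1)
        orig[hi] -= d - 1
    elif d > 2:
        orig[hi] -= 1
    lo, lo2 = order[0], order[1]
    d = marked[lo2] - marked[lo]
    if d in (1, 2):
        bits.append(d - 1)
        orig[lo] += d - 1
    elif d > 2:
        orig[lo] += 1
    return orig, bits
```

Because the maximum only moves up and the minimum only moves down, the decoder recovers the same pixel order, which is what makes the scheme reversible.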
Pub Date: 2022-06-23 | DOI: 10.15625/1813-9663/38/2/16912
D. A, Hung Nguyen Quoc, Quang Vu Vinh
In this work, we consider the Dirichlet boundary value problem for a nonlinear triharmonic equation. By reducing the problem to an operator equation for the pair consisting of the right-hand side function and the unknown second normal derivative of the sought function, we design an iterative method at both the continuous and discrete levels for the numerical solution of the problem. Several examples demonstrate that the numerical method has fourth-order convergence. When the right-hand side function does not depend on the unknown function and its derivatives, the numerical method gives more accurate results than the interior method of Gudi and Neilan.
Title: NUMERICAL METHOD FOR SOLVING THE DIRICHLET BOUNDARY VALUE PROBLEM FOR NONLINEAR TRIHARMONIC EQUATION | Journal: Journal of Computer Science and Cybernetics
Pub Date: 2022-06-23 | DOI: 10.15625/1813-9663/38/2/16749
Dinh Que Tran, Phuong Pham
Computational trust among peers plays a crucial role in information sharing, decision making, searching, and attracting recommendations in intelligent systems and social networks. However, most trust models focus on interaction forms rather than analyzing contexts such as the comments and posts that users publish on social media. This paper presents a novel two-stage model of computational trust between a truster and a trustee. First, we construct a function, named experience topic-aware trust, computed from users' interactions and their interests in topics. Then we establish a composition function, named topic-aware trust, constructed from the truster's direct experience trust and the reputation trust of the trustee. Our experimental results show that interest degrees affect trust estimation more than interaction degrees. In addition, the greater users' interest in a topic, the more trustworthy they are.
Title: MODELING COMPUTATIONAL TRUST BASED ON INTERACTION EXPERIENCE AND REPUTATION WITH USER INTERESTS IN SOCIAL NETWORK | Journal: Journal of Computer Science and Cybernetics
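The abstract gives the two-stage structure but not the formulas. Purely as a hypothetical illustration of that structure, the sketch below blends a normalized interaction degree with a topic-interest degree (stage 1), then combines that direct trust with the mean reputation reported by other users (stage 2). The linear forms, the weights `beta` and `alpha`, and the function names are all assumptions, not the paper's definitions.

```python
from statistics import mean
from typing import Sequence

def experience_trust(interaction: float, interest: float, beta: float = 0.4) -> float:
    """Hypothetical stage 1: convex blend of the normalized interaction degree and the
    topic-interest degree, both assumed to lie in [0, 1]."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return beta * interaction + (1.0 - beta) * interest

def topic_aware_trust(direct: float, reputations: Sequence[float], alpha: float = 0.6) -> float:
    """Hypothetical stage 2: weight the truster's direct experience trust by alpha and
    the mean reputation reported by other users by 1 - alpha."""
    if not reputations:
        return direct  # no third-party opinions: fall back to direct experience
    return alpha * direct + (1.0 - alpha) * mean(reputations)
```

For instance, `topic_aware_trust(experience_trust(0.5, 1.0), [0.5, 0.7], alpha=0.5)` gives 0.7.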
Pub Date: 2022-06-23 | DOI: 10.15625/1813-9663/38/2/16252
Doanh Bui Cao, Nguyen Vo Duy, Khang Nguyen
Deep learning-based object detection methods have revolutionized the field of computer vision in general and object detection in particular. Prominent among them are methods of the R-CNN family, such as Faster R-CNN and Cascade R-CNN. Their characteristic component is the Region Proposal Network, which generates proposal regions that may or may not contain objects; the proposals are then classified according to an IoU threshold. In this study, we apply dynamic training, which adjusts this IoU threshold according to the statistics of the proposal regions, to Faster R-CNN and Cascade R-CNN trained on the SeaShips and DODV datasets. Cascade R-CNN with dynamic training achieves better results than its standard version on both datasets (0.2% and 5.7% higher on SeaShips and DODV, respectively). On the DODV dataset, Faster R-CNN with dynamic training also outperforms its standard version, by 4.4%.
Title: DLAFS CASCADE R-CNN: AN OBJECT DETECTOR BASED ON DYNAMIC LABEL ASSIGNMENT | Journal: Journal of Computer Science and Cybernetics
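The abstract says the IoU threshold is adjusted from proposal statistics but gives no formula. The sketch below assumes a rule modeled on Dynamic R-CNN: for each image, take the k-th largest IoU between proposals and ground truth, average it over a batch of images, and use the result (bounded below by a floor) as the positive-label threshold. `k`, `floor`, and the function names are hypothetical choices, not values from the paper.

```python
from statistics import mean
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def best_ious(proposals: Sequence[Box], gts: Sequence[Box]) -> List[float]:
    """Best IoU of each proposal against any ground-truth box (0 if there are none)."""
    return [max((iou(p, g) for g in gts), default=0.0) for p in proposals]

def dynamic_iou_threshold(per_image: Sequence[Sequence[float]], k: int = 75,
                          floor: float = 0.4) -> float:
    """Mean over images of the k-th largest proposal IoU, never below `floor`."""
    kth = []
    for ious in per_image:
        if ious:
            ranked = sorted(ious, reverse=True)
            kth.append(ranked[min(k, len(ranked)) - 1])
    return max(floor, mean(kth)) if kth else floor

def assign_labels(ious: Sequence[float], threshold: float) -> List[int]:
    """1 = foreground (positive) proposal, 0 = background."""
    return [1 if v >= threshold else 0 for v in ious]
```

As training improves proposal quality, the k-th largest IoU rises, so the threshold tightens automatically instead of staying at a fixed value.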
Pub Date: 2022-06-20 | DOI: 10.15625/1813-9663/17589
Anh-Cang Phan, Thuong-Cang Phan
Big data processing attracts the interest of many researchers who process large-scale datasets and extract useful information to support decision making. One of the biggest challenges is querying large datasets, which becomes even more complicated with similarity queries than with exact-match queries. The fuzzy join is a typical operation frequently used in similarity queries and big data analysis. Research on this issue is currently scarce, which poses significant barriers to efforts to make query operations on big data more efficient. This study therefore reviews similarity algorithms for fuzzy joins, in which the values of the join key attributes may differ slightly, within a fuzzy threshold. We analyze six similarity algorithms (Hamming, Levenshtein, LCS, Jaccard, Jaro, and Jaro-Winkler) and compare them on three criteria: output enrichment, false positives/negatives, and processing time. The fuzzy join experiments are implemented in Spark, a popular big data processing platform. The algorithms are divided into two groups for evaluation: group 1 (Hamming, Levenshtein, and LCS) and group 2 (Jaccard, Jaro, and Jaro-Winkler). In the former group, Levenshtein outperforms the other two algorithms in output enrichment and result-set accuracy (false positives/negatives), with acceptable processing time. In the latter group, Jaccard is the worst algorithm on all three criteria, whereas Jaro-Winkler produces richer output and higher result-set accuracy. This overview of similarity algorithms will help users choose the most suitable algorithm for their problems.
Title: SIMILARITY ALGORITHMS FOR FUZZY JOIN COMPUTATION IN BIG DATA PROCESSING ENVIRONMENT | Journal: Journal of Computer Science and Cybernetics
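Four of the six compared measures can be sketched in plain Python to show what each one scores; this is a single-machine illustration, not the authors' Spark implementation. Assumptions: Jaccard is computed on character bigram sets (the abstract does not say which tokens are used), Jaro-Winkler uses the common prefix scale 0.1 with a prefix of at most 4 characters and no boost threshold, and `fuzzy_join` is a hypothetical nested-loop helper that a Spark job would distribute. Hamming and LCS are omitted for brevity.

```python
from typing import Callable, List, Tuple

def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def jaccard(a: str, b: str, q: int = 2) -> float:
    """Jaccard similarity of the character q-gram sets."""
    ga = {a[i:i + q] for i in range(len(a) - q + 1)}
    gb = {b[i:i + q] for i in range(len(b) - q + 1)}
    if not ga or not gb:
        return 1.0 if a == b else 0.0
    return len(ga & gb) / len(ga | gb)

def jaro(a: str, b: str) -> float:
    """Jaro similarity: matches within a sliding window, penalized by transpositions."""
    if a == b:
        return 1.0
    la, lb = len(a), len(b)
    if la == 0 or lb == 0:
        return 0.0
    window = max(max(la, lb) // 2 - 1, 0)
    a_match, b_match = [False] * la, [False] * lb
    matches = 0
    for i, ca in enumerate(a):
        for j in range(max(0, i - window), min(lb, i + window + 1)):
            if not b_match[j] and b[j] == ca:
                a_match[i] = b_match[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    a_chars = [c for c, m in zip(a, a_match) if m]
    b_chars = [c for c, m in zip(b, b_match) if m]
    t = sum(x != y for x, y in zip(a_chars, b_chars)) / 2  # half the out-of-order matches
    return (matches / la + matches / lb + (matches - t) / matches) / 3

def jaro_winkler(a: str, b: str, p: float = 0.1) -> float:
    """Jaro similarity boosted by the length of the common prefix (up to 4 chars)."""
    j = jaro(a, b)
    prefix = 0
    for x, y in zip(a[:4], b[:4]):
        if x != y:
            break
        prefix += 1
    return j + prefix * p * (1 - j)

def fuzzy_join(left: List[str], right: List[str],
               match: Callable[[str, str], bool]) -> List[Tuple[str, str]]:
    """Naive nested-loop fuzzy join: keep every pair whose keys satisfy `match`."""
    return [(l, r) for l in left for r in right if match(l, r)]
```

A distance such as Levenshtein is used with an upper bound (e.g. `levenshtein(x, y) <= 1`), while the similarity scores in [0, 1] are used with a lower bound (e.g. `jaro_winkler(x, y) >= 0.9`).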
Pub Date: 2022-03-20 | DOI: 10.15625/1813-9663/38/1/15992
B. Cuong
A Pythagorean picture fuzzy set (PPFS) is a combination of the picture fuzzy set with Yager's Pythagorean fuzzy set [12-14]. In the first part of this paper [17], we considered basic notions of PPFS, such as set operators on PPFS. Unfortunately, we were not aware of the papers [18, 19, 20] on spherical fuzzy sets, which use the same definition together with some operators and applications to multi-attribute group decision-making problems. In this second part, we present some main operators of picture fuzzy logic on PPFS: the picture negation operator, picture t-norms, picture t-conorms, and picture implication operators. Finally, the compositional rule of inference in the PPFS setting is presented, and a numerical example is given.
Title: PYTHAGOREAN PICTURE FUZZY SETS (PPFS), PART 2 - SOME MAIN PICTURE LOGIC OPERATORS ON PPFS AND SOME PICTURE INFERENCE PROCESSES IN PPF SYSTEMS | Journal: Journal of Computer Science and Cybernetics
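To make the difference from picture fuzzy sets concrete, the sketch below checks membership conditions, assuming the usual definition in which the positive, neutral, and negative degrees (mu, eta, nu) each lie in [0, 1] with mu^2 + eta^2 + nu^2 <= 1, versus mu + eta + nu <= 1 for picture fuzzy sets. The refusal-degree formula follows from that assumed constraint; the negation, t-norm, and implication operators studied in the paper are not reproduced, and the function names are hypothetical.

```python
import math

EPS = 1e-12  # tolerance for floating-point rounding

def _in_unit(*degrees: float) -> bool:
    return all(0.0 <= d <= 1.0 for d in degrees)

def is_picture_value(mu: float, eta: float, nu: float) -> bool:
    """Picture fuzzy condition: mu + eta + nu <= 1."""
    return _in_unit(mu, eta, nu) and mu + eta + nu <= 1.0 + EPS

def is_ppf_value(mu: float, eta: float, nu: float) -> bool:
    """Pythagorean picture fuzzy condition (assumed): mu^2 + eta^2 + nu^2 <= 1."""
    return _in_unit(mu, eta, nu) and mu * mu + eta * eta + nu * nu <= 1.0 + EPS

def refusal_degree(mu: float, eta: float, nu: float) -> float:
    """Remaining degree rho = sqrt(1 - mu^2 - eta^2 - nu^2) under the assumed constraint."""
    if not is_ppf_value(mu, eta, nu):
        raise ValueError("not a Pythagorean picture fuzzy value")
    return math.sqrt(max(0.0, 1.0 - mu * mu - eta * eta - nu * nu))
```

For example, (0.6, 0.4, 0.5) is not a picture fuzzy value (the sum is 1.5) but is a Pythagorean picture fuzzy value (the squares sum to 0.77), so the PPFS setting admits a larger range of assessments.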