Cristina Blanco-González-Tejero, B. Ribeiro-Navarrete, Enrique Cano-Marin, William C. McDowell
New models of entrepreneurship are emerging because of increasing digitalization and the development of artificial intelligence (AI). However, research on the intersection between digitalization and entrepreneurship remains scarce. This systematic literature review therefore aims to expand knowledge in this area and provide a semantic analysis of existing contributions. Following the SPAR-4-SLR protocol, it analyzes 520 scientific articles indexed in the Dimensions.ai database up to July 2022. The methodology combines natural language processing (NLP) with tools such as bibliometrix and VOSviewer to reveal the main characteristics of the articles' titles and abstracts and how these relate to citation counts and scientific impact. The study provides clear guidelines and recommendations for two audiences: for scientists, on focusing their research on AI and entrepreneurship, and for entrepreneurs, on including the link between AI and entrepreneurship in their strategies. As a future line of research, the authors highlight the potential of using NLP in bibliometric analysis.
Title: A Systematic Literature Review on the Role of Artificial Intelligence in Entrepreneurial Activity. Authors: Cristina Blanco-González-Tejero, B. Ribeiro-Navarrete, Enrique Cano-Marin, William C. McDowell. Journal: International Journal on Semantic Web and Information Systems, published 2023-02-16. DOI: 10.4018/ijswis.318448 (https://doi.org/10.4018/ijswis.318448).
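The abstract describes relating the characteristics of titles and abstracts to citation counts. As a minimal sketch under stated assumptions (a toy list of `(text, citations)` records, a hand-picked stopword list, and plain term counting instead of the bibliometrix or VOSviewer pipelines the authors used), the following computes how many documents contain each term and the mean citation count of those documents.

```python
import re
from collections import Counter, defaultdict

# Illustrative stopword list (an assumption, not the authors' preprocessing).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "with"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text, keep alphabetic runs, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def term_citation_profile(records):
    """records: iterable of (title_or_abstract, citation_count) pairs.

    Returns (document frequency per term, mean citations of documents containing the term).
    """
    doc_freq = Counter()
    cite_sum = defaultdict(int)
    for text, citations in records:
        for term in set(tokenize(text)):  # count each term at most once per document
            doc_freq[term] += 1
            cite_sum[term] += citations
    mean_cites = {t: cite_sum[t] / doc_freq[t] for t in doc_freq}
    return doc_freq, mean_cites
```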
Entity alignment aims to identify equivalent entity pairs across different knowledge graphs (KGs). Recently, aligning temporal knowledge graphs (TKGs), which contain time information, has attracted growing interest, as the time dimension is widely used in real-life applications. Matching TKGs requires seed entity pairs, which are often unavailable in practice, so it is important to study TKG alignment under scarce supervision. In this work, the authors formally define the problem of TKG alignment with limited labeled data and propose to solve it within an active learning framework. Because the core of active learning is designing query strategies that select the most informative instances to label, the authors make full use of time information and put forward novel time-aware strategies to meet the requirements of weakly supervised temporal entity alignment. Extensive experiments on multiple real-world datasets confirm the importance of studying TKG alignment with scarce supervision and show that the proposed time-aware strategy is effective.
Title: Active Temporal Knowledge Graph Alignment. Authors: Jie Zhou, Weixin Zeng, Hao Xu, Xiang Zhao. Journal: International Journal on Semantic Web and Information Systems, published 2023-02-16. DOI: 10.4018/ijswis.318339 (https://doi.org/10.4018/ijswis.318339).
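Active learning, as summarized above, hinges on a query strategy that picks the most informative unlabeled instances. The sketch below shows a generic margin-based uncertainty strategy for entity alignment: a source entity whose two best target candidates score almost equally is treated as the most informative one to label. It assumes precomputed candidate similarity scores (the `sim` mapping is hypothetical) and does not model the time-aware signals that the authors' strategies add.

```python
def margin(scores: list[float]) -> float:
    """Gap between the best and second-best candidate similarity for one source entity."""
    top = sorted(scores, reverse=True)
    if len(top) < 2:
        return float("inf")  # a single candidate is treated as unambiguous
    return top[0] - top[1]


def select_queries(sim: dict[str, list[float]], budget: int) -> list[str]:
    """Return the `budget` unlabeled source entities with the smallest margin.

    sim maps each unlabeled source entity to its similarity scores against
    candidate target entities (hypothetical input, e.g. from an alignment model).
    """
    ranked = sorted(sim, key=lambda entity: margin(sim[entity]))
    return ranked[:budget]
```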
Jian Li, X. Zhang, Bin Ma, Meihong Yang, Chunpeng Wang, Yang Liu, Xinan Cui, Xiaotong Yang
Photo response non-uniformity (PRNU) can be used to link an image to its source sensor. In this paper, the authors propose a PRNU anonymization method based on image segmentation that breaks the link between an image and its source camera. Because PRNU is distributed differently across the high- and low-frequency bands of an image, the high- and low-frequency information of each segment is processed differently, which largely preserves the quality of the output image. Experiments on the datasets show that the proposed method preserves biometric characteristics while keeping the source device anonymous. Compared with prior methods, it improves peak signal-to-noise ratio (PSNR) by 1.9 dB and cosine similarity by 0.02.
Title: PRNU Anonymous Algorithm Used for Privacy Protection in Biometric Authentication Systems. Authors: Jian Li, X. Zhang, Bin Ma, Meihong Yang, Chunpeng Wang, Yang Liu, Xinan Cui, Xiaotong Yang. Journal: International Journal on Semantic Web and Information Systems, pp. 1-19, published 2023-02-10. DOI: 10.4018/ijswis.317928 (https://doi.org/10.4018/ijswis.317928).
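The abstract reports gains in PSNR and cosine similarity. For reference, the sketch below implements the standard definitions on flat pixel lists: PSNR = 10 * log10(MAX^2 / MSE) in decibels, and cosine similarity as the normalized dot product. Which signals the authors compare with cosine similarity is not stated in the abstract, so treat the inputs here as generic vectors.

```python
import math


def psnr(original, processed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-size images given as flat pixel lists."""
    if len(original) != len(processed):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(original, processed)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


def cosine_similarity(x, y):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)
```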
In this study, a novel top-K ranking recommendation method called collaborative social metric learning (CSML) is proposed. It uses a trust network to represent both user-item and user-user interactions in a simple structure. Most existing recommender systems that adopt trust networks focus on predicting item ratings, which does not necessarily yield optimal top-K rankings, and conventional direct ranking methods in trust networks rely on sub-optimal correlation approaches that ignore item-item relations. The proposed CSML algorithm uses metric learning to directly predict the top-K items in a trust network. In addition to the two triplet losses commonly used in metric learning for recommender systems, which capture user-item and item-item relations, the authors propose a new triplet loss, called socio-centric loss, that represents user-user interactions in order to fully exploit the information contained in the trust network. Experimental results show that CSML outperforms existing recommender systems on real-world trust network data.
Title: Collaborative Social Metric Learning in Trust Network for Recommender Systems. Authors: Taehan Kim, Wonzoo Chung. Journal: International Journal on Semantic Web and Information Systems, pp. 1-15, published 2023-01-20. DOI: 10.4018/ijswis.316535 (https://doi.org/10.4018/ijswis.316535).
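Metric-learning recommenders typically train embeddings with hinge-style triplet losses. The sketch below, in the style of collaborative metric learning, penalizes a user embedding that is not at least `margin` closer (in squared Euclidean distance) to a liked item than to another item. The user-user term is a hypothetical stand-in for the socio-centric loss that pulls trusted users closer than untrusted ones; the paper's exact formulation, and the item-item loss it also mentions, are not reproduced here.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_hinge(anchor, positive, negative, margin=1.0):
    """max(0, margin + d(anchor, positive)^2 - d(anchor, negative)^2)."""
    return max(0.0, margin + sq_dist(anchor, positive) - sq_dist(anchor, negative))


def total_loss(user, liked_item, other_item, trusted_user, untrusted_user, margin=1.0):
    """User-item triplet loss plus a hypothetical user-user (trust) triplet loss."""
    user_item = triplet_hinge(user, liked_item, other_item, margin)
    # Hypothetical socio-centric term: a trusted user should sit closer than an untrusted one.
    user_user = triplet_hinge(user, trusted_user, untrusted_user, margin)
    return user_item + user_user
```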
Wenbin Zhao, Jing Huang, Tongrang Fan, Yongliang Wu, Keqiang Liu
In recent years, research on entity alignment has mainly focused on knowledge embeddings and graph neural networks; however, these models usually suffer from structural heterogeneity and from the large scale of knowledge graphs. This paper proposes a novel entity alignment model based on a graph isomorphic network and compressed sensing. First, to address structural heterogeneity, a graph isomorphic network encoder is applied to the knowledge graph to capture the structural similarity of entity relations. Second, to address the scale problem, key nodes and communities are integrated so that entities are aligned in priority order, improving execution speed. Because existing node importance ranking algorithms cannot accurately identify key nodes in knowledge graphs, compressed sensing is adopted in node importance ranking to improve the accuracy of key node identification. The authors conducted several experiments to evaluate the effectiveness and efficiency of the proposed entity alignment model.
Title: A Novel Compressed Sensing-Based Graph Isomorphic Network for Key Node Recognition and Entity Alignment. Authors: Wenbin Zhao, Jing Huang, Tongrang Fan, Yongliang Wu, Keqiang Liu. Journal: International Journal on Semantic Web and Information Systems, pp. 1-17, published 2022-04-01. DOI: 10.4018/ijswis.315600 (https://doi.org/10.4018/ijswis.315600).
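Assuming the encoder follows the standard Graph Isomorphism Network (GIN) update, h_v' = MLP((1 + eps) * h_v + sum of h_u over the neighbours u of v), one layer can be sketched in plain Python as follows. The single linear-plus-ReLU "MLP", the weights, and the dictionary-based graph format are illustrative assumptions, and the compressed-sensing node ranking is not shown.

```python
def relu(xs):
    return [max(0.0, x) for x in xs]


def linear(weights, bias, x):
    """weights: list of rows (out_dim x in_dim); returns W @ x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def gin_layer(features, adjacency, weights, bias, eps=0.0):
    """One GIN layer.

    features: {node: feature vector}; adjacency: {node: [neighbour nodes]}.
    Each node sums its scaled own features with its neighbours' features,
    then passes the sum through a one-layer MLP (linear + ReLU).
    """
    out = {}
    for v, h_v in features.items():
        agg = [(1 + eps) * x for x in h_v]
        for u in adjacency.get(v, []):
            agg = [a + x for a, x in zip(agg, features[u])]
        out[v] = relu(linear(weights, bias, agg))
    return out
```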
Image style transfer (IST) has recently drawn broad attention, and methods based on convolutional neural networks (CNNs) and generative adversarial networks (GANs) are now widely used for it. However, the images produced by most methods have low-definition textures, so the stylized results lack detail. To address this, the authors present a new IST method based on an enhanced GAN with a circular local binary pattern (LBP) prior. They use circular LBP as a texture prior in the GAN generator to improve the fine textures of the generated style images. They also integrate a densely connected residual block and an attention mechanism into the generator to further improve high-frequency feature extraction. In addition, a total variation (TV) regularizer is added to the loss function to smooth the training results and suppress noise. Qualitative and quantitative experiments show that images generated with the proposed strategy achieve better quality metrics than those of other popular approaches.
Title: Circular LBP Prior-Based Enhanced GAN for Image Style Transfer. Authors: Wenguang Qian, Hua Li, Haiping Mu. Journal: International Journal on Semantic Web and Information Systems, pp. 1-15, published 2022-04-01. DOI: 10.4018/ijswis.315601 (https://doi.org/10.4018/ijswis.315601).
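Two building blocks named in the abstract have standard definitions that can be sketched directly. The circular LBP code of a pixel compares its intensity with P neighbours sampled on a circle of radius R (bilinearly interpolated when they fall between pixels) and packs the comparisons into a P-bit integer. The total variation regularizer sums absolute differences between adjacent pixels. The anisotropic L1 form of TV used here is one common variant and an assumption, as are P = 8 and R = 1; how the authors feed the LBP map into the generator is not shown.

```python
import math


def bilinear(img, y, x):
    """Sample a 2D list `img` at fractional (y, x) using bilinear interpolation."""
    y0, x0 = math.floor(y), math.floor(x)
    y1, x1 = min(y0 + 1, len(img) - 1), min(x0 + 1, len(img[0]) - 1)
    dy, dx = y - y0, x - x0
    top = img[y0][x0] * (1 - dx) + img[y0][x1] * dx
    bottom = img[y1][x0] * (1 - dx) + img[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def circular_lbp(img, r, c, points=8, radius=1.0):
    """Circular LBP code of pixel (r, c): bit p is set if neighbour p >= centre.

    Assumes (r, c) lies at least `radius` pixels away from the image border.
    """
    centre = img[r][c]
    code = 0
    for p in range(points):
        theta = 2 * math.pi * p / points
        y = r - radius * math.sin(theta)  # image rows grow downwards
        x = c + radius * math.cos(theta)
        if bilinear(img, y, x) >= centre:
            code |= 1 << p
    return code


def total_variation(img):
    """Anisotropic total variation: sum of absolute vertical and horizontal differences."""
    rows, cols = len(img), len(img[0])
    tv = 0.0
    for i in range(rows):
        for j in range(cols):
            if i + 1 < rows:
                tv += abs(img[i + 1][j] - img[i][j])
            if j + 1 < cols:
                tv += abs(img[i][j + 1] - img[i][j])
    return tv
```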
The increasing number of ontologies demands interoperability between them in order to obtain accurate information, and ontology heterogeneity makes this interoperability even more difficult. These conditions motivate the development of effective and efficient ontology matching. Existing ontology matching systems mainly focus on subject derivatives of the domain concerned. Since ontologies are represented as data models in a structured format, this paper proposes a new, modified similarity spreading model for ontology mapping. In this approach, mapping mainly involves clustering nodes based on edge affinity, after which graph matching is achieved by applying coefficient similarity propagation. The process is iterative, and a similarity score is calculated at the end of each iteration. The model is evaluated in terms of precision, recall, and F-measure, and it is found to outperform similar systems.
Title: An Improved Structural-Based Ontology Matching Approach Using Similarity Spreading. Authors: Sengodan Mani, Samukutty Annadurai. Journal: International Journal on Semantic Web and Information Systems, pp. 1-17, published 2022-01-01. DOI: 10.4018/ijswis.300825 (https://doi.org/10.4018/ijswis.300825).
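Similarity spreading builds on the similarity flooding idea. Similarities between node pairs are propagated to neighbouring pairs connected by edges with the same label, then renormalized, until the scores stabilize. The sketch below is a simplified version of basic similarity flooding, using the fixpoint sigma_{i+1} = normalize(sigma_i + phi(sigma_i)) with a propagation weight of 1/n for n same-label edges. It does not include the authors' edge-affinity clustering or their coefficient-based modification, and the triple-list graph format is an assumption.

```python
from collections import defaultdict


def pairwise_graph(edges_a, edges_b):
    """Connect pair (a1, b1) to (a2, b2) whenever a1 -l-> a2 and b1 -l-> b2 share label l."""
    pcg = []
    for a1, label_a, a2 in edges_a:
        for b1, label_b, b2 in edges_b:
            if label_a == label_b:
                pcg.append(((a1, b1), label_a, (a2, b2)))
    return pcg


def propagation_weights(pcg):
    """Propagate in both directions; weight 1/n, where n counts same-label edges at the node."""
    out_count = defaultdict(int)
    in_count = defaultdict(int)
    for src, label, dst in pcg:
        out_count[(src, label)] += 1
        in_count[(dst, label)] += 1
    weights = defaultdict(float)
    for src, label, dst in pcg:
        weights[(src, dst)] += 1.0 / out_count[(src, label)]
        weights[(dst, src)] += 1.0 / in_count[(dst, label)]
    return weights


def similarity_flooding(initial, weights, iterations=10):
    """Iterate sigma <- normalize(sigma + phi(sigma)), normalizing by the maximum score."""
    sigma = dict(initial)
    for _ in range(iterations):
        nxt = dict(sigma)
        for (src, dst), w in weights.items():
            nxt[dst] = nxt.get(dst, 0.0) + sigma.get(src, 0.0) * w
        top = max(nxt.values(), default=0.0)
        sigma = {k: v / top for k, v in nxt.items()} if top > 0 else nxt
    return sigma
```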
A. M. Srivastava, Priyanka Rotte, Arushi Jain, Surya Prakash
Due to the availability of cheap 3D sensors such as Kinect and LiDAR, the use of 3D data has gained momentum in recent years in domains such as manufacturing, healthcare, and retail, where it helps achieve operational safety, improved outcomes, and enhanced customer experience. In many of these domains, object recognition is performed on 3D data to overcome the difficulties that illumination, pose variation, scaling, and similar factors cause in 2D data. In this work, we propose three data augmentation techniques, based on sub-sampling, for 3D data in point cloud representation. We then verify that the 3D samples created through data augmentation carry the same information by comparing the Iterative Closest Point (ICP) registration error within the sub-samples, between the sub-samples and their parent sample, between sub-samples with different parents but the same subject, and finally between sub-samples of different subjects. We also verify, by applying the Central Limit Theorem, that the augmented sub-samples have the same characteristics and features as the original 3D point cloud.
Title: Handling Data Scarcity Through Data Augmentation in Training of Deep Neural Networks for 3D Data Processing. Authors: A. M. Srivastava, Priyanka Rotte, Arushi Jain, Surya Prakash. Journal: International Journal on Semantic Web and Information Systems, pp. 1-16, published 2022-01-01. DOI: 10.4018/ijswis.297038 (https://doi.org/10.4018/ijswis.297038).
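The abstract does not spell out the three sub-sampling schemes, so the sketch below assumes the simplest one, uniform random sub-sampling without replacement. It pairs that with a brute-force nearest-neighbour RMS error. That error is the residual ICP minimizes, but here it is evaluated at the identity transform, without ICP's iterative alignment step. This is a reasonable simplification only when the clouds already share a coordinate frame, as sub-samples of one scan do.

```python
import math
import random

Point = tuple[float, float, float]


def random_subsample(cloud: list[Point], fraction: float, seed: int = 0) -> list[Point]:
    """Draw a random subset (without replacement) holding `fraction` of the points."""
    k = max(1, int(len(cloud) * fraction))
    return random.Random(seed).sample(cloud, k)


def nn_rms_error(source: list[Point], target: list[Point]) -> float:
    """RMS distance from each source point to its nearest target point (brute force).

    This is the residual that ICP minimizes, evaluated at the identity transform,
    i.e. without ICP's iterative alignment.
    """
    total = 0.0
    for p in source:
        total += min(sum((a - b) ** 2 for a, b in zip(p, q)) for q in target)
    return math.sqrt(total / len(source))
```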
Applying semantic web technologies such as semantic inference to the Internet of Things (IoT) can enrich the semantic information of data and enable semantic knowledge discovery, which plays a key role in increasing data value and application intelligence. However, mainstream semantic inference engines cannot run on IoT computing devices with limited storage and weak computing power, and they cannot reason about uncertain knowledge. To solve this problem, the authors propose Tiny-UKSIE, a lightweight semantic inference engine based on the RETE algorithm. A genetic algorithm (GA) is adopted to optimize the ordering of the Alpha network, which reduces inference time by 8.73% compared with the unoptimized network. Moreover, a four-tuple knowledge representation with probability factors is proposed, and probabilistic inference rules are constructed so that the engine can infer uncertain knowledge. Compared with mainstream inference engines, Tiny-UKSIE reduces storage usage by up to 97.37% and inference time by up to 24.55%.
Title: Tiny-UKSIE-An Optimized Lightweight Semantic Inference Engine for Reasoning Uncertain Knowledge. Authors: Daoqu Geng. Journal: International Journal on Semantic Web and Information Systems, published 2022-01-01. DOI: 10.4018/ijswis.300826 (https://doi.org/10.4018/ijswis.300826).
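To illustrate what a four-tuple representation with probability factors could look like, the sketch below stores facts as (subject, predicate, object, probability). It applies one two-premise rule by naive forward chaining, multiplying the premise probabilities by a rule confidence. The product combination, the rule shape, and all names are hypothetical, since the abstract does not specify the engine's probabilistic semantics. The sketch also does not implement a RETE network: it uses a dictionary index instead of alpha and beta memories.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    """Hypothetical four-tuple: (subject, predicate, object, probability)."""
    subject: str
    predicate: str
    obj: str
    prob: float


def apply_rule(facts, premises, conclusion, rule_conf):
    """Forward-chain one rule of the form
        (x, p1, o1) AND (x, p2, o2) -> (x, conclusion_pred, conclusion_obj)
    combining probabilities by product (an assumption, not the paper's semantics).
    """
    (p1, o1), (p2, o2) = premises
    pred, obj = conclusion
    index = {(f.subject, f.predicate, f.obj): f.prob for f in facts}
    derived = []
    for f in facts:
        if (f.predicate, f.obj) == (p1, o1):
            second = index.get((f.subject, p2, o2))
            if second is not None:
                derived.append(Fact(f.subject, pred, obj, f.prob * second * rule_conf))
    return derived
```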
Ammar Almomani, Mohammad Alauthman, M. Shatnawi, Mohammed Alweshah, Ayat Alrosan, Waleed Alomoush, B. Gupta
Phishing attacks, such as web phishing and spear phishing, are among the main cybersecurity threats, and phishing websites remain a persistent problem. One of the main contributions of this study is the extraction of URL and domain identity features, abnormal features, HTML and JavaScript features, and domain features as semantic features for detecting phishing websites, which makes classification based on those features more controllable and effective. The study applies machine learning algorithms to detect phishing websites and compares their results. We used 16 machine learning models with 10 semantic features, extracted from two datasets, that represent the most effective features for detecting phishing webpages. GradientBoostingClassifier and RandomForestClassifier achieved the best accuracy in the comparison (about 97%), whereas GaussianNB and the stochastic gradient descent (SGD) classifier had the lowest accuracy among the classifiers, at 84% and 81%, respectively.
Title: Phishing Website Detection With Semantic Features Based on Machine Learning Classifiers: A Comparative Study. Authors: Ammar Almomani, Mohammad Alauthman, M. Shatnawi, Mohammed Alweshah, Ayat Alrosan, Waleed Alomoush, B. Gupta. Journal: International Journal on Semantic Web and Information Systems, pp. 1-24, published 2022-01-01. DOI: 10.4018/ijswis.297032 (https://doi.org/10.4018/ijswis.297032).
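URL and domain identity features are usually simple lexical checks on the URL. The sketch below extracts a few such features with the standard library. The feature names and definitions are hypothetical illustrations in the spirit of common phishing datasets, not the paper's 10 semantic features. Vectors like these would then be passed to classifiers such as scikit-learn's `GradientBoostingClassifier` or `RandomForestClassifier`, which the abstract compares.

```python
import ipaddress
from urllib.parse import urlparse


def url_features(url: str) -> dict[str, int]:
    """Hypothetical URL/domain-identity features (not the paper's feature set)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    try:
        ipaddress.ip_address(host)  # raises ValueError unless host is a literal IP
        has_ip = 1
    except ValueError:
        has_ip = 0
    return {
        "has_ip_host": has_ip,
        "has_at_symbol": int("@" in url),
        "url_length": len(url),
        "hyphen_in_host": int("-" in host),
        "subdomain_dots": max(0, host.count(".") - 1),
        "uses_https": int(parsed.scheme == "https"),
    }
```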