Pub Date: 2023-01-01 DOI: 10.12720/jait.14.2.319-327
H. Ketmaneechairat, Maleerat Maliyaem
For natural language processing, a corpus is important both for training models and for the algorithms that create machine learning models. This paper describes the design and process of creating a corpus-based vocabulary for the Thai language that can serve as a main corpus for natural language processing research. The corpus is built in accordance with the rules of the language, using the actual Word Usage Frequency (WUF) analyzed from a text corpus covering several types of content. The results present usage frequencies for several characteristics, namely word usage frequency, character usage frequency, and character bigram frequency, which are used in this research and can serve as important information for further NLP research. Based on the findings, it was concluded that the average word length increases as the number of words in the corpus increases; that is, word length and word frequency are positively correlated.
{"title":"Corpus-Based Vocabulary List for Thai Language","authors":"H. Ketmaneechairat, Maleerat Maliyaem","doi":"10.12720/jait.14.2.319-327","DOIUrl":"https://doi.org/10.12720/jait.14.2.319-327","PeriodicalId":36452,"journal":{"name":"Journal of Advances in Information Technology"},"PeriodicalIF":1.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"66330346"}
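The three frequency measures named in the abstract above (word, character, and character-bigram usage) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes the text has already been segmented into words by some external tokenizer (Thai does not separate words with spaces), and all function names are hypothetical. Lengths are counted in Unicode code points, so Thai vowel and tone marks each count as one character.

```python
from collections import Counter


def word_frequencies(tokens):
    """Count how often each (pre-segmented) word occurs."""
    return Counter(tokens)


def char_frequencies(text):
    """Count every non-whitespace character (Unicode code point)."""
    return Counter(ch for ch in text if not ch.isspace())


def char_bigram_frequencies(text):
    """Count adjacent character pairs inside each whitespace-delimited chunk."""
    counts = Counter()
    for chunk in text.split():
        counts.update(chunk[i:i + 2] for i in range(len(chunk) - 1))
    return counts


def average_word_length(tokens):
    """Mean token length in code points; 0.0 for an empty list."""
    return sum(len(t) for t in tokens) / len(tokens) if tokens else 0.0
```

Calling `average_word_length` on progressively larger slices of a token list is one simple way to inspect how average word length changes with corpus size.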
Pub Date: 2023-01-01 DOI: 10.12720/jait.14.2.193-203
Ryuga Kaneko, Taiichi Saito
This paper proposes a new method to detect Cookie Bomb attacks. A Cookie Bomb attack is a denial-of-service attack in which a user cannot receive a legitimate Hypertext Transfer Protocol (HTTP) response from an HTTP server because the total size of the cookies in an HTTP request exceeds the limit accepted by the server. The new method comprises a cloud architecture and detection algorithms. The cloud architecture distributes and executes a detection script, which implements the detection algorithms. The architecture uses Azure Virtual Machines, Azure Storage, Azure Automation, Azure Monitor, and Microsoft Sentinel. The virtual machines are its core components; end users connect to them via RDP to use their browsers. The detection script performs three tasks: obtaining the paths to the cookie databases generated by browsers, retrieving cookie data from each database, and comparing a threshold with the total size of all cookies a browser sends to a server. Results indicate that the proposed method 1) enables scheduled automation, 2) provides better visibility across regions, and 3) expands detection coverage across different Windows users, browsers, and browser profiles.
{"title":"Detection of Cookie Bomb Attacks in Cloud Computing Environment Monitored by SIEM","authors":"Ryuga Kaneko, Taiichi Saito","doi":"10.12720/jait.14.2.193-203","DOIUrl":"https://doi.org/10.12720/jait.14.2.193-203","PeriodicalId":36452,"journal":{"name":"Journal of Advances in Information Technology"},"PeriodicalIF":1.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"66330529"}
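The last two tasks of the detection script described above (reading cookie data from a database and comparing the total cookie size against a threshold) might look roughly like the following Python sketch. It is not the authors' script. The table layout `cookies(host_key, name, value)`, the 8192-byte threshold, and the function names are all assumptions made for illustration; real browser cookie stores differ in schema and may encrypt values, and real servers enforce different limits. Cookies are grouped by exact host, which ignores domain and path matching.

```python
import sqlite3
from collections import defaultdict

# Assumed limit in bytes; actual server limits vary.
THRESHOLD_BYTES = 8192


def cookie_header_sizes(conn):
    """Return {host: size in bytes of the Cookie header value sent to it}."""
    pairs = defaultdict(list)
    query = "SELECT host_key, name, value FROM cookies"  # hypothetical schema
    for host, name, value in conn.execute(query):
        pairs[host].append(f"{name}={value}")
    # Cookies are sent as "name=value" pairs joined by "; ".
    return {host: len("; ".join(items).encode("utf-8"))
            for host, items in pairs.items()}


def flag_hosts(conn, threshold=THRESHOLD_BYTES):
    """List hosts whose combined cookie size exceeds the threshold."""
    sizes = cookie_header_sizes(conn)
    return sorted(host for host, size in sizes.items() if size > threshold)
```

A caller would open each discovered database with `sqlite3.connect(path)` and pass the connection to `flag_hosts`; locating those paths per user, browser, and profile is outside this sketch.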
Pub Date: 2023-01-01 DOI: 10.12720/jait.14.3.392-398
Mario G. Gualsaqui, Stefany M. Cuenca, Ibeth L. Rosero, D. A. Almeida, C. Cadena, Fernando Villalba, Jonathan D. Cruz
Early detection of some retinal diseases can improve the chances of a cure and help prevent blindness. In this study, Convolutional Neural Networks (CNNs) with different architectures (Scratch Model, GoogleNet, VGG, ResNet, MobileNet, and DenseNet) were built and compared to find the one with the highest accuracy and lowest loss for automatic image classification, using the MURED database of retinal images already labeled with their respective diseases. The results show that the model with the ResNet architecture variant InceptionResNetV2 has an accuracy of 49.85%.
{"title":"Multi-class Classification Approach for Retinal Diseases","authors":"Mario G. Gualsaqui, Stefany M. Cuenca, Ibeth L. Rosero, D. A. Almeida, C. Cadena, Fernando Villalba, Jonathan D. Cruz","doi":"10.12720/jait.14.3.392-398","DOIUrl":"https://doi.org/10.12720/jait.14.3.392-398","PeriodicalId":36452,"journal":{"name":"Journal of Advances in Information Technology"},"PeriodicalIF":1.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"66330914"}
Pub Date: 2023-01-01 DOI: 10.12720/jait.14.3.543-549
Asmaa J. M. Alshaikhdeeb, Y. Cheah
Adverse Drug Reaction (ADR) detection from social reviews is the task of mining online medical stores and social reviews for mentions of abnormal reactions that consumers report after taking a particular medical product. A variety of approaches have been used to extract ADRs from social and medical reviews, including machine learning, dictionary-based, and statistical approaches. Yet these approaches either depend heavily on an external knowledge source for ADR detection or rely on domain-dependent mechanisms that may lose contextual information. This study proposes word sequencing with a Long Short-Term Memory (LSTM) architecture. The MedSyn benchmark dataset was used in the experiments. A word indexing, mapping, and padding method was used to represent the words within the reviews as fixed-length sequences, which were then fed into the LSTM. Experimental results showed that the proposed LSTM could achieve an F1 score of up to 92%, a result that outperforms the baseline studies. The efficacy of the proposed method was demonstrated in three ways: examining word indexing with different classifiers, examining different features with LSTM, and comparing against the baseline studies.
{"title":"Utilizing Word Index Approach with LSTM Architecture for Extracting Adverse Drug Reaction from Medical Reviews","authors":"Asmaa J. M. Alshaikhdeeb, Y. Cheah","doi":"10.12720/jait.14.3.543-549","DOIUrl":"https://doi.org/10.12720/jait.14.3.543-549","PeriodicalId":36452,"journal":{"name":"Journal of Advances in Information Technology"},"PeriodicalIF":1.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"66331918"}
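The word indexing, mapping, and padding step mentioned in the abstract above can be sketched in plain Python as follows. This is an illustration under assumptions, not the authors' code: the reserved ids (0 for padding, 1 for unknown words), the right-padding and truncation policy, and the function names are all choices made here for the example.

```python
PAD_ID = 0  # assumed id reserved for padding
UNK_ID = 1  # assumed id for words not seen when the index was built


def build_word_index(tokenized_reviews):
    """Map each distinct word to an integer id in order of first appearance."""
    index = {}
    for review in tokenized_reviews:
        for word in review:
            if word not in index:
                index[word] = len(index) + 2  # ids 0 and 1 are reserved
    return index


def encode(review, index, max_len):
    """Turn a tokenized review into a fixed-length list of word ids."""
    ids = [index.get(word, UNK_ID) for word in review][:max_len]  # truncate
    return ids + [PAD_ID] * (max_len - len(ids))                  # right-pad
```

Every encoded review then has the same length, which is what allows a batch of them to be stacked and passed to a sequence model such as an LSTM.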
Pub Date: 2023-01-01 DOI: 10.12720/jait.14.4.741-748
Caswell Nkuna, Ebenezer Esenogho, R. Heymann, E. Matlotse
Hacking of social and personal information is on the rise, and organizations give data security serious attention. Data security strategies vary with the medium they are applied to, for instance voice, image, or video. This paper focuses on images: it proposes and implements an image steganography (covert communication) technique that does not break existing image recognition neural network systems. The technique enables data to be hidden in a cover image while an image recognition Artificial Neural Network (ANN) checks the stego-image for visible alterations. Two image steganography methods were tested: Least Significant Bit (LSB) and the proposed Discrete Cosine Transform (DCT) LSB-2. The resulting stego-images were analyzed using a neural network implemented with the Keras TensorFlow software tool. The results showed that the proposed DCT LSB-2 encoding method allows a high data payload and minimizes visible alterations, keeping the neural network’s efficiency at a maximum. An optimum ratio for encoding data in an image was determined to maintain the high robustness of the steganography system. The proposed method showed improved stego-system performance compared with previous techniques.
{"title":"Using Artificial Neural Network to Test Image Covert Communication Effect","authors":"Caswell Nkuna, Ebenezer Esenogho, R. Heymann, E. Matlotse","doi":"10.12720/jait.14.4.741-748","DOIUrl":"https://doi.org/10.12720/jait.14.4.741-748","PeriodicalId":36452,"journal":{"name":"Journal of Advances in Information Technology"},"PeriodicalIF":1.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"66333271"}
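For readers unfamiliar with the baseline named in the abstract above, plain spatial-domain LSB embedding can be sketched in a few lines of Python. This shows only the generic LSB idea, not the paper's DCT LSB-2 method, whose details the abstract does not give. The cover is assumed to be a flat sequence of 8-bit pixel values, and the function names are hypothetical.

```python
def embed_lsb(pixels: bytes, payload: bytes) -> bytearray:
    """Hide payload bits (most significant bit first) in the lowest bit of each pixel."""
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("payload too large for this cover")
    stego = bytearray(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit  # clear the lowest bit, then set it
    return stego


def extract_lsb(stego: bytes, n_bytes: int) -> bytes:
    """Read n_bytes back from the lowest bits of the first 8 * n_bytes pixels."""
    out = bytearray()
    for b in range(n_bytes):
        value = 0
        for i in range(8):
            value = (value << 1) | (stego[b * 8 + i] & 1)
        out.append(value)
    return bytes(out)
```

Each pixel value changes by at most 1, which is why LSB changes are hard to see; the payload capacity of this scheme is one bit per pixel value.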
Pub Date: 2023-01-01 DOI: 10.12720/jait.14.4.718-728
Vedika Jorika, Nagaratna Medishetty
Integrating Blockchain technology with existing applications has gained popularity across vertical domains because of its numerous benefits, such as immutability, transparency, privacy, persistence, and security. Blockchain technology is used in a variety of settings and allows applications to achieve higher security, improved traceability, and greater transparency. This paper reviews most of the applications in the different domains and the number of criteria each application meets for the requirements of its domain. It examines the advantages, disadvantages, and limitations of implementing Blockchain in various applications across domains. Furthermore, it describes the prerequisites for deploying Blockchain across multiple application fields.
{"title":"Demystifying Blockchain: A Critical Analysis of Application Characteristics in Different Domains","authors":"Vedika Jorika, Nagaratna Medishetty","doi":"10.12720/jait.14.4.718-728","DOIUrl":"https://doi.org/10.12720/jait.14.4.718-728","PeriodicalId":36452,"journal":{"name":"Journal of Advances in Information Technology"},"PeriodicalIF":1.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"66333573"}
Pub Date: 2023-01-01 DOI: 10.12720/jait.14.4.777-787
R. Laxmi, B. Kirubagari, Lakshmana Pandian
{"title":"Improved Model to Detect Cancer from Cervical Histopathology Images by Optimizing Feature Selection and Ensemble Classifier","authors":"R. Laxmi, B. Kirubagari, Lakshmana Pandian","doi":"10.12720/jait.14.4.777-787","DOIUrl":"https://doi.org/10.12720/jait.14.4.777-787","PeriodicalId":36452,"journal":{"name":"Journal of Advances in Information Technology"},"PeriodicalIF":1.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"66333734"}
Lane-keeping is a vital component of autonomous driving that requires multiple artificial intelligence technologies and vision systems. However, keeping a vehicle within its lane is challenging when rain reduces visibility. This research proposes combining image deraining with a deep learning-based network to improve the performance of the autonomous vehicle. First, a robust progressive Residual Network (ResNet) is used for rain removal. Second, a Convolutional Neural Network (CNN) architecture is applied for lane-following on roads. To assess its accuracy and rain-removal capabilities, the network was evaluated on both synthetic and natural Rainy Datasets (RainSP), and its performance was compared with that of earlier research networks. Furthermore, the effectiveness of using deraining and non-deraining networks with the CNN is evaluated by analyzing the predicted steering angle output. The experimental results show that the proposed model generates safe and accurate motion planning for lane-keeping in autonomous vehicles.
{"title":"Improving Autonomous Vehicle Performance through Integration of an Image Deraining and a Deep Learning-Based Network for Lane Following","authors":"Hoang Tran Ngoc, Phuc Phan Hong, Anh Nguyen Quoc, Luyl-Da Quach","doi":"10.12720/jait.14.6.1159-1168","DOIUrl":"https://doi.org/10.12720/jait.14.6.1159-1168","PeriodicalId":36452,"journal":{"name":"Journal of Advances in Information Technology"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"135610446"}
Pub Date: 2023-01-01 DOI: 10.12720/jait.14.5.1019-1028
Dalia L. Elsheweikh
Most models of automated web recommender systems depend on data mining algorithms to discover useful navigational patterns from the user’s previous browsing history. This paper presents a new model for developing a collaborative web recommendation system using a new technique for knowledge extraction. The proposed model introduces two techniques, a cluster similarity-based technique and a rule extraction technique, to provide recommendations that meet the user’s needs. The cluster similarity-based technique groups sessions that share common interests and behaviors according to a new similarity measure between web users’ sessions. The rule extraction technique, which is based on an Artificial Neural Network (ANN) trained using a Genetic Algorithm (GA), discovers groups of accurate and comprehensible rules from the clustered sessions. To extract the rules that belong to a specific cluster, the GA can be applied to find the optimal values of the pages that maximize the output function of that cluster. A set of pruning schemes is proposed to reduce the size of the rule set and remove uninteresting rules. The set of web pages recommended for a specific cluster consists of the pages that dominate the rules belonging to that cluster. The experimental results indicate that the proposed model improves the precision and recall of the classification.
{"title":"A Novel Web Recommendation Model Based on the Web Usage Mining Technique","authors":"Dalia L. Elsheweikh","doi":"10.12720/jait.14.5.1019-1028","DOIUrl":"https://doi.org/10.12720/jait.14.5.1019-1028","PeriodicalId":36452,"journal":{"name":"Journal of Advances in Information Technology"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"136305458"}
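The idea of grouping sessions by a similarity measure, as described in the abstract above, can be illustrated with a small Python sketch. The abstract does not define the paper's new similarity measure, so Jaccard similarity over the sets of visited pages is swapped in here as a stand-in. The greedy grouping rule, the 0.5 threshold, and the function names are likewise assumptions for illustration only.

```python
def jaccard(a, b):
    """Jaccard similarity of two sessions treated as sets of page URLs."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_sessions(sessions, threshold=0.5):
    """Greedily assign each session to the first cluster whose seed it resembles.

    Returns a list of clusters, each a list of session indices. The first
    session placed in a cluster acts as that cluster's seed.
    """
    clusters = []
    for i, session in enumerate(sessions):
        for cluster in clusters:
            if jaccard(session, sessions[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])  # no similar seed found: start a new cluster
    return clusters
```

The result depends on session order because only cluster seeds are compared; a production system would more likely use a standard clustering algorithm on the full similarity matrix.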