
Latest publications in the Journal of Artificial Intelligence and Soft Computing Research

Classification of Unwanted SMS Data (Spam) with Text Mining Techniques
IF 2.8, CAS Zone 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2022-12-06. DOI: 10.55195/jscai.1210559
Rasim Çekik
Text mining, which derives information from written sources such as websites, books, e-mails, articles, and online news, processes and structures data using advanced approaches. The vast majority of SMS (Short Message Service) messages are unwanted short text documents, and effectively classifying these documents aids in the detection of spam. This study attempted to identify the most effective technique for SMS data at each stage of the text mining pipeline. Treating the feature selection method as one such parameter, four of the most well-known approaches were compared and the one that yielded the best results was chosen. The classifier, another parameter tuned for the best results with this approach, was then determined. According to the experimental results, the DFS feature selection approach produced the best results when paired with the SVM classifier. This study establishes a general framework for future research in this area employing text mining techniques.
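As a rough illustration of the pipeline this abstract describes, here is a minimal sketch of text vectorization, feature selection, and SVM classification in scikit-learn. DFS is not available in scikit-learn, so the chi-squared criterion stands in for it, and `sms_spam.csv` with `text` and `label` columns is a hypothetical dataset.

```python
# Minimal sketch: TF-IDF features -> feature selection -> linear SVM.
# chi2 is a stand-in for the DFS criterion used in the paper.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

df = pd.read_csv("sms_spam.csv")  # hypothetical file with "text" and "label" columns
pipeline = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),
    SelectKBest(chi2, k=500),      # keep the 500 most informative terms (assumes vocab > 500)
    LinearSVC(C=1.0),
)
scores = cross_val_score(pipeline, df["text"], df["label"], cv=5, scoring="f1_macro")
print(f"macro-F1: {scores.mean():.3f} +/- {scores.std():.3f}")
```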
Citations: 0
Feature Map Augmentation to Improve Scale Invariance in Convolutional Neural Networks
IF 2.8, CAS Zone 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2022-11-28. DOI: 10.2478/jaiscr-2023-0004
Dinesh Kumar, Dharmendra Sharma
Abstract Introducing variation into the training dataset through data augmentation has been a popular technique for making Convolutional Neural Networks (CNNs) spatially invariant, but it increases dataset volume and computation cost. Instead of augmenting the data, augmentation of feature maps is proposed to introduce variation into the features extracted by a CNN. To achieve this, a rotation transformer layer called the Rotation Invariance Transformer (RiT) is developed, which applies rotation transformations to augment CNN features. The RiT layer can augment the output features of any convolution layer within a CNN, but it is most effective when placed at the output of the final convolution layer. We test RiT in the context of scale invariance, attempting to classify scaled images from benchmark datasets. Our results show promising improvements in the network's ability to be scale invariant whilst keeping the model computation cost low.
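The RiT layer is not published as a library component; the following is a minimal sketch of the underlying idea, augmenting the output of a final convolution block with rotated copies of its feature maps. Restricting rotations to 90-degree steps via `torch.rot90` (which keeps square feature maps stackable) is a simplifying assumption made here.

```python
# Sketch: augment feature maps (not input images) with rotated copies.
import torch
import torch.nn as nn

class RotationAugment(nn.Module):
    """Concatenates 0/90/180/270-degree rotations of a feature map along the batch axis."""
    def forward(self, x):                        # x: (B, C, H, W), H == W assumed
        rots = [torch.rot90(x, k, dims=(2, 3)) for k in range(4)]
        return torch.cat(rots, dim=0)            # (4B, C, H, W)

backbone = nn.Sequential(                         # stand-in for a CNN trunk
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
)
head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10))

x = torch.randn(8, 3, 32, 32)
feats = RotationAugment()(backbone(x))            # placed after the final conv block
logits = head(feats)                              # (32, 10): 4 rotations x 8 images
# During training, labels would be repeated 4x to match the augmented batch.
```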
Citations: 0
Automatic Extractive and Generic Document Summarization Based on NMF
IF 2.8, CAS Zone 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2022-11-28. DOI: 10.2478/jaiscr-2023-0003
Mehdi Hosseinzadeh Aghdam
Abstract Nowadays, textual information grows exponentially on the Internet, and text summarization (TS) plays a crucial role in handling this massive amount of content. Manual TS is time-consuming and impractical in applications with huge amounts of textual information, so automatic text summarization (ATS) is an essential technology for overcoming these challenges. Non-negative matrix factorization (NMF) is a useful tool for extracting semantic content from textual data. Existing NMF approaches focus only on how the factorized matrices should be modeled and neglect the relationships among sentences; these relationships provide better factorization for TS. This paper suggests a novel non-negative matrix factorization for text summarization (NMFTS). The proposed ATS model places regularizers on pairwise sentence vectors. A new cost function based on the Frobenius norm is designed, and an algorithm with iterative updating rules is developed to minimize it. The proposed NMFTS extracts semantic content by reducing the size of documents and mapping similar sentences closely together in the latent topic space. Compared with basic NMF, the convergence time of the proposed method does not grow. The convergence proof of NMFTS and empirical results on benchmark data sets show that the suggested updating rules converge fast and achieve superior results compared to other methods.
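For orientation, a minimal sketch of plain NMF-based extractive summarization follows; NMFTS adds pairwise-sentence regularization on top of this step, which is omitted here, and the toy sentences are invented for the example.

```python
# Sketch: factorize a term-sentence matrix and rank sentences by topic weight.
import numpy as np
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

sentences = [
    "Text summarization shortens documents while keeping key content.",
    "Non-negative matrix factorization extracts latent topics from text.",
    "The weather was pleasant and the streets were quiet.",
    "Topic weights can rank sentences for an extractive summary.",
]
A = TfidfVectorizer().fit_transform(sentences)      # term-sentence matrix
model = NMF(n_components=2, init="nndsvda", max_iter=500)
W = model.fit_transform(A)                           # sentence-topic weights
scores = W.max(axis=1)                               # strength of each sentence's best topic
top = np.argsort(scores)[::-1][:2]                   # two highest-scoring sentences
print([sentences[i] for i in sorted(top)])           # extractive summary in original order
```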
Citations: 0
A Comparative Study for Outlier Detection Methods in High Dimensional Text Data
IF 2.8, CAS Zone 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2022-11-28. DOI: 10.2478/jaiscr-2023-0001
C. Park
Abstract Outlier detection aims to find a data sample that is significantly different from other data samples. Various outlier detection methods have been proposed and have been shown to be able to detect anomalies in many practical problems. However, in high dimensional data, conventional outlier detection methods often behave unexpectedly due to a phenomenon called the curse of dimensionality. In this paper, we compare and analyze outlier detection performance in various experimental settings, focusing on text data with dimensions typically in the tens of thousands. Experimental setups were simulated to compare the performance of outlier detection methods in unsupervised versus semi-supervised mode and uni-modal versus multi-modal data distributions. The performance of outlier detection methods based on dimension reduction is compared, and a discussion on using k-NN distance in high dimensional data is also provided. Analysis through experimental comparison in various environments can provide insights into the application of outlier detection methods in high dimensional data.
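A minimal sketch of the k-NN distance score discussed above: a document whose distance to its k-th nearest neighbor is large is treated as an outlier. The tiny corpus and the parameter choices are illustrative only.

```python
# Sketch: k-NN distance outlier scoring on high-dimensional TF-IDF vectors.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

docs = ["spam offer win prize", "meeting at noon", "lunch at noon today",
        "win cash prize now", "agenda for the noon meeting", "xyzzy qwerty zzz"]
X = TfidfVectorizer().fit_transform(docs)

k = 2
nn = NearestNeighbors(n_neighbors=k + 1, metric="cosine").fit(X)
dist, _ = nn.kneighbors(X)                # column 0 is each point's distance to itself
scores = dist[:, k]                       # distance to the k-th true neighbor
print(docs[int(np.argmax(scores))])       # the most outlying document
```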
Citations: 3
Information Technology for Comprehensive Monitoring and Control of the Microclimate in Industrial Greenhouses Based on Fuzzy Logic
IF 2.8, CAS Zone 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2022-11-28. DOI: 10.2478/jaiscr-2023-0002
I. Laktionov, O. Vovna, M. Kabanets
Abstract Nowadays, applied computer-oriented and information digitalization technologies are developing dynamically and are widely used in various industries. Agriculture is one of the highest-priority sectors of the economy in Ukraine and other countries around the world, and its needs require intensive implementation of high-performance information technologies. The purpose of the article is to synthesise scientific and practical provisions to improve the information technology for comprehensive monitoring and control of the microclimate in industrial greenhouses. The object of research is the non-stationary processes of aggregation and transformation of measurement data on the soil and climatic conditions of the greenhouse microclimate. The subject of research is methods and models for computer-oriented analysis of measurement data on the soil and climatic state of the greenhouse microclimate. The main scientific and practical contribution of the article is the development of the theory of intelligent information technologies for monitoring and control of the greenhouse microclimate, through methods and models of distributed aggregation and intellectualised transformation of measurement data based on fuzzy logic.
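As a hedged illustration of the fuzzy-logic core, the sketch below evaluates a single Mamdani-style rule (IF temperature is high AND humidity is low THEN ventilation is strong) with plain NumPy membership functions; the membership ranges and sensor readings are invented for the example and do not come from the paper.

```python
# Sketch: one fuzzy rule with triangular memberships and centroid defuzzification.
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function with feet at a, c and peak at b."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

temp, hum = 31.0, 38.0                     # assumed current sensor readings
mu_temp_high = tri(temp, 25, 35, 45)       # degree to which temperature is "high"
mu_hum_low = tri(hum, 20, 30, 50)          # degree to which humidity is "low"
firing = min(mu_temp_high, mu_hum_low)     # Mamdani AND = min

vent = np.linspace(0, 100, 101)            # ventilation level, percent
mu_strong = np.minimum(tri(vent, 50, 100, 150), firing)   # clipped consequent set
output = (vent * mu_strong).sum() / mu_strong.sum()       # centroid defuzzification
print(f"ventilation set to {output:.0f}%")
```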
Citations: 3
Training CNN Classifiers Solely on Webly Data
IF 2.8, CAS Zone 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2022-11-28. DOI: 10.2478/jaiscr-2023-0005
D. Lewy, J. Mańdziuk
Abstract Real-life applications of deep learning (DL) are often limited by the lack of expert-labeled data required to effectively train DL models. Creating such data usually requires a substantial amount of time for manual categorization, which is costly and is considered one of the major impediments to the development of DL methods in many areas. This work proposes a classification approach which completely removes the need for costly expert-labeled data and utilizes noisy web data created by users who are not subject matter experts. The experiments are performed with two well-known Convolutional Neural Network (CNN) architectures, VGG16 and ResNet50, trained on three randomly collected Instagram-based sets of images from three distinct domains: metropolitan cities, popular food, and common objects; the last two sets were compiled by the authors and made freely available to the research community. The dataset containing common objects is a webly counterpart of the PascalVOC2007 set. It is demonstrated that, despite a significant amount of label noise in the training data, the proposed approach paired with a standard CNN training protocol leads to high classification accuracy on representative data in all three above-mentioned domains. Additionally, two straightforward procedures for automatically cleaning the data before its use in training are proposed. Notably, data cleaning does not improve the results, which suggests that the presence of noise in webly data is actually helpful in learning meaningful and robust class representations. Manual inspection of a subset of the web-based test data shows that the labels assigned to many images are ambiguous even for humans. Our conclusion is that, for the datasets and CNN architectures used in this paper, when training with webly data the major factor contributing to the final classification accuracy is the representativeness of the test data rather than the application of data cleaning procedures.
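A minimal sketch of the training setup the abstract describes, assuming the web-collected images are arranged in class folders: an ImageNet-pretrained ResNet50 is fine-tuned directly on the noisy data with a standard protocol and no cleaning. The directory `webly_food/` is a hypothetical stand-in for the Instagram-based sets.

```python
# Sketch: standard fine-tuning of ResNet50 on noisy, web-collected images.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

tf = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
data = datasets.ImageFolder("webly_food/", transform=tf)   # hypothetical class-folder layout
loader = torch.utils.data.DataLoader(data, batch_size=32, shuffle=True)

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, len(data.classes))
opt = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:              # one epoch; no label cleaning applied
    opt.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    opt.step()
```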
Citations: 1
Detecting Anomalies in Advertising Web Traffic with the Use of the Variational Autoencoder
IF 2.8, CAS Zone 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2022-10-01. DOI: 10.2478/jaiscr-2022-0017
Marcin Gabryel, Dawid Lada, Z. Filutowicz, Zofia Patora-Wysocka, Marek Kisiel-Dorohinicki, Guangxing Chen
Abstract This paper presents a neural network model for identifying non-human traffic to a website, which differs significantly from visits made by regular users. Such visits are undesirable from the website owner's point of view: they are not human activity and therefore bring no value, and most often they incur costs connected with the handling of advertising. They are most often generated by dishonest publishers using special software (bots) to generate profit. Bots are also used for scraping, the automatic scanning and downloading of website content, which is not in the interest of website authors. The model proposed in this work is trained on data extracted directly from the web browser during website visits. This data is acquired with a specially prepared JavaScript that monitors the behavior of the user or bot. The appearance of a bot on a website generates parameter values that differ significantly from those collected during typical visits made by human users. Since it is not possible to learn more about the software controlling the bots or to know all the data generated by them, this paper proposes a variational autoencoder (VAE) neural network model with modifications to detect abnormal parameter values that deviate from the Internet traffic of human users. The algorithm builds on a popular autoencoder method for detecting anomalies, with a number of original improvements. In the study we used authentic data extracted from several large online stores.
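A minimal sketch of VAE-based anomaly scoring on fixed-length visit-feature vectors, assuming the model is trained on ordinary human traffic and that a high reconstruction error marks a likely bot; the paper's specific modifications to the plain VAE are not reproduced here, and the random tensors stand in for real browser telemetry.

```python
# Sketch: train a small VAE on normal traffic, score new visits by reconstruction error.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, d_in=40, d_lat=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, 64), nn.ReLU())
        self.mu, self.logvar = nn.Linear(64, d_lat), nn.Linear(64, d_lat)
        self.dec = nn.Sequential(nn.Linear(d_lat, 64), nn.ReLU(), nn.Linear(64, d_in))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.dec(z), mu, logvar

vae, x = VAE(), torch.randn(256, 40)       # x: stand-in for normal-visit feature vectors
opt = torch.optim.Adam(vae.parameters(), lr=1e-3)
for _ in range(200):
    recon, mu, logvar = vae(x)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / x.size(0)
    loss = F.mse_loss(recon, x) + 1e-3 * kl
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():                       # score visits: high error = likely bot
    recon, _, _ = vae(x)
    score = ((recon - x) ** 2).mean(dim=1)
```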
Citations: 2
Multi-Population-Based Algorithm with an Exchange of Training Plans Based on Population Evaluation
IF 2.8, CAS Zone 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2022-10-01. DOI: 10.2478/jaiscr-2022-0016
Krystian Łapa, K. Cpałka, Marek Kisiel-Dorohinicki, J. Paszkowski, Maciej Dębski, Van-Hung Le
Abstract Population-Based Algorithms (PBAs) are excellent search tools for exploring the space of parameters defined by the problem under consideration. They are especially useful when it is difficult to define a differentiable evaluation criterion. This applies, for example, to problems that combine continuous and discrete (combinatorial) aspects. In such problems, it is often necessary to select a certain structure of the solution (e.g. a neural network or another system whose structure is usually selected by trial and error) and to determine the parameters of that structure. As PBAs have great application potential, the aim is to develop ever more effective search formulas for them. An interesting approach is to use multiple populations and process them with separate PBAs (in different ways). In this paper, we propose a new multi-population-based algorithm with: (a) subpopulation evaluation and (b) replacement of the associated PBA subpopulation formulas used for their processing. In the simulations, we used a set of typical CEC2013 benchmark functions. The obtained results confirm the validity of the proposed concept.
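A minimal sketch of the multi-population idea under simplifying assumptions: each subpopulation is processed by its own strategy (here just a Gaussian mutation scale standing in for a full PBA), subpopulations are evaluated as groups, and the worst group's "training plan" is replaced by the best one's. The sphere function is a toy stand-in for the CEC2013 benchmarks.

```python
# Sketch: 4 subpopulations, group evaluation, exchange of processing plans.
import numpy as np

def sphere(x):                                   # toy benchmark: f(x) = sum(x_i^2)
    return np.sum(x ** 2, axis=1)

rng = np.random.default_rng(0)
pops = [rng.uniform(-5, 5, (20, 10)) for _ in range(4)]   # 4 subpopulations
sigmas = [0.05, 0.2, 0.5, 1.0]                   # each subpopulation's "training plan"

for gen in range(100):
    for i, pop in enumerate(pops):               # evolve each subpopulation separately
        trial = pop + rng.normal(0, sigmas[i], pop.shape)
        better = sphere(trial) < sphere(pop)
        pop[better] = trial[better]              # greedy selection
    ranks = [sphere(p).mean() for p in pops]     # evaluate subpopulations as groups
    best, worst = int(np.argmin(ranks)), int(np.argmax(ranks))
    sigmas[worst] = sigmas[best]                 # exchange of training plans

print(min(sphere(p).min() for p in pops))        # best solution found
```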
Citations: 1
Semantic Hashing for Fast Solar Magnetogram Retrieval
IF 2.8, CAS Zone 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2022-10-01. DOI: 10.2478/jaiscr-2022-0020
Rafał Grycuk, R. Scherer, A. Marchlewska, Christian Napoli
Abstract We propose a method for content-based retrieval of solar magnetograms. We use SDO Helioseismic and Magnetic Imager output collected with the SunPy and PyTorch libraries. We create a mathematical representation of the magnetic field regions of the Sun in the form of a vector, so that we can compare short vectors instead of full-disk images. To decrease the retrieval time, we used a fully-connected autoencoder, which reduces the 256-element descriptor to a 32-element semantic hash. The performed experiments and comparisons prove the efficiency of the proposed approach, which achieves the highest precision in comparison with other state-of-the-art methods. The presented method can be used not only for solar image retrieval but also for classification tasks.
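A minimal sketch of the hashing step, assuming random 256-element descriptors in place of the real SDO/HMI features: a fully-connected autoencoder compresses them to a 32-element code, which is binarized into a semantic hash and compared by Hamming distance.

```python
# Sketch: 256 -> 32 autoencoder bottleneck, binarized into a semantic hash.
import torch
import torch.nn as nn

class HashAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(256, 128), nn.ReLU(),
                                 nn.Linear(128, 32), nn.Sigmoid())  # 32-element code
        self.dec = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 256))

    def forward(self, x):
        code = self.enc(x)
        return self.dec(code), code

model, x = HashAE(), torch.rand(512, 256)        # x: stand-in magnetogram descriptors
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(300):
    recon, _ = model(x)
    loss = nn.functional.mse_loss(recon, x)
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():
    _, code = model(x)
    hashes = (code > 0.5).to(torch.uint8)        # 32-bit semantic hash per image
    dists = (hashes ^ hashes[0]).sum(dim=1)      # Hamming distance to query item 0
    print(torch.topk(-dists, 5).indices)         # five nearest magnetograms
```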
Citations: 2
Combined YOLOv5 and HRNet for High Accuracy 2D Keypoint and Human Pose Estimation
IF 2.8, CAS Zone 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2022-10-01. DOI: 10.2478/jaiscr-2022-0019
Hung-Cuong Nguyen, Thi-Hao Nguyen, Jakub Nowak, A. Byrski, A. Siwocha, Van-Hung Le
Abstract Two-dimensional human pose estimation has been widely applied in real-world applications such as sports analysis, medical fall detection, and human-robot interaction, with many positive results obtained using Convolutional Neural Networks (CNNs). Li et al. (CVPR 2020) achieved high accuracy in 2D keypoint estimation/2D human pose estimation, but performed estimation only on cropped human image data. In this research, we propose a method for automatically detecting and estimating human poses in photos using a combination of YOLOv5 + CC (Contextual Constraints) and HRNet. Our approach inherits the speed of YOLOv5 for detecting humans and the efficiency of HRNet for estimating 2D keypoints/2D human pose on the images. We also marked humans on the images with the bounding boxes of the Human 3.6M dataset (Protocol #1) for human detection evaluation. Our approach obtains high detection results and runs at 55 FPS on the Human 3.6M dataset (Protocol #1). The mean error distance is 5.14 pixels on the full-size image (1000 × 1002). In particular, the average results of 2D human pose estimation/2D keypoint estimation are 94.8% PCK and 99.2% PDJ@0.4 (head joint). The results are available.
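A hedged sketch of the two-stage pipeline follows: YOLOv5 (loaded via torch.hub) detects person boxes, and a 2D pose network estimates keypoints on each crop. torchvision's Keypoint R-CNN stands in here for HRNet, which has no comparable one-line loader, and `scene.jpg` is a hypothetical input image.

```python
# Sketch: detector finds persons, pose network estimates keypoints per crop.
import torch
from PIL import Image
from torchvision.models.detection import (keypointrcnn_resnet50_fpn,
                                          KeypointRCNN_ResNet50_FPN_Weights)
from torchvision.transforms.functional import to_tensor

detector = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
poser = keypointrcnn_resnet50_fpn(
    weights=KeypointRCNN_ResNet50_FPN_Weights.DEFAULT).eval()  # stand-in for HRNet

img = Image.open("scene.jpg").convert("RGB")   # hypothetical input image
boxes = detector(img).xyxy[0]                  # (N, 6): x1, y1, x2, y2, conf, class

for x1, y1, x2, y2, conf, cls in boxes.tolist():
    if int(cls) != 0 or conf < 0.5:            # keep confident "person" boxes only
        continue
    crop = img.crop((x1, y1, x2, y2))
    with torch.no_grad():
        out = poser([to_tensor(crop)])[0]      # 17 COCO keypoints per detection
    if len(out["keypoints"]):
        kpts = out["keypoints"][0, :, :2] + torch.tensor([x1, y1])  # full-image coords
        print(f"person at ({x1:.0f},{y1:.0f}): {kpts.shape[0]} keypoints")
```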
Citations: 5