
Applied Computer Systems: Latest Publications

Approximate Nearest Neighbour-based Index Tree: A Case Study for Instrumental Music Search
IF 1 Pub Date : 2023-06-01 DOI: 10.2478/acss-2023-0015
Nguyen Ha Thanh, Linh Dan Vo, Thien Thanh Tran
Abstract Many people are interested in instrumental music. They may have a recording of a song, but finding it is a challenge because they have no lyrics to describe it to a text-based search engine. This study leverages Approximate Nearest Neighbours to preprocess instrumental songs and extract the characteristics of each track in the repository using Mel frequency cepstral coefficient (MFCC) feature extraction. Our method digitizes the track, extracts the track characteristics, and builds the index tree with different MFCC lengths and vector dimension numbers. We collected songs played with various instruments for the experiments. On 100 pieces of various songs of different lengths, a sampling rate of 16000 and an MFCC length of 13 give the best results, where accuracy on the Top 1 is 36 %, Top 5 is 4 %, and Top 10 is 44 %. We expect this work to provide useful tools for developing digital music e-commerce systems.
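The retrieval step can be sketched as follows. This is a hypothetical simplification, not the authors' implementation: the paper builds an approximate index tree over MFCC vectors, while this toy uses exact brute-force search so the ranking logic is visible.

```python
import math

class SongIndex:
    """Toy nearest-neighbour index over fixed-length feature vectors.

    Hypothetical sketch: the paper builds an approximate index tree over
    MFCC vectors; exact brute-force search stands in here for clarity.
    """

    def __init__(self):
        self._entries = []  # list of (song_id, feature_vector) pairs

    def add(self, song_id, vector):
        self._entries.append((song_id, list(vector)))

    def query(self, vector, k=10):
        # Rank stored tracks by Euclidean distance to the query snippet.
        ranked = sorted(self._entries, key=lambda e: math.dist(e[1], vector))
        return [song_id for song_id, _ in ranked[:k]]
```

A query snippet's feature vector is matched against the repository; Top-1/Top-5/Top-10 accuracy then measures how often the true song appears among the first k returned identifiers.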
Citations: 0
Classification of COVID-19 Chest X-Ray Images Based on Speeded Up Robust Features and Clustering-Based Support Vector Machines
IF 1 Pub Date : 2023-06-01 DOI: 10.2478/acss-2023-0016
M. Rajab
Abstract Due to the worldwide deficiency of medical test kits and the significant time required by radiology experts to identify the new COVID-19, it is essential to develop a fast, robust, and intelligent chest X-ray (CXR) image classification system. The proposed method consists of two major components: feature extraction and classification. The Bag of image features algorithm creates a visual vocabulary from two training data categories of chest X-ray images: Normal and COVID-19 patients’ datasets. The algorithm extracts salient features and descriptors from CXR images using the Speeded Up Robust Features (SURF) algorithm. A Clustering-Based Support Vector Machines (CB-SVMs) multiclass classifier is then trained on the SURF features to classify the CXR image categories. The careful collection of ground-truth Normal and COVID-19 CXR datasets, provided by expert radiologists worldwide, has certainly influenced the performance of the proposed CB-SVMs classifier in preserving its generalization capabilities. The high classification accuracy of 99 %, assessed on an independent test set, demonstrates the effectiveness of the proposed method.
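The bag-of-features step can be sketched in a few lines. This is a toy stand-in: real SURF descriptors are 64- or 128-dimensional and the visual vocabulary comes from clustering training descriptors; here tiny 2-D vectors illustrate the descriptor-to-word assignment that produces the histogram fed to the classifier.

```python
import math

def bow_histogram(descriptors, vocabulary):
    """Map local image descriptors to a normalised visual-word histogram.

    Sketch only: in the paper, descriptors come from SURF and the
    vocabulary from clustering; the histograms feed the CB-SVMs classifier.
    """
    hist = [0] * len(vocabulary)
    for d in descriptors:
        # Assign each descriptor to its nearest visual word.
        nearest = min(range(len(vocabulary)),
                      key=lambda i: math.dist(vocabulary[i], d))
        hist[nearest] += 1
    total = sum(hist) or 1  # avoid division by zero for empty images
    return [h / total for h in hist]
```

Normalising by the descriptor count makes histograms comparable across images with different numbers of detected keypoints.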
Citations: 1
Construction of Quasi-DOE on Sobol’s Sequences with Better Uniformity 2D Projections
IF 1 Pub Date : 2023-06-01 DOI: 10.2478/acss-2023-0003
V. Halchenko, R. Trembovetska, V. Tychkov, N. Tychkova
Abstract In order to establish the projection properties of computer-generated uniform designs of experiments on Sobol’s sequences, an empirical comparative statistical analysis of the homogeneity of 2D projections of the best-known improved designs of experiments was carried out using novel objective discrepancy indicators. Graphically and numerically, these designs solve the problem of clustering points in low-dimensional projections only incompletely, which calls for further research into new Sobol’s sequences free of this drawback. Using the example of the first 20 improved Sobol’s sequences, the article proposes a methodology for creating refined designs based on an unconventional use of these already found sequences: each next-dimensional design is created from the previous one with the best homogeneity and projection properties. The sequences for the initial design are selected by analysing numerical indicators of the weighted symmetrized centered discrepancy of the two-dimensional projections. According to the algorithm, the combination of sequences found so far is fixed, and a complete search over the added one-dimensional sequences is performed until the best one is detected. As an example, the proposed methodology was applied to search for improved design variants for factor spaces from two to nine dimensions. New combinations of Sobol’s sequences with better projection properties than those already known are given. Their effectiveness is confirmed by statistical calculations and demonstrated graphically by box plots and histograms of the distribution of the projection indicators of the weighted symmetrized centered discrepancy. In addition, numerical results are given for the volumetric discrepancy indicators of the created designs with different numbers of points.
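As a sketch of the kind of uniformity indicator involved, the plain (unweighted, unsymmetrized) centered L2 discrepancy of Hickernell can be computed directly from its closed form. This simplified version does not reproduce the weighted symmetrized variant the paper works with; it only illustrates that lower values correspond to more uniform point sets.

```python
def centered_l2_discrepancy(points):
    """Squared centered L2 discrepancy of a point set in [0, 1]^d.

    Simplified sketch (plain Hickernell CD2): the article uses a weighted
    symmetrized centered discrepancy, which this version does not include.
    Lower values indicate a more uniform point set.
    """
    n, d = len(points), len(points[0])
    term1 = (13 / 12) ** d
    term2 = 0.0
    for x in points:
        prod = 1.0
        for xk in x:
            a = abs(xk - 0.5)
            prod *= 1 + 0.5 * a - 0.5 * a * a
        term2 += prod
    term2 *= 2 / n
    term3 = 0.0
    for x in points:
        for y in points:
            prod = 1.0
            for xk, yk in zip(x, y):
                prod *= (1 + 0.5 * abs(xk - 0.5) + 0.5 * abs(yk - 0.5)
                         - 0.5 * abs(xk - yk))
            term3 += prod
    term3 /= n * n
    return term1 - term2 + term3
```

An evenly spread set scores lower than a clustered one, which is the property the per-projection indicators in the article quantify for every 2D projection of a design.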
Citations: 0
PhoBERT: Application in Disease Classification based on Vietnamese Symptom Analysis
IF 1 Pub Date : 2023-06-01 DOI: 10.2478/acss-2023-0004
Nguyen Ha Thanh, Tuyet Ngoc Huynh, Nhi Mai, K. D. Le, Pham Thi-Ngoc-Diem
Abstract Besides the successful use of support software in cutting-edge medical procedures, determining a disease’s early signs and symptoms before its detection is a pressing and growing requirement for raising the standard of medical examination and treatment. This creates favourable conditions and reduces patient inconvenience and hospital overcrowding. Before transferring patients to an appropriate doctor, healthcare staff must have the patient’s symptoms. This study leverages the PhoBERT model to classify patients, as a text classification task, based on the symptoms they provide in the first stages of admission to Vietnamese hospitals. On more than 200 000 text-based symptom records collected from Vietnamese hospitals, PhoBERT improves classification performance compared to Bag of Words (BOW) with classic machine learning algorithms, as well as deep learning architectures such as 1D Convolutional Neural Networks and Long Short-Term Memory. The proposed method achieves results promising enough to be deployed in automatic hospital admission procedures in Vietnam.
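The Bag of Words baseline the model is compared against can be sketched in a few lines. This toy uses plain whitespace tokenisation; in practice Vietnamese text needs proper word segmentation before either BOW or PhoBERT can be applied, and the example strings are illustrative, not from the dataset.

```python
def bow_vectorize(texts):
    """Turn raw symptom strings into count vectors over a shared vocabulary.

    Toy sketch of the Bag of Words baseline: whitespace tokenisation only;
    real Vietnamese text requires proper word segmentation first.
    """
    vocab = sorted({w for t in texts for w in t.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for t in texts:
        v = [0] * len(vocab)
        for w in t.lower().split():
            v[index[w]] += 1
        vectors.append(v)
    return vocab, vectors
```

The resulting count vectors are what a classic classifier (e.g. logistic regression or SVM) consumes; PhoBERT instead produces contextual embeddings of the same symptom text.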
Citations: 0
Intelligent Mobile User Profiling for Maximum Performance
IF 1 Pub Date : 2023-06-01 DOI: 10.2478/acss-2023-0014
A. Muhammad, Sher Afghan, Afzal Muhammad
Abstract The use of smartphones and their applications is expanding rapidly, thereby increasing the demand for computational power and other hardware resources of smartphones. On the other hand, these small devices have limited computational power, battery backup, RAM, and storage space due to their size, and they need to reconcile resource-hungry applications. This research focuses on solving the power and efficiency issues of smart devices by adapting intelligently to mobile usage through intelligent user profiling. Our architecture makes a smartphone smarter by utilizing its resources intelligently to increase battery life. Our application builds profiles of application usage at different time intervals; these stored usage profiles are then used to allocate resources intelligently for the next time interval. We implemented and evaluated the profiling scheme on different brands of Android smartphones, implementing our approach with Naive Bayes and Decision Tree classifiers and comparing it with the conventional approach. The results show that the proposed approach based on decision trees saves 31 % of CPU and 60 % of RAM usage compared to the conventional approach.
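The interval-based profiling idea can be sketched as follows. The names and the simple count-based prediction are hypothetical; in the paper, such profiles feed Naive Bayes or Decision Tree classifiers that decide which applications receive resources in the next interval.

```python
from collections import defaultdict

class UsageProfiler:
    """Record app launches per time interval and report the interval's
    most-used apps, so the system could keep only those resident in RAM.

    Hypothetical sketch of the profiling step described in the paper.
    """

    def __init__(self):
        # interval key (e.g. hour of day) -> app name -> launch count
        self._counts = defaultdict(lambda: defaultdict(int))

    def record(self, interval, app):
        self._counts[interval][app] += 1

    def top_apps(self, interval, k=2):
        # Sort by descending count, ties broken alphabetically.
        ranked = sorted(self._counts[interval].items(),
                        key=lambda kv: (-kv[1], kv[0]))
        return [app for app, _ in ranked[:k]]
```

A resource manager would preload the predicted apps for the upcoming interval and reclaim memory from the rest, which is where the reported CPU and RAM savings come from.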
Citations: 0
Social Media: An Exploratory Study of Information, Misinformation, Disinformation, and Malinformation
Pub Date : 2023-06-01 DOI: 10.2478/acss-2023-0002
Mumtaz Hussain, Tariq Rahim Soomro
Abstract The widespread use of social media around the globe has affected all aspects of the way of life, not only for individuals but for businesses as well. Businesses share their upcoming events, reveal their products, and advertise to potential customers, while individuals use social media to stay connected with their social circles, get updates and news from the social media pages of news agencies, and keep up with the latest activities, businesses, economy, events, politics, trends, and their areas of interest. According to Statista, there were 4.59 billion social media users worldwide in 2022, a number expected to grow to 5.85 billion by 2027. With its massive user base, social media not only generates useful information for businesses and individuals, but at the same time creates an abundance of misinformation, disinformation, and malinformation deployed to advance socio-political or business agendas. Individuals tend to share social media posts without checking the authenticity of the information, so posts carrying misinformation, disinformation, or malinformation can go viral around the world in a matter of minutes. Identifying misinformation, disinformation, and malinformation has become a prominent problem associated with social media.
Citations: 0
Speedup of the k-Means Algorithm for Partitioning Large Datasets of Flat Points by a Preliminary Partition and Selecting Initial Centroids
IF 1 Pub Date : 2023-06-01 DOI: 10.2478/acss-2023-0001
V. Romanuke
Abstract A problem of partitioning large datasets of flat points is considered. Known as the centroid-based clustering problem, it is mainly addressed by the k-means algorithm and its modifications. As k-means performance degrades on large datasets, including datasets with stretched shapes, the goal is to study the possibility of improving centroid-based clustering for such cases. It is quite noticeable on non-sparse datasets that the clusters produced by k-means resemble beehive honeycomb. This is natural for rectangular-shaped datasets because hexagonal cells use space efficiently, so the sum of the within-cluster squared Euclidean distances to the centroids approaches its minimum. Therefore, lattices of rectangular and hexagonal clusters, consisting of stretched rectangles and regular hexagons, are suggested to be applied successively. The initial centroids are then calculated by averaging within the respective hexagons and used as initial seeds for the k-means algorithm. This ensures faster and more accurate convergence, with an expected speedup of at least 1.7 to 2.1 times and a 0.7 to 0.9 % accuracy gain. The lattice of rectangular clusters, applied first, makes a rather rough but effective partition and optionally allows further clustering to run on parallel processor cores. The lattice of hexagonal clusters applied to every rectangle yields initial centroids very quickly; such centroids are far closer to the solution than the initial centroids of the k-means++ algorithm. Another approach to the k-means update, where initial centroids are selected separately within every rectangle’s hexagons, can be used as well. It is faster than selecting initial centroids across all hexagons but less accurate: the speedup is 9 to 11 times at a possible accuracy loss of 0.3 %. However, this approach may outperform the k-means algorithm. The speedup increases as the lattices become denser and the dataset becomes larger, reaching 30 to 50 times.
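The seeding idea can be sketched with rectangular cells alone. This is a simplification of the article's scheme, which follows the rectangular partition with a lattice of hexagonal cells before starting k-means; here each non-empty cell's point average becomes an initial centroid.

```python
def grid_seeds(points, nx, ny):
    """Average the 2-D points falling in each cell of an nx-by-ny
    rectangular lattice; non-empty cell averages become initial centroids.

    Simplified sketch: the article refines this rectangular partition with
    hexagonal cells before running k-means proper.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    xmin, ymin = min(xs), min(ys)
    w = (max(xs) - xmin) or 1.0  # guard against a degenerate axis
    h = (max(ys) - ymin) or 1.0
    cells = {}
    for x, y in points:
        # Clamp so points on the upper boundary land in the last cell.
        i = min(int((x - xmin) / w * nx), nx - 1)
        j = min(int((y - ymin) / h * ny), ny - 1)
        cells.setdefault((i, j), []).append((x, y))
    return [(sum(p[0] for p in members) / len(members),
             sum(p[1] for p in members) / len(members))
            for members in cells.values()]
```

These seeds replace random or k-means++ initialisation; since each seed already sits near the mass centre of its region, k-means started from them needs fewer iterations to converge.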
Citations: 0
BRS-based Model for the Specification of Multi-view Point Ontology
IF 1 Pub Date : 2023-06-01 DOI: 10.2478/acss-2023-0008
Manel Kolli
Abstract In this paper, we propose a new approach, based on bigraphical reactive systems (BRS), to provide a formal model of the architectural elements of a multi-viewpoint ontology (MVp ontology). We introduce a formal model in which the main elements of an MVp ontology are defined in terms of bigraphical concepts while preserving their semantics. In addition, we enrich the proposed model with reaction rules in order to handle the dynamic reactions of the MVp ontology. To confirm the applicability of our approach, we have carried out a case study using the proposed model.
Citations: 0
Multimodal Biometric System Based on the Fusion in Score of Fingerprint and Online Handwritten Signature
IF 1 Pub Date : 2023-06-01 DOI: 10.2478/acss-2023-0006
T. Hafs, Hatem Zehir, A. Hafs, A. Nait-Ali
Abstract Multimodal biometrics is the technique of using multiple modalities in a single system. This allows us to overcome the limitations of unimodal systems, such as the inability to acquire data from certain individuals or intentional fraud, while improving recognition performance. In this paper, we study score normalization and its impact on system performance. Score fusion requires prior normalization before applying a weighted-sum fusion that maps impostor and genuine scores into a common interval with close ranges. The experiments were carried out on three biometric databases. The results show that the proposed strategy performs encouragingly, especially in combination with Empirical Mode Decomposition (EMD). The proposed fusion system shows good performance. The best result is obtained by fusing the global online signature and fingerprint scores, where an EER of 1.69 % is obtained by normalizing the scores with the Min-Max method.
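The Min-Max normalization, weighted-sum fusion, and EER evaluation described above can be sketched in plain Python. The scores and the fusion weight below are made-up values for illustration, not the paper's data:

```python
def min_max_normalize(scores):
    """Map raw matcher scores onto [0, 1] (Min-Max rule)."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def weighted_sum_fusion(a_scores, b_scores, w=0.5):
    """Fuse two normalized score lists with a weighted sum."""
    return [w * a + (1 - w) * b for a, b in zip(a_scores, b_scores)]

def equal_error_rate(genuine, impostor):
    """Scan candidate thresholds and return the rate where FAR ≈ FRR."""
    best_gap, eer = 2.0, 1.0
    for t in sorted(genuine + impostor):
        far = sum(s >= t for s in impostor) / len(impostor)  # false accepts
        frr = sum(s < t for s in genuine) / len(genuine)     # false rejects
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

# Made-up matcher outputs for illustration (higher = more likely genuine).
sig = min_max_normalize([34.0, 10.0, 22.0])   # signature matcher
fp = min_max_normalize([0.90, 0.10, 0.50])    # fingerprint matcher
fused = weighted_sum_fusion(sig, fp, w=0.6)
print(fused)  # [1.0, 0.0, 0.5]
```

Normalizing before fusing is what makes the weighted sum meaningful: without it, the matcher with the larger raw score range would dominate the fused score regardless of the chosen weight.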
Citations: 0
Multichannel Approach for Sentiment Analysis Using Stack of Neural Network with Lexicon Based Padding and Attention Mechanism
IF 1 Pub Date : 2023-06-01 DOI: 10.2478/acss-2023-0013
V. R. Kota, Munisamy Shyamala Devi
Abstract Sentiment analysis (SA) has been an important focus of study in the fields of computational linguistics and data analysis for a decade. Recently, promising results have been achieved when applying DNN models to sentiment analysis tasks. Long short-term memory (LSTM) models, as well as derivatives such as the gated recurrent unit (GRU), are becoming increasingly popular in neural architectures used for sentiment analysis. Using these models in the feature extraction layer of a DNN results in a high-dimensional feature space, despite the fact that the models can handle sequences of arbitrary length. Another problem with these models is that they weight each feature equally. Natural language processing (NLP) makes use of word embeddings created with word2vec. For many NLP tasks, deep neural networks have become the method of choice. Traditional deep networks are not dependable at storing contextual information, so handling sequential data such as text and sound is difficult for them. This research proposes multichannel word embedding and a stack of neural networks with lexicon-based padding and an attention mechanism (MCSNNLA) for SA. The approach combines a convolutional neural network (CNN), a Bi-LSTM, and an attention mechanism. One embedding layer, two convolution layers with max-pooling, one LSTM layer, and two fully connected (FC) layers make up the proposed technique, which is tailored for sentence-level SA. To address the shortcomings of prior SA models for product reviews, the MCSNNLA model integrates the aforementioned sentiment lexicon with deep learning technologies, combining the strengths of emotion lexicons with those of deep learning. To begin, the reviews are processed with the sentiment lexicon in order to enhance the sentiment features. The experimental findings show that the model has the potential to greatly improve text SA performance.
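The attention step the abstract motivates — weighting each time-step feature instead of treating all features equally — can be sketched as a dot-product attention pool in plain Python. The hidden states and query below are hypothetical stand-ins for Bi-LSTM outputs, not the paper's exact layer:

```python
import math

def attention_pool(hidden_states, query):
    """Dot-product attention: score each time step against `query`,
    softmax the scores, and return the weighted sum of states."""
    scores = [sum(h * q for h, q in zip(state, query))
              for state in hidden_states]
    peak = max(scores)                       # subtract max for stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]      # softmax attention weights
    dim = len(hidden_states[0])
    pooled = [sum(w * state[d] for w, state in zip(weights, hidden_states))
              for d in range(dim)]
    return pooled, weights

# Three fake Bi-LSTM time steps; the query favours the second one.
states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
pooled, weights = attention_pool(states, query=[0.0, 5.0])
print(round(max(weights), 3))  # the second time step dominates the pool
```

In the full model this pooled vector would feed the FC layers, so informative time steps contribute more to the sentence representation than padding or neutral words.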
Citations: 0
Journal
Applied Computer Systems