Q2 COMPUTER SCIENCE, THEORY & METHODS

Array

Pub Date : 2024-02-17 DOI: 10.1016/j.array.2024.100337

Achraf El Bouazzaoui, Abdelkader Hadjoudja, Omar Mouhib, Nazha Cherkaoui

The relentless growth in data volume and complexity calls for more adaptable machine learning methods. In response to this challenge, we present a novel architecture that enables dynamic classifier selection on FPGA platforms. The architecture combines hardware accelerators for three distinct classifiers (Support Vector Machines, K-Nearest Neighbors, and Deep Neural Networks) without requiring the combined area footprint of those implementations. It also introduces a hardware-based Accelerator Selector that dynamically selects the most suitable classifier for incoming data using the K-Nearest Centroid approach. When tested on four different datasets, our architecture improved classification performance, with accuracy gains of up to 8% over the software implementations. In addition to this higher accuracy, it reduced resource usage by up to 45% compared with a static implementation, making it highly efficient in resource utilization and energy consumption on FPGA platforms and paving the way for scalable ML applications. To the best of our knowledge, this work is the first to harness FPGA platforms for dynamic classifier selection.
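
The abstract does not spell out how the Accelerator Selector's K-Nearest Centroid rule works. The Python sketch below is a software illustration under stated assumptions, not the authors' hardware design: each candidate classifier is represented by one or more centroids (hypothetically, centroids of training samples on which that classifier performed best), and an incoming sample is routed to the classifier that wins a majority vote among its k nearest centroids by Euclidean distance. The centroid table and the `select_classifier` name are hypothetical.

```python
import math
from collections import Counter

# Hypothetical centroid table entry: (classifier name, centroid vector).
# How the centroids are derived is an assumption, e.g. the mean of the
# training samples on which that classifier performed best.
Centroid = tuple[str, tuple[float, ...]]


def select_classifier(sample: tuple[float, ...], centroids: list[Centroid], k: int = 1) -> str:
    """Return the classifier voted for by the k centroids nearest to `sample`."""
    # Sort all centroids by Euclidean distance to the incoming sample.
    ranked = sorted(centroids, key=lambda c: math.dist(sample, c[1]))
    # Majority vote among the k nearest. Counter.most_common keeps first-seen
    # order on ties, so a tie goes to the classifier whose centroid is closest.
    votes = Counter(name for name, _ in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    table = [("SVM", (0.0, 0.0)), ("SVM", (1.0, 0.0)), ("DNN", (2.0, 0.0)), ("KNN", (10.0, 10.0))]
    print(select_classifier((1.6, 0.0), table, k=1))  # prints DNN (single nearest centroid)
    print(select_classifier((1.6, 0.0), table, k=3))  # prints SVM (2 votes to 1)
```

A hardware selector might instead use fixed-point arithmetic and a cheaper metric such as Manhattan distance; Euclidean distance is used here only for readability.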

Title: FPGA-based ML adaptive accelerator: A partial reconfiguration approach for optimized ML accelerator utilization. Array, vol. 21, Article 100337.

Citations: 0

Robustness and user test on text-based CAPTCHA: Letter segmenting is not too easy or too hard

Q2 COMPUTER SCIENCE, THEORY & METHODS

Array

Pub Date : 2024-01-04 DOI: 10.1016/j.array.2024.100335

Maneerut Chatrangsan, Chatpong Tangmanee

Text-based CAPTCHAs are widely used as an online security guard: they require a user to type letters so that humans can be distinguished from automated software (bots). However, they still pose problems of usability and robustness. This study investigated the effect of letter spacing, disturbing-line orientation, and disturbing-line color on the user test and robustness of text-based CAPTCHAs. The 240 CAPTCHAs were tested with Thai undergraduate students. The results show no significant differences in the user test for any of the three factors. For robustness, disturbing-line orientation had no significant effect. However, CAPTCHAs with overlapping letters were significantly the most robust, and CAPTCHAs whose disturbing line used the background color were significantly more robust than those whose disturbing line used the foreground color. Moreover, when letters had no spacing, the effect of disturbing-line color on robustness was statistically significant, whereas it became insignificant when the letters were spaced or overlapping. We recommend CAPTCHAs with unspaced letters combined with a disturbing line in the background color, as they suit users while remaining robust. We conclude that letter segmentation is not too hard for users (88% pass rate) while not too easy for bot attacks (39% pass rate). In terms of security, further studies could make CAPTCHAs more robust against new criminal technologies. In terms of usability, other age groups could be considered.

Array, vol. 21, Article 100335.

Citations: 0

Triplet extraction leveraging sentence transformers and dependency parsing

Q2 COMPUTER SCIENCE, THEORY & METHODS

Array

Pub Date : 2023-12-27 DOI: 10.1016/j.array.2023.100334

Stuart Gallina Ottersen, Flávio Pinheiro, Fernando Bação

Knowledge graphs are a tool for structuring (entity, relation, entity) triples. One way to construct them is to extract triples from unstructured text, with the aim of maximising the number of useful triples while minimising triples that contain no or useless information. Most previous work in this field uses supervised learning techniques, which can be expensive both computationally and because they require labelled data. Existing unsupervised methods, on the other hand, often produce an excessive number of low-value triples, rely on empirical rules when extracting triples, or struggle with the order of the entities relative to the relation. To address these issues, this paper proposes a new model, Unsupervised Dependency parsing Aided Semantic Triple Extraction (UDASTE), which leverages sentence structure and pre-trained language models and allows restrictive triple relation types to be defined, generating high-quality triples while removing the need to map extracted triples to relation schemas. UDASTE is compared with two baseline models on three datasets and outperforms them on all three. Its limitations and possible further work are discussed, along with the implementation of the model in a computational intelligence context.
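
To make the dependency-parsing idea concrete, the sketch below extracts (subject, relation, object) triples from a hand-built dependency parse: for each head word it pairs `nsubj` children with `obj`/`dobj` children and expands each argument with its `compound` modifiers. An optional `allowed_relations` set loosely mimics restricting relation types. This is a rule-based toy, not UDASTE: it uses exact string matching where UDASTE uses pre-trained language models, and the `Token` structure and label set are assumptions (in practice a parser such as spaCy would supply the parse).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    text: str
    dep: str   # dependency label, e.g. "nsubj", "obj", "compound", "ROOT"
    head: int  # index of the syntactic head; the ROOT token points to itself


def phrase(tokens: list[Token], i: int) -> str:
    """Token i together with its compound modifiers, in sentence order."""
    idx = [j for j, t in enumerate(tokens) if t.head == i and t.dep == "compound" and j != i]
    return " ".join(tokens[j].text for j in sorted(idx + [i]))


def extract_triples(tokens: list[Token],
                    allowed_relations: Optional[set[str]] = None) -> list[tuple[str, str, str]]:
    """Pair every subject child with every object child of the same head word."""
    triples = []
    for i, tok in enumerate(tokens):
        # Restrict relation types (hypothetical exact-match filter).
        if allowed_relations is not None and tok.text not in allowed_relations:
            continue
        children = [j for j, t in enumerate(tokens) if t.head == i and j != i]
        subjects = [j for j in children if tokens[j].dep == "nsubj"]
        objects = [j for j in children if tokens[j].dep in ("obj", "dobj")]
        for s in subjects:
            for o in objects:
                triples.append((phrase(tokens, s), tok.text, phrase(tokens, o)))
    return triples


if __name__ == "__main__":
    sentence = [Token("Marie", "compound", 1), Token("Curie", "nsubj", 2),
                Token("discovered", "ROOT", 2), Token("polonium", "obj", 2)]
    print(extract_triples(sentence))  # [('Marie Curie', 'discovered', 'polonium')]
```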

Array, vol. 21, Article 100334.

Citations: 0

Combining a multi-feature neural network with multi-task learning for emergency calls severity prediction

Q2 COMPUTER SCIENCE, THEORY & METHODS

Array

Pub Date : 2023-12-19 DOI: 10.1016/j.array.2023.100333

Marianne Abi Kanaan, Jean-François Couchot, Christophe Guyeux, David Laiymani, Talar Atechian, Rony Darazi

In emergency call centers, operators are required to analyze and prioritize emergency situations before any intervention. This allows the team to deploy resources efficiently when needed and thereby provide optimal assistance to the victims. Automating such an analysis remains challenging, given the unpredictable nature of the calls. In this study, we therefore describe our attempt to improve an emergency call processing system's accuracy in classifying an emergency's severity from transcriptions of the caller's speech. Specifically, we first extend the baseline classifier with additional feature extractors for different data modalities. These features include detected emotions, time-based features, and the victim's personal information. Second, we experiment with a multi-task learning approach in which we attempt both to detect the nature of the emergency and to improve the severity classification score. Further improvements include the use of a larger dataset and an explainability study of the classifier's decision-making process. Our best model predicted the severity of 833 emergency calls with 71.27% accuracy, a 5.33% improvement over the baseline model. Moreover, we extended our tool with additional modules that may prove useful when handling emergency calls.
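
The multi-task setup described above is commonly realised with hard parameter sharing: a shared encoder feeds one head per task, and the training loss is a weighted sum of the per-task losses. The sketch below shows that forward pass and loss for a single example in pure Python. The bias-free layers, the ReLU shared layer and the auxiliary weight `nature_weight=0.5` are assumptions; the abstract does not specify the authors' architecture or loss weighting.

```python
import math

Matrix = list[list[float]]


def linear(weights: Matrix, x: list[float]) -> list[float]:
    """Dense layer without bias: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def relu(v: list[float]) -> list[float]:
    return [max(0.0, z) for z in v]


def cross_entropy(logits: list[float], target: int) -> float:
    """Numerically stable softmax cross-entropy for one example."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[target]


def multitask_loss(x: list[float], severity_target: int, nature_target: int,
                   shared_w: Matrix, severity_w: Matrix, nature_w: Matrix,
                   nature_weight: float = 0.5) -> float:
    # Hard parameter sharing: one shared representation feeds both task heads.
    h = relu(linear(shared_w, x))
    severity_logits = linear(severity_w, h)  # main task: emergency severity
    nature_logits = linear(nature_w, h)      # auxiliary task: nature of the emergency
    return (cross_entropy(severity_logits, severity_target)
            + nature_weight * cross_entropy(nature_logits, nature_target))
```

With all-zero head weights every class is equally likely, so the loss reduces to log(number of severity classes) plus `nature_weight` times log(number of nature classes), which is a handy sanity check.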

Array, vol. 21, Article 100333.

Citations: 0

APIE: An information extraction module designed based on the pipeline method

Q2 COMPUTER SCIENCE, THEORY & METHODS

Array

Pub Date : 2023-12-01 DOI: 10.1016/j.array.2023.100331

Xu Jiang, Yurong Cheng, Siyi Zhang, Juan Wang, Baoquan Ma

Information extraction (IE) aims to discover and extract valuable information from unstructured text. The problem can be decomposed into two subtasks: named entity recognition (NER) and relation extraction (RE). Although IE has been studied for years, most work has focused on jointly modeling these two subtasks, either by casting them into a structured prediction framework or by performing multitask learning through shared representations. However, because the contextual representations of entity and relation models inherently capture different feature information, sharing a single encoder to capture the information required by both subtasks in the same space would harm the accuracy of the model. Recent research (Zhong and Chen, 2020) has also shown that using two separate encoders for the NER and RE tasks in a pipeline is effective, with the resulting model surpassing all previous joint models in accuracy. In this paper, we therefore design APIE (A Pipeline method Information Extraction module), which combines the advantages of pipeline and joint methods and demonstrates higher accuracy and strong reasoning abilities. Specifically, we design a multi-level feature NER model based on an attention mechanism and a document-level RE model based on local context pooling. To demonstrate the effectiveness of the proposed approach, we conducted tests on multiple datasets. Extensive experimental results show that our model outperforms state-of-the-art methods in both accuracy and reasoning ability.
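
The abstract names local context pooling for document-level RE without giving its form. One common formulation, localized context pooling from ATLOP (Zhou et al., 2021), weights each token by how strongly both the subject and the object entity attend to it, then pools the token embeddings with those weights. The pure-Python sketch below follows that formulation; whether APIE uses exactly this variant is an assumption, and the function name is hypothetical.

```python
Vector = list[float]


def localized_context_pooling(token_emb: list[Vector],
                              subj_attn: list[Vector],
                              obj_attn: list[Vector]) -> Vector:
    """Pool token embeddings by the attention both entities place on each token.

    token_emb: l token vectors of dimension d.
    subj_attn, obj_attn: per-head attention over the l tokens (H rows of length l)
    for the subject and the object entity respectively.
    """
    heads, length = len(subj_attn), len(token_emb)
    # Tokens attended by BOTH entities get high weight: product per head, summed over heads.
    q = [sum(subj_attn[h][t] * obj_attn[h][t] for h in range(heads)) for t in range(length)]
    total = sum(q)
    if total == 0:
        raise ValueError("the two entities share no attended tokens")
    weights = [v / total for v in q]  # normalise to a distribution over tokens
    # Context vector: attention-weighted sum of the token embeddings.
    dim = len(token_emb[0])
    return [sum(weights[t] * token_emb[t][k] for t in range(length)) for k in range(dim)]
```

In ATLOP the resulting context vector is combined with each entity's own embedding before relation classification; that step is omitted here.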

Array, vol. 21, Article 100331.

Citations: 0

A comprehensive analysis of feature ranking-based fish disease recognition

Q2 COMPUTER SCIENCE, THEORY & METHODS

Array

Pub Date : 2023-12-01 DOI: 10.1016/j.array.2023.100329

Aditya Rajbongshi, Rashiduzzaman Shakil, Bonna Akter, Munira Akter Lata, Md. Mahbubul Alam Joarder

In recent years, emerging computer vision systems have made significant advances in automated disease diagnosis. This article proposes an optimal approach for detecting disease in Rohu fish. The aim of our research is to identify the most significant features using Analysis of Variance (ANOVA) feature selection and to evaluate which feature subset performs best for Rohu fish disease recognition. First, various image preprocessing techniques were applied to the acquired images. The disease-affected region was then segmented using the K-means clustering algorithm. Next, 10 distinct statistical and Gray-Level Co-occurrence Matrix (GLCM) features were extracted from the segmented images. The ANOVA feature selection technique was used to select the N most significant features (where $5 \leq N \leq 10$) from the pool of 10. The Synthetic Minority Oversampling Technique (SMOTE) was applied to address class imbalance. Nine different machine learning (ML) classifiers were trained and tested, and the performance of each was evaluated using eight performance metrics; a receiver operating characteristic (ROC) curve was also generated. The classifier using the Enable Hist Gradient Boosting algorithm with the top 9 features outperformed the other eight models, achieving the highest accuracy of 88.81%. In conclusion, we have demonstrated that the feature selection process reduces the computational cost.
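
A minimal sketch of ANOVA-based feature ranking, assuming the statistical and GLCM features have already been extracted into a numeric matrix with one row per image. For each feature it computes the one-way ANOVA F statistic, F = [SS_between / (k - 1)] / [SS_within / (n - k)] for k classes and n samples (the statistic scikit-learn's `f_classif` reports), and keeps the top N. Segmentation, SMOTE and the classifiers are not shown; `anova_f` and `top_n_features` are hypothetical names.

```python
from statistics import mean


def anova_f(values: list[float], labels: list[int]) -> float:
    """One-way ANOVA F statistic of a single feature across the class labels."""
    groups: dict[int, list[float]] = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    k, n = len(groups), len(values)
    grand = mean(values)
    means = {y: mean(g) for y, g in groups.items()}
    # Between-class and within-class sums of squares.
    ss_between = sum(len(g) * (means[y] - grand) ** 2 for y, g in groups.items())
    ss_within = sum((v - means[y]) ** 2 for y, g in groups.items() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def top_n_features(X: list[list[float]], labels: list[int], n: int) -> list[int]:
    """Indices of the n features with the largest F statistic, best first."""
    scores = [anova_f([row[j] for row in X], labels) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:n]
```

A large F means the class means differ strongly relative to the spread within each class, so the feature separates healthy from diseased samples well.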

Array, vol. 21, Article 100329.

Citations: 0

Towards efficient multi-granular anomaly detection in distributed systems

Q2 COMPUTER SCIENCE, THEORY & METHODS

Array

Pub Date : 2023-11-30 DOI: 10.1016/j.array.2023.100330

Chao Tu, Ming Chen, Liwen Zhang, Long Zhao, Di Wu, Ziyang Yue

Distributed systems often consist of a large number of computing and data nodes, which makes detecting anomalies in them efficiently and accurately both important and challenging. In general, we need to determine not only whether an anomaly has occurred at a certain time (a time-level anomaly), but also whether anomalies occur in a node (a node-level anomaly) and which key performance indicators (KPIs) are anomalous (a KPI-level anomaly); that is, we need to perform multi-granular anomaly detection in distributed systems. However, most existing algorithms focus only on time-level anomalies in centralized systems. For distributed systems, a straightforward approach is to train a model for each node and detect anomalies independently, but the cost of model inference is then unacceptable in practice. We therefore propose a Multi-Granular Anomaly Detection (MGAD) framework that uses a tree structure to perform anomaly detection hierarchically, from the node level down to the time and KPI levels, which greatly reduces the cost of model inference. Specifically, at the time level, we propose a novel model named Masked Sliding Spatial-Temporal Adversarial Network (MS2TAN) that considers spatial and temporal dependencies simultaneously. Extensive experiments on real-world data show that MGAD is at least $5\times$ faster at inference than the baselines.
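
The hierarchical idea can be sketched as a drill-down in which finer-grained detectors run only beneath anomalous parents: node level first, then time level, then KPI level. In the Python sketch below a fixed z-score threshold stands in for every detector, and KPI series are assumed to be already standardised; MS2TAN itself is not implemented, and all names are hypothetical. In this toy the node check scans the same values as the time check, so the saving it shows (counted in `time_model_calls`) only matters when the time-level model is genuinely more expensive, as a learned model would be.

```python
THRESHOLD = 3.0  # assumption: every KPI series is already standardised (z-scores)

KPIs = dict[str, list[float]]  # KPI name -> values over the current time window


def node_level(kpis: KPIs) -> bool:
    """Coarsest check: does anything on this node leave the normal band?"""
    return max(abs(v) for series in kpis.values() for v in series) > THRESHOLD


def time_level(kpis: KPIs) -> list[int]:
    """Stand-in for an expensive per-node time-level model: anomalous timestamps."""
    length = len(next(iter(kpis.values())))
    return [t for t in range(length) if any(abs(s[t]) > THRESHOLD for s in kpis.values())]


def kpi_level(kpis: KPIs, t: int) -> list[str]:
    """Finest level: which KPIs are anomalous at timestamp t."""
    return sorted(name for name, s in kpis.items() if abs(s[t]) > THRESHOLD)


def detect(system: dict[str, KPIs]) -> tuple[dict[str, dict[int, list[str]]], int]:
    """Drill down node -> time -> KPI, running finer levels only under anomalous parents."""
    report: dict[str, dict[int, list[str]]] = {}
    time_model_calls = 0
    for node, kpis in system.items():
        if not node_level(kpis):
            continue  # healthy node: skip the costly time-level model entirely
        time_model_calls += 1
        report[node] = {t: kpi_level(kpis, t) for t in time_level(kpis)}
    return report, time_model_calls
```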

Array, vol. 21, Article 100330.

Citations: 0

Special issue “Towards a higher education of the future: Transformational roles of edge intelligence”

Q2 COMPUTER SCIENCE, THEORY & METHODS

Array

Pub Date : 2023-11-30 DOI: 10.1016/j.array.2023.100332

Ruchi Doshi, Yu-Chen Hu, Lalit Garg, Temitayo Fagbola

Citations: 0

Addressing agricultural challenges: An identification of best feature selection technique for dragon fruit disease recognition

Q2 COMPUTER SCIENCE, THEORY & METHODS

Array

Pub Date : 2023-11-02 DOI: 10.1016/j.array.2023.100326

Rashiduzzaman Shakil, Shawn Islam, Yeasir Arafat Shohan, Anonto Mia, Aditya Rajbongshi, Md Habibur Rahman, Bonna Akter

Dragon fruit is a prominent crop in global agriculture. It is gaining popularity because of its many health benefits and is a viable crop in resource-poor, environmentally degraded areas. Nevertheless, disease has affected many dragon fruit plantations and reduced their yield, while disease detection still relies on conventional methods. Farmers' lack of expertise in disease identification and management has diminished crop quality and output, yet little research has been carried out to support these farmers. This research proposes an autonomous agricultural system that recognizes dragon fruit diseases, built on an in-depth analysis of feature selection techniques. Real-time images of dragon fruit are collected and preprocessed using various image-processing techniques, and two important features are extracted after segmentation. Analysis of variance (ANOVA) and the least absolute shrinkage and selection operator (LASSO) are used as feature selection techniques to rank the features by their mutual score. Six distinct machine learning classifiers were then applied to the top-ranked feature sets, and their performance was measured with seven distinct evaluation metrics. Comparing classifiers trained on the ANOVA and LASSO feature sets, AdaBoost and Random Forest on the LASSO-ranked features achieved the highest accuracy, 96.29%. In addition, we optimized the computational resources of each classifier for the LASSO feature set.
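As a rough illustration of the ANOVA half of the feature-ranking step, the sketch below scores each feature with the one-way ANOVA F statistic (between-class variance divided by within-class variance) and sorts features from most to least discriminative. The feature names and values are hypothetical, since the abstract does not name its two features; the LASSO ranking and the six classifiers are not shown, and this is not the authors' code.

```python
from statistics import mean
from typing import Dict, List, Sequence


def anova_f(values: Sequence[float], labels: Sequence[str]) -> float:
    """One-way ANOVA F statistic of a single feature across class labels (needs >= 2 classes, n > classes)."""
    groups: Dict[str, List[float]] = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    k, n = len(groups), len(values)
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        # Zero spread inside every class: perfectly separated if the class means differ.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(features: Dict[str, Sequence[float]], labels: Sequence[str]) -> List[str]:
    """Feature names ordered from most to least discriminative (highest F first)."""
    return sorted(features, key=lambda name: anova_f(features[name], labels), reverse=True)


# Hypothetical per-image features for three healthy and three diseased fruits.
labels = ["healthy"] * 3 + ["diseased"] * 3
features = {
    "texture_contrast": [5.0, 7.0, 6.0, 6.5, 5.5, 6.0],  # identical class means: uninformative
    "hue_mean": [0.30, 0.32, 0.31, 0.60, 0.62, 0.61],     # classes clearly apart
}
print(rank_features(features, labels))  # ['hue_mean', 'texture_contrast']
```

The top-ranked names would then form the reduced feature set handed to the classifiers.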


Rashiduzzaman Shakil, Shawn Islam, Yeasir Arafat Shohan, Anonto Mia, Aditya Rajbongshi, Md Habibur Rahman, Bonna Akter. "Addressing agricultural challenges: An identification of best feature selection technique for dragon fruit disease recognition." Array, vol. 20, Article 100326, published 2023-11-02. DOI: https://doi.org/10.1016/j.array.2023.100326

Citations: 0

On International Chinese Education Index Ranking in a Global Perspective

Q2 COMPUTER SCIENCE, THEORY & METHODS

Array

Pub Date : 2023-11-02 DOI: 10.1016/j.array.2023.100328

Hui Chen, Zhengze Li, Xue Wang

The prominence of the Chinese language as a United Nations official language has sparked significant interest, leading to this research on international Chinese education (ICE). This study has three aims: first, to create indicators for monitoring ICE; second, to use these indicators to assess ICE development across nations; and third, to highlight disparities and potential influencing factors to inform policy-making.

To facilitate indicator creation, we introduce an ICE index ranking system that evaluates 24 aspects grouped into three dimensions: Localization, Specialization, and Collaboration. Within these dimensions, the 24 aspects are further grouped into seven level-2 indicators, providing insights into global Chinese language education. Informed by a thorough literature review and by data availability, we use these indicators to rank ICE in 153 countries.

For evaluation, we objectively assess indicators by assigning weights based on expert opinions. The results demonstrate that the categorized and ranked indicators offer valuable insights into global ICE development. Cluster analysis reveals diverse patterns of ICE development, with different nations needing improvement in different areas.
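The weighting step can be pictured as a weighted composite index. The sketch below min-max normalizes each indicator across countries, combines the results with expert-assigned weights, and sorts countries by score. The normalization scheme, the weights, the country labels, and the raw values are all hypothetical, and for brevity the sketch collapses the 24 aspects and seven level-2 indicators into the three top-level dimensions; the paper's exact aggregation may differ.

```python
from typing import Dict, List, Tuple


def min_max(column: List[float]) -> List[float]:
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)
    return [(x - lo) / (hi - lo) for x in column]


def rank_countries(data: Dict[str, Dict[str, float]],
                   weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Weighted sum of min-max normalized indicators, sorted from highest to lowest score."""
    countries = list(data)
    total_w = sum(weights.values())
    scores = {c: 0.0 for c in countries}
    for indicator, w in weights.items():
        normalized = min_max([data[c][indicator] for c in countries])
        for c, x in zip(countries, normalized):
            scores[c] += (w / total_w) * x  # weights rescaled so they sum to 1
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical raw values for three countries and hypothetical expert weights.
data = {
    "A": {"Localization": 80, "Specialization": 10, "Collaboration": 5},
    "B": {"Localization": 40, "Specialization": 30, "Collaboration": 15},
    "C": {"Localization": 20, "Specialization": 20, "Collaboration": 25},
}
weights = {"Localization": 0.5, "Specialization": 0.3, "Collaboration": 0.2}
ranking = rank_countries(data, weights)
print([country for country, _ in ranking])  # ['B', 'A', 'C']
```

Country A leads on the most heavily weighted dimension, yet B ranks first because it scores reasonably on all three; this is the kind of trade-off that the choice of weights encodes.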

To illustrate further, we conduct a correlation analysis using an external dataset encompassing five main components: Economic Ties, Geographical Distance, Cultural Ties, Government Policies, and China's Image. The findings indicate that countries with strong economic ties to China tend to excel in all three ICE dimensions. Additionally, nations with higher numbers of tourists visiting China generally achieve higher ICE scores.
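A correlation analysis of this kind can be illustrated with Pearson's coefficient between one external factor and the ICE score. The abstract does not say which correlation measure was used, so Pearson is an assumption here, and the trade and score figures below are invented for illustration.

```python
from math import sqrt
from statistics import mean
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples of size >= 2")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sx * sy)


# Hypothetical data: an economic-ties factor (e.g. trade volume with China) and ICE scores for five countries.
trade_volume = [10, 25, 40, 60, 80]
ice_score = [0.20, 0.35, 0.30, 0.55, 0.70]
r = pearson(trade_volume, ice_score)
print(round(r, 2))  # 0.96
```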


Hui Chen, Zhengze Li, Xue Wang. "On International Chinese Education Index Ranking in a Global Perspective." Array, vol. 20, Article 100328, published 2023-11-02. DOI: https://doi.org/10.1016/j.array.2023.100328

Citations: 0
