Pub Date: 2024-02-17 DOI: 10.1016/j.array.2024.100337
Achraf El Bouazzaoui, Abdelkader Hadjoudja, Omar Mouhib, Nazha Cherkaoui
The relentless increase in data volume and complexity calls for more adaptable machine learning methodologies. In response to this challenge, we present a novel architecture enabling dynamic classifier selection on FPGA platforms. The architecture combines hardware accelerators for three distinct classifiers (Support Vector Machines, K-Nearest Neighbors, and Deep Neural Networks) without requiring the combined area footprint of those implementations. It further introduces a hardware-based Accelerator Selector that dynamically selects the most fitting classifier for incoming data based on the K-Nearest Centroid approach. When tested on four different datasets, our architecture demonstrated improved classification performance, with an accuracy enhancement of up to 8% compared to the software implementations. Besides this enhanced accuracy, it reduced resource usage by up to 45% compared to a static implementation, making it highly efficient in terms of resource utilization and energy consumption on FPGA platforms and paving the way for scalable ML applications. To the best of our knowledge, this work is the first to harness FPGA platforms for dynamic classifier selection.
Title: FPGA-based ML adaptive accelerator: A partial reconfiguration approach for optimized ML accelerator utilization. Array, vol. 21, Article 100337.
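The selection step described in the abstract above can be illustrated in software. The abstract does not detail how the hardware Accelerator Selector is built, so the following is only a minimal sketch of a K-Nearest Centroid vote under stated assumptions: the function name `nearest_centroid_selector`, the `(vector, classifier_name)` centroid layout, the tie-breaking rule, and the sample centroids are all hypothetical and are not taken from the paper.

```python
import math


def nearest_centroid_selector(sample, centroids, k=1):
    """Return the classifier name voted for by the k centroids nearest to sample.

    centroids is a list of (vector, classifier_name) pairs. This layout is an
    assumption made for illustration, not the paper's hardware design.
    """
    # Rank centroids by Euclidean distance to the incoming sample.
    ranked = sorted(centroids, key=lambda c: math.dist(sample, c[0]))
    votes = {}
    for _, name in ranked[:k]:
        votes[name] = votes.get(name, 0) + 1
    # Majority vote; a tie goes to the tied classifier with the nearest centroid.
    best = max(votes.values())
    for _, name in ranked[:k]:
        if votes[name] == best:
            return name


# Made-up centroids: each one marks a region of feature space where the named
# classifier is assumed to perform best.
CENTROIDS = [
    ((0.0, 0.0), "svm"),
    ((0.5, 0.0), "svm"),
    ((1.0, 1.0), "knn"),
    ((5.0, 5.0), "dnn"),
]
```

Calling `nearest_centroid_selector((4.8, 5.1), CENTROIDS)` picks the classifier attached to the closest centroid. In the architecture described above, the chosen name would decide which accelerator is loaded, which is why the three accelerators need not occupy the fabric at the same time.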
Pub Date: 2024-01-04 DOI: 10.1016/j.array.2024.100335
Maneerut Chatrangsan, Chatpong Tangmanee
Text-based CAPTCHA is widely used as an online security guard, requiring a user to input letters in order to distinguish humans from automated software (known as bots). However, such CAPTCHAs still pose problems for usability and robustness. This study investigated the effects of letter spacing, disturbing line orientation, and disturbing line color on the user test and robustness of text-based CAPTCHA. The 240 CAPTCHAs were tested with Thai undergraduate students. The results show no significant differences in user tests for the three factors. For robustness, disturbing line orientation made no significant difference. However, CAPTCHAs with overlapping letters were significantly the most robust. CAPTCHAs with a disturbing line in the same color as the background were significantly more robust than those with a line in the same color as the foreground. Moreover, when letters with no spacing are used, the effect of disturbing line color on robustness is statistically significant, whereas the effect becomes insignificant when spaced letters or overlapping letters are used. We recommend CAPTCHAs with no letter spacing combined with a disturbing line in the same color as the background, as this design suits both users and robustness. It can be concluded that the letter segmenting technique is not too hard for users (88 % passed) while it is not too easy for bot attacks (39 % passed). In terms of security, further studies on CAPTCHA could enable more robustness against new crime technologies. In terms of usability, other age groups could be considered.
Title: Robustness and user test on text-based CAPTCHA: Letter segmenting is not too easy or too hard. Array, vol. 21, Article 100335.
Pub Date: 2023-12-27 DOI: 10.1016/j.array.2023.100334
Stuart Gallina Ottersen, Flávio Pinheiro, Fernando Bação
Knowledge Graphs are a tool to structure (entity, relation, entity) triples. One way to construct these knowledge graphs is by extracting triples from unstructured text. The aim is to maximise the number of useful triples while minimising the number of triples that contain no or useless information. Most previous work in this field uses supervised learning techniques, which can be expensive both computationally and because they require labelled data. Existing unsupervised methods, meanwhile, often produce an excessive number of low-value triples, rely on empirical rules when extracting triples, or struggle with the order of the entities relative to the relation. To address these issues, this paper proposes a new model, Unsupervised Dependency parsing Aided Semantic Triple Extraction (UDASTE), which leverages sentence structure and allows restrictive triple relation types to be defined, generating high-quality triples while removing the need to map extracted triples to relation schemas. This is done by leveraging pre-trained language models. UDASTE is compared with two baseline models on three datasets and outperforms the baselines on all three. Its limitations and possible further work are discussed, in addition to the implementation of the model in a computational intelligence context.
Title: Triplet extraction leveraging sentence transformers and dependency parsing. Array, vol. 21, Article 100334.
Pub Date: 2023-12-19 DOI: 10.1016/j.array.2023.100333
Marianne Abi Kanaan, Jean-François Couchot, Christophe Guyeux, David Laiymani, Talar Atechian, Rony Darazi
In emergency call centers, operators are required to analyze and prioritize emergency situations prior to any intervention. This allows the team to deploy resources efficiently if needed, and thereby provide the optimal assistance to the victims. The automation of such an analysis remains challenging, given the unpredictable nature of the calls. Therefore, in this study, we describe our attempt to improve the accuracy with which an emergency call processing system classifies the severity of an emergency, based on transcriptions of the caller’s speech. Specifically, we first extend the baseline classifier to include additional feature extractors for different modalities of data. These features include detected emotions, time-based features, and the victim’s personal information. Second, we experiment with a multi-task learning approach, in which we attempt to detect the nature of the emergency on the one hand, and improve the severity classification score on the other hand. Additional improvements include the use of a larger dataset and an explainability study of the classifier’s decision-making process. Our best model was able to predict the severity of 833 emergency calls with 71.27% accuracy, a 5.33% improvement over the baseline model. Moreover, we extended our tool with additional modules that can prove useful when handling emergency calls.
Title: Combining a multi-feature neural network with multi-task learning for emergency calls severity prediction. Array, vol. 21, Article 100333.
Pub Date: 2023-12-01 DOI: 10.1016/j.array.2023.100331
Xu Jiang, Yurong Cheng, Siyi Zhang, Juan Wang, Baoquan Ma
Information extraction (IE) aims to discover and extract valuable information from unstructured text. This problem can be decomposed into two subtasks: named entity recognition (NER) and relation extraction (RE). Although the IE problem has been studied for years, most efforts have focused on jointly modeling these two subtasks, either by casting them into a structured prediction framework or by performing multitask learning through shared representations. However, since the contextual representations of entity and relation models inherently capture different feature information, sharing a single encoder to capture the information required by both subtasks in the same space would harm the accuracy of the model. Recent research (Zhong and Chen, 2020) has also shown that using two separate encoders for the NER and RE tasks in a pipeline method is effective, with the model surpassing all previous joint models in accuracy. Thus, in this paper, we design an information extraction module based on the pipeline method, called APIE (An Pipeline method Information Extraction module). APIE combines the advantages of both pipeline methods and joint methods, demonstrating higher accuracy and powerful reasoning abilities. Specifically, we design a multi-level feature NER model based on an attention mechanism and a document-level RE model based on local context pooling. To demonstrate the effectiveness of our proposed approach, we conducted tests on multiple datasets. Extensive experimental results show that our proposed model outperforms state-of-the-art methods and improves both accuracy and reasoning abilities.
Title: APIE: An information extraction module designed based on the pipeline method. Array, vol. 21, Article 100331.
Pub Date: 2023-12-01 DOI: 10.1016/j.array.2023.100329
Aditya Rajbongshi, Rashiduzzaman Shakil, Bonna Akter, Munira Akter Lata, Md. Mahbubul Alam Joarder
In recent years, emerging computer vision systems have brought significant advances in automated disease diagnosis. This article proposes an optimal approach for detecting the presence of ailments in Rohu fish. The aim of our research is to identify the most significant features using Analysis of Variance (ANOVA) feature selection and to evaluate the best performance among all features for Rohu fish disease recognition. First, diverse image preprocessing techniques were applied to the acquired images. The region affected by the disease was segmented using the K-means clustering algorithm. Subsequently, 10 distinct statistical and Gray-Level Co-occurrence Matrix (GLCM) features were extracted after the image segmentation. The ANOVA feature selection technique was employed to prioritize the N most significant features (where 5 ≤ N ≤ 10) from the pool of 10 categories. The Synthetic Minority Oversampling Technique (SMOTE) was applied to address the class imbalance problem. After training and testing nine different machine learning (ML) classifiers, the performance of each classifier was evaluated using eight performance metrics. Additionally, a receiver operating characteristic (ROC) curve was generated. The classifier that used the Enable Hist Gradient Boosting algorithm with the top 9 features outperformed the other eight models, achieving the highest accuracy of 88.81%. In conclusion, we have demonstrated that the feature selection process reduces the computational cost.
Title: A comprehensive analysis of feature ranking-based fish disease recognition. Array, vol. 21, Article 100329.
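The ANOVA-based ranking mentioned in the abstract above can be sketched with the one-way ANOVA F statistic, which compares between-class variance to within-class variance for each feature. This is a minimal illustration, not the authors' pipeline: the helper names `anova_f_score` and `top_n_features`, the dictionary layout, and the toy feature values are assumptions introduced here.

```python
from statistics import mean


def anova_f_score(values, labels):
    """One-way ANOVA F statistic of a single feature across class labels.

    Assumes at least two classes and more samples than classes.
    """
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    grand_mean = mean(values)
    n, k = len(values), len(groups)
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def top_n_features(features, labels, n):
    """Return the n feature names with the highest F score.

    features maps a feature name to its list of values (hypothetical layout).
    """
    ranked = sorted(
        features,
        key=lambda name: anova_f_score(features[name], labels),
        reverse=True,
    )
    return ranked[:n]


# Toy data, made up for illustration: two classes, three candidate features.
LABELS = [0, 0, 0, 1, 1, 1]
FEATURES = {
    "contrast": [1.0, 1.1, 0.9, 3.0, 3.1, 2.9],
    "energy": [1.0, 2.0, 3.0, 1.5, 2.5, 3.5],
    "noise": [2.0, 1.0, 2.0, 1.0, 2.0, 1.0],
}
```

Here `top_n_features(FEATURES, LABELS, 2)` keeps the two features whose class means differ most relative to their spread. Training on fewer, higher-ranked features is the mechanism by which feature selection can lower computational cost.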
Pub Date: 2023-11-30 DOI: 10.1016/j.array.2023.100330
Chao Tu, Ming Chen, Liwen Zhang, Long Zhao, Di Wu, Ziyang Yue
Distributed systems often consist of a large number of computing and data nodes, which makes it both important and challenging to detect anomalies in them efficiently and accurately. Generally, we need to determine not only whether an anomaly has occurred at a certain time (the time level anomaly), but also whether anomalies occur in a node (the node level anomaly) and which key performance indicators (KPIs) are anomalous (the KPI level anomaly); that is, we need to perform multi-granular anomaly detection in distributed systems. However, most existing algorithms focus only on time level anomalies in centralized systems. For distributed systems, a simple approach is to train a model for each node and then detect anomalies independently. An obvious disadvantage is that the cost of model inference is unacceptable in practice. Therefore, we propose a Multi-Granular Anomaly Detection (MGAD) framework that uses a tree structure to perform anomaly detection hierarchically from the node level to the time and KPI levels, which greatly reduces the cost of model inference. Specifically, at the time level, we propose a novel model named Masked Sliding Spatial-Temporal Adversarial Network (MS2TAN) that considers spatial and temporal dependencies simultaneously. Extensive experiments with real-world data offer insights into the performance of the proposals, showing that MGAD is at least 5× faster at inference than the baselines.
Title: Towards efficient multi-granular anomaly detection in distributed systems. Array, vol. 21, Article 100330.
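The hierarchical idea in the abstract above, checking the node level first and descending to the time and KPI levels only for suspicious nodes, can be sketched as follows. This is not the MGAD framework or the MS2TAN model: plain threshold checks stand in for the learned detectors, and the function name `detect`, the data layout, and the sample scores are hypothetical.

```python
def detect(nodes, node_threshold, kpi_threshold):
    """Hierarchical anomaly check: node level first, then time and KPI levels.

    nodes maps node name -> KPI name -> list of anomaly scores per time step.
    Threshold checks are an assumption standing in for learned models.
    """
    report = {}
    for node, kpis in nodes.items():
        # Node level: a single check decides whether to descend further.
        node_score = max(max(series) for series in kpis.values())
        if node_score < node_threshold:
            continue  # node looks healthy, so the finer levels are skipped
        # Time level: time steps at which any KPI crosses the threshold.
        steps = len(next(iter(kpis.values())))
        bad_times = [
            t for t in range(steps)
            if any(series[t] >= kpi_threshold for series in kpis.values())
        ]
        # KPI level: indicators that are anomalous at those time steps.
        bad_kpis = sorted(
            name for name, series in kpis.items()
            if any(series[t] >= kpi_threshold for t in bad_times)
        )
        report[node] = {"times": bad_times, "kpis": bad_kpis}
    return report


# Made-up scores for two nodes, two KPIs, and three time steps.
NODES = {
    "node-a": {"cpu": [0.1, 0.2, 0.1], "mem": [0.1, 0.1, 0.2]},
    "node-b": {"cpu": [0.1, 0.9, 0.2], "mem": [0.2, 0.1, 0.1]},
}
```

With these inputs, `detect(NODES, 0.5, 0.5)` reports only `node-b`. The point of the tree is that nodes passing the first check never reach the finer-grained stages, which is where the inference savings would come from when those stages are expensive models.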
Dragon fruit is a prominent crop in global agriculture. It is gaining popularity and is a viable option in resource-poor, environmentally degraded areas because of its many health benefits. Nevertheless, many dragon fruit plantations have been affected by disease, which reduces their yield, and the detection system is still conventional. Farmers’ lack of expertise in disease identification and management has diminished crop quality and production. Little research has been carried out to assist these farmers, who require adequate agricultural support. This research proposes an autonomous agro-based system to recognize dragon fruit diseases using an in-depth analysis of feature selection techniques. After real-time images of dragon fruit are collected, they are preprocessed using various image-processing techniques. The two important features are retrieved after segmentation. The analysis of variance (ANOVA) and the least absolute shrinkage and selection operator (LASSO) are used as feature selection techniques to assess the feature rank based on the mutual score. To analyze the effectiveness of the machine learning algorithms, six distinct classifiers were applied to the top-ranked feature sets, and their performance was measured using seven distinct evaluation metrics. In a comparison of classifiers on the ANOVA and LASSO feature sets, the AdaBoost and Random Forest classifiers with the LASSO feature ranking approach achieved the maximum accuracy of 96.29%. Furthermore, we have optimized the computational resources of each classifier for the LASSO feature set.
Pub Date: 2023-11-02 DOI: 10.1016/j.array.2023.100326
Rashiduzzaman Shakil, Shawn Islam, Yeasir Arafat Shohan, Anonto Mia, Aditya Rajbongshi, Md Habibur Rahman, Bonna Akter
Title: Addressing agricultural challenges: An identification of best feature selection technique for dragon fruit disease recognition. Array, vol. 20, Article 100326.
Pub Date : 2023-11-02 DOI: 10.1016/j.array.2023.100328
Hui Chen, Zhengze Li, Xue Wang
The prominence of the Chinese language as a United Nations official language has sparked significant interest, leading to this research on international Chinese education (ICE). This study has a triple aim: firstly, to create indicators for monitoring ICE; secondly, to use these indicators to assess ICE development across nations; and thirdly, to highlight disparities and potential influencing factors for informed policy-making.
To facilitate indicator creation, we introduce an ICE index ranking system, evaluating 24 aspects grouped into three dimensions: Localization, Specialization, and Collaboration. These dimensions further categorize the 24 aspects into seven level-2 indicators, providing insights into global Chinese language education. After a thorough literature review and considering data availability, these indicators rank ICE in 153 countries.
For evaluation, we objectively assess indicators by assigning weights based on expert opinions. The results demonstrate that the categorized and ranked indicators offer valuable insights into global ICE development. Cluster analysis reveals diverse patterns of ICE development, with distinct areas requiring improvement across nations.
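One common way to turn weighted indicators into a country ranking is a weighted sum of normalized indicator values. The sketch below assumes min-max normalization and a linear aggregation, neither of which is stated in the abstract; the country names, indicator values, and weights are hypothetical and only the three dimension names come from the text.

```python
def min_max(values):
    """Rescale raw values to [0, 1]; a constant column maps to 0."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


def composite_index(raw, weights):
    """Weighted sum of min-max normalized indicators, one score per country."""
    countries = list(raw)
    total_w = sum(weights.values())  # weights are renormalized to sum to 1
    scores = dict.fromkeys(countries, 0.0)
    for indicator, w in weights.items():
        normalized = min_max([raw[c][indicator] for c in countries])
        for c, x in zip(countries, normalized):
            scores[c] += (w / total_w) * x
    return scores


# Hypothetical indicator values and expert weights, for illustration only.
raw = {
    "Country A": {"localization": 40, "specialization": 10, "collaboration": 5},
    "Country B": {"localization": 20, "specialization": 30, "collaboration": 15},
    "Country C": {"localization": 10, "specialization": 20, "collaboration": 10},
}
weights = {"localization": 0.5, "specialization": 0.3, "collaboration": 0.2}

scores = composite_index(raw, weights)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Normalizing before weighting keeps an indicator with a large raw scale from dominating the index regardless of the weight the experts assigned to it.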
To illustrate further, we conduct a correlation analysis using an external dataset encompassing five main components: Economic Ties, Geographical Distance, Cultural Ties, Government Policies, and China's Image. The findings indicate that countries with strong economic ties to China tend to excel in all three ICE dimensions. Additionally, nations with higher numbers of tourists visiting China generally achieve higher ICE scores.
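The relationship between an external factor and the ICE score can be checked with a correlation coefficient. The abstract does not say which coefficient is used, so the sketch below assumes Pearson's r; the variable names and numbers are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Co-deviation of the two series and their individual squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: an economic-ties measure and an ICE score for five countries.
economic_ties = [1.0, 2.0, 3.0, 4.0, 5.0]
ice_score = [0.2, 0.3, 0.5, 0.6, 0.9]

r = pearson(economic_ties, ice_score)
print(round(r, 3))
```

A value of r close to 1 would be consistent with the reported tendency for countries with stronger economic ties to score higher, although correlation alone does not establish a causal link.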
{"title":"On International Chinese Education Index Ranking in a Global Perspective","authors":"Hui Chen, Zhengze Li, Xue Wang","doi":"10.1016/j.array.2023.100328","DOIUrl":"https://doi.org/10.1016/j.array.2023.100328","url":null,"PeriodicalId":8417,"journal":{"name":"Array","volume":"20","pages":"Article 100328"},"PeriodicalIF":0.0,"publicationDate":"2023-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S259000562300053X/pdfft?md5=8abf86d9cf6a88e981240ff29925d406&pid=1-s2.0-S259000562300053X-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138086810","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}