Pub Date: 2024-03-19 | DOI: 10.1016/j.iswa.2024.200362
Ahmed Al-Adaileh , Mousa Al-Kfairy , Mohammad Tubishat , Omar Alfandi
This research explores the user perceptions of the Metaverse Marketplace, analyzing a substantial dataset of over 860,000 Twitter posts through sentiment analysis and topic modeling techniques. The study aims to uncover the driving factors behind user engagement and sentiment in this novel digital trading space. Key findings highlight a predominantly positive user sentiment, with significant enthusiasm for the marketplace's revenue generation and entertainment potential, particularly within the gaming sector. Users express appreciation for the innovative opportunities the Metaverse Marketplace offers for artists, designers, and traders in handling and trading digital assets. This positive outlook is tempered by notable concerns regarding security and privacy within the Metaverse, pointing to a critical area for development and assurance. The study also reveals a substantial neutral sentiment, reflecting users’ cautious but interested stance, particularly regarding the marketplace's role in investment and passive income opportunities. This balanced view underscores the evolving nature of user perceptions in this emerging field. Theoretically, the research enriches the discourse on technology adoption, particularly in virtual environments, by highlighting perceived benefits and enjoyment as significant adoption drivers. These insights are invaluable for stakeholders in the Metaverse Marketplace, guiding the development of more secure, engaging, and user-friendly platforms. While providing a pioneering perspective on Metaverse user perceptions, the study acknowledges its limitation to Twitter data, suggesting the need for broader research methodologies for a more holistic understanding.
{"title":"A sentiment analysis approach for understanding users’ perception of metaverse marketplace","authors":"Ahmed Al-Adaileh , Mousa Al-Kfairy , Mohammad Tubishat , Omar Alfandi","doi":"10.1016/j.iswa.2024.200362","DOIUrl":"https://doi.org/10.1016/j.iswa.2024.200362","url":null,"abstract":"<div><p>This research explores the user perceptions of the Metaverse Marketplace, analyzing a substantial dataset of over 860,000 Twitter posts through sentiment analysis and topic modeling techniques. The study aims to uncover the driving factors behind user engagement and sentiment in this novel digital trading space. Key findings highlight a predominantly positive user sentiment, with significant enthusiasm for the marketplace's revenue generation and entertainment potential, particularly within the gaming sector. Users express appreciation for the innovative opportunities the Metaverse Marketplace offers for artists, designers, and traders in handling and trading digital assets. This positive outlook is tempered by notable concerns regarding security and privacy within the Metaverse, pointing to a critical area for development and assurance. The study also reveals a substantial neutral sentiment, reflecting users’ cautious but interested stance, particularly regarding the marketplace's role in investment and passive income opportunities. This balanced view underscores the evolving nature of user perceptions in this emerging field. Theoretically, the research enriches the discourse on technology adoption, particularly in virtual environments, by highlighting perceived benefits and enjoyment as significant adoption drivers. These insights are invaluable for stakeholders in the Metaverse Marketplace, guiding the development of more secure, engaging, and user-friendly platforms. While providing a pioneering perspective on Metaverse user perceptions, the study acknowledges its limitation to Twitter data, suggesting the need for broader research methodologies for a more holistic understanding.</p></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"22 ","pages":"Article 200362"},"PeriodicalIF":0.0,"publicationDate":"2024-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2667305324000383/pdfft?md5=408db24ecd15b5edd94a070515a178eb&pid=1-s2.0-S2667305324000383-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140179896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-03-19 | DOI: 10.1016/j.iswa.2024.200361
Gopi Krishna Erabati, Helder Araujo
Image-based backbone (feature extraction) networks downsample feature maps not only to increase the receptive field but also to detect objects of various scales efficiently. Existing feature extraction networks for LiDAR-based 3D object detection follow the same downsampling scheme as image-based networks to enlarge the receptive field. However, such downsampling of LiDAR feature maps in large-scale autonomous driving scenarios hinders the detection of small objects, such as pedestrians. To address this issue, we design an architecture that maintains both the scale of the feature maps and the receptive field in the feature extraction network, aiding the efficient detection of small objects. We rely on the attention mechanism to build a sufficient receptive field and propose a Deep and Light-weight Voxel Transformer (DeLiVoTr) network with voxel intra- and inter-region transformer modules that extract voxel local and global features, respectively. We introduce the DeLiVoTr block, which uses transformations with an expand-and-reduce strategy to vary the width and depth of the network efficiently. This makes it possible to learn wider and deeper voxel representations while using a smaller dimension for the attention mechanism and a light-weight feed-forward network, reducing parameters and operations. In addition to model scaling, we employ layer-level scaling of the DeLiVoTr encoder layers for efficient parameter allocation in each encoder layer, instead of the fixed number of parameters used in existing approaches. Leveraging layer-level depth and width scaling, we formulate three variants of the DeLiVoTr network. We conduct extensive experiments and analysis on the large-scale Waymo and KITTI datasets. Our network surpasses state-of-the-art methods for the detection of small objects (pedestrians) with an inference speed of 20.5 FPS.
{"title":"DeLiVoTr: Deep and light-weight voxel transformer for 3D object detection","authors":"Gopi Krishna Erabati, Helder Araujo","doi":"10.1016/j.iswa.2024.200361","DOIUrl":"10.1016/j.iswa.2024.200361","url":null,"abstract":"<div><p>The image-based backbone (feature extraction) networks downsample the feature maps not only to increase the receptive field but also to efficiently detect objects of various scales. The existing feature extraction networks in LiDAR-based 3D object detection tasks follow the feature map downsampling similar to image-based feature extraction networks to increase the receptive field. But, such downsampling of LiDAR feature maps in large-scale autonomous driving scenarios hinder the detection of small size objects, such as <em>pedestrians</em>. To solve this issue we design an architecture that not only maintains the same scale of the feature maps but also the receptive field in the feature extraction network to aid for efficient detection of small size objects. We resort to attention mechanism to build sufficient receptive field and we propose a <strong>De</strong>ep and <strong>Li</strong>ght-weight <strong>Vo</strong>xel <strong>Tr</strong>ansformer (DeLiVoTr) network with voxel intra- and inter-region transformer modules to extract voxel local and global features respectively. We introduce DeLiVoTr block that uses transformations with expand and reduce strategy to vary the width and depth of the network efficiently. This facilitates to learn wider and deeper voxel representations and enables to use not only smaller dimension for attention mechanism but also a light-weight feed-forward network, facilitating the reduction of parameters and operations. In addition to <em>model</em> scaling, we employ <em>layer-level</em> scaling of DeLiVoTr encoder layers for efficient parameter allocation in each encoder layer instead of fixed number of parameters as in existing approaches. Leveraging <em>layer-level depth</em> and <em>width</em> scaling we formulate three variants of DeLiVoTr network. We conduct extensive experiments and analysis on large-scale Waymo and KITTI datasets. Our network surpasses state-of-the-art methods for detection of small objects (<em>pedestrians</em>) with an inference speed of 20.5 FPS.</p></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"22 ","pages":"Article 200361"},"PeriodicalIF":0.0,"publicationDate":"2024-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2667305324000371/pdfft?md5=a6e557978ff347c6423116d4ba2f6a20&pid=1-s2.0-S2667305324000371-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140275011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Text summarization is the process of creating a summary that contains the important information from a text document. In recent years, significant progress has been made in text summarization research, along with the challenges that drive progress in the field at large. The growth of textual data has sparked great interest in text summarization research, which is thoroughly reviewed in this survey study. Improvements continue to be made with various approaches, such as abstractive and extractive summarization. The abstractive approach uses an intermediate representation of the input document to produce a summary whose wording may differ from the original text. The extractive approach extracts key sentences from the source document and combines them to form a summary. Despite the many methodologies and approaches proposed, the summaries produced still contain ambiguities that can be interpreted with different meanings, leading to errors in defining ambiguities, uncertainty in measuring summary quality, and difficulty in modeling linguistic context, representing semantic meaning, and specifying types of ambiguity. This survey offers a comprehensive exploration of text summarization research, covering challenges, classifications, approaches, preprocessing methods, features, techniques, and evaluation methods, to meet future research needs. The results provide an overview of the state of the art in ambiguity resolution for text summarization, including trends in research topics and the approaches or techniques used to address ambiguity problems.
{"title":"Review of ambiguity problem in text summarization using hybrid ACA and SLR","authors":"Sutriawan Sutriawan , Supriadi Rustad , Guruh Fajar Shidik , Pujiono Pujiono , Muljono Muljono","doi":"10.1016/j.iswa.2024.200360","DOIUrl":"10.1016/j.iswa.2024.200360","url":null,"abstract":"<div><p>Text summarization is the process of creating a text summary that contains important information from a text document. In recent years, significant progress has been made in the field of text summarization research, along with the challenges that drive research progress in the field at large. The development of textual data has sparked great interest in text summarization research, which is thoroughly reviewed in this survey study. Text summarization research improvements continue to be made to date with various approaches, such as abstractive and extractive. The abstractive approach uses an intermediate representation of the input document to produce a summary that may differ from the original text. The extractive approach means that key sentences are extracted from the source document and combined to form a summary. Despite the various methodologies and approaches recommended, the summaries produced still contain ambiguities that can be interpreted with different meanings, resulting in errors in defining ambiguities, uncertainty in measuring the quality of summaries, difficulty in modeling linguistic context, difficulty in representing semantic meanings, and difficulty in specifying types of ambiguities. This research survey offers a comprehensive exploration of text summarization research, covering challenges, classifications, approaches, preprocessing methods, features, techniques, and evaluation methods, meeting future research needs. The results provide an overview of the state of the art of recent research developments in the topic of ambiguity resolution in text summarization, such as trends in research topics and approaches or techniques used in addressing ambiguity problems in text summarization.</p></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"22 ","pages":"Article 200360"},"PeriodicalIF":0.0,"publicationDate":"2024-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S266730532400036X/pdfft?md5=3c2870d3b3f87a6ef8f6576559396413&pid=1-s2.0-S266730532400036X-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140269990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-03-16 | DOI: 10.1016/j.iswa.2024.200355
Laith Alzubaidi , Khamael AL–Dulaimi , Huda Abdul-Hussain Obeed , Ahmed Saihood , Mohammed A. Fadhel , Sabah Abdulazeez Jebur , Yubo Chen , A.S. Albahri , Jose Santamaría , Ashish Gupta , Yuantong Gu
Adversarial attacks pose a significant threat to deep learning (DL) models, particularly in medical imaging, as they can mislead models into making inaccurate predictions by introducing subtle distortions to the input data that are often imperceptible to humans. Although adversarial training is a common technique for mitigating these attacks on medical images, it lacks the flexibility to address new attack methods and to effectively improve feature representation. This paper introduces a novel Model Ensemble Feature Fusion (MEFF) approach designed to combat adversarial attacks in medical imaging applications. The proposed model performs feature fusion by combining features extracted from different DL models and then trains machine learning classifiers on the fused features. It uses concatenation to merge the extracted features, forming a more comprehensive representation and enhancing the model's ability to classify accurately. Our experimental study performs a comprehensive evaluation of MEFF, considering several challenging scenarios, including 2D and 3D images, greyscale and colour images, binary classification, and multi-label classification. The reported results demonstrate the robustness of MEFF against different types of adversarial attacks across six distinct medical imaging applications. A key advantage of MEFF is its capability to incorporate a wide range of adversarial attacks without the need to retrain from scratch, contributing to a more diverse and robust defence strategy. More importantly, by leveraging feature fusion and ensemble modelling, MEFF enhances the resilience of DL models in the face of adversarial attacks, paving the way for improved robustness and reliability in medical image analysis.
{"title":"MEFF – A model ensemble feature fusion approach for tackling adversarial attacks in medical imaging","authors":"Laith Alzubaidi , Khamael AL–Dulaimi , Huda Abdul-Hussain Obeed , Ahmed Saihood , Mohammed A. Fadhel , Sabah Abdulazeez Jebur , Yubo Chen , A.S. Albahri , Jose Santamaría , Ashish Gupta , Yuantong Gu","doi":"10.1016/j.iswa.2024.200355","DOIUrl":"https://doi.org/10.1016/j.iswa.2024.200355","url":null,"abstract":"<div><p>Adversarial attacks pose a significant threat to deep learning models, specifically medical images, as they can mislead models into making inaccurate predictions by introducing subtle distortions to the input data that are often imperceptible to humans. Although adversarial training is a common technique used to mitigate these attacks on medical images, it lacks the flexibility to address new attack methods and effectively improve feature representation. This paper introduces a novel Model Ensemble Feature Fusion (MEFF) designed to combat adversarial attacks in medical image applications. The proposed model employs feature fusion by combining features extracted from different DL models and then trains Machine Learning classifiers using the fused features. It uses a concatenation method to merge the extracted features, forming a more comprehensive representation and enhancing the model's ability to classify classes accurately. Our experimental study has performed a comprehensive evaluation of MEFF, considering several challenging scenarios, including 2D and 3D images, greyscale and colour images, binary classification, and multi-label classification. The reported results demonstrate the robustness of using MEFF against different types of adversarial attacks across six distinct medical image applications. A key advantage of MEFF is its capability to incorporate a wide range of adversarial attacks without the need to train from scratch. Therefore, it contributes to developing a more diverse and robust defence strategy. More importantly, by leveraging feature fusion and ensemble modelling, MEFF enhances the resilience of DL models in the face of adversarial attacks, paving the way for improved robustness and reliability in medical image analysis.</p></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"22 ","pages":"Article 200355"},"PeriodicalIF":0.0,"publicationDate":"2024-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2667305324000310/pdfft?md5=5fa2dc401268f3c29a24c198fa07f620&pid=1-s2.0-S2667305324000310-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140191734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-03-16 | DOI: 10.1016/j.iswa.2024.200359
Zhiqiang Feng
To share data from Internet of Things (IoT) devices more securely, accurately, and efficiently, this study designs a layered sharing architecture based on blockchain and federated learning. The architecture achieves efficient and secure IoT data sharing through client node clustering and a blockchain consensus process. In addition, to address the imbalanced distribution of data labels across system devices, a device-clustering federated learning algorithm based on label similarity is designed to improve the accuracy and stability of the model. The experimental results showed that under both independent and identically distributed (IID) and non-IID data, the proposed algorithm achieved 95 % accuracy after 30 iterations with a relatively low communication cost. When testing the algorithm's stability under non-IID data, accuracy increased with the number of label categories; with M = 12 label categories, accuracy reached 96.0 %. In the medical data sharing system of a hospital, the proposed system took about 42.9 % less time to extract information than the original system, while maintaining accuracy above 98 %. This method effectively addresses the uneven distribution of device data labels and improves the transmission efficiency and accuracy of IoT data sharing systems. Moreover, it reduces the impact of malicious nodes on the global model, providing technical support for data transmission and security protection in other fields.
{"title":"IoT data sharing technology based on blockchain and federated learning algorithms","authors":"Zhiqiang Feng","doi":"10.1016/j.iswa.2024.200359","DOIUrl":"https://doi.org/10.1016/j.iswa.2024.200359","url":null,"abstract":"<div><p>To share data on Internet of Things devices more securely, accurately, and efficiently, this study designs a layered sharing architecture based on blockchain and federated learning. This architecture achieves efficient and secure Internet of Things data sharing through client node clustering and blockchain consensus processes. In addition, to address the issue of imbalanced distribution of data labels in system devices, a device clustering federated learning algorithm based on label similarity is designed to improve the accuracy and stability of the model. The experimental results showed that under independent synchronous data distribution and non independent synchronous data distribution, the research algorithm achieved a 95 % accuracy after 30 iterations, and the communication cost was relatively low. When testing algorithm stability under non independent synchronous data distribution, the more label categories there are, the higher the accuracy. When the label category <em>M</em> = 12, the accuracy could reach 96.0 %. In the medical sharing system of a certain hospital, the research system took about 42.9 % less time to extract information than the original system, and the accuracy could be maintained at over 98 %. This research method can effectively solve the problem of uneven distribution of device data labels, and improve the data transmission efficiency and accuracy of Internet of Things data sharing systems. Moreover, this method can also reduce the impact of malicious nodes on the global model, providing technical support for data transmission and security protection in other fields.</p></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"22 ","pages":"Article 200359"},"PeriodicalIF":0.0,"publicationDate":"2024-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2667305324000358/pdfft?md5=eadc7c0c02f671c3d2bfcdcae178083b&pid=1-s2.0-S2667305324000358-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140179989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-03-16 | DOI: 10.1016/j.iswa.2024.200357
Zakarya Farou , Yizhi Wang , Tomáš Horváth
Class imbalance learning is challenging in various domains where training datasets exhibit disproportionate samples in a specific class. Resampling methods have been used to adjust the class distribution, but they often have limitations for small disjunct minority subsets. This paper introduces AROSS, an adaptive cluster-based oversampling approach that addresses these limitations. AROSS utilizes an optimized agglomerative clustering algorithm with the Cophenetic Correlation Coefficient and the Bayesian Information Criterion to identify representative areas of the minority class. Safe and half-safe areas are obtained using an incremental k-Nearest Neighbor strategy, and oversampling is performed with a truncated hyperspherical Gaussian distribution. Experimental evaluations on 70 binary datasets demonstrate the effectiveness of AROSS in improving class imbalance learning performance, making it a promising solution for mitigating class imbalance challenges, especially for small disjunct minority subsets.
{"title":"Cluster-based oversampling with area extraction from representative points for class imbalance learning","authors":"Zakarya Farou , Yizhi Wang , Tomáš Horváth","doi":"10.1016/j.iswa.2024.200357","DOIUrl":"https://doi.org/10.1016/j.iswa.2024.200357","url":null,"abstract":"<div><p>Class imbalance learning is challenging in various domains where training datasets exhibit disproportionate samples in a specific class. Resampling methods have been used to adjust the class distribution, but they often have limitations for small disjunct minority subsets. This paper introduces AROSS, an adaptive cluster-based oversampling approach that addresses these limitations. AROSS utilizes an optimized agglomerative clustering algorithm with the Cophenetic Correlation Coefficient and the Bayesian Information Criterion to identify representative areas of the minority class. Safe and half-safe areas are obtained using an incremental k-Nearest Neighbor strategy, and oversampling is performed with a truncated hyperspherical Gaussian distribution. Experimental evaluations on 70 binary datasets demonstrate the effectiveness of AROSS in improving class imbalance learning performance, making it a promising solution for mitigating class imbalance challenges, especially for small disjunct minority subsets.</p></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"22 ","pages":"Article 200357"},"PeriodicalIF":0.0,"publicationDate":"2024-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2667305324000334/pdfft?md5=a11f2bb04866bb8768451b4018887e0e&pid=1-s2.0-S2667305324000334-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140162425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This study investigates the use of advanced computer vision techniques for assessing the severity of Orientia tsutsugamushi bacterial infectivity. It uses fluorescent scrub typhus images obtained from molecular screening, and addresses challenges posed by a complex and extensive image dataset, with limited computational resources. Our methodology integrates three key strategies within a deep learning framework: transitioning from instance segmentation (IS) models to an object detection model; reducing the model's backbone size; and employing lower-precision floating-point calculations. These approaches were systematically evaluated to strike an optimal balance between model accuracy and inference speed, crucial for effective bacterial infectivity assessment. A significant outcome is that the implementation of the Faster R-CNN architecture, with a shallow backbone and reduced precision, notably improves accuracy and reduces inference time in cell counting and infectivity assessment. This innovative approach successfully addresses the limitations of image processing techniques and IS models, effectively bridging the gap between sophisticated computational methods and modern molecular biology applications. The findings underscore the potential of this integrated approach to enhance the accuracy and efficiency of bacterial infectivity evaluations in molecular research.
{"title":"Speed meets accuracy: Advanced deep learning for efficient Orientia tsutsugamushi bacteria assessment in RNAi screening","authors":"Potjanee Kanchanapiboon , Chuenchat Songsaksuppachok , Porncheera Chusorn , Panrasee Ritthipravat","doi":"10.1016/j.iswa.2024.200356","DOIUrl":"https://doi.org/10.1016/j.iswa.2024.200356","url":null,"abstract":"<div><p>This study investigates the use of advanced computer vision techniques for assessing the severity of <em>Orientia tsutsugamushi</em> bacterial infectivity. It uses fluorescent scrub typhus images obtained from molecular screening, and addresses challenges posed by a complex and extensive image dataset, with limited computational resources. Our methodology integrates three key strategies within a deep learning framework: transitioning from instance segmentation (IS) models to an object detection model; reducing the model's backbone size; and employing lower-precision floating-point calculations. These approaches were systematically evaluated to strike an optimal balance between model accuracy and inference speed, crucial for effective bacterial infectivity assessment. A significant outcome is that the implementation of the Faster R-CNN architecture, with a shallow backbone and reduced precision, notably improves accuracy and reduces inference time in cell counting and infectivity assessment. This innovative approach successfully addresses the limitations of image processing techniques and IS models, effectively bridging the gap between sophisticated computational methods and modern molecular biology applications. The findings underscore the potential of this integrated approach to enhance the accuracy and efficiency of bacterial infectivity evaluations in molecular research.</p></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"22 ","pages":"Article 200356"},"PeriodicalIF":0.0,"publicationDate":"2024-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2667305324000322/pdfft?md5=2d06cfac57033fbe4635f13bd56c5c03&pid=1-s2.0-S2667305324000322-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140179894","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Handwritten document recognition and classification are among the many computer-related problems being studied for digitizing handwritten data. A handwritten document comprises text, diagrams, mathematical expressions, numerals, and tables. Due to the variety of writing styles and the intricacy of written language, recognizing handwritten material has proven difficult. As a result, numerous handwritten document recognition systems have been developed, each with unique benefits and drawbacks. This paper reviews the evolution of handwritten document recognition in both qualitative and quantitative terms. First, a bibliometric survey is presented based on the number of articles, citations, countries, authors, and related metrics for handwritten document recognition in the Scopus database. Then, a survey is presented of the learning techniques used for handwritten documents: text recognition, digit recognition, mathematical expression recognition, table recognition, and diagram recognition. The paper also presents directions for future research in handwritten document recognition.
{"title":"Exploration of advancements in handwritten document recognition techniques","authors":"Vanita Agrawal , Jayant Jagtap , M.V.V. Prasad Kantipudi","doi":"10.1016/j.iswa.2024.200358","DOIUrl":"https://doi.org/10.1016/j.iswa.2024.200358","url":null,"abstract":"<div><p>Handwritten document recognition and classification are among the many computers related issues being studied for digitizing handwritten data. A handwritten document comprises text, diagrams, mathematical expressions, numerals, and tables. Due to the variety of writing styles and the intricacy of the written language, it has proven difficult to recognize handwritten material. As a result, numerous handwritten document recognition systems have been developed, each with unique benefits and drawbacks. The paper reviews the evolution of handwritten document recognition in qualitative and quantitative ways. Initially, the bibliometric survey is presented based on the number of articles, citations, countries, authors, etc., on handwritten document recognition in the Scopus database. Later, a survey is done on the learning techniques used for handwritten documents: text recognition, digit recognition, mathematical expression recognition, table recognition, and diagram recognition. This paper also presents the directions for future research in handwritten document recognition.</p></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"22 ","pages":"Article 200358"},"PeriodicalIF":0.0,"publicationDate":"2024-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2667305324000346/pdfft?md5=008f9ee0edb201f02c7d97e969505812&pid=1-s2.0-S2667305324000346-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140179895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Most real-world classification problems involve imbalanced datasets, which pose a challenge for Artificial Intelligence (AI), i.e., machine learning algorithms, because the minority class, which is often of extreme interest, frequently proves difficult to detect. This paper empirically evaluates the performance of tree boosting methods for different dataset sizes and class distributions, from perfectly balanced to highly imbalanced. For tabular data, tree-based methods such as XGBoost stand out in several benchmarks due to their detection performance and speed; therefore, XGBoost and Imbalance-XGBoost are evaluated. After introducing the motivation for addressing risk assessment with machine learning, the paper reviews evaluation metrics for detection systems, i.e., binary classifiers. It proposes a method for data preparation followed by tree boosting with hyper-parameter optimization. The method is evaluated on private datasets of 1 thousand (K), 10K, and 100K samples with distributions of 50, 45, 25, and 5 percent positive samples. As expected, the developed method improves its recognition performance as more training data is provided, and the F1 score decreases as the class distribution becomes more imbalanced, yet it remains significantly superior to the precision-recall baseline determined by the ratio of positives to the total of positives and negatives. Sampling to balance the training set does not provide consistent improvement and can deteriorate detection. In contrast, classifier hyper-parameter optimization improves recognition, but should be applied carefully depending on data volume and distribution. Finally, the developed method is robust to data variation over time up to a point; retraining can be used when performance starts to deteriorate.
{"title":"Tree boosting methods for balanced and imbalanced classification and their robustness over time in risk assessment","authors":"Gissel Velarde , Michael Weichert, Anuj Deshmunkh, Sanjay Deshmane, Anindya Sudhir, Khushboo Sharma, Vaibhav Joshi","doi":"10.1016/j.iswa.2024.200354","DOIUrl":"https://doi.org/10.1016/j.iswa.2024.200354","url":null,"abstract":"<div><p>Most real-world classification problems deal with imbalanced datasets, posing a challenge for Artificial Intelligence (AI), i.e., machine learning algorithms, because the minority class, which is of extreme interest, often proves difficult to be detected. This paper empirically evaluates tree boosting methods' performance given different dataset sizes and class distributions, from perfectly balanced to highly imbalanced. For tabular data, tree-based methods such as XGBoost, stand out in several benchmarks due to detection performance and speed. Therefore, XGBoost and Imbalance-XGBoost are evaluated. After introducing the motivation to address risk assessment with machine learning, the paper reviews evaluation metrics for detection systems or binary classifiers. It proposes a method for data preparation followed by tree boosting methods including hyper-parameter optimization. The method is evaluated on private datasets of 1 thousand (K), 10K and 100K samples on distributions with 50, 45, 25, and 5 percent positive samples. As expected, the developed method increases its recognition performance as more data is given for training and the F1 score decreases as the data distribution becomes more imbalanced, but it is still significantly superior to the baseline of precision-recall determined by the ratio of positives divided by positives and negatives. Sampling to balance the training set does not provide consistent improvement and deteriorates detection. In contrast, classifier hyper-parameter optimization improves recognition, but should be applied carefully depending on data volume and distribution. Finally, the developed method is robust to data variation over time up to some point. Retraining can be used when performance starts deteriorating.</p></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"22 ","pages":"Article 200354"},"PeriodicalIF":0.0,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2667305324000309/pdfft?md5=be6e208c32a749998c8ea1ee56dcab8e&pid=1-s2.0-S2667305324000309-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140122584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-03-11 | DOI: 10.1016/j.iswa.2024.200351
Yeşim ÜLGEN SÖNMEZ , Asaf VAROL
In the super smart society (Society 5.0), new and rapid methods are needed in speech recognition, emotion recognition, and speech emotion recognition to maximize human-machine and human-computer interaction and collaboration. The speech signal contains a great deal of information about the speaker, such as age, sex, ethnicity, health condition, emotion, and thoughts. The field of study that analyzes a person's mood from speech is called speech emotion recognition (SER). Classifying emotions from speech data is a complicated problem for artificial intelligence and its sub-discipline, machine learning, because it is hard to analyze a speech signal that contains various frequencies and characteristics. Speech data are digitized with signal processing methods and speech features are obtained. These features vary depending on emotions such as sadness, fear, anger, happiness, boredom, and confusion. Even though different methods have been developed for determining audio properties and recognizing emotion, the success rate varies depending on the languages, cultures, emotions, and datasets involved. Speech emotion recognition therefore needs new methods that can be applied to datasets of different sizes, that increase classification success, that extract the best features, and that are affordable. Success rates are affected by many factors, such as the methods used, the scarcity of speech emotion datasets, the homogeneity of the database, the difficulty of the language (linguistic differences), noise in the audio data, and the length of the audio data. Within the scope of this study, work on emotion recognition from speech signals from past to present is analyzed in detail. Classification studies based on a discrete emotion model are examined using speech data from the Berlin emotional database (EMO-DB), the Italian emotional speech database (EMOVO), the Surrey Audio-Visual Expressed Emotion database (SAVEE), and the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), which are largely independent of speaker and content. The results of both classical classifiers and deep learning methods are compared. Deep learning results are more successful, but classical classification remains important for determining the defining features of speech, song, or voice, and thus for developing the feature extraction stage. This study will contribute to the literature and help researchers in the SER field.
{"title":"In-depth investigation of speech emotion recognition studies from past to present –The importance of emotion recognition from speech signal for AI–","authors":"Yeşim ÜLGEN SÖNMEZ , Asaf VAROL","doi":"10.1016/j.iswa.2024.200351","DOIUrl":"https://doi.org/10.1016/j.iswa.2024.200351","url":null,"abstract":"<div><p>In the super smart society (Society 5.0), new and rapid methods are needed for speech recognition, emotion recognition, and speech emotion recognition areas to maximize human-machine or human-computer interaction and collaboration. Speech signal contains much information about the speaker, such as age, sex, ethnicity, health condition, emotion, and thoughts. The field of study which analyzes the mood of the person from the speech is called speech emotion recognition (SER). Classifying the emotions from the speech data is a complicated problem for artificial intelligence, and its sub-discipline, machine learning. Because it is hard to analyze the speech signal which contains various frequencies and characteristics. Speech data are digitized with signal processing methods and speech features are obtained. These features vary depending on the emotions such as sadness, fear, anger, happiness, boredom, confusion, etc. Even though different methods have been developed for determining the audio properties and emotion recognition, the success rate varies depending on the languages, cultures, emotions, and data sets. In speech emotion recognition, there is a need for new methods which can be applied in data sets with different sizes, which will increase classification success, in which best properties can be obtained, and which are affordable. The success rates are affected by many factors such as the methods used, lack of speech emotion datasets, the homogeneity of the database, the difficulty of the language (linguistic differences), the noise in audio data and the length of the audio data. Within the scope of this study, studies on emotion recognition from speech signals from past to present have been analyzed in detail. In this study, classification studies based on a discrete emotion model using speech data belonging to the Berlin emotional database (EMO-DB), Italian emotional speech database (EMOVO), The Surrey audio-visual expressed emotion database (SAVEE), Ryerson Audio-Visual Database of Emotional Speech and Song Database (RAVDESS), which are mostly independent of the speaker and content, are examined. The results of both classical classifiers and deep learning methods are compared. Deep learning results are more successful, but classical classification is more important in determining the defining features of speech, song or voice. So It develops feature extraction stage. 
This study will be able to contribute to the literature and help the researchers in the SER field.</p></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"22 ","pages":"Article 200351"},"PeriodicalIF":0.0,"publicationDate":"2024-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2667305324000279/pdfft?md5=1617124db6cea95a53e38e62a54e8824&pid=1-s2.0-S2667305324000279-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140122577","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
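A typical SER pipeline of the kind surveyed above extracts acoustic features (e.g., MFCCs) from each utterance and feeds them to a classifier. The sketch below uses librosa and an SVM on synthetic stand-in signals; the signals, labels, and feature choices are hypothetical, and real experiments would use corpora such as EMO-DB or RAVDESS.

```python
# Minimal SER sketch: MFCC statistics per utterance, fed to a classical classifier.
import numpy as np
import librosa
from sklearn.svm import SVC

def mfcc_features(signal, sr, n_mfcc=13):
    """Mean and std of each MFCC coefficient over time -> fixed-length vector."""
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

sr = 16_000
# Synthetic stand-ins for utterances (real use: signal, sr = librosa.load(path)).
utterances = [np.sin(2 * np.pi * f * np.arange(sr) / sr).astype(np.float32)
              for f in (220, 440, 180, 500)]
labels = ["sad", "happy", "sad", "happy"]        # hypothetical emotion labels

X = np.stack([mfcc_features(u, sr) for u in utterances])
clf = SVC().fit(X, labels)
print(clf.predict(X[:1]))
```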