Pub Date: 2024-09-05 | DOI: 10.1007/s11042-024-20144-8
Andréia dos Santos Sachete, Alba Valéria de Sant’anna de Freitas Loiola, Raquel Salcedo Gomes
Adaptive learning is an educational methodology that allows the personalization of learning according to the student’s pedagogical path. In digital environments, the strategic use of technologies enhances adaptive learning initiatives, enabling a dynamic understanding of intricate contextual nuances and the ability to identify and recommend appropriate learning activities. Therefore, this work proposes developing and evaluating a prototype that uses a large language model to create adaptive educational activities in face-to-face and virtual environments automatically. The applied methodology involves the implementation of a large language model with advanced cognitive capabilities to generate learning activities that adapt to individual needs. A proof of concept was developed to evaluate the practicality and usability of this approach. The research results indicate that the approach is practical and adaptable to different educational contexts, reinforcing the synergy between adaptive learning, artificial intelligence, and learning environments. The proof of concept evaluation showed that the prototype is highly usable, validating the proposal as an innovative solution to the growing needs of modern education.
Title: AdaptiveGPT: Towards Intelligent Adaptive Learning. Journal: Multimedia Tools and Applications. URL: https://doi.org/10.1007/s11042-024-20144-8
Pub Date: 2024-09-05 | DOI: 10.1007/s11042-024-20132-y
Veenu Rani, Munish Kumar
Identification of individuals based on physical characteristics has recently gained popularity and falls under the category of pattern recognition. Biometric recognition has emerged as an effective strategy for preventing security breaches, as no two people share the same physical characteristics. "Gait recognition" specifically refers to identifying individuals based on their walking patterns. Human gait is a method of locomotion that relies on the coordination of the brain, nerves, and muscles. Traditionally, human gait analysis was performed subjectively through visual observation. With advances in technology and deep learning, it can now be conducted empirically and without the need for subject cooperation. Deep learning methods have demonstrated excellent performance in human gait recognition. In this article, the authors employed the VGG19 transfer learning model for human gait recognition. They used the public benchmark dataset CASIA-A for their experimental work, which contains a total of 19,139 images captured from 20 individuals. The dataset was split into training and test sets in two ratios: 70:30 and 80:20. To assess the performance of the proposed model, the authors tracked three metrics: loss, validation loss (val_loss), and accuracy. They reported accuracy rates of 96.9% and 97.8%, with losses of 2.71% and 2.01%, for the two splits, respectively.
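The two split ratios mentioned in the abstract can be illustrated with a minimal sketch. It only shows how 19,139 items divide under 70:30 and 80:20; the function name `split_indices`, the fixed seed, and the per-image random shuffle are assumptions made for illustration, since the abstract does not say how the authors partitioned CASIA-A (for example, per image or per subject).

```python
import random


def split_indices(n_items, train_percent, seed=0):
    """Shuffle item indices and split them into train and test lists.

    Hypothetical helper: the abstract does not describe the actual
    splitting procedure used for CASIA-A.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Integer arithmetic avoids floating-point rounding surprises.
    n_train = n_items * train_percent // 100
    return indices[:n_train], indices[n_train:]


N_IMAGES = 19139  # image count reported in the abstract

for percent in (70, 80):
    train, test = split_indices(N_IMAGES, percent)
    print(f"{percent}:{100 - percent} split -> "
          f"{len(train)} train, {len(test)} test images")
```

A per-image shuffle like this can place frames from the same walking sequence in both sets, so a grouped split by subject or sequence may be preferable in practice.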
Title: Transfer learning for human gait recognition using VGG19: CASIA-A dataset. Journal: Multimedia Tools and Applications. URL: https://doi.org/10.1007/s11042-024-20132-y
Pub Date: 2024-09-05 | DOI: 10.1007/s11042-024-20067-4
Ranita Khumukcham, Kishorjit Nongmeikapam
Many researchers have preferred non-invasive techniques for recognizing the exact type of physiological abnormality in the vocal tract by training machine learning algorithms with feature descriptors extracted from the voice signal. However, until now, most techniques have been limited to classifying whether a voice is normal or abnormal. It is crucial that the trained Artificial Intelligence (AI) be able to identify the exact pathology associated with the voice for implementation in a realistic environment. Another issue is the need to suppress ambient noise that could be mixed with the spectra of the voice. The current work proposes a robust, less time-consuming, and non-invasive technique for identifying the pathology associated with a laryngeal voice signal. More specifically, a two-stage signal filtering approach that combines a score-based geometric approach and a glottal inverse filtering method is applied to the input voice signal. The aim is to estimate the noise spectra, regenerate a clean signal, and finally deliver a completely fundamental glottal flow-derived signal. In the next stage, the clean glottal derivative signals are used to form a novel fused scalogram, referred to as the "Combinatorial Transformative Scalogram (CTS)." The CTS is a time-frequency domain plot that combines two time-frequency scalograms. The performance of the two individual scalograms and of the CTS database is investigated thoroughly. Nine classification metrics are used to investigate performance: sensitivity, mean accuracy, error, precision, false positive rate, specificity, Cohen’s kappa, Matthews Correlation Coefficient, and F1 score. Evaluation on the VOice ICar fEDerico II (VOICED) standard database yielded the highest mean accuracy of 94.12%, with a sensitivity of 93.85% and a specificity of 97.96%, compared with other existing techniques. The current method performed well despite the data imbalance that exists between classes.
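The nine metrics listed in the abstract all derive from confusion-matrix counts. The sketch below computes them for a single binary (one-vs-rest) confusion matrix using the standard textbook definitions. How the authors aggregate across pathology classes is not stated in the abstract, so the function name `binary_metrics`, the binary framing, and the example counts are assumptions.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from binary confusion counts.

    Assumes every denominator is non-zero (each class is present and
    each label is predicted at least once).
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)          # recall, true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    precision = tp / (tp + fp)
    false_positive_rate = fp / (fp + tn)  # equals 1 - specificity
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # Cohen's kappa: observed agreement corrected for chance agreement.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "error": 1 - accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "false_positive_rate": false_positive_rate,
        "f1": f1,
        "mcc": mcc,
        "kappa": kappa,
    }


# Made-up counts, for illustration only (not results from the paper).
for name, value in binary_metrics(tp=40, fp=10, tn=45, fn=5).items():
    print(f"{name}: {value:.4f}")
```

Reporting Cohen's kappa and the Matthews Correlation Coefficient alongside accuracy is a common choice when classes are imbalanced, which the abstract says is the case here.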
Title: A geometric-approach based Combinatorial Transformative Scalogram analysis for multiclass identification of pathologies in a voice signal. Journal: Multimedia Tools and Applications. URL: https://doi.org/10.1007/s11042-024-20067-4
Pub Date: 2024-09-05 | DOI: 10.1007/s11042-024-20164-4
Renu Singh, Ashlesha Gupta, Poonam Mittal
Over the past few years, blockchain technology has gained significant attention. This surge in popularity can be attributed to the emergence of cryptocurrencies and the development of smart contracts. Cryptocurrency is a digital currency that eliminates the problem of double spending. Cryptocurrencies such as Bitcoin, Ethereum, Litecoin, Stellar, Zcash, Maker, and Aave have become popular and are preferred for money transfers. Smart contracts are the next most popular blockchain technology after cryptocurrency. A smart contract can be considered a piece of code that executes automatically when predefined conditions are fulfilled. Researchers believe that blockchain with smart contracts is only in its initial stages and that its true potential has yet to be fully discovered. Hence, an extensive bibliometric analysis is conducted to understand blockchain trends for smart contracts and to give future directions in this field. The analysis follows several steps: formulating the research question, defining the scope of the research, extracting and analyzing data, answering the research question, and finally drawing a conclusion. This paper should be useful to scholars and researchers, as it provides an extensive statistical and network analysis of the extracted smart contract publications.
Title: Insights into research on blockchain for smart contracts: a bibliometric analysis. Journal: Multimedia Tools and Applications. URL: https://doi.org/10.1007/s11042-024-20164-4
Object grasping is an important skill for robots to interact with the real world, especially in unstructured environments where target objects are occluded and vary in shape. In this work, we introduce a robot grasping pipeline called SUGrasping, which obtains grasping poses for target objects more precisely. The pipeline takes both the Truncated Signed Distance Function (TSDF) and the point cloud of the grasping scene as input. The proposed multi-head 3D U-Net accepts the reconstructed TSDF representation and outputs the grasping configuration, including the predicted grasp quality and the orientation and width of the gripper. The point cloud is fed into PointNet to obtain semantic segmentation results for all objects in the grasping workspace. Using the points that fall inside the gripper, a relationship between the gripper and the semantic information can be established. This lets the robot know which object it is grasping, rather than just removing objects from the workspace as in previous work. Experimental results show that the proposed method improves the grasping success rate and the percentage of target objects cleared, outperforming the state-of-the-art methods compared in this paper.
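To make the TSDF input concrete, the sketch below builds a small TSDF grid for an analytic sphere. It is not the SUGrasping reconstruction step, which would fuse depth images of the real scene. The sign convention (positive outside, negative inside), the unit-cube workspace, and the names `truncated_sdf` and `sphere_tsdf_grid` are assumptions chosen for illustration.

```python
import math


def truncated_sdf(signed_distance, truncation):
    """Scale a signed distance by the truncation band and clamp to [-1, 1]."""
    return max(-1.0, min(1.0, signed_distance / truncation))


def sphere_tsdf_grid(resolution, center, radius, truncation):
    """TSDF of a sphere sampled at the voxel centers of a unit cube.

    Assumed convention: positive outside the surface, negative inside.
    Returns a nested list indexed as grid[i][j][k].
    """
    voxel = 1.0 / resolution
    grid = []
    for i in range(resolution):
        plane = []
        for j in range(resolution):
            row = []
            for k in range(resolution):
                point = ((i + 0.5) * voxel, (j + 0.5) * voxel, (k + 0.5) * voxel)
                distance = math.dist(point, center) - radius
                row.append(truncated_sdf(distance, truncation))
            plane.append(row)
        grid.append(plane)
    return grid


grid = sphere_tsdf_grid(resolution=8, center=(0.5, 0.5, 0.5),
                        radius=0.25, truncation=0.1)
print("corner voxel:", grid[0][0][0])        # far outside the sphere
print("center voxel:", grid[4][4][4])        # deep inside the sphere
print("near-surface voxel:", grid[4][4][2])  # within the truncation band
```

Only voxels within the truncation band of the surface carry graded values; everything else saturates at the clamp limits, which keeps the volume compact as input to a 3D convolutional network.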
Title: SUGrasping: a semantic grasping framework based on multi-head 3D U-Net. Authors: He Cao, Yunzhou Zhang, Zhexue Ge, Xin Chen, Xiaozheng Liu, Jiaqi Zhao. Pub Date: 2024-09-05. Journal: Multimedia Tools and Applications. URL: https://doi.org/10.1007/s11042-024-20037-w
Pub Date: 2024-09-04 | DOI: 10.1007/s11042-024-20131-z
Chien-Hsing Chou, Cheng-Hou Chou, Yi-Zeng Hsieh, Tzu-Shien Yang
In this study, we integrate the Bidirectional Encoder Representations from Transformers (BERT) model with the Cycle Generative Adversarial Network (CycleGAN) to create a system for Chinese text style transfer. Natural language processing (NLP) involves converting human languages into data interpretable by computers, enabling applications like text classification, chatbots, and dialogue systems. Recent advancements, such as Google's transformer model and the BERT technique, have significantly improved NLP capabilities through self-attention mechanisms and unsupervised pretraining. Text style transfer modifies the style of texts without altering their semantics. Previous methods like StyIns and models based on disentangled representation learning highlight the challenges of retaining text meaning during style transfer. Our system leverages CycleGAN’s unsupervised learning to convert unpaired data between wuxia and fantasy styles while preserving semantics. Using the pretrained BERT model from the Chinese Knowledge and Information Processing (CKIP) Lab, our experimental results demonstrate successful style conversion, maintaining the original meanings of texts. This integration of BERT and CycleGAN shows promise for further advancements in NLP applications.
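For reference, the cycle-consistency term from the original CycleGAN formulation is shown below. It is the mechanism that lets unpaired data be converted in both directions while encouraging content to be preserved. The abstract does not state the exact loss the authors use on text, so reading $X$ as wuxia-style representations and $Y$ as fantasy-style representations (for example, BERT embeddings) is an assumption, with $G: X \to Y$ and $F: Y \to X$ as the two generators.

```latex
\mathcal{L}_{\text{cyc}}(G, F)
  = \mathbb{E}_{x \sim p_X}\!\left[ \lVert F(G(x)) - x \rVert_1 \right]
  + \mathbb{E}_{y \sim p_Y}\!\left[ \lVert G(F(y)) - y \rVert_1 \right]
```

Minimizing this term penalizes a round trip (style A to style B and back) that does not return the original input, which is one way to discourage the generators from discarding the meaning of the text.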
Title: Integrating cycleGAN and BERT for Chinese text style transfer. Journal: Multimedia Tools and Applications. URL: https://doi.org/10.1007/s11042-024-20131-z
Pub Date: 2024-09-04 | DOI: 10.1007/s11042-024-20143-9
Khaled Lounnas, Mohamed Lichouri, Mourad Abbas
As dialects are widely used in many countries, there is growing interest in incorporating them into various applications, including conversational systems. Processing spoken dialects is an important module in such systems, yet it remains a challenging task due to the lack of resources and the inherent ambiguity and complexity of dialects. This paper presents a comparison of two approaches for identifying spoken Maghrebi dialects, tested on an in-house corpus composed of four dialects: Algerian Arabic Dialect (AAD), Algerian Berber Dialect (ABD), Moroccan Arabic Dialect (MAD), and Moroccan Berber Dialect (MBD), as well as two variants of Modern Standard Arabic (MSA): MSA_ALG and MSA_MAR. The first method uses a fully connected neural network (NN2) to retrain several Transfer Learning (TL) models with varying layer numbers, including Residual Networks (ResNet50, ResNet101), Visual Geometry Group networks (VGG16, VGG19), Dense Convolutional Networks (DenseNet121, DenseNet169), and Efficient Convolutional Neural Networks for Mobile Vision Applications (MobileNet, MobileNetV2). These models were chosen based on their proven ability to capture different levels of feature abstraction: deeper models like ResNet and DenseNet are capable of capturing more complex and nuanced patterns, which is critical for distinguishing subtle differences in dialects, while VGG and MobileNet models offer computational efficiency, making them suitable for applications with limited resources. The second approach employs a “stacked generalization” strategy, which merges predictions from the previously trained models to enhance the final classification performance. Our results show that this cascade strategy improves the overall performance of the Language/Dialect Identification system, with an accuracy increase of up to 5% for specific dialect pairs. 
Notably, the best performance was achieved with DenseNet and ResNet models, reaching an accuracy of 99.11% for distinguishing between Algerian Berber Dialect and Moroccan Berber Dialect. These findings indicate that despite the limited size of the employed dataset, the cascade strategy and the selection of robust TL models significantly enhance the system’s performance in dialect identification. By leveraging the unique strengths of each model, our approach demonstrates a robust and efficient solution to the challenge of spoken dialect processing.
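The stacked generalization idea described above can be sketched in a few lines: the class probabilities produced by the base models are concatenated into one feature vector, and a second-level model is trained on those vectors. The abstract does not specify the meta-learner, so the nearest-centroid classifier below is a stand-in, and the class `CentroidMetaLearner`, the helper `stack_features`, and all probability values are hypothetical.

```python
def stack_features(base_outputs):
    """Concatenate the probability vectors of all base models for one sample."""
    return [p for probs in base_outputs for p in probs]


class CentroidMetaLearner:
    """Stand-in level-1 model: nearest class centroid in stacked-feature space."""

    def fit(self, features, labels):
        sums, counts = {}, {}
        for x, y in zip(features, labels):
            acc = sums.setdefault(y, [0.0] * len(x))
            for d, value in enumerate(x):
                acc[d] += value
            counts[y] = counts.get(y, 0) + 1
        self.centroids = {
            y: [value / counts[y] for value in acc] for y, acc in sums.items()
        }
        return self

    def predict(self, x):
        def squared_distance(label):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[label]))
        return min(self.centroids, key=squared_distance)


# Made-up held-out predictions of two base models over classes [ABD, MBD].
held_out = [
    ([[0.9, 0.1], [0.6, 0.4]], "ABD"),
    ([[0.8, 0.2], [0.7, 0.3]], "ABD"),
    ([[0.4, 0.6], [0.2, 0.8]], "MBD"),
    ([[0.3, 0.7], [0.1, 0.9]], "MBD"),
]
features = [stack_features(outputs) for outputs, _ in held_out]
labels = [label for _, label in held_out]
meta = CentroidMetaLearner().fit(features, labels)

# The first base model leans towards ABD here; the stacked decision can differ.
new_sample = stack_features([[0.55, 0.45], [0.3, 0.7]])
print(meta.predict(new_sample))
```

The meta-learner should be trained on base-model predictions for data the base models did not see during their own training; otherwise it learns from over-confident outputs.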
Title: Enhancing spoken dialect identification with stacked generalization of deep learning models. Journal: Multimedia Tools and Applications. URL: https://doi.org/10.1007/s11042-024-20143-9
Pub Date: 2024-09-04 | DOI: 10.1007/s11042-024-20147-5
Sidharth Samanta, Debasish Jena, Suvendu Rup
Unsupervised Domain Adaptation (UDA) in person re-identification (reID) addresses the challenge of adapting models trained on labeled source domains to unlabeled target domains, which is crucial for real-world applications. A significant problem in clustering-based UDA methods is the noise in pseudo-labels generated due to inter-domain disparities, which can degrade the performance of reID models. To address this issue, we propose the Unsupervised Dual-Teacher Knowledge Distillation (UDKD), an efficient learning scheme designed to enhance robustness against noisy pseudo-labels in UDA for person reID. The proposed UDKD method combines the outputs of two source-trained classifiers (teachers) to train a third classifier (student) using a modified soft-triplet loss-based metric learning approach. Additionally, a weighted averaging technique is employed to rectify the noise in the predicted labels generated from the teacher networks. Experimental results demonstrate that the proposed UDKD significantly improves performance in terms of mean Average Precision (mAP) and Cumulative Match Characteristic curve (Rank 1, 5, and 10). Specifically, UDKD achieves an mAP of 84.57 and 73.32, and Rank 1 scores of 94.34 and 88.26 for Duke to Market and Market to Duke scenarios, respectively. These results surpass the state-of-the-art performance, underscoring the efficacy of UDKD in advancing UDA techniques for person reID and highlighting its potential to enhance performance and robustness in real-world applications.
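The weighted averaging of teacher outputs can be illustrated with a minimal sketch. The abstract says only that a weighted averaging technique rectifies noise in the teachers' predicted labels; it does not give the weighting rule. The confidence-based weight below, the function names `confidence_weight` and `fuse_teachers`, and the probability values are therefore assumptions, not the UDKD formulation.

```python
def confidence_weight(probs_a, probs_b):
    """Hypothetical rule: weight teacher A by its share of peak confidence."""
    conf_a, conf_b = max(probs_a), max(probs_b)
    return conf_a / (conf_a + conf_b)


def fuse_teachers(probs_a, probs_b, weight_a):
    """Convex combination of two teachers' class-probability vectors."""
    return [weight_a * a + (1.0 - weight_a) * b
            for a, b in zip(probs_a, probs_b)]


# Made-up soft predictions over three pseudo-identities for one image.
teacher_a = [0.7, 0.2, 0.1]
teacher_b = [0.3, 0.6, 0.1]

weight = confidence_weight(teacher_a, teacher_b)
soft_label = fuse_teachers(teacher_a, teacher_b, weight)
print("weight for teacher A:", round(weight, 4))
print("fused soft label:", [round(p, 4) for p in soft_label])
```

Because the result is a convex combination of two probability vectors, it is itself a valid probability vector that a student network could be trained against as a soft target.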
Title: Unsupervised dual-teacher knowledge distillation for pseudo-label refinement in domain adaptive person re-identification. Journal: Multimedia Tools and Applications. URL: https://doi.org/10.1007/s11042-024-20147-5
Pub Date : 2024-09-04 DOI: 10.1007/s11042-024-20178-y
Sainul Islam Ansary, Atul Mishra, Sankha Deb, Alok Kanti Deb
Automatic grasping of unknown 3D objects is still a very challenging problem in robotics. The challenges mainly originate from the limitations of perception systems and from implementing grasp planning methods that handle arbitrary 3D objects on real robot platforms. This paper presents a complete framework for robotic grasping of unknown 3D objects in a tabletop environment. The framework comprises a 3D perception system for obtaining the complete point cloud of the objects, a module that finds the best grasp using an object-slicing based grasp planner, a module for trajectory generation for pick and place operations, and finally execution of the planned grasps on a real robot platform. The proposed 3D object perception captures the complete geometry of the target object using two depth cameras placed at different locations. A hole-filling algorithm is also proposed to quickly fill the missing data points in the captured point cloud of the target object. The object-slicing based grasp planner is extended to handle the obstacles posed by neighbouring objects in a tabletop environment. The proposed framework is then tested on common household objects by performing pick and place operations on a real robot fitted with an adaptive gripper. Finding the best feasible grasp in the presence of neighbouring objects, while avoiding the tabletop and surrounding objects, is also demonstrated.
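How the views from two depth cameras might be combined into one cloud can be sketched as follows. This is an illustration only. It assumes each camera's pose (a rotation matrix and a translation vector relative to a shared table frame) is already known from calibration, which the abstract does not state, and `to_world` and `merge_clouds` are hypothetical helper names. The hole-filling step is not reproduced.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Matrix = Sequence[Sequence[float]]


def to_world(points: List[Point], rotation: Matrix, translation: Point) -> List[Point]:
    """Apply p_world = R * p_camera + t to every point."""
    out = []
    for x, y, z in points:
        out.append(tuple(
            rotation[i][0] * x + rotation[i][1] * y + rotation[i][2] * z + translation[i]
            for i in range(3)
        ))
    return out


def merge_clouds(clouds: List[List[Point]], voxel: float = 0.005) -> List[Point]:
    """Concatenate clouds, keeping the first point seen in each voxel of side `voxel`."""
    seen = {}
    for cloud in clouds:
        for p in cloud:
            key = tuple(round(c / voxel) for c in p)
            seen.setdefault(key, p)
    return list(seen.values())


# Assumed poses: camera A sits at the table origin; camera B faces it,
# rotated 180 degrees about the z axis and shifted 1 m along x.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
rot_z_180 = [[-1, 0, 0], [0, -1, 0], [0, 0, 1]]

cloud_a = to_world([(0.5, 0.0, 0.1), (0.2, 0.0, 0.1)], identity, (0.0, 0.0, 0.0))
cloud_b = to_world([(0.5, 0.0, 0.1), (0.3, 0.0, 0.1)], rot_z_180, (1.0, 0.0, 0.0))
merged = merge_clouds([cloud_a, cloud_b])
```

The voxel-based deduplication is one simple way to avoid double-counting surface points that both cameras observe.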
{"title":"A framework for robotic grasping of 3D objects in a tabletop environment","authors":"Sainul Islam Ansary, Atul Mishra, Sankha Deb, Alok Kanti Deb","doi":"10.1007/s11042-024-20178-y","DOIUrl":"https://doi.org/10.1007/s11042-024-20178-y","url":null,"abstract":"<p><i>A</i>utomatic grasping of unknown 3D objects is still a very challenging problem in robotics. Such challenges mainly originate from the limitations of perception systems and implementations of the grasp planning methods for handling arbitrary 3D objects on real robot platforms. This paper presents a complete framework for robotic grasping of unknown 3D objects in a tabletop environment. The framework comprises of a 3D perception system for obtaining the complete point cloud of the objects, followed by a module for finding the best grasp by an object-slicing based grasp planner, a module for trajectory generation for pick and place operations, and finally performing the planned grasps on a real robot platform. The proposed 3D object perception captures the complete geometry information of the target object using two depth cameras placed at different locations. A hole-filling algorithm is also proposed to quickly fill the missing data points in the captured point cloud of target object. The object-slicing based grasp planner is extended to handle the obstacles posed by the neighbouring objects on a tabletop environment. Then, the proposed framework is tested on common household objects by performing pick and place operations on a real robot fitted with an adaptive gripper. 
Moreover, finding the best feasible grasp in the presence of neighbouring objects is also demonstrated such as avoiding the table-top and surrounding objects.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"2 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Due to the Covid-19 pandemic, the education system in India changed to a remote, that is, online, study mode. Although there are works on the effect of teaching-learning on Indian students, the effect of the online mode and the associated mental state, particularly when the entire country is going through a crisis, could not be found in the literature. Our goal is to analyze data and find patterns through which we can understand the effectiveness of online study and estimate the stress level. The dataset, collected from 500 undergraduate college students during April-May 2021, is in questionnaire format. Our contributions in this paper are (i) publishing a dataset of student feedback, and (ii) designing a data processing pipeline involving autoencoders followed by a clustering approach. The dataset is in text format, so for our analysis we converted it into a numerical format using a binary bag of words. Dimensionality reduction is applied through an autoencoder to obtain an effective latent space representation. Finally, to find patterns in this dimensionally reduced feature space, we applied the unsupervised learning algorithms kMeans and DBSCAN. A thorough analysis of the clustering process reveals that the absence of social communication in purely online education provokes isolation irrespective of the urban or rural background of the students. However, online learning could supplement offline classes, as a substantial number of students welcomed the concept, as reported in the data.
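The first step of this pipeline, turning free-text questionnaire answers into binary bag-of-words vectors, can be sketched as below. The responses are invented for illustration, whitespace tokenization is an assumption, and `build_vocabulary` and `binary_bow` are hypothetical names. The autoencoder and clustering stages are omitted.

```python
from typing import Dict, List


def build_vocabulary(texts: List[str]) -> Dict[str, int]:
    """Map each distinct lowercase token to a column index (sorted for stability)."""
    tokens = sorted({tok for text in texts for tok in text.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}


def binary_bow(text: str, vocab: Dict[str, int]) -> List[int]:
    """1 if a vocabulary token occurs in the text at least once, else 0."""
    vec = [0] * len(vocab)
    for tok in text.lower().split():
        if tok in vocab:
            vec[vocab[tok]] = 1  # presence only; repeat counts are ignored
    return vec


# Invented answers, not taken from the published dataset.
responses = [
    "online classes feel isolating",
    "online classes save travel time",
    "isolating isolating isolating",
]
vocab = build_vocabulary(responses)
vectors = [binary_bow(r, vocab) for r in responses]
```

Rows of 0/1 values like these are the kind of input a dimensionality-reduction step such as an autoencoder would then compress before clustering.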
{"title":"An autoencoder based unsupervised clustering approach to analyze the effect of E-learning on the mental health of Indian students during the Covid-19 pandemic","authors":"Pritha Banerjee, Chandan Jana, Jayita Saha, Chandreyee Chowdhury","doi":"10.1007/s11042-024-19983-2","DOIUrl":"https://doi.org/10.1007/s11042-024-19983-2","url":null,"abstract":"<p>Due to the Covid-19 pandemic, the education system in India has changed to remote that is, online study mode. Though there are works on the effect of teaching learning on Indian students, the effect of online mode and associated mental state, particularly when the entire country is going through a crisis could not be found in the literature. Our goal is to analyze data and find some pattern through which we can understand the effectiveness of the online study and also try to figure out the stress level. The dataset we collected from 500 undergraduate college students during April-May, 2021 is in questionnaire format. Our contribution in this paper are - (i) publishing a dataset of student feedbacks, and (ii) designing a data processing pipeline involving autoencoders followed by clustering approach. The dataset is in text format so for our analysis we have converted the dataset into a numerical format using the concept of a binary bag of words. Dimensionality reduction is applied through autoencoder for an effective latent space representation. Finally, for finding patterns out of this dimensionally reduced feature space, we have applied unsupervised learning algorithms - kMeans and DBSCAN. A thorough analysis of the clustering process reveals that the absence of social communication in purely online education provokes isolation irrespective of the urban or rural background of the students. 
However, it could supplement offline classes as a substantial number of students welcomed the concept of online learning as reported in the data.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"7 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}