Pub Date: 2024-09-12 DOI: 10.1016/j.asoc.2024.112231
Traditional manual concrete vibration faces numerous limitations, which calls for an efficient automated method to assist in this task. This study proposes a vision-based continuous concrete vibration method for vibrating robots. By enhancing the YOLOv8n model with attention mechanisms, the proposed method achieves an AP of 93.31 % in identifying reinforcing grids and runs at 25.6 FPS on embedded systems. For the first time in concrete vibration tasks, this study uses spatial positional information to cluster coordinate data, transforming confidence-sorted data into spatially ordered sequences. A case test on a vibrating robot shows that the proposed method increases vibration speed by 22.18 % and improves the vibration success rate by 11.67 % compared to traditional strategies. Additionally, on-site experiments conducted at four construction sites demonstrated the robustness of the proposed method. These findings advance automation in concrete vibration work, with significant implications for robotics and construction engineering.
Title: A continuous concrete vibration method for robots based on machine vision with integrated spatial features | Journal: Applied Soft Computing | Impact Factor: 7.2 | Type: Journal Article
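The abstract above mentions turning confidence-sorted detections into a spatially ordered sequence. The following is a minimal sketch of that general idea, not the authors' algorithm: it assumes each detection is a hypothetical `(x_center, y_center, confidence)` tuple, groups points into rows with a simple vertical-gap threshold (`row_tol`, an illustrative parameter), and walks the rows in a serpentine path.

```python
from typing import List, Tuple

# Hypothetical detection format: (x_center, y_center, confidence).
Detection = Tuple[float, float, float]


def order_detections(dets: List[Detection], row_tol: float) -> List[Detection]:
    """Group detections into rows by y, then visit rows in a serpentine path."""
    if not dets:
        return []
    by_y = sorted(dets, key=lambda d: d[1])
    rows: List[List[Detection]] = [[by_y[0]]]
    for det in by_y[1:]:
        # Start a new row when the vertical gap to the previous point exceeds row_tol.
        if det[1] - rows[-1][-1][1] > row_tol:
            rows.append([det])
        else:
            rows[-1].append(det)
    ordered: List[Detection] = []
    for i, row in enumerate(rows):
        # Alternate the horizontal direction so consecutive targets stay close.
        ordered.extend(sorted(row, key=lambda d: d[0], reverse=(i % 2 == 1)))
    return ordered


# Detections arrive sorted by confidence, not by position.
dets = [(30, 11, 0.99), (10, 10, 0.95), (20, 52, 0.90),
        (20, 9, 0.85), (10, 50, 0.80), (30, 49, 0.70)]
path = order_detections(dets, row_tol=15)
```

A gap threshold is the simplest possible clustering rule; a real grid with perspective distortion would likely need something more robust.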
Pub Date: 2024-09-11 DOI: 10.1016/j.asoc.2024.112191
Clean air is the most basic requirement for maintaining health, which makes accurate short-term air quality prediction vital. Decomposition algorithms can better capture the local features and temporal changes of the data, but they increase the computation time, resource consumption, and complexity of the model. In addition, existing forecasting systems overlook instability and uncertainty. To solve these problems, a deterministic and uncertainty-aware AOA-DBGRU-MDN deep learning system is proposed, which combines the arithmetic optimization algorithm (AOA), double-layer bi-directional GRUs (DBGRU), and a mixture density network (MDN). The system considers meteorological factors and air pollutants comprehensively. It involves feature selection using the maximum information coefficient (MIC), decomposition using the complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) algorithm, and classification and compression of the decomposed components using entropy-Huffman tree compression. First, the information measurement process significantly reduces the number of components. After the multi-factor data are incorporated, the optimal DBGRU model is obtained using AOA. Finally, the training errors are fitted using the MDN to obtain interval prediction results. The experiments demonstrate that (1) using the CEEMDAN algorithm can improve prediction accuracy; (2) classifying and reconstructing the data based on entropy-Huffman tree compression not only decreases the model's training volume and improves training efficiency but also boosts the model's prediction accuracy; and (3) the AOA-DBGRU-MDN system performs probabilistic prediction to obtain an effective and intuitive prediction interval, improving on the point prediction of air quality.
Title: Short-term air quality prediction using point and interval deep learning systems coupled with multi-factor decomposition and data-driven tree compression | Journal: Applied Soft Computing | Impact Factor: 7.2 | Type: Journal Article
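The abstract above describes fitting training errors with a mixture density network to obtain prediction intervals. As a rough illustration of the last step only, the sketch below assumes the MDN has already produced Gaussian mixture parameters (`weights`, `means`, `sigmas`, all hypothetical names) for the error, defined here as actual minus predicted, and derives an interval around a point forecast from the mixture quantiles. It is not the paper's implementation.

```python
import math
from typing import Sequence, Tuple


def mixture_cdf(x: float, weights: Sequence[float],
                means: Sequence[float], sigmas: Sequence[float]) -> float:
    """CDF of a one-dimensional Gaussian mixture evaluated at x."""
    return sum(w * 0.5 * (1.0 + math.erf((x - m) / (s * math.sqrt(2.0))))
               for w, m, s in zip(weights, means, sigmas))


def mixture_quantile(p: float, weights: Sequence[float], means: Sequence[float],
                     sigmas: Sequence[float], iters: int = 100) -> float:
    """Invert the mixture CDF by bisection (the CDF is monotone in x)."""
    lo = min(m - 10.0 * s for m, s in zip(means, sigmas))
    hi = max(m + 10.0 * s for m, s in zip(means, sigmas))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, weights, means, sigmas) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def prediction_interval(point_forecast: float, weights: Sequence[float],
                        means: Sequence[float], sigmas: Sequence[float],
                        coverage: float = 0.9) -> Tuple[float, float]:
    """Shift the point forecast by the lower and upper error quantiles."""
    alpha = 1.0 - coverage
    lower = point_forecast + mixture_quantile(alpha / 2.0, weights, means, sigmas)
    upper = point_forecast + mixture_quantile(1.0 - alpha / 2.0, weights, means, sigmas)
    return lower, upper


# Illustrative numbers only: a two-component error distribution around a forecast of 50.
lower, upper = prediction_interval(50.0, [0.7, 0.3], [0.0, 2.0], [1.0, 3.0], coverage=0.9)
```

Bisection is used because a Gaussian mixture has no closed-form quantile function.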
Pub Date: 2024-09-11 DOI: 10.1016/j.asoc.2024.112235
Hand gestures are crucial for developing prosthetic and rehabilitation devices, enabling intuitive human–computer interaction (HCI) and improving accessibility for individuals with impairments. Recently, gesture recognition systems based on surface electromyography (sEMG) have been widely employed in various fields, demonstrating remarkable advantages and developments. In this paper, we present a comprehensive survey on sEMG-based hand gesture recognition. We provide an overview of the basic knowledge and background of sEMG signals and the acquisition equipment used. We delve into the applied feature extraction methods and classification models, focusing on recent advances in deep learning techniques. We also identify the datasets of sEMG signals used for hand gesture recognition. Moreover, we highlight recent applications of sEMG-based gesture recognition methods, including HCI, sign language recognition, rehabilitation, prosthesis control, and exoskeletons for augmentation. Additionally, we outline the latest innovative progress in this field, such as the influence of force, user identity detection, and migration effects. We also discuss the current limitations and challenges. Finally, we summarize the main findings and discuss future directions to enhance sEMG-based hand gesture recognition.
Title: A survey on hand gesture recognition based on surface electromyography: Fundamentals, methods, applications, challenges and future trends | Journal: Applied Soft Computing | Impact Factor: 7.2 | Type: Journal Article
Pub Date: 2024-09-10 DOI: 10.1016/j.asoc.2024.112229
Recent transformer trackers mostly associate multiple reference images with the search area to adapt to the changing appearance of the target. However, they ignore the learned cross-relations between the target and its surroundings, leading to difficulties in building coherent contextual models for specific target instances. This paper presents a Temporal Relation Transformer Tracker (TRTT) for robust visual tracking, providing a concise approach to modeling temporal relations by dual target memory learning. Specifically, a temporal relation transformer network generates paired memories based on static and dynamic templates, which are reinforced interactively. The memory contains implicit relation hints that capture the relations between the tracked object and its immediate surroundings. More importantly, to ensure consistency of target instance identities between frames, the relation hints from previous frames are transferred to the current frame for merging temporal contextual attention. Our method also incorporates mechanisms for reusing favorable cross-relations and instance-specific features, thereby overcoming background interference in complex spatio-temporal interactions through a sequential constraint. Furthermore, we design a memory token sparsification method that leverages the key points of the target to eliminate interference and optimize attention calculations. Extensive experiments demonstrate that our method surpasses advanced trackers on 8 challenging benchmarks while maintaining real-time running speed.
Title: Temporal relation transformer for robust visual tracking with dual-memory learning | Journal: Applied Soft Computing | Impact Factor: 7.2 | Type: Journal Article
Pub Date: 2024-09-10 DOI: 10.1016/j.asoc.2024.112237
Although face recognition technology has been applied in many scenarios, it still suffers from many types of presentation attacks, so face anti-spoofing (FAS) has become a hot topic in computer vision. Recently, the vision transformer has been recognized as the mainstream architecture for FAS, but it typically relies on auxiliary information, sophisticated tricks and a huge number of model parameters. Because face-based identity authentication usually takes place on mobile-like devices, designing an effective and lightweight model is of great significance. Inspired by the powerful global modeling ability of self-attention and the model compression ability of knowledge distillation, a simple yet effective knowledge distillation approach is proposed for FAS under the transformer framework. Our primary idea is to leverage the rich knowledge of a teacher network pre-trained on large-scale face data to guide the learning of a lightweight student network. The main contributions of our method are threefold: (1) Feature- and logits-level distillation are combined to transfer the rich knowledge of the teacher to the student. (2) A head-aware strategy is proposed to deal with the dimension mismatch of middle encoder layers between the teacher and student networks, in which a novel attention head correlation matrix is introduced. (3) Our method can bridge the performance gap between teacher and student, and the resulting student network is extremely lightweight, with only 5 MB of parameters. Extensive experiments are conducted on three public face-spoofing datasets, CASIA-FASD, Replay-Attack and OULU-NPU. The results demonstrate that our method can obtain performance on par with or superior to most FAS methods and outperform many knowledge distillation methods. Meanwhile, the distilled student network achieves excellent performance with 17× fewer parameters and 9× faster inference compared to the teacher network.
The code will be publicly available at https://github.com/Maricle-zhangjun/HaTFAS.
Title: Efficient face anti-spoofing via head-aware transformer based knowledge distillation with 5 MB model parameters | Journal: Applied Soft Computing | Impact Factor: 7.2 | Type: Journal Article
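The abstract above combines feature-level and logits-level distillation. The sketch below shows the generic form of such a combined loss in plain Python, assuming the teacher and student features already have matching dimensions; the paper's head-aware strategy for mismatched dimensions is not reproduced here, and the names `temperature`, `alpha` and `beta` are illustrative assumptions.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def logits_kd_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
                   temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0.0)
    return temperature ** 2 * kl


def feature_kd_loss(student_feat: Sequence[float], teacher_feat: Sequence[float]) -> float:
    """Mean squared error between same-sized feature vectors."""
    return sum((s - t) ** 2 for s, t in zip(student_feat, teacher_feat)) / len(teacher_feat)


def distillation_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
                      student_feat: Sequence[float], teacher_feat: Sequence[float],
                      alpha: float = 1.0, beta: float = 1.0) -> float:
    """Weighted sum of the logits-level and feature-level terms."""
    return (alpha * logits_kd_loss(student_logits, teacher_logits)
            + beta * feature_kd_loss(student_feat, teacher_feat))


# Illustrative two-class (live vs. spoof) example with made-up values.
loss = distillation_loss([1.0, -1.0], [2.0, -2.0], [0.1, 0.4], [0.2, 0.5])
```

In practice a supervised classification loss on the ground-truth labels is usually added to these two terms.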
Pub Date: 2024-09-10 DOI: 10.1016/j.asoc.2024.112240
Multiple exposure image fusion is a technique used to obtain high dynamic range images. Due to its low cost and high efficiency, it has received a lot of attention from researchers in recent years. Currently, most deep learning-based multiple exposure image fusion methods extract features from different exposure images using a single feature extraction method. Some methods simply rely on two different modules to directly extract features. However, this approach inevitably leads to the loss of some feature information during the feature extraction process, thus further affecting the performance of the model. To minimize the loss of feature information as much as possible, we propose an ultra-high-definition (UHD) multiple exposure image fusion method based on multi-scale feature extraction. The method adopts a U-shaped structure to construct the overall network model, which can fully exploit the feature information at different levels. Additionally, we construct a novel hybrid stacking paradigm to combine convolutional neural networks and Transformer modules. This combined module can extract both local texture features and global color features simultaneously. To more efficiently fuse and extract features, we also design a cross-layer feature fusion module, which can adaptively learn the correlation between features at different layers. Numerous quantitative and qualitative results demonstrate that our proposed method performs well in UHD multiple exposure image fusion.
Title: An ultra-high-definition multi-exposure image fusion method based on multi-scale feature extraction | Journal: Applied Soft Computing | Impact Factor: 7.2 | Type: Journal Article
Pub Date: 2024-09-10 DOI: 10.1016/j.asoc.2024.112230
Batch processing machines (BPM) and transportation are seldom studied simultaneously in the flexible flow shop. In this study, the flexible flow shop scheduling problem (FFSP) with BPM at the last stage and transportation is considered, and an adaptive shuffled frog-leaping algorithm (ASFLA) is proposed to minimize makespan. To produce high-quality solutions, a heuristic is employed to produce the initial solution, and two groups are formed from all memeplexes. An adaptive memeplex search is then implemented, in which the number of searches is dynamically determined by the quality of the memeplex. An adaptive group search is also conducted by exchanging memeplexes or supporting the worse memeplex. A novel population shuffling step and elimination of the worst memeplex are proposed. A number of computational experiments are executed to test the new strategies and the performance of ASFLA. Computational results demonstrate that the new strategies are effective and that ASFLA is a very competitive algorithm for FFSP with BPM and transportation.
Title: An adaptive shuffled frog-leaping algorithm for flexible flow shop scheduling problem with batch processing machines | Journal: Applied Soft Computing | Impact Factor: 7.2 | Type: Journal Article
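For readers unfamiliar with memeplexes, the sketch below shows the partitioning step of the standard shuffled frog-leaping algorithm: solutions are ranked by objective value and dealt round-robin into groups. It illustrates the baseline scheme only, assuming makespan is minimized; the adaptive search, group exchange and elimination strategies of ASFLA are not reproduced, and the function name is hypothetical.

```python
from typing import List, Sequence


def partition_into_memeplexes(makespans: Sequence[float],
                              n_memeplexes: int) -> List[List[int]]:
    """Rank solutions by makespan (lower is better) and deal them round-robin.

    Returns, for each memeplex, the indices of the solutions assigned to it.
    """
    ranked = sorted(range(len(makespans)), key=lambda i: makespans[i])
    memeplexes: List[List[int]] = [[] for _ in range(n_memeplexes)]
    for rank, solution_index in enumerate(ranked):
        # The k-th best solution goes to memeplex k mod n_memeplexes,
        # so every memeplex receives a mix of good and poor solutions.
        memeplexes[rank % n_memeplexes].append(solution_index)
    return memeplexes


# Illustrative makespans for a population of six schedules.
makespans = [9, 3, 7, 1, 5, 8]
memeplexes = partition_into_memeplexes(makespans, n_memeplexes=2)
```

Shuffling in the standard algorithm amounts to pooling all memeplexes back into one population and repeating this partition after each round of local search.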
Pub Date: 2024-09-10 DOI: 10.1016/j.asoc.2024.112198
Population growth in crowded cities and the resulting increase in vehicle use have led to the problem of insufficient parking. When public parking lots and urban growth are not coordinated, vehicles park on the street and block the crosswalks. In the coming years, this problem will become more complicated with the addition of autonomous vehicles (AVs) to urban traffic. This study addresses the research question of how to effectively select AV parking lots in urban areas experiencing population growth and increased vehicle usage. To this end, a hybrid Multi-Criteria Decision Making (MCDM) methodology is proposed that combines the SWARA (Step-wise Weight Assessment Ratio Analysis) and TOPSIS (Technique for Order Preference by Similarity to Ideal Solution) approaches in a Fermatean Fuzzy (FF) environment. A decision hierarchy based on the SCOR model has been developed to determine and construct the evaluation criteria. Then, a case study analysis has been applied to selected districts in Istanbul, Turkiye's most populous and still-developing city. Operating expenses, safety and security, and land costs are determined as the most important factors. As a result of the detailed fuzzy analysis, the districts that should be prioritized for AV parking lots in Istanbul are determined, and finally the robustness and validity of the results are examined through sensitivity analysis. The study contributes by providing insights into AV parking lot selection, demonstrating the efficacy of the proposed methodology, and highlighting the importance of addressing this issue in urban planning.
Title: A fermatean fuzzy SWARA-TOPSIS methodology based on SCOR model for autonomous vehicle parking lot selection | Journal: Applied Soft Computing | Impact Factor: 7.2 | Type: Journal Article
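To make the ranking step concrete, here is a minimal sketch of classical (crisp) TOPSIS. It is a simplification: the study works with Fermatean fuzzy numbers and obtains criterion weights from SWARA, whereas this sketch takes plain numbers and assumes the weights are already given. The districts, criteria and values are invented for illustration.

```python
import math
from typing import List, Sequence


def topsis(matrix: Sequence[Sequence[float]], weights: Sequence[float],
           benefit: Sequence[bool]) -> List[float]:
    """Return the closeness coefficient of each alternative (higher is better).

    matrix[i][j] is the score of alternative i on criterion j;
    benefit[j] is True when larger values are better, False for cost criteria.
    """
    n_crit = len(weights)
    # Vector normalization per criterion, followed by weighting.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    columns = list(zip(*weighted))
    ideal = [max(col) if benefit[j] else min(col) for j, col in enumerate(columns)]
    anti_ideal = [min(col) if benefit[j] else max(col) for j, col in enumerate(columns)]
    scores = []
    for row in weighted:
        d_pos = math.dist(row, ideal)       # distance to the ideal solution
        d_neg = math.dist(row, anti_ideal)  # distance to the anti-ideal solution
        scores.append(d_neg / (d_pos + d_neg))
    return scores


# Three hypothetical districts scored on operating cost (cost) and safety (benefit).
scores = topsis([[100, 9], [200, 5], [150, 7]], weights=[0.6, 0.4], benefit=[False, True])
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

The sketch does not guard against the degenerate case where all alternatives are identical, in which both distances are zero.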
Pub Date: 2024-09-10 DOI: 10.1016/j.asoc.2024.112224
Sound event detection (SED) is a core technology in smart home projects that rely on detected sound events to trigger specific actions. SED systems face two major challenges: high labeling costs and complex acoustic environments. To reduce labeling costs, some semi-supervised systems extract both global and local features for classification. However, these methods treat global and local features equally, not accounting for their varying importance when recognizing different types of sound events. Furthermore, to address complex acoustic environments, some studies use multitask learning frameworks to introduce SED-related tasks as auxiliaries to improve detection performance. However, these methods fail to align tasks within the framework, leading to conflicting outputs that may limit system performance. To address these issues, in this paper we propose a “two-to-one” teacher-student learning based semi-supervised SED system. This system employs a gating mechanism to selectively enhance global and local features, improving adaptability to different types of sound events, and incorporates a cross-task alignment module to interact SED with related tasks, reducing the risk of performance degradation caused by conflicting outputs. Experimental results on two datasets demonstrate that our system achieves the best performance in all metrics, with EB-F1 scores of 48.1 % and 64.7 %, representing improvements of 15.3 % and 10.6 % over the baseline ConformerSED system, respectively. Our work offers an effective SED solution for smart home projects by providing a semi-supervised SED system that performs well while reducing labeling costs.
{"title":"A sound event detection support system for smart home based on “two-to-one” teacher–student learning","authors":"","doi":"10.1016/j.asoc.2024.112224","DOIUrl":"10.1016/j.asoc.2024.112224","url":null,"abstract":"<div><p>Sound event detection (SED) is a core technology in smart home projects that rely on detected sound events to trigger specific actions. SED systems face two major challenges: high labeling costs and complex acoustic environments. To reduce labeling costs, some semi-supervised systems extract both global and local features for classification. However, these methods treat global and local features equally, not accounting for their varying importance when recognizing different types of sound events. Furthermore, to address complex acoustic environments, some studies use multitask learning frameworks to introduce SED-related tasks as auxiliaries to improve detection performance. However, these methods fail to align tasks within the framework, leading to conflicting outputs that may limit system performance. To address these issues, in this paper we propose a “two-to-one” teacher-student learning based semi-supervised SED system. This system employs a gating mechanism to selectively enhance global and local features, improving adaptability to different types of sound events, and incorporates a cross-task alignment module to interact SED with related tasks, reducing the risk of performance degradation caused by conflicting outputs. Experimental results on two datasets demonstrate that our system achieves the best performance in all metrics, with EB-F1 scores of 48.1 % and 64.7 %, representing improvements of 15.3 % and 10.6 % over the baseline ConformerSED system, respectively. 
Our work offers an effective SED solution for smart home projects by providing a semi-supervised SED system that performs well while reducing labeling costs.</p></div>","PeriodicalId":50737,"journal":{"name":"Applied Soft Computing","volume":null,"pages":null},"PeriodicalIF":7.2,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142238715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
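The abstract above does not specify how its gating mechanism is built, so the following is only a minimal sketch of one common form of gated fusion, under the assumption that a per-dimension sigmoid gate decides how much of the global feature versus the local feature is kept. The function name `gated_fusion` and the parameters `w_global`, `w_local`, and `bias` are hypothetical and would normally be learned.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(global_feat, local_feat, w_global, w_local, bias):
    """Blend global and local feature vectors with a per-dimension gate.

    For each dimension, gate = sigmoid(w_global*g + w_local*l + bias);
    the output is gate*g + (1 - gate)*l, so a gate near 1 favours the
    global feature and a gate near 0 favours the local feature.
    """
    fused = []
    for g, l, wg, wl, b in zip(global_feat, local_feat, w_global, w_local, bias):
        gate = sigmoid(wg * g + wl * l + b)
        fused.append(gate * g + (1.0 - gate) * l)
    return fused


# With all gate parameters at zero the gate is 0.5, so the two views are averaged.
balanced = gated_fusion([1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0])
```

Because the gate depends on the input features, different sound events can lean on global or local context to different degrees, which is the behaviour the abstract attributes to its gating mechanism.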
Pub Date : 2024-09-08 DOI: 10.1016/j.asoc.2024.112214
The reasonable ranking of the binary pairs that characterize fuzzy information is very important in many fuzzy decision problems. To overcome some defects of the existing score functions for q-rung orthopair fuzzy numbers (q-ROFNs), a novel score function and ranking criterion are proposed based on the q-compression transformation and a hesitation factor. The main motivation is to introduce the generalized Zhenyuan (GZ)-integral into the q-ROFN environment and to transform the aggregation operations into a linear programming problem through the arithmetic operations of q-ROFNs. The main contribution is to solve the aggregation problem of the q-rung orthopair fuzzy generalized Zhenyuan integral ordered weighted average (q-ROFGZIOWA) operator through linear programming, and to establish a new decision-making method using the q-ROFGZIOWA operator and the ranking criterion. The main innovation is that all q-ROFNs are mapped to the unit triangle in the first quadrant (that is, converted into intuitionistic fuzzy numbers, IFNs) by the q-compression transformation, which has a geometric interpretation; the novel score function and its ranking criterion are then obtained by incorporating the hesitation factor; and the aggregation operation based on the generalized Zhenyuan integral is converted into a linear programming optimization problem. Finally, the superiority of the proposed method is verified by comparing the aggregation results of two integral operators on an example, and the method is applied to talent recruitment decision-making in universities. The proposed method not only corrects some flaws in existing rankings of q-ROFNs but also overcomes some defects of existing Choquet integral average (geometric) operators in a q-ROFN environment. These results are of great significance for further research on the widespread application of q-ROFNs.
{"title":"A linear programming aggregation method based on generalized Zhenyuan integral in q-ROFN environment and the application of talent recruitment in universities","authors":"","doi":"10.1016/j.asoc.2024.112214","DOIUrl":"10.1016/j.asoc.2024.112214","url":null,"abstract":"<div><p>The reasonable ranking of the binary pairs that characterize fuzzy information is very important in many fuzzy decision problems. To overcome some defects of the existing score functions for q-rung orthopair fuzzy numbers (q-ROFNs), a novel score function and ranking criterion are proposed based on the q-compression transformation and a hesitation factor. The main motivation is to introduce the generalized Zhenyuan (GZ)-integral into the q-ROFN environment and to transform the aggregation operations into a linear programming problem through the arithmetic operations of q-ROFNs. The main contribution is to solve the aggregation problem of the q-rung orthopair fuzzy generalized Zhenyuan integral ordered weighted average (q-ROFGZIOWA) operator through linear programming, and to establish a new decision-making method using the q-ROFGZIOWA operator and the ranking criterion. The main innovation is that all q-ROFNs are mapped to the unit triangle in the first quadrant (that is, converted into intuitionistic fuzzy numbers, IFNs) by the q-compression transformation, which has a geometric interpretation; the novel score function and its ranking criterion are then obtained by incorporating the hesitation factor; and the aggregation operation based on the generalized Zhenyuan integral is converted into a linear programming optimization problem. Finally, the superiority of the proposed method is verified by comparing the aggregation results of two integral operators on an example, and the method is applied to talent recruitment decision-making in universities. 
The proposed method not only corrects some flaws in existing rankings of q-ROFNs but also overcomes some defects of existing Choquet integral average (geometric) operators in a q-ROFN environment. These results are of great significance for further research on the widespread application of q-ROFNs.</p></div>","PeriodicalId":50737,"journal":{"name":"Applied Soft Computing","volume":null,"pages":null},"PeriodicalIF":7.2,"publicationDate":"2024-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142272705","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
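To make the q-ROFN vocabulary in the abstract above concrete, here is a small sketch of a q-rung orthopair fuzzy number with the widely used baseline score μ^q − ν^q and the hesitation degree (1 − μ^q − ν^q)^(1/q). The paper's novel score function is not given in the abstract and is not reproduced here. The `to_ifn` method is an assumption about what the q-compression transformation does, namely mapping (μ, ν) to (μ^q, ν^q), which always satisfies the intuitionistic constraint; the class name `QROFN` is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QROFN:
    """A q-rung orthopair fuzzy number (mu, nu) with mu^q + nu^q <= 1."""
    mu: float   # membership degree
    nu: float   # non-membership degree
    q: int = 3  # rung

    def __post_init__(self):
        if not (0.0 <= self.mu <= 1.0 and 0.0 <= self.nu <= 1.0):
            raise ValueError("mu and nu must lie in [0, 1]")
        if self.mu ** self.q + self.nu ** self.q > 1.0 + 1e-12:
            raise ValueError("q-ROFN constraint mu^q + nu^q <= 1 violated")

    def score(self):
        """Baseline score function, not the paper's novel one."""
        return self.mu ** self.q - self.nu ** self.q

    def hesitation(self):
        """Hesitation degree: the part of the unit mass left unassigned."""
        rest = max(0.0, 1.0 - self.mu ** self.q - self.nu ** self.q)
        return rest ** (1.0 / self.q)

    def to_ifn(self):
        """Assumed q-compression: (mu^q, nu^q) lies in the unit triangle."""
        return (self.mu ** self.q, self.nu ** self.q)


a = QROFN(0.5, 0.5)
b = QROFN(0.7, 0.7)
tie = a.score() == b.score()                     # the baseline score cannot separate a and b
more_hesitant = a.hesitation() > b.hesitation()  # but their hesitation degrees differ
```

The pair `a`, `b` shows the kind of ranking defect the abstract alludes to: both numbers get a baseline score of zero, so a criterion that also takes the hesitation factor into account is needed to order them.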