In this paper, an adaptive threshold-based gait authentication model is proposed. It incorporates a quality measure in the distance domain and maps it into the gradient domain to derive an optimal threshold for each gait sample, in contrast to the fixed threshold that most authentication models use. To assess the quality of each gait sample, a gait covariate-invariant generative adversarial network (GCI-GAN) is proposed that generates normal gait (the canonical condition) irrespective of covariates (carrying and viewing conditions) while preserving subject identity. In particular, GCI-GAN uses gradient-weighted class activation mapping (Grad-CAM) to obtain an attention mask over the most significant components of the input features, applies a blending operation to manipulate specific regions of the input, and finally employs multiple losses to constrain the quality of the generated samples. We validate the approach on the CASIA-B and OU-ISIR gait datasets and show a substantial increase in authentication rate over other state-of-the-art techniques.
{"title":"An adaptive threshold based gait authentication by incorporating quality measures","authors":"Sonia Das, Sukadev Meher, Upendra Kumar Sahoo","doi":"10.3233/aic-230121","DOIUrl":"https://doi.org/10.3233/aic-230121","url":null,"abstract":"In this paper, an adaptive threshold-based gait authentication model is proposed, which incorporates the quality measure in the distance domain and maps them into the gradient domain to realize the optimal threshold of each gait sample, in contrast to the fixed threshold, as most of the authentication model utilizes. For accessing the quality measure of each gait, a gait covariate invariant generative adversarial network (GCI-GAN) is proposed to generate normal gait (canonical condition) irrespective of covariates (carrying, and viewing conditions) while preserving the subject identity. In particular, GCI-GAN connects to gradient weighted class activation mapping (Grad-CAMs) to obtain an attention mask from the significant components of input features, employs blending operation to manipulate specific regions of the input, and finally, multiple losses are employed to constrain the quality of generated samples. We validate the approach on gait datasets of CASIA-B and OU-ISIR and show a substantial increase in authentication rate over other state-of-the-art techniques.","PeriodicalId":50835,"journal":{"name":"AI Communications","volume":"54 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135167098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We present an approach to autonomous drone racing inspired by how a human pilot learns a race track. Human pilots fly around the track multiple times to familiarise themselves with it and to find key points that allow them to complete the track without risk of collision. This paper proposes a three-stage approach: exploration, navigation, and refinement. Our approach requires no prior knowledge of the race track, such as the number of gates or their positions and orientations. Instead, in the exploration stage, we use a trained neural pilot called DeepPilot to produce basic flight commands from camera images in which a gate is visible, and a Single Shot Detector to visually detect the gates and identify points of interest. These points are then used in the navigation stage as waypoints for a flight controller, enabling faster flight around the entire race track. Finally, in the refinement stage, we use the methodology developed in stages 1 and 2 to generate novel data to re-train DeepPilot, which produces more realistic manoeuvres when the drone has to cross a gate. In this sense, rather than generating examples by flying a full track as in the original work, we use small tracks of three gates to discover effective waypoints for the waypoint controller. This produces novel training data for DeepPilot without human intervention. Trained with this new data, DeepPilot significantly improves its performance, doubling its flight speed relative to the original version. Moreover, in stage 3 we required 66% less training data than the original DeepPilot without compromising its effectiveness in enabling a drone to fly a racetrack autonomously.
{"title":"Effective training to improve DeepPilot","authors":"L. Oyuki Rojas-Perez, Jose Martinez-Carranza","doi":"10.3233/aic-230065","DOIUrl":"https://doi.org/10.3233/aic-230065","url":null,"abstract":"We present an approach to autonomous drone racing inspired by how a human pilot learns a race track. Human pilots drive around the track multiple times to familiarise themselves with the track and find key points that allow them to complete the track without the risk of collision. This paper proposes a three-stage approach: exploration, navigation, and refinement. Our approach does not require prior knowledge about the race track, such as the number of gates, their positions, and their orientations. Instead, we use a trained neural pilot called DeepPilot to return basic flight commands from camera images where a gate is visible to navigate an unknown race track and a Single Shot Detector to visually detect the gates during the exploration stage to identify points of interest. These points are then used in the navigation stage as waypoints in a flight controller to enable faster flight and navigate the entire race track. Finally, in the refinement stage, we use the methodology developed in stages 1 and 2, to generate novel data to re-train DeepPilot, which produces more realistic manoeuvres for when the drone has to cross a gate. In this sense, similar to the original work, rather than generating examples by flying in a full track, we use small tracks of three gates to discover effective waypoints to be followed by the waypoint controller. This produces novel training data for DeepPilot without human intervention. By training with this new data, DeepPilot significantly improves its performance by increasing its flight speed twice w.r.t. its original version. Also, for this stage 3, we required 66 % less training data than in the original DeepPilot without compromising the effectiveness of DeepPilot to enable a drone to autonomously fly in a racetrack.","PeriodicalId":50835,"journal":{"name":"AI Communications","volume":"33 1-2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135266791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A long-standing challenge in artificial intelligence is lifelong reinforcement learning, where learners are given many tasks in sequence and must transfer knowledge between tasks while avoiding catastrophic forgetting. Policy reuse and other multi-policy reinforcement learning techniques can learn multiple tasks but may generate many policies. This paper presents two novel contributions: 1) Lifetime Policy Reuse, a model-agnostic policy reuse algorithm that avoids generating many policies by optimising a fixed number of near-optimal policies through a combination of policy optimisation and adaptive policy selection; and 2) the task capacity, a measure of the maximal number of tasks that a policy can accurately solve. In experiments with two state-of-the-art base-learners, the results demonstrate the importance of Lifetime Policy Reuse and task-capacity-based pre-selection on an 18-task partially observable Pacman domain and a Cartpole domain of up to 125 tasks.
{"title":"Lifetime policy reuse and the importance of task capacity","authors":"David M. Bossens, Adam J. Sobey","doi":"10.3233/aic-230040","DOIUrl":"https://doi.org/10.3233/aic-230040","url":null,"abstract":"A long-standing challenge in artificial intelligence is lifelong reinforcement learning, where learners are given many tasks in sequence and must transfer knowledge between tasks while avoiding catastrophic forgetting. Policy reuse and other multi-policy reinforcement learning techniques can learn multiple tasks but may generate many policies. This paper presents two novel contributions, namely 1) Lifetime Policy Reuse, a model-agnostic policy reuse algorithm that avoids generating many policies by optimising a fixed number of near-optimal policies through a combination of policy optimisation and adaptive policy selection; and 2) the task capacity, a measure for the maximal number of tasks that a policy can accurately solve. Comparing two state-of-the-art base-learners, the results demonstrate the importance of Lifetime Policy Reuse and task capacity based pre-selection on an 18-task partially observable Pacman domain and a Cartpole domain of up to 125 tasks.","PeriodicalId":50835,"journal":{"name":"AI Communications","volume":"72 5-6","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135220355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Treating all samples equally is the generic paradigm in 3D object detection. Although some works discriminate among samples during the training of object detectors, whether a sample actually detects its target GT (Ground Truth) during training has never been studied. In this work, we first show that distinguishing the samples that detect their target GT from those that do not is beneficial to performance as measured by mAP (mean Average Precision). We then propose a novel approach named DW (Detected Weight). The proposed approach dynamically calculates and assigns different weights to detected and undetected samples, suppressing the former and promoting the latter. The approach is simple and computationally cheap, and it can be combined with existing weighting approaches. Further, because it is independent of network structure, it can be applied to almost all 3D detectors, and even to 2D detectors. We evaluate the proposed approach with six state-of-the-art 3D detectors on two datasets. The experimental results show that the proposed approach improves mAP significantly.
{"title":"DW: Detected weight for 3D object detection","authors":"Zhi Huang","doi":"10.3233/aic-230008","DOIUrl":"https://doi.org/10.3233/aic-230008","url":null,"abstract":"It is a generic paradigm to treat all samples equally in 3D object detection. Although some works focus on discriminating samples in the training process of object detectors, the issue of whether a sample detects its target GT (Ground Truth) during training process has never been studied. In this work, we first point out that discriminating the samples that detect their target GT and the samples that don’t detect their target GT is beneficial to improve the performance measured in terms of mAP (mean Average Precision). Then we propose a novel approach name as DW (Detected Weight). The proposed approach dynamically calculates and assigns different weights to detected and undetected samples, which suppresses the former and promotes the latter. The approach is simple, low-calculation and can be integrated with available weight approaches. Further, it can be applied to almost 3D detectors, even 2D detectors because it is nothing to do with network structures. We evaluate the proposed approach with six state-of-the-art 3D detectors on two datasets. The experiment results show that the proposed approach improves mAP significantly.","PeriodicalId":50835,"journal":{"name":"AI Communications","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135805685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gait has unique physiological characteristics and supports long-distance recognition, making gait recognition ideal for areas such as home security and identity detection. Methods using graph convolutional networks usually extract features in the spatial and temporal dimensions by stacking GCNs and TCNs, but different joints are interconnected at different moments, so splitting the spatial and temporal dimensions can lose gait information. Focusing on this problem, we propose a gait recognition network, Multi-scale Spatio-Temporal Gait (MST-Gait), which learns multi-scale gait information from the spatial and temporal dimensions simultaneously. We design a multi-scale spatio-temporal groups Transformer (MSTGT) to model the correlation of intra-frame and inter-frame joints simultaneously, and a multi-scale segmentation strategy to capture the periodic and local features of the gait. To fully exploit the temporal information of gait motion, we design a fusion temporal convolution (FTC) to aggregate temporal information at different scales together with motion information. Experiments on the popular CASIA-B gait dataset and the OUMVLP-Pose dataset show that our method outperforms most existing skeleton-based methods, verifying the effectiveness of the proposed modules.
{"title":"Multi-scale spatio-temporal network for skeleton-based gait recognition","authors":"Dongzhi He, Yongle Xue, Yunyu Li, Zhijie Sun, Xingmei Xiao, Jin Wang","doi":"10.3233/aic-230033","DOIUrl":"https://doi.org/10.3233/aic-230033","url":null,"abstract":"Gait has unique physiological characteristics and supports long-distance recognition, so gait recognition is ideal for areas such as home security and identity detection. Methods using graph convolutional networks usually extract features in the spatial and temporal dimensions by stacking GCNs and TCNs, but different joints are interconnected at different moments, so splitting the spatial and temporal dimensions can cause the loss of gait information. Focus on this problem, we propose a gait recognition network, Multi-scale Spatio-Temporal Gait (MST-Gait), which can learn multi-scale gait information simultaneously from spatial and temporal dimensions. We design a multi-scale spatio-temporal groups Transformer (MSTGT) to model the correlation of intra-frame and inter-frame joints simultaneously. And a multi-scale segmentation strategy is designed to capture the periodic and local features of the gait. To fully exploit the temporal information of gait motion, we design a fusion temporal convolution (FTC) to aggregate temporal information at different scales and motion information. Experiments on the popular CASIA-B gait dataset and OUMVLP-Pose dataset show that our method outperforms most existing skeleton-based methods, verifying the effectiveness of the proposed modules.","PeriodicalId":50835,"journal":{"name":"AI Communications","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135804908","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Session-based recommendation aims at predicting the next behavior given the current interaction sequence. Recent advances demonstrate the effectiveness of dual cross-domain information for session-based recommendation. However, accurately modeling session representations remains challenging because of the complexity of preference interactions across domains, and existing methods model only the common features of the two domains, ignoring the domain-specific and enhanced features. Without modeling the complete features, these methods suffer from poor recommendation accuracy. We therefore propose an end-to-end dual cross-domain model with multi-channel interaction (DCMI), which utilizes dual cross-domain session information and multiple preference interaction encoders for session-based recommendation. In DCMI, we apply a graph neural network to generate a global preference and a local preference for each session. We then design a cross-preference interaction module to capture the common, specific, and enhanced features of cross-domain sessions from the local and global preferences. Finally, we combine the multiple preferences with a bilinear fusion mechanism to characterize sessions and make recommendations. Experimental results on the Amazon dataset demonstrate the superiority of DCMI over state-of-the-art methods.
{"title":"Dual cross-domain session-based recommendation with multi-channel integration","authors":"Jinjin Zhang, Xiang Hua, Peng Zhao, Kai Kang","doi":"10.3233/aic-230084","DOIUrl":"https://doi.org/10.3233/aic-230084","url":null,"abstract":"Session-based recommendation aims at predicting the next behavior when the current interaction sequence is given. Recent advances evaluate the effectiveness of dual cross-domain information for the session-based recommendation. However, we discover that accurately modeling the session representations is still a challenging problem due to the complexity of preference interactions in the cross-domain, and various methods are proposed to only model the common features of cross-domain, while ignoring the specific features and enhanced features for the dual cross-domain. Without modeling the complete features, the existing methods suffer from poor recommendation accuracy. Therefore, we propose an end-to-end dual cross-domain with multi-channel interaction model (DCMI), which utilizes dual cross-domain session information and multiple preference interaction encoders, for session-based recommendation. In DCMI, we apply a graph neural network to generate the session global preference and local preference. Then, we design a cross-preference interaction module to capture the common, specific, and enhanced features for cross-domain sessions with local preferences and global preferences. Finally, we combine multiple preferences with a bilinear fusion mechanism to characterize and make recommendations. Experimental results on the Amazon dataset demonstrate the superiority of the DCMI model over the state-of-the-art methods.","PeriodicalId":50835,"journal":{"name":"AI Communications","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135805221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fire monitoring of fire-prone areas is essential. To meet the requirements of edge deployment while balancing fire-recognition accuracy and speed, we design a lightweight fire recognition network, Conflagration-YOLO. Conflagration-YOLO is built from depthwise separable convolutions and pays more attention to extracting fire feature information from a three-dimensional (3D) perspective, which improves the network's feature extraction capability, balances accuracy and speed, and reduces model parameters. In addition, a new activation function improves the accuracy of fire recognition while minimizing the network's inference time. All models are trained and validated on a custom fire dataset, and fire inference is performed on the CPU. The mean Average Precision (mAP) of the proposed model reaches 80.92%, a large advantage over Faster R-CNN. Compared with YOLOv3-Tiny, the proposed model has 5.71 M fewer parameters and improves mAP by 6.67%. Compared with YOLOv4-Tiny, the number of parameters decreases by 3.54 M, mAP increases by 8.47%, and inference time decreases by 62.59 ms. Compared with YOLOv5s, the number of parameters is nearly halved, a reduction of 4.45 M, and the inference time is reduced by 41.87 ms. Compared with YOLOX-Tiny, the number of parameters decreases by 2.5 M, mAP increases by 0.7%, and inference time decreases by 102.49 ms. Compared with YOLOv7, the number of parameters decreases significantly while accuracy and speed remain balanced. Compared with YOLOv7-Tiny, the number of parameters decreases by 3.64 M, mAP increases by 0.5%, and inference time decreases by 15.65 ms. The experiments verify the superiority and effectiveness of Conflagration-YOLO compared with state-of-the-art (SOTA) network models. Furthermore, our proposed model and its dimensional variants can be applied to downstream computer vision target detection tasks in other scenarios as required.
{"title":"Conflagration-YOLO: a lightweight object detection architecture for conflagration","authors":"Ning Sun, Pengfei Shen, Xiaoling Ye, Yifei Chen, Xiping Cheng, Pingping Wang, Jie Min","doi":"10.3233/aic-230094","DOIUrl":"https://doi.org/10.3233/aic-230094","url":null,"abstract":"Fire monitoring of fire-prone areas is essential, and in order to meet the requirements of edge deployment and the balance of fire recognition accuracy and speed, we design a lightweight fire recognition network, Conflagration-YOLO. Conflagration-YOLO is constructed by depthwise separable convolution and more attention to fire feature information extraction from a three-dimensional(3D) perspective, which improves the network feature extraction capability, achieves a balance of accuracy and speed, and reduces model parameters. In addition, a new activation function is used to improve the accuracy of fire recognition while minimizing the inference time of the network. All models are trained and validated on a custom fire dataset and fire inference is performed on the CPU. The mean Average Precision(mAP) of the proposed model reaches 80.92%, which has a great advantage compared with Faster R-CNN. Compared with YOLOv3-Tiny, the proposed model decreases the number of parameters by 5.71 M and improves the mAP by 6.67%. Compared with YOLOv4-Tiny, the number of parameters decreases by 3.54 M, mAP increases by 8.47%, and inference time decreases by 62.59 ms. Compared with YOLOv5s, the difference in the number of parameters is nearly twice reduced by 4.45 M and the inference time is reduced by 41.87 ms. Compared with YOLOX-Tiny, the number of parameters decreases by 2.5 M, mAP increases by 0.7%, and inference time decreases by 102.49 ms. Compared with YOLOv7, the number of parameters decreases significantly and the balance of accuracy and speed is achieved. Compared with YOLOv7-Tiny, the number of parameters decreases by 3.64 M, mAP increases by 0.5%, and inference time decreases by 15.65 ms. The experiment verifies the superiority and effectiveness of Conflagration-YOLO compared to the state-of-the-art (SOTA) network model. Furthermore, our proposed model and its dimensional variants can be applied to computer vision downstream target detection tasks in other scenarios as required.","PeriodicalId":50835,"journal":{"name":"AI Communications","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135805222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The increasing demand for mobility in our society poses various challenges to traffic engineering, computer science in general, and artificial intelligence in particular. Increasing the capacity of road networks is not always possible; a more efficient use of the available transportation infrastructure is therefore required. Another issue is that many problems in traffic management and control are inherently decentralized and/or require adaptation to the traffic situation, hence the close relationship to multiagent reinforcement learning. However, using reinforcement learning poses the challenge that the state space is normally large and continuous, so appropriate schemes are needed to deal with its discretization. To address these issues, a multiagent system was previously proposed in which agents learn independently via an algorithm based on estimating Q-values from k-nearest neighbors. In the present paper, we extend this approach to include the transfer of experiences among the agents, especially when an agent does not have a good set of k experiences. We deal with traffic signal control, running experiments on a traffic network in which the traffic situation varies over time, and compare our approach to two baselines (one involving reinforcement learning and one based on fixed times). Our results show that the extended method pays off when an agent returns to an already experienced traffic situation.
{"title":"Transferring experiences in k-nearest neighbors based multiagent reinforcement learning: an application to traffic signal control","authors":"Ana Lucia C. Bazzan, Vicente N. de Almeida, Monireh Abdoos","doi":"10.3233/aic-220305","DOIUrl":"https://doi.org/10.3233/aic-220305","url":null,"abstract":"The increasing demand for mobility in our society poses various challenges to traffic engineering, computer science in general, and artificial intelligence in particular. Increasing the capacity of road networks is not always possible, thus a more efficient use of the available transportation infrastructure is required. Another issue is that many problems in traffic management and control are inherently decentralized and/or require adaptation to the traffic situation. Hence, there is a close relationship to multiagent reinforcement learning. However, using reinforcement learning poses the challenge that the state space is normally large and continuous, thus it is necessary to find appropriate schemes to deal with discretization of the state space. To address these issues, a multiagent system with agents learning independently via a learning algorithm was proposed, which is based on estimating Q-values from k-nearest neighbors. In the present paper, we extend this approach and include transfer of experiences among the agents, especially when an agent does not have a good set of k experiences. We deal with traffic signal control, running experiments on a traffic network in which we vary the traffic situation along time, and compare our approach to two baselines (one involving reinforcement learning and one based on fixed times). Our results show that the extended method pays off when an agent returns to an already experienced traffic situation.","PeriodicalId":50835,"journal":{"name":"AI Communications","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135586495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
As the research community focuses on improving the reliability of deep learning, identifying out-of-distribution (OOD) data has become crucial. Detecting OOD inputs at test/prediction time allows the model to account for discriminative features unknown to it. This capability increases the model's reliability, since the model then provides a class prediction only for incoming data similar to the training data. Although OOD detection is well established in computer vision, it is relatively unexplored in other areas, such as time-series-based human activity recognition (HAR). Since uncertainty has been a critical driver of OOD detection in vision-based models, the same component proves effective in time-series applications. In this work, we propose an ensemble-based temporal learning framework to address the OOD detection problem in HAR with time-series data. First, we define different types of OOD for HAR that arise from realistic scenarios. We then apply our ensemble-based temporal learning framework, which incorporates uncertainty, to detect OODs for the defined HAR workloads. This formulation also enables a novel approach to fall detection: we train our model on non-fall activities and detect falls as OOD. Our method achieves state-of-the-art performance on a fall detection task using far less data. Furthermore, the ensemble framework outperforms the traditional deep-learning baseline on the OOD detection task across all the other chosen datasets.
{"title":"Classifying falls using out-of-distribution detection in human activity recognition","authors":"Debaditya Roy, Vangjush Komini, Sarunas Girdzijauskas","doi":"10.3233/aic-220205","DOIUrl":"https://doi.org/10.3233/aic-220205","url":null,"abstract":"As the research community focuses on improving the reliability of deep learning, identifying out-of-distribution (OOD) data has become crucial. Detecting OOD inputs during test/prediction allows the model to account for discriminative features unknown to the model. This capability increases the model’s reliability since this model provides a class prediction solely at incoming data similar to the training one. Although OOD detection is well-established in computer vision, it is relatively unexplored in other areas, like time series-based human activity recognition (HAR). Since uncertainty has been a critical driver for OOD in vision-based models, the same component has proven effective in time-series applications. In this work, we propose an ensemble-based temporal learning framework to address the OOD detection problem in HAR with time-series data. First, we define different types of OOD for HAR that arise from realistic scenarios. Then we apply our ensemble-based temporal learning framework incorporating uncertainty to detect OODs for the defined HAR workloads. This particular formulation also allows a novel approach to fall detection. We train our model on non-fall activities and detect falls as OOD. Our method shows state-of-the-art performance in a fall detection task using much lesser data. Furthermore, the ensemble framework outperformed the traditional deep-learning method (our baseline) on the OOD detection task across all the other chosen datasets.","PeriodicalId":50835,"journal":{"name":"AI Communications","volume":" ","pages":""},"PeriodicalIF":0.8,"publicationDate":"2023-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49090518","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Accurate segmentation of skin cancer is crucial for doctors to identify and treat lesions. Researchers are increasingly pairing Transformers with auxiliary modules to improve a model's ability to process global context information and to reduce the loss of detail. Moreover, diseased skin texture differs from normal skin, and pre-processed texture images can reflect the shape and edge information of the diseased area. We propose TMTrans (Texture Mixed Transformers). We design a dual-axis attention mechanism (IEDA-Trans) that considers both global context and local information, as well as a multi-scale fusion (MSF) module that associates surface shape information with deep semantics. Additionally, we utilize TE (Texture Enhance) and SK (Skip connection) modules to bridge the semantic gap between encoders and decoders and to enhance texture features. Our model was evaluated on multiple skin datasets, including ISIC 2016/2017/2018 and PH2, and outperformed other convolution- and Transformer-based models. Furthermore, a generalization test on the 2018 DSB dataset yielded a nearly 2% improvement in the Dice index, demonstrating the effectiveness of our proposed model.
{"title":"TMTrans: texture mixed transformers for medical image segmentation","authors":"Lifang Chen, Tao Wang, Hongze Ge","doi":"10.3233/aic-230089","DOIUrl":"https://doi.org/10.3233/aic-230089","url":null,"abstract":"Accurate segmentation of skin cancer is crucial for doctors to identify and treat lesions. Researchers are increasingly using auxiliary modules with Transformers to optimize the model’s ability to process global context information and reduce detail loss. Additionally, diseased skin texture differs from normal skin, and pre-processed texture images can reflect the shape and edge information of the diseased area. We propose TMTrans (Texture Mixed Transformers). We have innovatively designed a dual axis attention mechanism (IEDA-Trans) that considers both global context and local information, as well as a multi-scale fusion (MSF) module that associates surface shape information with deep semantics. Additionally, we utilize TE(Texture Enhance) and SK(Skip connection) modules to bridge the semantic gap between encoders and decoders and enhance texture features. Our model was evaluated on multiple skin datasets, including ISIC 2016/2017/2018 and PH2, and outperformed other convolution and Transformer-based models. Furthermore, we conducted a generalization test on the 2018 DSB dataset, which resulted in a nearly 2% improvement in the Dice index, demonstrating the effectiveness of our proposed model.","PeriodicalId":50835,"journal":{"name":"AI Communications","volume":" ","pages":""},"PeriodicalIF":0.8,"publicationDate":"2023-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44943605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}