
Latest publications in Complex & Intelligent Systems

A dynamic dropout self-distillation method for object segmentation
IF 5.8, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-12-20. DOI: 10.1007/s40747-024-01705-8
Lei Chen, Tieyong Cao, Yunfei Zheng, Yang Wang, Bo Zhang, Jibin Yang

In knowledge distillation, a stronger teacher does not necessarily produce a stronger student, owing to the capacity mismatch between the two models. This is especially evident in pixel-level object segmentation, where some challenging pixels are difficult for the student model to learn: even if the student learns from the teacher at every pixel, its performance still struggles to improve significantly. Mimicking the easy-to-difficult learning process of human beings, a dynamic dropout self-distillation method for object segmentation is proposed, which solves this problem by discarding the knowledge that the student struggles to learn. First, the pixels where the teacher and student models differ significantly are identified from their predicted probabilities and defined as difficult-to-learn pixels for the student model. Second, a dynamic dropout strategy is proposed to match the changing capability of the student model, discarding the pixels whose knowledge is too hard for the student. Finally, to validate the effectiveness of the proposed method, a simple student model for object segmentation and a virtual teacher model with perfect segmentation accuracy are constructed. Experimental results on four public datasets demonstrate that, when there is a large performance gap between the teacher and student models, the proposed self-distillation method improves the student model more effectively than other methods.
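The core mechanism — ranking pixels by teacher–student disagreement and dropping the hardest ones from the distillation target — can be sketched as follows. This is an illustrative toy, not the authors' implementation; the function name, the fixed drop ratio, and the binary cross-entropy target are all assumptions (in the paper the drop ratio adapts dynamically to the student's capability).

```python
import numpy as np

def dropout_distillation_loss(teacher_probs, student_probs, drop_ratio):
    """Toy dynamic-dropout distillation loss (hypothetical API).

    Pixels where teacher and student predictions diverge most are treated
    as 'difficult-to-learn' and excluded from the distillation target.
    """
    # Per-pixel disagreement between teacher and student predictions.
    diff = np.abs(teacher_probs - student_probs).reshape(-1)
    # Keep only the easiest (1 - drop_ratio) fraction of pixels.
    n_keep = max(1, int(round(diff.size * (1.0 - drop_ratio))))
    keep_idx = np.argsort(diff)[:n_keep]
    t = teacher_probs.reshape(-1)[keep_idx]
    s = student_probs.reshape(-1)[keep_idx]
    # Binary cross-entropy of the student against soft teacher targets.
    eps = 1e-7
    s = np.clip(s, eps, 1.0 - eps)
    return float(-np.mean(t * np.log(s) + (1.0 - t) * np.log(1.0 - s)))
```

In the paper the ratio of discarded pixels varies over training; here it is a constant for clarity.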

Citations: 0
ADSTrack: adaptive dynamic sampling for visual tracking
IF 5.8, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-12-19. DOI: 10.1007/s40747-024-01672-0
Zhenhai Wang, Lutao Yuan, Ying Ren, Sen Zhang, Hongyu Tian

The most common approach to visual object tracking feeds an image pair, comprising a template image and a search region, into a tracker, which uses a backbone to process the information in the pair. In pure Transformer-based frameworks, redundant information in the image pair persists throughout the tracking process, and the corresponding negative tokens consume the same computational resources as positive tokens while degrading the tracker's performance. We therefore propose to solve this problem with an adaptive dynamic sampling strategy in a pure Transformer-based tracker, ADSTrack. ADSTrack progressively reduces the irrelevant, redundant negative tokens in the search region that are unrelated to the tracked object, together with the noise these tokens generate. The adaptive dynamic sampling strategy enhances the tracker's performance by scoring important tokens and sampling them adaptively, with the number of sampled tokens varying according to the input image. Moreover, the strategy is parameterless: it introduces no additional learned parameters. We also add several extra tokens as auxiliary tokens to the backbone to further optimize the feature map. We extensively evaluate ADSTrack, achieving satisfactory results on seven test sets, including UAV123 and LaSOT.
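A parameterless, input-adaptive token-pruning rule of the kind described above can be sketched like this. The thresholding rule (keep tokens scoring above the mean) is an assumption chosen to illustrate "adaptive count, no extra parameters"; it is not ADSTrack's exact criterion, and the function name is hypothetical.

```python
import numpy as np

def adaptive_token_sampling(tokens, scores):
    """Parameter-free token pruning sketch (illustrative, not ADSTrack's exact rule).

    Keeps search-region tokens whose importance score exceeds the mean
    score, so the number of surviving tokens adapts to the input image.
    """
    scores = np.asarray(scores, dtype=float)
    keep = scores > scores.mean()   # adaptive threshold, no learned parameters
    if not keep.any():              # degenerate case: keep the single best token
        keep[int(np.argmax(scores))] = True
    return tokens[keep], np.flatnonzero(keep)
```

Because the threshold is derived from the scores themselves, an image with many salient regions retains more tokens than one dominated by background.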

Citations: 0
Edge-centric optimization: a novel strategy for minimizing information loss in graph-to-text generation
IF 5.8, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-12-19. DOI: 10.1007/s40747-024-01690-y
Zheng Yao, Jingyuan Li, Jianhe Cen, Shiqi Sun, Dahu Yin, Yuanzhuo Wang

Abstract

Given the remarkable text generation capabilities of pre-trained language models, impressive results have been achieved in graph-to-text generation. However, when learning from knowledge graphs, these language models cannot fully grasp the structural information of the graph, leading to logical errors and missing key information. An important research direction is therefore to minimize the loss of graph structural information during model training. We propose a framework named Edge-Optimized Multi-Level Information Refinement (EMLR), which aims to maximize the retention of the graph's structural information from an edge perspective. Based on this framework, we further propose a new graph generation model, TriELMR, highlighting the comprehensive interactive learning relationship between the model and the graph structure, as well as the importance of edges in the graph structure. TriELMR adopts three main strategies to reduce information loss during learning: (1) Knowledge Sequence Optimization; (2) the EMLR framework; and (3) a Graph Activation Function. Experimental results reveal that TriELMR exhibits exceptional performance across various benchmark tests, especially on the webnlgv2.0 and Event Narrative datasets, achieving BLEU-4 scores of 66.5% and 37.27%, respectively, surpassing state-of-the-art models. These results demonstrate the advantages of TriELMR in preserving the accuracy of graph structural information.
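To make the "edge perspective" concrete: graph-to-text models consume a linearized graph, and an edge-centric serialization keeps every (head, relation, tail) triple intact so no relation is separated from its endpoints. The sketch below is a generic linearization of this kind; the special tokens and function name are hypothetical and not taken from EMLR/TriELMR.

```python
def linearize_edges(triples):
    """Edge-centric linearization sketch (hypothetical format, not EMLR's).

    Serializes a knowledge graph one complete edge (head, relation, tail)
    at a time, so each relation stays adjacent to both of its endpoints.
    """
    parts = []
    for head, rel, tail in triples:
        parts.append(f"<H> {head} <R> {rel} <T> {tail}")
    return " ".join(parts)
```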

Graphical abstract

Citations: 0
CitySEIRCast: an agent-based city digital twin for pandemic analysis and simulation
IF 5.8, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-12-19. DOI: 10.1007/s40747-024-01683-x
Shakir Bilal, Wajdi Zaatour, Yilian Alonso Otano, Arindam Saha, Ken Newcomb, Soo Kim, Jun Kim, Raveena Ginjala, Derek Groen, Edwin Michael

The COVID-19 pandemic has dramatically highlighted the importance of developing simulation systems for quickly characterizing and providing spatio-temporal forecasts of infection spread dynamics that take specific account of the population and spatial heterogeneities governing pathogen transmission in real-world communities. Developing such computational systems must also overcome the cold-start problem posed by the inevitably scarce early data and limited extant knowledge regarding a novel pathogen's transmissibility and virulence, while addressing changing population behavior and policy options as a pandemic evolves. Here, we describe how we have coupled advances in the construction of digital or virtual models of real-world cities with an agile, modular, agent-based model of viral transmission and with data from navigation and social-media interactions to overcome these challenges and provide a new simulation tool, CitySEIRCast, that can model viral spread at the sub-national level. Our data pipelines and workflows are designed to be flexible and scalable, so that we can implement the system on hybrid cloud/cluster systems and remain agile enough to address different population settings and, indeed, different diseases. Our simulation results demonstrate that CitySEIRCast can provide the timely, high-resolution spatio-temporal epidemic predictions required for supporting situational awareness of the state of a pandemic, for assessing vulnerable sub-populations and locations, and for evaluating the impacts of implemented interventions, inclusive of the effects of population behavioral response to fluctuations in case incidence. This work arose in response to requests from county agencies to support their work on COVID-19 monitoring, risk assessment, and planning; using the described workflows, we were able to provide uninterrupted bi-weekly simulations to guide their efforts for over a year, from late 2021 to 2023.
We discuss future work that can significantly improve the scalability and real-time application of this digital-city-based epidemic modelling system, such that validated predictions and forecasts of the paths that may be followed by a contagion, both over time and space, can be used to anticipate spread dynamics, at-risk groups and regions, and options for responding effectively to a complex epidemic.
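The SEIR dynamics at the heart of such an agent-based system can be illustrated with a deliberately minimal toy: each agent is Susceptible, Exposed, Infectious, or Recovered, infectious agents expose a few random contacts per step, and fixed timers drive the E→I and I→R transitions. All parameter names and values below are illustrative assumptions; CitySEIRCast itself layers spatial heterogeneity, behavior, and real mobility data on top of this kind of core.

```python
import random

def seir_abm(n_agents=500, contacts=8, p_transmit=0.06,
             incubation=3, infectious=5, seed_infected=5,
             steps=60, rng_seed=0):
    """Minimal agent-based SEIR sketch (toy model, not CitySEIRCast itself)."""
    rng = random.Random(rng_seed)
    state = ["S"] * n_agents          # everyone starts susceptible
    timer = [0] * n_agents
    for i in range(seed_infected):    # seed a few infectious agents
        state[i], timer[i] = "I", infectious
    history = []
    for _ in range(steps):
        # Exposure: each infectious agent contacts a few random agents.
        for i in [k for k, s in enumerate(state) if s == "I"]:
            for j in rng.sample(range(n_agents), contacts):
                if state[j] == "S" and rng.random() < p_transmit:
                    state[j], timer[j] = "E", incubation
        # Progression: timers drive E -> I and I -> R.
        for i in range(n_agents):
            if state[i] in ("E", "I"):
                timer[i] -= 1
                if timer[i] <= 0:
                    state[i], timer[i] = (
                        ("I", infectious) if state[i] == "E" else ("R", 0))
        history.append({s: state.count(s) for s in "SEIR"})
    return history
```

Running it yields per-step S/E/I/R counts, the raw material for the kind of epidemic curves a digital twin would report at far higher spatial resolution.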

Citations: 0
Adaptive micro partition and hierarchical merging for accurate mixed data clustering
IF 5.8, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-12-19. DOI: 10.1007/s40747-024-01695-7
Yunfan Zhang, Rong Zou, Yiqun Zhang, Yue Zhang, Yiu-ming Cheung, Kangshun Li

Heterogeneous attribute data (also called mixed data), characterized by attributes with numerical and categorical values, occur frequently across various scenarios. Since the annotation cost is high, clustering has emerged as a favorable technique for analyzing unlabeled mixed data. To address the complex real-world clustering task, this paper proposes a new clustering method called Adaptive Micro Partition and Hierarchical Merging (AMPHM) based on neighborhood rough set theory and a novel hierarchical merging mechanism. Specifically, we present a distance metric unified on numerical and categorical attributes to leverage neighborhood rough sets in partitioning data objects into fine-grained compact clusters. Then, we gradually merge the current most similar clusters to avoid incorporating dissimilar objects into a similar cluster. It turns out that the proposed approach breaks through the clustering performance bottleneck brought by the pre-set number of sought clusters k and cluster distribution bias, and is thus capable of clustering datasets comprising various combinations of numerical and categorical attributes. Extensive experimental evaluations comparing the proposed AMPHM with state-of-the-art counterparts on various datasets demonstrate its superiority.
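A distance "unified on numerical and categorical attributes" can be sketched in the style below: numerical attributes contribute a range-normalized absolute difference and categorical attributes a 0/1 mismatch, so every attribute contributes a value in [0, 1]. This is a generic Gower-style illustration, not AMPHM's neighborhood-rough-set metric; the function name and argument layout are assumptions.

```python
def mixed_distance(x, y, numeric_idx, ranges):
    """Unified mixed-attribute distance sketch (illustrative, not AMPHM's metric).

    Numerical attributes: range-normalized absolute difference.
    Categorical attributes: 0/1 mismatch. Both live on [0, 1].
    """
    d = 0.0
    for k, (a, b) in enumerate(zip(x, y)):
        if k in numeric_idx:
            span = ranges[k] or 1.0      # guard against a zero attribute range
            d += abs(a - b) / span
        else:
            d += 0.0 if a == b else 1.0
    return d / len(x)
```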

Citations: 0
LDWLE: self-supervised driven low-light object detection framework
IF 5.8, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-12-19. DOI: 10.1007/s40747-024-01681-z
Xiaoyang shen, Haibin Li, Yaqian Li, Wenming Zhang

Low-light object detection involves identifying and locating objects in images captured under poor lighting conditions. It plays a significant role in surveillance and security, night pedestrian recognition, and autonomous driving, showcasing broad application prospects. Most existing object detection algorithms and datasets are designed for normal lighting conditions, leading to a significant drop in detection performance when applied to low-light environments. To address this issue, we propose a Low-Light Detection with Low-Light Enhancement (LDWLE) framework. LDWLE is an encoder-decoder architecture where the encoder transforms the raw input data into a compact, abstract representation (encoding), and the decoder gradually generates the target output format from the representation produced by the encoder. Specifically, during training, low-light images are input into the encoder, which produces feature representations that are decoded by two separate decoders: an object detection decoder and a low-light image enhancement decoder. Both decoders share the same encoder and are trained jointly. Throughout the training process, the two decoders optimize each other, guiding the low-light image enhancement towards improvements that benefit object detection. If the input image is normally lit, it first passes through a low-light image conversion module to be transformed into a low-light image before being fed into the encoder. If the input image is already a low-light image, it is directly input into the encoder. During the testing phase, the model can be evaluated in the same way as a standard object detection algorithm. Compared to existing object detection algorithms, LDWLE can train a low-light robust object detection model using standard, normally lit object detection datasets. Additionally, LDWLE is a versatile training framework that can be implemented on most one-stage object detection algorithms. 
These algorithms typically consist of three components: the backbone, neck, and head. In this framework, the backbone functions as the encoder, while the neck and head form the object detection decoder. Extensive experiments on the COCO, VOC, and ExDark datasets have demonstrated the effectiveness of LDWLE in low-light object detection. In quantitative measurements, it achieves an AP of 25.5 and 38.4 on the synthetic datasets COCO-d and VOC-d, respectively, and achieves the best AP of 30.5 on the real-world dataset ExDark. In qualitative measurements, LDWLE can accurately detect most objects on both public real-world low-light datasets and self-collected ones, demonstrating strong adaptability to varying lighting conditions and multi-scale objects.

Citations: 0
MVSGS: Gaussian splatting radiation field enhancement using multi-view stereo
IF 5.8, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-12-19. DOI: 10.1007/s40747-024-01691-x
Teng Fei, Ligong Bi, Jieming Gao, Shuixuan Chen, Guowei Zhang

With the advent of 3D Gaussian Splatting (3DGS), new and effective solutions have emerged for 3D reconstruction pipelines and scene representation. However, achieving high-fidelity reconstruction of complex scenes and capturing low-frequency features remain long-standing challenges in the field of visual 3D reconstruction. Relying solely on sparse point inputs and simple optimization criteria often leads to non-robust reconstructions of the radiance field, with reconstruction quality heavily dependent on the proper initialization of inputs. Notably, Multi-View Stereo (MVS) techniques offer a mature and reliable approach for generating structured point cloud data using a limited number of views, camera parameters, and feature matching. In this paper, we propose combining MVS with Gaussian Splatting, along with our newly introduced density optimization strategy, to address these challenges. This approach bridges the gap in scene representation by enhancing explicit geometry radiance fields with MVS, and our experimental results demonstrate its effectiveness. Additionally, we have explored the potential of using Gaussian Splatting for non-face template single-process end-to-end Avatar Reconstruction, yielding promising experimental results.

Citations: 0
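A common way to seed a splatting radiance field from MVS output is to place one isotropic Gaussian per reconstructed point, with its scale taken from the local point density. The sketch below illustrates that initialization step only; the field names, the nearest-neighbor scale heuristic, and the default opacity are assumptions for illustration, not the paper's density optimization strategy:

```python
import math

def nearest_neighbor_dist(points, i):
    """Distance from point i to its nearest neighbor (brute force)."""
    best = math.inf
    for j, q in enumerate(points):
        if j != i:
            best = min(best, math.dist(points[i], q))
    return best

def init_gaussians(mvs_points, default_opacity=0.1):
    """One isotropic Gaussian per MVS point; scale from local density,
    so dense regions get small splats and sparse regions large ones."""
    return [
        {
            "mean": p,
            "scale": nearest_neighbor_dist(mvs_points, i),
            "opacity": default_opacity,
        }
        for i, p in enumerate(mvs_points)
    ]

# A tiny structured "point cloud" as MVS might produce
cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
print(init_gaussians(cloud)[0])
```

Starting from structured MVS points rather than sparse SfM output is what gives the optimization a robust initialization, which is the gap the abstract says the method bridges.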
T-LLaMA: a Tibetan large language model based on LLaMA2
IF 5.8 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-12-19 DOI: 10.1007/s40747-024-01641-7
Hui Lv, Chi Pu, La Duo, Yan Li, Qingguo Zhou, Jun Shen

The advent of ChatGPT and GPT-4 has generated substantial interest in large language model (LLM) research, showcasing remarkable performance in applications such as conversation systems, machine translation, and research paper summarization. However, their efficacy diminishes when applied to low-resource languages such as Tibetan, particularly in academic research contexts. In this study, we trained Tibetan LLaMA (T-LLaMA), a model based on efficient pre-training technology, for three downstream tasks: text classification, news text generation, and automatic text summarization. To address the lack of a corpus, we constructed a Tibetan dataset comprising 2.2 billion characters. Furthermore, we augmented the vocabulary of LLaMA2 from Meta AI by expanding the Tibetan vocabulary using SentencePiece. Notably, the text classification task attains a state-of-the-art (SOTA) accuracy of 79.8% on the publicly available Tibetan News Classification Corpus. In addition, manual review of 500 generated samples indicates satisfactory results in both the news text generation and text summarization tasks. To our knowledge, T-LLaMA is the first large-scale language model for Tibetan natural language processing (NLP) with parameters in the billion range. We openly provide our trained models, anticipating that this contribution not only fills gaps in the Tibetan large-scale language model domain but also serves as a foundation for researchers with limited computational resources in the Tibetan NLP community. The T-LLaMA model is available at https://huggingface.co/Pagewood/T-LLaMA.

Citations: 0
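Extending a base tokenizer's vocabulary, as done here with SentencePiece for Tibetan, boils down to appending unseen tokens while keeping every existing token id stable, so that pretrained embeddings remain valid and only new embedding rows need initialization. A minimal sketch of that merge logic (the sample tokens are illustrative; the actual work operates on SentencePiece model protos):

```python
def extend_vocabulary(base_vocab, new_tokens):
    """Append tokens not already present, preserving base token ids.

    base_vocab[i] keeps id i; each genuinely new token gets the next
    free id, which is why pretrained embedding rows stay reusable.
    """
    seen = set(base_vocab)
    extended = list(base_vocab)
    for tok in new_tokens:
        if tok not in seen:
            seen.add(tok)
            extended.append(tok)
    return extended

base = ["<s>", "</s>", "the", "▁hello"]
tibetan = ["བོད", "ཡིག", "the"]  # "the" already exists and is skipped
vocab = extend_vocabulary(base, tibetan)
print(len(vocab))
```

After such an extension, the model's embedding and output matrices are resized to `len(vocab)` rows, with the original rows copied over unchanged.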
A UAV perspective based lightweight target detection and tracking algorithm for intelligent transportation
IF 5.8 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-12-19 DOI: 10.1007/s40747-024-01687-7
Quan Wang, Guangfei Ye, Qidong Chen, Songyang Zhang, Fengqing Wang

Vehicle detection and tracking from a UAV perspective often suffer from omissions and misdetections due to small targets, complex scenes, and target occlusion, which strongly degrade detection accuracy and tracking stability. In addition, current models have large parameter counts, which makes them hard to deploy on mobile devices. Therefore, this paper proposes a YOLO-LMP and NGCTrack-based target detection and tracking algorithm to address these issues. First, detection of small targets in occluded scenes is enhanced by adding a MODConv to the small-target detection head and increasing its size. In addition, excessive deletion of prediction boxes is prevented by using the LSKAttention mechanism to adaptively adjust the target receptive field at the downsampling stage, combined with a Soft-NMS strategy. Furthermore, the C2f module is replaced by the FPW to reduce pointless computation and memory usage. In the tracking stage, the NGCTrack component of our algorithm replaces IOU with GIOU and employs a modified NSA Kalman filter that adjusts the state-space aspect ratio for width prediction. Finally, a camera adjustment mechanism is introduced to improve tracking precision and consistency. Experimental results show that, compared to YOLOv8, the YOLO-LMP model improves the mAP50 and mAP50:95 metrics by 10.3% and 12.2%, respectively, while reducing the number of parameters by 47.7%. Combined with the improved NGCTrack, the number of ID switches is reduced by 73.6% compared to the ByteTrack method, while MOTA and IDF1 increase by 5.2% and 9.8%, respectively.

Citations: 0
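NGCTrack swaps IOU for GIOU when associating boxes. GIoU augments IoU with a penalty based on the smallest enclosing box, so even disjoint boxes get a useful (negative) similarity instead of a flat zero. A minimal self-contained implementation for axis-aligned `(x1, y1, x2, y2)` boxes:

```python
def giou(box_a, box_b):
    """Generalized IoU for axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection area (zero if the boxes do not overlap)
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    # Union area
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    # Smallest box enclosing both inputs
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    enclose = cw * ch
    # GIoU = IoU minus the fraction of the enclosing box not covered
    return inter / union - (enclose - union) / enclose

print(giou((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes -> 1.0
print(giou((0, 0, 1, 1), (2, 2, 3, 3)))  # disjoint boxes -> negative
```

The enclosing-box term is what distinguishes a far-away miss from a near miss, which makes GIOU a better association cost for small, fast-moving UAV targets than plain IOU.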
Transforming traffic accident investigations: a virtual-real-fusion framework for intelligent 3D traffic accident reconstruction
IF 5.8 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-12-19 DOI: 10.1007/s40747-024-01693-9
Yanzhan Chen, Qian Zhang, Fan Yu

The daily occurrence of traffic accidents has made 3D reconstruction a key tool for accident reconstruction, investigation, and insurance claims. This study proposes a novel virtual-real-fusion simulation framework that integrates traffic accident generation, unmanned aerial vehicle (UAV)-based image collection, and a 3D traffic accident reconstruction pipeline with advanced computer vision techniques and unsupervised 3D point cloud clustering algorithms. Specifically, a micro-traffic simulator and an autonomous driving simulator are co-simulated to generate high-fidelity traffic accidents. Subsequently, a deep learning-based reconstruction method, 3D Gaussian splatting (3D-GS), is used to construct digitized 3D traffic accident scenes from UAV image datasets collected in the traffic simulation environment. Because visual rendering by 3D-GS struggles under adverse conditions such as nighttime or rain, a clustering parameter stochastic optimization model and a mixed-integer programming Bayesian optimization (MIPBO) algorithm are proposed to enhance the segmentation of large-scale 3D point clouds. In the numerical experiments, 3D-GS produces high-quality, seamless, real-time renderings of traffic accident scenes that achieve a structural similarity index measure of up to 0.90 across different towns. Furthermore, the proposed MIPBO algorithm exhibits a remarkably fast convergence rate, requiring only 3–5 iterations to identify well-performing parameters and achieving a high R² value of 0.8 on a benchmark clustering problem. Finally, the Gaussian Mixture Model assisted by MIPBO accurately separates the various traffic elements in the accident scenes, demonstrating higher effectiveness than other classical clustering algorithms.

Citations: 0
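The Gaussian Mixture Model used to separate traffic elements is fit by expectation-maximization: an E-step assigns each point soft responsibilities, and an M-step re-estimates component parameters from them. A minimal 1D, two-component sketch (fixed variance and equal mixing weights are simplifications for illustration, not the paper's setup):

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a 1D normal distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def fit_gmm_1d(data, mu_a, mu_b, sigma=1.0, iters=50):
    """EM for a two-component 1D mixture; only the means are updated."""
    for _ in range(iters):
        # E-step: responsibility of component A for each point
        resp = []
        for x in data:
            pa = normal_pdf(x, mu_a, sigma)
            pb = normal_pdf(x, mu_b, sigma)
            resp.append(pa / (pa + pb))
        # M-step: responsibility-weighted mean updates
        wa = sum(resp)
        wb = len(data) - wa
        mu_a = sum(r * x for r, x in zip(resp, data)) / wa
        mu_b = sum((1 - r) * x for r, x in zip(resp, data)) / wb
    return mu_a, mu_b

# Two well-separated "traffic element" clusters along one coordinate
data = [0.1, -0.2, 0.3, 0.0, 9.8, 10.1, 10.3, 9.9]
print(fit_gmm_1d(data, mu_a=1.0, mu_b=8.0))
```

In the paper's setting the mixture runs over 3D point cloud features with several components, and MIPBO searches the clustering hyperparameters; the E/M alternation itself is the same.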