As an image retrieval problem, person re-identification (Re-ID) relies on robust features extracted by convolutional neural networks. Most current methods use a large backbone (e.g., ResNet50) for feature extraction. However, such backbones have many parameters, which causes problems when they are embedded in smart camera devices: computing resources are limited and real-time operation is constrained. It is therefore necessary to construct models with few parameters and low complexity. This paper proposes a new lightweight baseline for Re-ID, SCL-net, in which all underlying modules are reconstructed. We design a new convolution unit, the symmetrical combination unit (SC-unit), which constructs feature maps with richer channels by reusing feature maps from different convolution layers. In addition, we redesign all base modules of SCL-net and demonstrate the effectiveness of each. We also jointly train on the shallow and deep features of the model to improve accuracy. SCL-net has about 2.3M parameters and achieves 95.2% Rank-1 and 85.9% mAP without any pretraining.
Title: "Lightweight person re-identification model employing symmetrical combination units". Authors: Dawei Cai, Qingwei Tang. DOI: 10.1117/12.3014389. In: International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023), pages 129692O to 129692O-11. Published 2024-01-09.
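As a loose sketch of the feature-reuse idea behind the SC-unit (the paper's actual unit design is not specified here; the branch functions below are illustrative stand-ins for convolutions), concatenating branch outputs with the input along the channel axis yields richer-channel feature maps without adding parameters:

```python
import numpy as np

def sc_unit(x, branch_a, branch_b):
    """Illustrative feature reuse: run two branches and concatenate
    their outputs with the input along the channel axis, so later
    layers see feature maps reused from earlier ones."""
    fa = branch_a(x)   # branch A feature maps (C channels)
    fb = branch_b(x)   # branch B feature maps (C channels)
    # channel axis is 0: the output has 3C channels
    return np.concatenate([x, fa, fb], axis=0)

# stand-in "convolutions": any channel-preserving map works for the sketch
relu = lambda t: np.maximum(t, 0)
neg_part = lambda t: np.minimum(t, 0)
x = np.random.randn(8, 16, 16)              # (channels, H, W)
y = sc_unit(x, relu, neg_part)
print(y.shape)  # (24, 16, 16)
```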
Dongmei Liu, Binfeng D. Lin, Yongfeng Li, V. Tarelnyk
To efficiently derive a disassembly plan for target parts within a product, a disassembly hybrid graph model of the target parts is established based on the connection relationships and disassembly-priority constraints among the parts. The disassembly sequence planning problem is thereby transformed into searching the graph model for the path with the optimal value. A sorting algorithm is used to solve the hybrid graph model of the target part's disassembly, and finally a worked example verifies the feasibility of the method.
Title: "Research on selective disassembly sequence planning based on graph model". Authors: Dongmei Liu, Binfeng D. Lin, Yongfeng Li, V. Tarelnyk. DOI: 10.1117/12.3014520. In: AIPMV 2023, pages 129691W to 129691W-6. Published 2024-01-09.
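The precedence-constraint side of such a graph model can be sketched as a DAG traversal. The helper below (names and structure are illustrative, not the paper's algorithm) collects only the parts that actually block the target and orders them for removal:

```python
from collections import defaultdict

def disassembly_sequence(precedence, target):
    """Given edges (u, v) meaning 'u must be removed before v',
    return a feasible removal order ending at `target`, touching
    only parts that block it (its ancestors in the DAG)."""
    parents = defaultdict(set)
    for u, v in precedence:
        parents[v].add(u)

    order, seen = [], set()

    def visit(p):              # post-order DFS over blocking parts
        if p in seen:
            return
        seen.add(p)
        for q in parents[p]:
            visit(q)
        order.append(p)

    visit(target)
    return order

# cover -> bracket -> gear must come off before the shaft
seq = disassembly_sequence([("cover", "bracket"),
                            ("bracket", "gear"),
                            ("gear", "shaft")], "shaft")
print(seq)  # ['cover', 'bracket', 'gear', 'shaft']
```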
Ensuring personnel safety and preventing unauthorized intrusion in the non-stop construction areas of large airports is crucial. This study proposes an infrared-imaging-based method for recognizing dangerous behavior of non-stop construction personnel at large airports. Infrared imaging is used to collect visual information about construction personnel, and the images are analyzed with structural-similarity features. Backbone features are then extracted via supervised contrastive learning to achieve dynamic feature segmentation and reconstruction. Finally, ambiguity analysis extracts the edge bounding contours of personnel and identifies dangerous intrusion behavior. Experiments verify that the method detects dangerous intrusion behavior with high accuracy.
Title: "Image recognition method for dangerous behavior of non-stop construction personnel in large airports". Authors: Zhenyu Zhao, Liangsui Geng. DOI: 10.1117/12.3014586. In: AIPMV 2023, pages 1296915 to 1296915-6. Published 2024-01-09.
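A minimal sketch of the structural-similarity analysis step, assuming the standard single-window SSIM formula (the paper's exact feature pipeline is not reproduced; the gradient image is synthetic):

```python
import numpy as np

def global_ssim(x, y, L=255.0):
    """Single-window structural similarity between two grayscale
    images, using the standard SSIM constants c1, c2."""
    x, y = x.astype(float), y.astype(float)
    c1, c2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

a = np.tile(np.arange(64), (64, 1)) * 4.0   # smooth synthetic gradient
print(round(global_ssim(a, a), 4))           # 1.0 for identical images
```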
The electrical connection is a vital and ubiquitous link in power systems and electronic equipment, with the electrical contact as its core component. In practice, fretting wear occurs during the use of electrical contacts, damaging their surfaces and degrading their performance. Determining the degree of wear is therefore crucial for assessing contact failure in engineering applications. This study conducts fretting wear tests on copper electrical contacts over different numbers of cycles and uses machine vision algorithms to detect the morphological characteristics of the wear marks. Gray-threshold segmentation extracts texture features from wear marks under various oxidation conditions, pseudocolor techniques process the extracted morphologies, and their characteristic areas are calculated. Finally, combining these results with contact resistance curves allows the electrical conductivity of the contact to be judged across cycle counts.
Title: "Research on electrical contact performance based on machine vision". Authors: Chun-lin Li, Yangxin Ou, Lei You, Zewu Zhang. DOI: 10.1117/12.3014355. In: AIPMV 2023, pages 129692T to 129692T-6. Published 2024-01-09.
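Gray-threshold segmentation plus area measurement can be illustrated with Otsu's method; the synthetic image below is a stand-in for a real wear micrograph:

```python
import numpy as np

def otsu_threshold(img):
    """Return the gray level maximizing between-class variance."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * prob[:t]).sum() / w0
        mu1 = (np.arange(t, 256) * prob[t:]).sum() / w1
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t

# synthetic wear mark: a dark scar (40) on a bright surface (200)
img = np.full((64, 64), 200, dtype=np.uint8)
img[20:40, 10:50] = 40
t = otsu_threshold(img)
wear_area = int((img < t).sum())   # pixels below threshold = wear mark
print(t, wear_area)                # threshold falls between 40 and 200; 800 px
```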
In the digital information age, faced with ever-growing data scale and complexity, the limitations of traditional centralized retrieval services are increasingly apparent, and improvements in data-structure extensibility, incremental update control, and retrieval efficiency are urgently needed. Taking efficient retrieval algorithms for massive data as its research object, this paper proposes a construction scheme for a big-data storage and retrieval system for unstructured data, combining distributed technology with full-text retrieval to optimize fast retrieval of large-scale data. The system is built on the Hadoop framework with HBase as the storage module, and combines the Elasticsearch engine, the IKAnalyzer tokenizer, and a Redis cache to deliver real-time, efficient retrieval. Finally, a web application convenient for online use is built with Java web technology. Practice shows that the system solves many problems in collecting, storing, and retrieving massive unstructured text data. It also improves the efficiency of shared data transmission and concurrent access control, opening up a new big-data retrieval service model.
Title: "Research and implementation of efficient retrieval algorithm in big data environment". Authors: Pan Gao, Shuhua Shao. DOI: 10.1117/12.3014436. In: AIPMV 2023, pages 129690H to 129690H-4. Published 2024-01-09.
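The Redis-backed retrieval path described above follows the common cache-aside pattern. The sketch below substitutes plain dicts for Redis and the Elasticsearch index, so it only illustrates the control flow, not the real client APIs:

```python
class CachedSearch:
    """Cache-aside lookup: check the cache first, fall back to the
    search backend, then populate the cache for later queries."""

    def __init__(self, backend):
        self.backend = backend     # stand-in for the Elasticsearch index
        self.cache = {}            # stand-in for Redis
        self.hits = self.misses = 0

    def search(self, query):
        if query in self.cache:
            self.hits += 1
            return self.cache[query]
        self.misses += 1
        result = self.backend.get(query, [])
        self.cache[query] = result   # real code would also set a TTL
        return result

s = CachedSearch({"hadoop": ["doc1", "doc7"]})
s.search("hadoop")
s.search("hadoop")                   # second call is served from cache
print(s.hits, s.misses)              # 1 1
```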
Jinhao Wang, Jizhuang Hui, Yaqian Zhang, Tao Zhou, Kai Ding
Aiming at multi-target detection in complex human-robot collaborative assembly scenes, an improved YOLOv7 algorithm is proposed. Specifically, the Wise-IoU (Wise Intersection over Union) loss function and the BiFormer attention module are introduced to improve recognition of small assembly parts. Taking a worm-gear decelerator as an example, a dataset for assembly-part recognition is constructed. Training the improved network on this dataset increases mAP@.5 by 3.25% and reduces the average total loss by 0.02365. The experimental results show that the improved YOLOv7 algorithm achieves multi-part detection in collaborative assembly.
Title: "Multitarget detection of assembly parts based on improved YOLOv7". Authors: Jinhao Wang, Jizhuang Hui, Yaqian Zhang, Tao Zhou, Kai Ding. DOI: 10.1117/12.3014468. In: AIPMV 2023, pages 1296927 to 1296927-6. Published 2024-01-09.
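A hedged sketch of an IoU loss with a Wise-IoU-v1-style distance-based focusing factor. This is simplified: the published Wise-IoU detaches the enclosing-box term from the gradient and has further variants, neither of which is modeled here:

```python
import numpy as np

def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def wiou_v1_loss(pred, gt):
    """IoU loss scaled by exp(center distance^2 / enclosing diag^2)."""
    cx = lambda r: ((r[0] + r[2]) / 2, (r[1] + r[3]) / 2)
    (px, py), (gx, gy) = cx(pred), cx(gt)
    wg = max(pred[2], gt[2]) - min(pred[0], gt[0])   # enclosing width
    hg = max(pred[3], gt[3]) - min(pred[1], gt[1])   # enclosing height
    r = np.exp(((px - gx) ** 2 + (py - gy) ** 2) / (wg ** 2 + hg ** 2))
    return r * (1.0 - iou(pred, gt))

box = (0, 0, 10, 10)
print(wiou_v1_loss(box, box))   # 0.0 for a perfect prediction
```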
With the wide application of 3D building-cluster models in urban planning, visualization, and other fields, improving rendering efficiency and reducing the computational cost of these models has become an important issue. To address it, this paper proposes a visual-perception evaluation model that assesses building weights from multiple factors to determine the order of simplification, and weights vertex importance in the classical QEM algorithm to redefine the edge-collapse cost, reducing model complexity while maintaining visual quality. Experimental results show that the algorithm significantly reduces rendering time and computational cost while preserving visual quality.
Title: "Research on the simplification of building complex model under multi-factor constraints". Authors: Haoyuan Bai, Kelong Yang, Shunhua Liao. DOI: 10.1117/12.3014388. In: AIPMV 2023, pages 129691G to 129691G-6. Published 2024-01-09.
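The weighted edge-collapse cost builds on the standard QEM quadric: for planes p = (a, b, c, d) with ax + by + cz + d = 0, the quadric is Q = sum of p pᵀ and the error of placing a vertex at v is ṽᵀQṽ in homogeneous coordinates. A minimal sketch, with the per-building importance weight as an assumed scalar input:

```python
import numpy as np

def vertex_quadric(planes):
    """Sum of fundamental quadrics K_p = p p^T over the planes
    incident to a vertex (unit normals assumed)."""
    return sum(np.outer(p, p) for p in planes)

def collapse_cost(v, Q, weight=1.0):
    """QEM error at position v, scaled by a visual-importance weight."""
    vh = np.append(v, 1.0)                 # homogeneous coordinates
    return weight * float(vh @ Q @ vh)

# a vertex at the origin lies on all three coordinate planes -> zero error
planes = [np.array([1.0, 0, 0, 0]),        # x = 0
          np.array([0, 1.0, 0, 0]),        # y = 0
          np.array([0, 0, 1.0, 0])]        # z = 0
Q = vertex_quadric(planes)
print(collapse_cost(np.zeros(3), Q))                         # 0.0
print(collapse_cost(np.array([1.0, 0, 0]), Q, weight=2.0))   # 2.0
```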
Because point-feature matching in 3D reconstruction struggles to meet tracking requirements in weakly textured scenes, this paper proposes a visual SLAM algorithm based on a grid method that combines point and edge features. In the tracking thread, a grid-based method evaluates point-feature quality: the texture of the external environment is judged from ORB feature descriptions, and Canny edge features of weakly textured grid cells are added to improve positioning accuracy. In the local mapping thread, feature-point poses and map points are jointly and iteratively optimized to improve the algorithm's convergence rate. Simulation results show that the proposed algorithm locates and tracks well in weakly textured scenes.
Title: "RGB-D visual SLAM for point association local edge features". Authors: Hongtu Li, Fang Wang, Yunjiang Zhang. DOI: 10.1117/12.3014358. In: AIPMV 2023, pages 129692N to 129692N-5. Published 2024-01-09.
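Grid-based texture-quality evaluation can be approximated by cell occupancy of detected feature points; the grid size and image dimensions below are illustrative assumptions, not the paper's settings:

```python
def grid_texture_score(keypoints, img_w, img_h, nx=8, ny=8):
    """Fraction of grid cells containing at least one feature point;
    low occupancy flags a weakly textured view where edge features
    should supplement point features."""
    occupied = set()
    for x, y in keypoints:
        cx = min(int(x * nx / img_w), nx - 1)
        cy = min(int(y * ny / img_h), ny - 1)
        occupied.add((cx, cy))
    return len(occupied) / (nx * ny)

# features clustered in one corner of a 640x480 image -> weak texture
pts = [(5, 5), (70, 50), (30, 20)]
score = grid_texture_score(pts, img_w=640, img_h=480)
print(score)   # 0.015625: only 1 of 64 cells occupied
```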
A detection algorithm based on the maximum and minimum eigenvalues from random matrix theory is proposed for detecting anomalies in customer electricity consumption. First, the data source matrix is constructed by time alignment with superimposed Gaussian white noise, and a sliding window yields the data describing the operating state at each moment. Second, the window data are standardized and features extracted, and the difference and sum of the maximum and minimum eigenvalues are compared to construct detection indexes and thresholds. Finally, the algorithm is verified by simulation. The results show that the algorithm does not depend on any model, analyzes the operating state of the system comprehensively, and detects abnormal data effectively.
Title: "Identification of customer electricity usage anomalies based on random matrix theory". Authors: Shuo Zhou, Qihui Wang. DOI: 10.1117/12.3014405. In: AIPMV 2023, pages 129692M to 129692M-8. Published 2024-01-09.
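The max/min-eigenvalue indicator on a standardized sliding window can be sketched as follows. The paper's exact thresholding rule is not reproduced, and the synthetic consumption data are illustrative; the idea is that correlated tampering inflates the eigenvalue spread beyond what random-matrix theory predicts for noise:

```python
import numpy as np

def eig_indicator(window):
    """Standardize each column of a (samples x features) window,
    then return lambda_max - lambda_min of the sample covariance."""
    z = (window - window.mean(0)) / (window.std(0) + 1e-12)
    ev = np.linalg.eigvalsh(np.cov(z, rowvar=False))   # ascending order
    return ev[-1] - ev[0]

rng = np.random.default_rng(0)
normal = rng.normal(size=(200, 8))        # normal consumption noise
anomaly = normal.copy()
anomaly[:, 0] += 3 * anomaly[:, 1]        # correlated tampering signal
print(eig_indicator(anomaly) > eig_indicator(normal))   # True
```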
In complex stroke rehabilitation scenes, visual algorithms such as action recognition or video understanding find it hard to focus on patient regions with small, slow motions, attending instead to targets with large optical-flow changes. A semantic segmentation algorithm that captures the patient's region from the captured image can therefore provide a critical perspective and adequate information for those tasks. Current weakly supervised segmentation algorithms based on bounding boxes tend to reuse existing image-classification methods, post-processing the image inside each box to obtain larger regions of pseudo-label information. To avoid the redundancy caused by concatenating algorithms, this paper proposes an end-to-end weakly supervised segmentation algorithm. A U-shaped residual module with variable depth captures deep semantic image features, and its output is integrated blockwise into the target matrix of the NCut problem. The target region is then indicated by solving for the second-smallest eigenvector of the generalized eigensystem, realizing the segmentation. On the PASCAL VOC 2012 dataset, the proposed method achieves 67.7% mIoU; on a private dataset, it segments the target region more densely than comparable algorithms.
Title: "Box-driven coarse-grained segmentation for stroke rehabilitation scenarios". Authors: Yiming Fan, Yunjia Liu, Xiaofeng Lu. DOI: 10.1117/12.3014426. In: AIPMV 2023, pages 129692D to 129692D-7. Published 2024-01-09.
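The second-smallest ("Fiedler") eigenvector step of NCut can be sketched on a toy affinity matrix. This is the classical Shi-Malik spectral relaxation, not the paper's block-integrated variant:

```python
import numpy as np

def ncut_partition(W):
    """Two-way normalized cut: take the eigenvector for the second-
    smallest eigenvalue of the symmetric normalized Laplacian and
    split nodes by the sign of the mapped-back indicator."""
    d = W.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_sym = np.eye(len(W)) - d_inv_sqrt @ W @ d_inv_sqrt
    vals, vecs = np.linalg.eigh(L_sym)      # eigenvalues ascending
    fiedler = d_inv_sqrt @ vecs[:, 1]       # NCut indicator vector
    return fiedler >= 0

# two dense blocks joined by one weak edge
W = np.array([[0, 5, 5, 0, 0, 0],
              [5, 0, 5, 0, 0, 0],
              [5, 5, 0, 1, 0, 0],
              [0, 0, 1, 0, 5, 5],
              [0, 0, 0, 5, 0, 5],
              [0, 0, 0, 5, 5, 0]], float)
labels = ncut_partition(W)
print(labels[:3].tolist(), labels[3:].tolist())  # one block per side
```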