UTR: Unsupervised Learning of Thickness-Insensitive Representations for Electron Microscope Image
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506597
Tong Xin, Bohao Chen, Xi Chen, Hua Han
Registration of serial section electron microscopy (ssEM) images is essential for neural circuit reconstruction. Because the morphology of neurite structures differs between adjacent sections, extracting valid features for ssEM image registration is challenging. Convolutional neural networks (CNNs) have made unprecedented progress in feature extraction for natural images, but natural-image registration does not need to account for such morphological differences, so directly applying these methods results in matching failure or over-registration. This paper proposes an unsupervised learned representation that takes the morphological differences of ssEM images into account. A CNN is used to extract the features, and the network is trained on focused ion beam scanning electron microscope (FIB-SEM) images, which are acquired in situ and are therefore naturally registered. Sampling these volumes at a given section thickness teaches the CNN the changes in neurite structure across that thickness. The learned features can be plugged directly into existing ssEM registration methods and reduce the negative effect of section thickness on registration accuracy. Experimental results show that the proposed features outperform the state-of-the-art method in matching accuracy and significantly improve registration outcomes on ssEM images.
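As a concrete illustration of this pair-sampling idea, here is a minimal sketch assuming the FIB-SEM volume is a (Z, H, W) NumPy array; the function name and thickness value are illustrative, not from the paper.

```python
import numpy as np

def sample_registered_pair(volume: np.ndarray, thickness: int, rng=None):
    """Return two naturally registered slices separated by `thickness` voxels.

    Because FIB-SEM slices are imaged in situ, slice z and slice z + thickness
    are already aligned; their appearance difference mimics the morphological
    change between adjacent ssEM sections of that physical thickness.
    """
    rng = rng or np.random.default_rng()
    z = rng.integers(0, volume.shape[0] - thickness)
    return volume[z], volume[z + thickness]

# Example: a 40-voxel gap stands in for one section (e.g., 10 nm voxels,
# 400 nm sections) -- hypothetical numbers for illustration only.
volume = np.random.rand(128, 256, 256).astype(np.float32)
src, tgt = sample_registered_pair(volume, thickness=40)
```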
{"title":"UTR: Unsupervised Learning of Thickness-Insensitive Representations for Electron Microscope Image","authors":"Tong Xin, Bohao Chen, Xi Chen, Hua Han","doi":"10.1109/ICIP42928.2021.9506597","DOIUrl":"https://doi.org/10.1109/ICIP42928.2021.9506597","url":null,"abstract":"Registration of serial section electron microscopy (ssEM) images is essential for neural circuit reconstruction. Morphologies of neurite structure in adjacent sections are different. Thus, it is challenging to extract valid features in ssEM image registration. Convolutional neural networks (CNN) have made unprecedented progress in feature extraction of natural images. However, morphological differences need not be considered in the registration of natural images. Directly applying these methods will result in matching failure or over-registration. This paper proposes an unsupervised learning-based representation taking the morphological differences of ssEM images into account. CNN architecture was used to extract the feature. To train the network, the focused ion beam scanning electron microscope (FIB-SEM) images are used. The FIB-SEM images are in situ, so they are naturally registered. Sampling those images with a certain thickness can teach CNN to learn changes in neurite structure. The learned feature can be directly applied to existing ssEM image registration methods and reduce the negative effect of section thickness on registration accuracy. The experimental results show that the proposed feature outperforms the state-of-the-art method in matching accuracy and significantly improves the registration outcome when used in ssEM images.","PeriodicalId":314429,"journal":{"name":"2021 IEEE International Conference on Image Processing (ICIP)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127324935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-Scale Background Suppression Anomaly Detection In Surveillance Videos
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506580
Yang Zhen, Yuanfan Guo, Jinjie Wei, Xiuguo Bao, Di Huang
Video anomaly detection is widely applied in surveillance systems for public security. However, existing weakly supervised video anomaly detection methods tend to ignore the interference of background frames and have limited ability to extract effective temporal information from video snippets. In this paper, a multi-scale background suppression based anomaly detection (MSBSAD) method is proposed to suppress the interference of background frames. We propose a multi-scale temporal convolution module to extract richer temporal information from video snippets for anomalous events of different durations. A modified hinge loss in the suppression branch helps the model better separate abnormal samples from confusing samples. Experiments on UCF-Crime demonstrate the superiority of MSBSAD in the video anomaly detection task.
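The separation objective can be pictured with a small hinge-style sketch; the exact loss in the paper may differ, and the margin and score definitions here are assumptions.

```python
import torch

def hinge_separation_loss(abnormal_scores, confusing_scores, margin=1.0):
    """Push the top abnormal snippet score above the top confusing
    (background-like) snippet score by at least `margin`."""
    gap = margin - abnormal_scores.max() + confusing_scores.max()
    return torch.clamp(gap, min=0.0)

a = torch.tensor([0.9, 0.7, 0.4])   # snippet scores from an abnormal video
c = torch.tensor([0.3, 0.5])        # scores of confusing background snippets
print(hinge_separation_loss(a, c))  # tensor(0.6000)
```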
{"title":"Multi-Scale Background Suppression Anomaly Detection In Surveillance Videos","authors":"Yang Zhen, Yuanfan Guo, Jinjie Wei, Xiuguo Bao, Di Huang","doi":"10.1109/ICIP42928.2021.9506580","DOIUrl":"https://doi.org/10.1109/ICIP42928.2021.9506580","url":null,"abstract":"Video anomaly detection has been widely applied in various surveillance systems for public security. However, the existing weakly supervised video anomaly detection methods tend to ignore the interference of the background frames and possess limited ability to extract effective temporal information among the video snippets. In this paper, a multi-scale background suppression based anomaly detection (MSBSAD) method is proposed to suppress the interference of the background frames. We propose a multi-scale temporal convolution module to effectively extract more temporal information among the video snippets for the anomaly events with different durations. A modified hinge loss is constructed in the suppression branch to help our model to better differentiate the abnormal samples from the confusing samples. Experiments on UCF Crime demonstrate the superiority of our MS-BSAD method in the video anomaly detection task.","PeriodicalId":314429,"journal":{"name":"2021 IEEE International Conference on Image Processing (ICIP)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133796640","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Federated Trace: A Node Selection Method for More Efficient Federated Learning
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506725
Zirui Zhu, Lifeng Sun
Federated Learning (FL) is a learning paradigm that allows a model to be trained directly on the large amounts of data held by edge devices, without heavy communication costs or privacy leakage. An important problem FL faces is the heterogeneity of data across edge nodes, which hurts convergence efficiency. In this paper, we propose Federated Trace (FedTrace) to address this problem. In FedTrace, we define the time series of the global model's performance metrics on an edge node as the training trace of that node; the trace reflects the node's data distribution. By clustering the training traces, we learn which nodes have similar data distributions, which in turn guides node selection in each round of training. Here we use a simple but effective strategy: selecting nodes at random, evenly from each cluster. Experiments in various settings demonstrate that our method significantly reduces the number of communication rounds required in FL.
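A minimal sketch of the trace-clustering and even per-cluster sampling strategy, assuming each node's trace is a fixed-length vector of per-round metrics (e.g., the global model's loss on that node); k-means is used here as a stand-in for whatever clustering the authors apply.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_nodes(traces: np.ndarray, n_clusters: int, per_cluster: int, rng=None):
    """Cluster nodes by their training traces, then draw `per_cluster`
    nodes uniformly at random from each cluster."""
    rng = rng or np.random.default_rng()
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(traces)
    selected = []
    for c in range(n_clusters):
        members = np.flatnonzero(labels == c)
        k = min(per_cluster, members.size)
        selected.extend(rng.choice(members, size=k, replace=False))
    return selected

traces = np.random.rand(100, 20)   # 100 nodes, 20 rounds of one metric
print(select_nodes(traces, n_clusters=5, per_cluster=2))
```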
{"title":"Federated Trace: A Node Selection Method for More Efficient Federated Learning","authors":"Zirui Zhu, Lifeng Sun","doi":"10.1109/ICIP42928.2021.9506725","DOIUrl":"https://doi.org/10.1109/ICIP42928.2021.9506725","url":null,"abstract":"Federated Learning (FL) is a learning paradigm, which allows the model to directly use a large amount of data in edge devices for training without heavy communication costs and privacy leakage. An important problem that FL faced is the heterogeneity of data at different edge nodes, resulting in a lack of convergence efficiency. In this paper, we propose Federated Trace (FedTrace) to address this problem. In FedTrace, we define the time series of some performance metrics of the global model on the edge node as the training trace of this node, which can reflect the data distribution of the edge node. By clustering the training traces, we can know which nodes have similar data distribution, which can guide the selection of nodes in each round of training. Here, we use a simple but effective method, that is, randomly selecting nodes from each cluster evenly. Experiments on various settings demonstrate that our method significantly reduces the number of communication rounds required in FL.","PeriodicalId":314429,"journal":{"name":"2021 IEEE International Conference on Image Processing (ICIP)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115197319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
2DTPCA: A New Framework for Multilinear Principal Component Analysis
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506729
Cagri Ozdemir, R. Hoover, Kyle A. Caudle
Two-directional two-dimensional principal component analysis ((2D)$^2$PCA) has shown promising results in both representing and recognizing facial images. The current paper extends these results into a multilinear framework (referred to as two-directional tensor PCA, or 2DTPCA for short) using a recently defined tensor operator for 3rd-order tensors. The approach first computes a low-dimensional projection tensor for the row space of the image data (generally referred to as mode-1) and then computes a low-dimensional projection tensor for the column space of the image data (generally referred to as mode-3). Experimental results on the ORL, extended Yale-B, COIL-100, and MNIST data sets show that the proposed approach outperforms traditional tensor-based PCA approaches in recognition rate with a much smaller subspace dimension.
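For reference, here is a NumPy sketch of the (2D)^2PCA baseline that the paper generalizes: one projection matrix for the row space and one for the column space, obtained from image-level scatter matrices. Dimensions are illustrative.

```python
import numpy as np

def two_directional_2dpca(images: np.ndarray, k_rows: int, k_cols: int):
    """images: (N, H, W). Returns (Z, X) with Z: (H, k_rows), X: (W, k_cols)."""
    A = images - images.mean(axis=0)
    # Column-direction scatter sum_i A_i^T A_i (W x W) and
    # row-direction scatter sum_i A_i A_i^T (H x H).
    G_col = np.einsum('nhw,nhv->wv', A, A)
    G_row = np.einsum('nhw,nvw->hv', A, A)
    # eigh returns eigenvalues in ascending order; keep the top eigenvectors.
    X = np.linalg.eigh(G_col)[1][:, -k_cols:]
    Z = np.linalg.eigh(G_row)[1][:, -k_rows:]
    return Z, X

images = np.random.rand(50, 32, 32)
Z, X = two_directional_2dpca(images, k_rows=8, k_cols=8)
Y = Z.T @ images[0] @ X   # (8, 8) two-directional feature matrix
```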
{"title":"2DTPCA: A New Framework for Multilinear Principal Component Analysis","authors":"Cagri Ozdemir, R. Hoover, Kyle A. Caudle","doi":"10.1109/ICIP42928.2021.9506729","DOIUrl":"https://doi.org/10.1109/ICIP42928.2021.9506729","url":null,"abstract":"Two-directional two-dimensional principal component analysis ((2D$)^{2}$PCA) has shown promising results for it’s ability to both represent and recognize facial images. The current paper extends these results into a multilinear framework (referred to as two-directional Tensor PCA or 2DTPCA for short) using a recently defined tensor operator for 3rd-order tensors. The approach proceeds by first computing a low-dimensional projection tensor for the row-space of the image data (generally referred to as mode-l) and then subsequently computing a low-dimensional projection tensor for the column space of the image data (generally referred to as mode-3). Experimental results are presented on the ORL, extended Yale-B, COIL100, and MNIST data sets that show the proposed approach outperforms traditional “ tensor-based” PCA approaches with a much smaller subspace dimension in terms of recognition rates.","PeriodicalId":314429,"journal":{"name":"2021 IEEE International Conference on Image Processing (ICIP)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115720431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Semantic Role Aware Correlation Transformer For Text To Video Retrieval
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506267
Burak Satar, Hongyuan Zhu, X. Bresson, J. Lim
With the emergence of social media, voluminous video clips are uploaded every day, and retrieving the most relevant visual content for a language query becomes critical. Most approaches aim to learn a joint embedding space for plain textual and visual content without adequately exploiting their intra-modality structures and inter-modality correlations. This paper proposes a novel transformer that explicitly disentangles text and video into the semantic roles of objects, spatial contexts, and temporal contexts, with an attention scheme that learns the intra- and inter-role correlations among the three roles to discover discriminative features for matching at different levels. Preliminary results on the popular YouCook2 benchmark indicate that our approach surpasses a current state-of-the-art method by a large margin on all metrics, and also outperforms two other state-of-the-art methods on two metrics.
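One way to picture role-level matching is to score each of the three role embeddings separately and aggregate; this sketch is purely illustrative and not the paper's architecture.

```python
import torch
import torch.nn.functional as F

def role_match_score(text_roles: torch.Tensor, video_roles: torch.Tensor):
    """text_roles, video_roles: (3, D) embeddings for the three semantic
    roles (objects, spatial context, temporal context). The retrieval score
    sums per-role cosine similarities."""
    return F.cosine_similarity(text_roles, video_roles, dim=-1).sum()

t = torch.randn(3, 256)   # role embeddings of a caption (hypothetical D=256)
v = torch.randn(3, 256)   # role embeddings of a video
print(role_match_score(t, v))
```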
{"title":"Semantic Role Aware Correlation Transformer For Text To Video Retrieval","authors":"Burak Satar, Hongyuan Zhu, X. Bresson, J. Lim","doi":"10.1109/ICIP42928.2021.9506267","DOIUrl":"https://doi.org/10.1109/ICIP42928.2021.9506267","url":null,"abstract":"With the emergence of social media, voluminous video clips are uploaded every day, and retrieving the most relevant visual content with a language query becomes critical. Most approaches aim to learn a joint embedding space for plain textual and visual contents without adequately exploiting their intra-modality structures and inter-modality correlations. This paper proposes a novel transformer that explicitly disentangles the text and video into semantic roles of objects, spatial contexts and temporal contexts with an attention scheme to learn the intra- and inter-role correlations among the three roles to discover discriminative features for matching at different levels. The preliminary results on popular YouCook2 indicate that our approach surpasses a current state-of-the-art method, with a high margin in all metrics. It also overpasses two SOTA methods in terms of two metrics.","PeriodicalId":314429,"journal":{"name":"2021 IEEE International Conference on Image Processing (ICIP)","volume":"491 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124429191","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hierarchical Region Proposal Refinement Network for Weakly Supervised Object Detection
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506087
Ming Zhang, Shuaicheng Liu, Bing Zeng
Weakly supervised object detection (WSOD) has attracted increasing attention because it requires only image-level annotations indicating whether a certain class is present. Most WSOD methods use multiple instance learning (MIL) to train an object detector, treating an image as a bag of candidate proposals. Unlike fully supervised object detection (FSOD), which uses an object-aware region proposal network (RPN) to generate effective candidate proposals, WSOD must rely on generic region proposal methods (e.g., selective search or edge boxes) due to the lack of instance-level annotations (i.e., bounding boxes). However, the quality of the proposals affects the training of the detector. To solve this problem, we propose a hierarchical region proposal refinement network (HRPRN) that refines these proposals gradually. Specifically, our network contains multiple weakly supervised detectors trained stage by stage. In addition, we propose an instance regression refinement model that generates object-aware coordinate offsets to refine the proposals at each stage. To demonstrate the effectiveness of our method, we conduct experiments on the widely used PASCAL VOC 2007 benchmark. Compared with our baseline, online instance classifier refinement (OICR), our method achieves improvements of 9% in mAP and 5.6% in CorLoc.
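The per-stage regression step can be sketched with the usual R-CNN box-delta parameterization; the offset encoding here is an assumption, not the paper's code.

```python
import numpy as np

def refine_proposals(boxes: np.ndarray, deltas: np.ndarray) -> np.ndarray:
    """boxes: (N, 4) as (x1, y1, x2, y2); deltas: (N, 4) as (dx, dy, dw, dh).
    Shifts each box center by a width/height-relative offset and rescales it."""
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    cx = boxes[:, 0] + 0.5 * w + deltas[:, 0] * w
    cy = boxes[:, 1] + 0.5 * h + deltas[:, 1] * h
    w = w * np.exp(deltas[:, 2])
    h = h * np.exp(deltas[:, 3])
    return np.stack([cx - 0.5 * w, cy - 0.5 * h,
                     cx + 0.5 * w, cy + 0.5 * h], axis=1)

boxes = np.array([[10.0, 10.0, 50.0, 60.0]])
deltas = np.array([[0.1, -0.05, 0.2, 0.0]])   # one stage's predicted offsets
print(refine_proposals(boxes, deltas))
```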
{"title":"Hierarchical Region Proposal Refinement Network for Weakly Supervised Object Detection","authors":"Ming Zhang, Shuaicheng Liu, Bing Zeng","doi":"10.1109/ICIP42928.2021.9506087","DOIUrl":"https://doi.org/10.1109/ICIP42928.2021.9506087","url":null,"abstract":"Weakly supervised object detection (WSOD) has attracted more attention because it only requires image-level annotations to indicate whether a certain class exists. Most WSOD methods utilize multiple instance learning (MIL) to train an object detector where an image is treated as a bag of candidate proposals. Unlike fully supervised object detection (FSOD) that uses the object-aware region proposal network (RPN) to generate effective candidate proposals, WSOD only utilizes region proposal methods (e.g., selective search or edge boxes) due to the lack of instance-level annotations (i.e., bounding boxes). However, the quality of proposals can influence the training of the detector. To solve this problem, we propose a hierarchical region proposal refinement network (HRPRN) to refine these proposals gradually. Specifically, our network contains multiple weakly supervised detectors that are trained stage by stage. In addition, we propose an instance regression refinement model to generate object-aware coordinate offsets to refine proposals at each stage. In order to demonstrate the effectiveness of our method, we conduct experiments on PASCAL VOC 2007 dataset that is the widely used benchmark. Compared with our baseline method, online instance classifier refinement (OICR), our method achieves 9% and 5.6% improvements in terms of mAP and CorLoc, respectively.","PeriodicalId":314429,"journal":{"name":"2021 IEEE International Conference on Image Processing (ICIP)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124533258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Adversarial Attack on Fake-Faces Detectors Under White and Black Box Scenarios
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506273
Xiying Wang, R. Ni, Wenjie Li, Yao Zhao
Generative Adversarial Network (GAN) models have been widely used in various fields. More recently, StyleGAN and StyleGAN2 have been developed to synthesize faces that are indistinguishable to the human eye, which could pose a threat to public security. Recent work has shown that such fakes can be identified using powerful CNN classifiers; however, the reliability of these techniques is unknown. In this paper we therefore focus on generating content-preserving images from fake faces that spoof these classifiers. Two GAN-based frameworks are proposed to achieve this goal in white-box and black-box settings. For the white-box setting, a network without up/down-sampling is proposed to generate face images that confuse the classifier. In the black-box setting (where the classifier is unknown), real data is introduced to guide the GAN and make it adversarial, and a Real Extractor serves as an auxiliary network that constrains the feature distance between the generated images and the real data, enhancing the adversarial capability. Experimental results show that the proposed method effectively reduces the detection accuracy of forensic models, with good transferability.
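A hedged sketch of the feature-distance constraint in the black-box setting, using a tiny stand-in network for the Real Extractor; the module, its shape, and the loss form are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in for the auxiliary "Real Extractor" (the real one is unspecified here).
extractor = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                          nn.AdaptiveAvgPool2d(1), nn.Flatten())

def real_feature_loss(generated, real):
    """Penalize the distance between extractor features of generated images
    and real images; real features act as a fixed target."""
    return F.mse_loss(extractor(generated), extractor(real).detach())

gen = torch.rand(4, 3, 64, 64)
real = torch.rand(4, 3, 64, 64)
print(real_feature_loss(gen, real))
```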
{"title":"Adversarial Attack on Fake-Faces Detectors Under White and Black Box Scenarios","authors":"Xiying Wang, R. Ni, Wenjie Li, Yao Zhao","doi":"10.1109/ICIP42928.2021.9506273","DOIUrl":"https://doi.org/10.1109/ICIP42928.2021.9506273","url":null,"abstract":"Generative Adversarial Network (GAN) models have been widely used in various fields. More recently, styleGAN and styleGAN2 have been developed to synthesize faces that are indistinguishable to the human eyes, which could pose a threat to public security. But latest work has shown that it is possible to identify fakes using powerful CNN networks as classifiers. However, the reliability of these techniques is unknown. Therefore, in this paper we focus on the generation of content-preserving images from fake faces to spoof classifiers. Two GAN-based frameworks are proposed to achieve the goal in the white-box and black-box. For the white-box, a network without up/down sampling is proposed to generate face images to confuse the classifier. In the black-box scenario (where the classifier is unknown), real data is introduced as a guidance for GAN structure to make it adversarial, and a Real Extractor as an auxiliary network to constrain the feature distance between the generated images and the real data to enhance the adversarial capability. Experimental results show that the proposed method effectively reduces the detection accuracy of forensic models with good transferability.","PeriodicalId":314429,"journal":{"name":"2021 IEEE International Conference on Image Processing (ICIP)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114366351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nuclear Density Distribution Feature for Improving Cervical Histopathological Images Recognition
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506093
Zhuangzhuang Wang, Mengning Yang, Yangfan Lyu, Kairun Chen, Qicheng Tang
Cervical carcinoma is a common cancer of the female reproductive system. Early detection and diagnosis enable immediate treatment and prevent progression of the disease. However, to achieve better performance, DL-based algorithms simply stack ever more layers, which offers low interpretability. In this paper, we propose a robust and reliable Nuclear Density Distribution Feature (NDDF), based on pathologists' priors, to improve Cervical Histopathological Image Classification (CHIC). Our method combines the nucleus mask segmented by U-Net with the segmentation grid-lines generated from pathology images by SLIC to obtain the NDDF map, which encodes the morphology, size, number, and spatial distribution of nuclei. The results show that a model trained on NDDF maps achieves better performance and accuracy than one trained on RGB patch-level histopathological images. More significantly, the accuracy of a two-stream network trained on both RGB images and NDDF maps steadily improves over the corresponding baselines of different complexity.
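A hedged sketch of how such a density map can be assembled, using scikit-image's SLIC superpixels and a random stand-in for the U-Net nucleus mask; parameter values are illustrative.

```python
import numpy as np
from skimage.segmentation import slic

def nuclear_density_map(rgb: np.ndarray, nucleus_mask: np.ndarray, n_segments=200):
    """rgb: (H, W, 3) float image; nucleus_mask: (H, W) binary array.
    Each SLIC superpixel is filled with its fraction of nucleus pixels,
    giving a coarse map of nuclear density across the patch."""
    segments = slic(rgb, n_segments=n_segments, start_label=0)
    density = np.zeros(rgb.shape[:2], dtype=np.float32)
    for s in np.unique(segments):
        region = segments == s
        density[region] = nucleus_mask[region].mean()
    return density

rgb = np.random.rand(128, 128, 3)
mask = (np.random.rand(128, 128) > 0.8).astype(np.uint8)  # stand-in U-Net output
ndf = nuclear_density_map(rgb, mask)
```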
{"title":"Nuclear Density Distribution Feature for Improving Cervical Histopathological Images Recognition","authors":"Zhuangzhuang Wang, Mengning Yang, Yangfan Lyu, Kairun Chen, Qicheng Tang","doi":"10.1109/ICIP42928.2021.9506093","DOIUrl":"https://doi.org/10.1109/ICIP42928.2021.9506093","url":null,"abstract":"Cervical carcinoma is a common type of cancer in the female reproductive system. Early detection and diagnosis can facilitate immediate treatment and prevent progression of the disease. However, in order to achieve better performance, DL-based algorithms just stack various layers with low interpretability. In this paper, a robust and reliable Nuclear Density Distribution Feature (NDDF) based on priors of the pathologists to promote the Cervical Histopathological Image Classification (CHIC) is proposed. Our proposed method combines the nucleus mask segmented by U-Net with the segmentation grid-lines generated from pathology images utilizing SLIC to obtain the NDDF map, which contains information about the morphology, size, number, and spatial distribution of nuclei. The result shows that the proposed model trained with NDDF maps has better performance and accuracy than that trained on RGB images (patch-level histopathological images). More significantly, the accuracy of the two-stream network trained with RGB images and NDDF maps is steadily improved over the corresponding baselines of different complexity.","PeriodicalId":314429,"journal":{"name":"2021 IEEE International Conference on Image Processing (ICIP)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114505107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Segmentation-Aware Text-Guided Image Manipulation
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506601
T. Haruyama, Ren Togo, Keisuke Maeda, Takahiro Ogawa, M. Haseyama
In this paper we propose a novel approach that improves text-guided image manipulation. Text-guided image manipulation aims at modifying parts of an input image in accordance with a user's text description by semantically associating image regions with the description. We tackle a problem of conventional methods: they modify undesired parts because of the difference in representational ability between text descriptions and images. Humans pay attention primarily to objects in the foreground of an image, and human text descriptions mostly describe the foreground. It is therefore necessary to introduce not only a foreground-aware bias based on the text description but also a background-aware bias covering what the description does not represent. To solve this problem, we introduce an image segmentation network into the generative adversarial network used for manipulation. Comparative experiments with three state-of-the-art methods show the effectiveness of our method quantitatively and qualitatively.
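As a minimal illustration (an assumption, not the paper's architecture) of how a segmentation mask can impose a background-preserving bias, foreground edits can be composited over an untouched background:

```python
import numpy as np

def composite(original: np.ndarray, edited: np.ndarray, fg_mask: np.ndarray):
    """original, edited: (H, W, 3); fg_mask: (H, W) in [0, 1] from a segmenter.
    Text-driven edits are kept where the mask marks foreground; background
    pixels are copied from the input image."""
    m = fg_mask[..., None]
    return m * edited + (1.0 - m) * original

original = np.random.rand(64, 64, 3)
edited = np.random.rand(64, 64, 3)    # stand-in for generator output
mask = np.zeros((64, 64)); mask[16:48, 16:48] = 1.0
out = composite(original, edited, mask)
```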
{"title":"Segmentation-Aware Text-Guided Image Manipulation","authors":"T. Haruyama, Ren Togo, Keisuke Maeda, Takahiro Ogawa, M. Haseyama","doi":"10.1109/ICIP42928.2021.9506601","DOIUrl":"https://doi.org/10.1109/ICIP42928.2021.9506601","url":null,"abstract":"We propose a novel approach that improves text-guided image manipulation performance in this paper. Text-guided image manipulation aims at modifying some parts of an input image in accordance with the user’s text description by semantically associating the regions of the image with the text description. We tackle the conventional methods’ problem of modifying undesired parts caused by differences in representation ability between text descriptions and images. Humans tend to pay attention primarily to objects corresponding to the foreground of images, and text descriptions by humans mostly represent the foreground. Therefore, it is necessary to introduce not only a foreground-aware bias based on text descriptions but also a background-aware bias that the text descriptions do not represent. We introduce an image segmentation network into the generative adversarial network for image manipulation to solve the above problem. Comparative experiments with three state-of-the-art methods show the effectiveness of our method quantitatively and qualitatively.","PeriodicalId":314429,"journal":{"name":"2021 IEEE International Conference on Image Processing (ICIP)","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114710598","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Partitioned Centerpose Network for Bottom-Up Multi-Person Pose Estimation
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506555
Jiahua Wu, H. Lee
In bottom-up multi-person pose estimation, grouping joint candidates into the corresponding person instances is a challenging problem. In this paper, a new bottom-up method, the Partitioned CenterPose (PCP) Network, is proposed to better cluster the detected joints. To achieve this goal, a novel Partition Pose Representation (PPR) is proposed that integrates person instances and body joints via joint offsets. PPR encodes a human pose by the center of the human body together with the offsets between that center point and the body joints. To better capture the relationships among body joints, we divide the human body into five parts and generate a sub-PPR for each part. Based on PPR, the PCP Network detects persons and body joints simultaneously and then groups all body joints by their joint offsets. Moreover, an improved $\ell_1$ loss is designed to obtain more accurate joint offsets. On the COCO keypoints dataset, the proposed method performs on par with the existing state-of-the-art bottom-up methods in both accuracy and speed.
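The offset-based grouping step can be sketched as assigning each joint to the detected center closest to where its offset points; names and shapes are assumptions.

```python
import numpy as np

def group_joints(centers: np.ndarray, joints: np.ndarray, offsets: np.ndarray):
    """centers: (P, 2) detected person centers; joints: (J, 2) joint positions;
    offsets: (J, 2) predicted joint-to-center vectors.
    Returns the person index assigned to each joint."""
    votes = joints + offsets   # where each joint "votes" its center to be
    d = np.linalg.norm(votes[:, None, :] - centers[None, :, :], axis=-1)
    return d.argmin(axis=1)    # (J,) nearest-center assignment

centers = np.array([[50.0, 60.0], [120.0, 80.0]])
joints = np.array([[45.0, 40.0], [125.0, 95.0], [55.0, 70.0]])
offsets = np.array([[5.0, 18.0], [-4.0, -16.0], [-6.0, -9.0]])
print(group_joints(centers, joints, offsets))   # -> [0 1 0]
```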
{"title":"Partitioned Centerpose Network for Bottom-Up Multi-Person Pose Estimation","authors":"Jiahua Wu, H. Lee","doi":"10.1109/ICIP42928.2021.9506555","DOIUrl":"https://doi.org/10.1109/ICIP42928.2021.9506555","url":null,"abstract":"In bottom-up multi-person pose estimation method, grouping joint candidates into corresponding person instance is a challenging problem. In this paper, a new bottom-up method, Partitioned CenterPose (PCP) Network, is proposed to better cluster all detected joints. To achieve this goal, a novel Partition Pose Representation (PPR) is proposed which integrate person instance and body joint by joint offset. PPR leverages the center of human body and the offset between center point and body joint to encode human pose. To better enhance the relationship of body joints, we divide human body into five parts, and generate sub-PPR in each part. Based on PPR, PCP Network can detect persons and body joints simultaneously, and then grouping all body joints by joint offset. Moreover, an improved $ell_{1}$ loss is designed to obtain more accurate joint offset. On the COCO keypoints dataset, the proposed method performs on par with the existing state-of-the-art bottom-up method in accuracy and speed.","PeriodicalId":314429,"journal":{"name":"2021 IEEE International Conference on Image Processing (ICIP)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116926316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}