Hai Wu, Hongtao Xie, Fanchao Lin, Sicheng Zhang, Jun Sun, Yongdong Zhang
Landmark detection in hip X-ray images plays a critical role in the diagnosis of Developmental Dysplasia of the Hip (DDH) and in Total Hip Arthroplasty (THA) surgery. Regression and heatmap techniques based on convolutional networks can obtain reasonable results, but they are limited in either robustness or precision given the complexity and intensity inhomogeneity of hip X-ray images. In this paper, we propose a Wave-like Cascade Segmentation Network (WaveCSN) that improves landmark detection accuracy by transforming landmark detection into area segmentation. The WaveCSN consists of three basic sub-networks, each composed of a U-net module, an indicate module, and a max-MSER module. The U-net generates masks, while the indicate module is trained to distinguish the generated masks from the ground truth. The U-net and the indicate module are trained alternately, so that the generated masks are supervised to become increasingly similar to the ground truth. The max-MSER module ensures that landmarks can be extracted precisely from the generated masks. We present two professional datasets (DDH and THA) for the first time and evaluate the WaveCSN on them. Our results show that the WaveCSN improves accuracy by at least 2.66 and 4.11 pixels on these two datasets compared to other methods, achieving state-of-the-art landmark detection in hip X-ray images.
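A minimal sketch of the segmentation-to-landmark idea: once a network predicts a foreground mask around a landmark, a point coordinate can be recovered as the centroid of the largest connected region. This stands in for the paper's max-MSER extraction step, whose exact criteria are not given here; the flood-fill centroid below is only an illustrative assumption.

```python
# Hypothetical sketch: recover a landmark coordinate from a predicted
# segmentation mask by taking the centroid of the largest 4-connected
# foreground region (a stand-in for the paper's max-MSER module).

def largest_region_centroid(mask):
    """Return the (row, col) centroid of the largest 4-connected foreground region."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    best = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                # Flood-fill one connected region.
                stack, region = [(r, c)], []
                seen[r][c] = True
                while stack:
                    y, x = stack.pop()
                    region.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                if len(region) > len(best):
                    best = region
    cy = sum(p[0] for p in best) / len(best)
    cx = sum(p[1] for p in best) / len(best)
    return cy, cx

mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 1],
    [0, 0, 0, 0, 0],
]
landmark = largest_region_centroid(mask)  # centroid of the 2x2 block, ignoring the stray pixel
```

Picking the largest region makes the extraction robust to small spurious activations, which is the same motivation the abstract gives for extracting landmarks from masks rather than regressing coordinates directly.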
WaveCSN. Proceedings of the ACM Multimedia Asia, December 2019. https://doi.org/10.1145/3338533.3366574
In this research, we study the task of visual-textual presentation synthesis, in which artistic text is generated and embedded in a background photo. This art form is widely used in graphic design, such as posters, billboards, and trademarks, and is therefore of high application value. We propose a new framework for this task. First, the shape of the target text is adjusted and its textures are rendered to match a reference style image, producing the artistic text. Next, the layout in which the artistic text is placed is determined by considering both aesthetics and seamlessness. Finally, the artistic text is blended with the background photo to obtain the visual-textual presentation. Experimental results demonstrate the effectiveness of the proposed framework in creating professionally designed visual-textual presentations.
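The final blending step can be illustrated with simple per-pixel alpha compositing. This is only a sketch under the assumption of a soft text matte; the paper's actual blending method is not specified here.

```python
# Hypothetical sketch of the blending stage: composite a rendered text
# layer over a background photo with a per-pixel alpha matte,
# out = a * text + (1 - a) * background.

def alpha_blend(background, text_layer, alpha):
    """Per-pixel convex blend of a text layer over a background image."""
    return [
        [alpha[r][c] * text_layer[r][c] + (1 - alpha[r][c]) * background[r][c]
         for c in range(len(background[0]))]
        for r in range(len(background))
    ]

bg = [[100.0, 100.0], [100.0, 100.0]]   # bright background
txt = [[0.0, 0.0], [0.0, 0.0]]          # dark text layer
a = [[1.0, 0.5], [0.0, 0.0]]            # matte: opaque, soft edge, transparent
out = alpha_blend(bg, txt, a)           # [[0.0, 50.0], [100.0, 100.0]]
```

A soft (fractional) alpha at the text boundary is what gives the seamlessness the abstract mentions, avoiding hard cut-out edges.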
Artistic Text Stylization for Visual-Textual Presentation Synthesis. Shuai Yang. Proceedings of the ACM Multimedia Asia, December 2019. https://doi.org/10.1145/3338533.3372211
Block-based transform coding inherently causes blocking artifacts, which severely degrade picture quality, especially at high compression rates. Although convolutional neural networks (CNNs) achieve good performance in image restoration tasks, existing methods mainly focus on deep or efficient network architectures. The gradient of a compressed image differs from the original gradient in that it changes dramatically along block boundaries. Motivated by this observation, we propose gradient-guided image deblocking based on CNNs. Guided by the gradient information of the input blocky image, the proposed network preserves textural edges while reducing blocky edges, and thus restores the original clean image from compression degradation. Experimental results demonstrate that the gradient information of the input compressed image contributes to blocking artifact reduction, and that the proposed method achieves a significant performance improvement in terms of visual quality and objective measurements.
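The guidance signal the abstract describes can be sketched with plain finite differences: on a blocky image, the gradient map spikes exactly at block boundaries. How the paper's network consumes these maps (e.g. as extra input channels) is an assumption here, not something the abstract states.

```python
# Hypothetical sketch: compute horizontal/vertical gradients of an image
# with forward finite differences. On block-coded images these maps light
# up at block boundaries, making them a natural guidance input for a CNN.

def finite_diff_gradients(img):
    """Return (grad_x, grad_y) via forward differences, zero at the far edge."""
    h, w = len(img), len(img[0])
    gx = [[(img[r][c + 1] - img[r][c]) if c + 1 < w else 0 for c in range(w)] for r in range(h)]
    gy = [[(img[r + 1][c] - img[r][c]) if r + 1 < h else 0 for c in range(w)] for r in range(h)]
    return gx, gy

# A flat signal with an abrupt jump at a (4-pixel) block boundary:
row = [10, 10, 10, 10, 40, 40, 40, 40]
gx, gy = finite_diff_gradients([row])
# gx[0] == [0, 0, 0, 30, 0, 0, 0, 0]: the lone spike marks the block boundary.
```

A textural edge, by contrast, produces gradient responses that are not aligned to the block grid, which is the distinction the network exploits to keep texture while removing blockiness.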
Gradient Guided Image Deblocking Using Convolutional Neural Networks. Cheolkon Jung, Jiawei Feng, Zhu Li. Proceedings of the ACM Multimedia Asia, December 2019. https://doi.org/10.1145/3338533.3368258
Feiyang Liu, G. Cao, Daiqin Yang, Yiyong Zha, Yunfei Zhang, Xin Liu
In this paper, an LSTM-based rate-distortion (R-D) prediction method for low-delay video coding is proposed. Unlike traditional rate control algorithms, an LSTM is introduced to learn the latent pattern of the R-D relationship over the course of video coding. Temporal information, hierarchical coding structure information, and the content of the frame to be encoded are used to achieve more accurate prediction. Based on the proposed network, a new method for predicting R-D model parameters is presented and tested on the test model of Versatile Video Coding (VVC). Experimental results show that the proposed method outperforms the state-of-the-art method used in VVC.
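For context, rate control in HEVC/VVC commonly uses a hyperbolic R-lambda model, lambda = alpha * bpp^beta, whose per-frame parameters are exactly the kind of quantity an R-D predictor would supply. The sketch below uses fixed example values for alpha and beta, not values from the paper.

```python
# Hypothetical sketch of the hyperbolic R-lambda model used in HEVC/VVC
# rate control: lambda = alpha * bpp**beta (beta < 0). A predictor such as
# the paper's LSTM would estimate alpha and beta per frame; the constants
# here are illustrative only.

def lam_from_bpp(bpp, alpha, beta):
    """Map a target bits-per-pixel budget to a Lagrange multiplier lambda."""
    return alpha * (bpp ** beta)

# A tighter bit budget yields a larger lambda, i.e. stronger compression.
lam_low_budget = lam_from_bpp(0.05, alpha=3.2, beta=-1.367)
lam_high_budget = lam_from_bpp(0.5, alpha=3.2, beta=-1.367)
```

The encoder then derives the quantization parameter from lambda, so a more accurate (alpha, beta) prediction translates directly into hitting the bit budget with less quality fluctuation.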
An LSTM based Rate and Distortion Prediction Method for Low-delay Video Coding. Proceedings of the ACM Multimedia Asia, December 2019. https://doi.org/10.1145/3338533.3366630
Microphone array beamforming has proven to be an effective method for suppressing adverse interference. Recently, acoustic beamformers that employ neural networks (NNs) to estimate a time-frequency (T-F) mask, termed TFMask-BF, have received tremendous attention and shown great benefit as a front-end for noise-robust Automatic Speech Recognition (ASR). However, our preliminary ASR experiments with TFMask-BF show that a mask model trained on simulated data does not perform well in real environments because of a data mismatch problem. In this study, we adopt a knowledge distillation learning framework that uses real-recording data together with simulated data during training to reduce the impact of this mismatch. Moreover, a novel iterative knowledge distillation mask model (IKDMM) training scheme is systematically developed. Specifically, two bi-directional long short-term memory (BLSTM) models are designed as a teacher mask model (TMM) and a student mask model (SMM). At each iteration, the TMM is trained on simulated data and then used to generate soft mask labels for both the simulated and the real-recording data. The simulated and real-recording data, together with their generated soft mask labels, form the new training set for the SMM at that iteration. The proposed approach is evaluated as an ASR front-end on the six-channel CHiME-4 corpus. Experimental results show that our IKDMM reduces the data mismatch problem, leading to a 5% relative Word Error Rate (WER) reduction compared to conventional TFMask-BF on real-recording data under noisy conditions.
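The iterative teacher-student scheme can be summarized as a control-flow skeleton. The `train` and `predict` functions below are trivial placeholders standing in for BLSTM training and inference; everything except the loop structure is an assumption for illustration.

```python
# Hypothetical skeleton of the IKDMM loop: per iteration, fit the teacher
# on simulated data with true labels, have it soft-label both simulated and
# real recordings, then fit the student on the combined soft-labeled set.

def iterative_distillation(simulated, real, iterations, train, predict):
    """Alternate teacher fitting, soft labeling, and student fitting."""
    student = None
    for _ in range(iterations):
        teacher = train(simulated, labels=[x["label"] for x in simulated])
        soft = [predict(teacher, x) for x in simulated + real]
        student = train(simulated + real, labels=soft)
    return student

# Toy placeholders: a "model" is just the mean of its training labels,
# and prediction returns that constant as the soft label.
def toy_train(data, labels):
    return sum(labels) / len(labels)

def toy_predict(model, example):
    return model

simulated = [{"label": 1.0}, {"label": 0.0}]
real = [{"label": None}]  # real recordings carry no ground-truth masks
student = iterative_distillation(simulated, real, iterations=2,
                                 train=toy_train, predict=toy_predict)
```

The key point the skeleton captures is that real recordings, which have no ground-truth masks, still contribute to student training through the teacher's soft labels.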
IKDMM. Zhaoyi Liu, Yuexian Zou. Proceedings of the ACM Multimedia Asia, December 2019. https://doi.org/10.1145/3338533.3366607
Hongyang Yu, Guorong Li, Weigang Zhang, H. Yao, Qingming Huang
Under the tracking-by-detection framework, multi-object tracking methods connect object detections to target trajectories according to an association policy. Most methods represent objects by their appearance and motion, and the association is mostly judged by a fusion of appearance similarity and motion consistency. However, the fusion ratio between appearance and motion is often set subjectively. In this paper, we propose a novel self-balancing method that fuses appearance similarity and motion consistency. Extensive experimental results on public benchmarks demonstrate the effectiveness of the proposed method in comparison with several state-of-the-art trackers.
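The fusion the abstract criticizes and then automates can be written as a convex combination of the two cues. The paper learns the balance weight; the fixed `w` below is an illustrative assumption, not the paper's mechanism.

```python
# Hypothetical sketch of cue fusion in tracking-by-detection association:
# a convex combination of appearance similarity and motion consistency.
# The paper's contribution is learning w rather than hand-setting it.

def fused_association_score(appearance_sim, motion_consistency, w):
    """Convex combination of the two cues; a higher score means a better match."""
    assert 0.0 <= w <= 1.0
    return w * appearance_sim + (1.0 - w) * motion_consistency

# Candidate A looks very similar but moves less plausibly; B is the reverse.
score_a = fused_association_score(0.9, 0.4, w=0.5)  # 0.65
score_b = fused_association_score(0.3, 0.8, w=0.5)  # 0.55
```

With a hand-set `w` the ranking of A and B can flip arbitrarily, which is exactly the subjectivity a self-balancing weight is meant to remove, particularly in UAV footage where camera motion degrades the motion cue.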
Self-balance Motion and Appearance Model for Multi-object Tracking in UAV. Proceedings of the ACM Multimedia Asia, December 2019. https://doi.org/10.1145/3338533.3366561
Xiansong Huang, Hong-Ju He, Pengxu Wei, Chi Zhang, Juncen Zhang, Jie Chen
Histopathological image analysis is considered a gold standard for cancer identification and diagnosis. Tumor segmentation in histopathological images is one of the most important research topics, and its performance directly affects how accurately doctors judge cancer categories and stages. With the remarkable development of deep learning, many methods have been proposed for tumor segmentation. However, there is little research analyzing a specific tumor segmentation pipeline, and few studies have examined hard example mining for tumor segmentation in detail. To bridge this gap, this study first summarizes a specific tumor segmentation pipeline, then explores hard example mining for tumor segmentation. Finally, experiments are conducted to evaluate the segmentation performance of our method, demonstrating the effects of both the pipeline and hard example mining.
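A common form of hard example mining, sketched below under the assumption of loss-based selection (the paper's actual criterion is not stated here), is to keep only the highest-loss training patches for the next round.

```python
# Hypothetical sketch of hard example mining: rank training patches by
# their current loss and keep the k hardest for extra training emphasis.

def mine_hard_examples(losses, k):
    """Return the indices of the k highest-loss examples, hardest first."""
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return order[:k]

patch_losses = [0.10, 0.85, 0.30, 0.95, 0.05]
hard = mine_hard_examples(patch_losses, k=2)  # indices of the two hardest patches
```

In whole-slide segmentation this matters because easy background patches vastly outnumber ambiguous tumor-boundary patches, so uniform sampling under-trains exactly the regions that determine segmentation quality.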
Tumor Tissue Segmentation for Histopathological Images. Proceedings of the ACM Multimedia Asia, December 2019. https://doi.org/10.1145/3338533.3372210
Deep networks have recently been applied to medical assistant diagnosis. The brain is the largest and most complex structure in the central nervous system, and it is likewise complicated in medical images such as computed tomography (CT) scans. When reading a CT image, radiologists generally search across the image to find lesions, characterize and measure them, and then describe them in the radiological report. To automate this process, we quantitatively analyze a cerebral hemorrhage dataset and propose a Multi-scale Feature with Collaborative Learning (MFCL) strategy for Weakly Supervised Lesion Detection (WSLD), which both adapts to the characteristics of detecting small lesions and introduces a global classification constraint into training. Specifically, a multi-scale feature branch network and collaborative learning are designed to locate the lesion area. Experimental results demonstrate that the proposed method is effective on the cerebral hemorrhage dataset, establishing a new WSLD baseline for cerebral hemorrhage.
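The multi-scale idea can be illustrated with a toy two-scale feature stack: downsample by average pooling, upsample back, and stack with the original so a detector sees both fine detail (small lesions) and coarse context. The paper's branch network is more elaborate; this only sketches the principle.

```python
# Hypothetical sketch of a two-scale feature stack: a 2x average-pooled,
# nearest-upsampled copy of a feature map stacked with the original, giving
# a detector simultaneous access to fine and coarse information.

def avg_pool_2x(fm):
    """2x2 average pooling with stride 2 (assumes even height/width)."""
    h, w = len(fm), len(fm[0])
    return [[(fm[r][c] + fm[r][c + 1] + fm[r + 1][c] + fm[r + 1][c + 1]) / 4.0
             for c in range(0, w, 2)] for r in range(0, h, 2)]

def upsample_2x(fm):
    """Nearest-neighbor 2x upsampling."""
    out = []
    for row in fm:
        wide = [v for v in row for _ in (0, 1)]
        out.append(wide)
        out.append(list(wide))
    return out

def two_scale_stack(fm):
    """Return [fine, coarse] 'channels' at the input resolution."""
    coarse = upsample_2x(avg_pool_2x(fm))
    return [fm, coarse]

fm = [[0.0, 4.0], [4.0, 8.0]]
fine, coarse = two_scale_stack(fm)  # coarse is the 4.0 mean broadcast back
```

Small lesions survive in the fine channel while the coarse channel supplies the surrounding context that weak (image-level) supervision relies on.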
Multi-scale Features for Weakly Supervised Lesion Detection of Cerebral Hemorrhage with Collaborative Learning. Zhiwei Chen, Rongrong Ji, Jipeng Wu, Yunhang Shen. Proceedings of the ACM Multimedia Asia, December 2019. https://doi.org/10.1145/3338533.3372209
The amount of training data is the key bottleneck for achieving good results in medical image analysis, especially with deep learning. With small medical training sets, deep learning models often fail to mine useful features and suffer from severe over-fitting. In this paper, we propose a clean and effective feature fusion adversarial learning network that mines useful features and relieves over-fitting. First, we train a fully convolutional autoencoder with unsupervised learning to mine useful feature maps from our liver lesion data. These feature maps are then transferred to our adversarial SENet network for liver lesion classification. Our experiments on liver lesion classification in CT show an average accuracy of 85.47%, outperforming the baseline training scheme, which demonstrates that the proposed method can mine useful features and relieve over-fitting. It can assist physicians in the early detection and treatment of liver lesions.
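The two-stage recipe (unsupervised feature mining, then supervised classification on the transferred features) can be sketched with trivial stand-ins for the real networks; the "encoder" and "classifier" below are toy linear placeholders chosen only so the control flow runs, not the paper's models.

```python
# Hypothetical sketch of the two-stage scheme: (1) fit a feature extractor
# without labels, (2) reuse its output as input features for a classifier.
# Real autoencoder / SENet networks are replaced by toy stand-ins.

def fit_encoder(data):
    """Toy 'unsupervised' stage: learn the data mean and center inputs by it."""
    mean = sum(data) / len(data)
    return lambda x: x - mean

def fit_classifier(features, labels):
    """Toy supervised stage: threshold at the midpoint of the class means."""
    pos = [f for f, y in zip(features, labels) if y == 1]
    neg = [f for f, y in zip(features, labels) if y == 0]
    thresh = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0
    return lambda f: 1 if f > thresh else 0

data = [1.0, 2.0, 8.0, 9.0]
labels = [0, 0, 1, 1]
encode = fit_encoder(data)           # stage 1: label-free feature mining
feats = [encode(x) for x in data]    # transfer features to the classifier
clf = fit_classifier(feats, labels)  # stage 2: supervised classification
```

The design point the sketch preserves is that stage 1 never touches the labels, so the scarce labeled data is spent only on the final classifier, which is how the approach mitigates over-fitting on small medical datasets.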
Feature fusion adversarial learning network for liver lesion classification. Peng Chen, Yuqing Song, Deqi Yuan, Zhe Liu. Proceedings of the ACM Multimedia Asia, December 2019. https://doi.org/10.1145/3338533.3366577