Pub Date: 2018-12-01, DOI: 10.1109/DICTA.2018.8615868
Huaxi Huang, Jingsong Xu, Jian Zhang, Qiang Wu, Christina Kirsch
Railway power supply infrastructure is one of the most important components of railway transportation. As a key step in railway maintenance, recognizing defects in power supply infrastructure plays a vital role in the overall defect inspection subsystem. Traditionally, defect recognition is performed manually, which is time-consuming and labor-intensive. Inspired by the success of deep neural networks on a range of vision tasks, this paper presents an end-to-end deep network for railway infrastructure defect detection. More importantly, this paper is the first work to apply deep fine-grained classification to railway defect detection. We propose a new bilinear deep network, the Spatial Transformer And Bilinear Low-Rank (STABLR) model, and apply it to railway infrastructure defect detection. The experimental results demonstrate that the proposed method outperforms both machine learning methods based on hand-crafted features and classic deep neural network methods.
Title: Railway Infrastructure Defects Recognition using Fine-grained Deep Convolutional Neural Networks. Venue: 2018 Digital Image Computing: Techniques and Applications (DICTA).
Pub Date: 2018-12-01, DOI: 10.1109/DICTA.2018.8615796
Keishi Nishikawa, J. Ohya, T. Matsuzawa, A. Takanishi, H. Ogata, K. Hashimoto
In recent years, there has been an increasing demand for disaster response robots designed to work at disaster sites such as nuclear power plants where accidents have occurred. One of the tasks the robots need to complete at such sites is turning a valve. To employ robots for this task at real sites, it is desirable that the robots can autonomously detect the valves to be manipulated. In this paper, we propose a method that allows a disaster response robot to detect a valve whose parameters, such as position, orientation and size, are unknown, based on information captured by a depth camera mounted on the robot. In our proposed algorithm, the target valve is first detected in an RGB image captured by the depth camera, and 3D point cloud data including the target is reconstructed by combining the detection result with the depth image. Second, the reconstructed point cloud data is processed to estimate the parameters describing the target. Experiments were conducted on a simulator, and the results showed that our method could accurately estimate the parameters, with a minimum error of 0.0230 m in position, 0.196% in radius, and 0.00222 degrees in orientation.
Title: Automatic Detection of Valves with Disaster Response Robot on Basis of Depth Camera Information. Venue: 2018 Digital Image Computing: Techniques and Applications (DICTA).
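The reconstruction step described in this abstract, combining a 2D detection with the depth image to obtain a 3D point cloud, can be illustrated with a minimal sketch. It assumes a standard pinhole camera model with known intrinsics; the function name `backproject_box`, the box format and the intrinsic values are hypothetical and are not taken from the paper.

```python
import numpy as np

def backproject_box(depth, box, fx, fy, cx, cy):
    """Back-project depth pixels inside a detection box to 3D camera coordinates.

    depth: (H, W) array of depths in metres; 0 marks an invalid pixel.
    box: (u_min, v_min, u_max, v_max) pixel bounds, max values exclusive.
    fx, fy, cx, cy: pinhole intrinsics (assumed known, hypothetical values below).
    """
    u_min, v_min, u_max, v_max = box
    # Pixel coordinate grids for the region covered by the detection box.
    v, u = np.mgrid[v_min:v_max, u_min:u_max]
    z = depth[v_min:v_max, u_min:u_max]
    valid = z > 0  # drop pixels without a depth reading
    z = z[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)  # (N, 3) point cloud

# Toy depth image: a flat surface 2 m away with one invalid pixel.
depth = np.full((4, 4), 2.0)
depth[0, 0] = 0.0
points = backproject_box(depth, (0, 0, 2, 2), fx=100.0, fy=100.0, cx=2.0, cy=2.0)
print(points.shape)  # (3, 3): three valid pixels, each with x, y, z
```

The resulting point cloud is what a later fitting stage would consume to estimate position, radius and orientation; that stage is not sketched here because the abstract does not describe it.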
Pub Date: 2018-12-01, DOI: 10.1109/DICTA.2018.8615825
A. Suman, Md. Asikuzzaman, A. Webb, D. Perriman, M. Pickering
This paper presents a framework for inter-patient image registration that uses multiple thresholds, multiple similarity measures and multiple transformations based on compactly supported splines and discrete periodic spline wavelets (DPSWs), optimized with the Gauss-Newton gradient descent (GNGD) and gradient descent (GD) methods. Our primary contribution is incorporating DPSWs in the transformation; a second is fusing an out-of-range concept into a surface matching technique implemented with multiple transformations and multiple similarity measures. In particular, because a true deformation cannot be achieved by a single combination of transformation, similarity measure (SM) and optimization in a registration process, the moving image must first be brought within the range of the registration. The surface matching technique, in turn, uses an edge position difference (EPD) SM in which coarse-to-fine surfaces are matched using multiple thresholds with a spline-based free-form deformation (FFD) method. The registration experiments were performed on 3D clinical neck magnetic resonance (MR) images, with the results showing that our proposed method provides good accuracy and robustness.
Title: Inter-Subject Image Registration of Clinical Neck MRI Volumes using Discrete Periodic Spline Wavelet and Free form Deformation. Venue: 2018 Digital Image Computing: Techniques and Applications (DICTA).
Pub Date: 2018-12-01, DOI: 10.1109/DICTA.2018.8615858
N. Alam, R. Zwiggelaar
In this paper, the scale-specific graph topological changes of microcalcifications (MCs) were investigated to classify MC clusters. A series of multi-scale MC cluster graphs were generated based on the connectivity of individual MCs. The features extracted from the graph series were integrated with the statistical and morphological characteristics of MC clusters. Subsequent feature selection showed that features related to the denseness of MC clusters at some specific scales of the generated graphs discriminated better than all other features when classifying MC clusters with an ensemble classifier and 10-fold cross validation. The proposed method was evaluated using two well-known digitized datasets: MIAS (Mammographic Image Analysis Society) and DDSM (The Digital Database for Screening Mammography). High classification accuracy (around 98%) and good ROC (receiver operating characteristic) results (area under the ROC curve up to 0.99) were achieved.
Title: Evaluation of Graph Topological Features in Digitized Mammogram for Microcalcification Cluster Classification. Venue: 2018 Digital Image Computing: Techniques and Applications (DICTA).
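To make the idea of a scale-dependent density feature concrete, here is a minimal sketch. It assumes each MC is reduced to a centre point and that two MCs are connected when their Euclidean distance is at most the current scale; this connectivity rule, the function name `graph_density` and the toy coordinates are illustrative assumptions, not the construction used in the paper.

```python
import numpy as np

def graph_density(points, scale):
    """Density of the graph that links points no farther apart than `scale`.

    Density is 2E / (N (N - 1)): the fraction of possible edges that exist.
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    if n < 2:
        return 0.0
    # Pairwise Euclidean distances between all points.
    dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    # Count each undirected edge once via the strict upper triangle.
    edges = np.count_nonzero(np.triu(dists <= scale, k=1))
    return 2.0 * edges / (n * (n - 1))

# Hypothetical MC centres: three close together and one outlier.
centres = [(0, 0), (1, 0), (0, 1), (5, 5)]
densities = [graph_density(centres, s) for s in (1, 2, 8)]
print(densities)  # density grows as the scale increases
```

Evaluating the same feature over a series of scales gives one value per graph in the series, which is the kind of scale-specific input a feature selection step could then rank.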
Pub Date: 2018-12-01, DOI: 10.1109/DICTA.2018.8615764
Meicheng Chu, Bo Liu, F. Zhou, X. Bai, Bin Guo
Skeletal bone age assessment is a clinical practice used to diagnose the maturity of children. To assess bone age accurately, we propose an automatic bone age assessment method based on deep convolutional networks. The method has two stages: a mask generation network and an age assessment network. A U-Net convolutional network with a pretrained VGG16 encoder is used to extract the bone mask. For the assessment module, the original images are fused with the generated mask image to obtain segmented, normalized hand bone images. We then build a multiple-output convolutional network for accurate age assessment. Finally, the bone age regression problem is transformed into K-1 binary classification sub-problems. Our model was tested on the RSNA2017 Pediatric Bone Age dataset. We achieved a mean absolute error (MAE) of 5.98 months, which outperforms other common methods for bone age assessment. The proposed method could be used to develop fully automatic bone age assessment with better accuracy.
Title: Bone Age Assessment Based on Two-Stage Deep Neural Networks. Venue: 2018 Digital Image Computing: Techniques and Applications (DICTA).
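The abstract does not say how the regression is split into K-1 binary sub-problems. One common way to do this is ordinal encoding, where sub-problem k asks whether the rank exceeds k. The sketch below shows that encoding under the assumption that bone age has been discretized into K integer ranks; `encode_rank`, `decode_rank` and the 0.5 cut-off are hypothetical names and choices, not the paper's.

```python
import numpy as np

def encode_rank(rank, num_ranks):
    """Encode an integer rank in [0, num_ranks) as num_ranks - 1 binary targets.

    Target k answers the question "is the rank greater than k?".
    """
    return (rank > np.arange(num_ranks - 1)).astype(int)

def decode_rank(probabilities, cut=0.5):
    """Predicted rank = number of binary sub-problems answered 'yes'."""
    return int(np.sum(np.asarray(probabilities) > cut))

targets = encode_rank(3, num_ranks=5)
print(targets.tolist())  # [1, 1, 1, 0]

# Hypothetical outputs of the K-1 binary classifiers for one image.
predicted = decode_rank([0.9, 0.8, 0.6, 0.2])
print(predicted)  # 3
```

With this encoding, a multiple-output network can emit one probability per sub-problem, and nearby ranks share most of their targets, so a small ranking mistake costs less than a large one.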
Pub Date: 2018-12-01, DOI: 10.1109/DICTA.2018.8615790
H. Plank, G. Holweg, C. Steger, N. Druml
We present a new energy-efficient distance sensing method for 3D object tracking with Time-of-Flight sensors. The field of 3D object tracking with 3D cameras recently gained momentum due to the advent of front-facing depth cameras in smartphones. Tracking the user's head with 3D cameras will enable novel user experiences, but can lead to power consumption issues due to the active illumination. State-of-the-art continuous-wave Time-of-Flight imaging requires at least four different phase-images, while our approach can produce 3D measurements from single phase-images. This reduces the amount of emitted light to a minimum, improves latency and enables higher framerates. As our evaluation shows, after a brief initialization phase, our method can reduce the power consumption of a Time-of-Flight system by up to 68%.
Title: Fast and Energy-Efficient Time-of-Flight Distance Sensing Method for 3D Object Tracking. Venue: 2018 Digital Image Computing: Techniques and Applications (DICTA).
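For context on the four-phase baseline that this abstract compares against, the sketch below shows the textbook continuous-wave Time-of-Flight distance calculation from four phase-images sampled at 0°, 90°, 180° and 270°. It does not show the paper's single-phase-image method, which the abstract does not detail. The sample model in `simulate_samples`, the sign convention and the 20 MHz modulation frequency are assumptions; real sensors differ in convention.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def four_phase_distance(a0, a1, a2, a3, f_mod):
    """Distance from four correlation samples taken at 0, 90, 180, 270 degrees."""
    # Differences cancel the constant offset (ambient light, sensor bias).
    phase = math.atan2(a3 - a1, a0 - a2) % (2 * math.pi)
    return C * phase / (4 * math.pi * f_mod)

def simulate_samples(distance, f_mod, amplitude=1.0, offset=0.5):
    """Ideal samples under the assumed model a_k = offset + A cos(phase + k pi/2)."""
    phase = 4 * math.pi * f_mod * distance / C
    return [offset + amplitude * math.cos(phase + k * math.pi / 2) for k in range(4)]

f_mod = 20e6  # hypothetical modulation frequency, unambiguous range about 7.5 m
samples = simulate_samples(1.5, f_mod)
estimate = four_phase_distance(*samples, f_mod=f_mod)
print(round(estimate, 6))  # 1.5
```

Each of the four samples needs its own illuminated exposure, which is why producing a measurement from a single phase-image can reduce the emitted light.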
Pub Date: 2018-12-01, DOI: 10.1109/DICTA.2018.8615774
Saruar Alam, Len Hamey, K. Ho-Shon
Alzheimer's disease (AD) can be detected using magnetic resonance imaging (MRI) based features and supervised classifiers. The subcortical and ventricular volumes change in AD patients. These volumes can be extracted from MRI by tools such as FreeSurfer and the multi-atlas-based likelihood fusion (MALF) algorithm. Studies use MRI from many medical imaging centers. However, individual centers typically use distinctive MRI protocols for brain scanning. The protocol differences include different scanner models with various operating parameters, and some scanner models have different field strengths. A key factor in classifying multicentric MR subject images acquired under different protocols is how different scanner models affect feature extraction and the subsequent performance of a supervised classifier. We have investigated the classification performance of FreeSurfer and MALF based volume features together with a Radial Basis Function Support Vector Machine and an Extreme Learning Machine across different imaging protocols. We have also investigated, for both FreeSurfer and MALF, which brain regions are most effective for detecting the disease under different protocols. Our results indicate marginal differences in classification performance across scanner models with the same or different field strengths when differentiating AD, Mild Cognitive Impairment, and Normal Controls. We have also observed differences in the ranking order of the most effective brain regions.
Title: Impact of MRI Protocols on Alzheimer's Disease Detection. Venue: 2018 Digital Image Computing: Techniques and Applications (DICTA).
Pub Date: 2018-12-01, DOI: 10.1109/DICTA.2018.8615864
Amir Ghahremani, E. Bondarev, P. D. With
Exploiting ConvNets for object classification systems requires extensive manual labor, since these networks must be trained on sufficiently large and accurately labeled datasets. We propose a novel self-learning approach that is able to generate a reliable multi-class object classification model from a low-quality dataset corrupted by a high level of inter-class noise samples. This approach iteratively purifies the noisy training datasets for each class and updates the classification model. The iterations continue until the model and its parameters reach sufficient quality. The self-learning approach based on ConvNets is evaluated for a maritime surveillance use case, where vessels need to be classified into eight different types. The experimental results on the evaluation dataset show that the proposed approach improves the F1 score by approximately 5%, 8% and 25% at the end of the third iteration when the initial training datasets contain 40%, 50% and 60% inter-class noise samples (erroneously labeled vessels), respectively. Additionally, for higher noise levels the purification performance is highly dependent on inter- and intra-class similarities between training samples. It was also found that the mean Average Precision (mAP) does not degrade much, whereas other performance parameters show larger variation.
Title: Multi-Class Recognition using Noisy Training Data with a Self-Learning Approach. Venue: 2018 Digital Image Computing: Techniques and Applications (DICTA).
Pub Date: 2018-12-01, DOI: 10.1109/DICTA.2018.8615829
Jialiang Shen, Yucheng Wang, Jian Zhang
CNN methods for image super-resolution consume a large amount of memory during training, because the feature size does not decrease as the network goes deeper. To reduce memory consumption during training, we propose a memory optimized deep dense network for image super-resolution. We first reduce redundant feature learning by rationally designing the skip connections and dense connections in the network. Then we adopt shared memory allocations to store concatenated features and Batch Normalization intermediate feature maps. The memory optimized network consumes less memory than a normal dense network. We also evaluate our proposed architecture on highly competitive super-resolution benchmark datasets. Our deep dense network outperforms some existing methods and requires relatively less computation.
Title: Memory Optimized Deep Dense Network for Image Super-resolution. Venue: 2018 Digital Image Computing: Techniques and Applications (DICTA).
Pub Date: 2018-12-01, DOI: 10.1109/DICTA.2018.8615849
Lu Zhang, Jingsong Xu, Jian Zhang, Yongshun Gong
Travelogues consist of textual information shared by tourists through web forums or other social media, and they often lack illustrations (images). On image sharing websites like Flickr, users can post images with rich textual information: ‘title’, ‘tag’ and ‘description’. The topics of travelogues usually revolve around beautiful scenery. Recommending corresponding landscape images for these travelogues can make them more vivid to read. However, it is difficult to fuse such information because the text attached to each image has diverse meanings/views. In this paper, we propose an unsupervised Hybrid Multiple Kernel K-means (HMKKM) model to link images and travelogues through multiple views. Multi-view matrices are built to reveal the correlations between several aspects. To further improve performance, we add a regularisation based on textual similarity. To evaluate the effectiveness of the proposed method, a dataset is constructed from TripAdvisor and Flickr to find the related images for each travelogue. Experimental results demonstrate the superiority of the proposed model over other baselines.
Title: Information Enhancement for Travelogues via a Hybrid Clustering Model. Venue: 2018 Digital Image Computing: Techniques and Applications (DICTA).