Reachability in Large Graphs Using Bloom Filters
Arkaprava Saha, Neha Sengupta, Maya Ramanath
Pub Date: 2019-04-08. DOI: 10.1109/ICDEW.2019.000-9
Reachability queries are a fundamental graph operation with applications in several domains. There has been extensive research over several decades on answering reachability queries efficiently using sophisticated index structures. However, most of these methods are built for static graphs. For graphs that are massive in size and updated very frequently, maintaining such index structures is often infeasible due to their large memory footprint and extremely slow updates. In this paper, we introduce a technique for computing reachability queries on very large and highly dynamic graphs that minimizes both the memory footprint and the update time. In particular, we enable a previously proposed, index-free, approximate method for reachability called ARROW on a compact graph representation called Bloom graphs, which store the edges of the graph in collections of the well-known summary data structure, the Bloom filter. In our experimental evaluation on real-world graph datasets with up to millions of nodes and edges, we show that using ARROW with a Bloom graph achieves memory savings of up to 50% while maintaining accuracy close to 100% on all graphs.
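To make the idea concrete, here is a minimal Python sketch of a Bloom-filter edge store combined with random-walk reachability in the spirit of ARROW. All names (BloomFilter, BloomGraph, reachable) and parameters are illustrative assumptions, not the paper's implementation; in particular, the paper's Bloom graphs use collections of filters, whereas this sketch uses a single filter for brevity.

```python
import hashlib
import random

class BloomFilter:
    """A basic Bloom filter over byte strings (sizes chosen arbitrarily)."""
    def __init__(self, num_bits=1 << 20, num_hashes=5):
        self.num_bits, self.num_hashes = num_bits, num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(item + i.to_bytes(2, "big")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] >> (p % 8) & 1 for p in self._positions(item))

class BloomGraph:
    """Stores edges in a Bloom filter instead of adjacency lists; edge
    membership may return false positives but never false negatives."""
    def __init__(self, num_vertices):
        self.n = num_vertices
        self.edges = BloomFilter()

    def add_edge(self, u, v):
        self.edges.add(f"{u}->{v}".encode())

    def neighbors(self, u):
        # O(n) candidate probing keeps the sketch simple; a real system
        # would maintain candidate sets more cleverly.
        return [v for v in range(self.n) if f"{u}->{v}".encode() in self.edges]

def reachable(g, source, target, num_walks=100, max_steps=50):
    """ARROW-style approximate reachability via bounded random walks."""
    for _ in range(num_walks):
        u = source
        for _ in range(max_steps):
            if u == target:
                return True
            nbrs = g.neighbors(u)
            if not nbrs:
                break
            u = random.choice(nbrs)
    return False

g = BloomGraph(4)
for u, v in [(0, 1), (1, 2), (2, 3)]:
    g.add_edge(u, v)
print(reachable(g, 0, 3), reachable(g, 3, 0))  # True False (with high probability)
```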
{"title":"Reachability in Large Graphs Using Bloom Filters","authors":"Arkaprava Saha, Neha Sengupta, Maya Ramanath","doi":"10.1109/ICDEW.2019.000-9","DOIUrl":"https://doi.org/10.1109/ICDEW.2019.000-9","url":null,"abstract":"Reachability queries are a fundamental graph operation with applications in several domains. There has been extensive research over several decades on answering reachability queries efficiently using sophisticated index structures. However, most of these methods are built for static graphs. For graphs that are updated very frequently and are massive in size, maintaining such index structures is often infeasible due to a large memory footprint and extremely slow updates. In this paper, we introduce a technique to compute reachability queries for very large and highly dynamic graphs that minimizes the memory footprint and update time. In particular, we enable a previously proposed, index-free, approximate method for reachability called ARROW on a compact graph representation called Bloom graphs. Bloom graphs use collections of the well known summary data structure called the Bloom filter to store the edges of the graph. In our experimental evaluation with real world graph datasets with up to millions of nodes and edges, we show that using ARROW with a Bloom graph achieves memory savings of up to 50%, while having accuracy close to 100% for all graphs.","PeriodicalId":186190,"journal":{"name":"2019 IEEE 35th International Conference on Data Engineering Workshops (ICDEW)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115637158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Food Image to Cooking Instructions Conversion Through Compressed Embeddings Using Deep Learning
Madhu Kumari, Tajinder Singh
Pub Date: 2019-04-08. DOI: 10.1109/ICDEW.2019.00-31
Image understanding in the era of deep learning is burgeoning, not only in terms of semantics but also toward generating meaningful descriptions of images. This requires cross-modal training of deep neural networks that are complex enough to encode the fine contextual information in an image, yet simple enough to cover a wide range of inputs. Converting a food image into its cooking description/instructions is a fitting instance of this image-understanding challenge. This paper proposes a method for obtaining compressed embeddings of the cooking instructions of a recipe image through cross-modal training of a CNN, an LSTM, and a bidirectional LSTM. The major challenges are the variable length of instructions, the varying number of instructions per recipe, and the presence of multiple food items in a single image. Our model meets these challenges through transfer learning and multi-level error propagation across the different neural networks, producing condensed embeddings of cooking instructions that have high similarity with the original instructions. We experiment specifically on Indian cuisine data (food images, ingredients, cooking instructions, and contextual information) scraped from the web. The proposed model can be useful for information retrieval systems and can also be effectively utilized in automatic recipe recommendation.
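As a rough illustration of such cross-modal training, the PyTorch sketch below pairs a CNN image encoder with a bidirectional-LSTM instruction encoder and pulls matched image/instruction embeddings together with a cosine loss. Module names, dimensions, and the loss choice are our assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class ImageEncoder(nn.Module):
    """CNN branch: map a food image to a compact embedding."""
    def __init__(self, embed_dim=256):
        super().__init__()
        # weights=None for a self-contained example; transfer learning
        # (as the paper uses) would start from pretrained weights.
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Linear(backbone.fc.in_features, embed_dim)
        self.backbone = backbone

    def forward(self, images):            # images: (B, 3, 224, 224)
        return self.backbone(images)      # (B, embed_dim)

class InstructionEncoder(nn.Module):
    """Bi-LSTM branch: compress a variable-length token sequence to one vector."""
    def __init__(self, vocab_size, embed_dim=256, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, 128)
        self.lstm = nn.LSTM(128, hidden, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, embed_dim)

    def forward(self, tokens):                   # tokens: (B, T) int ids
        out, _ = self.lstm(self.embed(tokens))   # (B, T, 2*hidden)
        return self.proj(out.mean(dim=1))        # mean-pool over time

# Align the two branches so image and instruction embeddings agree.
img_enc, txt_enc = ImageEncoder(), InstructionEncoder(vocab_size=10000)
loss_fn = nn.CosineEmbeddingLoss()
images = torch.randn(4, 3, 224, 224)
tokens = torch.randint(0, 10000, (4, 40))
loss = loss_fn(img_enc(images), txt_enc(tokens), torch.ones(4))  # matched pairs
loss.backward()
```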
{"title":"Food Image to Cooking Instructions Conversion Through Compressed Embeddings Using Deep Learning","authors":"Madhu Kumari, Tajinder Singh","doi":"10.1109/ICDEW.2019.00-31","DOIUrl":"https://doi.org/10.1109/ICDEW.2019.00-31","url":null,"abstract":"The image understanding in the era of deep learning is burgeoning not only in terms of semantics but also in towards the generation of a meaningful descriptions of images, this requires specific cross model training of deep neural networks which must be complex enough to encode the fine contextual information related to the image and simple enough enough to cover wide range of inputs. Conversion of food image to its cooking description/instructions is a suitable instance of the above mentioned image understanding challenge. This paper proposes a unique method of obtaining the compressed embeddings of cooking instructions of a recipe image using cross model training of CNN, LSTM and Bi-Directional LSTM. The major challenge in this is variable length of instructions, number of instructions per recipe and multiple food items present in a food image. Our model successfully meets these challenges through transfer learning and multi-level error propagations across different neural networks by achieving condensed embeddings of cooking instruction which have high similarity with original instructions. In this paper we have specifically experimented on Indian cuisine data (Food image, Ingredients, Cooking Instruction and contextual information) scraped from the web. The proposed model can be significantly useful for information retrieval system and it can also be effectively utilized in automatic recipe recommendations.","PeriodicalId":186190,"journal":{"name":"2019 IEEE 35th International Conference on Data Engineering Workshops (ICDEW)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116139541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Triangle Counting on GPU Using Fine-Grained Task Distribution
Lin Hu, Naiqing Guan, Lei Zou
Pub Date: 2019-04-08. DOI: 10.1109/ICDEW.2019.000-8
Due to the irregularity of graph data, designing an efficient GPU-based graph algorithm is always a challenging task. Inefficient memory access and work imbalance often limit GPU-based graph computing, even though the GPU provides a massively parallel computing model. To address this, we propose a fine-grained task distribution strategy for the triangle counting task. Extensive experiments and theoretical analysis confirm the superiority of our algorithm on both large real and synthetic graph datasets.
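The contrast with coarse, vertex-centric distribution can be seen in a small CPU analogue: assigning one task per edge rather than per vertex evens out the work when degree distributions are skewed. The Python sketch below is our illustration of that idea, not the paper's GPU kernels.

```python
from concurrent.futures import ThreadPoolExecutor

def count_triangles(adj, num_workers=4):
    """Edge-centric triangle counting: one task per edge, so work is split
    at a finer grain than one task per vertex and skewed degrees matter less.
    adj maps each vertex to its set of neighbors (undirected graph)."""
    edges = [(u, v) for u in adj for v in adj[u] if u < v]

    def per_edge(edge):
        u, v = edge
        # The ordering u < v < w attributes each triangle to exactly one edge.
        return sum(1 for w in adj[u] & adj[v] if w > v)

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return sum(pool.map(per_edge, edges))

# A 4-clique contains exactly 4 triangles.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}}
print(count_triangles(adj))  # 4
```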
{"title":"Triangle Counting on GPU Using Fine-Grained Task Distribution","authors":"Lin Hu, Naiqing Guan, Lei Zou","doi":"10.1109/ICDEW.2019.000-8","DOIUrl":"https://doi.org/10.1109/ICDEW.2019.000-8","url":null,"abstract":"Due to the irregularity of graph data, designing an efficient GPU-based graph algorithm is always a challenging task. Inefficient memory access and work imbalance often limit GPU-based graph computing, even though GPU provides a massively parallelism computing fashion. To address that, in this paper, we propose a fine-grained task distribution strategy for triangle counting task. Extensive experiments and theoretical analysis confirm the superiority of our algorithm over both large real and synthetic graph datasets.","PeriodicalId":186190,"journal":{"name":"2019 IEEE 35th International Conference on Data Engineering Workshops (ICDEW)","volume":"64 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114116279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Data Mining Approach to Chinese Food Analysis for Diet-Related Cardiometabolic Diseases
Angela Chang, Jieyi Hu, Yichao Liu, M. Liu
Pub Date: 2019-04-08. DOI: 10.1109/ICDEW.2019.00-29
Data mining is the discovery of valuable and novel structures in datasets. The growing number of people suffering from diet-related cardiometabolic diseases calls into question the media's efforts in food and health communication. A main objective of this study is to use emerging data mining methodology to understand what the news media discuss about food, diet, and the related cardiometabolic diseases, and how they discuss it. A total of 6,625 items of coverage on food, flavor, and condiments along with cardiometabolic diseases is identified. Data mining algorithms are applied to this food coverage to predict health outcomes and provide policy information. The most typical usage of such a food data corpus is the automatic mapping from text about food to health afflictions and the larger cultural forces behind them.
{"title":"Data Mining Approach to Chinese Food Analysis for Diet-Related Cardiometabolic Diseases","authors":"Angela Chang, Jieyi Hu, Yichao Liu, M. Liu","doi":"10.1109/ICDEW.2019.00-29","DOIUrl":"https://doi.org/10.1109/ICDEW.2019.00-29","url":null,"abstract":"Data mining is the discovery of valuable and novel structures in datasets. Considering the number of people suffering from diet-related cardiometabolic increase, it brings into question the media's efforts in food and health communication. One of the main objectives in this study focuses on the emerging data mining methodology to understand the structure of what and how the news media discuss food, diet, and the related cardiometabolic diseases. A total of 6,625 items of coverage on food, flavor, and condiments along with cardiometabolic diseases is identified. Data mining algorithms concern food for predicting health outcomes and providing policy information. The most typical usage of a food data corpus is automatic conversion from text to health afflictions on larger cultural forces.","PeriodicalId":186190,"journal":{"name":"2019 IEEE 35th International Conference on Data Engineering Workshops (ICDEW)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134609051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Collaborative Generative Adversarial Network for Recommendation Systems
Yuzhen Tong, Yadan Luo, Zheng Zhang, S. Sadiq, Peng Cui
Pub Date: 2019-04-08. DOI: 10.1109/ICDEW.2019.00-16
Recommendation systems have become a core part of daily Internet life. Conventional recommendation models can hardly withstand adversaries, owing to natural noise such as misclicks. Recent research on GAN-based recommendation systems has improved the robustness of learning models, yielding state-of-the-art performance. The basic idea is to set up a minimax game between two recommendation systems, picking negative samples as fake items and employing a reinforcement learning policy. However, such a strategy may lead to mode collapse and result in high vulnerability to adversarial perturbations of the model parameters. In this paper, we propose a new collaborative framework, the Collaborative Generative Adversarial Network (CGAN), which adopts a Variational Auto-encoder (VAE) as the generator and performs adversarial training in a continuous embedding space. The formulation of CGAN has two advantages: 1) its auto-encoder takes the role of the generator, mimicking the true distribution of user preferences over items by capturing the subtle latent factors underlying user-item interactions; 2) adversarial training in continuous space enhances the model's robustness and performance. Extensive experiments on two real-world benchmark recommendation datasets demonstrate the superior performance of CGAN in comparison with state-of-the-art GAN-based methods.
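For reference, adversarial training of this kind builds on the standard GAN minimax objective, with the VAE taking the generator's role; the notation below is ours, and the paper's exact objective, which couples this game with the VAE's evidence lower bound, may differ.

```latex
\min_{G}\max_{D}\;
\mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
+ \mathbb{E}_{z \sim p(z)}\big[\log\big(1 - D(G(z))\big)\big],
\qquad
\mathcal{L}_{\text{VAE}} =
\mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
- \mathrm{KL}\big(q_\phi(z \mid x)\,\|\,p(z)\big)
```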
{"title":"Collaborative Generative Adversarial Network for Recommendation Systems","authors":"Yuzhen Tong, Yadan Luo, Zheng Zhang, S. Sadiq, Peng Cui","doi":"10.1109/ICDEW.2019.00-16","DOIUrl":"https://doi.org/10.1109/ICDEW.2019.00-16","url":null,"abstract":"Recommendation systems have been a core part of daily Internet life. Conventional recommendation models hardly defend adversaries due to the natural noise like misclicking. Recent researches on GAN-based recommendation systems can improve the robustness of the learning models, yielding the state-of-the-art performance. The basic idea is to adopt an interplay minimax game on two recommendation systems by picking negative samples as fake items and employ reinforcement learning policy. However, such strategy may lead to mode collapse and result in high vulnerability to adversarial perturbations on its model parameters. In this paper, we propose a new collaborative framework, namely Collaborative Generative Adversarial Network (CGAN), which adopts Variational Auto-encoder (VAE) as the generator and performs adversarial training in the continuous embedding space. The formulation of CGAN has two advantages: 1) its auto-encoder takes the role of generator to mimic the true distribution of users preferences over items by capturing subtle latent factors underlying user-item interactions; 2) the adversarial training in continuous space enhances models robustness and performance. Extensive experiments conducted on two real-world benchmark recommendation datasets demonstrate the superior performance of our CGAN in comparison with the state-of-the-art GAN-based methods.","PeriodicalId":186190,"journal":{"name":"2019 IEEE 35th International Conference on Data Engineering Workshops (ICDEW)","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131853472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Dynamic Data Quality for Static Blockchains
Alan G. Labouseur, C. Matheus
Pub Date: 2019-04-08. DOI: 10.1109/ICDEW.2019.00-41
Blockchain's popularity has changed the way people think about data access, storage, and retrieval. Because of this, many classic data management challenges are imbued with renewed significance. One such challenge is the issue of Dynamic Data Quality. As time passes, data changes in content and structure and thus becomes dynamic. Data quality, therefore, also becomes dynamic, because it is an aggregate characteristic of the changing content and changing structure of the data itself. But a blockchain is a static structure. The friction between static blockchains and Dynamic Data Quality gives rise to new research opportunities, which the authors address in this paper.
{"title":"Dynamic Data Quality for Static Blockchains","authors":"Alan G. Labouseur, C. Matheus","doi":"10.1109/ICDEW.2019.00-41","DOIUrl":"https://doi.org/10.1109/ICDEW.2019.00-41","url":null,"abstract":"Blockchain's popularity has changed the way people think about data access, storage, and retrieval. Because of this, many classic data management challenges are imbued with renewed significance. One such challenge is the issue of Dynamic Data Quality. As time passes, data changes in content and structure and thus becomes dynamic. Data quality, therefore, also becomes dynamic because it is an aggregate characteristic of the changing content and changing structure of data itself. But blockchain is a static structure. The friction between static blockchains and Dynamic Data Quality give rise to new research opportunities, which the authors address in this paper.","PeriodicalId":186190,"journal":{"name":"2019 IEEE 35th International Conference on Data Engineering Workshops (ICDEW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129493239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Distilling Knowledge from User Information for Document Level Sentiment Classification
Jialing Song
Pub Date: 2019-04-08. DOI: 10.1109/ICDEW.2019.00-15
Combining global user and product characteristics with local review information provides a powerful mechanism for predicting a user's sentiment in a review of a product on online review sites such as Amazon, Yelp, and IMDB. However, user information is not always available in real scenarios, for example, for newly registered users or on sites that allow users to comment without logging in. To address this issue, we introduce a novel knowledge distillation (KD) learning paradigm that transfers user characteristics into the weights of student neural networks that use only product and review information. The teacher model transfers its predictive distributions over the training data to the student model, so user profiles are required only during the training stage. Experimental results on several sentiment classification datasets show that the proposed learning framework enables student models to achieve improved performance.
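A minimal sketch of the underlying distillation objective, assuming the classic Hinton-style formulation: the student is trained on a blend of hard-label cross-entropy and KL divergence from the teacher's temperature-softened predictions. The names, temperature, and blending weight are illustrative; the paper's exact loss may differ.

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """alpha * hard-label cross-entropy + (1 - alpha) * T^2 * KL(teacher || student)."""
    n = len(labels)
    p_student = softmax(student_logits)
    hard = -np.log(p_student[np.arange(n), labels] + 1e-12).mean()
    pt = softmax(teacher_logits, T)          # teacher's soft targets
    ps = softmax(student_logits, T)
    soft = (pt * (np.log(pt + 1e-12) - np.log(ps + 1e-12))).sum(axis=1).mean()
    return alpha * hard + (1 - alpha) * (T * T) * soft

# The teacher sees user features at training time; the student matches its
# output distribution while using only product and review inputs.
s = np.array([[2.0, 0.5, -1.0]])
t = np.array([[3.0, 0.2, -2.0]])
print(distillation_loss(s, t, labels=np.array([0])))
```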
{"title":"Distilling Knowledge from User Information for Document Level Sentiment Classification","authors":"Jialing Song","doi":"10.1109/ICDEW.2019.00-15","DOIUrl":"https://doi.org/10.1109/ICDEW.2019.00-15","url":null,"abstract":"Combining global user and product characteristics with local review information provides a powerful mechanism for predicting users' sentiment in a review document about a product on online review sites such as Amazon, Yelp and IMDB. However, the user information is not always available in the real scenario, for example, some new-registered users, or some sites allowing users' comments without logging in. To address this issue, we introduce a novel knowledge distillation (KD) learning paradigm, to transfer the user characteristics into the weights of student neural networks that just utilize product and review information. The teacher model transfers its predictive distributions of training data to the student model. Thus, the user profiles are only required during the training stage. Experimental results on several sentiment classification datasets show that the proposed learning framework enables student models to achieve improved performance.","PeriodicalId":186190,"journal":{"name":"2019 IEEE 35th International Conference on Data Engineering Workshops (ICDEW)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114736667","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Efficient Parallel Computing of Graph Edit Distance
Ran Wang, Yixiang Fang, Xing Feng
Pub Date: 2019-04-08. DOI: 10.1109/ICDEW.2019.000-7
With the prevalence of graph data, graph edit distance (GED), a well-known measure of similarity between two graphs, has been widely used in many real applications, such as graph classification and clustering, similar object detection, and biological network analysis. Despite its usefulness and popularity, GED is computationally costly because it is NP-hard. Currently, most existing solutions focus on computing GED in a serial manner, and little attention has been paid to parallel computing. In this paper, we propose a novel, efficient parallel algorithm for computing GED, called PGED, built on the state-of-the-art GED algorithm AStar+-LSa. The main idea of PGED is to allocate the heavy workload of searching for the optimal vertex mapping between two graphs, the most time-consuming step, to multiple threads based on an effective allocation strategy, resulting in highly efficient GED computation. We have evaluated PGED on two real datasets, and the experimental results show that by using multiple threads PGED is more efficient than AStar+-LSa. In addition, by carefully tuning the parameters, the performance of PGED can be further improved.
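The allocation idea can be illustrated with a toy parallel search in Python. This is a deliberate simplification that brute-forces vertex bijections between equal-sized graphs rather than reimplementing AStar+-LSa: the top-level branching, i.e. the choice of the image of vertex 0, is split across workers, mirroring how PGED distributes the mapping search.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import permutations

def edit_cost(g1, g2, mapping):
    """Edge insertions + deletions induced by a vertex bijection (a toy cost;
    full GED also prices vertex and edge relabelings)."""
    mapped = {tuple(sorted((mapping[u], mapping[v]))) for (u, v) in g1}
    return len(mapped ^ g2)  # symmetric difference of edge sets

def search_branch(args):
    """Explore all bijections sending vertex 0 to `first` (one worker's share)."""
    g1, g2, n, first = args
    rest = [v for v in range(n) if v != first]
    return min(edit_cost(g1, g2, (first,) + perm) for perm in permutations(rest))

def parallel_ged(g1, g2, n, workers=4):
    tasks = [(g1, g2, n, first) for first in range(n)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return min(pool.map(search_branch, tasks))

if __name__ == "__main__":
    path = {(0, 1), (1, 2), (2, 3)}     # a path on 4 vertices
    star = {(0, 1), (1, 2), (1, 3)}     # a star centered at vertex 1
    print(parallel_ged(path, star, 4))  # 2: delete one edge, insert another
```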
{"title":"Efficient Parallel Computing of Graph Edit Distance","authors":"Ran Wang, Yixiang Fang, Xing Feng","doi":"10.1109/ICDEW.2019.000-7","DOIUrl":"https://doi.org/10.1109/ICDEW.2019.000-7","url":null,"abstract":"With the prevalence of graph data, graph edit distance (GED), a well-known measure of similarity between two graphs, has been widely used in many real applications, such as graph classification and clustering, similar object detection, and biology network analysis. Despite its usefulness and popularity, GED is computationally costly, because it is NP-hard. Currently, most existing solutions focus on computing GED in a serial manner and little attention has been paid for parallel computing. In this paper, we propose a novel efficient parallel algorithm for computing GED. Our algorithm is based on the state[1]of-the-art GED algorithm AStar+-LSa, and is called PGED. The main idea of PGED is to allocate the heavy workload of searching the optimal vertex mapping between two graphs, which is the most time consuming step, to multiple threads based on an effective allocation strategy, resulting in high efficiency of GED computation. We have evaluated PGED on two real datasets, and the experimental results show that by using multiple threads, PGED is more efficient than AStar+-LSa. In addition, by carefully tuning the parameters, the performance of PGED can be further improved.","PeriodicalId":186190,"journal":{"name":"2019 IEEE 35th International Conference on Data Engineering Workshops (ICDEW)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123736415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Semantic Similarity Computation in Knowledge Graphs: Comparisons and Improvements
Chaoqun Yang, Yuanyuan Zhu, Ming Zhong, Rongrong Li
Pub Date: 2019-04-08. DOI: 10.1109/ICDEW.2019.000-5
Computing semantic similarity between concepts is a fundamental task in natural language processing with a large variety of applications. In this paper, we first review and analyze existing methods for semantic similarity computation in knowledge graphs. Through this analysis, we find that existing works mainly focus on the context features of concepts, which reflect the position or frequency of concepts in the knowledge graph, such as the depth of terms, the information content of terms, or the distance between terms, while a fundamental part of a concept's meaning, its synsets, has long been neglected. We therefore propose a new method to compute the similarity of concepts based on their extended synsets. Moreover, we propose a general hybrid framework that can combine our synset-based similarity measure with any existing context-feature-based semantic similarity to evaluate concepts more accurately. We conducted experiments on five well-known datasets for semantic similarity evaluation, and the results show that our general framework can improve most existing methods significantly.
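A compact sketch of the proposed combination, with a Jaccard-style overlap standing in for the paper's extended-synset measure; the exact formulation, the blending weight, and the toy synsets below are our assumptions.

```python
def synset_similarity(syn_a, syn_b):
    """Jaccard overlap of two concepts' extended synsets."""
    union = len(syn_a | syn_b)
    return len(syn_a & syn_b) / union if union else 0.0

def hybrid_similarity(syn_a, syn_b, context_sim, weight=0.5):
    """Hybrid framework: blend the synset-based score with any existing
    context-feature similarity (depth, information content, or path based)."""
    return weight * synset_similarity(syn_a, syn_b) + (1 - weight) * context_sim

# Toy example: "car" and "automobile" share most extended synset terms.
car  = {"car", "auto", "automobile", "motorcar", "machine"}
auto = {"automobile", "auto", "car", "motorcar", "vehicle"}
print(hybrid_similarity(car, auto, context_sim=0.8))  # high combined similarity
```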
{"title":"Semantic Similarity Computation in Knowledge Graphs: Comparisons and Improvements","authors":"Chaoqun Yang, Yuanyuan Zhu, Ming Zhong, Rongrong Li","doi":"10.1109/ICDEW.2019.000-5","DOIUrl":"https://doi.org/10.1109/ICDEW.2019.000-5","url":null,"abstract":"Computing semantic similarity between concepts is a fundamental task in natural language processing and has a large variety of applications. In this paper, first of all, we will review and analyze existing semantic similarity computation methods in knowledge graphs. Through the analysis of these methods, we find that existing works mainly focus on the context features of concepts which indicate the position or the frequency of the concepts in the knowledge graphs, such as the depth of terms, information content of the terms, or the distance between terms, while a fundamental part to describe the meaning of the concept, the synsets of concepts, are neglected for a long term. Thus, in this paper, we propose a new method to compute the similarity of concepts based on their extended synsets. Moreover, we propose a general hybrid framework, which can combine our new similarity measure based on extended synsets with any of existing context feature based semantic similarities to evaluate the concepts more accurately. We conducted experiments on five well-known datasets for semantic similarity evaluation, and the experimental results show that our general framework can improve most of existing methods significantly.","PeriodicalId":186190,"journal":{"name":"2019 IEEE 35th International Conference on Data Engineering Workshops (ICDEW)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130660679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Sparse Manifold Embedded Hashing for Multimedia Retrieval
Yongxin Wang, Xin Luo, Huaxiang Zhang, Xin-Shun Xu
Pub Date: 2019-04-08. DOI: 10.1109/ICDEW.2019.00011
Hashing has become more and more attractive in the large-scale multimedia retrieval community due to its fast search speed and low storage cost. Most hashing methods focus on finding the inherent data structure and neglect the sparse reconstruction relationship. Moreover, most of them adopt a two-step solution for structure embedding and hash code learning, which may yield suboptimal results. To address these issues, we present a novel sparsity-based hashing method, Sparse Manifold embedded hASHing (SMASH for short). It employs the sparse representation technique to extract the implicit structure in the data, and preserves that structure by minimizing the reconstruction error and the quantization loss under constraints that enforce the independence and balance of the hash codes. An alternating algorithm is devised to solve the optimization problem in SMASH, allowing it to learn the hash codes and the hash functions simultaneously. Extensive experiments on several benchmark datasets demonstrate that SMASH outperforms state-of-the-art hashing methods for the multimedia retrieval task.
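To show the shape of output such a method produces, here is a toy Python sketch: it generates balanced binary codes from a linear projection and ranks items by Hamming distance. The projection here is random for self-containment; SMASH itself learns it jointly by minimizing the sparse reconstruction error and quantization loss described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def learn_hash(X, n_bits=16):
    """Toy stand-in for a learned hash function: project features linearly,
    center each bit at its median so codes stay balanced (about half +1,
    half -1), then quantize by sign. The optimization SMASH performs to
    choose W is omitted here."""
    W = rng.standard_normal((X.shape[1], n_bits))
    Z = X @ W
    Z -= np.median(Z, axis=0)   # balance constraint: each bit splits the data
    return np.sign(Z), W

X = rng.standard_normal((100, 64))   # 100 items with 64-d features
codes, W = learn_hash(X)

# Hamming ranking: smaller distance means a more similar item.
query = codes[0]
dist = (codes != query).sum(axis=1)
print(dist.argsort()[:5])            # indices of the 5 nearest items
```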
{"title":"Sparse Manifold Embedded Hashing for Multimedia Retrieval","authors":"Yongxin Wang, Xin Luo, Huaxiang Zhang, Xin-Shun Xu","doi":"10.1109/ICDEW.2019.00011","DOIUrl":"https://doi.org/10.1109/ICDEW.2019.00011","url":null,"abstract":"Hashing has become more and more attractive in the large-scale multimedia retrieval community, due to its fast search speed and low storage cost. Most hashing methods focus on finding the inherent data structure, and neglect the sparse reconstruction relationship. Besides, most of them adopt a two-step solution for the structure embedding and the hash codes learning, which may yield suboptimal results. To address these issues, in this paper, we present a novel sparsity-based hashing method, namely, Sparse Manifold embedded hASHing, SMASH for short. It employs the sparse representation technique to extract the implicit structure in the data, and preserves the structure by minimizing the reconstruction error and the quantization loss with constraints to satisfy the independence and balance of the hash codes. An alternative algorithm is devised to solve the optimization problem in SMASH. Based on it, SMASH learns the hash codes and the hash functions simultaneously. Extensive experiments on several benchmark datasets demonstrate that SMASH outperforms some state-of-the-art hashing methods for the multimedia retrieval task.","PeriodicalId":186190,"journal":{"name":"2019 IEEE 35th International Conference on Data Engineering Workshops (ICDEW)","volume":"159 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132420358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}