Pub Date: 2022-05-11 DOI: 10.1007/s13735-022-00239-4
S. Panigrahi, U. Raju
{"title":"InceptionDepth-wiseYOLOv2: improved implementation of YOLO framework for pedestrian detection","authors":"S. Panigrahi, U. Raju","doi":"10.1007/s13735-022-00239-4","DOIUrl":"https://doi.org/10.1007/s13735-022-00239-4","url":null,"abstract":"","PeriodicalId":48501,"journal":{"name":"International Journal of Multimedia Information Retrieval","volume":null,"pages":null},"PeriodicalIF":5.6,"publicationDate":"2022-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72541972","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-05-10 DOI: 10.1007/s13735-022-00232-x
Ihssane Houhou, A. Zitouni, Y. Ruichek, Salah Eddine Bekhouche, M. Kas, A. Taleb-Ahmed
{"title":"RGBD deep multi-scale network for background subtraction","authors":"Ihssane Houhou, A. Zitouni, Y. Ruichek, Salah Eddine Bekhouche, M. Kas, A. Taleb-Ahmed","doi":"10.1007/s13735-022-00232-x","DOIUrl":"https://doi.org/10.1007/s13735-022-00232-x","url":null,"abstract":"","PeriodicalId":48501,"journal":{"name":"International Journal of Multimedia Information Retrieval","volume":null,"pages":null},"PeriodicalIF":5.6,"publicationDate":"2022-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76581868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-04-25 DOI: 10.1007/s13735-022-00230-z
Na He, Sam Ferguson
{"title":"Music emotion recognition based on segment-level two-stage learning","authors":"Na He, Sam Ferguson","doi":"10.1007/s13735-022-00230-z","DOIUrl":"https://doi.org/10.1007/s13735-022-00230-z","url":null,"abstract":"","PeriodicalId":48501,"journal":{"name":"International Journal of Multimedia Information Retrieval","volume":null,"pages":null},"PeriodicalIF":5.6,"publicationDate":"2022-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86966104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-04-21 DOI: 10.1007/s13735-022-00236-7
M. Meraz, Md Afzal Ansari, M. Javed, P. Chakraborty
{"title":"DC-GNN: drop channel graph neural network for object classification and part segmentation in the point cloud","authors":"M. Meraz, Md Afzal Ansari, M. Javed, P. Chakraborty","doi":"10.1007/s13735-022-00236-7","DOIUrl":"https://doi.org/10.1007/s13735-022-00236-7","url":null,"abstract":"","PeriodicalId":48501,"journal":{"name":"International Journal of Multimedia Information Retrieval","volume":null,"pages":null},"PeriodicalIF":5.6,"publicationDate":"2022-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87395394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-04-19 DOI: 10.1007/s13735-022-00234-9
Ohoud Nafea, Wadood Abdul, G. Muhammad
{"title":"Multi-sensor human activity recognition using CNN and GRU","authors":"Ohoud Nafea, Wadood Abdul, G. Muhammad","doi":"10.1007/s13735-022-00234-9","DOIUrl":"https://doi.org/10.1007/s13735-022-00234-9","url":null,"abstract":"","PeriodicalId":48501,"journal":{"name":"International Journal of Multimedia Information Retrieval","volume":null,"pages":null},"PeriodicalIF":5.6,"publicationDate":"2022-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83267424","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-04-12 DOI: 10.1007/s13735-022-00233-w
M. Fisichella
{"title":"Siamese coding network and pair similarity prediction for near-duplicate image detection","authors":"M. Fisichella","doi":"10.1007/s13735-022-00233-w","DOIUrl":"https://doi.org/10.1007/s13735-022-00233-w","url":null,"abstract":"","PeriodicalId":48501,"journal":{"name":"International Journal of Multimedia Information Retrieval","volume":null,"pages":null},"PeriodicalIF":5.6,"publicationDate":"2022-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85549138","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-04-12 DOI: 10.1007/s13735-022-00231-y
Xiaoyi Wang, Jun Huang
{"title":"A local representation-enhanced recurrent convolutional network for image captioning","authors":"Xiaoyi Wang, Jun Huang","doi":"10.1007/s13735-022-00231-y","DOIUrl":"https://doi.org/10.1007/s13735-022-00231-y","url":null,"abstract":"","PeriodicalId":48501,"journal":{"name":"International Journal of Multimedia Information Retrieval","volume":null,"pages":null},"PeriodicalIF":5.6,"publicationDate":"2022-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78893857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-04-08 DOI: 10.48550/arXiv.2204.04014
Stefanos Papadopoulos, C. Koutlis, S. Papadopoulos, Y. Kompatsiaris
Estimating the preferences of consumers is of utmost importance for the fashion industry, as appropriately leveraging this information can be beneficial in terms of profit. Trend detection in fashion is a challenging task due to the fast pace of change in the fashion industry. Moreover, forecasting the visual popularity of new garment designs is even more demanding due to the lack of historical data. To this end, we propose MuQAR, a Multimodal Quasi-AutoRegressive deep learning architecture that combines two modules: (1) a multimodal multilayer perceptron processing categorical, visual and textual features of the product and (2) a Quasi-AutoRegressive neural network modelling the “target” time series of the product’s attributes along with the “exogenous” time series of all other attributes. We utilize computer vision, specifically image classification and image captioning, to automatically extract visual features and textual descriptions from the images of new products. Product design in fashion is initially expressed visually, and these features represent the products’ unique characteristics without interfering with the creative process of their designers by requiring additional inputs (e.g. manually written texts). We employ the time series of the product’s target attributes as a proxy for temporal popularity patterns, mitigating the lack of historical data, while exogenous time series help capture trends among interrelated attributes. We perform an extensive ablation analysis on two large-scale image fashion datasets, Mallzee-P and SHIFT15m, to assess the adequacy of MuQAR and also use the Amazon Reviews: Home and Kitchen dataset to assess generalization to other domains. A comparative study on the VISUELLE dataset shows that MuQAR is capable of competing with and surpassing the domain’s current state of the art by 4.65% and 4.8% in terms of WAPE and MAE, respectively.
{"title":"Multimodal Quasi-AutoRegression: forecasting the visual popularity of new fashion products","authors":"Stefanos Papadopoulos, C. Koutlis, S. Papadopoulos, Y. Kompatsiaris","doi":"10.48550/arXiv.2204.04014","DOIUrl":"https://doi.org/10.48550/arXiv.2204.04014","url":null,"abstract":"","PeriodicalId":48501,"journal":{"name":"International Journal of Multimedia Information Retrieval","volume":null,"pages":null},"PeriodicalIF":5.6,"publicationDate":"2022-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84560561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-03-24 DOI: 10.1007/s13735-022-00229-6
M. Junayed, Md Baharul Islam, H. Imani, Tarkan Aydin
{"title":"PDS-Net: A novel point and depth-wise separable convolution for real-time object detection","authors":"M. Junayed, Md Baharul Islam, H. Imani, Tarkan Aydin","doi":"10.1007/s13735-022-00229-6","DOIUrl":"https://doi.org/10.1007/s13735-022-00229-6","url":null,"abstract":"","PeriodicalId":48501,"journal":{"name":"International Journal of Multimedia Information Retrieval","volume":null,"pages":null},"PeriodicalIF":5.6,"publicationDate":"2022-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76468263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}