Implications of Z-Normalization in the Matrix Profile
Dieter De Paepe, Diego Nieves Avendano, S. Hoecke
Pub Date: 2019-02-19 | DOI: 10.1007/978-3-030-40014-9_5
Eliminating Noise in the Matrix Profile
Dieter De Paepe, Olivier Janssens, S. Hoecke
Pub Date: 2019-02-19 | DOI: 10.5220/0007314100830093

As companies increasingly measure their products and services, the amount of time series data is growing, and techniques to extract usable information from it are needed. One recently developed data mining technique for time series is the Matrix Profile: for each subsequence of a time series, it records the smallest z-normalized Euclidean distance to any subsequence of another series. It has been used for motif and discord discovery, for segmentation, and as a building block for other techniques. One side effect of the z-normalization used is that small fluctuations on flat signals are upscaled, which can lead to high, unintuitive distances between very similar subsequences from noisy data. We derive an analytic method to estimate and remove the effect of this noise, adding only a single, intuitive parameter to the calculation of the Matrix Profile. This paper explains our method and demonstrates it by performing discord discovery on the Numenta Anomaly Benchmark and by segmenting the PAMAP2 activity dataset. We find that our technique results in a more intuitive Matrix Profile and improves results in both use cases for series containing many flat, noisy subsequences. Since our technique is an extension of the Matrix Profile, it can be applied to any of the various tasks the Matrix Profile addresses, improving results where the data contains flat, noisy sequences.
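To see why z-normalization upscales noise on flat signals, the sketch below (my own illustration, not code from the paper) compares the z-normalized Euclidean distance of two nearly identical flat, noisy subsequences against that of two similar sine-wave subsequences; the flat pair, despite being closer in raw terms, ends up far more distant after normalization:

```python
import numpy as np

def znorm_dist(a, b):
    """Z-normalized Euclidean distance between two equal-length subsequences."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return np.linalg.norm(a - b)

rng = np.random.default_rng(0)

# Two flat subsequences differing only by tiny independent noise:
flat_a = 5.0 + 0.001 * rng.standard_normal(100)
flat_b = 5.0 + 0.001 * rng.standard_normal(100)

# Two clearly structured, genuinely similar subsequences:
t = np.linspace(0, 2 * np.pi, 100)
sine_a = np.sin(t)
sine_b = np.sin(t) + 0.001 * rng.standard_normal(100)

# Z-normalization rescales the flat pair's noise to unit variance, so their
# distance approaches that of two unrelated random sequences (~sqrt(2m)).
print(znorm_dist(flat_a, flat_b) > znorm_dist(sine_a, sine_b))  # True
```

This is exactly the effect the paper's noise-correction parameter is designed to suppress.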
Comparison between Supervised and Unsupervised Feature Selection Methods
L. Haar, K. Anding, K. Trambitckii, G. Notni
Pub Date: 2019-02-19 | DOI: 10.5220/0007385305820589

Reducing the feature set by selecting relevant features for the classification process is an important step in the image processing chain, yet it sometimes receives too little attention. Such a reduction has many advantages: it can remove irrelevant and redundant data, improve recognition performance, and reduce storage requirements, computation time, and model complexity. In this paper, supervised and unsupervised feature selection methods are compared with respect to the achievable recognition accuracy. Supervised methods include information about the given classes in the selection, whereas unsupervised methods can be used for tasks without known class labels. Feature clustering is an unsupervised method; for this type of feature reduction, mainly hierarchical methods, but also k-means, are used. Instead of these two clustering methods, the Expectation-Maximization (EM) algorithm was used in this paper. The aim is to investigate whether this type of clustering algorithm can provide a proper feature vector via feature clustering. No feature reduction technique provides equally good results for all datasets and classifiers. However, for every dataset it was possible to reduce the feature set to a specific number of useful features without losses, and often even with improvements, in recognition performance.
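As a rough illustration of EM-based feature clustering (not the authors' pipeline; the data, initialization, and spherical-covariance simplification are my own assumptions), a minimal NumPy Gaussian-mixture EM can group redundant features by treating each feature column as a point described by its values over all samples:

```python
import numpy as np

def em_gmm(X, k, init_idx, iters=50):
    """Minimal EM for a spherical Gaussian mixture model.
    Returns a hard cluster label for every row of X."""
    n, d = X.shape
    mu = X[init_idx].astype(float)               # initial means, one per component
    var = np.full(k, X.var() + 1e-9)             # spherical variances
    pi = np.full(k, 1.0 / k)                     # mixing weights
    for _ in range(iters):
        # E-step: responsibilities from component log-densities
        d2 = ((X[:, None, :] - mu[None]) ** 2).sum(-1)            # (n, k)
        logp = np.log(pi) - 0.5 * d * np.log(2 * np.pi * var) - d2 / (2 * var)
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances
        nk = r.sum(axis=0) + 1e-12
        pi = nk / n
        mu = (r.T @ X) / nk[:, None]
        d2 = ((X[:, None, :] - mu[None]) ** 2).sum(-1)
        var = (r * d2).sum(axis=0) / (d * nk) + 1e-9
    return r.argmax(axis=1)

# Feature clustering: columns of the data matrix become points; redundant
# (near-duplicate) features should land in the same cluster.
rng = np.random.default_rng(1)
a = rng.standard_normal((200, 1))
b = rng.standard_normal((200, 1))
F = np.hstack([a + 0.01 * rng.standard_normal((200, 3)),   # 3 copies of feature a
               b + 0.01 * rng.standard_normal((200, 3))])  # 3 copies of feature b
labels = em_gmm(F.T, k=2, init_idx=[0, 3])   # one init point per apparent group
print(labels)  # the two redundant groups receive two distinct labels
```

Selecting one representative feature per cluster then yields the reduced feature vector the paper investigates.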
Approximation of the Distance from a Point to an Algebraic Manifold
A. Uteshev, M. Goncharova
Pub Date: 2019-02-19 | DOI: 10.5220/0007483007150720

The problem of evaluating the geometric distance d from a point X0 to an algebraic curve in R^2 or a manifold G(X) = 0 in R^3 is treated by comparing the exact value with its two successive approximations d(1) and d(2). The geometric distance is evaluated from the univariate distance equation, whose zero set coincides with that of the critical values of the function d^2(X0), while d(1)(X0) and d(2)(X0) are obtained by expanding d^2(X0) into a power series in the algebraic distance G(X0). We estimate the quality of approximation by comparing the relative positions of the level sets of d(X), d(1)(X) and d(2)(X).
Learning Ensembles in the Presence of Imbalanced Classes
A. Saadallah, N. Piatkowski, Felix Finkeldey, P. Wiederkehr, K. Morik
Pub Date: 2019-02-19 | DOI: 10.5220/0007681508660873

Class imbalance occurs when data classes are not equally represented. Generally, it arises when some classes represent rare events while the others represent the counterpart of those events. Rare events, especially those with a potentially negative impact, often require timely, informed decision-making. However, class imbalance is known to induce a learning bias towards majority classes, which implies poor detection of minority classes. We therefore propose a new ensemble method that handles class imbalance explicitly at training time. In contrast to existing ensemble methods for class imbalance, which use either data-driven or randomized approaches in their construction, our method exploits both directions. On the one hand, ensemble members are built from randomized subsets of training data. On the other hand, we construct different scenarios of class imbalance for the unknown test data. An ensemble is built for each resulting scenario by combining random sampling with an estimate of the relative importance of specific loss functions. Final predictions are generated by a weighted average of each ensemble's prediction. As opposed to existing methods, our approach does not try to fix imbalanced data sets. Instead, we show how imbalanced data sets can make classification easier, due to a limited range of true class frequencies. Our procedure promotes diversity among the ensemble members and is not sensitive to specific parameter settings. An experimental demonstration shows that our new method outperforms or is on par with state-of-the-art ensemble and class imbalance techniques.
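The final aggregation step can be sketched as a weighted average of member predictions; the member probabilities and weights below are placeholders I made up, not values from the paper (where the weights would come from the per-scenario construction):

```python
import numpy as np

# Hypothetical P(class=1) predictions from 3 ensemble members, each trained
# on a different randomized subset of the training data (placeholder values).
member_probs = np.array([
    [0.9, 0.2, 0.6],
    [0.8, 0.3, 0.4],
    [0.7, 0.1, 0.9],
])  # shape: (members, samples)

# Per-member weights, e.g. reflecting each member's estimated reliability
# under an assumed test-time class-imbalance scenario (placeholder values).
weights = np.array([0.5, 0.3, 0.2])

ensemble_prob = weights @ member_probs          # weighted average per sample
prediction = (ensemble_prob >= 0.5).astype(int)
print(ensemble_prob, prediction)
```

Because the weights differ per imbalance scenario, the same member pool can yield different final ensembles for different assumed test-time class frequencies.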
Two-layer Residual Feature Fusion for Object Detection
Jaeseok Choi, Kyoungmin Lee, Jisoo Jeong, Nojun Kwak
Pub Date: 2019-02-19 | DOI: 10.5220/0007306803520359

Recently, many single-stage detectors using multi-scale features have been proposed. They are much faster than two-stage detectors that use region proposal networks (RPNs), without much degradation in detection performance. However, the feature maps in the lower layers close to the input, which are responsible for detecting small objects in a single-stage detector, suffer from insufficient representation power because they are too shallow. There is also a structural contradiction: the feature maps must not only deliver low-level information to the next layers but also contain high-level abstraction for prediction. In this paper, we propose a method to enrich the representation power of feature maps using a new feature fusion method that exploits information from the consecutive layer. It also adopts a unified prediction module with enhanced generalization performance. The proposed method enables more precise prediction, achieving higher or comparable scores than competitors such as SSD and DSSD on PASCAL VOC and MS COCO. In addition, it retains the fast computation of a single-stage detector, requiring much less computation than other detectors with similar performance.
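A generic sketch of fusing consecutive feature maps (nearest-neighbour upsampling plus a residual addition; my own simplification, not the authors' exact fusion module) shows the basic mechanics of enriching a shallow map with information from the next, coarser layer:

```python
import numpy as np

def fuse(shallow, deep):
    """Fuse a feature map with its half-resolution successor:
    nearest-neighbour upsample the deeper map and add it residually."""
    up = deep.repeat(2, axis=1).repeat(2, axis=2)   # (C, H, W) -> (C, 2H, 2W)
    return shallow + up

shallow = np.ones((8, 16, 16))      # lower-layer map: fine resolution, shallow
deep = np.full((8, 8, 8), 0.5)      # consecutive layer: coarser, more abstract
fused = fuse(shallow, deep)
print(fused.shape)  # (8, 16, 16): low-level detail plus higher-level context
```

In a real detector the two maps would come from a CNN backbone and the addition would typically be preceded by learned projection layers; those are omitted here.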
All Together Now! The Benefits of Adaptively Fusing Pre-trained Deep Representations
Yehezkel S. Resheff, I. Lieder, Tom Hope
Pub Date: 2019-02-19 | DOI: 10.5220/0007367301350144

Pre-trained deep neural networks, powerful models trained on large datasets, have become a popular tool in computer vision for transfer learning. However, the standard approach of using a single network potentially misses valuable information contained in other readily available models. In this work, we study the Mixture of Experts (MoE) approach for adaptively fusing multiple pre-trained models for each individual input image. In particular, we explore how far we can get by combining diverse pre-trained representations in a customized way that maximizes their potential in a lightweight framework. Our approach is motivated by an empirical study of the predictions made by popular pre-trained networks across various datasets, which finds that both performance and agreement between models vary across datasets. We further propose a miniature CNN gating mechanism operating on a thumbnail version of the input image, and show that this is enough to guide a good fusion. Finally, we explore a multi-modal blend of visual and natural-language representations, using a label-space embedding to inject pre-trained word vectors. Across multiple datasets, we demonstrate that an adaptive fusion of pre-trained models obtains favorable results.
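The MoE fusion can be sketched as gate-weighted mixing of each expert's class distribution; the expert logits and gate logits below are placeholders, not outputs of the paper's networks (where the gate would be a miniature CNN applied to a thumbnail of the input):

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Hypothetical class logits from three pre-trained backbones for one image.
expert_logits = np.array([
    [2.0, 0.5, 0.1],
    [1.5, 1.4, 0.2],
    [0.3, 2.2, 0.4],
])

# Gating weights a small thumbnail CNN might emit for this particular image
# (placeholder values; in the MoE the gate output is input-dependent).
gate = softmax(np.array([2.0, 1.0, 0.0]))

expert_probs = np.apply_along_axis(softmax, 1, expert_logits)
mixed = gate @ expert_probs      # adaptive, per-image fusion of the experts
print(mixed.argmax())            # class chosen by the fused prediction
```

Because the gate is recomputed per image, different images can lean on different pre-trained experts, which is the core of the adaptive fusion idea.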
Feedforward and Feedback Processing of Spatiotemporal Tubes for Efficient Object Localization
Khari Jarrett, Joachim Lohn-Jaramillo, E. Bowen, Laura Ray, R. Granger
Pub Date: 2019-02-19 | DOI: 10.5220/0007313603770387

We introduce a new set of mechanisms for tracking entities through videos at substantially less expense than standard methods require. The approach combines inexpensive initial processing of individual frames with the integration of information across long time spans (multiple frames), resulting in the recognition and tracking of spatially and temporally contiguous entities rather than a focus on the individual pixels that comprise them.
Stochastic Phase Estimation and Unwrapping
Mara Pistellato, Filippo Bergamasco, A. Albarelli, L. Cosmo, A. Gasparetto, A. Torsello
Pub Date: 2019-02-19 | DOI: 10.5220/0007389402000209

Phase shift is one of the most effective techniques in 3D structured-light scanning owing to its accuracy and noise resilience. However, the periodic nature of the signal causes a spatial ambiguity when the fringe periods are shorter than the projector resolution. To solve this, many techniques exploit multiple combined signals to unwrap the phases and thus recover a unique, consistent code. In this paper, we study the phase estimation and unwrapping problem in a stochastic context. Assuming the acquired fringe signal is affected by additive white Gaussian noise, we start by modelling each estimated phase as a zero-mean wrapped normal distribution with variance σ². Our contributions are then twofold. First, we show how to recover the best projector code given multiple phase observations by means of maximum-likelihood (ML) estimation over the combined fringe distributions. Second, we exploit the Cramér-Rao bounds to relate the phase variance σ² to the variance of the observed signal, which can easily be estimated online during fringe acquisition. An extensive set of experiments demonstrates that our approach outperforms other methods in terms of code recovery accuracy and the ratio of faulty unwrappings.
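The per-pixel phase estimation that precedes unwrapping can be illustrated with the standard N-step phase-shift estimator under the paper's additive-white-Gaussian-noise assumption (a textbook estimator, not the paper's unwrapping or ML code; signal amplitudes and noise level below are placeholders):

```python
import numpy as np

def estimate_phase(I, deltas):
    """Least-squares phase estimate from N phase-shifted fringe intensities
    I_n = A + B*cos(phi + delta_n), with delta_n = 2*pi*n/N."""
    num = -(I * np.sin(deltas)).sum()
    den = (I * np.cos(deltas)).sum()
    return np.arctan2(num, den)

N = 4
deltas = 2 * np.pi * np.arange(N) / N
phi_true = 1.2
rng = np.random.default_rng(0)
# Fringe samples under the AWGN model assumed by the paper:
I = 0.5 + 0.3 * np.cos(phi_true + deltas) + 0.005 * rng.standard_normal(N)
phi_hat = estimate_phase(I, deltas)
print(phi_hat)  # close to 1.2; the estimate's spread grows with the noise
```

It is the spread of such estimates, wrapped onto (-π, π], that the paper models as a wrapped normal distribution and propagates through the unwrapping.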
Multimodal Sentiment Analysis: A Multitask Learning Approach
M. Fortin, B. Chaib-draa
Pub Date: 2019-02-19 | DOI: 10.5220/0007313503680376

Multimodal sentiment analysis has recently received increasing interest. However, most methods assume that text and image modalities are always available at test time. This assumption is often violated in real environments (e.g. social media), since users do not always publish text together with an image. In this paper, we propose a method based on a multitask framework that combines multimodal information when it is available, while handling cases where a modality is missing. Our model contains one classifier for analyzing the text, another for analyzing the image, and a third that performs the prediction by fusing both modalities. Besides offering a solution to the missing-modality problem, our experiments show that this multitask framework improves generalization by acting as a regularization mechanism. We also demonstrate that the model can handle a missing modality at training time, and thus can be trained with image-only and text-only examples.