Analyzing the Linear and Nonlinear Transformations of AlexNet to Gain Insight into Its Performance
Pub Date: 2019-02-19 | DOI: 10.5220/0007582408600865
Jyoti Nigam, Srishti Barahpuriya, Renu M. Rameshan
AlexNet, one of the earliest and most successful deep learning networks, has delivered strong performance on image classification tasks. Good classification rests on some fundamental properties: the network preserves the important information in the input data, and it is able to separate points from different classes. In this work we experimentally verify that the AlexNet architecture satisfies these core properties. We analyze the effect of linear and nonlinear transformations on the input data across the layers, modeling the convolution filters as linear transformations. The verified results motivate conclusions about the desirable properties of a transformation matrix that aid better classification.
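To illustrate the linear-transformation view, here is a minimal PyTorch sketch (not the authors' code; the layer size and input shape are toy assumptions) that materializes a small convolution as an explicit matrix, whose singular values then describe how the layer stretches or compresses input directions:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 2, kernel_size=3, bias=False)  # toy stand-in for a filter bank
H = W = 8                                          # small input so the matrix stays tiny

# Column j of the matrix is conv(e_j) for the j-th basis image e_j.
with torch.no_grad():
    cols = []
    for j in range(H * W):
        e = torch.zeros(1, 1, H, W)
        e.view(-1)[j] = 1.0
        cols.append(conv(e).reshape(-1))
    M = torch.stack(cols, dim=1)  # shape (out_dim, H*W): the conv as a matrix

# Singular values show how the transformation stretches/compresses input directions.
print(torch.linalg.svdvals(M)[:5])
```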
{"title":"Analyzing the Linear and Nonlinear Transformations of AlexNet to Gain Insight into Its Performance","authors":"Jyoti Nigam, Srishti Barahpuriya, Renu M. Rameshan","doi":"10.5220/0007582408600865","DOIUrl":"https://doi.org/10.5220/0007582408600865","url":null,"abstract":"AlexNet, one of the earliest and successful deep learning networks, has given great performance in image classification task. There are some fundamental properties for good classification such as: the network preserves the important information of the input data; the network is able to see differently, points from different classes. In this work we experimentally verify that these core properties are followed by the AlexNet architecture. We analyze the effect of linear and nonlinear transformations on input data across the layers. The convolution filters are modeled as linear transformations. The verified results motivate to draw conclusions on the desirable properties of transformation matrix that aid in better classification.","PeriodicalId":410036,"journal":{"name":"International Conference on Pattern Recognition Applications and Methods","volume":"263 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133298419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Semantic Segmentation of Non-linear Multimodal Images for Disease Grading of Inflammatory Bowel Disease: A SegNet-based Application
Pub Date: 2019-02-19 | DOI: 10.5220/0007314003960405
Pranita Pradhan, T. Meyer, M. Vieth, A. Stallmach, M. Waldner, M. Schmitt, J. Popp, T. Bocklitz
Non-linear multimodal imaging, the combination of coherent anti-Stokes Raman scattering (CARS), two-photon excited fluorescence (TPEF) and second harmonic generation (SHG), has shown its potential to assist the diagnosis of different inflammatory bowel diseases (IBDs). This label-free imaging technique can support ‘gold-standard’ techniques such as colonoscopy and histopathology to secure an IBD diagnosis in a clinical environment. Moreover, non-linear multimodal imaging can measure biomolecular changes in different tissue regions, such as the crypt and mucosa regions, which serve as a predictive marker for IBD severity. Real-time assessment of IBD severity requires automatic segmentation of the crypt and mucosa regions. In this paper, we semantically segment these regions using a deep neural network. We utilized the SegNet architecture (Badrinarayanan et al., 2015) and compared its results with a classical machine learning approach. Our trained SegNet model achieved an overall F1 score of 0.75 and outperformed the classical machine learning approach for crypt and mucosa segmentation in our study.
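For reference, a minimal sketch of how an overall F1 score of this kind can be computed for multi-class label maps (an assumed evaluation style with scikit-learn, not the authors' code; the label values and sizes are hypothetical):

```python
import numpy as np
from sklearn.metrics import f1_score

# Toy label maps: 0 = background, 1 = crypt, 2 = mucosa.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 3, size=(64, 64))
y_pred = rng.integers(0, 3, size=(64, 64))

# Per-class F1 averaged into one overall score (macro average).
overall_f1 = f1_score(y_true.ravel(), y_pred.ravel(), average="macro")
print(f"overall F1: {overall_f1:.2f}")
```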
A Robust Page Frame Detection Method for Complex Historical Document Images
Pub Date: 2019-02-19 | DOI: 10.5220/0007382405560564
M. Reza, Md. Ajraf Rakib, S. S. Bukhari, A. Dengel
Document layout analysis is the most important part of converting scanned page images into searchable full text. Intensive research addresses structured and semi-structured documents (journal articles, books, magazines, invoices), but far less addresses historical documents. Digitizing historical documents is more challenging than regular structured documents due to poor image quality, damaged characters, and large amounts of textual and non-textual noise. In the scientific community, extraneous symbols from the neighboring page are considered textual noise, while black borders, speckles, rulers, various image fragments, etc. along the document border are considered non-textual noise. Existing historical document analysis methods cannot handle all of this noise, which produces undesired text in the Optical Character Recognition (OCR) output that must be removed afterward with considerable extra effort. This paper presents a new perspective on historical document image cleanup by detecting the page frame of the document. The goal is to find the actual content area of the document and ignore noise along the page border. We use morphological transforms, a line segment detector, and a geometric matching algorithm to find an ideal page frame. We evaluate our approach on 16th-19th century printed historical documents: OCR performance increased by 4.49% after applying our page frame detection method. In addition, OCR accuracy increased by around 6.69% for contemporary documents as well.
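A minimal OpenCV sketch of the content-area idea (heavily simplified: the paper's full pipeline also uses a line segment detector and geometric matching, which are omitted here; the synthetic page is a stand-in for a real scan):

```python
import cv2
import numpy as np

# Synthetic grayscale page with some content as a stand-in for a scan.
img = np.full((400, 300), 255, np.uint8)
cv2.putText(img, "text", (60, 200), cv2.FONT_HERSHEY_SIMPLEX, 1, 0, 2)
binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# Morphological closing merges nearby content into blobs; the bounding box
# of the largest contour gives a rough page frame.
closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, np.ones((25, 25), np.uint8))
contours, _ = cv2.findContours(closed, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
print("page frame:", (x, y, w, h))  # pixels outside this box are border noise
```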
{"title":"A Robust Page Frame Detection Method for Complex Historical Document Images","authors":"M. Reza, Md. Ajraf Rakib, S. S. Bukhari, A. Dengel","doi":"10.5220/0007382405560564","DOIUrl":"https://doi.org/10.5220/0007382405560564","url":null,"abstract":"Document layout analysis is the most important part of converting scanned page images into search-able full text. An intensive amount of research is going on in the field of structured and semi-structured documents (journal articles, books, magazines, invoices) but not much in historical documents. Historical document digitization is a more challenging task than regular structured documents due to poor image quality, damaged characters, big amount of textual and non-textual noise. In the scientific community, the extraneous symbols from the neighboring page are considered as textual noise, while the appearances of black borders, speckles, ruler, different types of image etc. along the border of the documents are considered as non-textual noise. Existing historical document analysis method cannot handle all of this noise which is a very strong reason of getting undesired texts as a result from the output of Optical Character Recognition (OCR) that needs to be removed afterward with a lot of extra afford. This paper presents a new perspective especially for the historical document image cleanup by detecting the page frame of the document. The goal of this method is to find actual contents area of the document and ignore noises along the page border. We use morphological transforms, the line segment detector, and geometric matching algorithm to find an ideal page frame of the document. After the implementation of page frame method, we also evaluate our approach over 16th-19th century printed historical documents. We have noticed in the result that OCR performance for the historical documents increased by 4.49% after applying our page frame detection method. In addition, we are able to increase the OCR accuracy around 6.69% for contemporary documents too.","PeriodicalId":410036,"journal":{"name":"International Conference on Pattern Recognition Applications and Methods","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123414361","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Study of Various Text Augmentation Techniques for Relation Classification in Free Text
Pub Date: 2019-02-19 | DOI: 10.5220/0007311003600367
Praveen Kumar Badimala Giridhara, Chinmaya Mishra, Reddy Kumar Modam Venkataramana, S. S. Bukhari, A. Dengel
Data augmentation techniques have been widely used in visual recognition tasks, since it is easy to generate new data with simple, straightforward image transformations. For text data, however, it is difficult to find transformation techniques that also preserve the contextual and grammatical structure of the language. In this paper, we explore various text data augmentation techniques in text space and word embedding space, and study the effect of the augmented datasets on the performance of different deep learning models for relation classification in text.
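A minimal sketch of augmentation in word embedding space (the vectors are toy values, not pretrained embeddings; real pipelines would use word2vec or GloVe vectors): replacing a word with its nearest embedding-space neighbor tends to preserve meaning better than random edits.

```python
import numpy as np

emb = {  # hypothetical 3-d vectors standing in for pretrained embeddings
    "company": np.array([0.90, 0.10, 0.00]),
    "firm":    np.array([0.85, 0.15, 0.05]),
    "banana":  np.array([0.00, 0.20, 0.90]),
}

def nearest(word):
    """Return the most cosine-similar vocabulary word (excluding the word itself)."""
    v = emb[word]
    scores = {w: v @ u / (np.linalg.norm(v) * np.linalg.norm(u))
              for w, u in emb.items() if w != word}
    return max(scores, key=scores.get)

sentence = ["the", "company", "acquired", "a", "rival"]
augmented = [nearest(w) if w in emb else w for w in sentence]
print(augmented)  # ['the', 'firm', 'acquired', 'a', 'rival']
```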
{"title":"A Study of Various Text Augmentation Techniques for Relation Classification in Free Text","authors":"Praveen Kumar Badimala Giridhara, Chinmaya Mishra, Reddy Kumar Modam Venkataramana, S. S. Bukhari, A. Dengel","doi":"10.5220/0007311003600367","DOIUrl":"https://doi.org/10.5220/0007311003600367","url":null,"abstract":"Data augmentation techniques have been widely used in visual recognition tasks as it is easy to generate new data by simple and straight forward image transformations. However, when it comes to text data augmentations, it is difficult to find appropriate transformation techniques which also preserve the contextual and grammatical structure of language texts. In this paper, we explore various text data augmentation techniques in text space and word embedding space. We study the effect of various augmented datasets on the efficiency of different deep learning models for relation classification in text.","PeriodicalId":410036,"journal":{"name":"International Conference on Pattern Recognition Applications and Methods","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131018825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Document Image Dewarping using Deep Learning
Pub Date: 2019-02-19 | DOI: 10.5220/0007368405240531
Vijaya Kumar Bajjer Ramanna, S. S. Bukhari, A. Dengel
Distorted images are a major problem for Optical Character Recognition (OCR), so dewarping has become a principal preprocessing step. This paper presents a new document dewarping method that removes curl and geometric distortion from modern and historical documents; the proposed method is evaluated and compared to an existing computer-vision-based method. Most traditional dewarping algorithms build on text line feature extraction and segmentation, but extracting and segmenting textual content can be intricate. Hence, we propose a new technique that needs no complicated text line processing. The proposed method is based on deep learning, can be applied to all types of text documents, including documents with images and graphics, and requires no preprocessing of the warped images. The document distortion problem is treated as image-to-image translation: the method is implemented with the powerful pix2pixHD network, a Conditional Generative Adversarial Network (CGAN), trained on the UW3 dataset with distorted documents as input and clean images as targets. The generated images are cleanly dewarped, of high resolution, and suitable for OCR.
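A minimal sketch of how distorted/clean training pairs can be produced by synthetically warping a clean page (an assumption about data preparation, not the authors' pipeline; the sinusoidal warp is a crude stand-in for real page curl):

```python
import cv2
import numpy as np

# Clean target page with some rendered text.
clean = np.full((256, 256), 255, np.uint8)
cv2.putText(clean, "sample text", (20, 128), cv2.FONT_HERSHEY_SIMPLEX, 0.8, 0, 2)

# Sinusoidal vertical displacement approximates page curl.
xs, ys = np.meshgrid(np.arange(256, dtype=np.float32),
                     np.arange(256, dtype=np.float32))
warped = cv2.remap(clean, xs, ys + 8 * np.sin(xs / 30), cv2.INTER_LINEAR,
                   borderValue=255)

# (warped, clean) is one input/target pair for the image-to-image model.
print(warped.shape, clean.shape)
```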
{"title":"Document Image Dewarping using Deep Learning","authors":"Vijaya Kumar Bajjer Ramanna, S. S. Bukhari, A. Dengel","doi":"10.5220/0007368405240531","DOIUrl":"https://doi.org/10.5220/0007368405240531","url":null,"abstract":"The distorted images have been a major problem for Optical Character Recognition (OCR). In order to perform OCR on distorted images, dewarping has become a principal preprocessing step. This paper presents a new document dewarping method that removes curl and geometric distortion of modern and historical documents. Finally, the proposed method is evaluated and compared to the existing Computer Vision based method. Most of the traditional dewarping algorithms are created based on the text line feature extraction and segmentation. However, textual content extraction and segmentation can be sophisticated. Hence, the new technique is proposed, which doesn’t need any complicated methods to process the text lines. The proposed method is based on Deep Learning and it can be applied on all type of text documents and also documents with images and graphics. Moreover, there is no preprocessing required to apply this method on warped images. In the proposed system, the document distortion problem is treated as an image-to-image translation. The new method is implemented using a very powerful pix2pixhd network by utilizing Conditional Generative Adversarial Networks (CGAN). The network is trained on UW3 dataset by supplying distorted document as an input and cleaned image as the target. The generated images from the proposed method are cleanly dewarped and they are of high-resolution. Furthermore, these images can be used to perform OCR.","PeriodicalId":410036,"journal":{"name":"International Conference on Pattern Recognition Applications and Methods","volume":"211 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116150266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Understanding of Non-linear Parametric Regression and Classification Models: A Taylor Series based Approach
Pub Date: 2019-02-19 | DOI: 10.5220/0007682008740880
T. Bocklitz
Machine learning methods such as classification and regression models are specific solutions to pattern recognition problems. The patterns ’found’ by these methods can either be used in an exploratory manner, or the model converts them into discriminative values or regression predictions. In both application scenarios it is important to visualize the data basis of the model, because this unravels the patterns. For linear classifiers or linear regression models the task is straightforward, because the model is characterized by a vector which acts as a variable weighting and can be visualized directly. For non-linear models the visualization task is not yet solved, and these models therefore act as ’black box’ systems. In this contribution we present a framework that approximates a given trained parametric model (either a classification or a regression model) by a series of polynomial models derived from a Taylor expansion of the original non-linear model’s output function. These polynomial models can be visualized up to second order and subsequently interpreted. This visualization opens a way to understand the data basis of a trained non-linear model and allows estimating its degree of non-linearity. The framework thereby helps to understand non-linear models used for pattern recognition tasks and to unravel the patterns these methods use for their predictions.
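A minimal sketch of the first-order step of such an approximation (an assumed setup, not the paper's implementation): the numerically estimated gradient of a trained non-linear model's output function plays the role of the weight vector that a linear model exposes directly.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = np.sin(X[:, 0]) + X[:, 1] ** 2          # a non-linear target
model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000).fit(X, y)

def gradient(f, x0, eps=1e-4):
    """Central-difference gradient of f at x0: the first-order Taylor term."""
    g = np.zeros_like(x0)
    for i in range(len(x0)):
        e = np.zeros_like(x0)
        e[i] = eps
        g[i] = (f(x0 + e) - f(x0 - e)) / (2 * eps)
    return g

x0 = np.zeros(4)
f = lambda x: model.predict(x.reshape(1, -1))[0]
print("local variable weighting:", gradient(f, x0))
```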
{"title":"Understanding of Non-linear Parametric Regression and Classification Models: A Taylor Series based Approach","authors":"T. Bocklitz","doi":"10.5220/0007682008740880","DOIUrl":"https://doi.org/10.5220/0007682008740880","url":null,"abstract":"Machine learning methods like classification and regression models are specific solutions for pattern recognition problems. Subsequently, the patterns ’found’ by these methods can be used either in an exploration manner or the model converts the patterns into discriminative values or regression predictions. In both application scenarios it is important to visualize the data-basis of the model, because this unravels the patterns. In case of linear classifiers or linear regression models the task is straight forward, because the model is characterized by a vector which acts as variable weighting and can be visualized. For non-linear models the visualization task is not solved yet and therefore these models act as ’black box’ systems. In this contribution we present a framework, which approximates a given trained parametric model (either classification or regression model) by a series of polynomial models derived from a Taylor expansion of the original non-linear model’s output function. These polynomial models can be visualized until the second order and subsequently interpreted. This visualization opens the ways to understand the data basis of a trained non-linear model and it allows estimating the degree of its non-linearity. By doing so the framework helps to understand non-linear models used for pattern recognition tasks and unravel patterns these methods were using for their predictions.","PeriodicalId":410036,"journal":{"name":"International Conference on Pattern Recognition Applications and Methods","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117120314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FoodIE: A Rule-based Named-entity Recognition Method for Food Information Extraction
Pub Date: 2019-02-19 | DOI: 10.5220/0007686309150922
Gorjan Popovski, S. Kochev, B. Korousic-Seljak, T. Eftimov
The application of Natural Language Processing (NLP) methods and resources to biomedical textual data has received growing attention over the past years. Previously organized biomedical NLP shared tasks (for example, the BioNLP Shared Tasks) address extracting different biomedical entities (genes, phenotypes, drugs, diseases, chemical entities) and finding relations between them. However, to the best of our knowledge, few NLP methods exist for extracting entities related to food concepts. We therefore propose FoodIE, a rule-based named-entity recognition method for extracting food entities from unstructured textual data. It comprises a small number of rules, based on computational linguistics and semantic information, that describe food entities. Experimental evaluation on two different datasets showed very promising results: the proposed method achieved 97% precision, 94% recall, and a 96% F1 score.
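A minimal sketch of the rule-based style (the gazetteer and the modifier rule below are hypothetical; FoodIE's actual rules are grounded in computational linguistics and semantic tagging):

```python
import re

FOOD_HEADS = {"cheese", "bread", "apple", "salmon"}
MODIFIERS = {"goat", "whole-grain", "smoked", "fresh"}

def extract_food_entities(text):
    tokens = re.findall(r"[\w-]+", text.lower())
    entities = []
    for i, tok in enumerate(tokens):
        if tok in FOOD_HEADS:
            start = i
            # Rule: attach an immediately preceding known modifier.
            if i > 0 and tokens[i - 1] in MODIFIERS:
                start = i - 1
            entities.append(" ".join(tokens[start:i + 1]))
    return entities

print(extract_food_entities("She served goat cheese and smoked salmon."))
# ['goat cheese', 'smoked salmon']
```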
{"title":"FoodIE: A Rule-based Named-entity Recognition Method for Food Information Extraction","authors":"Gorjan Popovski, S. Kochev, B. Korousic-Seljak, T. Eftimov","doi":"10.5220/0007686309150922","DOIUrl":"https://doi.org/10.5220/0007686309150922","url":null,"abstract":"The application of Natural Language Processing (NLP) methods and resources to biomedical textual data has received growing attention over the past years. Previously organized biomedical NLP-shared tasks (such as, for example, BioNLP Shared Tasks) are related to extracting different biomedical entities (like genes, phenotypes, drugs, diseases, chemical entities) and finding relations between them. However, to the best of our knowledge there are limited NLP methods that can be used for information extraction of entities related to food concepts. For this reason, to extract food entities from unstructured textual data, we propose a rule-based named-entity recognition method for food information extraction, called FoodIE. It is comprised of a small number of rules based on computational linguistics and semantic information that describe the food entities. Experimental results from the evaluation performed using two different datasets showed that very promising results can be achieved. The proposed method achieved 97% precision, 94% recall, and 96% F1 score.","PeriodicalId":410036,"journal":{"name":"International Conference on Pattern Recognition Applications and Methods","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117161124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gaussian Model Trees for Traffic Imputation
Pub Date: 2019-02-19 | DOI: 10.5220/0007690502430254
Sebastian Buschjäger, T. Liebig, K. Morik
Traffic congestion is one of the most pressing issues for smart cities. Information on traffic flow can be used to reduce congestion by predicting vehicle counts at unmonitored locations, so that counter-measures can be applied before congestion appears. To do so, pricey sensors must be distributed sparsely across the city and at important roads in the city center to collect road and vehicle information throughout the city in real time. Machine learning models can then predict vehicle counts at the unmonitored locations. To be fault-tolerant and to extend coverage of the traffic predictions to suburbs, rural regions, or even neighboring villages, these models should not operate in a central traffic control room but rather be distributed across the city. Gaussian Processes (GPs) work well for traffic count prediction, but cannot capitalize on the vast amount of data available in an entire city. Furthermore, a Gaussian Process is a global, centralized model that requires all measurements to be available at a central computation node. Product of Experts (PoE) models have been proposed as a scalable alternative: a PoE model trains multiple independent GPs on different subsets of the data and weights individual predictions based on each expert's uncertainty. These methods work well, but they assume that experts are independent even though they may share data points, and they require exhaustive communication bandwidth between the individual experts to form the final prediction. In this paper we propose a hierarchical Product of Experts model consisting of multiple layers of small, independent, local GP experts. We view Gaussian Process induction as a regularized optimization procedure and use this view to derive an efficient algorithm that selects independent regions of the data. We then train local expert models on these regions, so that each expert is responsible for a given region. The resulting algorithm scales well to large amounts of data and outperforms flat PoE models in terms of communication cost, model size, and predictive performance. Finally, we discuss how to deploy these local expert models onto small devices.
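A minimal sketch of the flat PoE baseline that the paper improves upon (an assumed setup with scikit-learn; the hierarchical variant and the region-selection algorithm are not shown): independent GP experts trained on data subsets, combined by precision weighting.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=300)  # toy "vehicle count" signal

# Each expert sees only a random subset of the data.
experts = [GaussianProcessRegressor(alpha=0.01).fit(X[s], y[s])
           for s in np.array_split(rng.permutation(300), 3)]

X_test = np.linspace(0, 10, 50).reshape(-1, 1)
mus, sigmas = zip(*(e.predict(X_test, return_std=True) for e in experts))

# Precision weighting: confident experts (small sigma) dominate the mean.
prec = np.array([1.0 / (s ** 2 + 1e-9) for s in sigmas])
mu_poe = (prec * np.array(mus)).sum(axis=0) / prec.sum(axis=0)
print(mu_poe[:5])
```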
{"title":"Gaussian Model Trees for Traffic Imputation","authors":"Sebastian Buschjäger, T. Liebig, K. Morik","doi":"10.5220/0007690502430254","DOIUrl":"https://doi.org/10.5220/0007690502430254","url":null,"abstract":"Traffic congestion is one of the most pressing issues for smart cities. Information on traffic flow can be used to reduce congestion by predicting vehicle counts at unmonitored locations so that counter-measures can be applied before congestion appears. To do so pricy sensors must be distributed sparsely in the city and at important roads in the city center to collect road and vehicle information throughout the city in real-time. Then, Machine Learning models can be applied to predict vehicle counts at unmonitored locations. To be fault-tolerant and increase coverage of the traffic predictions to the suburbs, rural regions, or even neighboring villages, these Machine Learning models should not operate at a central traffic control room but rather be distributed across the city. Gaussian Processes (GP) work well in the context of traffic count prediction, but cannot capitalize on the vast amount of data available in an entire city. Furthermore, Gaussian Processes are a global and centralized model, which requires all measurements to be available at a central computation node. Product of Expert (PoE) models have been proposed as a scalable alternative to Gaussian Processes. A PoE model trains multiple, independent GPs on different subsets of the data and weight individual predictions based on each experts uncertainty. These methods work well, but they assume that experts are independent even though they may share data points. Furthermore, PoE models require exhaustive communication bandwidth between the individual experts to form the final prediction. In this paper we propose a hierarchical Product of Expert model, which consist of multiple layers of small, independent and local GP experts. We view Gaussian Process induction as regularized optimization procedure and utilize this view to derive an efficient algorithm which selects independent regions of the data. Then, we train local expert models on these regions, so that each expert is responsible for a given region. The resulting algorithm scales well for large amounts of data and outperforms flat PoE models in terms of communication cost, model size and predictive performance. Last, we discuss how to deploy these local expert models onto small devices.","PeriodicalId":410036,"journal":{"name":"International Conference on Pattern Recognition Applications and Methods","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116281864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pedestrian Similarity Extraction to Improve People Counting Accuracy
Pub Date: 2019-02-19 | DOI: 10.5220/0007381605480555
Xu Yang, J. Gaspar, W. Ke, C. Lam, Yanwei Zheng, W. Lou, Yapeng Wang
Current state-of-the-art single-shot object detection pipelines, composed of an object detector such as YOLO, generate multiple detections for each object and require a post-processing Non-Maxima Suppression (NMS) algorithm to remove the redundant detections. However, this pipeline struggles to achieve high accuracy, particularly in object counting applications, due to a trade-off between precision and recall: a higher NMS threshold suppresses fewer detections, yielding a higher recall rate but lower precision and accuracy. In this paper, we explore a new pedestrian detection pipeline that is more flexible, adapts to different scenarios, and improves precision and accuracy. A higher NMS threshold is used to retain all true detections and achieve a high recall rate across scenarios, and a Pedestrian Similarity Extraction (PSE) algorithm removes the redundant detections, improving counting accuracy. The PSE algorithm significantly reduces the volatility of detection accuracy and its dependency on NMS thresholds, improving mean detection accuracy for different input datasets.
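For reference, a minimal sketch of the standard greedy NMS that this pipeline tunes (the boxes and scores are toy values): raising iou_thresh suppresses fewer boxes, trading precision for recall as described above.

```python
import numpy as np

def nms(boxes, scores, iou_thresh):
    """boxes: (N, 4) as [x1, y1, x2, y2]; returns indices of kept boxes."""
    order, keep = np.argsort(scores)[::-1], []
    while order.size:
        i = order[0]
        keep.append(i)
        # IoU of the top-scoring box against the remaining candidates.
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = ((boxes[order[1:], 2] - boxes[order[1:], 0]) *
                  (boxes[order[1:], 3] - boxes[order[1:], 1]))
        iou = inter / (area_i + area_r - inter)
        order = order[1:][iou < iou_thresh]  # drop overlapping detections
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], float)
print(nms(boxes, np.array([0.9, 0.8, 0.7]), iou_thresh=0.5))  # [0, 2]
```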
{"title":"Pedestrian Similarity Extraction to Improve People Counting Accuracy","authors":"Xu Yang, J. Gaspar, W. Ke, C. Lam, Yanwei Zheng, W. Lou, Yapeng Wang","doi":"10.5220/0007381605480555","DOIUrl":"https://doi.org/10.5220/0007381605480555","url":null,"abstract":"Current state-of-the-art single shot object detection pipelines, composed by an object detector such as Yolo, generate multiple detections for each object, requiring a post-processing Non-Maxima Suppression (NMS) algorithm to remove redundant detections. However, this pipeline struggles to achieve high accuracy, particularly in object counting applications, due to a trade-off between precision and recall rates. A higher NMS threshold results in fewer detections suppressed and, consequently, in a higher recall rate, as well as lower precision and accuracy. In this paper, we have explored a new pedestrian detection pipeline which is more flexible, able to adapt to different scenarios and with improved precision and accuracy. A higher NMS threshold is used to retain all true detections and achieve a high recall rate for different scenarios, and a Pedestrian Similarity Extraction (PSE) algorithm is used to remove redundant detentions, consequently improving counting accuracy. The PSE algorithm significantly reduces the detection accuracy volatility and its dependency on NMS thresholds, improving the mean detection accuracy for different input datasets.","PeriodicalId":410036,"journal":{"name":"International Conference on Pattern Recognition Applications and Methods","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115447434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}