
Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing: Latest Publications

A system for assisted transcription and annotation of ancient documents
María José Castro Bleda, J. M. Vilar, D. Llorens, A. Marzal, F. Prat, Francisco Zamora-Martínez
Computer-assisted transcription tools can speed up the process of reading and transcribing texts. At the same time, new annotation tools open new ways of accessing the text in its graphical form. STATE, an assisted transcription system for ancient documents, offers a multimodal interaction environment to assist humans in transcribing documents: the user can type, write on the screen, or utter a word. When one of these actions is used to correct an erroneous word, the system uses this new information to look for other mistakes. The system is modular, comprising project creation from a set of document images, an automatic transcription system, and user interaction with the transcriptions to correct them as needed. This division of labor allows great flexibility in organizing the work of a team of transcribers. Our immediate goals are to improve the recognition system and to enrich the obtained transcriptions with scholarly descriptions.
{"title":"A system for assisted transcription and annotation of ancient documents","authors":"María José Castro Bleda, J. M. Vilar, D. Llorens, A. Marzal, F. Prat, Francisco Zamora-Martínez","doi":"10.1145/3095713.3095752","DOIUrl":"https://doi.org/10.1145/3095713.3095752","url":null,"abstract":"Computer assisted transcription tools can speed up the process of reading and transcribing texts. At the same time, new annotation tools open new ways of accessing the text in its graphical form. STATE, an assisted transcription system for ancient documents, offers a multimodal interaction environment to assist humans in transcribing documents: the user can type, write on the screen or utter a word. When one of these actions is used to correct an erroneous word, the system uses this new information to look for other mistakes. The system is modular: creation of projects from a set of images of documents, an automatic transcription system, and user interaction with the transcriptions to easily correct them as needed. This division of labor allows great flexibility for organizing the work in a team of transcribers. Our immediate goals are to improve the recognition system and to enrich the obtained transcriptions with scholarly descriptions.","PeriodicalId":310224,"journal":{"name":"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing","volume":"130 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122341074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Automatic Cartoon Colorization Based on Convolutional Neural Network
D. Varga, C. Szabó, T. Szirányi
This paper deals with automatic cartoon colorization. This is a hard problem, since it is ill-posed and usually requires user intervention to achieve high quality. Motivated by recent successes in natural image colorization based on deep learning techniques, we investigate the colorization problem in the cartoon domain using a Convolutional Neural Network. To the best of our knowledge, no existing papers or research studies address this problem using deep learning techniques. Here we investigate a deep Convolutional Neural Network based automatic color-filling method for cartoons.
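The abstract does not specify the network architecture, so the following is only a minimal sketch of the kind of convolutional model such colorization work typically uses, assuming PyTorch and a Lab color decomposition; all layer sizes and the training target are illustrative, not the authors' model.

```python
# Minimal colorization CNN sketch (illustrative; not the authors' architecture):
# predict the a/b chrominance channels of a Lab image from its L channel.
import torch
import torch.nn as nn

class ColorizeNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),   # grayscale (L) input
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1), nn.Tanh(),   # a/b output, scaled to [-1, 1]
        )

    def forward(self, luminance):
        return self.net(luminance)

model = ColorizeNet()
gray = torch.rand(1, 1, 128, 128)                # dummy cartoon frame, L channel only
ab_pred = model(gray)                            # predicted chrominance, (1, 2, 128, 128)
target = torch.zeros_like(ab_pred)               # stand-in for ground-truth a/b channels
loss = nn.functional.mse_loss(ab_pred, target)   # simple regression objective
```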
{"title":"Automatic Cartoon Colorization Based on Convolutional Neural Network","authors":"D. Varga, C. Szabó, T. Szirányi","doi":"10.1145/3095713.3095742","DOIUrl":"https://doi.org/10.1145/3095713.3095742","url":null,"abstract":"This paper deals with automatic cartoon colorization. This is a hard issue, since it is an ill-posed problem that usually requires user intervention to achieve high quality. Motivated by the recent successes in natural image colorization based on deep learning techniques, we investigate the colorization problem at the cartoon domain using Convolutional Neural Network. To our best knowledge, no existing papers or research studies address this problem using deep learning techniques. Here we investigate a deep Convolutional Neural Network based automatic color filling method for cartoons.","PeriodicalId":310224,"journal":{"name":"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125334383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 11
Prediction of User Demographics from Music Listening Habits
Thomas Krismayer, M. Schedl, Peter Knees, Rick Rabiser
Online activities such as social networking, shopping, and consuming multimedia create digital traces often used to improve user experience and increase revenue, e.g., through better-fitting recommendations and targeted marketing. We investigate to what extent the music listening habits of users of the social music platform Last.fm can be used to predict their age, gender, and nationality. We propose a TF-IDF-like feature modeling approach for artist listening information and artist tags, combined with additionally extracted features. We show that we can substantially outperform a baseline majority-voting approach and can compete with existing approaches. Further, regarding prediction accuracy vs. available listening data, we show that even a single listening event per user is enough to outperform the baseline in all prediction tasks. We conclude that personal information can be derived from music listening information, which indeed can help better tailor recommendations.
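The exact feature pipeline is not given in the abstract; as a rough illustration of the TF-IDF-style weighting it describes, the toy sketch below treats users as documents and artist play counts as term counts, using scikit-learn. The matrix, labels, and classifier choice are all assumptions, not the paper's setup.

```python
# TF-IDF-style weighting of artist play counts (toy sketch, not the paper's features):
# rows are users ("documents"), columns are artists ("terms"), values are play counts.
import numpy as np
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.linear_model import LogisticRegression

plays = np.array([[120,  0,  3],    # user 0: mostly artist A
                  [  0, 80, 40],    # user 1: artists B and C
                  [  5, 60,  0]])   # user 2: mostly artist B
features = TfidfTransformer().fit_transform(plays)  # down-weights globally popular artists

gender = np.array([0, 1, 1])                        # hypothetical demographic labels
clf = LogisticRegression().fit(features, gender)    # predict demographics from listening
print(clf.predict(features[:1]))
```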
{"title":"Prediction of User Demographics from Music Listening Habits","authors":"Thomas Krismayer, M. Schedl, Peter Knees, Rick Rabiser","doi":"10.1145/3095713.3095722","DOIUrl":"https://doi.org/10.1145/3095713.3095722","url":null,"abstract":"Online activities such as social networking, shopping, and consuming multi-media create digital traces often used to improve user experience and increase revenue, e.g., through better-fitting recommendations and targeted marketing. We investigate to which extent the music listening habits of users of the social music platform Last.fm can be used to predict their age, gender, and nationality. We propose a TF-IDF-like feature modeling approach for artist listening information and artist tags combined with additionally extracted features. We show that we can substantially outperform a baseline majority voting approach and can compete with existing approaches. Further, regarding prediction accuracy vs. available listening data we show that even one single listening event per user is enough to outperform the baseline in all prediction tasks. We conclude that personal information can be derived from music listening information, which indeed can help better tailoring recommendations.","PeriodicalId":310224,"journal":{"name":"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing","volume":"06 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116138954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 9
Improving Hierarchical Image Classification with Merged CNN Architectures
Anuvabh Dutt, D. Pellerin, G. Quénot
We consider the problem of image classification using deep convolutional networks, with respect to hierarchical relationships among classes. We investigate whether the semantic hierarchy is captured by CNN models. For this, we analyze the confidence of the model for a category and its sub-categories. Based on the results, we propose an algorithm for improving model performance at test time by adapting the classifier to each test sample, without any re-training. Secondly, we propose a strategy for merging models to jointly learn two levels of the hierarchy. This reduces the total training time compared to training the models separately, and also gives improved classification performance.
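The abstract does not spell out the test-time adaptation rule; one plausible reading, sketched below with toy probabilities, restricts the fine-grained prediction to the sub-categories of the most confident parent category, with no retraining. The class grouping and scores are invented for illustration.

```python
# Hedged sketch of test-time hierarchical adaptation: keep only fine-grained
# classes that are children of the most confident parent class, then renormalize.
import numpy as np

children = {"animal": [0, 1], "vehicle": [2, 3]}   # fine-class indices per parent
parent_probs = {"animal": 0.85, "vehicle": 0.15}   # softmax over parent classes
fine_probs = np.array([0.30, 0.25, 0.40, 0.05])    # softmax over fine classes

best_parent = max(parent_probs, key=parent_probs.get)
idx = children[best_parent]
masked = np.zeros_like(fine_probs)
masked[idx] = fine_probs[idx] / fine_probs[idx].sum()  # renormalize within the parent
print(best_parent, masked.argmax())   # fine prediction now consistent with the hierarchy
```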
{"title":"Improving Hierarchical Image Classification with Merged CNN Architectures","authors":"Anuvabh Dutt, D. Pellerin, G. Quénot","doi":"10.1145/3095713.3095745","DOIUrl":"https://doi.org/10.1145/3095713.3095745","url":null,"abstract":"We consider the problem of image classification using deep convolutional networks, with respect to hierarchical relationships among classes. We investigate if the semantic hierarchy is captured by CNN models or not. For this we analyze the confidence of the model for a category and its sub-categories. Based on the results, we propose an algorithm for improving the model performance at test time by adapting the classifier to each test sample and without any re-training. Secondly, we propose a strategy for merging models for jointly learning two levels of hierarchy. This reduces the total training time as compared to training models separately, and also gives improved classification performance.","PeriodicalId":310224,"journal":{"name":"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126809995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
CoMo: A Compact Composite Moment-Based Descriptor for Image Retrieval
S. A. Vassou, N. Anagnostopoulos, A. Amanatiadis, Klitos Christodoulou, S. Chatzichristofis
Low-level features play a vital role in image retrieval. Image moments can effectively represent the global information of image content while being invariant under translation, rotation, and scaling. This paper briefly presents a moment-based, composite, and compact low-level descriptor for image retrieval. To test the proposed feature, the authors employ the Bag-of-Visual-Words representation to perform experiments on two well-known benchmark image databases. The robust and highly competitive retrieval performance reported on all tested collections verifies the promising potential of the proposed descriptor.
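CoMo's exact construction is not described in the abstract; the sketch below instead computes the classical Hu moments with OpenCV, a standard example of a moment-based descriptor with the translation, rotation, and scaling invariance the abstract refers to. The input shape is synthetic.

```python
# Classical Hu moments as a translation/rotation/scale-invariant global descriptor
# (an illustration of moment-based features, not the CoMo descriptor itself).
import cv2
import numpy as np

img = np.zeros((128, 128), dtype=np.uint8)           # synthetic binary shape
cv2.circle(img, (64, 64), 30, 255, -1)               # stands in for a query image

m = cv2.moments(img)
hu = cv2.HuMoments(m).flatten()                      # 7 invariant moments
hu = -np.sign(hu) * np.log10(np.abs(hu) + 1e-12)     # log scale for comparable magnitudes

def descriptor_distance(a, b):
    return np.linalg.norm(a - b)   # retrieval: rank database images by L2 distance

print(hu)
```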
{"title":"CoMo: A Compact Composite Moment-Based Descriptor for Image Retrieval","authors":"S. A. Vassou, N. Anagnostopoulos, A. Amanatiadis, Klitos Christodoulou, S. Chatzichristofis","doi":"10.1145/3095713.3095744","DOIUrl":"https://doi.org/10.1145/3095713.3095744","url":null,"abstract":"Low level features play a vital role in image retrieval. Image moments can effectively represent global information of image content while being invariant under translation, rotation, and scaling. This paper briefly presents a moment based composite and compact low-level descriptor for image retrieval. In order to test the proposed feature, the authors employ the Bag-of-Visual-Words representation to perform experiments on two well-known benchmarking image databases. The robust and highly competitive retrieval performances, reported in all tested diverse collections, verify the promising potential that the proposed descriptor introduces.","PeriodicalId":310224,"journal":{"name":"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing","volume":"166 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121308870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 15
NeuralStory: an Interactive Multimedia System for Video Indexing and Re-use
L. Baraldi, C. Grana, R. Cucchiara
In recent years, video has been swamping the Internet: websites, social networks, and business multimedia systems are adopting video as the most important form of communication and information. Videos are normally accessed as a whole and are not indexed by their visual content. Thus, they are often uploaded as short, manually cut clips with user-provided annotations, keywords, and tags for retrieval. In this paper, we propose a prototype multimedia system which addresses these two limitations: it overcomes the need for human intervention in the video setting, thanks to fully deep-learning-based solutions, and decomposes the storytelling structure of the video into coherent parts. These parts can be shots, key-frames, scenes, and semantically related stories, and they are exploited to provide an automatic annotation of the visual content, so that parts of a video can be easily retrieved. This also allows a principled re-use of the video itself: users of the platform can produce new storytelling by means of multi-modal presentations, add text and other media, and propose a different visual organization of the content. We present the overall solution and some experiments on the re-use capability of our platform in edutainment, conducted through an extensive user evaluation.
{"title":"NeuralStory: an Interactive Multimedia System for Video Indexing and Re-use","authors":"L. Baraldi, C. Grana, R. Cucchiara","doi":"10.1145/3095713.3095735","DOIUrl":"https://doi.org/10.1145/3095713.3095735","url":null,"abstract":"In the last years video has been swamping the Internet: websites, social networks, and business multimedia systems are adopting video as the most important form of communication and information. Video are normally accessed as a whole and are not indexed in the visual content. Thus, they are often uploaded as short, manually cut clips with user-provided annotations, keywords and tags for retrieval. In this paper, we propose a prototype multimedia system which addresses these two limitations: it overcomes the need of human intervention in the video setting, thanks to fully deep learning-based solutions, and decomposes the storytelling structure of the video into coherent parts. These parts can be shots, key-frames, scenes and semantically related stories, and are exploited to provide an automatic annotation of the visual content, so that parts of video can be easily retrieved. This also allows a principled re-use of the video itself: users of the platform can indeed produce new storytelling by means of multi-modal presentations, add text and other media, and propose a different visual organization of the content. We present the overall solution, and some experiments on the re-use capability of our platform in edutainment by conducting an extensive user valuation","PeriodicalId":310224,"journal":{"name":"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122163347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 7
Separating the Wheat from the Chaff: Events Detection in Twitter Data
Andrea Ferracani, Daniele Pezzatini, Lea Landucci, Giuseppe Becchi, A. Bimbo
In this paper we present a system for the detection and validation of macro- and micro-events in cities (e.g. concerts, business meetings, car accidents) through the analysis of geolocalized messages from Twitter. A simple but effective method for unknown-event detection is proposed, designed to alleviate the computational issues of traditional approaches. The method is exploited by a web interface that, in addition to visualizing the results of the automatic computation, exposes interactive tools to inspect and validate the data and refine the processing pipeline. Researchers can exploit the web application for the rapid creation of macro- and micro-event datasets of geolocalized messages, currently unavailable and needed to improve supervised and unsupervised event classification on Twitter. The system has been evaluated in terms of precision.
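The abstract leaves the detection method unspecified; a common lightweight stand-in, sketched below, clusters geotagged messages in space and time with DBSCAN and treats each dense cluster as a candidate event. The coordinates, scaling, and thresholds are illustrative assumptions, not the paper's method.

```python
# Hedged sketch: candidate-event detection by density clustering of geotagged
# tweets in (lat, lon, hours). DBSCAN stands in for the paper's unspecified method.
import numpy as np
from sklearn.cluster import DBSCAN

tweets = np.array([
    [43.77, 11.25, 1.0], [43.77, 11.25, 1.2], [43.78, 11.26, 1.1],  # dense burst
    [43.60, 11.00, 5.0],                                            # isolated message
])
scaled = tweets / np.array([0.01, 0.01, 1.0])   # roughly 1 km and 1 h per unit
labels = DBSCAN(eps=2.0, min_samples=3).fit_predict(scaled)
print(labels)   # cluster ids; -1 marks noise, each cluster is a candidate event
```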
{"title":"Separating the Wheat from the Chaff: Events Detection in Twitter Data","authors":"Andrea Ferracani, Daniele Pezzatini, Lea Landucci, Giuseppe Becchi, A. Bimbo","doi":"10.1145/3095713.3095728","DOIUrl":"https://doi.org/10.1145/3095713.3095728","url":null,"abstract":"In this paper we present a system for the detection and validation of macro and micro-events in cities (e.g. concerts, business meetings, car accidents) through the analysis of geolocalized messages from Twitter. A simple but effective method is proposed for unknown event detection designed to alleviate computational issues in traditional approaches. The method is exploited by a web interface that in addition to visualizing the results of the automatic computation exposes interactive tools to inspect, validate the data and refine the processing pipeline. Researchers can exploit the web application for the rapid creation of macro and micro-events datasets of geolocalized messages currently unavailable and needed to improve supervised and unsupervised events classification on Twitter. The system has been evaluated in terms of precision.","PeriodicalId":310224,"journal":{"name":"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131015453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
A free Web API for single and multi-document summarization
Massimo Mauro, Sergio Benini, N. Adami, A. Signoroni, R. Leonardi, Luca Canini
In this work we present a free Web API for single- and multi-document summarization. The summarization algorithm follows an extractive approach, selecting the most relevant sentences from a single document or a document set. It integrates different text analysis techniques in a novel pipeline - ranging from keyword and entity extraction to topic modelling and sentence clustering - and gives results competitive with the state of the art. The application, written in Python, supports both plain texts and Web URLs as input. The API is publicly accessible for free using the specific conference token described on the reference page. The browser-based demo version, for summarization of single documents only, is publicly accessible at http://yonderlabs.com/demo.
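The API's internals are not public; the sketch below is a minimal extractive baseline in the same spirit, ranking sentences by TF-IDF similarity to the document centroid and keeping the top-k. It is not the service's actual pipeline.

```python
# Minimal extractive summarizer sketch: score each sentence by cosine similarity
# to the document-level TF-IDF centroid, then keep the top-k in original order.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def summarize(sentences, k=2):
    tfidf = TfidfVectorizer().fit_transform(sentences)
    centroid = np.asarray(tfidf.mean(axis=0))          # document centroid in TF-IDF space
    scores = cosine_similarity(tfidf, centroid).ravel()
    keep = sorted(np.argsort(scores)[-k:])             # top-k, preserving document order
    return [sentences[i] for i in keep]

doc = ["Extractive methods select sentences verbatim.",
       "They score each sentence for relevance.",
       "The cat sat on the mat.",
       "Top-scoring sentences form the summary."]
print(summarize(doc))
```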
{"title":"A free Web API for single and multi-document summarization","authors":"Massimo Mauro, Sergio Benini, N. Adami, A. Signoroni, R. Leonardi, Luca Canini","doi":"10.1145/3095713.3095738","DOIUrl":"https://doi.org/10.1145/3095713.3095738","url":null,"abstract":"In this work we present a free Web API for single and multi-text summarization. The summarization algorithm follows an extractive approach, thus selecting the most relevant sentences from a single document or a document set. It integrates in a novel pipeline different text analysis techniques - ranging from keyword and entity extraction, to topic modelling and sentence clustering - and gives SoA competitive results. The application, written in Python, supports as input both plain texts and Web URLs. The API is publicly accessible for free using the specific conference token1 as described in the reference page2. The browser-based demo version, for summarization of single documents only, is publicly accessible at http://yonderlabs.com/demo.","PeriodicalId":310224,"journal":{"name":"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115066742","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Bangladeshi Number Plate Detection: Cascade Learning vs. Deep Learning
M. Pias, Aunnoy K. Mutasim, M. Amin
This work investigated two different machine learning techniques, Cascade Learning and Deep Learning, to find out which algorithm performs better at detecting the number plates of vehicles registered in Bangladesh. To do this, we created a dataset of about 1000 images collected from a security camera at Independent University, Bangladesh. Each image in the dataset was then labelled manually by selecting the Region of Interest (ROI). In the Cascade Learning approach, a sliding-window technique was used to detect objects, and a cascade classifier was employed to determine whether a window contained the object of interest. In the Deep Learning approach, the CIFAR-10 dataset was used to pre-train a 15-layer Convolutional Neural Network (CNN). Using this pre-trained CNN, a Regions with CNN (R-CNN) detector was then trained on our dataset. We found that the Deep Learning approach (maximum accuracy 99.60% using 566 training images) outperforms the detector constructed using Cascade classifiers (maximum accuracy 59.52% using 566 positive and 1022 negative training images) on 252 test images.
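As a sketch of the cascade side of this comparison, the snippet below runs a trained cascade at multiple scales with OpenCV's CascadeClassifier; the model and image file names are hypothetical stand-ins for a cascade trained on the plate dataset, and the detection parameters are illustrative.

```python
# Cascade-detector sketch with OpenCV: a trained cascade XML (hypothetical file
# name) is swept over the image at multiple scales, as in the sliding-window setup.
import cv2

cascade = cv2.CascadeClassifier("plate_cascade.xml")   # hypothetical trained model
frame = cv2.imread("camera_frame.jpg")                 # hypothetical camera still
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

plates = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5,
                                  minSize=(60, 20))    # plates are wide and short
for (x, y, w, h) in plates:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detections.jpg", frame)
```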
{"title":"Bangladeshi Number Plate Detection: Cascade Learning vs. Deep Learning","authors":"M. Pias, Aunnoy K. Mutasim, M. Amin","doi":"10.1145/3095713.3095727","DOIUrl":"https://doi.org/10.1145/3095713.3095727","url":null,"abstract":"This work investigated two different machine learning techniques: Cascade Learning and Deep Learning, to find out which algorithm performs better to detect the number plate of vehicles registered in Bangladesh. To do this, we created a dataset of about 1000 images collected from a security camera of Independent University, Bangladesh. Each image in the dataset were then labelled manually by selecting the Region of Interest (ROI). In the Cascade Learning approach, a sliding window technique was used to detect objects. Then a cascade classifier was employed to determine if the window contained object of interest or not. In the Deep Learning approach, CIFAR-10 dataset was used to pre-train a 15-layer Convolutional Neural Network (CNN). Using this pretrained CNN, a Regions with CNN (R-CNN) was then trained using our dataset. We found that the Deep Learning approach (maximum accuracy 99.60% using 566 training images) outperforms the detector constructed using Cascade classifiers (maximum accuracy 59.52% using 566 positive and 1022 negative training images) for 252 test images.","PeriodicalId":310224,"journal":{"name":"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127337448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
Visualizing weakly-Annotated Multi-label Mayan Inscriptions with Supervised t-SNE
E. Román-Rangel, S. Marchand-Maillet
We present a supervised dimensionality reduction technique suitable for visualizing multi-label images in a 2-D space. This method extends the well-known t-distributed stochastic neighbor embedding (t-SNE) algorithm to the case of multi-label instances, where the concept of partial relevance plays an important role. Furthermore, it is directly applicable to weakly annotated data. We apply our approach to generate 2-D representations of Mayan glyph-blocks, which are groups of individual glyph-signs expressing full sentences. The resulting representations are used to place visual instances in a 2-D space with the purpose of providing a browsable catalog for further epigraphic studies, where nearby instances are similar in both semantic and visual terms. We evaluate the performance of our approach quantitatively through classification and retrieval experiments. Our results show that the approach achieves high performance in both tasks.
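The paper's exact supervised modification of t-SNE is not given in the abstract; one hedged reading, sketched below, shrinks pairwise visual distances between instances whose label sets overlap (the partial-relevance idea) and embeds the result with scikit-learn's t-SNE on a precomputed metric. All data here are synthetic and the blending weight is an assumption.

```python
# Hedged sketch of label-aware t-SNE: reduce pairwise distances between items
# whose multi-label sets overlap, then embed with metric='precomputed'.
import numpy as np
from sklearn.manifold import TSNE
from sklearn.metrics import pairwise_distances

rng = np.random.default_rng(0)
feats = rng.random((50, 64))                                # toy visual descriptors
labels = [set(rng.choice(5, 2)) for _ in range(50)]         # toy multi-label sets

d_vis = pairwise_distances(feats)
overlap = np.array([[len(a & b) / len(a | b) for b in labels] for a in labels])
d_sup = d_vis * (1.0 - 0.5 * overlap)        # partial relevance shrinks distances
np.fill_diagonal(d_sup, 0.0)

emb = TSNE(metric="precomputed", init="random", perplexity=10).fit_transform(d_sup)
print(emb.shape)                             # (50, 2) coordinates for a browsable map
```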
{"title":"Visualizing weakly-Annotated Multi-label Mayan Inscriptions with Supervised t-SNE","authors":"E. Román-Rangel, S. Marchand-Maillet","doi":"10.1145/3095713.3095720","DOIUrl":"https://doi.org/10.1145/3095713.3095720","url":null,"abstract":"We present a supervised dimensionality reduction technique suitable for visualizing multi-label images on a 2-D space. This method extends the use of the well-known t-distributed stochastic embedding (t-SNE) algorithm to the case of multi-labels instances, where the concept of partial relevance plays an important role. Furthermore, it is applicable straightaway for weakly annotated data. We apply our approach to generate 2-D representations of Mayan glyph-blocks, which are groups of individual glyph-signs expressing full sentences. The resulting representations are used to place visual instances in a 2-D space with the purpose of providing a browsable catalog for further epigraphic studies, where nearby instances are similar both in semantic and visual terms. We evaluate the performance of our approach quantitatively by performing classification and retrieval experiments. Our results show that this approach obtains high performance in both of these tasks.","PeriodicalId":310224,"journal":{"name":"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115521032","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1