Tag Propagation Approaches within Speaking Face Graphs for Multimodal Person Discovery

G. B. Fonseca, I. Freire, Zenilton K. G. Patrocínio, S. Guimarães, G. Sargent, R. Sicre, G. Gravier
Published in: Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing
Publication date: 2017-06-19
DOI: 10.1145/3095713.3095729 (https://doi.org/10.1145/3095713.3095729)
Citations: 4

Abstract

Indexing broadcast TV archives is an ongoing problem in multimedia research. As these archives grow continuously, meaningful features, such as the identity of speaking faces, are needed to describe and connect their elements efficiently. In this context, this paper focuses on two approaches for unsupervised person discovery. An OCR-based method provides the initial tags for speaking faces, and these tags are then propagated through a graph model built on audiovisual relations between speaking faces. Two propagation methods are proposed, one based on random walks and the other on a hierarchical approach. To better evaluate their performance, these methods are compared with two graph clustering baselines. We also study the impact of different modality fusions on graph-based tag propagation. A quantitative analysis shows that the graph propagation techniques always outperform the baselines. Among all compared strategies, hierarchical propagation with late fusion and random walk with score fusion obtain the highest mean average precision (MAP) values. Finally, although these two methods produce highly equivalent results according to the Kappa coefficient, the random walk method performs better according to a paired t-test, while hierarchical propagation requires less than a quarter of the computing time of random walk propagation.
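To make the idea of random-walk tag propagation concrete, here is a minimal Python sketch. It is not the authors' method: the abstract does not give the algorithm, so the sketch assumes a generic random walk with restart on a weighted similarity graph. The names `row_normalize` and `propagate_tags`, the `restart` and `iterations` parameters, and the toy weights and tags are all hypothetical.

```python
def row_normalize(weights):
    """Turn a non-negative similarity matrix into a row-stochastic matrix."""
    transition = []
    for row in weights:
        total = sum(row)
        transition.append([w / total if total > 0 else 0.0 for w in row])
    return transition


def propagate_tags(weights, seeds, restart=0.15, iterations=100):
    """Assign a tag to every node with a random walk with restart (assumed scheme).

    weights: square list of lists of non-negative edge similarities.
    seeds:   dict mapping node index -> initial tag (e.g. a name read by OCR).
    Returns one (tag, score) pair per node; seed nodes keep their tag.
    """
    n = len(weights)
    transition = row_normalize(weights)
    tags = sorted(set(seeds.values()))
    scores = {}
    for tag in tags:
        seed_nodes = [i for i, t in seeds.items() if t == tag]
        start = [1.0 / len(seed_nodes) if i in seed_nodes else 0.0
                 for i in range(n)]
        p = start[:]
        for _ in range(iterations):
            # The walker follows an edge, or jumps back to a seed of this tag.
            moved = [sum(p[i] * transition[i][j] for i in range(n))
                     for j in range(n)]
            p = [(1 - restart) * moved[j] + restart * start[j]
                 for j in range(n)]
        scores[tag] = p
    result = []
    for node in range(n):
        if node in seeds:
            result.append((seeds[node], 1.0))
        else:
            best = max(tags, key=lambda t: scores[t][node])
            result.append((best, scores[best][node]))
    return result


# Toy graph: nodes 0-2 and nodes 3-4 form two groups joined by one weak edge.
W = [
    [0.0, 0.9, 0.8, 0.0, 0.0],
    [0.9, 0.0, 0.9, 0.0, 0.0],
    [0.8, 0.9, 0.0, 0.1, 0.0],
    [0.0, 0.0, 0.1, 0.0, 0.9],
    [0.0, 0.0, 0.0, 0.9, 0.0],
]
seeds = {0: "alice", 4: "bob"}  # hypothetical OCR tags on two speaking faces
labels = propagate_tags(W, seeds)
print([tag for tag, _ in labels])
```

In this toy graph, the untagged nodes inherit the tag of the seed in their own densely connected group, because the walker rarely crosses the weak edge between the groups.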