{"title":"Learning, Understanding and Interaction in Videos","authors":"Manmohan Chandraker","doi":"10.1145/3552463.3555837","DOIUrl":null,"url":null,"abstract":"Advances in mobile phone camera technologies and internet connectivity have made videos one of the most intuitive ways to communicate and share experiences. Millions of cameras deployed in our homes, offices and public spaces record videos for purposes ranging across safety, assistance, entertainment and many others. This talk describes some of our recent progress in learning, understanding and interaction with such digital media. It will introduce methods in unsupervised and self-supervised representation learning that allow video solutions to be efficiently deployed with minimal data curation. It will discuss how physical priors or human knowledge are leveraged to understand insights in videos ranging from three-dimensional scene properties to language-based descriptions. It will also illustrate how these insights allow us to augment or interact with digital media with unprecedented photorealism and ease.","PeriodicalId":293267,"journal":{"name":"Proceedings of the 1st Workshop on User-centric Narrative Summarization of Long Videos","volume":"146 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 1st Workshop on User-centric Narrative Summarization of Long Videos","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3552463.3555837","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Advances in mobile phone camera technologies and internet connectivity have made videos one of the most intuitive ways to communicate and share experiences. Millions of cameras deployed in our homes, offices and public spaces record videos for purposes ranging from safety and assistance to entertainment. This talk describes some of our recent progress in learning from, understanding and interacting with such digital media. It will introduce methods in unsupervised and self-supervised representation learning that allow video solutions to be deployed efficiently with minimal data curation. It will discuss how physical priors and human knowledge are leveraged to extract insights from videos, ranging from three-dimensional scene properties to language-based descriptions. It will also illustrate how these insights allow us to augment and interact with digital media with unprecedented photorealism and ease.