Youjun Luo, Andy Sun, Feng Wang, R. Shea, Jiangchuan Liu
GLOBECOM 2020 - 2020 IEEE Global Communications Conference, pp. 1-6, published 2020-12-01. DOI: 10.1109/GLOBECOM42002.2020.9322361 (https://doi.org/10.1109/GLOBECOM42002.2020.9322361)
RealSync: A Synchronous Multimodality Media Stream Analytic Framework for Real-Time Communications Applications
While advances in computing algorithms and hardware have enabled real-time stream analytics on video, the information-rich audio that accompanies the video is still usually dropped and wasted. Processing both audio and video streams is not trivial, because synchronizing multiple data streams creates many more difficulties than processing a single stream. In this paper, we designed and implemented a lightweight multimodal stream processing system that keeps both streams synchronized during processing, and we tested it in a typical use case, a profanity filter. Although certain processors must inevitably be slowed down to keep the streams synchronized, careful buffer settings leave the overall latency unaffected (still 400-500 ms). Besides achieving real-time processing, we located a core problem that causes bursty audio latency and gave directions for further latency improvements.
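The abstract does not describe how the buffers are built, so the following is only a minimal sketch of one way a bounded buffer could hold the faster stream until the slower one catches up. The class `SyncBuffer`, its parameters `tolerance_ms` and `max_len`, and the assumption that chunks arrive in timestamp order are all hypothetical and are not taken from the paper.

```python
from collections import deque
from typing import Deque, Optional, Tuple

# A processed chunk: (presentation timestamp in ms, payload).
Chunk = Tuple[int, str]


class SyncBuffer:
    """Hypothetical helper that pairs audio and video chunks by timestamp.

    Assumes each stream delivers its chunks in increasing timestamp order.
    """

    def __init__(self, tolerance_ms: int = 20, max_len: int = 50) -> None:
        self.tolerance_ms = tolerance_ms
        # Bounded queues: when full, the oldest chunk is silently discarded,
        # which caps the extra latency the buffer can introduce.
        self.audio: Deque[Chunk] = deque(maxlen=max_len)
        self.video: Deque[Chunk] = deque(maxlen=max_len)

    def push_audio(self, ts_ms: int, payload: str) -> None:
        self.audio.append((ts_ms, payload))

    def push_video(self, ts_ms: int, payload: str) -> None:
        self.video.append((ts_ms, payload))

    def pop_synced(self) -> Optional[Tuple[Chunk, Chunk]]:
        """Return an (audio, video) pair within tolerance, or None.

        None means at least one stream has nothing to pair yet, so the
        faster stream effectively waits for the slower one.
        """
        while self.audio and self.video:
            a_ts = self.audio[0][0]
            v_ts = self.video[0][0]
            if abs(a_ts - v_ts) <= self.tolerance_ms:
                return self.audio.popleft(), self.video.popleft()
            # The older head can no longer find a partner (in-order arrival
            # is assumed), so drop it and try again.
            if a_ts < v_ts:
                self.audio.popleft()
            else:
                self.video.popleft()
        return None


buf = SyncBuffer(tolerance_ms=20)
buf.push_video(0, "frame0")
buf.push_video(40, "frame1")
first = buf.pop_synced()   # no audio yet, so video has to wait
buf.push_audio(5, "audio0")
second = buf.pop_synced()  # audio at 5 ms pairs with video at 0 ms
```

In this sketch, `tolerance_ms` and `max_len` are the buffer settings: a larger `max_len` tolerates a slower processor at the cost of more queued data, while a smaller one drops stale chunks sooner.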