{"title":"当日记遇上生活日志视频","authors":"Min Gao, Jiande Sun, En Yu, Xiao Dong, Jing Li","doi":"10.1109/ISPACS.2017.8266516","DOIUrl":null,"url":null,"abstract":"As the increasing quantities of personal data is collected by individuals, the number of lifelog video is increasing. People make microblogging in the form of the text, later, in form of the text with pictures or videos. In this paper, a cross-media lifelog video retrieval approach is proposed to automatically match the corresponding lifelog video clip from a long lifelog video according to diary description(Fig.2). This model consists of a video captioning model and a text retrieval model. We train an encoder-decoder architecture to effectively learn video captioning by MSVD and MSR-VTT datasets. We use the similarity judgment to achieve the retrieval of the text. The similarity is measured by measuring the cosine distance between the two vectors. We experiment on some participants' lifelog videos and diaries. This approach is evaluated by investigating participants' satisfaction with results of lifelog video selected, the results show most of the testers were satisfied with the results.","PeriodicalId":166414,"journal":{"name":"2017 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS)","volume":"61-62 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"When diary meets lifelog video\",\"authors\":\"Min Gao, Jiande Sun, En Yu, Xiao Dong, Jing Li\",\"doi\":\"10.1109/ISPACS.2017.8266516\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"As the increasing quantities of personal data is collected by individuals, the number of lifelog video is increasing. People make microblogging in the form of the text, later, in form of the text with pictures or videos. In this paper, a cross-media lifelog video retrieval approach is proposed to automatically match the corresponding lifelog video clip from a long lifelog video according to diary description(Fig.2). This model consists of a video captioning model and a text retrieval model. We train an encoder-decoder architecture to effectively learn video captioning by MSVD and MSR-VTT datasets. We use the similarity judgment to achieve the retrieval of the text. The similarity is measured by measuring the cosine distance between the two vectors. We experiment on some participants' lifelog videos and diaries. 
This approach is evaluated by investigating participants' satisfaction with results of lifelog video selected, the results show most of the testers were satisfied with the results.\",\"PeriodicalId\":166414,\"journal\":{\"name\":\"2017 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS)\",\"volume\":\"61-62 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ISPACS.2017.8266516\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISPACS.2017.8266516","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
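The abstract states that retrieval is done by comparing text vectors with a cosine-based similarity measure. Below is a minimal sketch of that retrieval step, assuming the diary entry and each clip caption have already been embedded as vectors; the embedding method and the function names here are illustrative, not taken from the paper.

```python
# Hypothetical sketch of cosine-similarity text retrieval, as described in the abstract.
# Assumes diary entries and clip captions are already embedded as fixed-size vectors;
# the paper's captioning model and embedding pipeline are not reproduced here.
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors; 1.0 means identical direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def retrieve_best_clip(diary_vec: np.ndarray, caption_vecs: list[np.ndarray]) -> int:
    """Return the index of the clip whose caption vector is most similar to the diary entry."""
    scores = [cosine_similarity(diary_vec, c) for c in caption_vecs]
    return int(np.argmax(scores))


if __name__ == "__main__":
    # Toy usage with random vectors standing in for real sentence embeddings.
    rng = np.random.default_rng(0)
    diary = rng.normal(size=128)
    clips = [rng.normal(size=128) for _ in range(5)]
    print("Best matching clip index:", retrieve_best_clip(diary, clips))
```

In practice the candidate caption vectors would come from running the trained video captioning model over segments of the long lifelog video; the sketch only illustrates the final nearest-neighbor selection by cosine similarity.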