Coarse-to-Fine Loss Based On Viterbi Algorithm for Weakly Supervised Action Segmentation
Longshuai Sheng, Ce Li, Yihan Tian
2021 International Conference on Signal Processing and Machine Learning (CONF-SPML), published 2021-11-01
DOI: 10.1109/CONF-SPML54095.2021.00009 (https://doi.org/10.1109/CONF-SPML54095.2021.00009)
Citations: 0
Abstract
Weakly supervised action segmentation has been extensively studied as a way to recover the category and start time of each action that occurs in a video, but it remains an unsolved problem because well-annotated data are scarce in video analysis. To address this scarcity, weakly supervised action segmentation uses only the action annotation for the whole sequence of a long video instead of a specific label for each frame, which greatly reduces the difficulty of obtaining video datasets. However, the task remains challenging because the temporal lengths of the actions in a video are difficult to partition. In this paper, we use the Viterbi algorithm to generate an initial action segmentation as the baseline, and then design a new coarse-to-fine loss function that refines the length partition and learns separate scores for valid and invalid segmentation routes. The new coarse-to-fine loss is learned in the pipeline to reduce the weight of invalid segmentation routes and obtain the best video segmentation. Compared with state-of-the-art (SOTA) methods, experiments on the Breakfast and 50 Salads datasets show that our fine partition model and coarse-to-fine loss function achieve higher frame accuracy and significantly reduce the time spent on action segmentation.
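To make the Viterbi step concrete, the following is a minimal sketch of transcript-constrained Viterbi alignment, the generic technique commonly used to obtain an initial segmentation from sequence-level annotation. It is not the authors' implementation: the function name `viterbi_align`, the input layout (one list of per-class log-probabilities per frame), and the absence of any length model or coarse-to-fine loss are all assumptions made for illustration.

```python
import math


def viterbi_align(log_probs, transcript):
    """Assign each frame to one action of an ordered transcript.

    log_probs:  list of T rows; row t holds log-probabilities per class
                (hypothetical input layout, assumed for this sketch).
    transcript: ordered list of N class ids, with 1 <= N <= T.
    Returns (frame_labels, best_score).
    """
    T, N = len(log_probs), len(transcript)
    if N == 0 or N > T:
        raise ValueError("transcript needs between 1 and T actions")

    neg_inf = -math.inf
    # score[t][n]: best score of frames 0..t when frame t lies in action n
    score = [[neg_inf] * N for _ in range(T)]
    # advanced[t][n]: True if the best path entered action n at frame t
    advanced = [[False] * N for _ in range(T)]
    score[0][0] = log_probs[0][transcript[0]]

    for t in range(1, T):
        for n in range(min(t + 1, N)):
            stay = score[t - 1][n]
            advance = score[t - 1][n - 1] if n > 0 else neg_inf
            advanced[t][n] = advance > stay
            score[t][n] = max(stay, advance) + log_probs[t][transcript[n]]

    # Backtrace from the last frame, which must lie in the last action.
    labels = [0] * T
    n = N - 1
    for t in range(T - 1, -1, -1):
        labels[t] = transcript[n]
        if advanced[t][n]:
            n -= 1
    return labels, score[T - 1][N - 1]


# Toy example: 6 frames, 3 classes, transcript "0 then 2 then 1".
frames = [0, 0, 2, 2, 2, 1]
toy = [[math.log(0.8 if c == k else 0.1) for c in range(3)] for k in frames]
labels, best = viterbi_align(toy, [0, 2, 1])
print(labels)  # [0, 0, 2, 2, 2, 1]
```

The dynamic program only allows a frame to stay in the current action or advance to the next one, so every decoded path respects the annotated action order. In the terms of the abstract, such a path would be a valid segmentation route; how the paper scores invalid routes is not specified here and is not modeled by this sketch.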