{"title":"Video text rediscovery: Predicting and tracking text across complex scenes","authors":"Veronica Naosekpam, Nilkanta Sahu","doi":"10.1111/coin.12686","DOIUrl":null,"url":null,"abstract":"<p>Dynamic texts in scene videos provide valuable insights and semantic cues crucial for video applications. However, the movement of this text presents unique challenges, such as blur, shifts, and blockages. While efficient in tracking text, state-of-the-art systems often need help when text becomes obscured or complicated scenes. This study introduces a novel method for detecting and tracking video text, specifically designed to predict the location of obscured or occluded text in subsequent frames using a tracking-by-detection paradigm. Our approach begins with a primary detector to identify text within individual frames, thus enhancing tracking accuracy. Using the Kalman filter, Munkres algorithm, and deep visual features, we establish connections between text instances across frames. Our technique works on the concept that when text goes missing in a frame due to obstructions, we use its previous speed and location to predict its next position. Experiments conducted on the ICDAR2013 Video and ICDAR2015 Video datasets confirm our method's efficacy, matching or surpassing established methods in performance.</p>","PeriodicalId":55228,"journal":{"name":"Computational Intelligence","volume":"40 3","pages":""},"PeriodicalIF":1.8000,"publicationDate":"2024-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computational Intelligence","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1111/coin.12686","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
Dynamic text in scene videos provides valuable insights and semantic cues crucial for video applications. However, the movement of this text presents unique challenges, such as blur, shifts, and occlusions. Although state-of-the-art systems track text efficiently, they often falter when text becomes obscured or scenes grow complex. This study introduces a novel method for detecting and tracking video text, specifically designed to predict the location of obscured or occluded text in subsequent frames using a tracking-by-detection paradigm. Our approach begins with a primary detector that identifies text within individual frames, improving tracking accuracy. Using a Kalman filter, the Munkres algorithm, and deep visual features, we establish correspondences between text instances across frames. Our technique rests on the idea that when text disappears from a frame due to obstruction, its previous velocity and location can be used to predict its next position. Experiments on the ICDAR2013 Video and ICDAR2015 Video datasets confirm the method's efficacy, matching or surpassing established methods in performance.
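The core mechanism the abstract describes, a constant-velocity Kalman prediction that carries a text track through occlusion plus Munkres (Hungarian) assignment to link detections to tracks, can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the association cost here is centre distance only (the paper additionally uses deep visual features), and the names TextTrack and associate, and the max_dist threshold, are hypothetical.

```python
# Minimal tracking-by-detection sketch (illustrative, not the paper's code).
# Each text instance gets a constant-velocity Kalman filter; detections in a
# new frame are linked to existing tracks with the Munkres algorithm.
import numpy as np
from scipy.optimize import linear_sum_assignment  # Munkres/Hungarian solver

class TextTrack:
    """Kalman filter over a text box centre with state (x, y, vx, vy)."""
    def __init__(self, cx, cy):
        self.x = np.array([cx, cy, 0.0, 0.0], dtype=float)  # position + velocity
        self.P = np.eye(4) * 10.0                           # state covariance
        self.F = np.array([[1., 0., 1., 0.],                # transition: x' = x + vx
                           [0., 1., 0., 1.],                #             y' = y + vy
                           [0., 0., 1., 0.],
                           [0., 0., 0., 1.]])
        self.H = np.array([[1., 0., 0., 0.],                # only position is observed
                           [0., 1., 0., 0.]])
        self.Q = np.eye(4) * 0.01                           # process noise
        self.R = np.eye(2) * 1.0                            # measurement noise

    def predict(self):
        """Advance one frame; under occlusion this alone carries the track."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]                                   # predicted centre

    def update(self, cx, cy):
        """Correct the prediction with a matched detection."""
        z = np.array([cx, cy], dtype=float)
        y = z - self.H @ self.x                             # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)            # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P

def associate(tracks, detections, max_dist=50.0):
    """Predict every track one frame ahead, then match to detected centres."""
    preds = [t.predict() for t in tracks]   # also coasts occluded tracks forward
    if not tracks or not detections:
        return [], list(range(len(tracks))), list(range(len(detections)))
    cost = np.linalg.norm(np.array(preds)[:, None]
                          - np.asarray(detections, dtype=float)[None, :], axis=2)
    rows, cols = linear_sum_assignment(cost)                # optimal assignment
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_dist]
    matched_t = {r for r, _ in matches}
    matched_d = {c for _, c in matches}
    return (matches,
            [i for i in range(len(tracks)) if i not in matched_t],
            [j for j in range(len(detections)) if j not in matched_d])
```

A per-frame loop would call associate, run update on matched tracks, start new TextTrack objects for unmatched detections, and keep unmatched tracks alive on prediction alone for a few frames; that coasting is what lets an occluded text instance be rediscovered when it reappears.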
About the Journal
This leading international journal promotes and stimulates research in the field of artificial intelligence (AI). Covering a wide range of issues, from the tools and languages of AI to its philosophical implications, Computational Intelligence provides a vigorous forum for the publication of both experimental and theoretical research, as well as surveys and impact studies. The journal is designed to meet the needs of a wide range of AI researchers and practitioners in academia and industry.