Tree-managed network ensembles for video prediction
Authors: Everett Fall, Kai-Wei Chang, Liang-Gee Chen
Journal: Machine Vision and Applications (impact factor 2.4; JCR Q3, Computer Science, Artificial Intelligence)
Publication type: Journal Article
Published: 2024-07-04
DOI: 10.1007/s00138-024-01575-7 (https://doi.org/10.1007/s00138-024-01575-7)
Citation count: 0
Abstract
This paper presents an approach that uses a tree structure to manage a large ensemble of neural networks for complex video prediction tasks. The proposed method introduces a technique for partitioning the function domain into simpler subsets, enabling piecewise learning by the ensemble. The ensemble is accessed through an accompanying tree structure with a time complexity of O(log(N)), and this ensemble-tree framework progressively expands as training examples become more complex. The tree construction process incorporates a specialized algorithm that uses localized comparison functions learned at each decision node. To evaluate the method, we conducted experiments in two challenging scenarios: action-conditional video prediction in a 3D video game environment and error detection in real-world 3D printing. Our approach consistently outperformed existing methods by a significant margin across the experiments. Additionally, we introduce a new evaluation methodology for long-term video prediction tasks, which aligns better with qualitative observations. The results highlight the efficacy of the ensemble-tree approach in addressing complex video prediction challenges.
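The O(log(N)) access pattern described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Node`, `build_balanced`, `route`, `predict`, and `make_member` are hypothetical, the ensemble members are toy functions standing in for neural networks, and the comparison functions are fixed thresholds on the first input coordinate, whereas the paper learns them at each decision node. The sketch also assumes a balanced tree, which is what makes the lookup depth logarithmic in the number of members.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]
Predictor = Callable[[Vector], float]


@dataclass
class Node:
    """Decision node (compare/left/right set) or leaf (predictor set)."""
    compare: Optional[Callable[[Vector], bool]] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    predictor: Optional[Predictor] = None


def build_balanced(predictors: List[Predictor], lo: int = 0) -> Node:
    """Hypothetical builder: member lo + i owns inputs with lo + i <= x[0] < lo + i + 1.

    Assumption: thresholds are fixed here; the paper learns the comparison
    function at each decision node instead.
    """
    if len(predictors) == 1:
        return Node(predictor=predictors[0])
    mid = len(predictors) // 2
    threshold = lo + mid
    return Node(
        # Bind the threshold as a default argument so each node keeps its own value.
        compare=lambda x, t=threshold: x[0] < t,
        left=build_balanced(predictors[:mid], lo),
        right=build_balanced(predictors[mid:], lo + mid),
    )


def route(root: Node, x: Vector) -> Tuple[Node, int]:
    """Walk from the root to a leaf; return the leaf and the number of comparisons."""
    node, depth = root, 0
    while node.predictor is None:
        node = node.left if node.compare(x) else node.right
        depth += 1
    return node, depth


def predict(root: Node, x: Vector) -> float:
    """Evaluate only the single ensemble member selected by the tree."""
    leaf, _ = route(root, x)
    return leaf.predictor(x)


def make_member(index: int) -> Predictor:
    """Toy ensemble member that reports its own index (stand-in for a network)."""
    return lambda x: float(index)
```

With eight members built via `build_balanced([make_member(i) for i in range(8)])`, each lookup makes three comparisons and evaluates one member, rather than evaluating all eight. An unbalanced tree would lose this guarantee, so a real system would need some policy for keeping depth bounded as the ensemble grows.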
About the journal:
Machine Vision and Applications publishes high-quality technical contributions in machine vision research and development. Specifically, the editors encourage submissions in all applications and engineering aspects of image-related computing. In particular, original contributions dealing with scientific, commercial, industrial, military, and biomedical applications of machine vision are all within the scope of the journal.
Particular emphasis is placed on engineering and technology aspects of image processing and computer vision.
The following aspects of machine vision applications are of interest: algorithms, architectures, VLSI implementations, AI techniques and expert systems for machine vision, front-end sensing, multidimensional and multisensor machine vision, real-time techniques, image databases, virtual reality and visualization. Papers must include a significant experimental validation component.