Automatic Region of Interest Prediction from Instructor’s Behaviors in Lecture Archives

2022 14th International Conference on Knowledge and Systems Engineering (KSE). Pub Date: 2022-10-19. DOI: 10.1109/KSE56063.2022.9953765

Yuhui Yang, Koichi Ota, Wen Gu, S. Hasegawa


Citations: 1

Automatic Region of Interest Prediction from Instructor’s Behaviors in Lecture Archives

This research proposes an automatic region of interest (ROI) prediction architecture that uses a deep neural network to estimate learners’ ROIs from the instructor’s behaviors in lecture archives, so that ROI-zoomed videos can be generated for smaller screens such as those of smart devices. To this end, we first built an ROI dataset from learners’ gaze data recorded while they watched the archives: the gaze points obtained for each one-second video segment were clustered and smoothed with the K-means algorithm, yielding 16,039 ROI labels. Next, we extracted the instructor’s behaviors from each video segment as feature maps based on Frame Difference, Optical Flow, OpenPose, and temporal information. We then built an Encoder-Decoder architecture combining U-Net and ResNet that takes these behavioral features as input and predicts the ROI. In the experiment, agreement between the ROI labels and the predicted regions was evaluated with Dice loss (lower is better) for each feature map; the loss improved from 0.9 for the single-image baseline to 0.4 with OpenPose and temporal features. These results indicate the potential of ROI prediction from the instructor’s behaviors for automatically generating content for smart devices.
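The abstract does not give the exact formulation of its Dice loss, so the following is only a minimal sketch of the commonly used soft Dice loss, written in dependency-free Python. The names `dice_loss` and `rect_mask`, the smoothing term `eps`, and the toy 8x8 masks are assumptions made for illustration and are not taken from the paper. Under this usual definition the loss is 0 for perfect overlap and approaches 1 for no overlap, which is why a drop from 0.9 to 0.4 is an improvement.

```python
from typing import List, Sequence

Mask = Sequence[Sequence[float]]


def dice_loss(pred: Mask, target: Mask, eps: float = 1e-6) -> float:
    """Return 1 - Dice coefficient for two same-shaped masks with values in [0, 1]."""
    if len(pred) != len(target) or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("masks must have the same shape")
    # Sum of element-wise products: the (soft) overlap between the masks.
    intersection = sum(p * t for p_row, t_row in zip(pred, target)
                       for p, t in zip(p_row, t_row))
    total = sum(map(sum, pred)) + sum(map(sum, target))
    # eps (an assumed smoothing term) keeps the ratio defined for two empty masks.
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


def rect_mask(height: int, width: int, top: int, left: int,
              bottom: int, right: int) -> List[List[float]]:
    """Binary mask that is 1 inside the half-open rectangle [top, bottom) x [left, right)."""
    return [[1.0 if top <= y < bottom and left <= x < right else 0.0
             for x in range(width)]
            for y in range(height)]


# Hypothetical 8x8 frames: a 4x4 ROI label and three candidate predictions.
label = rect_mask(8, 8, 2, 2, 6, 6)
perfect = rect_mask(8, 8, 2, 2, 6, 6)   # identical to the label
shifted = rect_mask(8, 8, 2, 4, 6, 8)   # same size, half of it overlaps the label
disjoint = rect_mask(8, 8, 0, 0, 2, 2)  # no overlap with the label

print(dice_loss(perfect, label))   # approximately 0
print(dice_loss(shifted, label))   # approximately 0.5
print(dice_loss(disjoint, label))  # approximately 1
```

In a training setup the same quantity would typically be computed on tensors from the network's output and the ROI label mask; the list-based version here only illustrates the arithmetic.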
