
ACM Transactions on Interactive Intelligent Systems: Latest Publications

Enabling Efficient Web Data-Record Interaction for People with Visual Impairments via Proxy Interfaces
IF 3.4 · Computer Science (CAS Tier 4, Q2) · Pub Date: 2023-01-10 · DOI: 10.1145/3579364
Javedul Ferdous, Hae-Na Lee, Sampath Jayarathna, Vikas Ashok

Web data records are usually accompanied by auxiliary webpage segments, such as filters, sort options, search forms, and multi-page links, that enhance interaction efficiency and convenience for end users. However, blind and visually impaired (BVI) persons are presently unable to fully exploit these auxiliary segments the way their sighted peers can, since the segments are scattered across the screen and the assistive technologies BVI users rely on, i.e., screen readers and screen magnifiers, are not geared for efficient interaction with such scattered content. Specifically, for blind screen reader users, content navigation is predominantly one-dimensional despite the support for skipping content, so navigating back and forth between different parts of a webpage is tedious and frustrating. Similarly, low vision screen magnifier users have to continuously pan between different portions of a webpage, given that content enlargement leaves only a portion of the screen viewable at any instant. Extant techniques for overcoming inefficient web interaction for BVI users have mostly focused on general web-browsing activities, and as such provide little to no support for data record-specific interaction activities such as filtering and sorting – activities that are equally important for facilitating quick and easy access to desired data records. To fill this void, we present InSupport, a browser extension that: (i) employs custom machine learning-based algorithms to automatically extract auxiliary segments on any webpage containing data records; and (ii) provides an instantly accessible proxy one-stop interface for easily navigating the extracted auxiliary segments using either basic keyboard shortcuts or mouse actions. Evaluation studies with 14 blind participants and 16 low vision participants showed significant improvement in web usability with InSupport, driven by reduced interaction time and user effort compared to state-of-the-art solutions.

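The paper itself includes no code, but to make the first component concrete, here is a minimal Python sketch of auxiliary-segment classification under stated assumptions: the feature set, labels, and classifier below are hypothetical placeholders, not InSupport's actual features or models.

```python
# Minimal sketch (not InSupport's actual code): classify candidate webpage
# segments into auxiliary-segment types from hand-crafted DOM features.
# Feature names, example values, and labels are hypothetical placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Each row: features of one candidate DOM segment, e.g.
# [num_links, num_checkboxes, num_selects, avg_text_len, rel_x, rel_y]
X = np.array([
    [2, 8, 0, 12.0, 0.05, 0.30],   # looks like a filter panel
    [5, 0, 1, 6.0, 0.70, 0.10],    # looks like a sort dropdown
    [12, 0, 0, 3.0, 0.50, 0.95],   # looks like multi-page (pagination) links
    [1, 0, 0, 20.0, 0.50, 0.05],   # looks like a search form
])
y = np.array(["filter", "sort", "pagination", "search"])

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)  # in practice, trained on many labeled segments

def classify_segment(features):
    """Return the predicted auxiliary-segment type for one DOM segment."""
    return clf.predict([features])[0]

print(classify_segment([10, 0, 0, 4.0, 0.5, 0.9]))  # likely "pagination"
```

A proxy interface like the one described would then group the classified segments into a single keyboard-navigable panel, rather than leaving them at their original scattered screen positions.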
Citations: 0
The Impact of Intelligent Pedagogical Agents’ Interventions on Student Behavior and Performance in Open-Ended Game Design Environments
IF 3.4 · Computer Science (CAS Tier 4, Q2) · Pub Date: 2023-01-04 · DOI: 10.1145/3578523
Özge Nilay Yalçın, Sébastien Lallé, Cristina Conati

Research has shown that free-form Game-Design (GD) environments can be very effective in fostering Computational Thinking (CT) skills at a young age. However, some students may still need guidance during the learning process due to the highly open-ended nature of these environments. Intelligent Pedagogical Agents (IPAs) can provide personalized assistance in real time to alleviate this challenge. This paper presents our results in evaluating such an agent deployed in Unity-CT, a real-world free-form GD learning environment for fostering CT in early K-12 education. We focus on the effect of repetition by comparing student behaviors across no-intervention, 1-shot, and repeated-intervention groups for two different errors known to be challenging in the online lessons of Unity-CT. Our findings showed that the agent was perceived very positively by the students, and the repeated intervention showed promising results in helping students make fewer errors and exhibit more correct behaviors, albeit only for one of the two target errors. Building on these results, we provide insights on how to provide IPA interventions in free-form GD environments.

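The abstract does not describe the agent's triggering logic, so the following Python sketch only illustrates the three intervention conditions being compared (none, 1-shot, repeated); the class, method names, and hint text are invented for illustration.

```python
# Hypothetical sketch of the three intervention conditions in the study.
# The actual Unity-CT agent logic is not given in the abstract; everything
# concrete here is an assumption for illustration.
from dataclasses import dataclass

@dataclass
class InterventionPolicy:
    mode: str            # "none", "one_shot", or "repeated"
    delivered: int = 0   # hints already shown for this error type

    def should_intervene(self, error_detected: bool) -> bool:
        if not error_detected or self.mode == "none":
            return False
        if self.mode == "one_shot":
            return self.delivered == 0   # intervene only on first detection
        return True                      # "repeated": intervene every time

    def deliver_hint(self, hint: str) -> None:
        self.delivered += 1
        print(f"[agent] {hint}")

policy = InterventionPolicy(mode="repeated")
for step, err in enumerate([False, True, True, False, True]):
    if policy.should_intervene(err):
        policy.deliver_hint(f"Step {step}: check the target error condition.")
```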
Citations: 0
Visualization and Visual Analytics Approaches for Image and Video Datasets: A Survey
IF 3.4 · Computer Science (CAS Tier 4, Q2) · Pub Date: 2023-01-02 · DOI: 10.1145/3576935
S. Afzal, Sohaib Ghani, M. Hittawe, Sheikh Faisal Rashid, O. Knio, M. Hadwiger, I. Hoteit
Image and video data analysis has become an increasingly important research area with applications in different domains such as security surveillance, healthcare, augmented and virtual reality, video and image editing, activity analysis and recognition, synthetic content generation, distance education, telepresence, remote sensing, sports analytics, art, non-photorealistic rendering, search engines, and social media. Recent advances in Artificial Intelligence (AI) and particularly deep learning have sparked new research challenges and led to significant advancements, especially in image and video analysis. These advancements have also resulted in significant research and development in other areas such as visualization and visual analytics, and have created new opportunities for future lines of research. In this survey article, we present the current state of the art at the intersection of visualization and visual analytics, and image and video data analysis. We categorize the visualization articles included in our survey based on different taxonomies used in visualization and visual analytics research. We review these articles in terms of task requirements, tools, datasets, and application areas. We also discuss insights based on our survey results, trends and patterns, the current focus of visualization research, and opportunities for future research.
Citations: 3
Special Issue on Highlights of IUI 2021: Introduction
IF 3.4 · Computer Science (CAS Tier 4, Q2) · Pub Date: 2022-12-31 · DOI: 10.1145/3561516
T. Hammond, Bart P. Knijnenburg, J. O’Donovan, Paul Taele
…degree of illocution results in the generation of more usable explanations. The authors evaluated their hypothesis on two…
Citations: 0
Learning and Understanding User Interface Semantics from Heterogeneous Networks with Multimodal and Positional Attributes
IF 3.4 · Computer Science (CAS Tier 4, Q2) · Pub Date: 2022-12-23 · DOI: 10.1145/3578522
Gary Ang, Ee-Peng Lim

User interfaces (UIs) of desktop, web, and mobile applications involve a hierarchy of objects (e.g., applications, screens, view classes, and other types of design objects) with multimodal (e.g., textual and visual) and positional (e.g., spatial location, sequence order, and hierarchy level) attributes. We can therefore represent a set of application UIs as a heterogeneous network with multimodal and positional attributes. Such a network not only represents how users understand the visual layout of UIs, but also influences how users interact with applications through those UIs. To model UI semantics well for different UI annotation, search, and evaluation tasks, this paper proposes the novel Heterogeneous Attention-based Multimodal Positional (HAMP) graph neural network model. HAMP combines graph neural networks with the scaled dot-product attention used in transformers to learn the embeddings of heterogeneous nodes and associated multimodal and positional attributes in a unified manner. HAMP is evaluated with classification and regression tasks conducted on three distinct real-world datasets. Our experiments demonstrate that HAMP significantly outperforms other state-of-the-art models on such tasks. To further interpret the contribution of heterogeneous network information to understanding the relationships between UI structure and prediction tasks, we propose Adaptive HAMP (AHAMP), which adaptively learns the importance of the different edges linking UI objects. Our experiments demonstrate AHAMP’s superior performance over HAMP on a number of tasks, and its ability to interpret the contribution of multimodal and positional attributes, as well as heterogeneous network information, to different tasks.

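As a reference point for the attention mechanism named above, here is a minimal Python (PyTorch) sketch of scaled dot-product attention applied to one UI node and its heterogeneous neighbors. It is an illustrative reimplementation, not the authors' code; the dimensions, random embeddings, and single-head projections are assumptions.

```python
# Sketch of transformer-style scaled dot-product attention over a node's
# heterogeneous neighbors (illustrative, not HAMP's published code).
import torch
import torch.nn.functional as F

d = 64                              # embedding dimension (assumed)
node = torch.randn(1, d)            # target UI object, e.g., a screen
neighbors = torch.randn(5, d)       # mixed neighbor types (text, image, ...)

# Projection matrices; in a trained model these are learned parameters.
W_q, W_k, W_v = (torch.randn(d, d) for _ in range(3))

Q = node @ W_q                      # (1, d) query from the target node
K = neighbors @ W_k                 # (5, d) keys from the neighbors
V = neighbors @ W_v                 # (5, d) values from the neighbors

scores = (Q @ K.T) / d ** 0.5       # scaled dot-product scores, shape (1, 5)
attn = F.softmax(scores, dim=-1)    # attention weights over neighbors
updated = attn @ V                  # aggregated neighbor message, (1, d)
print(attn)                         # per-neighbor contribution weights
```

The per-neighbor weights are also what makes an adaptive variant like AHAMP interpretable: inspecting them shows which edges (and thus which UI objects) drive a given prediction.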
Citations: 0
Detection and Recognition of Driver Distraction Using Multimodal Signals
IF 3.4 · Computer Science (CAS Tier 4, Q2) · Pub Date: 2022-12-12 · DOI: 10.1145/3519267
Kapotaksha Das, Michalis Papakostas, Kais Riani, Andrew Gasiorowski, Mohamed Abouelenien, Mihai Burzo, Rada Mihalcea

Distracted driving is a leading cause of accidents worldwide. The tasks of distraction detection and recognition have traditionally been addressed as computer vision problems. However, distracted behaviors are not always expressed in a visually observable way. In this work, we introduce a novel multimodal dataset of distracted driver behaviors, consisting of data collected over twelve information channels spanning visual, acoustic, near-infrared, thermal, physiological, and linguistic modalities. The data were collected from 45 subjects while they were exposed to four different distractions (three cognitive and one physical). For the purposes of this paper, we performed experiments with visual, physiological, and thermal information to explore the potential of multimodal modeling for distraction recognition. In addition, we analyze the value of the different modalities by identifying the specific visual, physiological, and thermal feature groups that contribute the most to distraction characterization. Our results highlight the advantage of multimodal representations and reveal valuable insights into the role the three modalities play in identifying different types of driving distractions.

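To illustrate one common way such modalities can be combined, here is a Python sketch of simple feature-level (concatenation) fusion on synthetic data; the paper's actual features and models may differ, and all shapes, feature meanings, and the classifier choice below are assumptions.

```python
# Illustrative feature-level fusion of visual, physiological, and thermal
# features for binary distraction classification (synthetic data only).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200  # number of synthetic time windows

# Per-modality feature vectors extracted per window (dimensions invented).
visual = rng.normal(size=(n, 16))    # e.g., gaze / head-pose statistics
physio = rng.normal(size=(n, 8))     # e.g., heart rate, skin conductance
thermal = rng.normal(size=(n, 4))    # e.g., facial temperature regions
y = rng.integers(0, 2, size=n)       # 0 = attentive, 1 = distracted

# Early fusion: concatenate modality features, then train one classifier.
X = np.hstack([visual, physio, thermal])
clf = LogisticRegression(max_iter=1000).fit(X[:150], y[:150])
print("held-out accuracy:", clf.score(X[150:], y[150:]))
```

With real labeled windows, comparing this fused model against single-modality baselines is one straightforward way to quantify the advantage of multimodal representations the abstract reports.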
Citations: 0
On the Importance of User Backgrounds and Impressions: Lessons Learned from Interactive AI Applications
IF 3.4 · Computer Science (CAS Tier 4, Q2) · Pub Date: 2022-12-12 · DOI: 10.1145/3531066
Mahsan Nourani, Chiradeep Roy, Jeremy E. Block, Donald R. Honeycutt, Tahrima Rahman, Eric D. Ragan, Vibhav Gogate

While EXplainable Artificial Intelligence (XAI) approaches aim to improve human-AI collaborative decision-making by improving model transparency and mental model formation, experiential factors associated with human users can cause challenges in ways system designers do not anticipate. In this article, we first showcase a user study on how anchoring bias can affect mental model formation when users initially interact with an intelligent system, and on the role of explanations in addressing this bias. Using a video activity recognition tool in the cooking domain, we asked participants to verify whether a set of kitchen policies was being followed, with each policy focusing on a weakness or a strength. We controlled the order of the policies and the presence of explanations to test our hypotheses. Our main finding shows that those who observed system strengths early on were more prone to automation bias and made significantly more errors due to positive first impressions of the system, even though they built a more accurate mental model of the system’s competencies. In contrast, those who encountered weaknesses earlier made significantly fewer errors, since they tended to rely more on themselves, while also underestimating the model’s competencies due to a more negative first impression. Motivated by these findings and similar existing work, we formalize and present a conceptual model of users’ past experiences that examines the relations between users’ backgrounds, experiences, and human factors in XAI systems as a function of usage time. Our work presents strong findings and implications, aiming to raise AI designers’ awareness of biases associated with user impressions and backgrounds.

Citations: 0