Exploring Use and Perceptions of Generative AI Art Tools by Blind Artists
Gayatri Raman, Erin Brady
arXiv:2409.08226 (2024-09-12)
The paper explores the intersection of AI art and blindness, as existing AI research has primarily focused on AI art's reception and impact on sighted artists and consumers. To address this gap, the researcher interviewed six blind artists working in various visual art mediums and with varying levels of blindness about the generative AI image platform Midjourney. The participants shared text prompts and discussed their reactions to the generated images with the sighted researcher. The findings highlight blind artists' interest in AI images as a collaborative tool, alongside concerns about cultural perceptions and labeling of AI-generated art. They also underscore unique challenges, such as potential misunderstandings and stereotypes about blindness leading to exclusion. The study advocates for greater inclusion of blind individuals in AI art, emphasizing the need to address their specific needs and experiences when developing AI art technologies.
OmniQuery: Contextually Augmenting Captured Multimodal Memory to Enable Personal Question Answering
Jiahao Nick Li, Zhuohao Jerry Zhang, Jiaju Ma
arXiv:2409.08250 (2024-09-12)
People often capture memories through photos, screenshots, and videos. While existing AI-based tools enable querying this data using natural language, they mostly support only retrieving individual pieces of information, like certain objects in photos, and struggle with more complex queries that involve interpreting interconnected memories, like event sequences. We conducted a one-month diary study to collect realistic user queries and generated a taxonomy of the contextual information necessary for integrating with captured memories. We then introduce OmniQuery, a novel system that answers complex personal memory-related questions requiring the extraction and inference of contextual information. OmniQuery augments individual captured memories by integrating scattered contextual information from multiple interconnected memories, retrieves relevant memories, and uses a large language model (LLM) to generate comprehensive answers. In human evaluations, OmniQuery achieved an accuracy of 71.5% and outperformed a conventional RAG system, winning or tying 74.5% of the time.
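The retrieve-then-answer pattern that the OmniQuery abstract describes can be sketched minimally as follows. This is an illustrative assumption, not the authors' implementation: the bag-of-words scoring, the memory texts, and all function names here are invented for the example, and a real system would use learned embeddings and an actual LLM call.

```python
# Minimal sketch of retrieval-augmented personal memory QA:
# score captured memories against a query, take the top-k,
# and assemble them into a prompt for an LLM.
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, memories: list[str], k: int = 2) -> list[str]:
    # Rank memories by similarity to the query and keep the top k.
    qv = Counter(query.lower().split())
    ranked = sorted(memories,
                    key=lambda m: cosine(qv, Counter(m.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_prompt(query: str, memories: list[str]) -> str:
    # The retrieved memories become the context an LLM would answer from.
    context = "\n".join(f"- {m}" for m in retrieve(query, memories))
    return f"Context from captured memories:\n{context}\n\nQuestion: {query}"

memories = [
    "photo: birthday cake at Lisa's party, June 3",
    "screenshot: flight confirmation SFO to NYC, July 10",
    "video: beach volleyball game with coworkers",
]
print(build_prompt("when was Lisa's party", memories))
```

A full system would additionally augment each memory with inferred context (time, place, event) before retrieval, which is the step the paper focuses on.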
Situated Visualization in Motion for Swimming
Lijie Yao, Anastasia Bezerianos, Romain Vuillemot, Petra Isenberg
arXiv:2409.07695 (2024-09-12)
Competitive sports coverage increasingly includes information on athlete or team statistics and records. Sports video coverage has traditionally embedded representations of this data in fixed locations on the screen, but more recently has also attached representations to athletes or other targets in motion. These publicly used representations have so far been rather simple, and systematic investigations of the research space of embedded visualizations in motion are still missing. Here we report on our preliminary research in the domain of professional and amateur swimming. We analyzed how visualizations are currently added to the coverage of Olympic swimming competitions and plan to derive from this a design space for embedded data representations in swimming competitions. We are currently conducting a crowdsourced survey to explore which kinds of swimming-related data general audiences are interested in, in order to identify opportunities for additional visualizations in swimming competition coverage.
Testing the Test: Observations When Assessing Visualization Literacy of Domain Experts
Seyda Öney, Moataz Abdelaal, Kuno Kurzhals, Paul Betz, Cordula Kropp, Daniel Weiskopf
arXiv:2409.08101 (2024-09-12)
Various standardized tests exist that assess individuals' visualization literacy, and their use can help researchers draw conclusions from studies. However, these tests do not take into account that the test itself can create a pressure situation in which participants might fear being exposed and assessed negatively. This is especially problematic when testing domain experts in design studies. We conducted interviews with experts from different domains who performed the Mini-VLAT test for visualization literacy, to identify potential problems. Our participants reported that the time limit per question, ambiguities in the questions and visualizations, and missing steps in the test procedure mainly had an impact on their performance and content. We discuss possible changes to the test design to address these issues and how such assessment methods could be integrated into existing evaluation procedures.
Visual Compositional Data Analytics for Spatial Transcriptomics
David Hägele, Yuxuan Tang, Daniel Weiskopf
arXiv:2409.07306 (2024-09-11)
For the Bio+Med-Vis Challenge 2024, we propose a visual analytics system as a redesign of the scatter pie chart visualization of cell type proportions in spatial transcriptomics data. Our design uses three linked views: a view of the histological image of the tissue, a stacked bar chart showing the cell type proportions of the spots, and a scatter plot showing a dimensionality reduction of the multivariate proportions. Furthermore, we apply a compositional data analysis framework, the Aitchison geometry, to the proportions for dimensionality reduction and $k$-means clustering. Leveraging brushing and linking, the system allows one to explore and uncover patterns in the cell type mixtures and relate them to their spatial locations on the cellular tissue. This redesign shifts the pattern recognition workload from the human visual system to computational methods commonly used in visual analytics. We provide the code and setup instructions for our visual analytics system on GitHub (https://github.com/UniStuttgart-VISUS/va-for-spatial-transcriptomics).
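The Aitchison-geometry step mentioned in this abstract is commonly realized via the centered log-ratio (clr) transform, which maps compositional data (proportions summing to 1) into real space where Euclidean methods such as PCA and k-means apply. The sketch below shows the standard clr transform only; the example proportions are invented, and this is not the authors' code.

```python
# Centered log-ratio (clr) transform from Aitchison geometry:
# clr(x)_i = ln(x_i) - mean_j ln(x_j).
import math

def clr(proportions: list[float]) -> list[float]:
    # Maps a composition (strictly positive, summing to 1) from the
    # simplex into real coordinates; components of the result sum to 0.
    logs = [math.log(p) for p in proportions]
    mean_log = sum(logs) / len(logs)
    return [l - mean_log for l in logs]

# Two hypothetical spots with opposite dominant cell types.
spot_a = clr([0.7, 0.2, 0.1])
spot_b = clr([0.1, 0.2, 0.7])

# Aitchison distance = Euclidean distance in clr coordinates,
# which is what k-means then operates on.
dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(spot_a, spot_b)))
print(round(dist, 3))  # prints 2.752
```

Zero proportions must be handled (e.g. by a small pseudo-count) before taking logarithms, which the sketch omits.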
Trust Dynamics in Human-Autonomy Interaction: Uncover Associations between Trust Dynamics and Personal Characteristics
Hyesun Chung, X. Jessie Yang
arXiv:2409.07406 (2024-09-11)
While personal characteristics influence people's snapshot trust towards autonomous systems, their relationships with trust dynamics remain poorly understood. We conducted a human-subject experiment in which 130 participants performed a simulated surveillance task aided by an automated threat detector. A comprehensive pre-experimental survey collected data on participants' personal characteristics across 12 constructs and 28 dimensions. Based on data collected in the experiment, we clustered participants' trust dynamics into three types and assessed differences among the three clusters in terms of personal characteristics, behaviors, performance, and post-experiment ratings. Participants were clustered into three groups: Bayesian decision makers, disbelievers, and oscillators. Results showed that the clusters differ significantly in seven personal characteristics: masculinity, positive affect, extraversion, neuroticism, intellect, performance expectancy, and high expectations. Disbelievers tend to have high neuroticism and low performance expectancy, while oscillators tend to have higher scores in masculinity, positive affect, extraversion, and intellect. We also found significant differences in behaviors and post-experiment ratings among the three groups; disbelievers are the least likely to blindly follow the recommendations made by the automated threat detector. Based on the significant personal characteristics, we developed a decision tree model that predicts cluster type with an accuracy of 70%.
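A shallow decision tree over the characteristics the abstract reports could look like the hand-written sketch below. This is not the authors' fitted model: the thresholds, the normalized score scale, and the choice of features are illustrative assumptions that merely mirror the reported tendencies (disbelievers high in neuroticism and low in performance expectancy, oscillators high in extraversion).

```python
# Hand-written illustration of a shallow decision tree routing
# personal-characteristic scores to the three trust-dynamics clusters.
# All thresholds are hypothetical; a real model would be fitted to data
# (e.g. with sklearn.tree.DecisionTreeClassifier).
def classify(neuroticism: float, perf_expectancy: float,
             extraversion: float) -> str:
    # Scores assumed normalized to [0, 1].
    if neuroticism > 0.6 and perf_expectancy < 0.4:
        # Reported profile: high neuroticism, low performance expectancy.
        return "disbeliever"
    if extraversion > 0.6:
        # Oscillators scored higher on extraversion (among other traits).
        return "oscillator"
    return "Bayesian decision maker"

print(classify(0.8, 0.2, 0.5))  # prints "disbeliever"
```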
Situated Visualization in Motion
Lijie Yao, Anastasia Bezerianos, Petra Isenberg
arXiv:2409.07005 (2024-09-11)
We contribute a first design space on visualizations in motion and the design of a pilot study we plan to run in the fall. Visualizations can be useful in contexts where either the observer is in motion or the whole visualization is moving at various speeds. Imagine, for example, displays attached to an athlete or animal that show data about the wearer -- for example, captured from a fitness tracking band -- or a visualization attached to a moving object such as a vehicle or a soccer ball. The ultimate goal of our research is to inform the design of visualizations under motion.
"My Grade is Wrong!": A Contestable AI Framework for Interactive Feedback in Evaluating Student Essays
Shengxin Hong, Chang Cai, Sixuan Du, Haiyue Feng, Siyuan Liu, Xiuyi Fan
arXiv:2409.07453 (2024-09-11)
Interactive feedback, where feedback flows in both directions between teacher and student, is more effective than traditional one-way feedback. However, it is often too time-consuming for widespread use in educational practice. While Large Language Models (LLMs) have potential for automating feedback, they struggle with reasoning and interaction in an interactive setting. This paper introduces CAELF, a Contestable AI Empowered LLM Framework for automating interactive feedback. CAELF allows students to query, challenge, and clarify their feedback by integrating a multi-agent system with computational argumentation. Essays are first assessed by multiple Teaching-Assistant Agents (TA Agents); a Teacher Agent then aggregates the evaluations through formal reasoning to generate feedback and grades. Students can engage further with the feedback to refine their understanding. A case study on 500 critical thinking essays, together with user studies, demonstrates that CAELF significantly improves interactive feedback and enhances the reasoning and interaction capabilities of LLMs. This approach offers a promising solution to overcoming the time and resource barriers that have limited the adoption of interactive feedback in educational settings.
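The aggregate-then-contest loop described in the CAELF abstract can be illustrated with a deliberately simplified sketch. CAELF itself uses computational argumentation for the Teacher Agent's formal reasoning; the median aggregation, the agent names, and the rebuttal mechanics below are stand-in assumptions, not the paper's method.

```python
# Simplified illustration of the contestable-feedback loop:
# several TA-agent scores are combined by a teacher step, and a
# successful student rebuttal of one evaluation triggers re-aggregation.
from statistics import median

def teacher_aggregate(ta_scores: dict[str, float]) -> float:
    # Stand-in for the Teacher Agent's formal reasoning:
    # the median is robust to a single outlier TA evaluation.
    return median(ta_scores.values())

def contest(ta_scores: dict[str, float], rebutted: str) -> float:
    # A successful rebuttal removes that TA's evaluation from the pool,
    # and the grade is recomputed from the remaining evaluations.
    remaining = {k: v for k, v in ta_scores.items() if k != rebutted}
    return teacher_aggregate(remaining)

# Hypothetical per-criterion TA evaluations of one essay (0-100).
scores = {"ta_logic": 62.0, "ta_evidence": 70.0, "ta_style": 88.0}
print(teacher_aggregate(scores))    # prints 70.0
print(contest(scores, "ta_logic"))  # prints 79.0
```

In the actual framework, a challenge does not simply delete an evaluation; it introduces counter-arguments whose acceptability is settled by argumentation semantics.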
Bridging Quantitative and Qualitative Methods for Visualization Research: A Data/Semantics Perspective in Light of Advanced AI
Daniel Weiskopf
arXiv:2409.07250 (2024-09-11)
This paper revisits the role of quantitative and qualitative methods in visualization research in the context of advancements in artificial intelligence (AI). The focus is on how we can bridge between the different methods in an integrated process of analyzing user study data. To this end, a process model of (potentially iterated) semantic enrichment and transformation of data is proposed. This joint perspective on data and semantics facilitates the integration of quantitative and qualitative methods. The model is motivated by examples from the author's own prior work, especially in the areas of eye tracking user studies and the coding of data-rich observations. Finally, open issues and research opportunities in the interplay between AI, human analysts, and qualitative and quantitative methods for visualization research are discussed.
Awaking the Slides: A Tuning-free and Knowledge-regulated AI Tutoring System via Language Model Coordination
Daniel Zhang-Li, Zheyuan Zhang, Jifan Yu, Joy Lim Jia Yin, Shangqing Tu, Linlu Gong, Haohua Wang, Zhiyuan Liu, Huiqin Liu, Lei Hou, Juanzi Li
arXiv:2409.07372 (2024-09-11)
The vast body of pre-existing slides serves as rich and important material for carrying lecture knowledge. However, effectively leveraging lecture slides to serve students is difficult due to the multi-modal nature of slide content and the heterogeneity of teaching actions. We study the problem of discovering effective designs that convert a slide into an interactive lecture. We develop Slide2Lecture, a tuning-free and knowledge-regulated intelligent tutoring system that can (1) effectively convert an input lecture slide into a structured teaching agenda consisting of a set of heterogeneous teaching actions, and (2) create and manage an interactive lecture that generates responsive interactions catering to student learning demands while regulating the interactions to follow the teaching actions. Slide2Lecture contains a complete pipeline through which learners obtain an interactive classroom experience for learning the slide. For teachers and developers, Slide2Lecture enables customization to cater to personalized demands. Evaluations rated by annotators and students show that Slide2Lecture is effective, outperforming the remaining implementations. Slide2Lecture's online deployment has produced more than 200K interactions with students across 3K lecture sessions. We open-source Slide2Lecture's implementation at https://anonymous.4open.science/r/slide2lecture-4210/.