Pub Date : 2024-05-20DOI: 10.1609/aaaiss.v3i1.31273
Matt Jones, Tyler R. Scott, Michael C. Mozer
Natural environments have correlations at a wide range of timescales. Human cognition is tuned to this temporal structure, as seen by power laws of learning and memory, and by spacing effects whereby the intervals between repeated training data affect how long knowledge is retained. Machine learning is instead dominated by batch iid training or else relatively simple nonstationarity assumptions such as random walks or discrete task sequences. The main contributions of our work are: (1) We develop a Bayesian model formalizing the brain's inductive bias for temporal structure and show our model accounts for key features of human learning and memory. (2) We translate the model into a new gradient-based optimization technique for neural networks that endows them with human-like temporal inductive bias and improves their performance in realistic nonstationary tasks. Our technical approach is founded on Bayesian inference over 1/f noise, a statistical signature of many natural environments with long-range, power law correlations. We derive a new closed-form solution to this problem by treating the state of the environment as a sum of processes on different timescales and applying an extended Kalman filter to learn all timescales jointly. We then derive a variational approximation of this model for training neural networks, which can be used as a drop-in replacement for standard optimizers in arbitrary architectures. Our optimizer decomposes each weight in the network as a sum of subweights with different learning and decay rates and tracks their joint uncertainty. Thus knowledge becomes distributed across timescales, enabling rapid adaptation to task changes while retaining long-term knowledge and avoiding catastrophic interference. Simulations show improved performance in environments with realistic multiscale nonstationarity. Finally, we present simulations showing our model gives essentially parameter-free fits of learning, forgetting, and spacing effects in human data. We then explore the analogue of human spacing effects in a deep net trained in a structured environment where tasks recur at different rates and compare the model's behavioral properties to those of people.
{"title":"Human-like Learning in Temporally Structured Environments","authors":"Matt Jones, Tyler R. Scott, Michael C. Mozer","doi":"10.1609/aaaiss.v3i1.31273","DOIUrl":"https://doi.org/10.1609/aaaiss.v3i1.31273","url":null,"abstract":"Natural environments have correlations at a wide range of timescales. Human cognition is tuned to this temporal structure, as seen by power laws of learning and memory, and by spacing effects whereby the intervals between repeated training data affect how long knowledge is retained. Machine learning is instead dominated by batch iid training or else relatively simple nonstationarity assumptions such as random walks or discrete task sequences.\u0000\u0000The main contributions of our work are:\u0000(1) We develop a Bayesian model formalizing the brain's inductive bias for temporal structure\u0000and show our model accounts for key features of human learning and memory.\u0000(2) We translate the model into a new gradient-based optimization technique for neural networks that endows them with human-like temporal inductive bias and improves their performance in realistic nonstationary tasks.\u0000\u0000Our technical approach is founded on Bayesian inference over 1/f noise, a statistical signature of many natural environments with long-range, power law correlations. We derive a new closed-form solution to this problem by treating the state of the environment as a sum of processes on different timescales and applying an extended Kalman filter to learn all timescales jointly. \u0000\u0000We then derive a variational approximation of this model for training neural networks, which can be used as a drop-in replacement for standard optimizers in arbitrary architectures. Our optimizer decomposes each weight in the network as a sum of subweights with different learning and decay rates and tracks their joint uncertainty. Thus knowledge becomes distributed across timescales, enabling rapid adaptation to task changes while retaining long-term knowledge and avoiding catastrophic interference. Simulations show improved performance in environments with realistic multiscale nonstationarity.\u0000\u0000Finally, we present simulations showing our model gives essentially parameter-free fits of learning, forgetting, and spacing effects in human data. We then explore the analogue of human spacing effects in a deep net trained in a structured environment where tasks recur at different rates and compare the model's behavioral properties to those of people.","PeriodicalId":516827,"journal":{"name":"Proceedings of the AAAI Symposium Series","volume":"29 12","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141118928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-05-20DOI: 10.1609/aaaiss.v3i1.31266
Christelle Scharff, James Brusseau, K. Bathula, Kaleemunnisa Fnu, Samyak Rakesh Meshram, Om Gaikhe
This paper addresses the ethics of inclusion in artificial in-telligence in the context of African fashion. Despite the proliferation of fashion-related AI applications and da-tasets global diversity remains limited, and African fash-ion is significantly underrepresented. This paper docu-ments two use-cases that enhance AI's inclusivity by in-corporating sub-Saharan fashion elements. The first case details the creation of a Senegalese fashion dataset and a model for classifying traditional apparel using transfer learning. The second case investigates African wax textile patterns generated through generative adversarial net-works (GANs), specifically StyleGAN architectures, and machine learning diffusion models. Alongside the practi-cal, technological advances, theoretical ethical progress is made in two directions. First, the cases are used to elabo-rate and define the ethics of inclusion, while also contrib-uting to current debates about how inclusion differs from ethical fairness. Second, the cases engage with the ethical debate on whether AI innovation should be slowed to prevent ethical imbalances or accelerated to solve them.
{"title":"Inclusion Ethics in AI: Use Cases in African Fashion","authors":"Christelle Scharff, James Brusseau, K. Bathula, Kaleemunnisa Fnu, Samyak Rakesh Meshram, Om Gaikhe","doi":"10.1609/aaaiss.v3i1.31266","DOIUrl":"https://doi.org/10.1609/aaaiss.v3i1.31266","url":null,"abstract":"This paper addresses the ethics of inclusion in artificial in-telligence in the context of African fashion. Despite the proliferation of fashion-related AI applications and da-tasets global diversity remains limited, and African fash-ion is significantly underrepresented. This paper docu-ments two use-cases that enhance AI's inclusivity by in-corporating sub-Saharan fashion elements. The first case details the creation of a Senegalese fashion dataset and a model for classifying traditional apparel using transfer learning. The second case investigates African wax textile patterns generated through generative adversarial net-works (GANs), specifically StyleGAN architectures, and machine learning diffusion models. Alongside the practi-cal, technological advances, theoretical ethical progress is made in two directions. First, the cases are used to elabo-rate and define the ethics of inclusion, while also contrib-uting to current debates about how inclusion differs from ethical fairness. Second, the cases engage with the ethical debate on whether AI innovation should be slowed to prevent ethical imbalances or accelerated to solve them.","PeriodicalId":516827,"journal":{"name":"Proceedings of the AAAI Symposium Series","volume":"18 13","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141120817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-05-20DOI: 10.1609/aaaiss.v3i1.31234
Stefanie Hauske, Oliver Bendel
This paper explores how generative AI (GenAI) can improve the well-being of learners within self-regulated learning (SRL) frameworks in the corporate context. In the “GenAI to Support SRL” section, it presents three custom versions of ChatGPT aimed at assisting learners. These so-called GPTs demonstrate the GenAI’s potential to actively support learners in SRL and positively influence their well-being. The “Discussion” and “Summary and Outlook” sections provide a balanced overview of the opportunities and risks associated with GenAI in the field of learning and highlight directions for future research. The results indicate that GenAI could improve the well-being of learners in SRL through providing personalized guidance, reducing feelings of stress, and increasing motivation and self-efficacy. At the same time, there are several challenges for companies and employees that need to be overcome.
{"title":"How Can GenAI Foster Well-being in Self-regulated Learning?","authors":"Stefanie Hauske, Oliver Bendel","doi":"10.1609/aaaiss.v3i1.31234","DOIUrl":"https://doi.org/10.1609/aaaiss.v3i1.31234","url":null,"abstract":"This paper explores how generative AI (GenAI) can improve the well-being of learners within self-regulated learning (SRL) frameworks in the corporate context. In the “GenAI to Support SRL” section, it presents three custom versions of ChatGPT aimed at assisting learners. These so-called GPTs demonstrate the GenAI’s potential to actively support learners in SRL and positively influence their well-being. The “Discussion” and “Summary and Outlook” sections provide a balanced overview of the opportunities and risks associated with GenAI in the field of learning and highlight directions for future research. The results indicate that GenAI could improve the well-being of learners in SRL through providing personalized guidance, reducing feelings of stress, and increasing motivation and self-efficacy. At the same time, there are several challenges for companies and employees that need to be overcome.","PeriodicalId":516827,"journal":{"name":"Proceedings of the AAAI Symposium Series","volume":"52 16","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141122024","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-05-20DOI: 10.1609/aaaiss.v3i1.31274
Steven Jones, Peter Lindes
Human-like learning includes an ability to learn concepts from a stream of embodiment sensor data. Echoing previous thoughts such as those from Barsalou that cognition and perception share a common representation system, we suggest an addendum to the common model of cognition. This addendum poses a simultaneous semantic memory and perception learning that bypasses working memory, and that uses parallel processing to learn concepts apart from deliberate reasoning. The goal is to provide a general outline for how to extend a class of cognitive architectures to implement a more human-like interface between cognition and embodiment of an agent, where a critical aspect of that interface is that it is dynamic because of learning.
{"title":"Toward Human-Like Representation Learning for Cognitive Architectures","authors":"Steven Jones, Peter Lindes","doi":"10.1609/aaaiss.v3i1.31274","DOIUrl":"https://doi.org/10.1609/aaaiss.v3i1.31274","url":null,"abstract":"Human-like learning includes an ability to learn concepts from a stream of embodiment sensor data. Echoing previous thoughts such as those from Barsalou that cognition and perception share a common representation system, we suggest an addendum to the common model of cognition. This addendum poses a simultaneous semantic memory and perception learning that bypasses working memory, and that uses parallel processing to learn concepts apart from deliberate reasoning. The goal is to provide a general outline for how to extend a class of cognitive architectures to implement a more human-like interface between cognition and embodiment of an agent, where a critical aspect of that interface is that it is dynamic because of learning.","PeriodicalId":516827,"journal":{"name":"Proceedings of the AAAI Symposium Series","volume":"21 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141122270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Various meetings are carried out in intellectual production activities and workers have to spend much time to create ideas. In creative meetings, it is sometime difficult for the meeting moderators and facilitators to efficiently conduct the meetings because the participants are required to come up with new ideas one after another and some participants hesitate to express unconventional ideas. Therefore, we propose to develop an AI-based meeting support system that estimates participants’ inspiration and helps to generate comfortable meeting environments for improvement of worker wellbeing. Participants’ inspiration is assumed to be estimated based on their speech and micro behaviors including smiles and nods. In this paper, a dataset we collected for the development of the proposed system is reported. The dataset consists of participants’ brain blood flows measured near-infrared spectrometers, micro behavior annotated from video recording, and inspiration the participants reported with buttons. The data for 1020 min was collected by conducting simulation meetings. In future work, we plan to train an LSTM (long short-term memory) based neural network model to realize the proposed system.
{"title":"A Dataset for Estimating Participant Inspiration in Meetings toward AI-Based Meeting Support System to Improve Worker Wellbeing","authors":"Soki Arai, Yuki Yamamoto, Yuji Nozaki, Haruka Matsukura, Maki Sakamoto","doi":"10.1609/aaaiss.v3i1.31231","DOIUrl":"https://doi.org/10.1609/aaaiss.v3i1.31231","url":null,"abstract":"Various meetings are carried out in intellectual production activities and workers have to spend much time to create ideas. In creative meetings, it is sometime difficult for the meeting moderators and facilitators to efficiently conduct the meetings because the participants are required to come up with new ideas one after another and some participants hesitate to express unconventional ideas. Therefore, we propose to develop an AI-based meeting support system that estimates participants’ inspiration and helps to generate comfortable meeting environments for improvement of worker wellbeing. Participants’ inspiration is assumed to be estimated based on their speech and micro behaviors including smiles and nods. In this paper, a dataset we collected for the development of the proposed system is reported. The dataset consists of participants’ brain blood flows measured near-infrared spectrometers, micro behavior annotated from video recording, and inspiration the participants reported with buttons. The data for 1020 min was collected by conducting simulation meetings. In future work, we plan to train an LSTM (long short-term memory) based neural network model to realize the proposed system.","PeriodicalId":516827,"journal":{"name":"Proceedings of the AAAI Symposium Series","volume":"84 8","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141123068","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-05-20DOI: 10.1609/aaaiss.v3i1.31219
Hana Khamfroush
In a resource constrained environment like Internet-of-Things (IoT) systems, it is critical to make optimal decisions on how much resources to allocate pre-processing and how much to allocate to model training, and which specific combination of preprocessing and learning should be selected. This talk first, provides an overview of some initial steps we took towards developing federated data pre-processing in IoT environments, and then a visionary overview of potential research problems related to developing an integrated resource-aware and Quality-of-Service (QoS)-aware data pre-processing and model training system is provided.
{"title":"Resource-aware Federated Data Analytics in Edge-Enabled IoT Systems","authors":"Hana Khamfroush","doi":"10.1609/aaaiss.v3i1.31219","DOIUrl":"https://doi.org/10.1609/aaaiss.v3i1.31219","url":null,"abstract":"In a resource constrained environment like Internet-of-Things (IoT) systems, it is critical to make optimal decisions on how much resources\u0000to allocate pre-processing and how much to allocate to model training, and which specific combination of preprocessing and learning should be selected. \u0000This talk first, provides an overview of some initial steps we took towards developing federated data pre-processing in IoT environments, and then a\u0000visionary overview of potential research problems related to developing an integrated resource-aware and Quality-of-Service (QoS)-aware data pre-processing and model training system is provided.","PeriodicalId":516827,"journal":{"name":"Proceedings of the AAAI Symposium Series","volume":"99 35","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141122571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-05-20DOI: 10.1609/aaaiss.v3i1.31199
Andreas Martin, Hans Friedrich Witschel, Maximilian Mandl, Mona Stockhecke
This position paper presents a novel approach of semantic verification in Large Language Model-based Retrieval Augmented Generation (LLM-RAG) systems, focusing on the critical need for factually accurate information dissemination during public debates, especially prior to plebiscites e.g. in direct democracies, particularly in the context of Switzerland. Recognizing the unique challenges posed by the current generation of Large Language Models (LLMs) in maintaining factual integrity, this research proposes an innovative solution that integrates retrieval mechanisms with enhanced semantic verification processes. The paper outlines a comprehensive methodology following a Design Science Research approach, which includes defining user personas, designing conversational interfaces, and iteratively developing a hybrid dialogue system. Central to this system is a robust semantic verification framework that leverages a knowledge graph for fact-checking and validation, ensuring the correctness and consistency of information generated by LLMs. The paper discusses the significance of this research in the context of Swiss direct democracy, where informed decision-making is pivotal. By improving the accuracy and reliability of information provided to the public, the proposed system aims to support the democratic process, enabling citizens to make well-informed decisions on complex issues. The research contributes to advancing the field of natural language processing and information retrieval, demonstrating the potential of AI and LLMs in enhancing civic engagement and democratic participation.
{"title":"Semantic Verification in Large Language Model-based Retrieval Augmented Generation","authors":"Andreas Martin, Hans Friedrich Witschel, Maximilian Mandl, Mona Stockhecke","doi":"10.1609/aaaiss.v3i1.31199","DOIUrl":"https://doi.org/10.1609/aaaiss.v3i1.31199","url":null,"abstract":"This position paper presents a novel approach of semantic verification in Large Language Model-based Retrieval Augmented Generation (LLM-RAG) systems, focusing on the critical need for factually accurate information dissemination during public debates, especially prior to plebiscites e.g. in direct democracies, particularly in the context of Switzerland. Recognizing the unique challenges posed by the current generation of Large Language Models (LLMs) in maintaining factual integrity, this research proposes an innovative solution that integrates retrieval mechanisms with enhanced semantic verification processes. The paper outlines a comprehensive methodology following a Design Science Research approach, which includes defining user personas, designing conversational interfaces, and iteratively developing a hybrid dialogue system. Central to this system is a robust semantic verification framework that leverages a knowledge graph for fact-checking and validation, ensuring the correctness and consistency of information generated by LLMs. The paper discusses the significance of this research in the context of Swiss direct democracy, where informed decision-making is pivotal. By improving the accuracy and reliability of information provided to the public, the proposed system aims to support the democratic process, enabling citizens to make well-informed decisions on complex issues. The research contributes to advancing the field of natural language processing and information retrieval, demonstrating the potential of AI and LLMs in enhancing civic engagement and democratic participation.","PeriodicalId":516827,"journal":{"name":"Proceedings of the AAAI Symposium Series","volume":"16 8","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141120339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-05-20DOI: 10.1609/aaaiss.v3i1.31171
O. Bartheye, L. Chaudron
This present work is deliberately placed in the context capable of defining the requirements expressed by machine decision-making calculations. The informational nature of a decision requires abandoning any invariant preserving the structure but on the contrary switching into total chaos, a necessary and sufficient condition for exploiting the symmetries allowing the calculation to converge. Decision arithmetic is the best way to precisely define the nature of these symmetries.
{"title":"The Arithmetic of Machine Decision : How to Find the Symmetries of Complete Chaos","authors":"O. Bartheye, L. Chaudron","doi":"10.1609/aaaiss.v3i1.31171","DOIUrl":"https://doi.org/10.1609/aaaiss.v3i1.31171","url":null,"abstract":"This present work is deliberately placed in the context capable of defining the requirements expressed by machine decision-making calculations. The informational nature of\u0000a decision requires abandoning any invariant preserving the structure but on the contrary switching into total chaos, a necessary and sufficient condition for exploiting the symmetries\u0000allowing the calculation to converge. Decision arithmetic is the best way to precisely define the nature of these symmetries.","PeriodicalId":516827,"journal":{"name":"Proceedings of the AAAI Symposium Series","volume":"8 6","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141120279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-05-20DOI: 10.1609/aaaiss.v3i1.31170
Cecilia O. Alm
AI systems are breaking into new domains and applications, and it is pivotal to center humans in contemporary AI systems and contemplate what this means. This discussion considers three perspectives or human roles in AI as users, contributors, and researchers-in-training, to illustrate this notion.
{"title":"Centering Humans in Artificial Intelligence ","authors":"Cecilia O. Alm","doi":"10.1609/aaaiss.v3i1.31170","DOIUrl":"https://doi.org/10.1609/aaaiss.v3i1.31170","url":null,"abstract":"AI systems are breaking into new domains and applications, and it is pivotal to center humans in contemporary AI systems and contemplate what this means. This discussion considers three perspectives or human roles in AI as users, contributors, and researchers-in-training, to illustrate this notion.","PeriodicalId":516827,"journal":{"name":"Proceedings of the AAAI Symposium Series","volume":"100 26","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141122425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We present a novel framework for analyzing and interpreting electron microscopy images in semiconductor manufacturing using vision-language instruction tuning. The framework employs a unique teacher-student approach, leveraging pretrained multimodal large language models such as GPT-4 to generate instruction-following data for zero-shot visual question answering (VQA) and classification tasks, customizing smaller multimodal models (SMMs) for microscopy image analysis, resulting in an instruction tuned language-and-vision assistant. Our framework merges knowledge engineering with machine learning to integrate domain-specific expertise from larger to smaller multimodal models within this specialized field, greatly reducing the need for extensive human labeling. Our study presents a secure, cost-effective, and customizable approach for analyzing microscopy images, addressing the challenges of adopting proprietary models in semiconductor manufacturing.
{"title":"Multi-Modal Instruction-Tuning Small-Scale Language-and-Vision Assistant for Semiconductor Electron Micrograph Analysis","authors":"Sagar Srinivas Sakhinana, Geethan Sannidhi, Venkataramana Runkana","doi":"10.1609/aaaiss.v3i1.31205","DOIUrl":"https://doi.org/10.1609/aaaiss.v3i1.31205","url":null,"abstract":"We present a novel framework for analyzing and interpreting electron microscopy images in semiconductor manufacturing using vision-language instruction tuning. The framework employs a unique teacher-student approach, leveraging pretrained multimodal large language models such as GPT-4 to generate instruction-following data for zero-shot visual question answering (VQA) and classification tasks, customizing smaller multimodal models (SMMs) for microscopy image analysis, resulting in an instruction tuned language-and-vision assistant. Our framework merges knowledge engineering with machine learning to integrate domain-specific expertise from larger to smaller multimodal models within this specialized field, greatly reducing the need for extensive human labeling. Our study presents a secure, cost-effective, and customizable approach for analyzing microscopy images, addressing the challenges of adopting proprietary models in semiconductor manufacturing.","PeriodicalId":516827,"journal":{"name":"Proceedings of the AAAI Symposium Series","volume":"80 14","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141123153","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}