{"title":"评估和改进自然语言处理模型行为的原则和交互工具","authors":"Tongshuang Wu","doi":"10.1145/3411763.3443423","DOIUrl":null,"url":null,"abstract":"While the accuracy of Natural Language Processing (NLP) models has been going up, users have more expectations than captured by just accuracy. Despite practitioners’ attempt to inspect model blind spots or lacking capabilities, the status-quo processes can be ad-hoc and biased. My thesis focuses on helping practitioners organize and explore the inputs and outputs of their models, such that they can gain more systematic insights into their models’ behaviors. I identified two building blocks that are essential for informative analysis: (1) to scale up the analysis by grouping similar instances, and (2) to isolate important components by generating counterfactuals. To support multiple analysis stages (training data assessment, error analysis, model testing), I designed various interactive tools that instantiate these two building blocks. In the process, I characterized the design space of grouping and counterfactual generation, seeking to balance the machine powers and practitioners’ domain expertise. My future work proposes to explore how the grouping and counterfactual techniques can benefit non-experts in the data collection process.","PeriodicalId":265192,"journal":{"name":"Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Principles and Interactive Tools for Evaluating and Improving the Behavior of Natural Language Processing models\",\"authors\":\"Tongshuang Wu\",\"doi\":\"10.1145/3411763.3443423\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"While the accuracy of Natural Language Processing (NLP) models has been going up, users have more expectations than captured by just accuracy. Despite practitioners’ attempt to inspect model blind spots or lacking capabilities, the status-quo processes can be ad-hoc and biased. My thesis focuses on helping practitioners organize and explore the inputs and outputs of their models, such that they can gain more systematic insights into their models’ behaviors. I identified two building blocks that are essential for informative analysis: (1) to scale up the analysis by grouping similar instances, and (2) to isolate important components by generating counterfactuals. To support multiple analysis stages (training data assessment, error analysis, model testing), I designed various interactive tools that instantiate these two building blocks. In the process, I characterized the design space of grouping and counterfactual generation, seeking to balance the machine powers and practitioners’ domain expertise. 
My future work proposes to explore how the grouping and counterfactual techniques can benefit non-experts in the data collection process.\",\"PeriodicalId\":265192,\"journal\":{\"name\":\"Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-05-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3411763.3443423\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3411763.3443423","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
While the accuracy of Natural Language Processing (NLP) models has steadily improved, users expect more from these models than accuracy alone captures. Although practitioners try to inspect their models for blind spots and missing capabilities, the status-quo processes tend to be ad hoc and biased. My thesis focuses on helping practitioners organize and explore the inputs and outputs of their models, so that they can gain more systematic insights into model behavior. I identified two building blocks that are essential for informative analysis: (1) scaling up the analysis by grouping similar instances, and (2) isolating important components by generating counterfactuals. To support multiple analysis stages (training data assessment, error analysis, and model testing), I designed various interactive tools that instantiate these two building blocks. In the process, I characterized the design space of grouping and counterfactual generation, seeking to balance machine assistance with practitioners' domain expertise. My future work proposes to explore how grouping and counterfactual techniques can benefit non-experts in the data collection process.
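To make the two building blocks concrete, the following is a minimal Python sketch. The functions group_errors and negation_counterfactual, the "contains negation" slice, and the toy sentiment examples are hypothetical stand-ins invented for illustration; they are not the thesis's actual tools, only an assumption of what grouping similar instances and generating counterfactuals can look like in their simplest form.

# Illustrative sketch only: hypothetical helpers, not the tools described in the thesis.
from typing import Callable, Dict, List, Tuple

# Each example is (input text, gold label, model prediction).
Example = Tuple[str, str, str]

def group_errors(examples: List[Example],
                 slices: Dict[str, Callable[[str], bool]]) -> Dict[str, List[Example]]:
    """Building block 1: scale up analysis by grouping misclassified instances into slices."""
    groups: Dict[str, List[Example]] = {name: [] for name in slices}
    for text, gold, pred in examples:
        if gold != pred:  # keep only the model's errors
            for name, matches in slices.items():
                if matches(text):
                    groups[name].append((text, gold, pred))
    return groups

def negation_counterfactual(text: str) -> str:
    """Building block 2: isolate one component by flipping a single cue (toy perturbation)."""
    if " not " in text:
        return text.replace(" not ", " ")
    return text.replace(" is ", " is not ")

if __name__ == "__main__":
    data = [
        ("the plot is not engaging", "negative", "positive"),
        ("the acting is superb", "positive", "positive"),
        ("not a single joke lands", "negative", "positive"),
    ]
    slices = {"contains negation": lambda t: " not " in f" {t} "}
    for name, errs in group_errors(data, slices).items():
        print(name, "->", len(errs), "errors")  # how many errors fall in each group
    print(negation_counterfactual("the plot is not engaging"))  # minimal counterfactual

In this sketch the grouping is driven by a hand-written predicate and the counterfactual flips one surface cue; the abstract's point is that interactive tools should share exactly this kind of work between machine automation and the practitioner's domain expertise.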