Annika Marie Schoene, John E. Ortega, Silvio Amir, Kenneth Ward Church
{"title":"An Example of (Too Much) Hyper-Parameter Tuning In Suicide Ideation Detection","authors":"Annika Marie Schoene, John E. Ortega, Silvio Amir, Kenneth Ward Church","doi":"10.1609/icwsm.v17i1.22227","DOIUrl":null,"url":null,"abstract":"This work starts with the TWISCO baseline, a benchmark of suicide-related content from Twitter. We find that hyper-parameter tuning can improve this baseline by 9%. We examined 576 combinations of hyper-parameters: learning rate, batch size, epochs and date range of training data. Reasonable settings of learning rate and batch size produce better results than poor settings. Date range is less conclusive. Balancing the date range of the training data to match the benchmark ought to improve performance, but the differences are relatively small. Optimal settings of learning rate and batch size are much better than poor settings, but optimal settings of date range are not that different from poor settings of date range. Finally, we end with concerns about reproducibility. Of the 576 experiments, 10% produced F1 performance above baseline. It is common practice in the literature to run many experiments and report the best, but doing so may be risky, especially given the sensitive nature of Suicide Ideation Detection.","PeriodicalId":175641,"journal":{"name":"International Conference on Web and Social Media","volume":"64 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference on Web and Social Media","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1609/icwsm.v17i1.22227","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
This work starts with the TWISCO baseline, a benchmark of suicide-related content from Twitter. We find that hyper-parameter tuning can improve this baseline by 9%. We examined 576 combinations of hyper-parameters: learning rate, batch size, epochs and date range of training data. Reasonable settings of learning rate and batch size produce better results than poor settings. Date range is less conclusive. Balancing the date range of the training data to match the benchmark ought to improve performance, but the differences are relatively small. Optimal settings of learning rate and batch size are much better than poor settings, but optimal settings of date range are not that different from poor settings of date range. Finally, we end with concerns about reproducibility. Of the 576 experiments, 10% produced F1 performance above baseline. It is common practice in the literature to run many experiments and report the best, but doing so may be risky, especially given the sensitive nature of Suicide Ideation Detection.