How Short is a Piece of String? The Impact of Text Length and Text Augmentation on Short-text Classification

Austin Mccartney, Svetlana Hensman, L. Longo
Venue: Irish Conference on Artificial Intelligence and Cognitive Science
DOI: https://doi.org/10.21427/D7151M
Citations: 3

Abstract

Recent increases in the use and availability of short messages have created opportunities to harvest vast amounts of information through machine-based classification. However, traditional classification methods have failed to yield accuracies comparable to classification accuracies on longer texts. Several approaches have previously been employed to extend traditional methods to overcome this problem, including the enhancement of the original texts through the construction of associations with external data supplementation sources. Existing literature does not precisely describe the impact of text length on classification performance. This work quantitatively examines the changes in accuracy of a small selection of classifiers using a variety of enhancement methods, as text length progressively decreases. Findings, based on ANOVA testing at a 95% confidence level, suggest that the performance of classifiers using simple enhancements decreases with decreasing text length, but that the use of more sophisticated enhancements risks over-supplementation of the text and consequent concept drift and classification performance decrease as text length increases.
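The abstract does not give implementation details, so the following is only a minimal sketch of the kind of analysis it describes: texts are truncated to progressively shorter lengths, a classifier's accuracy is recorded at each length, and a one-way ANOVA checks whether mean accuracy differs across lengths. The function names `truncate_words` and `one_way_anova_f`, the word-level truncation, and the accuracy values are all hypothetical illustrations, not the authors' method or data.

```python
from statistics import mean


def truncate_words(text: str, max_words: int) -> str:
    """Keep only the first max_words whitespace-separated tokens (assumed truncation scheme)."""
    return " ".join(text.split()[:max_words])


def one_way_anova_f(groups: list[list[float]]) -> float:
    """Return the one-way ANOVA F statistic for two or more groups of scores."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    group_means = [mean(g) for g in groups]
    grand_mean = mean(x for g in groups for x in g)
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Made-up accuracies from repeated runs at three text lengths (illustration only).
accuracy_by_length = {
    20: [0.81, 0.83, 0.82],
    10: [0.74, 0.76, 0.75],
    5: [0.62, 0.65, 0.63],
}
f_statistic = one_way_anova_f(list(accuracy_by_length.values()))
print(truncate_words("short messages are hard to classify", 3))
print(f"F = {f_statistic:.2f}")
```

To decide significance at the 95% level, the F statistic would be compared with the critical value of the F distribution for (k - 1, n - k) degrees of freedom; a statistics library such as SciPy (`scipy.stats.f_oneway`) returns the corresponding p-value directly.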