一种提取文本信息的深度学习方法

Allen Huang, Hui Wang, Yi Yang
{"title":"一种提取文本信息的深度学习方法","authors":"Allen Huang, Hui Wang, Yi Yang","doi":"10.2139/ssrn.3910214","DOIUrl":null,"url":null,"abstract":"In this paper, we develop FinBERT, a state-of-the-art deep learning algorithm that incorporates the contextual relations between words in the finance domain. First, using a researcher-labeled analyst report sample, we document that FinBERT significantly outperforms the Loughran and McDonald (LM) dictionary, the naïve Bayes, and Word2Vec in sentiment classification, primarily because of its ability to uncover sentiment in sentences that other algorithms mislabel as neutral. Next, we show that other approaches underestimate the textual informativeness of earnings conference calls by at least 32% compared with FinBERT. Our results also indicate that FinBERT’s greater accuracy is especially relevant when empirical tests may suffer from low power, such as with small samples. Last, textual sentiments summarized by FinBERT can better predict future earnings than the LM dictionary, especially after 2011, consistent with firms’ strategic disclosures reducing the information content of textual sentiments measured with LM dictionary. Our results have implications for academic researchers, investment professionals, and financial market regulators who want to extract insights from financial texts.","PeriodicalId":256367,"journal":{"name":"Computational Linguistics & Natural Language Processing eJournal","volume":"27 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":"{\"title\":\"FinBERT—A Deep Learning Approach to Extracting Textual Information\",\"authors\":\"Allen Huang, Hui Wang, Yi Yang\",\"doi\":\"10.2139/ssrn.3910214\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we develop FinBERT, a state-of-the-art deep learning algorithm that incorporates the contextual relations between words in the finance domain. First, using a researcher-labeled analyst report sample, we document that FinBERT significantly outperforms the Loughran and McDonald (LM) dictionary, the naïve Bayes, and Word2Vec in sentiment classification, primarily because of its ability to uncover sentiment in sentences that other algorithms mislabel as neutral. Next, we show that other approaches underestimate the textual informativeness of earnings conference calls by at least 32% compared with FinBERT. Our results also indicate that FinBERT’s greater accuracy is especially relevant when empirical tests may suffer from low power, such as with small samples. Last, textual sentiments summarized by FinBERT can better predict future earnings than the LM dictionary, especially after 2011, consistent with firms’ strategic disclosures reducing the information content of textual sentiments measured with LM dictionary. Our results have implications for academic researchers, investment professionals, and financial market regulators who want to extract insights from financial texts.\",\"PeriodicalId\":256367,\"journal\":{\"name\":\"Computational Linguistics & Natural Language Processing eJournal\",\"volume\":\"27 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-07-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"10\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computational Linguistics & Natural Language Processing eJournal\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.2139/ssrn.3910214\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computational Linguistics & Natural Language Processing eJournal","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2139/ssrn.3910214","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 10

摘要

在本文中,我们开发了FinBERT,这是一种最先进的深度学习算法,它结合了金融领域中单词之间的上下文关系。首先,使用研究人员标记的分析师报告样本,我们证明FinBERT在情感分类方面显着优于Loughran和McDonald (LM)字典,naïve贝叶斯和Word2Vec,主要是因为它能够发现其他算法错误标记为中立的句子中的情感。接下来,我们表明,与FinBERT相比,其他方法低估了财报电话会议的文本信息量至少32%。我们的结果还表明,当实证测试可能受到低功率的影响时,例如使用小样本时,FinBERT的更高准确性尤其相关。最后,FinBERT总结的文本情感比LM词典能更好地预测未来收益,特别是在2011年之后,这与企业的战略披露相一致,减少了LM词典测量的文本情感的信息含量。我们的研究结果对想要从金融文本中提取见解的学术研究人员、投资专业人士和金融市场监管者具有启示意义。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
FinBERT—A Deep Learning Approach to Extracting Textual Information
In this paper, we develop FinBERT, a state-of-the-art deep learning algorithm that incorporates the contextual relations between words in the finance domain. First, using a researcher-labeled analyst report sample, we document that FinBERT significantly outperforms the Loughran and McDonald (LM) dictionary, the naïve Bayes, and Word2Vec in sentiment classification, primarily because of its ability to uncover sentiment in sentences that other algorithms mislabel as neutral. Next, we show that other approaches underestimate the textual informativeness of earnings conference calls by at least 32% compared with FinBERT. Our results also indicate that FinBERT’s greater accuracy is especially relevant when empirical tests may suffer from low power, such as with small samples. Last, textual sentiments summarized by FinBERT can better predict future earnings than the LM dictionary, especially after 2011, consistent with firms’ strategic disclosures reducing the information content of textual sentiments measured with LM dictionary. Our results have implications for academic researchers, investment professionals, and financial market regulators who want to extract insights from financial texts.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Digital Storytelling: Computer Based Learning Activity to Enhance Young Learner Vocabulary Corporate ESG News and The Stock Market Neural Discourse Modelling of Conversations FinBERT—A Deep Learning Approach to Extracting Textual Information Implementation on Text Classification Using Bag of Words Model
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1