使用自然语言处理学习共同基金分类

Proceedings of the Third ACM International Conference on AI in Finance Pub Date : 2022-07-11 DOI:10.1145/3533271.3561748

Dimitrios Vamvourellis, M. Tóth, Dhruv Desai, D. Mehta, S. Pasquali

{"title":"使用自然语言处理学习共同基金分类","authors":"Dimitrios Vamvourellis, M. Tóth, Dhruv Desai, D. Mehta, S. Pasquali","doi":"10.1145/3533271.3561748","DOIUrl":null,"url":null,"abstract":"Categorization of mutual funds or Exchange-Traded-funds (ETFs) have long served the financial analysts to perform peer analysis for various purposes starting from competitor analysis, to quantifying portfolio diversification. The categorization methodology usually relies on fund composition data in the structured format extracted from the Form N-1A. Here, we initiate a study to learn the categorization system directly from the unstructured data as depicted in the forms using natural language processing (NLP). Positing as a multi-class classification problem with the input data being only the investment strategy description as reported in the form and the target variable being the Lipper Global categories, and using various NLP models, we show that the categorization system can indeed be learned with high accuracy. We discuss implications and applications of our findings as well as limitations of existing pre-trained architectures in applying them to learn fund categorization.","PeriodicalId":134888,"journal":{"name":"Proceedings of the Third ACM International Conference on AI in Finance","volume":"73 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Learning Mutual Fund Categorization using Natural Language Processing\",\"authors\":\"Dimitrios Vamvourellis, M. Tóth, Dhruv Desai, D. Mehta, S. Pasquali\",\"doi\":\"10.1145/3533271.3561748\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Categorization of mutual funds or Exchange-Traded-funds (ETFs) have long served the financial analysts to perform peer analysis for various purposes starting from competitor analysis, to quantifying portfolio diversification. The categorization methodology usually relies on fund composition data in the structured format extracted from the Form N-1A. Here, we initiate a study to learn the categorization system directly from the unstructured data as depicted in the forms using natural language processing (NLP). Positing as a multi-class classification problem with the input data being only the investment strategy description as reported in the form and the target variable being the Lipper Global categories, and using various NLP models, we show that the categorization system can indeed be learned with high accuracy. We discuss implications and applications of our findings as well as limitations of existing pre-trained architectures in applying them to learn fund categorization.\",\"PeriodicalId\":134888,\"journal\":{\"name\":\"Proceedings of the Third ACM International Conference on AI in Finance\",\"volume\":\"73 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-07-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the Third ACM International Conference on AI in Finance\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3533271.3561748\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Third ACM International Conference on AI in Finance","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3533271.3561748","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 3

摘要

长期以来，共同基金或交易所交易基金(etf)的分类为金融分析师提供了从竞争对手分析到量化投资组合多样化等各种目的的同行分析。分类方法通常依赖于从N-1A表格中提取的结构化基金组成数据。在这里，我们启动了一项研究，使用自然语言处理(NLP)直接从表格中描述的非结构化数据中学习分类系统。假设一个多类分类问题，输入数据仅为表格中报告的投资策略描述，目标变量为Lipper Global类别，并使用各种NLP模型，我们表明分类系统确实可以以高精度学习。我们讨论了我们的发现的意义和应用，以及现有的预训练架构在应用它们来学习基金分类方面的局限性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Learning Mutual Fund Categorization using Natural Language Processing

Categorization of mutual funds or Exchange-Traded-funds (ETFs) have long served the financial analysts to perform peer analysis for various purposes starting from competitor analysis, to quantifying portfolio diversification. The categorization methodology usually relies on fund composition data in the structured format extracted from the Form N-1A. Here, we initiate a study to learn the categorization system directly from the unstructured data as depicted in the forms using natural language processing (NLP). Positing as a multi-class classification problem with the input data being only the investment strategy description as reported in the form and the target variable being the Lipper Global categories, and using various NLP models, we show that the categorization system can indeed be learned with high accuracy. We discuss implications and applications of our findings as well as limitations of existing pre-trained architectures in applying them to learn fund categorization.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the Third ACM International Conference on AI in Finance

自引率

0.00%

发文量

期刊最新文献

Core Matrix Regression and Prediction with Regularization Risk-Aware Linear Bandits with Application in Smart Order Routing Addressing Extreme Market Responses Using Secure Aggregation Addressing Non-Stationarity in FX Trading with Online Model Selection of Offline RL Experts Objective Driven Portfolio Construction Using Reinforcement Learning