On Robustness of Mutual Funds Categorization and Distance Metric Learning

Dhruv Desai, D. Mehta
The Journal of Financial Data Science. Published 2021-10-31.
DOI: 10.3905/jfds.2021.3.4.130 (https://doi.org/10.3905/jfds.2021.3.4.130)
Citations: 3

Abstract

Identifying similar mutual funds among a given universe of funds has many applications, including competitor analysis, marketing and sales, tax loss harvesting, and so on. For a contemporary analyst, the most popular approach to finding similar funds is to look up a categorization system such as Morningstar categorization. Morningstar categorization has been heavily investigated by academic researchers from various angles, including using unsupervised clustering techniques in which clusters were found to be inconsistent with categorization. Recently, however, categorization has been studied using supervised classification techniques, with the categories being the target labels. Categorization was indeed learnable with very high accuracy using a purely data-driven approach, causing a paradox: Clustering was inconsistent with respect to categorization, whereas supervised classification was able to reproduce (near) complete categorization. Here, the authors resolve this apparent paradox by pointing out incorrect uses and interpretations of machine learning techniques in the previous academic literature. The authors demonstrate that by using an appropriate list of variables and metrics to identify the optimal number of clusters and preprocessing the data using distance metric learning, one can indeed reproduce the Morningstar categorization using a data-driven approach. The present work puts an end to the debate on this issue and establishes that the Morningstar categorization is indeed intrinsically rigorous, consistent, rule-based, and reproducible using data-driven approaches, if machine learning techniques are correctly implemented.

Key Findings
▪ Academic literature has time and again questioned the consistency and robustness of mutual fund’s categorization systems, such as Morningstar categorization, by contrasting them with unsupervised clustering of funds.
▪ Here, the authors settle the debate in favor of Morningstar categorization by pointing out the use of incorrect lists of variables and interpretation of machine learning algorithms in the previous literature, while emphasizing that the main missing piece from the machine learning side in previous research was the appropriate distance metric.
▪ The authors employ a machine learning technique called distance metric learning and reproduce the Morningstar categorization completely using a data-driven approach.
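
The idea in the abstract, that clustering may only agree with a categorization once distances are measured with a learned metric, can be illustrated with a small sketch. This is not the authors' pipeline: the abstract does not name a specific algorithm, so scikit-learn's Neighborhood Components Analysis (NCA) is swapped in as one example of distance metric learning, k-means stands in for the clustering step, and the data are synthetic stand-ins for fund features and category labels. The helper name `cluster_agreement` and all parameter values are assumptions made for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.metrics import adjusted_rand_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NeighborhoodComponentsAnalysis
from sklearn.preprocessing import StandardScaler


def cluster_agreement(X, labels, n_clusters, seed=0):
    """Cluster X with k-means and score agreement with labels (adjusted Rand index)."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return adjusted_rand_score(labels, km.fit_predict(X))


# Synthetic stand-in for fund data (assumption): 3 "categories",
# 4 informative features and 16 pure-noise features.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=4, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, class_sep=1.5, random_state=0,
)

# Hold out half of the samples so the metric is evaluated on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0
)
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# Baseline: k-means with plain Euclidean distance on standardized features.
ari_euclidean = cluster_agreement(X_test_s, y_test, n_clusters=3)

# Distance metric learning: NCA learns a linear map from the labeled
# training half; Euclidean distance in the mapped space is the learned metric.
nca = NeighborhoodComponentsAnalysis(n_components=4, random_state=0)
nca.fit(X_train_s, y_train)
X_test_learned = nca.transform(X_test_s)
ari_learned = cluster_agreement(X_test_learned, y_test, n_clusters=3)

print(f"ARI, Euclidean distance: {ari_euclidean:.3f}")
print(f"ARI, learned metric:     {ari_learned:.3f}")
```

An adjusted Rand index near 1 means the clusters reproduce the labels, and a value near 0 means chance-level agreement. Comparing the two printed scores shows how much the choice of distance metric alone changes the agreement between clusters and categories on this toy data; the numbers say nothing about real fund data.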