Jin Chen, Ruining Cao, Yifei Song, Anan Hu, Ying Ding
A dataset of venture capitalist types in China (1978-2021): A machine-human hybrid approach
Scientific Data, volume 11, issue 1, article 1255. Published 2024-11-20. DOI: 10.1038/s41597-024-04108-z (https://doi.org/10.1038/s41597-024-04108-z)
Citation count: 0
Abstract
Despite growing interest in distinguishing among types of venture capitalists (VCs) and their roles in shaping entrepreneurship and innovation, such research remains sparse for China, the world's second-largest VC market. To address this gap, we devised a machine-human hybrid approach to classifying VC types. Specifically, we compiled a list of 49,187 VCs that made investments in China before 2021 from the CVSource database, collected VC ownership information from other public sources, developed machine-learning algorithms to predict VC types, and used human coders when the algorithms failed to produce a prediction. Using this hybrid approach, we classified each VC as one of the following types: GVC (public agency-affiliated or state-owned enterprise-affiliated), CVC (corporate VC), IVC (independent VC), BVC (bank-affiliated VC), FVC (financial/non-bank-affiliated VC), UVC (university-affiliated VC), and PenVC (pension-fund-affiliated VC). We not only provide the most up-to-date database of VC types in the Chinese setting but also demonstrate how to leverage machine-learning algorithms to devise a transparent coding approach for VC-type classification.
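The routing logic described in the abstract (machine prediction first, human coders only where no prediction is produced) can be illustrated with a minimal sketch. This is not the authors' implementation: the `VCRecord` fields, the `toy_predict` keyword rules, and the sample records are all hypothetical stand-ins, and a real pipeline would replace `toy_predict` with a trained classifier. Only the seven type labels come from the abstract.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# The seven type labels listed in the abstract.
VC_TYPES = ("GVC", "CVC", "IVC", "BVC", "FVC", "UVC", "PenVC")


@dataclass
class VCRecord:
    name: str
    owner_description: str           # hypothetical free-text ownership field
    vc_type: Optional[str] = None
    coded_by: Optional[str] = None   # "machine" or "human"


# Hypothetical keyword rules standing in for a trained classifier.
KEYWORD_RULES = {
    "state-owned": "GVC",
    "university": "UVC",
    "bank": "BVC",
    "pension": "PenVC",
}


def toy_predict(record: VCRecord) -> Optional[str]:
    """Return a type label, or None when no rule fires (no prediction)."""
    text = record.owner_description.lower()
    for keyword, label in KEYWORD_RULES.items():
        if keyword in text:
            return label
    return None


def classify_hybrid(
    records: List[VCRecord],
    predict: Callable[[VCRecord], Optional[str]],
) -> List[VCRecord]:
    """Label what the machine can; return the rest for human coders."""
    needs_human = []
    for record in records:
        label = predict(record)
        if label in VC_TYPES:
            record.vc_type, record.coded_by = label, "machine"
        else:
            needs_human.append(record)
    return needs_human


# Made-up example records, for illustration only.
records = [
    VCRecord("Fund A", "Wholly owned by a state-owned enterprise"),
    VCRecord("Fund B", "Subsidiary of a commercial bank"),
    VCRecord("Fund C", "Partnership of private individuals"),
]
pending = classify_hybrid(records, toy_predict)
print([(r.name, r.vc_type, r.coded_by) for r in records])
print("For human coding:", [r.name for r in pending])
```

Recording `coded_by` alongside each label keeps the provenance of every classification visible, which is one simple way to support the kind of transparent coding the abstract emphasizes.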
About the journal:
Scientific Data is an open-access journal focused on data, publishing descriptions of research datasets and articles on data sharing across natural sciences, medicine, engineering, and social sciences. Its goal is to enhance the sharing and reuse of scientific data, encourage broader data sharing, and acknowledge those who share their data.
The journal primarily publishes Data Descriptors, which offer detailed descriptions of research datasets, including data collection methods and technical analyses validating data quality. These descriptors aim to facilitate data reuse rather than to test hypotheses or present new interpretations, methods, or in-depth analyses.