Ashwin Sharma, Disha Devalia, Wilfred Almeida, Harshali P. Patil, A. Mishra
Statistical Data Analysis using GPT3: An Overview
2022 IEEE Bombay Section Signature Conference (IBSSC), published 2022-12-08
DOI: 10.1109/IBSSC56953.2022.10037383 (https://doi.org/10.1109/IBSSC56953.2022.10037383)
Although automated statistics has started to gain momentum in data analysis, it is not unified and is very slow on large datasets. Because of computing limitations or a lack of domain-specific knowledge, general statistics have been used most commonly. Research advisors are now drawn to machine learning-based approaches to the statistical analysis of datasets, which may help bridge the gap between traditional tools such as correlation matrices and p-values and newer models such as GPT3. This paper proposes a novel approach to analyzing large datasets that uses GPT3 to predict insights from statistics calculated on the data. The research addresses the limitations of existing methods and proposes a framework for analyzing large statistical datasets that solves many computationally challenging problems efficiently. The proposed method works on top of GPT3's features: the model learns to predict individual words from the parts of the dataset that are passed to it as prompts (cumulative sums, means, etc.), which makes it possible to analyze extremely large datasets such as telecom churn or census data. Traditional methods, statistical analysis, and machine learning approaches are compared with GPT3. The pros and cons of using GPT3 for this research are also discussed in terms of performance, accuracy, and reliability.
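The abstract does not give the exact prompt format, so the following is only a minimal sketch of the general idea of turning calculated statistics into a text prompt. The function names (`summarize_column`, `pearson`, `build_prompt`), the prompt wording, and the sample column names and values are all illustrative assumptions, not the authors' implementation. The sketch stops at building the prompt string; sending it to GPT3 is left out because the paper's API usage is not described here.

```python
import math
import statistics


def summarize_column(name, values):
    """Return a one-line text summary of a numeric column (hypothetical format)."""
    mean = statistics.fmean(values)
    stdev = statistics.stdev(values) if len(values) > 1 else 0.0
    return (
        f"{name}: n={len(values)}, mean={mean:.2f}, stdev={stdev:.2f}, "
        f"min={min(values):.2f}, max={max(values):.2f}"
    )


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denom = math.sqrt(var_x * var_y)
    # A constant column has zero variance; report 0.0 instead of dividing by zero.
    return cov / denom if denom else 0.0


def build_prompt(columns):
    """Build a text prompt from per-column summaries and pairwise correlations.

    `columns` maps a column name to a list of numbers. Only the aggregate
    statistics go into the prompt, not the raw rows, so the prompt size
    depends on the number of columns rather than the number of records.
    """
    names = list(columns)
    lines = ["Summary statistics of the dataset:"]
    lines += [summarize_column(n, columns[n]) for n in names]
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            lines.append(f"corr({a}, {b}): r={r:.2f}")
    lines.append("Insights:")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample values for illustration only; not data from the paper.
    sample = {
        "monthly_charges": [20.0, 40.0, 60.0, 80.0],
        "tenure_months": [40.0, 30.0, 20.0, 10.0],
    }
    print(build_prompt(sample))
```

Because the prompt carries only aggregates, its length stays roughly constant as the number of rows grows, which is one plausible reading of how the approach could scale to datasets such as telecom churn or census data.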