Design of data scoring model for big data
R. Dash
International Journal of Intelligent Enterprise. Published 2020-01-24.
DOI: https://doi.org/10.1504/ijie.2020.10026356
JCR: Q4 (Business, Management and Accounting)
The volume and variety of data held in big data stores give users a more accurate predictive platform. However, decision-making becomes tedious because accessing the data demands considerable computational time and memory. One solution is data scoring, which selects only those variables or features that most strongly affect the decision-making process. To meet the need for an efficient data scoring model, this paper proposes a new data scoring model for big data. The proposed model uses adaptive LASSO as its statistical method. The steps involved in designing the model are outlined and explained. The model is trained and tested using k-fold cross-validation, and its performance is measured using the ROC curve. The model is simulated in R and applied to three distinct datasets. For comparison, LASSO is also applied to the same datasets. The simulation results show that adaptive LASSO performs better than LASSO on large datasets.
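The abstract names adaptive LASSO but does not give the model's details, and the paper's own implementation is in R. As a rough illustration of the general idea only, the Python sketch below fits a weighted LASSO by coordinate descent, with per-feature penalty weights taken from a pilot fit so that weakly supported features are penalised more heavily. The squared-error loss, the unpenalised least-squares pilot, the toy data, and the values of `lam`, `gamma` and `eps` are all assumptions made for this example, not details from the paper.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used in LASSO coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(X, y, lam, weights, n_sweeps=200):
    """Minimise (1/2n)*||y - X b||^2 + lam * sum_j weights[j] * |b_j|
    by cyclic coordinate descent. X is a list of rows; no intercept is fitted."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, starting from beta = 0
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def adaptive_lasso(X, y, lam, gamma=1.0, eps=1e-6):
    """Two-stage fit: pilot estimate, then LASSO with weights 1/|pilot|^gamma."""
    p = len(X[0])
    # Assumption: the pilot is an unpenalised least-squares fit (lam = 0).
    pilot = weighted_lasso(X, y, 0.0, [1.0] * p)
    weights = [1.0 / (abs(b) + eps) ** gamma for b in pilot]
    return weighted_lasso(X, y, lam, weights)


# Hypothetical toy data: only features 0 and 3 carry signal.
random.seed(0)
true_beta = [3.0, 0.0, 0.0, -2.0, 0.0]
X = [[random.gauss(0, 1) for _ in true_beta] for _ in range(200)]
y = [sum(b * x for b, x in zip(true_beta, row)) + random.gauss(0, 0.5) for row in X]

coef = adaptive_lasso(X, y, lam=0.1)
selected = [j for j, b in enumerate(coef) if b != 0.0]
print("selected features:", selected)
print("coefficients:", [round(b, 2) for b in coef])
```

In a setting like the one the abstract describes, `lam` would normally be chosen by k-fold cross-validation and the resulting model compared with plain LASSO (all weights equal to 1) on a held-out ROC curve. Neither step is shown here.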
About the journal:
Major catalysts such as deregulation, global competition, technological breakthroughs, changing customer expectations, structural changes, excess capacity, environmental concerns and less protectionism, among others, are reshaping the landscape of corporations worldwide. Assumptions about predictability, stability, and clear boundaries are becoming less valid as two factors, though by no means the only ones, clearly affect the nature of the competitive space and change the sources of competitive advantage of firms and industries in new and unpredictable ways: agents with knowledge, and interactions.