Predicting Stock Movement Using Sentiment Analysis of Twitter Feed with Neural Networks
Pub Date: 2020-09-29  DOI: 10.4236/jdaip.2020.84018
Sai Vikram Kolasani, Rida Assaf
External factors, such as social media and financial news, can have wide-spread effects on stock price movement. For this reason, social media is considered a useful resource for precise market predictions. In this paper, we show the effectiveness of using Twitter posts to predict stock prices. We start by training various models on the Sentiment140 Twitter dataset. We found that Support Vector Machines (SVM) performed best (0.83 accuracy) in the sentiment analysis, so we used it to predict the average sentiment of tweets for each day that the market was open. Next, we use the sentiment analysis of one year of tweets containing the keywords “stock market”, “stocktwits”, and “AAPL”, with the goal of predicting the corresponding stock prices of Apple Inc. (AAPL) and the US Dow Jones Industrial Average (DJIA) index. Two models, Boosted Regression Trees and Multilayer Perceptron Neural Networks, were used to predict the closing price difference of AAPL and the DJIA. We show that neural networks perform substantially better than traditional models for stock price prediction.
{"title":"Predicting Stock Movement Using Sentiment Analysis of Twitter Feed with Neural Networks","authors":"Sai Vikram Kolasani, Rida Assaf","doi":"10.4236/jdaip.2020.84018","DOIUrl":"https://doi.org/10.4236/jdaip.2020.84018","url":null,"abstract":"External factors, such as social media and financial news, can have wide-spread effects on stock price movement. For this reason, social media is considered a useful resource for precise market predictions. In this paper, we show the effectiveness of using Twitter posts to predict stock prices. We start by training various models on the Sentiment 140 Twitter data. We found that Support Vector Machines (SVM) performed best (0.83 accuracy) in the sentimental analysis, so we used it to predict the average sentiment of tweets for each day that the market was open. Next, we use the sentimental analysis of one year’s data of tweets that contain the “stock market”, “stocktwits”, “AAPL” keywords, with the goal of predicting the corresponding stock prices of Apple Inc. (AAPL) and the US’s Dow Jones Industrial Average (DJIA) index prices. Two models, Boosted Regression Trees and Multilayer Perceptron Neural Networks were used to predict the closing price difference of AAPL and DJIA prices. We show that neural networks perform substantially better than traditional models for stocks’ price prediction.","PeriodicalId":71434,"journal":{"name":"数据分析和信息处理(英文)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45125518","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Identifying Extreme Rainfall Events Using Functional Outliers Detection Methods
Pub Date: 2020-09-29  DOI: 10.4236/jdaip.2020.84016
M. A. Hael, Y. Yuan
Outlier detection techniques play a vital role in exploring unusual data from extreme events, which have a considerable effect on the modeling and forecasting of functional data. Functional methods offer an effective way of identifying outliers graphically that might not be visible in a plot of the original data under classical analysis. This study’s main objective is to detect extreme rainfall events using functional outlier detection methods based on depth and density functions. In order to identify unusual events in rainfall variation over long time intervals, the work is based on the average monthly rainfall of the Taiz region from 1998 to 2019. Data were extracted from the Tropical Rainfall Measuring Mission, and the analysis was carried out in R. The approaches applied in this study include rainbow plots, the functional highest density region box-plot, and the functional bag-plot. According to the current results, the functional density box-plot method proved more effective at detecting outliers than the functional depth bag-plot method. In conclusion, the results of the current study show that rainfall over the Taiz region during the last two decades was influenced by the extreme events of the years 1999, 2004, 2005, and 2009.
{"title":"Identifying Extreme Rainfall Events Using Functional Outliers Detection Methods","authors":"M. A. Hael, Y. Yuan","doi":"10.4236/jdaip.2020.84016","DOIUrl":"https://doi.org/10.4236/jdaip.2020.84016","url":null,"abstract":"Outlier detection techniques play a vital role in exploring unusual data of extreme events that have a critical effect considerably in the modeling and forecasting of functional data. The functional methods have an effective way of identifying outliers graphically, which might not be visible through the original data plot in classical analysis. This study’s main objective is to detect the extreme rainfall events using functional outliers detection methods depending on the depth and density functions. In order to identify the unusual events of rainfall variation over long time intervals, this work conducts based on the average monthly rainfall of the Taiz region from 1998 to 2019. Data were extracted from the Tropical Rainfall Measuring Mission and the analysis has been processed by R software. The approaches applied in this study involve rainbow plots, functional highest density region box-plot as well as functional bag-plot. According to the current results, the functional density box-plot method has proven effective in detecting outlier compared to the functional depth bag-plot method. In conclusion, the results of the current study showed that the rainfall over the Taiz region during the last two decades was influenced by the extreme events of years 1999, 2004, 2005, and 2009.","PeriodicalId":71434,"journal":{"name":"数据分析和信息处理(英文)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47738485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Injury Analysis Based on Machine Learning in NBA Data
Pub Date: 2020-09-29  DOI: 10.4236/jdaip.2020.84017
Wan-Ru Wu
It is widely recognized that injuries have a vital influence on NBA matches and can even reverse the expected result between two teams with a wide disparity in strength. In this article, in order to reduce the uncertainty of injury risk in an upcoming match, we propose a pipeline that gathers data at the player level, including fundamental statistics and performance in previous matches, and at the team level, including basic information and the opposing team's status in the match being predicted. Constrained by the limited and extremely unbalanced data, our results show limited power for injury prediction in general, but reasonable results for predicting injuries to a team's star player. We also analyze the contribution of each factor to our prediction, which demonstrates that a player's own performance matters most to their injury risk. Principal Component Analysis is also applied to reduce the dimension of the data and to show the correlation between different features.
{"title":"Injury Analysis Based on Machine Learning in NBA Data","authors":"Wan-Ru Wu","doi":"10.4236/jdaip.2020.84017","DOIUrl":"https://doi.org/10.4236/jdaip.2020.84017","url":null,"abstract":"It is a commonplace that the injury plays a vital influence in an NBA match and it may reverse the result of two teams with wide strength disparity. In this article, in order to decrease the uncertainty of the risk in the coming match, we propose a pipeline from gathering data at the player’s level including the fundamental statistics and the performance in the match before and data at the team’s level including the basic information and the opponent team’s status in the match we predict on. Confined to the limited and extremely unbalanced data, our result showed a limited power on injury prediction but it made a not bad result on the injury of the star player in a team. We also analyze the contribution of the factors to our prediction. It demonstrated that player’s own performance matters most in their injury. The Principal Component Analysis is also applied to help reduce the dimension of our data and to show the correlation of different features.","PeriodicalId":71434,"journal":{"name":"数据分析和信息处理(英文)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43225764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Causes of Restocking Delays in Absence of Real Time Inventory Tracking of Airtel Airtime
Pub Date: 2020-09-29  DOI: 10.4236/jdaip.2020.84019
Eddie Musana, A. H. Basaza-Ejiri
The purpose of this research was to ascertain the causes of Restocking Delays in a Distributor Company of Airtel Airtime (AA) and thereby justify the benefits of using Real Time Inventory Tracking (R.T.I.T) to mitigate such delays. The study was carried out in 2017 at Private Marketing and Trading Services (PMTS), an Authorized Distributor of Airtel Products handling Airtime scratch cards and Electronic (E-Recharge) Airtime among other forms, with the aim of encouraging R.T.I.T for these and other products in Telecom Companies and other Business Enterprises. The research combined Qualitative and Quantitative approaches to identify the different categories of Restocking Delays, organized as Themes and Sub-Themes, encountered in the Distribution Supply Chain (SC) of AA, and it provides an in-depth explanation of the Managerial and Operational causes of restocking delays with respect to AA. Fast-moving consumer products and services other than AA similarly require a solution to Restocking Delays through implementation of a Real Time Inventory Tracking Model (R.T.I.T.M) among Distributor Companies (DCs). The paper also presents the Literature, Methodology and Findings of the study. Regression analysis using the Statistical Package for Social Sciences (SPSS) showed that Stock Turnover Period and Airtime Denomination were significant contributors to Restocking Delays, whereas Messages from the Airtel Head Office to the Distributor made a non-significant contribution, as shown in Figure 9. The research recommends a Model for R.T.I.T in the Telecom Distribution SC of AA, with Omnichannel Inventory Management (OIM) as a significant contributor to timely, reliable inventory restocking that promotes higher sales among DCs and retailers by minimizing Restocking Delays. The findings show that the forces of Demand and Supply change over time with customers' tastes and preferences, and that AA stock levels become unbalanced at given times due to unforeseen consumer demand experienced by DCs, a pattern explained by the “Bullwhip Effect” arising from information distortion in the Supply Chain.
{"title":"Causes of Restocking Delays in Absence of Real Time Inventory Tracking of Airtel Airtime","authors":"Eddie Musana, A. H. Basaza-Ejiri","doi":"10.4236/jdaip.2020.84019","DOIUrl":"https://doi.org/10.4236/jdaip.2020.84019","url":null,"abstract":"The purpose of this research was to ascertain causes of Restocking Delays in a Distributor Company of Airtel Airtime (AA) that give justification for benefits of using Real Time Inventory Tracking (R.T.I.T) in an attempt to mitigate Restocking Delays. From a study out at the Private Marketing and Trading Services (PMTS) an Authorized Distributor of Airtel Products undertaken in 2017 evidenced by Airtime scratch card and Electronic, E-Recharge Airtime among other forms to encourage R.T.I.T among other products in Telecom Companies and other Business Enterprises. The research comprises of the following areas among which included a detailed focus on a Qualitative and Quantitative approach in obtaining different categories of Restocking Delays in form of Themes and Sub Themes encountered in the Distribution Supply Chain (SC) of AA that is contained in this paper. This research continues to capture an in-depth explanation of the Managerial and Operational causes of restocking delays in respect to AA. Similarly, fast consumer products and services other than AA require a solution to Restocking Delays through implementation of Real Time Inventory Tracking Model (R.T.I.T.M) of AA among Distributor Companies (DCs). This paper also elaborated on Literature, Methodology and Findings obtained from the study. The results were obtained from regression analysis by using the Statistical Package for Social Sciences (SPSS) that showed a higher significance of Stock Turnover Period and Airtime Denomination as a contributor to Restocking Delays whereas Messages from Airtel Head office to the Distributor had a non-significant contribution to restocking Delays as in Figure 9. The research recommends a Model for R.T.I.T in Telecom Distribution SC of AA and Omnichannel Inventory Management (OIM) as a significant contributor to timely reliable inventory restocking and promotes higher sales among DCs and retailers through minimized Restocking Delays. It shows that the forces of Demand and Supply change over time with different tastes and preferences of customers. The imbalance in AA stock levels changes at given times due to unforeseen forces of consumer demand experienced by DCs, explained by the “Bullwhip Effect” due to information distortion in the Supply Chain (SC).","PeriodicalId":71434,"journal":{"name":"数据分析和信息处理(英文)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42057452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Clustering Approach for Analyzing the Student’s Efficiency and Performance Based on Data
Pub Date: 2020-07-02  DOI: 10.4236/jdaip.2020.83010
Tallal Omar, Abdullah M. Alzahrani, M. Zohdy
The academic community is currently confronting some challenges in analyzing and evaluating the progress of a student’s academic performance. In the real world, classifying the performance of students is a scientifically challenging task. Recently, some studies have applied cluster analysis to evaluate students’ results and used statistical techniques to partition their scores according to performance; this approach, however, is not efficient. In this study, we combine two techniques, namely the k-means clustering algorithm and the elbow method, to evaluate student performance. Based on this combination, the analysis and evaluation of students’ progress becomes more accurate. In this study, the methodology is applied to student test scores to derive the resulting clustering model.
{"title":"Clustering Approach for Analyzing the Student’s Efficiency and Performance Based on Data","authors":"Tallal Omar, Abdullah M. Alzahrani, M. Zohdy","doi":"10.4236/jdaip.2020.83010","DOIUrl":"https://doi.org/10.4236/jdaip.2020.83010","url":null,"abstract":"The academic community is currently confronting some challenges in terms of analyzing and evaluating the progress of a student’s academic performance. In the real world, classifying the performance of the students is a scientifically challenging task. Recently, some studies apply cluster analysis for evaluating the students’ results and utilize statistical techniques to part their score in regard to student’s performance. This approach, however, is not efficient. In this study, we combine two techniques, namely, k-mean and elbow clustering algorithm to evaluate the student’s performance. Based on this combination, the results of performance will be more accurate in analyzing and evaluating the progress of the student’s performance. In this study, the methodology has been implemented to define the diverse fascinating model taking the student test scores.","PeriodicalId":71434,"journal":{"name":"数据分析和信息处理(英文)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45267002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hierarchical Representations Feature Deep Learning for Face Recognition
Pub Date: 2020-07-02  DOI: 10.4236/jdaip.2020.83012
Haijun Zhang, Yinghui Chen
Most modern face recognition and classification systems rely mainly on hand-crafted image feature descriptors. In this paper, we propose a novel deep learning algorithm combining unsupervised and supervised learning, named deep belief network embedded with Softmax regression (DBNESR), as a natural source of additional, complementary hierarchical representations, which relieves us from the complicated hand-crafted feature-design step. DBNESR first learns hierarchical feature representations by greedy layer-wise unsupervised learning in a feed-forward (bottom-up) and back-forward (top-down) manner, and then performs more efficient recognition with Softmax regression through supervised learning. For comparison with algorithms based only on supervised learning, we also design several classifiers: BP, HBPNNs, RBF, HRBFNNs, SVM and a multiple classification decision fusion classifier (MCDFC)—a hybrid HBPNNs-HRBFNNs-SVM classifier. The conducted experiments validate the following: first, the proposed DBNESR is optimal for face recognition, with the highest and most stable recognition rates; second, the algorithm combining unsupervised and supervised learning outperforms all purely supervised learning algorithms; third, hybrid neural networks outperform single-model neural networks; fourth, the average recognition rates and variances of these algorithms, in order from largest to smallest, are respectively DBNESR, MCDFC, SVM, HRBFNNs, RBF, HBPNNs, BP and BP, RBF, HBPNNs, HRBFNNs, SVM, MCDFC, DBNESR; finally, the results reflect the hierarchical feature representations learned by DBNESR in terms of its capability to model hard artificial intelligence tasks.
{"title":"Hierarchical Representations Feature Deep Learning for Face Recognition","authors":"Haijun Zhang, Yinghui Chen","doi":"10.4236/jdaip.2020.83012","DOIUrl":"https://doi.org/10.4236/jdaip.2020.83012","url":null,"abstract":"Most modern face recognition and classification systems mainly rely on hand-crafted image feature descriptors. In this paper, we propose a novel deep learning algorithm combining unsupervised and supervised learning named deep belief network embedded with Softmax regress (DBNESR) as a natural source for obtaining additional, complementary hierarchical representations, which helps to relieve us from the complicated hand-crafted feature-design step. DBNESR first learns hierarchical representations of feature by greedy layer-wise unsupervised learning in a feed-forward (bottom-up) and back-forward (top-down) manner and then makes more efficient recognition with Softmax regress by supervised learning. As a comparison with the algorithms only based on supervised learning, we again propose and design many kinds of classifiers: BP, HBPNNs, RBF, HRBFNNs, SVM and multiple classification decision fusion classifier (MCDFC)—hybrid HBPNNs-HRBFNNs-SVM classifier. The conducted experiments validate: Firstly, the proposed DBNESR is optimal for face recognition with the highest and most stable recognition rates; second, the algorithm combining unsupervised and supervised learning has better effect than all supervised learning algorithms; third, hybrid neural networks have better effect than single model neural network; fourth, the average recognition rate and variance of these algorithms in order of the largest to the smallest are respectively shown as DBNESR, MCDFC, SVM, HRBFNNs, RBF, HBPNNs, BP and BP, RBF, HBPNNs, HRBFNNs, SVM, MCDFC, DBNESR; at last, it reflects hierarchical representations of feature by DBNESR in terms of its capability of modeling hard artificial intelligent tasks.","PeriodicalId":71434,"journal":{"name":"数据分析和信息处理(英文)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45883017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Meta-Learning of Evolutionary Strategy for Stock Trading
Pub Date: 2020-04-09  DOI: 10.4236/jdaip.2020.82005
Erik Sorensen, Ryan Ozzello, Rachael Rogan, Ethan Baker, N. Parks, Wei Hu
Meta-learning algorithms learn about the learning process itself, so they can speed up subsequent, similar learning tasks with less data and fewer iterations. If achieved, these benefits expand the flexibility of traditional machine learning to areas where only small windows of time or data are available. One such area is stock trading, where the relevance of data decreases as time passes, requiring fast results on fewer data points to respond to fast-changing market trends. We are, to the best of our knowledge, the first to apply meta-learning algorithms to an evolutionary strategy for stock trading in order to decrease learning time by using fewer iterations and to achieve higher trading profits with fewer data points. We found that our meta-learning approach to stock trading earns profits similar to a purely evolutionary algorithm. However, it requires only 50 iterations at test time, versus the thousands typically required without meta-learning, or only 50% of the training data at test time.
{"title":"Meta-Learning of Evolutionary Strategy for Stock Trading","authors":"Erik Sorensen, Ryan Ozzello, Rachael Rogan, Ethan Baker, N. Parks, Wei Hu","doi":"10.4236/jdaip.2020.82005","DOIUrl":"https://doi.org/10.4236/jdaip.2020.82005","url":null,"abstract":"Meta-learning algorithms learn about the learning process itself so it can speed up subsequent similar learning tasks with fewer data and iterations. If achieved, these benefits expand the flexibility of traditional machine learning to areas where there are small windows of time or data available. One such area is stock trading, where the relevance of data decreases as time passes, requiring fast results on fewer data points to respond to fast-changing market trends. We, to the best of our knowledge, are the first to apply meta-learning algorithms to an evolutionary strategy for stock trading to decrease learning time by using fewer iterations and to achieve higher trading profits with fewer data points. We found that our meta-learning approach to stock trading earns profits similar to a purely evolutionary algorithm. However, it only requires 50 iterations during test, versus thousands that are typically required without meta-learning, or 50% of the training data during test.","PeriodicalId":71434,"journal":{"name":"数据分析和信息处理(英文)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46823974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Comparison of Different Machine Learning Algorithms for the Prediction of Coronary Artery Disease
Pub Date: 2020-04-09  DOI: 10.4236/jdaip.2020.82003
Imran Chowdhury Dipto, Tanzila Islam, H. Rahman, A. F. M. Moshiur Rahman
Coronary Artery Disease (CAD) is the leading cause of mortality worldwide. It is a complex heart disease associated with numerous risk factors and a variety of symptoms. During the past decade, the study of Coronary Artery Disease (CAD) has undergone a remarkable evolution. The purpose of this research is to build a prototype system using different Machine Learning Algorithms (models) and compare their performance to identify a suitable model. This paper explores three of the most commonly used Machine Learning Algorithms, namely Logistic Regression, Support Vector Machine and Artificial Neural Network. To conduct this research, a clinical dataset has been used. To evaluate the performance, different evaluation methods have been used, such as the Confusion Matrix, Stratified K-fold Cross Validation, Accuracy, AUC and ROC. The accuracy and AUC scores have been validated using the K-Fold Cross-validation technique. The dataset contains class imbalance, so the SMOTE Algorithm has been used to balance the dataset, and the performance analysis has been carried out on both sets of data. The results show that the accuracy scores of all the models increased when trained on the balanced dataset. Overall, the Artificial Neural Network has the highest accuracy, whereas Logistic Regression is the least accurate among the trained Algorithms.
{"title":"Comparison of Different Machine Learning Algorithms for the Prediction of Coronary Artery Disease","authors":"Imran Chowdhury Dipto, Tanzila Islam, H. Rahman, A. F. M. Moshiur Rahman","doi":"10.4236/jdaip.2020.82003","DOIUrl":"https://doi.org/10.4236/jdaip.2020.82003","url":null,"abstract":"Coronary Artery Disease (CAD) is the leading cause of mortality worldwide. It is a complex heart disease that is associated with numerous risk factors and a variety of Symptoms. During the past decade, Coronary Artery Disease (CAD) has undergone a remarkable evolution. The purpose of this research is to build a prototype system using different Machine Learning Algorithms (models) and compare their performance to identify a suitable model. This paper explores three most commonly used Machine Learning Algorithms named as Logistic Regression, Support Vector Machine and Artificial Neural Network. To conduct this research, a clinical dataset has been used. To evaluate the performance, different evaluation methods have been used such as Confusion Matrix, Stratified K-fold Cross Validation, Accuracy, AUC and ROC. To validate the results, the accuracy and AUC scores have been validated using the K-Fold Cross-validation technique. The dataset contains class imbalance, so the SMOTE Algorithm has been used to balance the dataset and the performance analysis has been carried out on both sets of data. The results show that accuracy scores of all the models have been increased while training the balanced dataset. Overall, Artificial Neural Network has the highest accuracy whereas Logistic Regression has the least accurate among the trained Algorithms.","PeriodicalId":71434,"journal":{"name":"数据分析和信息处理(英文)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44647043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Characteristics Classification of Mobile Apps on Apple Store Using Clustering
Pub Date: 2020-04-09  DOI: 10.4236/jdaip.2020.82004
Boxin Fu
This research examines the user ratings of Apps on the Apple App Store. Its purpose is to better understand the characteristics of well-rated Apps so that App makers can focus on these traits to maximize their profit. The data for this research were collected from kaggle.com and, according to the dataset's abstract, originally gathered through the iTunes Search API. Four attributes contribute directly to an App’s user rating: rating_count_tot, rating_count_ver, user_rating and user_rating_ver. The relationship between Apps receiving higher ratings and Apps receiving lower ratings is analyzed using Exploratory Data Analysis and the Data Science technique of “clustering” on their numerical attributes. Apps, each represented as a data point, with similar rating characteristics are assigned to the same cluster, while the characteristics shared by all Apps in a cluster are taken as the defining traits of that cluster. Both techniques are carried out in Google Colab with libraries including pandas, numpy, seaborn, and matplotlib. The data reveal a direct correlation between the number of devices and languages supported and user rating, and an inverse correlation between an App's size and price and its user rating. In conclusion, according to the data, small free Apps that many different types of users are able to use are generally well rated by most users.
{"title":"Characteristics Classification of Mobile Apps on Apple Store Using Clustering","authors":"Boxin Fu","doi":"10.4236/jdaip.2020.82004","DOIUrl":"https://doi.org/10.4236/jdaip.2020.82004","url":null,"abstract":"This research is interested in the user ratings of Apps on Apple Stores. The purpose of this research is to have a better understanding of some characteristics of the good Apps on Apple Store so Apps makers can potentially focus on these traits to maximize their profit. The data for this research is collected from kaggle.com, and originally collected from iTunes Search API, according to the abstract of the data. Four different attributes contribute directly toward an App’s user rating: rating_count_tot, rating_count_ver, user_rating and user_rating_ver. The relationship between Apps receiving higher ratings and Apps receiving lower ratings is analyzed using Exploratory Data Analysis and Data Science technique “clustering” on their numerical attributes. Apps, which are represented as a data point, with similar characteristics in rating are classified as belonging to the same cluster, while common characteristics of all Apps in the same clusters are the determining traits of Apps for that cluster. Both techniques are achieved using Google Colab and libraries including pandas, numpy, seaborn, and matplotlib. The data reveals direct correlation from number of devices supported and languages supported to user rating and inverse correlation from size and price of the App to user rating. In conclusion, free small Apps that many different types of users are able to use are generally well rated by most users, according to the data.","PeriodicalId":71434,"journal":{"name":"数据分析和信息处理(英文)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47969933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Research on Spatial Pattern and Its Industrial Distribution of Commercial Space in Mianyang Based on POI Data
Pub Date: 2020-01-14  DOI: 10.4236/jdaip.2020.81002
Dacheng Zheng, Changqiu Li
The rational layout of urban commercial space is conducive to optimizing the allocation of commercial resources within the urban interior. Based on commercial POI (Point of Interest) data for the central district of Mianyang, the characteristics of the urban commercial spatial pattern at different scales are analyzed using Kernel Density Estimation, the Getis-Ord statistic, Ripley’s K Function and the Location Entropy method, and the spatial agglomeration characteristics of the various commercial industries are studied. The results show that: 1) The spatial distribution of commercial outlets in downtown Mianyang is distinctive and follows a multi-center pattern. 2) A commercial grade-scale structure has formed in the central urban area as a whole, and the distribution of commercial hot spots based on road-grid units is generally consistent with the identified commercial density centers. 3) From the perspective of individual industries, the “center-periphery” differentiation of urban commercial space is obvious, and different industries show different spatial agglomeration modes. 4) The multi-scale spatial agglomeration of each industry differs: the spatial scale of location choice is larger for comprehensive retail, household appliances and similar industries, and smaller for textiles, clothing, culture and sports. 5) There are significant differences in specialized functional areas across industries: mature areas show multi-functional elements and the agglomeration of multiple advantageous industries, and a small number of developing areas also show agglomeration of multiple advantageous industries.
{"title":"Research on Spatial Pattern and Its Industrial Distribution of Commercial Space in Mianyang Based on POI Data","authors":"Dacheng Zheng, Changqiu Li","doi":"10.4236/jdaip.2020.81002","DOIUrl":"https://doi.org/10.4236/jdaip.2020.81002","url":null,"abstract":"The rational layout of urban commercial space is conducive to optimizing the allocation of commercial resources in the urban interior space. Based on the commercial POI (Point of Interest) data in the central district of Mianyang, the characteristics of urban commercial spatial pattern under different scales are analyzed by using Kernel Density Estimation, Getis-Ord , Ripley’s K Function and Location Entropy method, and the spatial agglomeration characteristics of various industries in urban commerce are studied. The results show that: 1) The spatial distribution characteristics of commercial outlets in downtown Mianyang are remarkable, and show a multi-center distribution pattern. The hot area distribution of commercial outlets based on road grid unit is generally consistent with the identified commercial density center distribution. 2) The commercial grade scale structure has been formed in the central urban area as a whole, and the distribution of commercial network hot spots based on road grid unit is generally consistent with the identified commercial density center distribution. 3) From the perspective of commercial industry, the differentiation of urban commercial space “center-periphery” is obvious, and different industries show different spatial agglomeration modes. 4) The multi-scale spatial agglomeration of each industry is different, the spatial scale of location choice of comprehensive retail, household appliances and other industries is larger, and the scale of location choice of textile, clothing, culture and sports is small. 5) There are significant differences in specialized functional areas from the perspective of industry. Mature areas show multi-functional elements, multi-advantage industry agglomeration characteristics, and a small number of developing areas also show multi-advantage industry agglomeration characteristics.","PeriodicalId":71434,"journal":{"name":"数据分析和信息处理(英文)","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41765616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}