{"title":"Analysis of Real Time Twitter Sentiments using Deep Learning Models","authors":"Raed Alsini","doi":"10.47738/jads.v4i4.146","DOIUrl":"https://doi.org/10.47738/jads.v4i4.146","url":null,"abstract":"","PeriodicalId":479720,"journal":{"name":"Journal of Applied Data Sciences","volume":"379 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138989671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gold Prices Time-Series Forecasting: Comparison of Statistical Techniques","authors":"Indra Maryati","doi":"10.47738/jads.v4i4.135","DOIUrl":"https://doi.org/10.47738/jads.v4i4.135","url":null,"abstract":"","PeriodicalId":479720,"journal":{"name":"Journal of Applied Data Sciences","volume":"425 6","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138991270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiple Choice Question Difficulty Level Classification with Multi Class Confusion Matrix in the Online Question Bank of Education Gallery","authors":"Pariang Sonang Siregar","doi":"10.47738/jads.v4i4.132","DOIUrl":"https://doi.org/10.47738/jads.v4i4.132","url":null,"abstract":"","PeriodicalId":479720,"journal":{"name":"Journal of Applied Data Sciences","volume":"321 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139019892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image Classifier based on Histogram Matching and Outlier Detection using Hellinger distance","authors":"Anamika Gupta","doi":"10.47738/jads.v4i4.114","DOIUrl":"https://doi.org/10.47738/jads.v4i4.114","url":null,"abstract":"","PeriodicalId":479720,"journal":{"name":"Journal of Applied Data Sciences","volume":"294 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139021741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predictive and Analytics using Data Mining and Machine Learning for Customer Churn Prediction","authors":"Chandra Lukita","doi":"10.47738/jads.v4i4.131","DOIUrl":"https://doi.org/10.47738/jads.v4i4.131","url":null,"abstract":"","PeriodicalId":479720,"journal":{"name":"Journal of Applied Data Sciences","volume":"98 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139025060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unsupervised Learning Methods for Topic Extraction and Modeling in Large-scale Text Corpora using LSA and LDA","authors":"Henderi Henderi","doi":"10.47738/jads.v4i3.102","DOIUrl":"https://doi.org/10.47738/jads.v4i3.102","url":null,"abstract":"This research compares unsupervised learning methods for topic extraction and modeling in large-scale text corpora. The methods used are Singular Value Decomposition (SVD) and Latent Dirichlet Allocation (LDA). SVD extracts important features by decomposing the term-document matrix, while LDA identifies hidden topics based on the probability distribution of words. The research involves data collection, exploratory data analysis (EDA), topic extraction using SVD, data preprocessing, and topic extraction using LDA. The data used were large-scale text corpora. EDA was conducted to understand the characteristics and structure of the corpora before topic extraction was performed. SVD and LDA were then used to identify the main topics in the corpora. The results showed that both methods were successful in topic extraction and modeling of large-scale text corpora: SVD revealed cohesive patterns and thematically related topics, while LDA identified hidden topics based on the probability distribution of words. These findings have important implications for text processing and analysis. The resulting topic representations can be used for information mining, document categorization, and more in-depth text analysis. However, this research has limitations. The success of the methods depends on the quality and representativeness of the text corpora, and topic interpretation still requires further understanding and analysis. Future research can develop methods and techniques to improve the accuracy and efficiency of topic extraction and text corpora modeling.","PeriodicalId":479720,"journal":{"name":"Journal of Applied Data Sciences","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135437965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mean-Median Smoothing Backpropagation Neural Network to Forecast Unique Visitors Time Series of Electronic Journal","authors":"Aji Prasetya Wibawa","doi":"10.47738/jads.v4i3.97","DOIUrl":"https://doi.org/10.47738/jads.v4i3.97","url":null,"abstract":"Unique visitors (sessions) are the visitors from a single IP address who access a journal portal for the first time within a given period. A high daily average of unique visits to an electronic journal's pages indicates that the periodical is in high demand. Hence, the number of unique visitors is an important indicator of an electronic journal's success and a measure of dissemination that can accelerate journal accreditation. Numerous methods can be used for forecasting, one of which is the backpropagation neural network (BPNN). Data quality is very important in building a good BPNN model, because the success of BPNN modeling depends heavily on the input data. One way to improve data quality is to smooth the data. In this study, three models were used to forecast the time series of unique visitors to electronic journals: BPNN, BPNN with mean smoothing, and BPNN with median smoothing. The smallest error was obtained by the BPNN model with mean smoothing (MSE 0.00129, RMSE 0.03518) using a learning rate of 0.4 and a 1-2-1 architecture; this model can be used to forecast unique visitors of electronic journals.","PeriodicalId":479720,"journal":{"name":"Journal of Applied Data Sciences","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135437963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Comparative Study of Feature Selection Techniques in Machine Learning for Predicting Stock Market Trends","authors":"Adi Suryaputra Paramita","doi":"10.47738/jads.v4i3.99","DOIUrl":"https://doi.org/10.47738/jads.v4i3.99","url":null,"abstract":"This study compares the effectiveness of three feature selection techniques, namely Principal Component Analysis (PCA), Information Gain (IG), and Recursive Feature Elimination (RFE), in predicting stock market conditions. It uses three distinct Kaggle datasets that contain data for predicting stock market values. The results show that RFE performs better than PCA and IG in predicting market value, with fairly high accuracy. Using RFE, this study was able to identify the most influential features in prediction, reduce the dimensionality of the data, and improve the performance of the prediction model. These results offer significant benefits for stock market practice, including improved investment decisions, reduced investment risk, improved trading strategy performance, and identification of promising investment opportunities. The novelty of this research lies in two aspects. First, it applies different feature selection techniques (PCA, IG, and RFE) in the context of stock market prediction; selecting the most relevant features provides a deeper understanding of their influence on stock price movements. Second, it uses different Kaggle datasets representing various stock market value predictions, which provides variation in the data and allows the performance of feature selection techniques to be examined in multiple stock market contexts. In conclusion, this research provides insight into the effectiveness of feature selection techniques in stock market value prediction, as well as actionable guidance for market participants seeking to improve investment decisions and trading performance. Future research can conduct further comparative studies of other feature selection techniques.","PeriodicalId":479720,"journal":{"name":"Journal of Applied Data Sciences","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135437957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LSTM-Based Machine Translation for Madurese-Indonesian","authors":"Danang Arbian Sulistyo","doi":"10.47738/jads.v4i3.113","DOIUrl":"https://doi.org/10.47738/jads.v4i3.113","url":null,"abstract":"Madurese is a regional language of Indonesia, spoken predominantly in East Java and on Madura Island in particular. The use of Madurese as a daily language has declined significantly due to a language shift among children and adolescents, caused in part by a sense of prestige and the difficulty of learning Madurese. The scarcity of research and scientific publications on the Madurese language also contributes to reduced literacy in the language. This research focuses on creating a machine translation system from Madurese to Indonesian to help preserve the Madurese language and enable learning through digital media. This study uses the latest Madurese-Indonesian dataset, a corpus of 30,000 Madurese-Indonesian sentence pairs from the online Bible. Online Bible pages were scraped to build the corpus from the bilingual Indonesian and Madurese Bible. The text was then manually processed to align the scraping results for the two languages, followed by normalization and tokenization to remove non-printable characters and punctuation from the corpus. To perform neural machine translation (NMT), an RNN encoder was connected to the RNN decoder of the language model; a sequential model with LSTM was used for training and testing, and the BLEU measure was used to assess the accuracy of the translation results. The model used the softmax activation function with the Adam optimizer, along with additional settings that included 128 layers in the training process and an added Dropout layer. The average evaluation results from five tests were 0.798068 for BLEU-1, 0.680932 for BLEU-2, 0.623489 for BLEU-3, and 0.523546 for BLEU-4. Given the language differences between Madurese and Indonesian, this can be the best approach for Madurese-Indonesian machine translation.","PeriodicalId":479720,"journal":{"name":"Journal of Applied Data Sciences","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135437964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bank Soundness Level Prediction: ANFIS vs Deep Learning","authors":"Satia Nur Maharani","doi":"10.47738/jads.v4i3.116","DOIUrl":"https://doi.org/10.47738/jads.v4i3.116","url":null,"abstract":"The systemic nature of the risk of bankruptcy of financial institutions has become an important issue in maintaining the existence and stability of domestic and global finance. Statistical methods for bankruptcy prediction have so far proved beneficial. However, this approach has limitations: because the models are built on systematic relationships, their linearity and normality assumptions are often weaknesses. These limitations can be overcome efficiently by artificial intelligence models, which capture both linear and non-linear patterns. One of the most popular of these techniques is the Artificial Neural Network (ANN). Many studies show that ANN and fuzzy set theory are more accurate, adaptive, and robust in prediction than statistical models. One technique for integrating ANN with fuzzy logic systems is the Adaptive-Network-Based Fuzzy Inference System (ANFIS). ANFIS is an adaptive network that is functionally equivalent to fuzzy inference and has the advantages of both ANN and fuzzy logic. One important feature of ANFIS is its adaptation capability, whereby the membership function parameters can adapt and change during the learning procedure. The use of ANN and fuzzy logic for bankruptcy prediction is still very limited in Indonesia. Therefore, this study aims to construct a bankruptcy prediction model for financial institutions that is more accurate, fast to operate, and effective, using ANFIS as a hybrid of fuzzy logic and ANN. The results showed that ANFIS can be used to predict the bankruptcy of financial institutions, with a best MAPE of 0.140335507.","PeriodicalId":479720,"journal":{"name":"Journal of Applied Data Sciences","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135437954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}