Pub Date: 2023-12-30; DOI: 10.54105/ijdm.b1628.112222
Jufin P A, Amrutha N
In the current challenging era, banks compete fiercely with one another. To improve the quality and level of the services they provide, banks focus on customer retention as well as customer churn. Estimating the number of customers who have left the bank, or who are likely to churn, has become one of the duties of corporate intelligence. It also helps in predicting the number of customers retained. The primary objective of this paper, "Bank Customer Churn Prediction", is to build a model that can identify and visualize which factors or attributes contribute to customer churn. In addition, the paper compares several classification algorithms. Machine learning is a modern technology with the potential to solve classification problems. Using supervised machine learning techniques, the best model is chosen to assign a probability to churn, so that customer service can act to prevent it. Several methods are compared to see what accuracy level each achieves. XGBoost is considered in order to check whether it yields a better model in terms of accuracy. The other three machine learning algorithms compared are logistic regression, support vector machine (SVM), and random forest.
Title: Bank Customer Churn Prediction. Journal: Indian Journal of Data Mining.
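To make "assigning a probability to churn" concrete, here is a minimal sketch of logistic regression, one of the four classifiers the abstract names, trained by plain stochastic gradient descent. The two features (`is_inactive`, a scaled product count), the toy customers, and the function names are all assumptions for illustration; the paper's actual dataset, features, and training setup are not given in the abstract.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and bias with per-sample gradient descent on log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * v for w, v in zip(weights, x)))
            err = p - y  # gradient of log loss with respect to the score
            bias -= lr * err
            weights = [w - lr * err * v for w, v in zip(weights, x)]
    return weights, bias

def churn_probability(weights, bias, x):
    """Probability that a customer with features x churns."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))

# Hypothetical toy data: [is_inactive, number_of_products scaled to 0..1]
customers = [[1.0, 0.0], [1.0, 0.33], [0.0, 0.66],
             [0.0, 1.0], [1.0, 0.0], [0.0, 0.33]]
churned = [1, 1, 0, 0, 1, 0]

weights, bias = fit_logistic(customers, churned)
at_risk = churn_probability(weights, bias, [1.0, 0.0])
loyal = churn_probability(weights, bias, [0.0, 1.0])
print(f"at-risk customer: {at_risk:.3f}, loyal customer: {loyal:.3f}")
```

In practice one would more likely use library implementations of all four classifiers and compare them on held-out data; the hand-rolled version above only shows where the probability comes from.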
Pub Date: 2023-12-30; DOI: 10.54105/ijdm.b1630.053123
Rohith Muralidharan, Neenu Kuriakose, Sangeetha J
The Myers-Briggs Type Indicator (MBTI) is one of the most commonly used tools for assessing an individual's personality. This tool allows us to identify a person's psychological preferences in how they make decisions and perceive the world. MBTI has applications across several fields, including career development and personal growth. The test consists of a set of questions specifically designed to evaluate and measure an individual's choices on four dichotomies: Extraversion (E) vs. Introversion (I), Sensing (S) vs. Intuition (N), Thinking (T) vs. Feeling (F), and Judging (J) vs. Perceiving (P). The Myers-Briggs Personality Prediction project aims to develop and deploy a machine learning system capable of predicting a person's MBTI personality type from their online written interactions, such as social media posts, comments, and blogs. This project has significant implications for various applications, including improving customer experience, optimizing team dynamics, and developing personalized coaching programs. Through this project, we hope to gain a deeper understanding of how language use and personality type are related and to develop a robust tool for personality prediction.
Title: Myers-Briggs Personality Prediction. Journal: Indian Journal of Data Mining.
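The four dichotomies suggest one common way to frame the prediction task: treat each four-letter type as four independent binary labels, one per dichotomy. The sketch below shows that encoding only. Whether the authors used this framing or a single 16-class label is not stated in the abstract, and the names `DICHOTOMIES`, `encode_type`, and `decode_type` are hypothetical.

```python
# Each pair is (letter encoded as 0, letter encoded as 1).
DICHOTOMIES = (("E", "I"), ("S", "N"), ("T", "F"), ("J", "P"))

def encode_type(mbti):
    """Turn a type such as 'INTJ' into four binary labels."""
    mbti = mbti.upper()
    if len(mbti) != len(DICHOTOMIES):
        raise ValueError(f"expected a 4-letter type, got {mbti!r}")
    bits = []
    for letter, (first, second) in zip(mbti, DICHOTOMIES):
        if letter == first:
            bits.append(0)
        elif letter == second:
            bits.append(1)
        else:
            raise ValueError(f"unexpected letter {letter!r} in {mbti!r}")
    return bits

def decode_type(bits):
    """Turn four binary labels back into the four-letter type."""
    return "".join(pair[bit] for pair, bit in zip(DICHOTOMIES, bits))

print(encode_type("INTJ"))        # four binary targets, one per dichotomy
print(decode_type([0, 1, 1, 1]))  # back to a four-letter type
```

With this encoding, a text classifier can be trained per dichotomy and the four predictions recombined with `decode_type`.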
Pub Date: 2023-12-30; DOI: 10.54105/ijdm.b1631.112222
Dr. Radhika Kapur
At present, with ongoing advancements and the advent of modernization, the use of technology has gained prominence. The internet is regarded as one of the main means of acquiring information on all subjects and concepts. Furthermore, individuals are able to obtain answers to all types of questions that overwhelm them. Senior citizens make use of technology and the internet for a number of purposes. In addition, they need help from others in carrying out different tasks and activities in a well-ordered manner. In cases of visual impairment and other health problems and illnesses, senior citizens are unable to carry out job duties on their own and hence need help from others. When they make use of different types of technology and the internet, senior citizens not only feel satisfied but are also able to contribute effectively to raising overall standards of living. In some cases senior citizens are overwhelmed by feelings of apprehension and vulnerability, but understanding the concepts and practising regularly helps in honing technical skills. Therefore, the role of technology is considered important in promoting the well-being of senior citizens. The main concepts taken into account in this research paper are the meaning and significance of technologies, factors highlighting the usage of technology in promoting the well-being of senior citizens, and measures to be implemented to augment the technical skills of senior citizens.
Title: Usage of Technology in Promoting Well-being of Senior Citizens. Journal: Indian Journal of Data Mining.
Pub Date: 2023-11-30; DOI: 10.54105/ijdm.b1629.113223
Deepak Sharma, Dr. Priti Sharma
In the twenty-first century, data analysis has become the talk of the town. Almost every company or organization depends on data analysis for making future decisions. The most important step in data analysis after data collection is the preprocessing of the collected data. The main aim of data analysis is to find meaningful patterns by processing large amounts of data. In data preprocessing, inconsistencies in the collected data are removed. Data that has been stored for a relatively long period becomes noisy and inconsistent. When various parameters are measured, instrument error or human error can make a value incorrect or invalid. Invalid data must be removed; otherwise it will skew the results and introduce errors into the prediction. In this work, the preprocessing of weather data for rainfall prediction using data mining is analyzed.
Title: Pre-Processing and Normalization of the Historical Weather Data Collected from Secondary Data Source for Rainfall Prediction. Journal: Indian Journal of Data Mining.
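As an illustration of the two steps named in the title, the sketch below drops invalid readings and then applies min-max normalization. The plausibility bounds, the sample temperatures, and the function names are assumptions; the abstract does not say which validity rules or which normalization scheme the authors applied.

```python
def drop_invalid(values, low, high):
    """Keep only readings that are present and inside a plausible range."""
    return [v for v in values if v is not None and low <= v <= high]

def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical daily temperatures in Celsius with a gap and two sensor errors.
raw_temps_c = [21.5, None, 24.0, 999.0, 19.0, -273.9, 29.0]

clean = drop_invalid(raw_temps_c, low=-60.0, high=60.0)
scaled = min_max_normalize(clean)
print(clean)
print(scaled)
```

Normalizing after the invalid values are removed matters here: a stray 999.0 would otherwise become the maximum and squash every real reading toward zero.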
Pub Date: 2023-11-30; DOI: 10.54105/ijdm.b1627.113223
Deepak Sharma, Dr. Priti Sharma
Weather prediction is a very old practice, and people were making predictions about the weather long before weather measuring instruments were invented. In ancient times, people predicted the weather by observing the sky for long periods and studying the patterns of the stars at night. Things are a bit different now. People rely more on the past trends and patterns followed by weather parameters. Data mining and machine learning are used to analyze historical weather trends by applying various data mining techniques to weather data. In this paper, three rainfall prediction models based on data mining techniques are proposed and compared with other rainfall prediction models. The comparison is made on the basis of accuracy, precision, recall, and RMSE. The proposed models are based on ensemble methods: bagging, boosting, and stacking. Ensemble methods are used to enhance the overall performance and accuracy of the prediction. In both the bagging-based and boosting-based rainfall prediction models, an artificial neural network is used as the base learner, and daily weather data from 1988 to 2022 is used. In the stacking-based rainfall prediction model, random forest, logistic regression, and K-nearest neighbor are used as base learners (level-0 learners) and an artificial neural network is used as the meta model.
Title: Comparison of the Proposed Rainfall Prediction Model Designed using Data Mining Techniques with the Existing Rainfall Prediction Methods. Journal: Indian Journal of Data Mining.
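The four comparison measures can be computed as in the sketch below. It assumes rain/no-rain is coded as 1/0 for accuracy, precision, and recall, and that RMSE is taken over predicted rainfall amounts in millimetres; the abstract does not say which quantity RMSE was computed on, and the sample values are made up.

```python
import math

def classification_metrics(actual, predicted):
    """Return (accuracy, precision, recall) for binary labels, with 1 = rain."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical labels: did it rain on each of eight days?
rain_actual = [1, 0, 1, 1, 0, 0, 1, 0]
rain_predicted = [1, 0, 0, 1, 0, 1, 1, 0]
accuracy, precision, recall = classification_metrics(rain_actual, rain_predicted)

# Hypothetical rainfall amounts in millimetres.
mm_actual = [10.0, 0.0, 4.0, 6.0]
mm_predicted = [7.0, 0.0, 8.0, 6.0]
error_mm = rmse(mm_actual, mm_predicted)
print(accuracy, precision, recall, error_mm)
```

Reporting precision and recall next to accuracy is useful for rainfall data because dry days often dominate, so a model that always predicts "no rain" can score high accuracy with zero recall.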
Pub Date: 2023-11-30; DOI: 10.54105/ijdm.b1626.113223
Deepak Sharma, Dr. Priti Sharma
The field of data mining and machine learning has grown manyfold over the last two decades. Almost every problem can be approached using data mining, and this has become its most appealing aspect for scientists and researchers all over the world. Data mining can be viewed as a process of discovering knowledge. This discovery of knowledge starts with the collection of data and ends with the acquired knowledge in the form of patterns. Data collection lays the foundation for the process of knowledge discovery. In this paper, various secondary data sources from which data can be collected for rainfall prediction are studied and analyzed in depth. Some of these authentic websites and secondary data sources are NCDC (National Climatic Data Center), Kaggle, Datahub.io, the UCI Machine Learning Repository, and Earth Data. The data collected from these secondary data sources for rainfall prediction has been critically analyzed and compared on the parameters of accuracy, completeness, reliability, relevance, and timeliness.
Title: Collection of Weather Data from Authentic Websites and Secondary Data Sources for Rainfall Prediction. Journal: Indian Journal of Data Mining.
Pub Date: 2022-05-30; DOI: 10.54105/ijdm.c1618.051322
Kumari Pritee, R. Garg
Road safety and the prevention of accidents are the prime concerns of any highway system. Data mining is a means of retrieving information for knowledge discovery. Many data mining methodologies have been applied to accident data in recent years. There is a need to analyze the relationship between different factors related to accidents, i.e. the number of persons affected by fatal, minor, grievous, and non-injury accidents, road feature (ROF), road condition (ROC), cause of accident (CAU), and vehicle responsible (VR), on a daily, fortnightly, semi-fortnightly, and monthly basis. The objective of this study is divided into three sub-objectives. The first sub-objective is to divide the accident dataset for the National Highway sections of Karnataka state, implemented by the Project Implementation Units (PIU) at Bangalore, Chitradurga, Dharwad, Gulbarga, Hospet, and Mangalore from January 2012 to January 2017 and collected from NHAI (National Highway Authority of India), into homogeneous clusters using K-means clustering. The second sub-objective is to capture the relationship between the different factors, i.e. the number of persons affected by fatal, minor, grievous, and non-injury accidents, CAU, ROC, ROF, and VR, using Apriori association rules. The last sub-objective is to perform temporal trend analysis for each cluster on the basis of the rules generated by association rule mining.
Title: Criticality Trend Analysis Based on Different Types of Accidents using Data Mining Approach. Journal: Indian Journal of Data Mining.
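The second sub-objective relies on Apriori-style association rules, which rest on two quantities: the support of an itemset and the confidence of a rule. The sketch below is a small level-wise frequent-itemset search over made-up accident records. The attribute values (`ROC=wet`, `CAU=overspeed`, and so on), the `SEV` severity label, and the 0.4 support threshold are assumptions for illustration and are not taken from the NHAI data.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search for itemsets meeting min_support."""
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while current:
        scored = {c: support(c, transactions) for c in current}
        survivors = {c: s for c, s in scored.items() if s >= min_support}
        frequent.update(survivors)
        k += 1
        # Join step: merge surviving sets into candidates of size k.
        candidates = {a | b for a in survivors for b in survivors
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in survivors
                          for s in combinations(c, k - 1))]
    return frequent

def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) from the transactions."""
    joint = support(antecedent | consequent, transactions)
    return joint / support(antecedent, transactions)

# Hypothetical accident records, one set of attribute values per accident.
accidents = [
    frozenset({"ROC=wet", "CAU=overspeed", "SEV=fatal"}),
    frozenset({"ROC=wet", "CAU=overspeed", "SEV=minor"}),
    frozenset({"ROC=dry", "CAU=overspeed", "SEV=minor"}),
    frozenset({"ROC=wet", "CAU=overspeed", "SEV=fatal"}),
    frozenset({"ROC=dry", "CAU=fatigue", "SEV=minor"}),
]

frequent = frequent_itemsets(accidents, min_support=0.4)
rule_conf = confidence(frozenset({"ROC=wet", "CAU=overspeed"}),
                       frozenset({"SEV=fatal"}), accidents)
print(len(frequent), round(rule_conf, 3))
```

In the study's pipeline this kind of mining would be run separately inside each K-means cluster, so that the rules describe one homogeneous group of accidents at a time.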
Pub Date: 2022-05-30; DOI: 10.54105/ijdm.c1619.051322
Dr. Ayan Chattopadhyay, Mr. Mukul Basu
Consumer sentiment analysis has gained immense attention in the recent past. The abundance of data in today's world, especially data generated on social media platforms, has triggered sentiment exploration like never before. The analysis of consumer sentiment has helped organizations worldwide make effective decisions. In the communication technology domain, voice activated virtual assistants (VAVAs) are among the latest entrants, and they are steadily gaining popularity. The limited number of brand sentiment studies on VAVAs creates an opportunity to explore further. This study falls within the domain of sentiment mining, and its purpose is to review consumer sentiment towards Amazon Alexa, the global leader brand in the voice activated virtual assistant product segment. Of the various approaches available, the researchers chose an unsupervised, lexicon-based approach to estimate brand sentiment. Three popular lexicon-based sentiment classifiers, TextBlob, VADER, and AFINN, are used for the exploration. To the best of the researchers' knowledge, this is the first research effort to apply multiple lexicon-based approaches to exploring sentiment towards the Alexa brand. The study shows that consumers have a significantly positive sentiment towards the chosen brand. The three classifiers produce similar results, which supports the robustness of the outcomes and of the chosen methods. The study anticipates bright sales potential for the brand. Also, the use of alternative lexicon approaches is expected to enrich the existing literature in the sentiment mining domain.
Title: Unsupervised Learning Based Brand Sentiment Mining using Lexicon Approaches A Study on Amazon Alexa. Journal: Indian Journal of Data Mining.
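A lexicon-based classifier needs no training data: it looks up each word in a scored word list and aggregates the scores. The sketch below shows the idea with a tiny hand-made lexicon. `TOY_LEXICON`, its scores, and the sample reviews are hypothetical and are not the TextBlob, VADER, or AFINN resources used in the study, which also handle negation, intensifiers, and punctuation in ways this sketch does not.

```python
import re

# Hypothetical word scores; real lexicons contain thousands of entries.
TOY_LEXICON = {
    "love": 3, "great": 3, "good": 2, "helpful": 2,
    "slow": -1, "bad": -2, "useless": -2, "hate": -3,
}

def sentiment_score(text):
    """Sum the lexicon scores of all words in text (unknown words score 0)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(TOY_LEXICON.get(token, 0) for token in tokens)

def label(score):
    """Map a numeric score to a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

reviews = [
    "I love my Alexa, the voice control is great",
    "Setup was slow and the app is useless",
    "It arrived on Tuesday",
]
labels = [label(sentiment_score(review)) for review in reviews]
print(labels)
```

Running several lexicons over the same reviews and checking that their labels agree is the kind of cross-check the abstract describes when it reports similar results from the three classifiers.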
Pub Date: 2021-11-10; DOI: 10.35940/ijdm.b1615.111221
D. Sharma, Priti Sharma
Data mining is a rapidly developing technology that has enriched many fields, such as business analysis, market analysis, weather forecasting, and stock market analysis. It starts with collecting data sets from reliable sources and pre-processing that data. Data collected in large volumes comes with anomalies such as outliers, missing values, and duplicated values. Removing these kinds of anomalies is termed pre-processing of data. This paper discusses the collection of weather data and its pre-processing for a rainfall prediction model using the RapidMiner tool. An artificial neural network (ANN) data mining technique is also used to design a rainfall prediction model. ANN classification is a complex data mining technique that yields high accuracy in rainfall prediction.
Title: Design and Implementation of Rainfall Prediction Model using Supervised Machine Learning Data Mining Techniques. Journal: Indian Journal of Data Mining.
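The three anomaly types the abstract lists (duplicates, missing values, outliers) can each be handled with a short routine. The paper performs these steps in RapidMiner; the Python sketch below is only an analogous illustration. The median fill, the 1.5 × IQR outlier rule, and the sample rainfall rows are assumptions, not the authors' settings.

```python
from statistics import median, quantiles

def deduplicate(rows):
    """Drop exact duplicate rows while preserving the original order."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique

def fill_missing(rows):
    """Replace missing rainfall values with the median of the known values."""
    known = [value for _, value in rows if value is not None]
    fill = median(known)
    return [(day, fill if value is None else value) for day, value in rows]

def drop_outliers(rows, k=1.5):
    """Remove rows whose value lies more than k * IQR outside the quartiles."""
    values = [value for _, value in rows]
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [(day, value) for day, value in rows if low <= value <= high]

# Hypothetical (date, rainfall in mm) rows with one duplicate, one gap,
# and one implausible reading.
rows = [
    ("2021-06-01", 4.0), ("2021-06-02", None), ("2021-06-02", None),
    ("2021-06-03", 6.0), ("2021-06-04", 5.0), ("2021-06-05", 500.0),
    ("2021-06-06", 7.0),
]

cleaned = drop_outliers(fill_missing(deduplicate(rows)))
print(cleaned)
```

The order of the steps is a design choice: filling gaps with the median rather than the mean keeps the 500.0 reading from distorting the fill value before the outlier step removes it.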
Pub Date: 2021-11-10; DOI: 10.35940/ijdm.b1614.111221
Srishty Jindal, Dr. Prof. S.V.A.V. Prasad, Dr. K. Venkatesh Sharma
Nowadays, the use of social media has increased exponentially. People behave differently on social media depending on the responses and behavior of the people around them. It is now important to analyze the behavior of social media users and the way they affect their friends. In this paper, a behavioral analysis of people is carried out based on Twitter data. An algorithm is proposed that helps in finding the impact of text written by someone on social media and its effect on others. The impact of written text is calculated from the number of retweets of the same tweet. The severity of the word used is calculated based on the AFINN dictionary. According to the proposed algorithm, the dictionary score is recalculated when a negative word is forwarded multiple times. This is done on the understanding that if a less severe negative word is used many times, it may affect a person in a highly negative manner. The severity of words is thus recalculated, and its impact on people is found with the help of the proposed algorithm. The use of negative words on social media affects 32% of the total users (in the author's friend list). Behavior change is demonstrated through graphs of week-wise, month-wise, and year-wise analyses. The research helps in finding the impact of swear words on social media users depending on the frequency and severity score of the words.
Title: Impact of Swear and Negative Texts on Social Media Users. Journal: Indian Journal of Data Mining.
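The abstract does not give the formula used to recalculate a word's severity, so the sketch below uses a purely hypothetical rule to show the general shape of such an adjustment: a negative score is amplified with the logarithm of the retweet count and clipped at -5, the most negative value on the AFINN scale. The function name, the logarithmic scaling, and the example scores are assumptions and should not be read as the authors' algorithm.

```python
import math

def adjusted_severity(base_score, retweets, floor=-5.0):
    """Hypothetical rule: amplify a negative word score by retweet count.

    Non-negative scores are returned unchanged. Negative scores are scaled
    by 1 + log10(1 + retweets) and clipped at the floor value.
    """
    if base_score >= 0:
        return float(base_score)
    amplified = base_score * (1.0 + math.log10(1 + retweets))
    return max(floor, amplified)

# A mildly negative word (score -2) at increasing retweet counts.
for retweets in (0, 9, 999):
    print(retweets, adjusted_severity(-2, retweets))
```

The logarithm keeps the adjustment from growing without bound for viral tweets, which matches the abstract's intuition that repetition makes a mild word more harmful without making retweet count the only factor.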