
Indian Journal of Data Mining: Latest Articles

Bank Customer Churn Prediction
Pub Date : 2023-12-30 DOI: 10.54105/ijdm.b1628.112222
Jufin P A, Amrutha N
Competition within the banking industry is intense. To improve the quality and level of the services they provide, banks focus on customer retention as well as customer churn. Churn analysis has become a business intelligence task: estimating the number of customers who have left the bank or are presumed likely to leave, which also helps in predicting the number of customers retained. The primary objective of this paper, "Bank Customer Churn Prediction", is to build a model that can identify and visualize which factors or attributes contribute to customer churn. The paper also compares several classification algorithms. Machine learning has the potential to solve classification problems; using supervised machine learning techniques, the best model is chosen to assign a probability of churn, so that customer service can act to prevent it. Several methods are compared on the accuracy they achieve. XGBoost is evaluated to check whether it yields a better model in terms of accuracy. The other three machine learning algorithms compared are logistic regression, support vector machine (SVM), and random forest.
Citations: 0
Myers-Briggs Personality Prediction
Pub Date : 2023-12-30 DOI: 10.54105/ijdm.b1630.053123
Rohith Muralidharan, Neenu Kuriakose, Sangeetha J
The Myers-Briggs Type Indicator (MBTI) is one of the most commonly used tools for assessing an individual's personality. It identifies a person's psychological preferences in how they make decisions and perceive the world. MBTI has applications across several fields, including career development and personal growth. The test consists of a set of questions specifically designed to evaluate and measure an individual's choices along four dichotomies: Extraversion (E) vs. Introversion (I), Sensing (S) vs. Intuition (N), Thinking (T) vs. Feeling (F), and Judging (J) vs. Perceiving (P). The Myers-Briggs Personality Prediction project aims to develop and deploy a machine learning system capable of predicting a person's MBTI personality type from their online written interactions, such as social media posts, comments, and blogs. The project has significant implications for various applications, including improving customer experience, optimizing team dynamics, and developing personalized coaching programs. Through this project, we hope to gain a deeper understanding of how language use and personality type are related and to develop a robust tool for personality prediction.
Citations: 0
Usage of Technology in Promoting Well-being of Senior Citizens
Pub Date : 2023-12-30 DOI: 10.54105/ijdm.b1631.112222
Dr. Radhika Kapur
With ongoing advancements and the advent of modernization, the use of technology has gained prominence. The internet is regarded as one of the foremost means of expanding knowledge across all subjects and concepts, and it lets individuals find answers to questions they find overwhelming. Senior citizens use technologies and the internet for a number of purposes, though they often need help from others to carry out tasks and activities in an orderly manner. In cases of visual impairment and other health problems and illnesses, senior citizens are unable to carry out job duties on their own and must obtain help from others. When senior citizens use different types of technologies and the internet, they not only gain a sense of satisfaction but are also able to contribute to raising overall standards of living. In some cases senior citizens are overwhelmed by feelings of apprehension and vulnerability, but understanding the concepts and practicing regularly helps them hone their technical skills. The role of technology is therefore considered important in promoting the well-being of senior citizens. The main topics covered in this research paper are the meaning and significance of technologies, the factors highlighting the usage of technology in promoting the well-being of senior citizens, and the measures to be implemented to augment the technical skills of senior citizens.
Citations: 0
Pre-Processing and Normalization of the Historical Weather Data Collected from Secondary Data Source for Rainfall Prediction
Pub Date : 2023-11-30 DOI: 10.54105/ijdm.b1629.113223
Deepak Sharma, Dr. Priti Sharma
In the twenty-first century, data analysis has become the talk of the town. Almost every company or organization depends on data analysis to make future decisions. The most important step in data analysis after data collection is preprocessing the collected data. The main aim of data analysis is to find meaningful patterns by processing large amounts of data. Data preprocessing removes inconsistencies from the collected data. Data stored over a relatively long period becomes noisy and inconsistent, and when parameters are measured, instrument or human error can make values incorrect or invalid. Such invalid data must be removed; otherwise it will skew the results and introduce errors into the prediction. This work analyzes the preprocessing of weather data for rainfall prediction using data mining.
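The two steps named in the title, removing invalid readings and normalizing the rest, can be sketched as follows. This is only a minimal illustration, not the authors' pipeline: the function name, the plausibility bounds, and the sample temperatures are all assumptions, and min-max scaling is one common normalization choice among several.

```python
def clean_and_normalize(values, low, high):
    """Drop missing or out-of-range readings, then min-max scale to [0, 1].

    `low` and `high` are hypothetical plausibility bounds for the parameter.
    """
    # Keep only readings that are present and physically plausible.
    valid = [v for v in values if v is not None and low <= v <= high]
    if not valid:
        return []
    lo, hi = min(valid), max(valid)
    if hi == lo:
        # All readings identical: avoid division by zero.
        return [0.0 for _ in valid]
    return [(v - lo) / (hi - lo) for v in valid]


# Hypothetical daily temperatures (degrees Celsius) with a missing value
# and two instrument errors (999.0 and -80.0).
temps = [21.5, None, 19.0, 999.0, 24.0, -80.0, 22.0]
normalized = clean_and_normalize(temps, low=-50.0, high=60.0)
print(normalized)
```

Dropping invalid rows is the simplest policy; imputing them from neighboring days would be an alternative when gaps in the time series matter.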
Citations: 0
Comparison of the Proposed Rainfall Prediction Model Designed using Data Mining Techniques with the Existing Rainfall Prediction Methods
Pub Date : 2023-11-30 DOI: 10.54105/ijdm.b1627.113223
Deepak Sharma, Dr. Priti Sharma
Weather prediction is a very old practice; people were predicting the weather long before weather-measuring instruments were invented. In ancient times, people made weather predictions by observing the sky over long periods and the patterns of the stars at night. Things are a bit different now: people rely more on the past trends and patterns of weather parameters. Data mining and machine learning are used to analyze historical weather trends by applying various data mining techniques to weather data. In this paper, three rainfall prediction models based on data mining techniques are proposed and compared with other rainfall prediction models. The comparison is based on accuracy, precision, recall, and RMSE. The proposed models are based on ensemble methods such as bagging, boosting, and stacking, which are used to enhance the overall performance and accuracy of the prediction. In both the bagging-based and boosting-based models, an artificial neural network is used as the base learner, and daily weather data from 1988 to 2022 is used. In the stacking-based model, random forest, logistic regression, and k-nearest neighbor are used as base (level-0) learners, and an artificial neural network is used as the meta-model.
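The four comparison criteria named in the abstract have standard definitions, sketched below in plain Python. The function names and the sample labels are assumptions made for illustration; the abstract does not give the authors' evaluation code or results.

```python
import math


def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    # Guard against division by zero when nothing is predicted/labelled positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical rain (1) / no-rain (0) labels for eight days.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
acc, prec, rec = classification_metrics(y_true, y_pred)
err = rmse([2.0, 0.0, 5.0], [1.0, 0.0, 7.0])
print(acc, prec, rec, err)
```

Accuracy, precision, and recall apply to a rain/no-rain classification, while RMSE applies to predicted quantities or probabilities, so a comparison across models would typically report them side by side.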
Citations: 0
Collection of Weather Data from Authentic Websites and Secondary Data Sources for Rainfall Prediction
Pub Date : 2023-11-30 DOI: 10.54105/ijdm.b1626.113223
Deepak Sharma, Dr. Priti Sharma
The field of data mining and machine learning has grown manyfold over the last two decades. Almost every other problem can be solved using data mining, and this is what makes it most appealing to scientists and researchers all over the world. Data mining can be viewed as a process of discovering knowledge: it starts with the collection of data and ends with knowledge acquired in the form of patterns. Data collection lays the foundation for the knowledge discovery process. In this paper, various secondary data sources from which data can be collected for rainfall prediction are studied and analyzed in depth. These authentic websites and secondary data sources include NCDC (National Climatic Data Center), Kaggle, Datahub.io, the UCI Machine Learning Repository, and Earth Data. The data collected from these sources for rainfall prediction has been critically analyzed and compared on accuracy, completeness, reliability, relevance, and timeliness.
Citations: 0
Criticality Trend Analysis Based on Different Types of Accidents using Data Mining Approach
Pub Date : 2022-05-30 DOI: 10.54105/ijdm.c1618.051322
Kumari Pritee, R. Garg
Road safety and accident prevention are the prime concerns of any highway system. Data mining is a means of retrieving information for knowledge discovery, and many data mining methodologies have been applied to accident data in recent years. There is a need to analyze the relationships between different accident-related factors, i.e. the number of persons affected by fatal, minor, grievous, and non-injury accidents, road feature (ROF), road condition (ROC), cause of accident (CAU), and vehicle responsible (VR), on a daily, fortnightly, semi-fortnightly, and monthly basis. The objective of this study is divided into three sub-objectives. The first is to divide the accident dataset for National Highway sections of Karnataka state, implemented by the Project Implementation Units (PIUs) of Bangalore, Chitradurga, Dharwad, Gulbarga, Hospet, and Mangalore, covering January 2012 to January 2017 and collected from NHAI (National Highways Authority of India), into homogeneous clusters using K-means clustering. The second is to capture the relationships between these factors, i.e. the number of persons affected by fatal, minor, grievous, and non-injury accidents, CAU, ROC, ROF, and VR, using Apriori association rules. The last is to perform temporal trend analysis for each cluster on the basis of the rules generated by association rule mining.
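Apriori ranks candidate rules by support and confidence, and those two measures can be sketched directly. The records below are hypothetical: the factor codes ROC and CAU come from the abstract, but the attribute values, the sample accidents, and the function names are assumptions for illustration only.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    items = set(itemset)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from the transactions."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / base


# Hypothetical accident records, one set of attribute=value items per accident.
accidents = [
    {"ROC=wet", "CAU=overspeeding", "fatal"},
    {"ROC=wet", "CAU=overspeeding", "minor"},
    {"ROC=dry", "CAU=overspeeding", "fatal"},
    {"ROC=wet", "CAU=drunk", "fatal"},
]

sup = support(accidents, {"ROC=wet", "fatal"})
conf = confidence(accidents, {"ROC=wet"}, {"fatal"})
print(sup, conf)
```

A full Apriori run would enumerate itemsets level by level and prune those below a minimum support before computing confidence; this sketch covers only the scoring step.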
Citations: 0
Unsupervised Learning Based Brand Sentiment Mining using Lexicon Approaches: A Study on Amazon Alexa
Pub Date : 2022-05-30 DOI: 10.54105/ijdm.c1619.051322
Dr. Ayan Chattopadhyay, Mr. Mukul Basu
Consumer sentiment analysis has gained immense attention in the recent past. The abundance of data in today's world, especially data generated on social media platforms, has triggered sentiment exploration like never before, and the analysis of consumer sentiment has helped organizations worldwide make effective decisions. In the communication technology domain, voice-activated virtual assistants (VAVAs) are among the latest entrants and are steadily gaining popularity. Because brand sentiment studies on VAVAs are limited in number, there is an opportunity to explore further. This study falls within the domain of sentiment mining; its purpose is to review consumer sentiment towards Amazon Alexa, the global leader in the voice-activated virtual assistant product segment. Of the various approaches available, the researchers chose an unsupervised, lexicon-based approach to estimate brand sentiment. Three popular lexicon-based sentiment classifiers, TextBlob, VADER, and AFINN, are used for the exploration. To the best of the researchers' knowledge, this is the first research effort to apply multiple lexicon-based approaches to exploring sentiment towards the Alexa brand. The study shows that consumers have a significantly positive sentiment towards the chosen brand. The three classifiers produce similar results, which supports the robustness of both the outcomes and the chosen methods. The study anticipates bright sales potential for the brand, and the use of alternative lexicon approaches is expected to enrich the existing literature in the sentiment mining domain.
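The idea behind a lexicon-based classifier can be sketched with a toy word list. The lexicon below is hypothetical and far smaller than the real TextBlob, VADER, or AFINN resources, whose word lists, weights, and handling of negation and intensifiers differ; the function names and sample reviews are likewise assumptions.

```python
import re

# Hypothetical valence scores; NOT taken from AFINN, VADER, or TextBlob.
TOY_LEXICON = {
    "love": 3, "great": 3, "good": 2, "helpful": 2,
    "bad": -2, "useless": -3, "terrible": -3,
}


def lexicon_score(text, lexicon=TOY_LEXICON):
    """Sum the valence of every known word; unknown words count as 0."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(tok, 0) for tok in tokens)


def label(score):
    """Map a summed score to a coarse sentiment class."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


reviews = [
    "I love my Alexa, it is great and helpful",
    "Terrible and useless",
    "It plays music",
]
scores = [lexicon_score(r) for r in reviews]
labels = [label(s) for s in scores]
print(list(zip(scores, labels)))
```

Because no labelled training data is needed, this kind of scoring is unsupervised, which is why agreement between several independent lexicons can serve as a sanity check on the result.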
Citations: 0
Design and Implementation of Rainfall Prediction Model using Supervised Machine Learning Data Mining Techniques
Pub Date : 2021-11-10 DOI: 10.35940/ijdm.b1615.111221
D. Sharma, Priti Sharma
Data mining is a rapidly developing technology that has enriched many fields, such as business analysis, market analysis, weather forecasting, and stock market analysis. It starts with collecting data sets from reliable sources and pre-processing that data. Data collected in large volumes often contains anomalies such as outliers, missing values, and duplicated values. Removing these kinds of anomalies is termed pre-processing of data. This paper discusses the collection of weather data and its pre-processing for a rainfall prediction model using the Rapid Miner tool. An artificial neural network (ANN) data mining technique is also used to design the rainfall prediction model. ANN classification is a complex data mining technique that yields high accuracy in rainfall prediction.
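The pre-processing step this abstract describes (removing duplicated values, missing values, and outliers) could be sketched roughly as follows. This is only an illustration in plain Python, not the authors' Rapid Miner workflow: the function name `preprocess`, the field names `day` and `rain_mm`, the sample readings, and the choice of the 1.5 × IQR rule for outliers are all assumptions made for the example.

```python
from statistics import quantiles


def preprocess(records, field):
    """Drop duplicate records, records with missing values, and IQR outliers in `field`."""
    # 1. Remove exact duplicates while preserving the original order.
    seen, unique = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(rec)

    # 2. Remove records in which any value is missing (None).
    complete = [r for r in unique if all(v is not None for v in r.values())]

    # 3. Remove outliers with the 1.5 * IQR rule (an assumed choice, not from the paper).
    if len(complete) < 4:
        return complete  # too few points to estimate quartiles meaningfully
    q1, _, q3 = quantiles([r[field] for r in complete], n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [r for r in complete if low <= r[field] <= high]


# Hypothetical daily rainfall readings (made-up values for illustration).
records = [
    {"day": 1, "rain_mm": 2.0},
    {"day": 1, "rain_mm": 2.0},    # duplicated value
    {"day": 2, "rain_mm": None},   # missing value
    {"day": 3, "rain_mm": 3.0},
    {"day": 4, "rain_mm": 2.5},
    {"day": 5, "rain_mm": 3.5},
    {"day": 6, "rain_mm": 4.0},
    {"day": 7, "rain_mm": 500.0},  # outlier
]
cleaned = preprocess(records, "rain_mm")
```

Dropping incomplete records is the simplest policy; imputing missing values instead would be an equally plausible choice, and the abstract does not say which was used.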
Citations: 0
Impact of Swear and Negative Texts on Social Media Users
Pub Date : 2021-11-10 DOI: 10.35940/ijdm.b1614.111221
Srishty Jindal, Dr. Prof. S.V.A.V. Prasad, Dr. K. Venkatesh Sharma
The use of social media has increased exponentially. People behave differently on social media depending on the responses and behavior of the people around them, so it is important to analyze the behavior of social media users and how they affect their friends. In this paper, behavioral analysis is performed on Twitter data. An algorithm is proposed that estimates the impact of text written by someone on social media and its effect on others. The impact of a written text is calculated from the number of retweets of the same tweet. The severity of each word used is calculated from the AFINN dictionary. Under the proposed algorithm, the dictionary score is recalculated when a negative word is forwarded multiple times, on the understanding that a less severe negative word used many times may affect a person in a highly negative manner. The severity of words is thus recalculated, and its impact on people is found with the help of the proposed algorithm. The use of negative words on social media affects 32% of the total users (in their friend lists). Behavior change is demonstrated with graphs from week-wise, month-wise, and year-wise analyses. The research helps in finding the impact of swear words on social media users depending on the frequency and severity score of the words.
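The idea of recalculating a word's severity when a negative word is forwarded many times can be illustrated with a small sketch. The abstract does not give the actual formula, so the logarithmic scaling rule below is purely hypothetical, as are the names `adjusted_severity` and `tweet_impact` and the three-word lexicon, whose values are made up and merely follow AFINN's integer range of -5 to +5.

```python
import math

# Illustrative mini-lexicon in AFINN's -5..+5 range (made-up values, not AFINN's own).
LEXICON = {"awful": -3, "annoying": -2, "great": 3}


def adjusted_severity(base_score, retweets):
    """Amplify a negative score by the retweet count (hypothetical rule, not the paper's)."""
    if base_score >= 0:
        return float(base_score)  # only negative words are re-scored
    scaled = base_score * (1 + math.log2(1 + retweets))
    return max(scaled, -5.0)  # clamp to the most negative value on the AFINN scale


def tweet_impact(text, retweets):
    """Sum the adjusted severities of all words in a tweet; unknown words score 0."""
    words = text.lower().split()
    return sum(adjusted_severity(LEXICON.get(w, 0), retweets) for w in words)


# A mildly negative word (-2) forwarded three times reaches the scale's floor.
mild_once = adjusted_severity(-2, 0)
mild_forwarded = adjusted_severity(-2, 3)
```

Under this assumed rule, a word scored -2 stays at -2.0 with no retweets and is clamped to -5.0 after three, which mirrors the abstract's premise that a less severe word can become highly negative through repetition.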
Citations: 1