A Technology Platform to Enable the Building of Corporate Radar Applications that Mine the Web for Business Insight
P. Yeh, A. Kass
Pub Date: 2010-08-07 | DOI: 10.3233/978-1-60750-633-1-149

In this paper, we present a technology platform that can be customized to create a wide range of corporate radar applications that turn the Web into a systematic source of business insight. The platform integrates a combination of established AI technologies (semantic models, natural language processing, and inference engines) in a novel way. We present two prototype corporate radars built using this platform: the Business Event Advisor, which detects threats and opportunities relevant to a decision maker's organization, and the Technology Investment Radar, which assesses the maturity of technologies that impact a decision maker's business. The Technology Investment Radar has been piloted with business users, and we present encouraging initial results from this pilot.
Towards the Generic Framework for Utility Considerations in Data Mining Research
S. Puuronen, Mykola Pechenizkiy
Pub Date: 2010-08-07 | DOI: 10.3233/978-1-60750-633-1-49

Rigorous data mining (DM) research has successfully developed advanced techniques and algorithms, and many organizations have high expectations of extracting more benefit from their vast data warehouses for decision making. Although there are some success stories, in practice those expectations have largely not yet been fulfilled. DM researchers have recently become interested in utility-based DM (UBDM), which considers some economic utility factors (such as the cost of data, the cost of measurement, and the cost of class labels), but many other utility factors remain outside the main directions of UBDM. The goals of this position paper are (1) to motivate researchers to consider utility from a broader perspective than is usual in the UBDM context and (2) to introduce a new generic framework for these broader utility considerations in DM research. Besides describing our multi-criteria utility-based framework (MCUF), we present a few hypothetical examples showing how the framework might be used to consider the utilities of potential DM research stakeholders.
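The multi-criteria view advocated above can be made concrete with a small weighted-sum sketch. The criteria, weights, and project scores below are entirely invented for illustration; this is not the paper's MCUF itself, only the basic idea that different stakeholders aggregate the same utility factors differently:

```python
# Hypothetical multi-criteria utility sketch: each stakeholder weighs
# several utility factors of a DM project, not just predictive accuracy.

def utility(scores, weights):
    """Weighted-sum utility over named criteria (weights assumed to sum to 1)."""
    assert set(scores) == set(weights)
    return sum(weights[c] * scores[c] for c in scores)

# Scores in [0, 1] for a hypothetical DM project.
project = {"accuracy": 0.85, "data_cost": 0.40, "interpretability": 0.70}

# Two stakeholders value the same project differently.
researcher = {"accuracy": 0.7, "data_cost": 0.1, "interpretability": 0.2}
manager    = {"accuracy": 0.3, "data_cost": 0.4, "interpretability": 0.3}

u_researcher = utility(project, researcher)
u_manager = utility(project, manager)
```

Under these invented weights the researcher, who prizes accuracy, rates the project higher than the manager, for whom data cost weighs more — the kind of divergence the framework is meant to surface.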
Clustering of Adolescent Criminal Offenders using Psychological and Criminological Profiles
M. Breitenbach, T. Brennan, W. Dieterich, G. Grudic
Pub Date: 2010-08-07 | DOI: 10.3233/978-1-60750-633-1-123

In criminology research, the question arises whether certain types of delinquents can be identified from data; while many cases cannot be clearly labeled, overlapping taxonomies have been proposed in [1,2,3]. In a recent study, juvenile offenders (N = 1572) from three state systems were assessed on a battery of criminogenic risk and needs factors and their official criminal histories, and cluster analysis methods were applied. One problem we encountered is the large number of hybrid cases that appear to belong to two or more classes. To eliminate these cases, we propose a method that combines the results of bagged k-means with the consistency method [4], a semi-supervised learning technique. A manual interpretation of the results showed highly interpretable patterns that were linked to existing criminological research.
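The bagged k-means half of the approach can be sketched on synthetic data: cluster bootstrap resamples repeatedly and measure how consistently each case lands in the same cluster. Cases with low agreement would be the "hybrid" candidates. All data here are invented, the clustering is a toy 1-D k-means, and the paper's second ingredient (the semi-supervised consistency method) is omitted:

```python
import numpy as np

def kmeans_1d(x, k=2, iters=20):
    """Tiny Lloyd's k-means for 1-D data with deterministic quantile init."""
    centroids = np.quantile(x, np.linspace(0.1, 0.9, k))
    for _ in range(iters):
        labels = np.argmin(np.abs(x[:, None] - centroids[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = x[labels == j].mean()
    return np.sort(centroids)  # sorted so labels are comparable across runs

rng = np.random.default_rng(0)
# Two well-separated synthetic "profiles" (hypothetical data, not the study's).
x = np.concatenate([rng.normal(0.0, 0.3, 20), rng.normal(10.0, 0.3, 20)])

B = 50
votes = np.zeros((len(x), 2))
for _ in range(B):
    sample = rng.choice(x, size=len(x), replace=True)  # bootstrap resample
    centroids = kmeans_1d(sample)                      # fit on the resample
    labels = np.argmin(np.abs(x[:, None] - centroids[None, :]), axis=1)
    votes[np.arange(len(x)), labels] += 1

# Agreement in [0.5, 1]; low values flag cases sitting between clusters.
agreement = votes.max(axis=1) / B
majority = votes.argmax(axis=1)
```

With clusters this well separated every case is assigned consistently; on real profiles, cases with agreement well below 1 would be the hybrids to set aside.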
An Integrated System to Support Electricity Tariff Contract Definition
F. Rodrigues, V. Figueiredo, Z. Vale
Pub Date: 2010-08-07 | DOI: 10.3233/978-1-60750-633-1-99

This paper presents an integrated system that helps both retail companies and electricity consumers define the best retail contracts and tariffs. The system consists of a Decision Support System (DSS) based on a Consumer Characterization Framework (CCF). The CCF applies data mining techniques to obtain useful knowledge about electricity consumers from large amounts of consumption data. This knowledge is acquired following an innovative and systematic approach that identifies different consumer classes, each represented by a load profile, and characterizes them using decision trees. The framework generates inputs for the knowledge base and the database of the DSS: the rule sets derived from the decision trees are integrated into the knowledge base, while the load profiles, together with information about contracts and electricity prices, form the database. The DSS can classify different consumers, present their load profiles, and test different electricity tariffs and contracts. Its final outputs are a comparative economic analysis of different contracts and advice about the most economical contract for each consumer class. The presentation of the DSS is completed with an application example using a real database of consumers from the Portuguese distribution company.
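The DSS's final step — a comparative economic analysis of contracts for a consumer class — can be illustrated with a minimal sketch. The 24-hour load profile, the prices, and the peak-hour window below are all invented, not values from the paper:

```python
# Hypothetical tariff comparison for one consumer class, in the spirit of
# the DSS's final output. Prices and the hourly load profile are invented.

# Average daily load profile (kWh per hour) for one consumer class:
# low at night, higher by day, peaking in the evening.
load = [0.3] * 7 + [0.8] * 10 + [1.5] * 5 + [0.5] * 2  # 24 hourly values

FLAT_PRICE = 0.16                        # EUR/kWh, any hour
PEAK_PRICE, OFFPEAK_PRICE = 0.22, 0.10   # EUR/kWh, time-of-use tariff
PEAK_HOURS = set(range(8, 22))           # 08:00-22:00 counted as peak

def daily_cost_flat(profile):
    return sum(profile) * FLAT_PRICE

def daily_cost_tou(profile):
    return sum(kwh * (PEAK_PRICE if h in PEAK_HOURS else OFFPEAK_PRICE)
               for h, kwh in enumerate(profile))

flat, tou = daily_cost_flat(load), daily_cost_tou(load)
advice = "flat" if flat < tou else "time-of-use"
```

For this evening-peaking profile the flat contract comes out cheaper; a night-heavy profile would tip the advice the other way, which is exactly the per-class comparison the DSS automates.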
Best Practices for Predictive Analytics in B2B Financial Services
Raul Domingos, Thierry Van de Merckt
Pub Date: 2010-08-07 | DOI: 10.3233/978-1-60750-633-1-35

Predictive analytics is a well-known practice among corporations doing business with private consumers (B2C) as a means to achieve competitive advantage. The first part of this article shows that corporations operating in a business-to-business (B2B) setting have similar conditions for using predictive analytics in their favor. Predictive analytics can be applied to solve a myriad of business problems. The solutions to some of these problems are well known, while others require a considerable amount of research and innovation. However, predictive analytics professionals tend to solve similar problems in very different ways, even those for which there are known best practices. The second part of this article uses predictive analytics applications identified in a B2B context to describe a set of best practices for solving well-known problems (the "let's not re-invent the wheel" attitude) and innovative practices for solving challenging problems.
Mining Medical Administrative Data - The PKB Suite
Aaron Ceglar, R. Morrall, J. Roddick
Pub Date: 2010-08-07 | DOI: 10.3233/978-1-60750-633-1-110

Hospitals are adept at capturing large volumes of highly multi-dimensional data about their activities, including clinical, demographic, administrative, financial and, increasingly, outcome data (such as adverse events). Managing and understanding this data is difficult, as hospitals typically do not have the staff and/or the expertise to assemble, query, analyse and report on the potential knowledge contained within it. The Power Knowledge Builder (PKB) project investigated the adaptation of data mining algorithms to the domain of patient costing, with the aim of helping practitioners better understand their data and thereby facilitate best practice.
Resource-bounded Outlier Detection using Clustering Methods
L. Torgo, Carlos Soares
Pub Date: 2010-08-07 | DOI: 10.3233/978-1-60750-633-1-84

This paper describes a methodology for applying hierarchical clustering methods to the task of outlier detection. The methodology is tested on the problem of cleaning Official Statistics data: the goal is to detect erroneous foreign trade transactions in data collected by the Portuguese Institute of Statistics (INE). These transactions are a minority, but they still have an important impact on the statistics produced by the institute. The detection of these rare errors is a manual, time-consuming task, usually constrained by a limited amount of available resources. Our proposal addresses this issue by producing a ranking of outlyingness that allows better management of the available resources, allocating them to the cases that are most different from the others and thus have a higher probability of being errors. Our method is based on the output of standard agglomerative hierarchical clustering algorithms, incurring no significant additional computational cost. Our results show that it enables large savings by selecting a small subset of suspicious transactions for manual inspection which nevertheless includes most of the erroneous transactions. We compare our proposal to a state-of-the-art outlier ranking method (LOF) and show that our method achieves better results on this particular application. The results of our experiments are also competitive with previous results on the same data. Finally, the outcome of our experiments raises important questions concerning the method currently followed at INE for items with a small number of transactions.
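One simple way to derive an outlyingness ranking from agglomerative clustering output — a simplified stand-in for the paper's method, not its exact scoring — is to score each observation by the dendrogram height at which it first merges: isolated observations merge late, at large heights. The data below are synthetic:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

def first_merge_heights(X):
    """Score each observation by the dendrogram height of its first merge.

    A descending sort of these scores yields an outlyingness ranking,
    reusing the output of a standard agglomerative clustering run.
    """
    Z = linkage(X, method="average")   # standard agglomerative clustering
    n = len(X)
    scores = np.zeros(n)
    for a, b, height, _ in Z:
        for c in (int(a), int(b)):
            if c < n:                  # c is an original observation,
                scores[c] = height     # first (and only) time it appears
    return scores

rng = np.random.default_rng(1)
# 30 ordinary "transactions" plus one grossly deviating one (synthetic).
X = np.vstack([rng.normal(0, 1, size=(30, 2)), [[25.0, 25.0]]])
scores = first_merge_heights(X)
ranking = np.argsort(-scores)          # most suspicious first
```

Inspecting only the top of `ranking` is the resource-bounded idea: a small prefix of the list should capture most of the true errors.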
Data Mining for Business Applications: Introduction
Carlos Soares, R. Ghani
Pub Date: 2010-08-07 | DOI: 10.3233/978-1-60750-633-1-1

This chapter introduces the volume on Data Mining (DM) for Business Applications. The chapters in this book provide an overview of some of the major advances in the field, in terms of both methodology and applications, traditional and emerging. In this introductory chapter, we provide a context for the rest of the book. The framework for discussing its contents is the DM methodology, which is suitable both for organizing and for relating the diverse contributions of the selected chapters. The chapter closes with an overview of the chapters in the book to guide the reader.
Customer churn prediction - a case study in retail banking
Teemu Mutanen, S. Nousiainen, J. Ahola
Pub Date: 2010-08-07 | DOI: 10.3233/978-1-60750-633-1-77

This work focuses on one of the central topics in customer relationship management (CRM): the transfer of valuable customers to a competitor. Customer retention rate has a strong impact on customer lifetime value, and understanding the true value of a possible customer churn helps a company in its customer relationship management. Customer value analysis, along with customer churn prediction, helps marketing programs target more specific groups of customers. We predict customer churn with logistic regression techniques and analyze churning and non-churning customers using data from a consumer retail banking company. The results of the case study show that using conventional statistical methods to identify possible churners can be successful.
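The core modelling step — logistic regression for a churn label — can be sketched in a few lines of numpy. The two behavioural features and the data-generating process below are invented stand-ins for the (confidential) banking data, and the fit uses plain gradient descent rather than whatever solver the study used:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic "retail banking" data (invented): two behavioural features,
# say a balance trend and an activity level, and a binary churn label.
n = 400
X = rng.normal(size=(n, 2))
true_w = np.array([2.0, -1.5])
y = (X @ true_w + rng.normal(scale=0.5, size=n) > 0).astype(float)

# Logistic regression fitted by plain gradient descent (with intercept).
Xb = np.hstack([np.ones((n, 1)), X])
w = np.zeros(3)
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-Xb @ w))   # predicted churn probability
    w -= 0.1 * Xb.T @ (p - y) / n       # gradient step on the log-loss

pred = 1.0 / (1.0 + np.exp(-Xb @ w)) > 0.5
accuracy = (pred == y.astype(bool)).mean()
```

The fitted probabilities, not just the 0/1 predictions, are what make the model useful for CRM: customers can be ranked by churn risk and the retention budget spent on the top of the list.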
Forecasting Online Auctions using Dynamic Models
Wolfgang Jank, Galit Shmueli
Pub Date: 2010-08-07 | DOI: 10.3233/978-1-60750-633-1-137

We propose a dynamic forecasting model for price in online auctions. A key feature of our model is that it operates during the live auction, generating real-time forecasts, which distinguishes it from previous static models. Our model also differs in how information about price is incorporated: while one part of the model is based on the more traditional notion of an auction's price level, another part incorporates its dynamics in the form of price velocity and acceleration. In that sense, it captures key features of a dynamic environment such as an online auction. The use of novel functional data methodology allows us to measure, and subsequently include, dynamic price characteristics. We illustrate our model on a diverse set of eBay auctions across many different book categories, where it achieves significantly higher prediction accuracy than standard approaches.
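The price-velocity and price-acceleration idea can be sketched by smoothing an observed price curve and differentiating it. The bid history below is invented, and a low-order polynomial fit stands in for the functional data smoothing the paper actually uses:

```python
import numpy as np

# Hypothetical bid history: times (fraction of the auction elapsed) and prices.
t = np.linspace(0.0, 0.8, 9)
price = 10.0 + 1.0 * t + 12.0 * t ** 2   # accelerating price path, invented

# Smooth the price curve with a low-order polynomial (a simple stand-in
# for the paper's functional smoothing), then differentiate it to get
# the price dynamics at each observed time.
coeffs = np.polyfit(t, price, deg=2)
velocity = np.polyval(np.polyder(coeffs, 1), t)
acceleration = np.polyval(np.polyder(coeffs, 2), t)

# A dynamic forecaster would feed current level, velocity and acceleration
# into a model of the remaining price path; here we simply extrapolate
# the smoothed curve to the auction's end (t = 1).
forecast_end = np.polyval(coeffs, 1.0)
```

Because the derivatives come from the smoothed curve rather than raw bid increments, the level, velocity, and acceleration can be updated continuously as new bids arrive, which is what makes real-time forecasting during the live auction possible.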