Pub Date: 2009-12-01 DOI: 10.4018/978-1-60566-748-5.CH011
R. Alves, J. Ribeiro, O. Belo, Jiawei Han
Business organizations must pay attention to interesting changes in customer behavior in order to anticipate customers’ needs and respond with appropriate business actions. Tracking customers’ commercial paths through the products they are interested in is an essential technique for improving business and increasing customer satisfaction. Data warehousing (DW) makes this possible by providing the basic means to record every customer transaction according to the business strategies established. Although managing such huge amounts of records may bring business advantage, exploring them, especially in a multi-dimensional space (MDS), is a nontrivial task. The more dimensions we want to explore, the higher the computational cost of multi-dimensional data analysis (MDA). To make MDA practical for real-world business problems, DW researchers have been combining data cubing and mining techniques to detect interesting changes in MDS. Such changes can also be detected through gradient queries. While those studies have provided the basis for future research in MDA, only a few of them address preference query selection in MDS. Thus, exploring changes in MDS is an essential task, but ranking the most interesting gradients is even more important. In this chapter, the authors investigate how to mine and rank the most interesting changes in an MDS by applying a top-k gradient strategy. The authors also propose a gradient-based cubing method to evaluate interesting gradient regions in MDS. The challenge is to find maximum gradient regions (MGRs) that maximize the task of ranking gradients in an MDS. The authors’ evaluation study indicates that the proposed method is a promising strategy for ranking gradients in MDS.
{"title":"Ranking Gradients in Multi-Dimensional Spaces","authors":"R. Alves, J. Ribeiro, O. Belo, Jiawei Han","doi":"10.4018/978-1-60566-748-5.CH011","DOIUrl":"https://doi.org/10.4018/978-1-60566-748-5.CH011","url":null,"abstract":"Business organizations must pay attention to interesting changes in customer behavior in order to anticipate their needs and act accordingly with appropriated business actions. Tracking customer’s commercial paths through the products they are interested in is an essential technique to improve business and increase customer satisfaction. Data warehousing (DW) allows us to do so, giving the basic means to record every customer transaction based on the different business strategies established. Although managing such huge amounts of records may imply business advantage, its exploration, especially in a multi-dimensional space (MDS), is a nontrivial task. The more dimensions we want to explore, the more are the computational costs involved in multi-dimensional data analysis (MDA). To make MDA practical in real world business problems, DW researchers have been working on combining data cubing and mining techniques to detect interesting changes in MDS. Such changes can also be detected through gradient queries. While those studies have provided the basis for future research in MDA, just few of them points to preference query selection in MDS. Thus, not only the exploration of changes in MDS is an essential task, but also even more important is ranking most interesting gradients. In this chapter, the authors investigate how to mine and rank the most interesting changes in a MDS applying a TOP-K gradient strategy. Additionally, the authors also propose a gradient-based cubing method to evaluate interesting gradient regions in MDS. So, the challenge is to find maximum gradient regions (MGRs) that maximize the task of raking gradients in a MDS. 
The authors’ evaluation study demonstrates that the proposed method presents a promising strategy for ranking gradients in MDS. DOI: 10.4018/978-1-60566-748-5.ch011","PeriodicalId":255230,"journal":{"name":"Complex Data Warehousing and Knowledge Discovery for Advanced Retrieval Development","volume":"112 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117202534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1900-01-01 DOI: 10.4018/978-1-60566-748-5.CH013
A. Freitas, A. Pereira
Classification plays an important role in medicine, especially for medical diagnosis. Real-world medical applications often require classifiers that minimize the total cost, including the cost of a wrong diagnosis (misclassification costs) and the cost of diagnostic tests (attribute costs). There are many reasons for considering costs in medicine, as diagnostic tests are not free and health budgets are limited. In this chapter, the authors define strategies for cost-sensitive learning. They have developed an algorithm for decision tree induction that considers various types of costs, including test costs, delayed costs, and costs associated with risk. They then apply their strategy to train and evaluate cost-sensitive decision trees on medical data. The generated trees can be tested under several strategies, including group costs, common costs, and individual costs. Using the “risk” factor, it is possible to penalize invasive or delayed tests and obtain patient-friendly decision trees.
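The notion of total cost in the abstract above (test costs plus misclassification costs) can be sketched as follows. This is only an illustrative sketch, not the authors' algorithm: the function name `total_cost`, the test names, and every cost figure are hypothetical, and delayed costs and risk factors are left out.

```python
from typing import Dict, List, Tuple


def total_cost(
    tests_used: List[str],
    test_costs: Dict[str, float],
    predicted: str,
    actual: str,
    misclassification_costs: Dict[Tuple[str, str], float],
) -> float:
    """Cost of one diagnosis: tests performed plus the penalty for a wrong label."""
    attribute_cost = sum(test_costs[name] for name in tests_used)
    # A correct prediction carries no misclassification cost.
    if predicted == actual:
        error_cost = 0.0
    else:
        error_cost = misclassification_costs[(predicted, actual)]
    return attribute_cost + error_cost


# Hypothetical figures, for illustration only.
TEST_COSTS = {"blood_test": 10.0, "biopsy": 200.0}
MISCLASSIFICATION_COSTS = {
    ("healthy", "sick"): 1000.0,  # predicted healthy, actually sick
    ("sick", "healthy"): 300.0,   # predicted sick, actually healthy
}

cheap_but_wrong = total_cost(
    ["blood_test"], TEST_COSTS, "healthy", "sick", MISCLASSIFICATION_COSTS
)
thorough_and_right = total_cost(
    ["blood_test", "biopsy"], TEST_COSTS, "sick", "sick", MISCLASSIFICATION_COSTS
)
```

With these made-up numbers, the path that skips the expensive test but misses the diagnosis costs more overall than the path that pays for both tests, which is the trade-off a cost-sensitive learner has to weigh.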
{"title":"Learning Cost-Sensitive Decision Trees to Support Medical Diagnosis","authors":"A. Freitas, A. Pereira","doi":"10.4018/978-1-60566-748-5.CH013","DOIUrl":"https://doi.org/10.4018/978-1-60566-748-5.CH013","url":null,"abstract":"Classification plays an important role in medicine, especially for medical diagnosis. Real-world medical applications often require classifiers that minimize the total cost, including costs for wrong diagnosis (misclassifications costs) and diagnostic test costs (attribute costs). There are indeed many reasons for considering costs in medicine, as diagnostic tests are not free and health budgets are limited. In this chapter, the authors have defined strategies for cost-sensitive learning. They have developed an algorithm for decision tree induction that considers various types of costs, including test costs, delayed costs and costs associated with risk. Then they have applied their strategy to train and to evaluate cost-sensitive decision trees in medical data. Generated trees can be tested following some strategies, including group costs, common costs, and individual costs. Using the factor of “risk” it is possible to penalize invasive or delayed tests and obtain patient-friendly decision trees.","PeriodicalId":255230,"journal":{"name":"Complex Data Warehousing and Knowledge Discovery for Advanced Retrieval Development","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115027629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1900-01-01 DOI: 10.4018/978-1-60566-748-5.CH015
M. Gebski, A. Penev, R. Wong
{"title":"Protocol Identification of Encrypted Network Streams","authors":"M. Gebski, A. Penev, R. Wong","doi":"10.4018/978-1-60566-748-5.CH015","DOIUrl":"https://doi.org/10.4018/978-1-60566-748-5.CH015","url":null,"abstract":"","PeriodicalId":255230,"journal":{"name":"Complex Data Warehousing and Knowledge Discovery for Advanced Retrieval Development","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132884791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1900-01-01 DOI: 10.4018/978-1-60566-748-5.CH001
Todd Eavis
Over the past ten to fifteen years, data warehousing (DW) has become increasingly important to organizations of all sizes. In particular, the representation of historical data across broad time frames allows decision makers to monitor evolutionary patterns and trends that would simply not be possible with operational databases alone. However, this accumulation of historical data comes at a price; namely,
{"title":"The LBF R-Tree","authors":"Todd Eavis","doi":"10.4018/978-1-60566-748-5.CH001","DOIUrl":"https://doi.org/10.4018/978-1-60566-748-5.CH001","url":null,"abstract":"Over the past ten to fifteen years, data warehousing (DW) has become increasingly important to organizations of all sizes. In particular, the representation of historical data across broad time frames allows decision makers to monitor evolutionary patterns and trends that would simply not be possible with operational databases alone. However, this accumulation of historical data comes at a price; namely, ABStrAct","PeriodicalId":255230,"journal":{"name":"Complex Data Warehousing and Knowledge Discovery for Advanced Retrieval Development","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115693885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1900-01-01 DOI: 10.4018/978-1-60566-748-5.CH016
Rodrigo Salvador Monteiro, Geraldo Zimbrão, H. Schwarz, B. Mitschang, J. Souza
Calendar-based schemas (Li, Y. et al., 2001) (Ramaswamy, S. et al., 1998) were proposed as a semantically rich representation of time intervals and used to mine temporal association rules. An example of a calendar schema is (year, month, day, day_period), which defines a set of calendar patterns, such as every morning of January of 1999 (1999, January, *, morning) or every 16th day of January
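The calendar pattern (1999, January, *, morning) mentioned in the abstract can be illustrated with a small matcher. This is a minimal sketch that assumes `*` simply matches any value in its position. The names `WILDCARD` and `matches` are hypothetical and do not come from the chapter.

```python
from typing import Tuple

WILDCARD = "*"

# A pattern or a concrete time instant over the schema
# (year, month, day, day_period).
CalendarTuple = Tuple[object, object, object, object]


def matches(pattern: CalendarTuple, instant: CalendarTuple) -> bool:
    """Return True if every non-wildcard field of the pattern equals the instant's field."""
    return len(pattern) == len(instant) and all(
        p == WILDCARD or p == value for p, value in zip(pattern, instant)
    )


# "Every morning of January of 1999"
every_morning_jan_1999 = (1999, "January", WILDCARD, "morning")

in_pattern = matches(every_morning_jan_1999, (1999, "January", 16, "morning"))
wrong_month = matches(every_morning_jan_1999, (1999, "February", 16, "morning"))
wrong_period = matches(every_morning_jan_1999, (1999, "January", 3, "evening"))
```

Here `in_pattern` is true while `wrong_month` and `wrong_period` are false, since the wildcard only frees the day field.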
{"title":"Exploring Calendar-Based Pattern Mining in Data Streams","authors":"Rodrigo Salvador Monteiro, Geraldo Zimbrão, H. Schwarz, B. Mitschang, J. Souza","doi":"10.4018/978-1-60566-748-5.CH016","DOIUrl":"https://doi.org/10.4018/978-1-60566-748-5.CH016","url":null,"abstract":"Calendar-based schemas (Li, Y. et al., 2001) (Ramaswamy, S. et al., 1998) were proposed as a semantically rich representation of time intervals and used to mine temporal association rules. An example of a calendar schema is (year, month, day, day_period), which defines a set of calendar patterns, such as every morning of January of 1999 (1999, January, *, morning) or every 16th day of January ABStrAct","PeriodicalId":255230,"journal":{"name":"Complex Data Warehousing and Knowledge Discovery for Advanced Retrieval Development","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125487132","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1900-01-01 DOI: 10.4018/978-1-60566-748-5.CH005
Stefan Berger, M. Schrefl
{"title":"Federated Data Warehouses","authors":"Stefan Berger, M. Schrefl","doi":"10.4018/978-1-60566-748-5.CH005","DOIUrl":"https://doi.org/10.4018/978-1-60566-748-5.CH005","url":null,"abstract":"","PeriodicalId":255230,"journal":{"name":"Complex Data Warehousing and Knowledge Discovery for Advanced Retrieval Development","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128888827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1900-01-01 DOI: 10.4018/978-1-60566-748-5.CH006
Jérôme Cubillé, C. Derquenne, S. Goutier, F. Guisnel, Henri Klajnmic, V. Cariou
{"title":"Built-In Indicators to Support Business Intelligence in OLAP Databases","authors":"Jérôme Cubillé, C. Derquenne, S. Goutier, F. Guisnel, Henri Klajnmic, V. Cariou","doi":"10.4018/978-1-60566-748-5.CH006","DOIUrl":"https://doi.org/10.4018/978-1-60566-748-5.CH006","url":null,"abstract":"","PeriodicalId":255230,"journal":{"name":"Complex Data Warehousing and Knowledge Discovery for Advanced Retrieval Development","volume":"149 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133744647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1900-01-01 DOI: 10.4018/978-1-60566-748-5.CH009
Rogério Luís de Carvalho Costa, P. Furtado
{"title":"Deploying Data Warehouses in Grids with Efficiency and Availability","authors":"Rogério Luís de Carvalho Costa, P. Furtado","doi":"10.4018/978-1-60566-748-5.CH009","DOIUrl":"https://doi.org/10.4018/978-1-60566-748-5.CH009","url":null,"abstract":"","PeriodicalId":255230,"journal":{"name":"Complex Data Warehousing and Knowledge Discovery for Advanced Retrieval Development","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127710877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1900-01-01 DOI: 10.4018/978-1-60566-748-5.CH014
Jia-Ling Koh, Shu-Ning Shin, Yuan-Bin Don
Recently, the data stream, an unbounded sequence of data elements generated at a rapid rate, has provided a dynamic environment for collecting data sources. The knowledge embedded in a data stream is likely to change quickly over time. Therefore, capturing the recent trend of the data is an important issue when mining frequent itemsets over data streams. Although the sliding window model offers a good solution to this problem, the traditional approach has to maintain the occurrence information of patterns within a sliding window completely. To estimate the approximate supports of patterns within a sliding window, the frequency changing point (FCP) method is proposed for monitoring the recent occurrences of itemsets over a data stream. In addition to a basic design proposed under the assumption that exactly one transaction arrives at each time point, the FCP method is extended to maintain recent patterns over a data stream in which a block containing a varying number of transactions (zero or more) arrives within a fixed time unit. The recently frequent itemsets or representative patterns are then discovered approximately from the maintained structure. Experimental studies demonstrate that the proposed algorithms achieve high true positive rates and guarantee no false dismissals in the results. A theoretical analysis is provided for this guarantee. In addition, the authors’ approach outperforms the previously proposed method by significantly reducing run-time memory usage.
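For contrast with the approximate FCP method, the exact baseline the abstract calls the traditional approach (keeping every transaction of the window) might look like the sketch below. It is not the FCP method, whose details the abstract does not give. The class name `SlidingWindowSupport` is hypothetical, and one transaction per time point is assumed.

```python
from collections import deque
from typing import Deque, FrozenSet, Iterable


class SlidingWindowSupport:
    """Exact support counting over the most recent `window_size` transactions."""

    def __init__(self, window_size: int) -> None:
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        # deque(maxlen=...) drops the oldest transaction automatically.
        self.window: Deque[FrozenSet[str]] = deque(maxlen=window_size)

    def add(self, transaction: Iterable[str]) -> None:
        """Append one transaction; the oldest one falls out when the window is full."""
        self.window.append(frozenset(transaction))

    def support(self, itemset: Iterable[str]) -> float:
        """Fraction of transactions in the window that contain the whole itemset."""
        if not self.window:
            return 0.0
        target = frozenset(itemset)
        hits = sum(1 for transaction in self.window if target <= transaction)
        return hits / len(self.window)


stream = SlidingWindowSupport(window_size=3)
for transaction in (["a", "b"], ["a"], ["b", "c"], ["a", "b", "c"]):
    stream.add(transaction)

# The window now holds only the last three transactions.
support_a = stream.support(["a"])
support_ab = stream.support(["a", "b"])
```

Memory in this baseline grows with the window size and the transaction length, which is the cost an approximate summary structure aims to avoid.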
{"title":"An Approximate Approach for Maintaining Recent Occurrences of Itemsets in a Sliding Window over Data Streams","authors":"Jia-Ling Koh, Shu-Ning Shin, Yuan-Bin Don","doi":"10.4018/978-1-60566-748-5.CH014","DOIUrl":"https://doi.org/10.4018/978-1-60566-748-5.CH014","url":null,"abstract":"Recently, the data stream, which is an unbounded sequence of data elements generated at a rapid rate, provides a dynamic environment for collecting data sources. It is likely that the embedded knowledge in a data stream will change quickly as time goes by. Therefore, catching the recent trend of data is an important issue when mining frequent itemsets over data streams. Although the sliding window model proposed a good solution for this problem, the appearing information of patterns within a sliding window has to be maintained completely in the traditional approach. For estimating the approximate supports of patterns within a sliding window, the frequency changing point (FCP) method is proposed for monitoring the recent occurrences of itemsets over a data stream. In addition to a basic design proposed under the assumption that exact one transaction arrives at each time point, the FCP method is extended for maintaining recent patterns over a data stream where a block of various numbers of transactions (including zero or more transactions) is inputted within a fixed time unit. Accordingly, the recently frequent itemsets or representative patterns are discovered from the maintained structure approximately. Experimental studies demonstrate that the proposed algorithms achieve high true positive rates and guarantees no false dismissal to the results yielded. A theoretic analysis is provided for the guarantee. In addition, the authors’ approach outperforms the previously proposed method in terms of reducing the run-time memory usage significantly. 
DOI: 10.4018/978-1-60566-748-5.ch014","PeriodicalId":255230,"journal":{"name":"Complex Data Warehousing and Knowledge Discovery for Advanced Retrieval Development","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114630927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1900-01-01 DOI: 10.4018/978-1-60566-748-5.ch003
Songmei Yu, V. Atluri, Nabil R. Adam
This issue of Exceptional Children, we think, is exceptionally rich. There are articles examining important practices, policies, and research methods. The first three papers are about “SMART” research design, which stands for “sequential multiple-assignment randomized trial” and provides guidance about how to design analyses of multitiered programs. In our regular reports of research for this issue, we also have three articles. They examine preparing special education teachers, teaching of fractions, and policy for students with visual impairments. In the first article in the special series on SMART designs, Roberts and colleagues explain how SMART designs can be used to understand the contributions of multitiered models of intervention. As most special educators know, familiar models of instruction (e.g., response to instruction and positive behavior intervention systems) require repeated decisions about which students receive secondary or tertiary interventions. Roberts et al. illustrate how researchers can enhance the strength of studies examining those tiered systems. Kasari and colleagues report how they used a SMART design to study the acceptability and feasibility of social skills interventions for students with autism. Across more than 30 classrooms, the researchers found that both educators and parents had views about desirability, feasibility, and benefits of interventions implemented by both groups. In the third entry in the special series, Fluery and Towson describe how young children with autism started in a large-group dialogic reading intervention and then were given adaptive instruction based on their progress. Although teachers’ implementation of the system increased, there were minimal effects on engagement and growth on a vocabulary outcome. These results provide direction for educators examining both dialogic reading processes and tiered systems of instruction.
In our first article, not a part of the special section, Theobald and colleagues report results of a study of teacher education. They followed teacher education graduates who had preparation in special education to see how their career paths progressed. They found that whether the teachers were endorsed in both general and special education and whether they completed student teaching with a teacher who was endorsed in special education affected the chances that the teacher candidates would actually take positions teaching special education. Jayanthi and colleagues examined methods for teaching fractions to fifth graders who were struggling in mathematics. In a randomized control trial, they studied whether teaching concepts and procedures with an emphasis on manipulatives, number lines, and writing explanations led to greater proficiency and understanding of fractions. Schles and colleagues examined the provision of services for students with visual impairments. They found that states in the United States provided supports for more than 3 times as many students with
{"title":"Preview","authors":"Songmei Yu, V. Atluri, Nabil R. Adam","doi":"10.4018/978-1-60566-748-5.ch003","DOIUrl":"https://doi.org/10.4018/978-1-60566-748-5.ch003","url":null,"abstract":"This issue of Exception Children, we think, is exceptionally rich. There are articles examining important practices, policies, and research methods. The first three papers are about “SMART” research design, which stands for “sequential multiple-assignment randomized trial” and provides guidance about how to design analyses of multitiered programs. In our regular reports of research for this issue, we also have three articles. They examine preparing special education teachers, teaching of fractions, and policy for students with visual impairments. In the first article in the special series on SMART designs, Roberts and colleagues explain how SMART designs can be used to understand the contributions of multitiered models of intervention. As most special educators know, familiar models of instruction (e.g., response to instruction and positive behavior intervention systems) require repeated decisions about which students receive secondary or tertiary interventions. Roberts et al. illustrate how researchers can enhance the strength of studies examining those tiered systems. Kasari and colleagues report about how they used a SMART design to study the acceptability and feasibility of social skills interventions for students with autism. Across more than 30 classrooms, the researchers found that both educators and parents had views about desirability, feasibility, and benefits of interventions implemented by both groups. In the third entry in the special series, Fluery and Towson describe how young children with autism started in a large-group dialogic reading intervention and then were given adaptive instruction based on their progress. Although teachers’ implementation of the system increased, there were minimal effects on engagement and growth on a vocabulary outcome. 
These results provide direction for educators examining both dialogic reading processes and tiered systems of instruction. In our first article, not a part of the special section, Theobald and colleagues report results of a study of teacher education. They followed teacher education graduates who had preparation in special education to see how their career paths progressed. They found that whether the teachers were endorsed in both general and special education and whether they completed student teaching with a teacher who was endorsed in special education affected the chances that the teacher candidates would actually take positions teaching special education. Jayanthi and colleagues examined methods for teaching fractions to fifth graders who were struggling in mathematics. In a randomized control trial, they studied whether teaching concepts and procedures with an emphasis on manipulatives, number lines, and writing explanations led to greater proficiency and understanding of fractions. Schles and colleagues examined the provision of services for students with visual impairments. They found that states in the United States provided supports for more than 3 times as many students with","PeriodicalId":255230,"journal":{"name":"Complex Data Warehousing and Knowledge Discovery for Advanced Retrieval Development","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127149093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}