Chihiro Hio, Luke Bermingham, Guochen Cai, Kyungmi Lee, Ickjai Lee
There is an increasing need for trajectory pattern mining as the volume of available trajectory data grows at an unprecedented rate with the aid of mobile sensing. Region-of-interest mining identifies interesting hot spots that reveal trajectory concentrations. This article introduces an efficient and effective grid-based region-of-interest mining method that is linear in the number of grid cells and is able to detect regions-of-interest of arbitrary shape. The proposed algorithm is robust, applicable to both continuous and discrete trajectories, and relatively insensitive to parameter values. Experiments show promising results that demonstrate the benefits of the proposed algorithm.
A Hybrid Grid-based Method for Mining Arbitrary Regions-of-Interest from Trajectories. Chihiro Hio, Luke Bermingham, Guochen Cai, Kyungmi Lee, Ickjai Lee. MLSDA '13, published 2013-12-02. https://doi.org/10.1145/2542652.2542653
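The grid-based idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' hybrid algorithm, whose details are not given here; it is a generic density-grid approach assumed for illustration: count trajectory points per cell, keep the cells whose count reaches a threshold, and merge adjacent dense cells so that regions can take arbitrary shapes. The names `mine_rois`, `cell_size` and `min_density` are hypothetical.

```python
from collections import Counter, deque


def mine_rois(points, cell_size, min_density):
    """Return a list of regions, each a set of (col, row) grid cells.

    Illustrative only: a plain density grid with flood-fill merging,
    not the method proposed in the paper.
    """
    # Count how many trajectory points fall into each grid cell.
    counts = Counter(
        (int(x // cell_size), int(y // cell_size)) for x, y in points
    )
    # A cell is "dense" when it holds at least min_density points.
    dense = {cell for cell, n in counts.items() if n >= min_density}

    regions, seen = [], set()
    for start in dense:
        if start in seen:
            continue
        # Flood fill over 4-connected dense neighbours.
        # Every dense cell is enqueued once, so the merge step is
        # linear in the number of dense cells.
        region, queue = set(), deque([start])
        seen.add(start)
        while queue:
            cx, cy = queue.popleft()
            region.add((cx, cy))
            for nb in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        regions.append(region)
    return regions
```

Because regions are built from connected cells rather than fitted shapes, an L-shaped or curved concentration of points comes out as a single region, at the cost of a resolution fixed by `cell_size`.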
Stream mining research has seen an impressive increase in the number of publications over the last few years. It borrows heavily from more established research fields in Machine Learning, especially from so-called online learning, as well as from time series analysis. It fuses ideas and methods from both of these fields and extends them in unique new ways. Stream mining needs to process potentially infinite streams of data, where the source that generates the data may change over time; in other words, the source is nonstationary. Most standard learning approaches assume a stationary data source. Data may also include categorical features, something time series analysis does not cope with well. In addition to being adapted continuously, models need to be able to predict at any time, and usually cannot afford to spend much time or memory on any single example. Polynomial behaviour is therefore not good enough; usually, logarithmic complexity per example is a strict upper limit on computational resources. The MOA (Massive Online Analysis) stream mining software suite was started in 2005, and the first open source release took place in 2007. In this talk I will first very briefly present MOA’s history, and then explain and discuss the challenges stream mining faces and how MOA tries to address them. Finally, I will focus on current shortcomings and suggest ways of addressing them. As this last part is the most useful one in terms of further research, I will briefly outline these points here.
The MOA Data Stream Mining Tool: A Mid-Term Report. B. Pfahringer. MLSDA '13, published 2013-12-02. https://doi.org/10.1145/2542652.2542660
Wireless sensor networks (WSNs) offer promising solutions for real-time object monitoring and tracking. An interesting application is train localization, in which anchor sensors are deployed along the railway track to detect the train and report in a timely manner to a gateway installed on the train. To save energy, anchor sensors operate on an asynchronous duty-cycling protocol. The accuracy of train localization depends heavily on the availability of anchor sensors when a train passes them, which in turn depends on the duty cycle. This paper presents an analysis of energy consumption with different levels of performance compromise. We evaluate the energy consumption through simulations, and results show that with a slight performance compromise on the number of active anchors, the lifetime of anchor sensors can be significantly extended.
Performance Analysis of Duty-Cycling Wireless Sensor Network for Train Localization. A. Javed, Haibo Zhang, Zhiyi Huang. MLSDA '13, published 2013-12-02. https://doi.org/10.1145/2542652.2542658
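The trade-off between anchor availability and lifetime described in the abstract above can be made concrete with a small back-of-the-envelope model. It is a sketch under strong assumptions that are not taken from the paper: each anchor is awake independently with probability equal to its duty cycle at the moment the train passes, and average power is a simple weighted mix of an active and a sleep power level. The paper itself evaluates energy through simulation; all function names and numbers below are hypothetical.

```python
from math import comb


def prob_at_least_k_awake(n, k, duty_cycle):
    """Probability that at least k of n independent anchors are awake.

    Assumes each anchor is awake with probability duty_cycle when the
    train passes (a binomial model, chosen here for illustration).
    """
    return sum(
        comb(n, i) * duty_cycle**i * (1 - duty_cycle) ** (n - i)
        for i in range(k, n + 1)
    )


def lifetime_hours(battery_mwh, duty_cycle, active_mw, sleep_mw):
    """Expected lifetime given average power draw under duty cycling."""
    average_mw = duty_cycle * active_mw + (1 - duty_cycle) * sleep_mw
    return battery_mwh / average_mw
```

Under this model, lowering the duty cycle lengthens lifetime roughly in proportion while the probability of having enough awake anchors falls more slowly when several anchors are in range, which is the kind of compromise the abstract refers to. The model ignores how long the train stays in range relative to the cycle period.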
An attempt was made to cluster the load profiles of a sample (n ≈ 380) of New Zealand households. An extensive range of approaches was evaluated, including the approach of clustering on "features" of the data rather than the raw data. A semi-automatic search of the problem space (cluster base, distance measure, cluster/partitioning method and k) resulted in a k = 3-cluster solution with acceptable quality indices and face validity. Although a particular combination of base, distance metric and clustering method was found to work well in this case, it is the practice of searching the problem space, rather than a particular solution, that is discussed and advocated.
Clustering Household Electricity Use Profiles. John R. Williams. MLSDA '13, published 2013-12-02. https://doi.org/10.1145/2542652.2542656
Association analysis is an important technique in data mining, and it has been widely used in many application areas [6]. However, associations found in data can be spurious and may not reflect the ‘true’ relationships between the variables under consideration. For example, hundreds or thousands of association rules can easily be generated even from a small data set, but most of them could be spurious and have no practical meaning [11, 21, 22]. This has hindered the application of association analysis to real-world problems. While the development of efficient techniques for finding association patterns in data, especially in large data sets, is well underway, the problem of identifying non-spurious associations has become prominent. Causal relationships reveal the real data-generating mechanisms and how the outcome would change when the cause is changed, so finding them has been the ultimate goal of many scientific explorations and social studies [18]. The gold standard for causal discovery is the randomised controlled trial (RCT) [4, 16]. However, an RCT is infeasible in many real-world applications, particularly in high-dimensional problems with a large number of potential causes. As part of the efforts on causal discovery, statisticians have studied various methods for testing a hypothetical causal relationship based on observational data [16]. However, these methods are designed for validating a known candidate causal relationship, and they too are incapable of dealing with a large number of potential causes. Although an association between two variables does not always imply causation, it is well known that associations are indicators of causal relationships [7]. Therefore a practical approach to causal discovery in large data sets could start with association analysis of the data. A question is then whether we can filter out associations that do not have causal indications. 
Note that this objective is different from that of mining interesting associations [9, 20] or discovering statistically sound associations [5, 21] because interestingness criteria do not measure causality and a test of statistical significance only determines if an association is due to random chance. We have integrated two statis-
From Association Analysis to Causal Discovery. Jiuyong Li. MLSDA '13, published 2013-12-02. https://doi.org/10.1145/2542652.2542659
Wireless Sensor Networks (WSNs) have found many practical applications in recent years. Apart from both the vast new opportunities and challenges raised by the availability of large amounts of sensory data, energy conservation remains a challenging research topic that demands intelligent solutions. Various data aggregation techniques have been proposed in the literature, but the optimal tradeoff between algorithm complexity and prediction ability remains elusive. In this paper we concentrate on employing a few light-weight time series estimation algorithms for online predictive sensing. A number of performance metrics are proposed and employed to examine the effectiveness of the scheme using real-world datasets.
Light-weight Online Predictive Data Aggregation for Wireless Sensor Networks. Jeremiah D. Deng, Yue Zhang. MLSDA '13, published 2013-12-02. https://doi.org/10.1145/2542652.2542657
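One common way to realise online predictive sensing of the kind described in the abstract above is a dual-prediction scheme: the sensor and the sink run the same cheap predictor, and the sensor transmits a reading only when it deviates from the shared prediction by more than a tolerance. The sketch below assumes simple exponential smoothing as the light-weight estimator; the abstract does not name the algorithms the authors use, so the predictor, the function name `predictive_sensing`, and the parameters `alpha` and `tolerance` are all illustrative assumptions.

```python
def predictive_sensing(readings, alpha=0.5, tolerance=0.5):
    """Simulate suppression of transmissions via a shared predictor.

    Returns (reconstructed, sent): the series as seen by the sink and a
    per-reading flag telling whether the sensor had to transmit.
    Exponential smoothing is an assumed stand-in for the paper's
    estimators.
    """
    estimate = None  # smoothed level known to both sensor and sink
    reconstructed, sent = [], []
    for value in readings:
        if estimate is None or abs(value - estimate) > tolerance:
            # Prediction missed: transmit, and both sides update the level.
            if estimate is None:
                estimate = value
            else:
                estimate = alpha * value + (1 - alpha) * estimate
            reconstructed.append(value)
            sent.append(True)
        else:
            # Prediction is close enough: stay silent, sink uses it as-is.
            reconstructed.append(estimate)
            sent.append(False)
    return reconstructed, sent
```

The per-reading cost is constant in time and memory, and the reconstruction error at the sink never exceeds `tolerance`; the fraction of `True` flags in `sent` is one simple performance metric for the energy saved on radio transmissions.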
Shellfish farms must be closed if there is suspected contamination during production to avoid serious health hazards. The authorities monitor a number of environmental and water quality variables through a set of sensors to check the health of shellfish farms and to decide on the closure of the farms. The research presented in this paper aims to develop an ensemble feature ranking algorithm to identify the cause of closure. We have presented and analysed the results obtained using the proposed algorithm to demonstrate its effectiveness.
Ensemble Feature Ranking for Shellfish Farm Closure Cause Identification. Ashfaqur Rahman, C. D'Este, John McCulloch. MLSDA '13, published 2013-12-02. https://doi.org/10.1145/2542652.2542655
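The abstract above does not spell out how the individual rankings are combined, so the following is only a generic sketch of ensemble feature ranking by mean-rank (Borda-style) aggregation, not the authors' algorithm. Each base ranker is assumed to supply a score per feature, with higher scores meaning more important; the function name `ensemble_rank` and the feature names in the usage note are hypothetical.

```python
def ensemble_rank(score_tables):
    """Combine several feature scorings into one ranking by mean rank.

    score_tables: list of dicts mapping feature name -> importance score
    (higher is more important). All dicts must share the same features.
    Returns a list of (feature, mean_rank) sorted from most to least
    important; rank 1 is the top position.
    """
    features = list(score_tables[0])
    total_rank = {f: 0 for f in features}
    for scores in score_tables:
        # Order features by this ranker's score, best first.
        ordered = sorted(features, key=lambda f: scores[f], reverse=True)
        for rank, feature in enumerate(ordered, start=1):
            total_rank[feature] += rank
    n_rankers = len(score_tables)
    return sorted(
        ((f, total_rank[f] / n_rankers) for f in features),
        key=lambda pair: pair[1],
    )
```

For instance, given three made-up score tables over `rainfall`, `salinity` and `temperature`, the feature that is ranked near the top by most rankers ends up first even if one ranker disagrees. Averaging ranks rather than raw scores avoids having to put differently scaled importance measures on a common scale.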
The acquisition of huge volumes of sensor data has led to the advent of the smart field phenomenon in the petroleum industry. A lot of data is acquired during drilling and production processes through logging tools equipped with sub-surface/down-hole sensors. Reservoir modeling has advanced from the use of empirical equations through statistical regression tools to the present embrace of Artificial Intelligence (AI) and its hybrid techniques. Due to the high dimensionality and heterogeneity of the sensor data, the capability of conventional AI techniques has become limited, as they cannot handle more than one hypothesis at a time. Ensemble learning methods can combine several hypotheses to evolve a single ensemble solution to a problem. Despite their popularity, especially in petroleum engineering, Artificial Neural Networks (ANNs) pose a number of challenges. One such challenge is the difficulty of determining the most suitable learning algorithm for optimal model performance. To save the cost, effort and time involved in the use of trial-and-error and evolutionary methods, this paper presents an ensemble model of ANNs that combines the diverse performances of seven "weak" learning algorithms to evolve an ensemble solution for the prediction of porosity and permeability of petroleum reservoirs. When compared to the individual ANN, ANN-bagging and RandomForest, the proposed model performed best. This further confirms the great opportunities for ensemble modeling in petroleum reservoir characterization and other petroleum engineering problems.
Predicting Petroleum Reservoir Properties from Downhole Sensor Data using an Ensemble Model of Neural Networks. Anifowose Fatai, J. Labadin, A. Raheem. MLSDA '13, published 2013-12-02. https://doi.org/10.1145/2542652.2542654