Recommendation engines have attracted great attention in the big data world. To promote the application of big data, Alibaba Group organized a big data recommendation competition, providing a big data processing platform and one billion behavior records to participants. The competition requires participants to learn a model from users' behaviors within one month and then predict purchase behavior on the following day. Four kinds of behaviors are included: browse, add-to-cart, collection, and purchase. The F1-score serves as the evaluation metric. Our team achieved the top score of 8.78%, and our success can be attributed to the following aspects. First, we model the recommendation problem as a binary classification problem and design a hierarchical model. Second, to improve the performance of a single classifier, we adopt a sample filtering strategy to select valuable samples for training, which not only boosts performance but also speeds up training. Third, a classifier fusion strategy is used to improve the final performance. This paper details our hierarchical model and some key technologies adopted for the competition. The hierarchical model also serves as the data processing framework, which is composed of four layers: 1) a sample filtering layer, which removes a large number of valueless samples and reduces computational complexity; 2) a feature extraction layer, which extracts extensive features so as to characterize the samples from all possible views; 3) a classifying layer, which trains several classifiers with different sampling strategies and feature groups; and 4) a fusion layer, which fuses the results of the different classifiers to obtain a better one. Our competition score demonstrates the reasonableness and feasibility of the model.
{"title":"The Hierarchical Model to Ali Mobile Recommendation Competition","authors":"Suchi Qian, Furong Peng, Xiang Li, Jianfeng Lu","doi":"10.1109/ICDMW.2015.75","DOIUrl":"https://doi.org/10.1109/ICDMW.2015.75","url":null,"abstract":"Recommendation Engines have gained the most attention in the Big Data world. In order to promote the application of big data, Alibaba Group organized the big data recommendation competition, which provides the big data processing platform and one billion behavior records to participants. The competition requires the participants to learn the model from the user's behaviors within one month and then predict the purchase behavior in the following day. There are four kinds of different behaviors included: browse, add-to-cart, collection and purchase. The F1-score is as the metric to evaluate the performance. Finally, our team achieves the top score of 8.78%, and our success can be owed to the following aspects: First, we model the recommendation problem as the binary classification problem and design the hierarchical model, Second, in order to improve performance of single classifier, we adopt the sample filtering strategy to select valuable samples for training, which not only boosts the performance but also speeds up the training, Third, the classifier fusion strategy is used to improve the final performance. This paper details our hierarchical model and some relevant key technologies adopted for this competition. 
This hierarchical model is also the framework of data processing, which is composed of four layers: 1) Sample filtering layer, which removes a large number of invaluable samples and reduces the computing complexity, 2) Feature extraction layer, which extracts extensive features so as to characterize the samples from all possible views, 3) Classifying layer, which trains several classifiers by different sampling strategy and feature groups, 4) Fusion layers, which fuses the results of different classifiers to obtain the better one. Our score in competition manifests the reasonableness and feasibility of our model.","PeriodicalId":192888,"journal":{"name":"2015 IEEE International Conference on Data Mining Workshop (ICDMW)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115939547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
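The F1-score used as the competition metric can be sketched generically as follows; this is a minimal illustration over sets of predicted and actual purchase records, not the organizers' official scoring code.

```python
def f1_score(predicted, actual):
    """F1 = harmonic mean of precision and recall over recommended vs. actual items."""
    predicted, actual = set(predicted), set(actual)
    if not predicted or not actual:
        return 0.0
    tp = len(predicted & actual)          # true positives: correctly predicted purchases
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)       # fraction of predictions that were right
    recall = tp / len(actual)             # fraction of actual purchases that were found
    return 2 * precision * recall / (precision + recall)
```

For example, predicting two items of which one was actually purchased gives precision 0.5 and recall 1.0, hence F1 = 2·0.5·1.0/1.5 ≈ 0.667.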
Social media offers a wealth of insight into how significant events -- such as the Great East Japan Earthquake, the Arab Spring, and the Boston Bombing -- affect individuals. The scale of available data, however, can be intimidating: during the Great East Japan Earthquake, over 8 million tweets were sent each day from Japan alone. Conventional word vector-based event-detection techniques for social media that use Latent Semantic Analysis, Latent Dirichlet Allocation, or graph community detection often cannot scale to such a large volume of data due to their space and time complexity. To alleviate this problem, we propose an efficient method for event detection by leveraging a fast feature selection algorithm called CWC. While we begin with word count vectors of authors and words for each time slot (in our case, every hour), we extract discriminative words from each slot using CWC, which vastly reduces the number of features to track. We then convert these word vectors into a time series of vector distances from the initial point. The distance between each time slot and the initial point remains high while an event is happening, yet declines sharply when the event ends, offering an accurate portrait of the span of an event. This method makes it possible to detect events from vast datasets. To demonstrate our method's effectiveness, we extract events from a dataset of over two hundred million tweets sent in the 21 days following the Great East Japan Earthquake. With CWC, we can identify events from this dataset with great speed and accuracy.
{"title":"Event Detection from Millions of Tweets Related to the Great East Japan Earthquake Using Feature Selection Technique","authors":"T. Hashimoto, D. Shepard, T. Kuboyama, Kilho Shin","doi":"10.1109/ICDMW.2015.248","DOIUrl":"https://doi.org/10.1109/ICDMW.2015.248","url":null,"abstract":"Social media offers a wealth of insight into how significant events -- such as the Great East Japan Earthquake, the Arab Spring, and the Boston Bombing -- affect individuals. The scale of available data, however, can be intimidating: during the Great East Japan Earthquake, over 8 million tweets were sent each day from Japan alone. Conventional word vector-based event-detection techniques for social media that use Latent Semantic Analysis, Latent Dirichlet Allocation, or graph community detection often cannot scale to such a large volume of data due to their space and time complexity. To alleviate this problem, we propose an efficient method for event detection by leveraging a fast feature selection algorithm called CWC. While we begin with word count vectors of authors and words for each time slot (in our case, every hour), we extract discriminative words from each slot using CWC, which vastly reduces the number of features to track. We then convert these word vectors into a time series of vector distances from the initial point. The distance between each time slot and the initial point remains high while an event is happening, yet declines sharply when the event ends, offering an accurate portrait of the span of an event. This method makes it possible to detect events from vast datasets. To demonstrate our method's effectiveness, we extract events from a dataset of over two hundred million tweets sent in the 21 days following the Great East Japan Earthquake. 
With CWC, we can identify events from this dataset with great speed and accuracy.","PeriodicalId":192888,"journal":{"name":"2015 IEEE International Conference on Data Mining Workshop (ICDMW)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126675733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
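The distance-from-initial-point time series described above can be sketched as follows; this is a minimal illustration that assumes the vocabulary has already been reduced by feature selection (the CWC algorithm itself is not reproduced here).

```python
import math

def distance_series(slot_vectors, vocab):
    """Euclidean distance of each time slot's word-count vector from the first slot.
    The distance stays high while an event is ongoing and drops sharply when it ends.
    slot_vectors: list of {word: count} dicts, one per time slot (e.g., per hour).
    vocab: the discriminative words retained after feature selection."""
    base = [slot_vectors[0].get(w, 0) for w in vocab]   # initial-point vector
    series = []
    for slot in slot_vectors:
        vec = [slot.get(w, 0) for w in vocab]
        series.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(vec, base))))
    return series
```

A burst of event-related words in a slot shows up as a spike relative to the quiet initial slot, so the span of an event can be read off as the interval where the series stays elevated.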
A shikake is a trigger for behavioral change to solve a problem. We propose a Shikake Data Market (SDM) platform that gives everyone an opportunity to implement a shikake even with restricted resources, such as ideas, expert knowledge and skill, practitioners, negotiators, and budget. As a preliminary case, we analyzed the collaborative creation at a shikake hackathon and found that collaboration among people with diverse expert backgrounds improves the quality of the output. Based on this result, we discuss collaborative shikake creation.
{"title":"Shikake Data Market for Collaborative Shikake Creation","authors":"N. Matsumura, Hideaki Takeda","doi":"10.1109/ICDMW.2015.130","DOIUrl":"https://doi.org/10.1109/ICDMW.2015.130","url":null,"abstract":"A shikake is a trigger for behavioral change to solve a problem. We propose a Shikake Data Market (SDM) platform for giving everyone an opportunity to implement a shikake with restricted resources, such as ideas, expert knowledge and skill, practitioners, negotiators, and budget. As a preliminary case, we analyzed the collaborative creation at a shikake hackathon and revealed that collaboration among people with diverse expert backgrounds would improve the quality of the output. Based on this result, we discuss collaborative shikake creation.","PeriodicalId":192888,"journal":{"name":"2015 IEEE International Conference on Data Mining Workshop (ICDMW)","volume":"136 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124031810","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mobility prediction has recently attracted plenty of attention, since it plays an important part in many applications ranging from urban planning and traffic forecasting to location-based services, including mobile recommendation and mobile advertisement. However, little work has exploited the activity information often associated with the trajectories on which prediction is based to assist location prediction. To this end, in this paper, we propose a Time-stamped Activity INference Enhanced Predictor (TAINEP) for forecasting the next location on activity trajectories. In TAINEP, we leverage topic models for dimension reduction so as to capture co-occurrences of different time-stamped activities. The model is then extended to incorporate temporal dependence between the topics of consecutive time-stamped activities, inferring which activity may be conducted at the next location and when it will happen. Based on the inferred time-stamped activities, a probabilistic mixture model is further put forward to integrate them with commonly used Markov predictors for forecasting the next locations. We finally evaluate the proposed model on two real-world datasets. The results show that the proposed method outperforms competing predictors that do not infer time-stamped activities. In other words, it lifts the predictability of human mobility.
{"title":"Lifting the Predictability of Human Mobility on Activity Trajectories","authors":"Xianming Li, Defu Lian, Xing Xie, Guangzhong Sun","doi":"10.1109/ICDMW.2015.164","DOIUrl":"https://doi.org/10.1109/ICDMW.2015.164","url":null,"abstract":"Mobility prediction has recently attracted plenty of attention since it plays an important part in many applications ranging from urban planning and traffic forecasting to location-based services, including mobile recommendation and mobile advertisement. However, there is little study on exploiting the activity information, being often associated with the trajectories on which prediction is based, for assisting location prediction. To this end, in this paper, we propose a Time-stamped Activity INference Enhanced Predictor (TAINEP) for forecasting next location on activity trajectories. In TAINEP, we propose to leverage topic models for dimension reduction so as to capture co-occurrences of different time-stamped activities. It is then extended to incorporate temporal dependence between topics of consecutive time-stamped activities to infer the activity which may be conducted at the next location and the time when it will happen. Based on the inferred time-stamped activities, a probabilistic mixture model is further put forward to integrate them with commonly-used Markov predictors for forecasting the next locations. We finally evaluate the proposed model on two real-world datasets. The results show that the proposed method outperforms the competing predictors without inferring time-stamped activities. 
In other words, it lifts the predictability of human mobility.","PeriodicalId":192888,"journal":{"name":"2015 IEEE International Conference on Data Mining Workshop (ICDMW)","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126867921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
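The commonly used Markov predictor that TAINEP integrates with activity inference can be sketched as a first-order transition model; this is a generic baseline, not the paper's full mixture model.

```python
from collections import defaultdict, Counter

def train_markov(trajectories):
    """First-order Markov model: counts of next location given the current one.
    trajectories: list of location sequences, e.g. [["home", "work", "gym"], ...]"""
    transitions = defaultdict(Counter)
    for traj in trajectories:
        for cur, nxt in zip(traj, traj[1:]):
            transitions[cur][nxt] += 1
    return transitions

def predict_next(transitions, current):
    """Predict the most frequent successor of the current location, or None if unseen."""
    if current not in transitions:
        return None
    return transitions[current].most_common(1)[0][0]
```

A mixture predictor in the spirit of the paper would weight this transition distribution together with a distribution implied by the inferred next time-stamped activity.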
In data mining, a common task is to search for the most appropriate algorithm(s) to retrieve important information from data. With an increasing number of available data mining techniques, it may be impractical to experiment with many of them on a specific dataset of interest to find the best algorithm(s). In this paper, we demonstrate the suitability of tree-based multi-variable linear regression for predicting algorithm performance. We take prior machine learning experience into account to construct meta-knowledge for supervised learning. The idea is to use summary knowledge about datasets, along with the past performance of algorithms on those datasets, to build this meta-knowledge. We augment pure statistical summaries with descriptive features and a misclassification cost, and discover that transformed datasets obtained by reducing a high-dimensional feature space to a smaller dimension still retain the significant characteristic knowledge necessary to predict algorithm performance. Our approach works well for both numerical and nominal data obtained from real-world environments.
{"title":"Selecting Machine Learning Algorithms Using Regression Models","authors":"Tri Doan, J. Kalita","doi":"10.1109/ICDMW.2015.43","DOIUrl":"https://doi.org/10.1109/ICDMW.2015.43","url":null,"abstract":"In performing data mining, a common task is to search for the most appropriate algorithm(s) to retrieve important information from data. With an increasing number of available data mining techniques, it may be impractical to experiment with many techniques on a specific dataset of interest to find the best algorithm(s). In this paper, we demonstrate the suitability of tree-based multi-variable linear regression in predicting algorithm performance. We take into account prior machine learning experience to construct meta-knowledge for supervised learning. The idea is to use summary knowledge about datasets along with past performance of algorithms on these datasets to build this meta-knowledge. We augment pure statistical summaries with descriptive features and a misclassification cost, and discover that transformed datasets obtained by reducing a high dimensional feature space to a smaller dimension still retain significant characteristic knowledge necessary to predict algorithm performance. Our approach works well for both numerical and nominal data obtained from real world environments.","PeriodicalId":192888,"journal":{"name":"2015 IEEE International Conference on Data Mining Workshop (ICDMW)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127942066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
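At the leaf of a tree-based multi-variable linear regression (e.g., an M5-style model tree), an ordinary least-squares model is fitted over dataset meta-features to predict an algorithm's performance. A minimal sketch of that leaf component follows; the meta-feature inputs are hypothetical, and the tree-splitting logic is omitted.

```python
def fit_linear(X, y):
    """Ordinary least squares with intercept via the normal equations (A^T A) w = A^T y,
    solved by Gaussian elimination with partial pivoting.
    X: rows of meta-features (e.g., [n_instances, n_classes]); y: observed performance."""
    n, d = len(X), len(X[0])
    A = [[1.0] + list(row) for row in X]                  # prepend intercept column
    m = d + 1
    ata = [[sum(A[k][i] * A[k][j] for k in range(n)) for j in range(m)] for i in range(m)]
    aty = [sum(A[k][i] * y[k] for k in range(n)) for i in range(m)]
    for col in range(m):                                  # forward elimination
        piv = max(range(col, m), key=lambda r: abs(ata[r][col]))
        ata[col], ata[piv] = ata[piv], ata[col]
        aty[col], aty[piv] = aty[piv], aty[col]
        for r in range(col + 1, m):
            f = ata[r][col] / ata[col][col]
            for c in range(col, m):
                ata[r][c] -= f * ata[col][c]
            aty[r] -= f * aty[col]
    w = [0.0] * m                                         # back substitution
    for r in range(m - 1, -1, -1):
        w[r] = (aty[r] - sum(ata[r][c] * w[c] for c in range(r + 1, m))) / ata[r][r]
    return w                                              # [intercept, coef_1, ..., coef_d]

def predict(w, x):
    """Predicted performance for one meta-feature vector."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
```

A model tree would partition the meta-feature space first and fit one such linear model per partition, which is what lets a single regressor rank candidate algorithms across heterogeneous datasets.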
Since the dissemination of the first beheading video by the Islamic State in Iraq and Levant (ISIL), of its hostage James Foley (an American journalist), this practice has become increasingly common. Videos of ISIL beheading their hostages in orange jumpsuits swarmed over social media as the group swept across Iraq. By showing such shocking videos and images, ISIL is able to spread its opinions and create emotional attitudes among its followers. Through a sophisticated social media strategy and strategic use of botnets, ISIL is succeeding in its propaganda dissemination. ISIL uses social media as a tool to conduct recruitment and radicalization campaigns and to raise funds. In this study, we examine the reasons for creating such videos, grounded in the literature from cultural anthropology, transnationalism and religious identity, and media and communication. To this end, we collect data from Twitter for the beheadings carried out by ISIL, especially those of the Egyptian Copts, the Arab-Israeli "Spy", and the Ethiopian Christians. The study provides insights into the way ISIL uses social media (especially Twitter) to disseminate propaganda, and develops a framework to identify sociotechnical behavioral patterns from a social and computational science perspective.
{"title":"Examining Botnet Behaviors for Propaganda Dissemination: A Case Study of ISIL's Beheading Videos-Based Propaganda","authors":"Samer Al-khateeb, Nitin Agarwal","doi":"10.1109/ICDMW.2015.41","DOIUrl":"https://doi.org/10.1109/ICDMW.2015.41","url":null,"abstract":"Since the dissemination of the first beheading video by the Islamic State in Iraq and Levant (ISIL) of its hostage James Foley (an American journalist), this practice has become increasingly common. Videos of ISIL beheading their hostages in orange jumpsuits swarmed over social media as they swept across Iraq. By showing such shocking videos and images, ISIL is able to spread their opinions and create emotional attitudes for their followers. Through a sophisticated social media strategy and strategic use of botnets, ISIL is succeeding in its propaganda dissemination. ISIL is using social media as a tool to conduct recruitment and radicalization campaigns and raise funds. In this study, we examine the reasons for creating such videos grounded in the literature from cultural anthropology, transnationalism and religious identity, and media & communication. Toward this direction, we collect data from Twitter for the beheadings done by ISIL, especially the Egyptian Copts, the Arab-Israeli \"Spy\", and the Ethiopian Christians. 
The study provides insights into the way ISIL uses social media (especially Twitter) to disseminate propaganda and develops a framework to identify sociotechnical behavioral patterns from a social and computational science perspective.","PeriodicalId":192888,"journal":{"name":"2015 IEEE International Conference on Data Mining Workshop (ICDMW)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128950816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In a number of domains, such as DNA microarray data analysis, we need to cluster the rows (genes) and columns (conditions) of a data matrix simultaneously, to identify groups of constant rows together with a group of columns. This kind of clustering is called biclustering. Biclustering algorithms are extensively used in DNA microarray data analysis, and more effective biclustering algorithms are highly desirable. We introduce a new algorithm, Enumerative Lattice (EnumLat), for biclustering binary microarray data. EnumLat adopts the approach of enumerating biclusters, extracting all biclusters of consistently good quality. Its main idea is the construction of a new tree structure to adequately represent the different biclusters discovered during the enumeration process. The algorithm adopts an all-biclusters-at-a-time strategy. The performance of the proposed algorithm is assessed using both synthetic and real DNA microarray data; it outperforms other biclustering algorithms for binary microarray data.
{"title":"An Enumerative Biclustering Algorithm for DNA Microarray Data","authors":"Haifa Ben Saber, M. Elloumi","doi":"10.1109/ICDMW.2015.168","DOIUrl":"https://doi.org/10.1109/ICDMW.2015.168","url":null,"abstract":"In a number of domains, like in DNA microarray data analysis, we need to cluster simultaneously rows (genes) and columns (conditions) of a data matrix to identify groups of constant rows with a group of columns. This kind of clustering is called biclustering. Biclustering algorithms are extensively used in DNA microarray data analysis. More effective biclustering algorithms are highly desirable and needed. We introduce a new algorithm called, Enumerative Lattice (EnumLat) for biclustering of binary microarray data. EnumLat is an algorithm adopting the approach of enumerating biclusters. This algorithm extracts all biclusters consistent good quality. The main idea of EnumLat is the construction of a new tree structure to represent adequately different biclusters discovered during the process of enumeration. This algorithm adopts the strategy of all biclusters at a time. The performance of the proposed algorithm is assessed using both synthetic and real DNA microarray data, our algorithm outperforms other biclustering algorithms for binary microarray data. 
Moreover, we test the biological significance using a gene annotation web tool to show that our proposed method is able to produce biologically relevant biclusters.","PeriodicalId":192888,"journal":{"name":"2015 IEEE International Conference on Data Mining Workshop (ICDMW)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127487192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
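A brute-force enumeration of all-ones biclusters in a binary matrix illustrates the task being solved; this sketch is exponential in the number of columns, and the EnumLat tree structure that makes enumeration practical is not reproduced here.

```python
from itertools import combinations

def enumerate_biclusters(matrix, min_rows=2, min_cols=2):
    """Enumerate all-ones biclusters of a binary matrix by scanning column subsets:
    for each subset of columns, keep the rows that are 1 on every column in it.
    A toy illustration only -- exponential in the number of columns."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    found = set()
    for k in range(min_cols, n_cols + 1):
        for cols in combinations(range(n_cols), k):
            rows = tuple(r for r in range(n_rows)
                         if all(matrix[r][c] for c in cols))
            if len(rows) >= min_rows:
                found.add((rows, cols))       # (row indices, column indices)
    return found
```

On a 3x3 binary matrix with two overlapping blocks of ones, this recovers both 2x2 biclusters; a practical algorithm must share work between overlapping column subsets, which is what the enumeration tree is for.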
Ahsanur Rahman, Steve T. K. Jan, Hyunju Kim, B. Prakash, T. Murali
Ensembles of graphs arise in several natural applications, such as mobility tracking, computational biology, social networks, and epidemiology. A common problem addressed by many existing mining techniques is to identify subgraphs of interest in these ensembles. In contrast, in this paper, we propose to quickly discover maximally variable regions of the graphs, i.e., sets of nodes that induce very different subgraphs across the ensemble. We first develop two intuitive and novel definitions of such node sets, which we then show can be efficiently enumerated using a level-wise algorithm. Finally, using extensive experiments on multiple real datasets, we show how these sets capture the main structural variations of the given set of networks and also provide us with interesting and relevant insights about these datasets.
{"title":"Mining Unstable Communities from Network Ensembles","authors":"Ahsanur Rahman, Steve T. K. Jan, Hyunju Kim, B. Prakash, T. Murali","doi":"10.1109/ICDMW.2015.87","DOIUrl":"https://doi.org/10.1109/ICDMW.2015.87","url":null,"abstract":"Ensembles of graphs arise in several natural applications, such as mobility tracking, computational biology, social networks, and epidemiology. A common problem addressed by many existing mining techniques is to identify subgraphs of interest in these ensembles. In contrast, in this paper, we propose to quickly discover maximally variable regions of the graphs, i.e., sets of nodes that induce very different subgraphs across the ensemble. We first develop two intuitive and novel definitions of such node sets, which we then show can be efficiently enumerated using a level-wise algorithm. Finally, using extensive experiments on multiple real datasets, we show how these sets capture the main structural variations of the given set of networks and also provide us with interesting and relevant insights about these datasets.","PeriodicalId":192888,"journal":{"name":"2015 IEEE International Conference on Data Mining Workshop (ICDMW)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130386655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
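One simple stand-in for "maximally variable regions" is to count how many distinct induced subgraphs a node set produces across the ensemble; this measure is an assumption for illustration, not necessarily either of the paper's two definitions.

```python
def induced_edges(edges, nodes):
    """Edges of one graph restricted to the given node set (frozenset, so it hashes)."""
    ns = set(nodes)
    return frozenset(e for e in edges if e[0] in ns and e[1] in ns)

def variability(ensemble, nodes):
    """Number of distinct induced subgraphs the node set yields across the ensemble.
    ensemble: list of edge sets, one per graph; nodes: candidate region."""
    return len({induced_edges(g, nodes) for g in ensemble})
```

A level-wise search in the spirit of the paper would grow candidate node sets one node at a time, pruning those whose variability cannot exceed a threshold.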
Collaborative filtering recommender systems (CFRSs) are critical components of popular e-commerce websites, making personalized recommendations. In practice, CFRSs are highly vulnerable to "shilling" or "profile injection" attacks due to their openness. A number of detection methods have been proposed to make CFRSs resistant to such attacks. However, some of them distinguish attackers using typical similarity metrics, which find it difficult to defend against all attackers and incur high computation time, although they can be effective at capturing the targeted attackers to some extent. In this paper, we propose an unsupervised method to detect such attacks. First, we filter out as many genuine users as possible by using suspected target items, in order to reduce time consumption. Based on the result of this first stage, we employ a new similarity metric to further filter out the remaining genuine users; it combines a traditional similarity metric with the linkage information between users to improve the accuracy of user similarity.
{"title":"Defending Suspected Users by Exploiting Specific Distance Metric in Collaborative Filtering Recommender Systems","authors":"Zhihai Yang, Zhongmin Cai","doi":"10.1109/ICDMW.2015.89","DOIUrl":"https://doi.org/10.1109/ICDMW.2015.89","url":null,"abstract":"Collaborative filtering recommender systems (CFRSs) are critical components of existing popular e-commerce websites to make personalized recommendations. In practice, CFRSs are highly vulnerable to \"shilling\" attacks or \"profile injection\" attacks due to its openness. A number of detection methods have been proposed to make CFRSs resistant to such attacks. However, some of them distinguished attackers by using typical similarity metrics, which are difficult to fully defend all attackers and show high computation time, although they can be effective to capture the concerned attackers in some extent. In this paper, we propose an unsupervised method to detect such attacks. Firstly, we filter out more genuine users by using suspected target items as far as possible in order to reduce time consumption. Based on the remained result of the first stage, we employ a new similarity metric to further filter out the remained genuine users, which combines the traditional similarity metric and the linkage information between users to improve the accuracy of similarity of users. 
Experimental results show that our proposed detection method is superior to the benchmarked method.","PeriodicalId":192888,"journal":{"name":"2015 IEEE International Conference on Data Mining Workshop (ICDMW)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116948244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
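The idea of blending a traditional similarity metric with linkage information between users can be sketched as follows; the weighting scheme and the Jaccard linkage term here are assumptions for illustration, not the authors' exact metric.

```python
import math

def pearson(u, v):
    """Pearson correlation over the co-rated items of two user rating dicts."""
    common = set(u) & set(v)
    if len(common) < 2:
        return 0.0
    mu = sum(u[i] for i in common) / len(common)
    mv = sum(v[i] for i in common) / len(common)
    num = sum((u[i] - mu) * (v[i] - mv) for i in common)
    den = (math.sqrt(sum((u[i] - mu) ** 2 for i in common)) *
           math.sqrt(sum((v[i] - mv) ** 2 for i in common)))
    return num / den if den else 0.0

def combined_similarity(u, v, alpha=0.5):
    """Blend rating correlation with linkage (overlap of rated items):
    shilling profiles often rate the same target items, so high linkage plus
    high correlation flags suspicious user pairs."""
    linkage = len(set(u) & set(v)) / len(set(u) | set(v))   # Jaccard of rated items
    return alpha * pearson(u, v) + (1 - alpha) * linkage
```

Two injected profiles that copy the same ratings score near 1.0, while users with disjoint rating histories score 0.0, which is the separation such a combined metric aims for.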
S. Ruggles, T. Kugler, Catherine A. Fitch, D. V. Riper
Terra Populus, part of the National Science Foundation's DataNet initiative, is developing organizational and technical infrastructure to integrate, preserve, and disseminate data describing changes in the human population and environment over time. A large number of high-quality environmental and population datasets are available, but they are widely dispersed, have incompatible or inadequate metadata, and use incompatible geographic identifiers. The new Terra Populus infrastructure enables researchers to identify and merge data from heterogeneous sources to study the relationships between human behavior and the natural world.
{"title":"Terra Populus: Integrated Data on Population and Environment","authors":"S. Ruggles, T. Kugler, Catherine A. Fitch, D. V. Riper","doi":"10.1109/ICDMW.2015.204","DOIUrl":"https://doi.org/10.1109/ICDMW.2015.204","url":null,"abstract":"Terra Populus, part of National Science Foundation's DataNet initiative, is developing organizational and technical infrastructure to integrate, preserve, and disseminate data describing changes in the human population and environment over time. A large number of high-quality environmental and population datasets are available, but they are widely dispersed, have incompatible or inadequate metadata, and have incompatible geographic identifiers. The new Terra Populus infrastructure enables researchers to identify and merge data from heterogeneous sources to study the relationships between human behavior and the natural world.","PeriodicalId":192888,"journal":{"name":"2015 IEEE International Conference on Data Mining Workshop (ICDMW)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132700032","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
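Merging heterogeneous sources on a shared geographic identifier, as described above, can be illustrated with a toy inner join; the field names (geoid, pop, temp) are hypothetical and do not reflect Terra Populus's actual schema.

```python
def merge_on_geoid(population, environment):
    """Inner join of two record lists on a common geographic identifier.
    Records whose identifier appears in only one source are dropped, which is
    why harmonizing incompatible identifiers matters before merging."""
    env_by_id = {rec["geoid"]: rec for rec in environment}
    merged = []
    for rec in population:
        env = env_by_id.get(rec["geoid"])
        if env is not None:
            # combine the two records, keeping a single copy of the join key
            merged.append({**rec, **{k: v for k, v in env.items() if k != "geoid"}})
    return merged
```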