
2015 IEEE International Conference on Data Mining Workshop (ICDMW): Latest Papers

Adslot Mining for Online Display Ads
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.82
Kazuki Taniguchi, Yuki Harada, Nguyen Tuan Duc
Finding appropriate adslots to display ads is an important step to achieve high conversion rates in online display advertising. Previous work on ad recommendation and conversion prediction often focuses on matching between adslots, users and ads simultaneously for each impression at micro level. Such methods require rich attributes of users, ads and adslots, which might not always be available, especially with ad-adslot pairs that have never been displayed. In this research, we propose a macro approach for mining new adslots for each ad by recommending appropriate adslots to the ad. The proposed method does not require any user information and can be pre-calculated offline, even when there are not any impressions of the ad on the target adslots. It applies matrix factorization techniques to the ad-adslot performance history matrix to calculate the predicted performance of the target adslots. Experiments show that the proposed method achieves a small root mean-square error (RMSE) when testing with offline data and it yields high conversion rates in online tests with real-world ad campaigns.
Citations: 2
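The Adslot Mining abstract above describes applying matrix factorization to an ad-adslot performance history matrix and scoring the fit with RMSE. The following is only a minimal sketch of that general idea, not the authors' implementation: the toy scores in `observed`, the SGD update rule, and all hyperparameters (`rank`, `lr`, `reg`, `epochs`) are assumptions chosen for illustration.

```python
import math
import random

# Hypothetical performance history: (ad, adslot) -> score in [0, 1].
# Cells that are absent were never displayed and must be predicted.
observed = {
    (0, 0): 0.9, (0, 1): 0.8, (0, 2): 0.1,
    (1, 0): 0.8, (1, 1): 0.9, (1, 3): 0.2,
    (2, 0): 0.1, (2, 2): 0.9, (2, 3): 0.8,
    (3, 1): 0.2, (3, 2): 0.8, (3, 3): 0.9,
}


def predict(P, Q, ad, slot):
    """Predicted performance is the dot product of the two latent vectors."""
    return sum(p * q for p, q in zip(P[ad], Q[slot]))


def factorize(cells, n_ads, n_slots, rank=2, lr=0.05, reg=0.02,
              epochs=2000, seed=0):
    """Fit ad factors P and adslot factors Q by SGD on observed cells only."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_ads)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_slots)]
    items = list(cells.items())
    for _ in range(epochs):
        rng.shuffle(items)
        for (ad, slot), value in items:
            err = value - predict(P, Q, ad, slot)
            for k in range(rank):
                p, q = P[ad][k], Q[slot][k]
                # Gradient step on squared error with L2 regularization.
                P[ad][k] += lr * (err * q - reg * p)
                Q[slot][k] += lr * (err * p - reg * q)
    return P, Q


def rmse(P, Q, cells):
    """Root mean-square error over the given (ad, adslot) cells."""
    total = sum((v - predict(P, Q, a, s)) ** 2 for (a, s), v in cells.items())
    return math.sqrt(total / len(cells))


P, Q = factorize(observed, n_ads=4, n_slots=4)
print("training RMSE:", round(rmse(P, Q, observed), 3))
print("predicted score for unseen pair (ad 0, adslot 3):",
      round(predict(P, Q, 0, 3), 3))
```

Because the factors are learned only from ad-adslot history, a score can be computed offline for any pair, including pairs with no impressions, which matches the setting the abstract describes. A real evaluation would hold out some observed cells and report RMSE on those rather than on the training cells.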
Prediction of Long-Lead Heavy Precipitation Events Aided by Machine Learning
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.218
Yahui Di
Long-lead prediction of heavy precipitation events has a significant impact since it can provide an early warning of disasters such as floods. However, the performance of existing prediction models has been constrained by the high-dimensional space and the non-linear relationships among variables. In this study, we approach the prediction problem from the perspective of machine learning. In our machine-learning framework for forecasting heavy precipitation events, we use global hydro-meteorological variables with spatial and temporal influences as features, and the target weather events that last several days are formulated as weather clusters. Our study has three phases: 1) identify weather clusters of different sizes, 2) handle the imbalance problem within the data, 3) select the most relevant features from the large feature space. We plan to evaluate our methods on several real-world data sets for predicting heavy precipitation events.
Citations: 2
Pedestrian Detection Using Privileged Information
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.70
Zhiquan Qi, Ying-jie Tian, Lingfeng Niu, Fan Meng, Limeng Cui, Yong Shi
Balancing speed and quality is always a challenging issue in pedestrian detection. In this paper, we introduce the Learning model Using Privileged Information (LUPI), which can accelerate the convergence rate of learning and effectively improve the quality without sacrificing the speed. In more detail, we give a clear definition of the privileged information for the pedestrian detection problem, which is available only at the training stage and never for the testing set, and show how much the privileged information helps the detector to improve the quality. All experimental results show the robustness and effectiveness of the proposed method and, at the same time, show that the privileged information offers a significant improvement.
Citations: 1
Temporal Topic Inference for Trend Prediction
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.214
S. Aghababaei, M. Makrehchi
Publicly available social data has been adopted widely to explore the language of crowds and leverage it in real-world problem prediction. In microblogs, users extensively share information about their moods, topics of interest, and social events, which provides an ideal data resource for many applications. We also study footprints of social problems in Twitter data. Hidden topics identified from Twitter content are utilized to predict crime trends. Since our problem has a sequential order, extracting meaningful patterns involves temporal analysis. The prediction model needs to address information evolution, in which data are more related when they are close in time than when they are further apart. The study is presented in two steps. First, a temporal topic detection model is introduced to infer predictive hidden topics. The model builds a dynamic vocabulary to detect emerged topics. Topics are compared over time to maintain diversity and novelty in each time period considered. Second, a predictive model is proposed which utilizes the identified temporal topics to predict the crime trend in a prospective timeframe. The model does not suffer from a lack of available learning examples. Learning examples are annotated with knowledge inferred from the trend. The experiments reveal that temporal topic detection outperforms static topic modeling when dealing with sequential data. Topics are more diverse when they are inferred in different time slices. In general, the results indicate that temporal topics have a strong correlation with crime index changes. Predictability is high for some specific crime types and can vary depending on the incidents. The study provides insight into the correlation between language and real-world problems and the impact of social data in providing predictive indicators.
Citations: 5
Alternating Direction Method of Multipliers for Nonparallel Support Vector Machines
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.77
Xin Shen, Lingfeng Niu, Ying-jie Tian, Yong Shi
Recently, a novel nonparallel support vector machine (NPSVM) was proposed by Tian et al., which has several attractive advantages over its predecessors. A sequential minimal optimization (SMO) algorithm has already been provided to solve the dual form of NPSVM. Unlike the existing work, we present a new strategy to solve the primal form of NPSVM in this paper. Our algorithm is designed in the framework of the alternating direction method of multipliers (ADMM), which is well suited to distributed convex optimization. Although the closed-form solution of each step can be written out directly, in order to handle problems with a very large number of features or training examples, we propose to solve the underlying linear equation systems proximally by the conjugate gradient method. Experiments are carried out on several data sets. Numerical results demonstrate the effectiveness of our method.
Citations: 1
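The NPSVM abstract above mentions solving linear equation systems with the conjugate gradient method instead of forming a closed-form solution. The abstract does not give the actual systems that arise in the ADMM steps, so the sketch below only illustrates the generic conjugate gradient iteration for a symmetric positive-definite system `A x = b`; the 2x2 matrix and the helper names `dot` and `matvec` are hypothetical.

```python
def dot(u, v):
    """Inner product of two vectors given as lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, v):
    """Matrix-vector product for a dense matrix stored as a list of rows."""
    return [dot(row, v) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive-definite A, starting from x = 0."""
    n = len(b)
    if max_iter is None:
        max_iter = 10 * n
    x = [0.0] * n
    r = list(b)          # residual b - A x, which equals b when x = 0
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break        # residual norm is small enough
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        # New direction is conjugate to the previous ones with respect to A.
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(A, b)
print(x)  # close to [1/11, 7/11]
```

The method only needs matrix-vector products, which is why it is attractive when the number of features or training examples makes direct factorization expensive. Stopping early (a larger `tol` or smaller `max_iter`) yields an inexact solution at lower cost.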
Unsupervised Measuring of Entity Resolution Consistency
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.162
Jeffrey Fisher, Qing Wang
Entity resolution (ER) is a common data cleaning and data-integration task that aims to determine which records in one or more data sets refer to the same real-world entities. In most cases no training data exists and the ER process involves considerable trial and error, with an often time-consuming manual evaluation required to determine whether the obtained results are good enough. We propose a method that makes use of transitive closure within triples of records to provide an early indication of inconsistency in an ER result in an unsupervised fashion. We test our approach on three real-world data sets with different similarity calculations and blocking approaches and show that our approach can detect problems with ER results early on without a manual evaluation.
Citations: 4
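The entity resolution abstract above relies on transitive closure within triples of records: if record a matches b and b matches c, then a should also match c. The abstract does not define the exact measure, so the sketch below is one plausible reading under stated assumptions: it counts triples with all three pairs matched as consistent and triples with exactly two pairs matched as inconsistent. The function name `triple_consistency` and the record identifiers are hypothetical.

```python
from itertools import combinations


def triple_consistency(records, matches):
    """Count consistent and inconsistent triples in a set of pairwise matches.

    A triple is consistent when all three of its pairs are matched, and
    inconsistent when exactly two are matched (transitivity is violated).
    Triples with zero or one matched pair carry no evidence either way.
    """
    matched = {frozenset(pair) for pair in matches}
    consistent = inconsistent = 0
    for a, b, c in combinations(records, 3):
        links = sum(
            frozenset(pair) in matched
            for pair in ((a, b), (b, c), (a, c))
        )
        if links == 3:
            consistent += 1
        elif links == 2:
            inconsistent += 1
    return consistent, inconsistent


records = ["r1", "r2", "r3", "r4"]
matches = [("r1", "r2"), ("r2", "r3"), ("r1", "r3"), ("r3", "r4")]
print(triple_consistency(records, matches))  # (1, 2)
```

In this toy input, r1, r2 and r3 form a fully matched triple, while r4 is linked only to r3, so the triples (r1, r3, r4) and (r2, r3, r4) each violate transitivity. Enumerating all triples is cubic in the number of records; in practice the enumeration would presumably be restricted to records within the same block.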
Enhancing Stock Price Prediction with a Hybrid Approach Based Extreme Learning Machine
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.74
Feng Wang, Yongquan Zhang, Hang Xiao, Li Kuang, Yi-Chang Lai
In this paper, we focus on how to design a methodology that improves prediction accuracy and speeds up the prediction process for stock market prediction. As market news and stock prices are commonly regarded as two important market data sources, we present the design of our stock price prediction model based on both data sources concurrently. First, in order to get the most significant features of the market news documents, we propose a new feature selection algorithm (NRDC), as well as a new feature weighting algorithm (N-TF-IDF), to help improve the prediction accuracy. Then we employ a fast learning model named Extreme Learning Machine (ELM) and use the kernel-based ELM (K-ELM) to improve the prediction speed. Comprehensive experimental comparisons between our hybrid proposal, K-ELM with NRDC and N-TF-IDF (N-N-K-ELM), and state-of-the-art learning algorithms, including Support Vector Machine (SVM) and Back-Propagation Neural Network (BP-NN), have been undertaken on the intra-day tick-by-tick data of the H-share market and contemporaneous news archives. Experimental results show that our N-N-K-ELM model achieves better performance in terms of both prediction accuracy and prediction speed in most cases.
Citations: 15
Deep Convolutional Neural Network and Multi-view Stacking Ensemble in Ali Mobile Recommendation Algorithm Competition: The Solution to the Winning of Ali Mobile Recommendation Algorithm
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.26
Xiang Li, Suchi Qian, Furong Peng, Jian Yang, Xiaolin Hu, Rui Xia
We proposed a deep Convolutional Neural Network (CNN) approach and a Multi-View Stacking Ensemble (MVSE) method in Season 1 and Season 2 of the Ali Mobile Recommendation Algorithm competition, respectively. Specifically, we treat the recommendation task as a classical binary classification problem. We designed a large number of indicative features based on the logic of mobile business, and grouped them into ten clusters according to their properties. In Season 1, a two-dimensional (2D) feature map covering both the time axis and the feature cluster axis was created from the original features. This design made it possible for the CNN to make predictions based on both the short-term actions and the long-term behavior habits of mobile users. Combined with some traditional ensemble methods, the CNN achieved good results and ranked No. 2 in Season 1. In Season 2, we proposed a Multi-View Stacking Ensemble (MVSE) method, which uses the stacking technique to efficiently combine different views of features. A classifier was first trained on each of the ten feature clusters. The predictions of the ten classifiers were then used as additional features. Based on the augmented features, an ensemble classifier was trained to generate the final prediction. We continuously updated our model by padding the new stacking features, and finally achieved an F1 score of 8.78%, which ranked No. 1 in Season 2 among over 7,000 teams in total.
Citations: 8
Scattering Decomposition for Massive Signal Classification: From Theory to Fast Algorithm and Implementation with Validation on International Bioacoustic Benchmark
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.127
Randall Balestriero, H. Glotin
With the computational power available today, machine learning is becoming a very active field that finds applications in our everyday life. One of its biggest challenges is the classification task involving data representation (the preprocessing part of a machine learning algorithm). In fact, classification of linearly separable data can be done easily. The aim of the preprocessing part is to obtain well-represented data by mapping raw data into a "feature space" where simple classifiers can be used efficiently. For example, almost everything around audio/bioacoustics has used MFCC features until now. We present here a toolbox giving the basic tools for audio representation in the C++ programming language by providing an implementation of the Scattering Network, which brings a new and powerful solution for these tasks. We focused our implementation on massive data sets and server applications. The reference toolkit in scattering analysis is SCATNET from Mallat et al. http://www.di.ens.fr/data/software/scatnet/. This tool is an attempt to make some of the scatnet features more tractable for Big Data challenges. Furthermore, the use of this toolbox is not limited to machine learning preprocessing. It can also be used for more advanced biological analysis such as animal communication behaviour analysis or any biological study related to signal analysis. This implementation provides out-of-the-box executables that can be run with simple commands without a graphical interface and is thus suited for server applications. As we will review in the next part, we will need to perform data manipulation on huge data sets. It becomes important to have fast and efficient implementations in order to deal with this new "Big Data" era.
Citations: 7
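The scattering entry above describes mapping raw audio into a feature space where simple classifiers work well. As a rough illustration only, the sketch below computes first-order, scattering-style features in plain Python: each band-pass filter is convolved with the signal, the complex modulus is taken, and the result is averaged over time. The function names, the Gabor-like filter, and the use of a global time average in place of a windowed low-pass filter are all assumptions made for this example; this is not the API of the C++ toolbox or of SCATNET, and it omits the higher-order cascade of a full scattering network.

```python
import cmath
import math


def morlet(center_freq, width, length):
    """Gabor-like band-pass filter: Gaussian envelope times a complex exponential.

    Hypothetical helper for this sketch; center_freq is in radians per sample.
    """
    half = length // 2
    taps = []
    for n in range(-half, half + 1):
        envelope = math.exp(-0.5 * (n / width) ** 2)
        taps.append(envelope * cmath.exp(1j * center_freq * n))
    norm = sum(abs(t) for t in taps)  # normalise so the peak gain is about 1
    return [t / norm for t in taps]


def modulus_of_convolution(signal, taps):
    """Return |signal * taps| with zero padding, same length as the input."""
    half = len(taps) // 2
    out = []
    for i in range(len(signal)):
        acc = 0j
        for k, tap in enumerate(taps):
            j = i - (k - half)
            if 0 <= j < len(signal):
                acc += signal[j] * tap
        out.append(abs(acc))
    return out


def first_order_scattering(signal, center_freqs, width=8.0, length=33):
    """One feature per filter: the time average of the wavelet modulus."""
    feats = []
    for freq in center_freqs:
        envelope = modulus_of_convolution(signal, morlet(freq, width, length))
        feats.append(sum(envelope) / len(envelope))
    return feats


# Example: a pure tone at 0.5 rad/sample excites the matching filter far more
# than a filter centred at 2.0 rad/sample.
tone = [math.sin(0.5 * n) for n in range(256)]
features = first_order_scattering(tone, [0.5, 2.0])
```

The modulus discards phase and the averaging discards exact timing, which is what makes such features stable to small shifts. The nested loops cost O(N·L) per filter, so a large-scale implementation would presumably use FFT-based convolution instead.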
Block-Organized Topology Visualization for Visual Exploration of Signed Networks
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.117
Xianlin Hu, Leting Wu, Aidong Lu, Xintao Wu
Many networks nowadays contain both positive and negative relationships, such as ratings and conflicts, which are often mixed together in network visualizations, both in the layouts of node-link diagrams and in the node indices of matrix representations. In this work, we present a visual analysis framework for visualizing signed networks by emphasizing the different effects of signed edges on network topologies. The theoretical foundation of the framework comes from the spectral analysis of data patterns in the high-dimensional spectral space. Based on the spectral analysis results, we present a block-organized visualization approach in a hybrid form of matrix, node-link, and arc diagrams, with a focus on revealing the topological structures of signed networks. We demonstrate with a detailed case study that block-organized visualization and spectral space exploration can be combined to analyze the topologies of signed networks effectively.
Citations: 1
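The signed-network entry above grounds its block organization in spectral analysis. The abstract does not say which matrix or which eigenvectors are used, so the following is only a minimal sketch under an assumption: nodes are placed by the leading eigenvector of the signed adjacency matrix, and the sign of each coordinate suggests a block. The helper names and the six-node toy network are hypothetical.

```python
import math


def two_faction_adjacency(groups):
    """Toy signed network: +1 edges inside a group, -1 edges across groups."""
    n = len(groups)
    return [
        [0 if i == j else (1 if groups[i] == groups[j] else -1) for j in range(n)]
        for i in range(n)
    ]


def leading_eigenvector(matrix, iterations=100):
    """Power iteration: unit-norm estimate of the dominant eigenvector.

    Assumes the dominant eigenvalue is positive and well separated, which
    holds for the toy network below but not for every signed graph.
    """
    n = len(matrix)
    vec = [1.0 / (i + 1) for i in range(n)]  # arbitrary non-degenerate start
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return vec


def split_by_sign(matrix):
    """Group node indices by the sign of their leading spectral coordinate."""
    vec = leading_eigenvector(matrix)
    positive = [i for i, x in enumerate(vec) if x >= 0]
    negative = [i for i, x in enumerate(vec) if x < 0]
    return positive, negative


# Example: nodes 0-2 and 3-5 form two mutually hostile factions.
adjacency = two_faction_adjacency([0, 0, 0, 1, 1, 1])
blocks = split_by_sign(adjacency)
```

Reordering the rows and columns of the adjacency matrix by these blocks puts the positive edges in the diagonal blocks and the negative edges in the off-diagonal ones, which is the kind of structure a block-organized matrix view is meant to expose. Real networks would generally need several eigenvectors and a clustering step rather than a single sign split.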