
2015 IEEE International Conference on Data Mining Workshop (ICDMW): Latest Papers

A Novel Approach for Generating Personalized Mention List on Micro-Blogging System
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.51
Ge Zhou, Lu Yu, Chuxu Zhang, Chuang Liu, Zi-Ke Zhang, Jianlin Zhang
Online social networks provide us with a convenient way to access information, which in turn brings the problem of information overload. Most previous work has focused on analyzing users' retweet behavior on micro-blogging systems, and diverse recommendation algorithms have been proposed to push personalized tweet lists to users. In this paper, we aim to solve the overload problem in the mention list. We first explore the in-depth differences between mention and retweet behaviors, and identify the various actions users take on a mention. We then propose a personalized ranking model that considers multi-dimensional relations among users and mention tweets to generate the personalized mention list. Experimental results on a micro-blogging data set show that the proposed method outperforms benchmark methods.
Citations: 13
Mining Unstable Communities from Network Ensembles
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.87
Ahsanur Rahman, Steve T. K. Jan, Hyunju Kim, B. Prakash, T. Murali
Ensembles of graphs arise in several natural applications, such as mobility tracking, computational biology, social networks, and epidemiology. A common problem addressed by many existing mining techniques is to identify subgraphs of interest in these ensembles. In contrast, in this paper, we propose to quickly discover maximally variable regions of the graphs, i.e., sets of nodes that induce very different subgraphs across the ensemble. We first develop two intuitive and novel definitions of such node sets, which we then show can be efficiently enumerated using a level-wise algorithm. Finally, using extensive experiments on multiple real datasets, we show how these sets capture the main structural variations of the given set of networks and also provide us with interesting and relevant insights about these datasets.
Citations: 1
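To give a concrete feel for "maximally variable" node sets, the sketch below scores a node set by how much the subgraph it induces changes across an ensemble. It is not either of the paper's two definitions (the abstract does not state them) and it does not reproduce the level-wise enumeration; the Jaccard-based score, the toy graphs, and all function names are illustrative assumptions.

```python
from itertools import combinations


def edge(u, v):
    """Undirected edge represented as a frozenset of its two endpoints."""
    return frozenset((u, v))


def induced_edges(edges, nodes):
    """Edges of one graph whose endpoints both lie in `nodes`."""
    return {e for e in edges if e <= nodes}


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty edge sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def variability(ensemble, nodes):
    """Hypothetical score (not the paper's): mean pairwise Jaccard distance
    between the subgraphs induced by `nodes` in each graph of the ensemble."""
    if len(ensemble) < 2:
        raise ValueError("need at least two graphs")
    nodes = frozenset(nodes)
    induced = [induced_edges(g, nodes) for g in ensemble]
    pairs = list(combinations(induced, 2))
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


# Toy ensemble of three graphs on nodes 1..4 (made-up data).
ensemble = [
    {edge(1, 2), edge(2, 3), edge(3, 4)},
    {edge(1, 2), edge(2, 3), edge(3, 4)},
    {edge(1, 3), edge(2, 4), edge(3, 4)},
]
print(variability(ensemble, {1, 2, 3}))  # rewired region: high score
print(variability(ensemble, {3, 4}))     # stable region: 0.0
```

A level-wise search would grow candidate node sets one node at a time; whether candidates can be pruned safely depends on how the score behaves as sets grow, which is why the paper's own definitions matter.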
Selecting Machine Learning Algorithms Using Regression Models
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.43
Tri Doan, J. Kalita
In performing data mining, a common task is to search for the most appropriate algorithm(s) to retrieve important information from data. With an increasing number of available data mining techniques, it may be impractical to experiment with many techniques on a specific dataset of interest to find the best algorithm(s). In this paper, we demonstrate the suitability of tree-based multi-variable linear regression in predicting algorithm performance. We take into account prior machine learning experience to construct meta-knowledge for supervised learning. The idea is to use summary knowledge about datasets along with past performance of algorithms on these datasets to build this meta-knowledge. We augment pure statistical summaries with descriptive features and a misclassification cost, and discover that transformed datasets obtained by reducing a high dimensional feature space to a smaller dimension still retain significant characteristic knowledge necessary to predict algorithm performance. Our approach works well for both numerical and nominal data obtained from real world environments.
Citations: 32
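The meta-learning idea can be sketched roughly as follows: describe each past dataset by a few meta-features, record how an algorithm performed on it, fit a regression from meta-features to performance, and use the fitted models to rank candidate algorithms on a new dataset. This sketch uses plain multi-variable least squares rather than the paper's tree-based variant; the meta-features, scores, and helper names are made up for illustration.

```python
def fit_ols(X, y):
    """Least-squares fit of y ~ b0 + b1*x1 + ... via the normal equations."""
    rows = [[1.0] + list(x) for x in X]  # prepend an intercept column
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


def predict(coef, x):
    """Evaluate a fitted linear model on one meta-feature vector."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def choose_algorithm(models, meta):
    """Pick the algorithm whose regression model predicts the best score."""
    return max(models, key=lambda name: predict(models[name], meta))


# Hypothetical meta-features per past dataset (x1, x2) and the observed
# accuracy y of one algorithm on each of those datasets.
X = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 4)]
y = [0.51, 0.58, 0.55, 0.64, 0.67]
coef = fit_ols(X, y)
print(coef)                   # close to [0.5, 0.05, -0.02]
print(predict(coef, (6, 2)))  # close to 0.76
```

In practice one such model would be fitted per candidate algorithm, and `choose_algorithm` compares their predictions for the new dataset's meta-features.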
Examining Botnet Behaviors for Propaganda Dissemination: A Case Study of ISIL's Beheading Videos-Based Propaganda
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.41
Samer Al-khateeb, Nitin Agarwal
Since the Islamic State of Iraq and the Levant (ISIL) disseminated its first beheading video, showing its hostage James Foley (an American journalist), this practice has become increasingly common. Videos of ISIL beheading hostages in orange jumpsuits swarmed over social media as the group swept across Iraq. By showing such shocking videos and images, ISIL is able to spread its opinions and shape emotional attitudes among its followers. Through a sophisticated social media strategy and strategic use of botnets, ISIL is succeeding in disseminating its propaganda. ISIL uses social media as a tool to conduct recruitment and radicalization campaigns and to raise funds. In this study, we examine the reasons for creating such videos, grounded in the literature from cultural anthropology, transnationalism and religious identity, and media & communication. To this end, we collect Twitter data on the beheadings carried out by ISIL, especially those of the Egyptian Copts, the Arab-Israeli "Spy", and the Ethiopian Christians. The study provides insights into the way ISIL uses social media (especially Twitter) to disseminate propaganda, and develops a framework to identify sociotechnical behavioral patterns from social and computational science perspectives.
Citations: 27
An Enumerative Biclustering Algorithm for DNA Microarray Data
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.168
Haifa Ben Saber, M. Elloumi
In a number of domains, such as DNA microarray data analysis, we need to cluster the rows (genes) and columns (conditions) of a data matrix simultaneously to identify groups of rows that are constant across a group of columns. This kind of clustering is called biclustering. Biclustering algorithms are extensively used in DNA microarray data analysis, and more effective biclustering algorithms are highly desirable. We introduce a new algorithm, called Enumerative Lattice (EnumLat), for biclustering binary microarray data. EnumLat adopts the approach of enumerating biclusters: it extracts all consistent biclusters of good quality. The main idea of EnumLat is the construction of a new tree structure that adequately represents the different biclusters discovered during enumeration. The algorithm follows an all-biclusters-at-a-time strategy. We assess the performance of the proposed algorithm on both synthetic and real DNA microarray data; on binary microarray data, our algorithm outperforms other biclustering algorithms. Moreover, we test biological significance using a gene annotation web tool, showing that the proposed method is able to produce biologically relevant biclusters.
Citations: 1
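To make "enumerating biclusters" concrete for binary data, the sketch below lists every maximal all-ones bicluster (equivalently, every formal concept with non-empty row and column sets) of a tiny 0/1 matrix by brute force over column subsets. It is exponential in the number of columns, does not use EnumLat's tree structure or its quality criteria, and the matrix is made up; it only shows what an all-biclusters-at-a-time enumeration has to produce.

```python
from itertools import combinations


def maximal_biclusters(matrix):
    """All maximal all-ones biclusters of a 0/1 matrix, as (rows, cols)
    pairs of frozensets. Brute force over column subsets: toy sizes only."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    found = set()
    for size in range(1, n_cols + 1):
        for cols in combinations(range(n_cols), size):
            # Rows that contain a 1 in every selected column.
            rows = frozenset(r for r in range(n_rows)
                             if all(matrix[r][c] for c in cols))
            if not rows:
                continue
            # Closure: every column that is 1 in all of those rows,
            # which makes the pair maximal in both directions.
            closed = frozenset(c for c in range(n_cols)
                               if all(matrix[r][c] for r in rows))
            found.add((rows, closed))
    return found


# Rows = genes, columns = conditions (made-up binary matrix).
M = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
]
biclusters = sorted(maximal_biclusters(M),
                    key=lambda b: (sorted(b[0]), sorted(b[1])))
for rows, cols in biclusters:
    print(sorted(rows), sorted(cols))
```

Because each column set is closed before it is stored, duplicates collapse in the set and only maximal biclusters are reported.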
Influence Visualization of Scientific Paper through Flow-Based Citation Network Summarization
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.105
Yue Su, Sibai Sun, Yuan Xuan, Lei Shi
This paper presents VEGAS, an online system that illustrates the influence of a scientific paper on citation networks through influence graph summarization and visualization. The system is built on an algorithm pipeline that maximizes the rate of influence flows in the final summarization. Both the visualization and the interaction designs are described with respect to a real usage scenario of the VEGAS system.
Citations: 1
Pruned Simple Model Sets for Fast Exact Recovery of Image
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.54
Basarab Matei, Younès Bennani
Image reconstruction can be defined as the general problem of estimating a two-dimensional object from a partial version of that object (a limited set of "projections"). In this paper, we propose a new approach to image reconstruction based on simple quasicrystals and L1 minimisation. We discuss the exact reconstruction of an image assumed to have small spectra. We show that simple model sets may be used as sampling sets for exact recovery. Moreover, exact recovery still holds after a finite number of points is eliminated from the simple model sets. This last property is very important for practical applications, e.g. lossy compression. We run our approach on benchmark image data sets and show that, as the dimension of the input image increases, quasicrystal sampling outperforms uniform random sampling in execution time.
Citations: 0
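A simple model set, in the usual cut-and-project sense, keeps those points of a lattice whose "internal" coordinate falls inside a window. The sketch below builds the textbook one-dimensional example, lattice Γ = {(m + n√2, m − n√2) : m, n integers} with window [−r, r], restricted to [0, length]. The choice of √2, the window, and the function name are assumptions; the paper's own sampling sets and its L1 recovery step are not reproduced here.

```python
import math


def simple_model_set(length, radius):
    """Points x = m + n*sqrt(2) in [0, length] whose internal coordinate
    m - n*sqrt(2) lies in the window [-radius, radius] (illustrative choice)."""
    s = math.sqrt(2)
    pts = []
    # x - y = 2*n*s with x in [0, length] and y in [-radius, radius],
    # so |n| <= (length + radius) / (2*s).
    n_max = int((length + radius) / (2 * s)) + 1
    for n in range(-n_max, n_max + 1):
        # For fixed n, m must satisfy both the window and the range constraint.
        lo = max(n * s - radius, -n * s)
        hi = min(n * s + radius, length - n * s)
        for m in range(math.ceil(lo), math.floor(hi) + 1):
            pts.append(m + n * s)
    return sorted(pts)


pts = simple_model_set(100, 1.0)
print(len(pts), pts[:4])
```

The average density is |window| / covolume(Γ) = 2r / (2√2) = r/√2, so for r = 1 about 70.7 points per 100 units are expected; this call returns 72. The points are uniformly discrete yet non-periodic, which is the property that makes such sets interesting as deterministic sampling sets.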
Unsupervised Learning Techniques for Detection of Regions of Interest in Solar Images
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.61
J. Banda, R. Angryk
Identifying regions of interest (ROIs) in images is a very active research problem, as it depends highly on the types and characteristics of the images. In this paper we present a comparative evaluation of unsupervised learning methods, in particular clustering, for identifying ROIs in solar images from the Solar Dynamics Observatory (SDO) mission. With the aim of finding regions within the solar images that contain potential solar phenomena, this work describes an automated, unsupervised methodology that reduces the image search space when looking for similar solar phenomena across multiple sets of images. By experimenting with multiple methods, we identify a successful approach to automatically detecting ROIs, enabling a more refined and robust search in the SDO Content-Based Image Retrieval (CBIR) system. We then present an extensive experimental evaluation to identify the best-performing parameters for our methodology in terms of overlap with expert-curated ROIs. Finally, we present an exhaustive evaluation of the proposed approach in several image retrieval scenarios, demonstrating that the performance of the identified ROIs is very similar to that of ROIs identified by dedicated science modules of the SDO mission.
Citations: 2
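As a minimal illustration of clustering-based ROI detection, the sketch below runs k-means (Lloyd's algorithm) with k = 2 on per-cell intensity values of a small image grid and returns the cells assigned to the brighter cluster. The grid, the single intensity feature, the deterministic initialisation, and k = 2 are all assumptions; the paper compares several clustering methods on SDO image features that the abstract does not detail.

```python
def kmeans_1d(values, k=2, iters=20):
    """Lloyd's k-means on scalar values; centroids start evenly spaced
    between the minimum and maximum value (deterministic init)."""
    lo, hi = min(values), max(values)
    cents = [lo + (hi - lo) * i / (k - 1) for i in range(k)] if k > 1 else [lo]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - cents[j]))
            groups[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        cents = [sum(g) / len(g) if g else cents[j]
                 for j, g in enumerate(groups)]
    return cents


def candidate_rois(grid, k=2):
    """(row, col) positions of cells whose nearest centroid is the brightest."""
    values = [v for row in grid for v in row]
    cents = kmeans_1d(values, k)
    brightest = max(range(k), key=lambda j: cents[j])
    return [(r, c)
            for r, row in enumerate(grid)
            for c, v in enumerate(row)
            if min(range(k), key=lambda j: abs(v - cents[j])) == brightest]


# Toy 3x3 grid of mean intensities per image cell (made-up numbers).
grid = [
    [10, 12, 11],
    [13, 90, 95],
    [11, 88, 12],
]
print(candidate_rois(grid))
```

Restricting later similarity search to the returned cells is one way such a step can shrink the search space, in the spirit of the methodology described above.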
Event Detection from Millions of Tweets Related to the Great East Japan Earthquake Using Feature Selection Technique
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.248
T. Hashimoto, D. Shepard, T. Kuboyama, Kilho Shin
Social media offers a wealth of insight into how significant events -- such as the Great East Japan Earthquake, the Arab Spring, and the Boston Bombing -- affect individuals. The scale of available data, however, can be intimidating: during the Great East Japan Earthquake, over 8 million tweets were sent each day from Japan alone. Conventional word vector-based event-detection techniques for social media that use Latent Semantic Analysis, Latent Dirichlet Allocation, or graph community detection often cannot scale to such a large volume of data due to their space and time complexity. To alleviate this problem, we propose an efficient method for event detection by leveraging a fast feature selection algorithm called CWC. While we begin with word count vectors of authors and words for each time slot (in our case, every hour), we extract discriminative words from each slot using CWC, which vastly reduces the number of features to track. We then convert these word vectors into a time series of vector distances from the initial point. The distance between each time slot and the initial point remains high while an event is happening, yet declines sharply when the event ends, offering an accurate portrait of the span of an event. This method makes it possible to detect events from vast datasets. To demonstrate our method's effectiveness, we extract events from a dataset of over two hundred million tweets sent in the 21 days following the Great East Japan Earthquake. With CWC, we can identify events from this dataset with great speed and accuracy.
Citations: 8
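The distance-series step described in the abstract can be sketched as follows. This assumes the discriminative vocabulary has already been selected (CWC itself is not implemented here), uses only word counts rather than author counts, and picks cosine distance, which the abstract does not specify. `event_spans` and its threshold are hypothetical helpers that read off the runs of slots where the distance stays high.

```python
import math
from collections import Counter


def to_vector(tokens, vocab):
    """Count how often each selected word occurs in one time slot."""
    counts = Counter(t for t in tokens if t in vocab)
    return [counts[w] for w in vocab]


def cosine_distance(u, v):
    """1 - cosine similarity; an all-zero vector counts as maximally distant."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - dot / (nu * nv)


def distance_series(slots, vocab):
    """Distance of every slot's word vector from the first (initial) slot."""
    vectors = [to_vector(s, vocab) for s in slots]
    return [cosine_distance(vectors[0], v) for v in vectors]


def event_spans(series, threshold):
    """Hypothetical helper: maximal runs of slots whose distance exceeds
    `threshold`, as inclusive (start, end) index pairs."""
    spans, start = [], None
    for i, d in enumerate(series):
        if d > threshold and start is None:
            start = i
        elif d <= threshold and start is not None:
            spans.append((start, i - 1))
            start = None
    if start is not None:
        spans.append((start, len(series) - 1))
    return spans


# Made-up hourly slots; `vocab` stands in for the words a selector kept.
vocab = ["lunch", "train", "quake", "tsunami"]
slots = [
    ["lunch", "train", "lunch"],             # baseline hour
    ["quake", "quake", "tsunami", "train"],  # event starts
    ["tsunami", "quake"],
    ["lunch", "train"],                      # back to normal
]
series = distance_series(slots, vocab)
print(event_spans(series, 0.5))  # → [(1, 2)]
```

Only the selected words enter the vectors, so the cost per slot scales with the reduced vocabulary rather than with the full word list, which is the point of running feature selection first.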
Lifting the Predictability of Human Mobility on Activity Trajectories
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.164
Xianming Li, Defu Lian, Xing Xie, Guangzhong Sun
Mobility prediction has recently attracted plenty of attention, since it plays an important part in many applications ranging from urban planning and traffic forecasting to location-based services, including mobile recommendation and mobile advertising. However, there has been little study of exploiting activity information, which is often associated with the trajectories on which prediction is based, to assist location prediction. To this end, in this paper we propose a Time-stamped Activity INference Enhanced Predictor (TAINEP) for forecasting the next location on activity trajectories. In TAINEP, we leverage topic models for dimension reduction so as to capture co-occurrences of different time-stamped activities. The model is then extended to incorporate temporal dependence between the topics of consecutive time-stamped activities, in order to infer the activity that may be conducted at the next location and the time when it will happen. Based on the inferred time-stamped activities, we further put forward a probabilistic mixture model that integrates them with commonly used Markov predictors to forecast the next location. We finally evaluate the proposed model on two real-world datasets. The results show that the proposed method outperforms competing predictors that do not infer time-stamped activities. In other words, it lifts the predictability of human mobility.
Citations: 8
Journal: 2015 IEEE International Conference on Data Mining Workshop (ICDMW)