
Proceedings of the 3rd IKDD Conference on Data Science, 2016: Latest Articles

Query Classification using LDA Topic Model and Sparse Representation Based Classifier
Pub Date : 2016-03-13 DOI: 10.1145/2888451.2888474
Indrani Bhattacharya, J. Sil
Users often seek information by submitting a query whose keywords may belong to multiple topics, representing overlapping concepts. The objective of this work is to classify the query into a topic class label by considering how the query keywords are distributed over the various topics. The approach effectively reduces the search space so that information can be retrieved in a computationally efficient way. First, we apply Latent Dirichlet Allocation (LDA) to the entire corpus to group the documents into topics consisting of unique words. Next, a term vocabulary (TRV) is built from the unique words present in the topics. We develop a Topic-Vocabulary Matrix (TVM) by encoding the TRV with respect to each topic. The TVM expresses the word distribution among the topics and is presented as the training data set, which is sparse. The query is encoded in the same way and submitted as test data. We apply a sparse representation based classifier (SRC) to classify the query as a topic. The proposed approach achieves 93% accuracy in classifying queries.
Citations: 4
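The encoding pipeline described in the abstract above (topics, term vocabulary, Topic-Vocabulary Matrix, encoded query) can be sketched roughly as follows. This is only an illustration under stated assumptions: the topic word lists are hypothetical, the binary encoding is an assumed scheme, and a simple single-column least-squares residual stands in for the ℓ1 sparse-coding step of a real SRC.

```python
# Toy topics as they might come out of LDA (hypothetical words, not from the paper).
topics = {
    "sports": ["match", "team", "score", "player"],
    "finance": ["stock", "market", "score", "bank"],
    "health": ["doctor", "virus", "player", "diet"],
}

# Term vocabulary (TRV): the unique words present in the topics, in a fixed order.
vocabulary = sorted({word for words in topics.values() for word in words})

def encode(words):
    """Binary encoding of a bag of words against the vocabulary (an assumed scheme)."""
    present = set(words)
    return [1.0 if term in present else 0.0 for term in vocabulary]

# Topic-Vocabulary Matrix (TVM): one encoded column per topic.
tvm = {name: encode(words) for name, words in topics.items()}

def residual(query_vec, topic_vec):
    """Squared residual after the best least-squares fit of the query by one topic column."""
    dot = sum(q * t for q, t in zip(query_vec, topic_vec))
    norm_sq = sum(t * t for t in topic_vec)
    coeff = dot / norm_sq if norm_sq else 0.0
    return sum((q - coeff * t) ** 2 for q, t in zip(query_vec, topic_vec))

def classify(query_words):
    """Return the topic whose column reconstructs the encoded query with the smallest residual."""
    q = encode(query_words)
    return min(tvm, key=lambda name: residual(q, tvm[name]))

print(classify(["stock", "bank", "score"]))
```

Note that "score" appears in two topics, which mirrors the overlapping-keyword situation the abstract mentions; the residual comparison still resolves the query to a single label.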
Exploiting Local and Global Context In PPI networks For Efficient Protein Function Prediction
Pub Date : 2016-03-13 DOI: 10.1145/2888451.2888461
D. S. Kumar, Siddharth Goyal, V. Reddy, Ramesh Loganathan
Protein-protein interaction (PPI) networks are a valuable biological data source containing rich information useful for protein function prediction. PPI network data obtained from high-throughput experiments are known to be noisy and incomplete. In the literature, common-neighbor, clustering, and classification-based approaches have been proposed to improve the performance of protein function prediction by modeling PPI data as a graph. These approaches exploit the fact that a protein shares functions with the proteins that directly interact with it. In this paper, we experiment with an alternative approach that exploits the notion that two proteins share a function if they have a well-defined group of directly or indirectly connected common neighbors. Experiments conducted on a variety of PPI network datasets show that the proposed approach improves protein function prediction accuracy over existing approaches.
Citations: 0
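The common-neighbor idea in the PPI abstract above can be illustrated with a small sketch. The network, the annotations, the Jaccard score, and the weighted vote are all hypothetical choices made for illustration; the sketch uses direct neighbors only and does not reproduce the paper's treatment of indirectly connected neighbors.

```python
from collections import Counter

# Hypothetical toy PPI network as an undirected adjacency map.
ppi = {
    "P1": {"P3", "P4", "P5"},
    "P2": {"P3", "P4", "P6"},
    "P3": {"P1", "P2"},
    "P4": {"P1", "P2"},
    "P5": {"P1"},
    "P6": {"P2"},
}
# Hypothetical known annotations.
known_functions = {"P2": {"kinase"}, "P5": {"transport"}, "P6": {"kinase"}}

def common_neighbor_score(a, b):
    """Jaccard overlap of the direct neighbour sets of proteins a and b."""
    na, nb = ppi[a] - {b}, ppi[b] - {a}
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0

def predict_function(protein):
    """Vote for functions of annotated proteins, weighted by shared neighbours."""
    votes = Counter()
    for other, functions in known_functions.items():
        if other == protein:
            continue
        weight = common_neighbor_score(protein, other)
        for function in functions:
            votes[function] += weight
    return votes.most_common(1)[0][0] if votes else None

print(predict_function("P1"))
```

In this toy network P1 interacts directly with P5 but shares two neighbors with P2, so the common-neighbor vote follows P2's annotation rather than that of the direct interaction partner.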
Modeling Spatio-temporal Change Pattern using Mathematical Morphology
Pub Date : 2016-03-13 DOI: 10.1145/2888451.2888458
Monidipa Das, S. Ghosh
Detection and assessment of spatio-temporal change patterns is a challenging task that can provide insights in applications such as urban sprawl monitoring and surveillance of epidemics caused by infectious diseases. Existing spatio-temporal pattern mining techniques mostly deal with the assessment of thematic change patterns. However, analyzing the spatio-temporal pattern of geometric changes is also important for understanding such spatial changes on a temporal scale. This paper presents a novel framework for modeling spatio-temporal change in geometry with the help of mathematical morphology and directional granulometric analysis. Morphological operators are used to detect the various spatio-temporal change patterns in geometry, such as spatial growth (due to Expansion and Merge) and spatial shrinkage (due to Contraction and Split). Further, the temporal changes in the orientations of these patterns are modeled by performing granulometric analyses on them. The proposed framework has been validated on four cases of spatio-temporal change, namely (i) spatial expansion, (ii) spatial contraction, (iii) spatial merge, and (iv) spatial split, in the regional distribution of climate zones in Australia.
Citations: 2
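As a rough illustration of the morphological operators mentioned in the abstract above, the sketch below implements binary dilation and erosion on sets of grid cells and separates the change between two snapshots into grown and shrunk cells. The square structuring element, the toy zone, and the set-difference summary are assumptions for illustration, not the paper's framework, and the directional granulometric analysis is not covered.

```python
def dilate(region, radius=1):
    """Binary dilation of a set of (row, col) cells by a square structuring element."""
    return {(r + dr, c + dc)
            for r, c in region
            for dr in range(-radius, radius + 1)
            for dc in range(-radius, radius + 1)}

def erode(region, radius=1):
    """Binary erosion: keep cells whose whole square neighbourhood lies in the region."""
    return {(r, c) for r, c in region
            if all((r + dr, c + dc) in region
                   for dr in range(-radius, radius + 1)
                   for dc in range(-radius, radius + 1))}

def change_summary(before, after):
    """Split the change between two snapshots into grown and shrunk cells."""
    return {"growth": after - before, "shrinkage": before - after}

# Hypothetical snapshots: a 3x3 zone that expands by one cell in every direction.
zone_t0 = {(r, c) for r in range(2, 5) for c in range(2, 5)}
zone_t1 = dilate(zone_t0)

summary = change_summary(zone_t0, zone_t1)
print(len(summary["growth"]), len(summary["shrinkage"]))  # prints 16 0
```

Eroding the later snapshot recovers the earlier one here, which is one simple way to recognise a pure expansion in this toy setting.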
Learning transition models of biological regulatory and signaling networks from noisy data
Pub Date : 2016-03-13 DOI: 10.1145/2888451.2888469
Deepika Vatsa, Sumeet Agarwal, A. Srinivasan
In this paper, we present an extended 2-step probabilistic LGTS (PLGTS) transition system that aims to identify the network structure and stochastic nature of biological processes from time series data. This work is a step towards system identification in a noisy environment using transition systems. Here, noise refers to noise in the transitions between states in the observed data. Interestingly, noise in the data assists system identification. Experimental results on synthetic data show that noise helps in understanding the system dynamics and in constraining the solution space, thus helping to identify the most probable network structure for a given data set.
Citations: 1
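To make "noise in transitions between states" in the abstract above concrete, here is a minimal sketch that estimates empirical transition probabilities from observed state sequences. It is not the PLGTS algorithm; the two-gene Boolean states and the trajectories are hypothetical, and simple frequency counting is an assumed stand-in for the paper's learning procedure.

```python
from collections import Counter, defaultdict

def transition_probabilities(trajectories):
    """Estimate P(next state | current state) from observed state sequences."""
    counts = defaultdict(Counter)
    for states in trajectories:
        for current, nxt in zip(states, states[1:]):
            counts[current][nxt] += 1
    return {current: {nxt: n / sum(successors.values())
                      for nxt, n in successors.items()}
            for current, successors in counts.items()}

# Hypothetical Boolean states of a two-gene network, written as bit strings.
observed = [
    ["00", "10", "11", "01", "00"],
    ["00", "10", "11", "01", "00"],
    ["00", "10", "01", "00"],  # a noisy run that skips state "11"
]
probs = transition_probabilities(observed)
print(probs["10"])
```

A state with more than one observed successor, such as "10" here, is where a probabilistic transition model carries more information than a deterministic one.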
Scalable Quick Reduct Algorithm: Iterative MapReduce Approach
Pub Date : 2016-03-13 DOI: 10.1145/2888451.2888476
P. Singh, P. Prasad
Feature selection by reduct computation is the key technique for knowledge acquisition using rough set theory. Existing MapReduce-based reduct algorithms use the Hadoop MapReduce framework, which is not well suited to iterative algorithms. This paper aims to design and implement an iterative MapReduce-based Quick Reduct algorithm using the Twister framework. The proposed In_MRQRA algorithm performs partial granular-level computations at the mappers and granular computations at the reducer. Experimental analysis on the KDD-Cup99 dataset empirically established the relevance of the proposed approach.
Citations: 9
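For readers unfamiliar with reduct computation, the following is a minimal single-machine sketch of the classic greedy Quick Reduct procedure based on the rough-set dependency degree. It is not the In_MRQRA algorithm and involves no MapReduce or Twister code; the decision table and the function names are hypothetical.

```python
from collections import defaultdict

def dependency(rows, attrs, decision):
    """Rough-set dependency degree: fraction of rows in the positive region of attrs."""
    if not rows:
        return 0.0
    blocks = defaultdict(list)
    for row in rows:
        # Rows with equal values on attrs fall into the same equivalence class.
        blocks[tuple(row[a] for a in attrs)].append(row[decision])
    consistent = sum(len(d) for d in blocks.values() if len(set(d)) == 1)
    return consistent / len(rows)

def quick_reduct(rows, conditions, decision):
    """Greedy forward selection until the subset matches the full dependency degree."""
    target = dependency(rows, conditions, decision)
    reduct = []
    while dependency(rows, reduct, decision) < target:
        best = max((a for a in conditions if a not in reduct),
                   key=lambda a: dependency(rows, reduct + [a], decision))
        reduct.append(best)
    return reduct

# Hypothetical decision table in which attribute "b" alone determines "d".
table = [
    {"a": 0, "b": 0, "c": 1, "d": "no"},
    {"a": 0, "b": 1, "c": 1, "d": "yes"},
    {"a": 1, "b": 0, "c": 0, "d": "no"},
    {"a": 1, "b": 1, "c": 0, "d": "yes"},
]
print(quick_reduct(table, ["a", "b", "c"], "d"))
```

Each pass of the loop re-scans the whole table once per candidate attribute, which is the kind of repeated iteration over the data that the abstract says plain Hadoop MapReduce handles poorly.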
Weighted Linear Loss Twin Support Vector Clustering
Pub Date : 2016-03-13 DOI: 10.1145/2888451.2888467
Reshma Khemchandani, Aman Pal
Traditional point-based clustering methods such as k-means [1] and k-median [2] work by partitioning the data into clusters based on cluster prototype points. These methods perform poorly when the data are not distributed around several cluster points. In contrast, plane-based clustering methods such as k-plane clustering [3] and local k-proximal plane clustering [4] have been proposed in the literature. These methods calculate k cluster center planes and partition the data into k clusters according to the proximity of the data points to these k planes. Working along the lines of [5], in this paper we present Weighted Linear Loss Twin Support Vector Clustering, termed WLL-TWSVC, for clustering problems. Introducing the weighted linear loss into the formulation of TWSVC leads to solving a system of linear equations at lower computational cost, as opposed to solving a series of quadratic programming problems along with a system of linear equations as in TWSVC. We also introduce a regularization term in the objective function, which takes care of the structural risk component along with the empirical risk.
Citations: 5
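The assignment step shared by plane-based clustering methods, as described in the abstract above, can be sketched as follows: each point is labelled with the nearest of k cluster center planes. The planes and points are hypothetical, and the step that updates the planes (where WLL-TWSVC solves its linear systems) is deliberately left out.

```python
import math

def distance_to_plane(point, w, b):
    """Perpendicular distance from point to the hyperplane w . x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, point)) + b) / norm

def assign_to_planes(points, planes):
    """Label each point with the index of its nearest cluster-centre plane."""
    return [min(range(len(planes)),
                key=lambda k: distance_to_plane(p, *planes[k]))
            for p in points]

# Hypothetical 2-D data lying near two lines: y = x and y = -x + 4.
planes = [((1.0, -1.0), 0.0), ((1.0, 1.0), -4.0)]
points = [(0.0, 0.1), (3.0, 2.9), (0.0, 4.1), (4.0, 0.2)]
labels = assign_to_planes(points, planes)
print(labels)
```

The first two points are far apart in Euclidean terms yet share a cluster because both lie close to the same line, which is the situation where prototype-point methods tend to struggle.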
Investigating the Potential of Aggregated Tweets as Surrogate Data for Forecasting Civil Protests
Pub Date : 2016-03-13 DOI: 10.1145/2888451.2888466
Swati Agarwal, A. Sureka
Online micro-blogging social media websites like Twitter are being used as a real-time platform for information sharing and communication during the planning and mobilization of civil unrest events. We conduct a study of more than 1.5 million English tweets spanning 5 months on the topic of immigration and find evidence of Twitter being used as a platform for planning and mobilizing protests and civil-disobedience-related demonstrations. We believe that Twitter data can be used as a surrogate and open-source precursor for forecasting civil unrest, and we investigate machine-learning-based techniques for building a prediction model. We present our solution approach, which consists of various components such as named entity recognition (extraction of temporal, spatial location, and people expressions), semantic enrichment of event-related tweets (crowd-buzz & commentary and mobilization & planning), and a location-time-topic correlation miner. We conduct a series of experiments on a large real-world dataset and investigate the application of trend analysis. We conduct two case studies on civil-unrest-related events and demonstrate the effectiveness of our approach.
Citations: 9
Mining Multi-source Data to Study Workplace Activity Patterns
Pub Date : 2016-03-13 DOI: 10.1145/2888451.2888470
Sachin Patel, Ravi Mahamuni, Meghendra Singh, David Clarance, Mayuri Duggirala, Shivani Sharma, Vinay Katiyar, Gauri Deshpande, Amruta Deshmukh, Vaibhav, Vivek Balaraman
Examining work activity patterns is an enduring research problem in organizations. The fortuitous availability of a whole new set of data collection mechanisms, such as mobiles, activity loggers, and GPS-based location detectors, provides us with new ways of studying workplace behaviour. We present a data collection framework that helps in the collection, anonymization, fusion, processing, and mining of behavioural data. We use the framework to study the activities of a research and development team, with the aim of finding the relationship between behavioural traits, states, and activity patterns. We find partial support for the claim that behavioural states and activity patterns are associated.
Citations: 0
Trustworthiness of t-Distributed Stochastic Neighbour Embedding
Pub Date : 2016-03-13 DOI: 10.1145/2888451.2888465
Shishir Pandey, R. Vaze
A well-known technique for embedding high-dimensional objects in two- or three-dimensional space is t-distributed stochastic neighbour embedding (t-SNE). t-SNE minimizes the Kullback-Leibler (KL) divergence between two probability distributions, one induced on points in the high-dimensional space and the other induced on points in the low-dimensional embedding space. In this work, we consider a more general framework that uses the Rényi divergence, which is parametrized by the order α; the KL divergence is the special case α → 1. We study how various Rényi divergences perform compared to the KL divergence. We show that, in terms of the metrics of trustworthiness and neighbourhood preservation, the embedding becomes better as the Rényi divergence approaches the KL divergence.
Citations: 4
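The relationship between the Rényi and KL divergences stated in the abstract above can be checked numerically. The sketch below uses the standard discrete definition D_α(P‖Q) = log(Σ p_i^α q_i^(1−α)) / (α − 1); the two small distributions are hypothetical and merely stand in for the pairwise-similarity distributions that t-SNE compares.

```python
import math

def renyi_divergence(p, q, alpha):
    """Renyi divergence of order alpha (alpha > 0, alpha != 1) between discrete p and q."""
    total = sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in zip(p, q) if pi > 0)
    return math.log(total) / (alpha - 1.0)

def kl_divergence(p, q):
    """Kullback-Leibler divergence, the alpha -> 1 limit of the Renyi divergence."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical distributions with full support.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

for alpha in (0.5, 0.9, 0.99, 1.01, 2.0):
    print(alpha, renyi_divergence(p, q, alpha))
print("KL", kl_divergence(p, q))
```

The printed values increase with α and close in on the KL value from both sides as α approaches 1.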
SocialStories: Segmenting Stories within Trending Twitter Topics
Pub Date : 2016-03-13 DOI: 10.1145/2888451.2888453
Kokil Jaidka, Kaushik Ramachandran, Prakhar Gupta, Sajal Rustagi
This study presents SocialStories, a system based on incremental clustering of streaming tweets, for identifying fine-grained stories within a broader trending topic on Twitter. The contributions include a novel tf-metric, called the inverse cluster frequency, and a decay weighting for entities. We present our experiments on 0.19 million tweets posted in June 2014, revolving around mentions of a software brand before, during, and after a marketing conference and a software release. The novelty of our work lies in the text-based similarity metrics, including the inverse cluster frequency, and time-specific metrics that allow old entities to decay with the passage of time and preserve the homogeneity and freshness of stories. We report improved performance and a higher recall of 80% against the gold standard (post hoc journalistic reports), as compared to LDA- and wavelet-based systems. Our algorithm is able to cluster 80% of all tweets into story-based clusters, which are 86% pure. It also enables earlier detection of trending stories than manual reports and is far more accurate at identifying fine-grained stories within sub-topics than baseline systems.
Citations: 6
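The SocialStories abstract above names an inverse cluster frequency and a decay weighting without giving formulas. The sketch below is one plausible reading only: the logarithmic form is assumed by analogy with inverse document frequency, and the exponential half-life decay, the function names, and the toy clusters are likewise assumptions rather than the authors' definitions.

```python
import math

def inverse_cluster_frequency(term, clusters):
    """idf-style weight (assumed form): log(number of clusters / clusters containing the term)."""
    containing = sum(1 for terms in clusters if term in terms)
    return math.log(len(clusters) / containing) if containing else 0.0

def decayed_weight(weight, age_hours, half_life_hours=6.0):
    """Exponentially decay an entity's weight so that old entities fade out (assumed scheme)."""
    return weight * 0.5 ** (age_hours / half_life_hours)

# Hypothetical story clusters, each summarised by its set of terms.
clusters = [
    {"launch", "keynote", "pricing"},
    {"launch", "bug", "crash"},
    {"launch", "discount", "pricing"},
]

for term in ("launch", "pricing", "bug"):
    print(term, inverse_cluster_frequency(term, clusters))
print(decayed_weight(1.0, age_hours=12.0))
```

Under this reading, a term that occurs in every cluster contributes nothing to telling stories apart, while a term confined to one cluster is weighted most heavily.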