Proceedings of the 3rd IKDD Conference on Data Science, 2016: Latest Papers

Learning to Collectively Link Entities
Pub Date : 2016-03-13 DOI: 10.1145/2888451.2888454
Ashish Kulkarni, Kanika Agarwal, Pararth Shah, Sunny Raj Rathod, Ganesh Ramakrishnan
Recently Kulkarni et al. [20] proposed an approach for collective disambiguation of entity mentions occurring in natural language text. Their model achieves disambiguation by efficiently computing exact MAP inference in a binary labeled Markov Random Field. Here, we build on their disambiguation model and propose an approach to jointly learn the node and edge parameters of such a model. We use a max margin framework, which is efficiently implemented using projected subgradient, for collective learning. We leverage this in an online and interactive annotation system which incrementally trains the model as data gets curated progressively. We demonstrate the usefulness of our system by manually completing annotations for a subset of the Wikipedia collection. We have made this data publicly available. Evaluation shows that learning helps and our system performs better than several other systems including that of Kulkarni et al.
Citations: 2
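The abstract above mentions max-margin learning implemented with projected subgradient. The following is a minimal sketch of that optimization pattern only, not the authors' structured model: it assumes a flat linear classifier with a regularized hinge loss, and it assumes (hypothetically) that the feasible set is the non-negative orthant, so the projection is a simple clip at zero. The function name, the step-size schedule and the toy data are all illustrative.

```python
import math


def projected_subgradient(samples, dim, steps=200, lam=0.01):
    """Minimize lam/2 * ||w||^2 + mean hinge loss subject to w >= 0.

    samples: list of (feature_vector, label) pairs with label in {+1, -1}.
    The constraint w >= 0 is an assumption made for this illustration.
    """
    w = [0.0] * dim
    n = len(samples)
    for t in range(1, steps + 1):
        # Subgradient of the regularizer.
        grad = [lam * wi for wi in w]
        # Subgradient of the hinge loss: only margin violators contribute.
        for x, y in samples:
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            if margin < 1.0:
                for j in range(dim):
                    grad[j] -= y * x[j] / n
        eta = 1.0 / math.sqrt(t)  # diminishing step size
        # Gradient step followed by projection onto the feasible set {w >= 0}.
        w = [max(0.0, wi - eta * gi) for wi, gi in zip(w, grad)]
    return w


# Toy data: the first feature carries the label, the second is noise.
SAMPLES = [
    ((1.0, 0.2), +1),
    ((-1.0, 0.3), -1),
    ((2.0, -0.1), +1),
    ((-1.5, 0.0), -1),
]

if __name__ == "__main__":
    print(projected_subgradient(SAMPLES, dim=2))
```

The projection step is what distinguishes this from plain subgradient descent: each iterate is pulled back into the feasible set before the next subgradient is computed.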
Learning from Gurus: Analysis and Modeling of Reopened Questions on Stack Overflow
Pub Date : 2016-03-13 DOI: 10.1145/2888451.2888460
Rishabh Gupta, P. Reddy
Community-driven Question Answering (Q&A) platforms are gaining popularity, and the number of posts on such platforms is increasing tremendously. Thus, the challenge of keeping these platforms noise-free is attracting the interest of the research community. Stack Overflow is one such popular Q&A platform for computer programming. Established users on Stack Overflow have learnt the acceptable format and scope of questions over time. Even if their questions get closed, they are aware of the required edits, so the chances of their questions being reopened increase. On the other hand, non-established users have not adapted to the Stack Overflow system and have difficulty editing their closed questions. In this work, we aim to identify features that help differentiate the editing approaches of established and non-established users, and motivate the need for a recommendation model. Such a recommendation model can assist every user in editing their closed questions by leveraging the edit style of the platform's established users.
Citations: 6
Improving Urban Transportation through Social Media Analytics
Pub Date : 2016-03-13 DOI: 10.1145/2888451.2888478
Manjira Sinha, P. Varma, Gayatri Sivakumar, Mridula Singh, Tridib Mukherjee, D. Chander, K. Dasgupta
Citizens tend to discuss issues in public forums, social media, and web blogs. Given that issues related to public transportation are most actively reported across web-based sources, we present a holistic framework for the collection, categorization, aggregation and visualization of urban public transportation issues. The primary challenges in deriving useful insights from web-based sources stem from (a) the number of reports; (b) incomplete or implicit spatio-temporal context; and (c) the unstructured nature of text in these reports. The work begins with formal complaint data from the largest public transportation agency in Bangalore, complemented by complaint reports from web-based and social media sources. Text data is categorized into different transportation-related problems, and spatio-temporal context is added to the text data for geo-tagging and identifying persistent issues. A well-organized dashboard is developed for efficient visualization. The dashboard is currently being piloted with the largest transportation agency in Bangalore.
Citations: 1
AMEO 2015: A dataset comprising AMCAT test scores, biodata details and employment outcomes of job seekers
Pub Date : 2016-03-13 DOI: 10.1145/2888451.2892037
V. Aggarwal, Shashank Srikant, Harsh Nisar
More than a million engineers enter the global workforce every year. A relevant question is what determines the jobs and salaries these engineers are offered right after graduation. Previous studies have shown the influence of various factors such as college reputation, grades, the field one specializes in and market conditions for specific industries. An important input that such analyses lack is a standardized measure of job skills taken at the time of completion of studies. We present here Aspiring Minds' Employability Outcomes 2015 (AMEO 2015), a unique dataset which provides engineering graduates' employment outcomes (salaries, job titles and job locations) together with standardized assessment scores in three fundamental areas: cognitive skills, technical skills and personality. Coupled with biodata information, AMEO 2015 provides an opportunity for a unique and comprehensive study of the entry-level labor market. The data could be used not only to build an accurate salary predictor but also to understand what influences salary and job titles in the labor market. In this paper we describe the details of the dataset and discuss a spectrum of questions around meritocracy in labor markets, biases in labor selection and other prevalent market forces it can help uncover and answer. You can download the dataset at: http://research.aspiringminds.com/resources/
Citations: 3
Events Describe Places: Tagging Places with Event Based Social Network Data
Pub Date : 2016-03-13 DOI: 10.1145/2888451.2888477
Vinod Hegde, A. Mileo, A. Pozdnoukhov
Location-based services and geospatial web applications have become popular in recent years due to the wide adoption of mobile devices. Search and recommendation of places or Points of Interest (PoIs) are prominent services available on them. The effectiveness of these services crucially depends on the availability of tags that are descriptive of places. The major geospatial databases that contain data about places suffer from a lack of descriptive tags for places, since writing them is a time-consuming process and only a few users do it despite having knowledge about places. In order to tackle this issue and automatically generate descriptive tags for places, we propose a solution that takes data about the set of events that happen in a specific place and uses it to extract meaningful descriptive tags for that place. We use data about events held at places on Meetup, a well-known event-based social network, and apply Latent Dirichlet Allocation (LDA) to derive sets of probable descriptive tags for any place. In order to evaluate our approach, we measure semantic relatedness between tags derived for places on Meetup and manually assigned tags from Foursquare, a location-based service. Results show that event data can be used to derive semantically relevant place tags. This shows that location-based services can benefit from capturing data about events to derive place tags.
Citations: 2
Detecting Community Structures in Social Networks by Graph Sparsification
Pub Date : 2016-03-13 DOI: 10.1145/2888451.2888479
Partha Basuchowdhuri, Satyaki Sikdar, Sonu Sreshtha, S. Majumder
Community structures are inherent in social networks and finding them is an interesting and well-studied problem. Finding community structures in social networks is similar to locating densely connected clusters of nodes in a graph. One of the popular methods for finding communities is to first find the inter-community edges and then remove them to reveal the communities. It is well known that a network centrality measure named edge betweenness can be used to detect the inter-community edges. The edges with high edge betweenness are those that lie on a large number of the shortest paths between all possible pairs of nodes. Finding all-pairs shortest paths is a computationally expensive task, especially for large graphs. So we construct a t-spanner, a known graph sparsification technique, for finding edges with high betweenness, and eventually find communities by removing such edges. Using the t-spanner, we then detect the inter-community edges in O(km) running time by building a distance oracle of size O(kn^(1+1/k)), where t = 2k - 1. Compared to traditional community detection methods that depend on the calculation of betweenness values, our algorithm runs much faster. Experiments show that our algorithm finds communities of quality comparable to other state-of-the-art community detection algorithms.
Citations: 5
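The abstract above contrasts its spanner-based method with the traditional approach of computing exact edge betweenness and removing the highest-scoring edges. As a point of reference, here is a minimal sketch of that traditional baseline (Brandes-style accumulation on an unweighted graph, followed by Girvan-Newman-style edge removal). It does not implement the t-spanner or the distance oracle from the paper; the function names and the toy graph are assumptions made for illustration.

```python
from collections import deque


def edge_betweenness(adj):
    """Exact edge betweenness for an unweighted, undirected graph.

    adj maps each node to the set of its neighbours.
    """
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and predecessors.
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back to s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += c
                delta[v] += c
    # Every unordered node pair was counted from both endpoints.
    return {e: s / 2.0 for e, s in score.items()}


def components(adj):
    """Connected components as a list of node sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        seen.add(s)
        while stack:
            v = stack.pop()
            comp.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def split_once(adj):
    """Remove highest-betweenness edges until one more component appears."""
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    start = len(components(adj))
    while len(components(adj)) == start:
        bet = edge_betweenness(adj)
        if not bet:  # no edges left to remove
            break
        u, v = max(bet, key=bet.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


# Two triangles joined by a single bridge edge (2, 3).
TOY = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}

if __name__ == "__main__":
    print(split_once(TOY))
```

Each call to `edge_betweenness` runs one BFS per node, which is the cost the abstract describes as expensive on large graphs and which motivates sparsifying the graph first.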
CitizenPulse: A Text Analytics framework for Proactive e-Governance - A Case Study of Mygov.in
Pub Date : 2016-03-13 DOI: 10.1145/2888451.2888463
Ankit Lamba, Deepak Yadav, A. Lele
Indian citizens are beginning to express themselves via social media on a regular basis on various issues. The Government of India has started an initiative called Mygov.in, a collaborative portal where citizens can voice their opinions via free-form comments. Analyzing this free-form data is a huge challenge. In this paper we present a work in progress called the CitizenPulse framework, capable of performing text analytics on unstructured text using off-the-shelf text analytics components such as Named Entity Recognition, Part of Speech tagging and Stemming, to name a few. Apart from integrating the text analytics components, the CitizenPulse framework abstracts these building blocks as Objects, and such objects can be dragged, dropped and connected to construct a text analytics pipeline called an Analytics Softcore. As a case study we report the analysis of the Mygov.in portal, specifically for the topic of Cleanliness in School Curriculum.
Citations: 6
Some algorithms for correlated bandits with non-stationary rewards: Regret bounds and applications
Pub Date : 2016-03-13 DOI: 10.1145/2888451.2888475
Prathamesh Mayekar, N. Hemachandra
We first propose an online learning model wherein rewards for different actions/arms used by the user can be correlated and the reward stream can be non-stationary. Thus, this extends the standard multi-armed bandit learning model. We propose two algorithms, Greedy and Regression-based UCB, that attempt to minimize the expected regret. We also obtain non-trivial upper bounds for the expected regret through theoretical analysis. We also provide some evidence for a sub-polynomial increase in expected regret upon appropriate tuning of algorithm input parameters. These models are motivated by the problem of dynamic pricing of a product faced by a typical online retailer.
Citations: 0
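The abstract above describes its model as an extension of the standard multi-armed bandit. For orientation, the sketch below shows the textbook UCB1 rule for that standard setting, with independent arms and stationary rewards. It is not the Greedy or Regression-based UCB algorithm from the paper, whose details the abstract does not give; the Bernoulli arms, their means and the helper names are assumptions chosen for illustration.

```python
import math
import random


def ucb1(pull, n_arms, rounds):
    """Run UCB1 and return how often each arm was played.

    pull(arm) must return a reward in [0, 1].
    """
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, rounds + 1):
        if t <= n_arms:
            arm = t - 1  # initialisation: play every arm once
        else:
            # Empirical mean plus an exploration bonus that shrinks
            # as an arm is played more often.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
    return counts


def make_bernoulli_arms(means, seed=0):
    """Hypothetical environment: arm i pays 1 with probability means[i]."""
    rng = random.Random(seed)
    return lambda arm: 1.0 if rng.random() < means[arm] else 0.0


if __name__ == "__main__":
    print(ucb1(make_bernoulli_arms([0.2, 0.5, 0.9]), n_arms=3, rounds=2000))
```

In a dynamic pricing reading, each arm would be a candidate price and the reward the revenue observed at that price; the correlated, non-stationary setting of the paper relaxes the independence and stationarity that this baseline relies on.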