
Proceedings of the 2nd International Workshop on Network Data Analytics: Latest Papers

SCAN-XP: Parallel Structural Graph Clustering Algorithm on Intel Xeon Phi Coprocessors
Pub Date: 2017-05-14 DOI: 10.1145/3068943.3068949
Tomokatsu Takahashi, Hiroaki Shiokawa, H. Kitagawa
The structural graph clustering method SCAN, proposed by Xu et al., has been used successfully in many applications. It not only detects densely connected nodes as clusters but also identifies sparsely connected nodes as hubs or outliers. However, SCAN is difficult to apply to large-scale graphs because it must evaluate the structural density (similarity) of every pair of adjacent nodes in the given graph. To address this problem, we present SCAN-XP, a novel algorithm that runs on Intel Xeon Phi coprocessors. SCAN-XP is designed to make the best use of the hardware potential of the Intel Xeon Phi through two approaches. First, it avoids the bottlenecks that arise in parallel graph computation by balancing the load well across the cores of the Intel Xeon Phi. Second, it exploits the 512-bit SIMD instructions of the Intel Xeon Phi to speed up the density evaluations. As a result, SCAN-XP detects clusters, hubs, and outliers in large-scale graphs in much less computation time than SCAN. It runs approximately 100 times faster than SCAN and processes graphs with 100 million edges in a few seconds. Extensive evaluations on real-world graphs demonstrate the performance superiority of SCAN-XP over existing approaches.
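For readers unfamiliar with SCAN, the "density evaluation" above is the structural similarity of adjacent vertices. Let Γ(v) be the closed neighbourhood of v, meaning v plus its neighbours. Then σ(u, v) = |Γ(u) ∩ Γ(v)| / sqrt(|Γ(u)|·|Γ(v)|). A vertex is a core if at least μ members of Γ(v) have similarity ≥ ε. Clusters grow from cores. Unclustered vertices become hubs if they are adjacent to two or more clusters, and outliers otherwise.

The sequential Python sketch below follows these standard SCAN definitions under stated assumptions. It does not model SCAN-XP's load balancing or its 512-bit SIMD intersections. The function names and the example parameters (eps, mu) are illustrative assumptions.

```python
from collections import deque
from math import sqrt

# Sketch assuming the standard SCAN definitions (Xu et al.); this is NOT
# SCAN-XP's parallel/SIMD implementation, and all names are hypothetical.


def closed_neighbourhoods(edges):
    """Map each vertex v to Γ(v) = {v} ∪ N(v) for an undirected edge list."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, {u}).add(v)
        nbrs.setdefault(v, {v}).add(u)
    return nbrs


def structural_similarity(nbrs, u, v):
    """σ(u, v) = |Γ(u) ∩ Γ(v)| / sqrt(|Γ(u)| * |Γ(v)|)."""
    return len(nbrs[u] & nbrs[v]) / sqrt(len(nbrs[u]) * len(nbrs[v]))


def scan(edges, eps, mu):
    """Return (cluster id per clustered vertex, hubs, outliers)."""
    nbrs = closed_neighbourhoods(edges)
    # ε-neighbourhood: members of Γ(v) whose similarity to v is at least eps.
    # It always contains v itself because σ(v, v) = 1.
    eps_nbrs = {
        v: {w for w in nbrs[v] if structural_similarity(nbrs, v, w) >= eps}
        for v in nbrs
    }
    cores = {v for v in nbrs if len(eps_nbrs[v]) >= mu}

    label, next_id = {}, 0
    for seed in sorted(cores):  # sorted only to make cluster ids deterministic
        if seed in label:
            continue
        label[seed] = next_id
        queue = deque([seed])
        while queue:  # BFS over structure-reachable vertices
            x = queue.popleft()
            if x not in cores:
                continue  # border vertices join a cluster but do not expand it
            for y in eps_nbrs[x]:
                if y not in label:
                    label[y] = next_id
                    queue.append(y)
        next_id += 1

    # Unclustered vertices: hub if adjacent to >= 2 clusters, else outlier.
    hubs, outliers = set(), set()
    for v in nbrs:
        if v in label:
            continue
        touched = {label[w] for w in nbrs[v] if w in label}
        (hubs if len(touched) >= 2 else outliers).add(v)
    return label, hubs, outliers
```

As a usage example, take two 4-cliques {0..3} and {4..7}, a vertex 8 adjacent to 0 and 4, and a pendant vertex 9 attached to 8. With eps = 0.75 and mu = 3, the sketch returns the two cliques as clusters, 8 as a hub and 9 as an outlier. Every edge needs one neighbourhood intersection, which is why SCAN's cost grows with the number of edges. This intersection is the step the abstract says SCAN-XP accelerates with SIMD instructions.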
Citations: 26
Proceedings of the 2nd International Workshop on Network Data Analytics
Akhil Arora, Shourya Roy, A. Bhattacharya
We are delighted to present the papers from the 2nd NDA Workshop on Network Data Analytics. The workshop was held on 19 May 2017, co-located with the ACM SIGMOD conference in Chicago, Illinois, USA. Networks are prevalent in today's electronic world across a wide variety of domains, from engineering to the social, life, and physical sciences. Researchers and practitioners have studied networks in many ways, such as defining network metrics, deriving theoretical results, and examining problems such as pattern mining and link prediction. The NDA workshop is a forum for:
- exchanging ideas and methods for mining, querying, and learning with real-world networks;
- developing new common understandings of the problems at hand;
- sharing data sets where applicable;
- leveraging existing knowledge from different disciplines.

Its purpose is to bring together researchers from academia, industry, and government to discuss recent advances in (large-scale) graph analysis. It also aims to propose and discuss novel methods and techniques for addressing domain-specific challenges and handling noise in real-world graphs.
Pub Date: 2017-05-14 DOI: 10.1145/3068943
Citations: 0
Construction of Structured Heterogeneous Networks from Massive Text Data: Extended Abstract
Pub Date: 2017-05-14 DOI: 10.1145/3068943.3068944
Jiawei Han
Network data analytics is important, powerful, and exciting. But how big a role can it play in the real world? Much real-world data is unstructured, in the form of natural-language text. A grand challenge in big data research is to develop effective and scalable methods that turn such massive text data into actionable knowledge. To turn massive unstructured, text-rich, but interconnected data into knowledge, we propose a data-to-network-to-knowledge (D2N2K) paradigm. It first transforms the data into relatively structured heterogeneous information networks. It then mines these text-rich and structure-rich heterogeneous networks to generate useful knowledge. We argue that this paradigm is a promising direction and that network data analytics will play an essential role in transforming data into knowledge. A critical bottleneck in this paradigm, however, is mining structures from text data. We present our recent progress on effective methods for mining structures from massive text data and for constructing structured heterogeneous information networks.
Citations: 0
Repairing Noisy Graphs
Pub Date: 2017-05-14 DOI: 10.1145/3068943.3068945
D. Srivastava
Graphs are a flexible way to represent data in a variety of applications. Nodes represent domain-specific entities (e.g., records in record linkage, products and types in an ontology). Edges capture relationships between these entities (e.g., an equivalence relationship between records in record linkage, a type-subtype relationship between types in an ontology). The edges in such graphs are often noisy:
- some edges are missing, i.e., real-world relationships have no corresponding edge in the graph;
- some edges are spurious, i.e., edges in the graph have no corresponding real-world relationship.

Analyzing noisy graphs directly can lead to undesirable outcomes, so it is important to repair them. In this talk, we describe an approach that exploits properties of real-world relationships, and their estimated probabilities, to pose oracle queries (an abstraction of crowdsourcing) that efficiently repair noisy graphs. We illustrate the approach for graphs that are unions of cliques (as in record linkage) and graphs that are trees (as in ontologies). We present theoretical and empirical results for both cases. This is joint work with Donatella Firmani, Sainyam Galhotra, and Barna Saha.
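One simple way to exploit the union-of-cliques property is transitivity. Candidate pairs are processed in decreasing order of estimated probability. The oracle is asked only when earlier answers do not already imply the result: two records in the same cluster are related, and two records in clusters already known to differ are not.

The Python sketch below illustrates this idea under stated assumptions; it is not the algorithm presented in the talk. `edge_probs` and `oracle` are hypothetical inputs, and the oracle is assumed always to answer correctly.

```python
def repair_clique_union(nodes, edge_probs, oracle):
    """Recover a union-of-cliques graph while skipping inferable oracle queries.

    nodes: iterable of hashable vertex ids
    edge_probs: {(u, v): estimated probability that u and v are related}
    oracle: hypothetical callable (u, v) -> bool, assumed always correct
    """
    parent = {v: v for v in nodes}

    def find(v):  # union-find root lookup with path halving
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    negative = set()  # frozensets of two current cluster roots known to differ
    queries = 0
    for (u, v), _ in sorted(edge_probs.items(), key=lambda kv: -kv[1]):
        ru, rv = find(u), find(v)
        if ru == rv or frozenset((ru, rv)) in negative:
            continue  # answer already implied by transitivity: no query needed
        queries += 1
        if oracle(u, v):
            parent[rv] = ru  # merge the two cliques
            # rv is no longer a root, so move its negative constraints onto ru
            negative = {frozenset(ru if r == rv else r for r in pair) for pair in negative}
        else:
            negative.add(frozenset((ru, rv)))

    clusters = {}
    for v in parent:
        clusters.setdefault(find(v), set()).add(v)
    return list(clusters.values()), queries
```

As a usage example, take six records forming three true entities ({0, 1, 2}, {3, 4}, {5}). If all 15 pairs are scored high within an entity and low across entities, the sketch recovers the three cliques with 6 oracle queries instead of 15.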
Citations: 0
Performance Prediction for Graph Queries
Pub Date: 2017-05-14 DOI: 10.1145/3068943.3068947
M. Namaki, K. Sasani, Yinghui Wu, A. Gebremedhin
Query performance prediction has proven beneficial for query optimization and resource allocation in relational databases. Emerging applications are leading to search scenarios in which workloads of heterogeneous, structure-less analytical queries are processed over large-scale graph and network data. This calls for effective models that predict the performance of graph analytical queries, which are often more involved than their relational counterparts. In this paper, we study and evaluate techniques for graph query performance prediction. We make three contributions:
(1) We propose a general learning framework that combines practical, computationally efficient statistics from query scenarios with regression models.
(2) We instantiate the framework for two routinely issued query classes with different query complexity, namely reachability and graph pattern matching, and develop modeling and learning algorithms for both.
(3) We show that our prediction models readily apply to resource-bounded querying by providing a learning-based workload optimization strategy. Given a query workload and a time bound, the models select the queries to process so as to maximize total query profit while keeping total cost within the bound.

Using real-world graphs, we experimentally demonstrate the accuracy of our framework and the effectiveness of its workload optimization.
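The abstract names two ingredients: a regression model that predicts query cost, and a workload selection that maximises profit under a time bound. Both can be sketched as below, with some assumptions.

A one-feature least-squares fit stands in for the paper's models, whose features are not given in the abstract. The selection step is framed as a 0/1 knapsack, which is one natural reading of the stated objective. All names and inputs are hypothetical.

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y ≈ slope * x + intercept (one hypothetical feature)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def select_queries(queries, predict_cost, budget):
    """0/1 knapsack: maximise total profit with predicted total cost <= budget.

    queries: list of (name, feature, profit) tuples
    predict_cost: feature -> predicted cost in integer time units
    budget: integer time bound
    """
    # The knapsack DP needs integer weights, so round predictions (minimum 1 unit).
    costs = [max(1, round(predict_cost(f))) for _, f, _ in queries]
    best = [(0, ())] * (budget + 1)  # best[b] = (profit, chosen names) with cost <= b
    for (name, _, profit), cost in zip(queries, costs):
        for b in range(budget, cost - 1, -1):  # descending: each query used at most once
            candidate = best[b - cost][0] + profit
            if candidate > best[b][0]:
                best[b] = (candidate, best[b - cost][1] + (name,))
    return best[budget]
```

As a usage example, fit the model on points (1, 2), (2, 4), (3, 6), (4, 8), which gives slope 2 and intercept 0. With three queries whose predicted costs are 2, 4 and 6 and profits 3, 5 and 6, a budget of 6 selects the first two queries for a profit of 8.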
Citations: 4
Using Graphical Features To Improve Demographic Prediction From Smart Phone Data
Pub Date: 2017-05-14 DOI: 10.1145/3068943.3068948
S. Akter, L. Holder
Demographic information such as gender, age, ethnicity, level of education, disabilities, employment, and socio-economic status is important in social science, survey research, and marketing. However, it is difficult to obtain from users, who are often reluctant to participate, and response rates are low. Automated demographic prediction from smartphone sensor data lets researchers obtain this valuable information in a nonintrusive and cost-effective manner. We approach demographic prediction, namely the classification of gender, age group, and job type, with a framework based on graphical features. The framework represents information collected from sensor networks as graphs, extracts useful and relevant graphical features, and predicts demographic information. We evaluated our approach on the Nokia Mobile Phone dataset for three classification tasks: gender, age group, and job type. Our approach produced results comparable to most state-of-the-art methods. It has the additional advantage of being generally applicable to sensor networks. It needs no sophisticated or application-specific feature generation, no background knowledge, and no special techniques to address class imbalance.
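The abstract does not say which graphs or features the framework uses. As an assumption, the Python sketch below turns one user's sequence of observed locations (for example, cell-tower IDs) into a directed transition graph. It then computes a few generic graph features that could feed a standard classifier. The feature names and the input format are hypothetical.

```python
from collections import Counter


def transition_graph(sequence):
    """Hypothetical graph: nodes are distinct observations, edge (a, b) counts moves a -> b."""
    edges = Counter()
    for a, b in zip(sequence, sequence[1:]):
        if a != b:  # ignore consecutive repeats, so there are no self-loops
            edges[(a, b)] += 1
    return set(sequence), edges


def graph_features(sequence):
    """A few generic features of a user's transition graph (sequence must be non-empty)."""
    nodes, edges = transition_graph(sequence)
    n, m = len(nodes), len(edges)
    out_degree = Counter(a for a, _ in edges)
    visits = Counter(sequence)
    return {
        "num_nodes": n,
        "num_edges": m,  # distinct directed edges
        "density": m / (n * (n - 1)) if n > 1 else 0.0,
        "max_out_degree": max(out_degree.values(), default=0),
        "top_location_share": visits.most_common(1)[0][1] / len(sequence),
    }
```

As a usage example, the sequence home, work, home, gym, home, home yields 3 nodes and 4 directed edges. The maximum out-degree is 2 (from home), and home accounts for 4 of the 6 observations.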
Citations: 5
Graph Mining to Characterize Competition for Employment
Pub Date: 2017-05-14 DOI: 10.1145/3068943.3068946
A. Toulis, Lukasz Golab
In this paper, we discuss a novel application of graph analytics to characterize competition for employment. Our methodology finds communities in two graphs:
- a graph of prospective employees, with edges connecting people who interviewed for the same job;
- a graph of available jobs, with edges connecting jobs for which the same person interviewed.

We then apply the methodology to a real dataset of cooperative internships offered to undergraduate students at a North American post-secondary institution, illustrating the benefits of our approach.
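The two graphs described above are the one-mode projections of a bipartite student-job interview graph. The sketch below builds both projections, weighting each edge by the number of shared jobs or shared interviewees. As a deliberately simple stand-in for the paper's community detection method, which the abstract does not name, it returns the connected components of edges whose weight meets a threshold. The input format and function names are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def project(interviews, by="student"):
    """Weighted one-mode projection of hypothetical (student, job) interview pairs.

    by="student": link students who interviewed for the same job.
    by="job": link jobs for which the same student interviewed.
    """
    groups = defaultdict(set)
    for student, job in interviews:
        if by == "student":
            groups[job].add(student)
        else:
            groups[student].add(job)
    weights = defaultdict(int)
    for members in groups.values():
        for a, b in combinations(sorted(members), 2):
            weights[(a, b)] += 1  # one more shared job (or shared interviewee)
    return weights


def communities(weights, min_shared=1):
    """Connected components over edges with weight >= min_shared (stand-in only)."""
    adj = defaultdict(set)
    for (a, b), w in weights.items():
        if w >= min_shared:
            adj[a].add(b)
            adj[b].add(a)
    seen, comms = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:  # iterative DFS
            x = stack.pop()
            if x in seen:
                continue
            seen.add(x)
            comp.add(x)
            stack.extend(adj[x] - seen)
        comms.append(comp)
    return comms
```

People or jobs with no shared interview do not appear in the projection at all. A real analysis would replace `communities` with a proper community detection algorithm such as modularity optimisation.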
Citations: 6
Web and Social Media Analytics towards Enhancing Urban Transportations: A Case for Bangalore
Pub Date: 2017-05-14 DOI: 10.1145/3068943.3068950
Manjira Sinha, P. Varma, Tridib Mukherjee
Cities today are typically plagued by multiple issues, such as traffic jams, garbage, transit overload, public safety, and drainage. Citizens widely discuss these issues on public forums, social media, and web blogs. Issues related to public transportation are the most actively reported across web-based sources. We therefore present a holistic framework for collecting, categorizing, aggregating, and visualizing urban public transportation issues. The primary challenges in deriving useful insights from web-based sources stem from:
(a) the number of reports;
(b) incomplete or implicit spatio-temporal context;
(c) the unstructured nature of the text in these reports.

This paper presents text categorization techniques that specifically address these challenges. The work starts from formal complaint data from the largest public transportation agency in Bangalore, complemented by complaint reports from web-based and social media sources. An easy-to-navigate, well-organized dashboard was developed for efficient visualization.
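The abstract does not name the categorization technique. Purely as an illustration, the sketch below trains a multinomial Naive Bayes classifier with add-one smoothing on labelled complaint texts. The category labels and example sentences are hypothetical. A production system would also need the spatio-temporal extraction the abstract mentions.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on any non-alphanumeric character."""
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing; illustrative only."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)

        def log_score(c):  # log prior + sum of smoothed log likelihoods
            s = math.log(self.class_counts[c] / n_docs)
            for w in tokenize(text):
                s += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            return s

        return max(self.class_counts, key=log_score)
```

As a usage example, train on four toy complaints labelled `delay` or `behaviour`: "bus was late again", "driver rude to passengers", "bus delayed by an hour" and "conductor was rude". Then `predict("the bus is late")` returns `delay` and `predict("rude driver")` returns `behaviour`.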
The dashboard is currently being piloted with the largest transportation agency in Bangalore.
Citations: 2