Mining Knowledge from Data: An Information Network Analysis Approach

Jiawei Han, Yizhou Sun, Xifeng Yan, Philip S. Yu
{"title":"Mining Knowledge from Data: An Information Network Analysis Approach","authors":"Jiawei Han, Yizhou Sun, Xifeng Yan, Philip S. Yu","doi":"10.1109/ICDE.2012.145","DOIUrl":null,"url":null,"abstract":"Most objects and data in the real world are interconnected, forming complex, heterogeneous but often semistructured information networks. However, many database researchers consider a database merely as a data repository that supports storage and retrieval rather than an information-rich, inter-related and multi-typed information network that supports comprehensive data analysis, whereas many network researchers focus on homogeneous networks. Departing from both, we view interconnected, semi-structured datasets as heterogeneous, information-rich networks and study how to uncover hidden knowledge in such networks. For example, a university database can be viewed as a heterogeneous information network, where objects of multiple types, such as students, professors, courses, departments, and multiple typed relationships, such as teach and advise are intertwined together, providing abundant information. In this tutorial, we present an organized picture on mining heterogeneous information networks and introduce a set of interesting, effective and scalable network mining methods. The topics to be covered include (i) database as an information network, (ii) mining information networks: clustering, classification, ranking, similarity search, and meta path-guided analysis, (iii) construction of quality, informative networks by data mining, (iv) trend and evolution analysis in heterogeneous information networks, and (v) research frontiers. We show that heterogeneous information networks are informative, and link analysis on such networks is powerful at uncovering critical knowledge hidden in large semi-structured datasets. Finally, we also present a few promising research directions.","PeriodicalId":321608,"journal":{"name":"2012 IEEE 28th International Conference on Data Engineering","volume":"29 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"16","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 IEEE 28th International Conference on Data Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDE.2012.145","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 16

Abstract

Most objects and data in the real world are interconnected, forming complex, heterogeneous but often semistructured information networks. However, many database researchers consider a database merely as a data repository that supports storage and retrieval rather than an information-rich, inter-related and multi-typed information network that supports comprehensive data analysis, whereas many network researchers focus on homogeneous networks. Departing from both, we view interconnected, semi-structured datasets as heterogeneous, information-rich networks and study how to uncover hidden knowledge in such networks. For example, a university database can be viewed as a heterogeneous information network, where objects of multiple types, such as students, professors, courses, departments, and multiple typed relationships, such as teach and advise are intertwined together, providing abundant information. In this tutorial, we present an organized picture on mining heterogeneous information networks and introduce a set of interesting, effective and scalable network mining methods. The topics to be covered include (i) database as an information network, (ii) mining information networks: clustering, classification, ranking, similarity search, and meta path-guided analysis, (iii) construction of quality, informative networks by data mining, (iv) trend and evolution analysis in heterogeneous information networks, and (v) research frontiers. We show that heterogeneous information networks are informative, and link analysis on such networks is powerful at uncovering critical knowledge hidden in large semi-structured datasets. Finally, we also present a few promising research directions.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
从数据中挖掘知识:一种信息网络分析方法
现实世界中的大多数对象和数据都是相互联系的,形成了复杂的、异构的、但往往是半结构化的信息网络。然而,许多数据库研究者认为数据库仅仅是一个支持存储和检索的数据存储库,而不是一个支持全面数据分析的信息丰富、相互关联和多类型的信息网络,而许多网络研究者关注的是同构网络。从两者出发,我们将相互连接的半结构化数据集视为异构的、信息丰富的网络,并研究如何在这些网络中发现隐藏的知识。例如,大学数据库可以看作是一个异构信息网络,学生、教授、课程、院系等多种类型的对象和教学、咨询等多种类型的关系交织在一起,提供了丰富的信息。在本教程中,我们对异构信息网络的挖掘进行了有组织的描述,并介绍了一套有趣、有效和可扩展的网络挖掘方法。将涵盖的主题包括(i)数据库作为信息网络;(ii)挖掘信息网络:聚类、分类、排序、相似搜索和元路径引导分析;(iii)利用数据挖掘构建高质量的信息网络;(iv)异构信息网络的趋势和进化分析;以及(v)研究前沿。我们证明了异构信息网络是信息性的,并且这种网络上的链接分析在揭示隐藏在大型半结构化数据集中的关键知识方面是强大的。最后,提出了今后的研究方向。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Keyword Query Reformulation on Structured Data Accuracy-Aware Uncertain Stream Databases Extracting Analyzing and Visualizing Triangle K-Core Motifs within Networks Project Daytona: Data Analytics as a Cloud Service Automatic Extraction of Structured Web Data with Domain Knowledge
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1