开放数据集之间共享概念的揭示

IF 2.1 3区 计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Journal of Web Semantics Pub Date : 2021-01-01 DOI:10.1016/j.websem.2020.100624
Miloš Bogdanović, Nataša Veljković, Milena Frtunić Gligorijević, Darko Puflović, Leonid Stoimenov
{"title":"开放数据集之间共享概念的揭示","authors":"Miloš Bogdanović,&nbsp;Nataša Veljković,&nbsp;Milena Frtunić Gligorijević,&nbsp;Darko Puflović,&nbsp;Leonid Stoimenov","doi":"10.1016/j.websem.2020.100624","DOIUrl":null,"url":null,"abstract":"<div><p><span>Openness and transparency initiatives are not only milestones of science progress but have also influenced various fields of organization and industry. Under this influence, varieties of government institutions worldwide have published a large number of datasets through open data<span><span> portals. Government data covers diverse subjects and the scale of available data is growing every year. Published data is expected to be both accessible and discoverable. For these purposes, portals take advantage of metadata accompanying datasets. However, a part of metadata is often missing which decreases users’ ability to obtain the desired information. As the scale of published datasets grows, this problem increases. An approach we describe in this paper is focused towards decreasing this problem by implementing knowledge structures and algorithms capable of proposing the best match for the category where an uncategorized dataset should belong to. By doing so, our aim is twofold: enrich datasets metadata by suggesting an appropriate category and increase its visibility and discoverability. Our approach relies on information regarding open datasets provided by users — dataset description contained within dataset tags. Since dataset tags express low consistency due to their origin, in this paper we will present a method of optimizing their usage through means of </span>semantic similarity measures based on </span></span>natural language processing<span> mechanisms. Optimization is performed in terms of reducing the number of distinct tag values used for dataset description. Once optimized, dataset tags are used to reveal shared conceptualization originating from their usage by means of Formal Concept Analysis. We will demonstrate the advantage of our proposal by comparing concept lattices generated using Formal Concept Analysis before and after the optimization process and use generated structure as a knowledge base to categorize uncategorized open datasets. Finally, we will present a categorization mechanism based on the generated knowledge base that takes advantage of semantic similarity measures to propose a category suitable for an uncategorized dataset.</span></p></div>","PeriodicalId":49951,"journal":{"name":"Journal of Web Semantics","volume":null,"pages":null},"PeriodicalIF":2.1000,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"On revealing shared conceptualization among open datasets\",\"authors\":\"Miloš Bogdanović,&nbsp;Nataša Veljković,&nbsp;Milena Frtunić Gligorijević,&nbsp;Darko Puflović,&nbsp;Leonid Stoimenov\",\"doi\":\"10.1016/j.websem.2020.100624\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p><span>Openness and transparency initiatives are not only milestones of science progress but have also influenced various fields of organization and industry. Under this influence, varieties of government institutions worldwide have published a large number of datasets through open data<span><span> portals. Government data covers diverse subjects and the scale of available data is growing every year. Published data is expected to be both accessible and discoverable. For these purposes, portals take advantage of metadata accompanying datasets. However, a part of metadata is often missing which decreases users’ ability to obtain the desired information. As the scale of published datasets grows, this problem increases. An approach we describe in this paper is focused towards decreasing this problem by implementing knowledge structures and algorithms capable of proposing the best match for the category where an uncategorized dataset should belong to. By doing so, our aim is twofold: enrich datasets metadata by suggesting an appropriate category and increase its visibility and discoverability. Our approach relies on information regarding open datasets provided by users — dataset description contained within dataset tags. Since dataset tags express low consistency due to their origin, in this paper we will present a method of optimizing their usage through means of </span>semantic similarity measures based on </span></span>natural language processing<span> mechanisms. Optimization is performed in terms of reducing the number of distinct tag values used for dataset description. Once optimized, dataset tags are used to reveal shared conceptualization originating from their usage by means of Formal Concept Analysis. We will demonstrate the advantage of our proposal by comparing concept lattices generated using Formal Concept Analysis before and after the optimization process and use generated structure as a knowledge base to categorize uncategorized open datasets. Finally, we will present a categorization mechanism based on the generated knowledge base that takes advantage of semantic similarity measures to propose a category suitable for an uncategorized dataset.</span></p></div>\",\"PeriodicalId\":49951,\"journal\":{\"name\":\"Journal of Web Semantics\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":2.1000,\"publicationDate\":\"2021-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Web Semantics\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1570826820300573\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Web Semantics","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1570826820300573","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 2

摘要

开放和透明倡议不仅是科学进步的里程碑,而且还影响了组织和行业的各个领域。在这种影响下,世界各地的各种政府机构通过开放数据门户网站发布了大量的数据集。政府数据涵盖了不同的主题,可用数据的规模每年都在增长。发布的数据应该是可访问和可发现的。出于这些目的,门户利用数据集附带的元数据。然而,元数据的一部分经常丢失,这降低了用户获取所需信息的能力。随着已发布数据集规模的增长,这个问题也随之增加。我们在本文中描述的方法侧重于通过实现能够为未分类数据集所属的类别提出最佳匹配的知识结构和算法来减少这一问题。通过这样做,我们的目标是双重的:通过建议适当的类别来丰富数据集元数据,并增加其可见性和可发现性。我们的方法依赖于用户提供的关于开放数据集的信息——数据集标签中包含的数据集描述。由于数据集标签由于其来源而表达低一致性,在本文中,我们将提出一种基于自然语言处理机制的语义相似度量来优化其使用的方法。优化是根据减少用于数据集描述的不同标记值的数量来执行的。优化后,通过形式概念分析,使用数据集标签来揭示源自其使用的共享概念化。我们将通过比较优化过程前后使用形式概念分析生成的概念格,并使用生成的结构作为知识库对未分类的开放数据集进行分类,来展示我们的建议的优势。最后,我们将提出一种基于生成的知识库的分类机制,该机制利用语义相似度度量来提出适合未分类数据集的类别。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
On revealing shared conceptualization among open datasets

Openness and transparency initiatives are not only milestones of science progress but have also influenced various fields of organization and industry. Under this influence, varieties of government institutions worldwide have published a large number of datasets through open data portals. Government data covers diverse subjects and the scale of available data is growing every year. Published data is expected to be both accessible and discoverable. For these purposes, portals take advantage of metadata accompanying datasets. However, a part of metadata is often missing which decreases users’ ability to obtain the desired information. As the scale of published datasets grows, this problem increases. An approach we describe in this paper is focused towards decreasing this problem by implementing knowledge structures and algorithms capable of proposing the best match for the category where an uncategorized dataset should belong to. By doing so, our aim is twofold: enrich datasets metadata by suggesting an appropriate category and increase its visibility and discoverability. Our approach relies on information regarding open datasets provided by users — dataset description contained within dataset tags. Since dataset tags express low consistency due to their origin, in this paper we will present a method of optimizing their usage through means of semantic similarity measures based on natural language processing mechanisms. Optimization is performed in terms of reducing the number of distinct tag values used for dataset description. Once optimized, dataset tags are used to reveal shared conceptualization originating from their usage by means of Formal Concept Analysis. We will demonstrate the advantage of our proposal by comparing concept lattices generated using Formal Concept Analysis before and after the optimization process and use generated structure as a knowledge base to categorize uncategorized open datasets. Finally, we will present a categorization mechanism based on the generated knowledge base that takes advantage of semantic similarity measures to propose a category suitable for an uncategorized dataset.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Journal of Web Semantics
Journal of Web Semantics 工程技术-计算机:人工智能
CiteScore
6.20
自引率
12.00%
发文量
22
审稿时长
14.6 weeks
期刊介绍: The Journal of Web Semantics is an interdisciplinary journal based on research and applications of various subject areas that contribute to the development of a knowledge-intensive and intelligent service Web. These areas include: knowledge technologies, ontology, agents, databases and the semantic grid, obviously disciplines like information retrieval, language technology, human-computer interaction and knowledge discovery are of major relevance as well. All aspects of the Semantic Web development are covered. The publication of large-scale experiments and their analysis is also encouraged to clearly illustrate scenarios and methods that introduce semantics into existing Web interfaces, contents and services. The journal emphasizes the publication of papers that combine theories, methods and experiments from different subject areas in order to deliver innovative semantic methods and applications.
期刊最新文献
Uniqorn: Unified question answering over RDF knowledge graphs and natural language text KAE: A property-based method for knowledge graph alignment and extension Multi-stream graph attention network for recommendation with knowledge graph Ontology design facilitating Wikibase integration — and a worked example for historical data Web3-DAO: An ontology for decentralized autonomous organizations
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1