Clustering patent document in the field of ICT (Information & Communication Technology)

A. Widodo, I. Budi
Published in: 2011 International Conference on Semantic Technology and Information Retrieval
Publication date: 2011-06-28
DOI: 10.1109/STAIR.2011.5995789 (https://doi.org/10.1109/STAIR.2011.5995789)
Citations: 7

Abstract

The current classification of patent data, which follows the IPC (International Patent Classification) of WIPO (World Intellectual Property Organization), is deemed not to reflect the field of ICT (Information & Communication Technology). ICT applications are usually placed in sections G (Physics) and H (Electricity). This paper evaluates eight groupings of patents based on the IPC classes (G01, G06, G09, G11, H01, H03, H04, and H06) for patents registered with the Directorate General of Intellectual Property Rights in Indonesia from 1991 to 2000. The algorithms used for grouping are KMeans, KMeans++, Hierarchical Clustering, and combinations of these three algorithms with SVD (Singular Value Decomposition). Purity and F-Measure are used for external validation, and Silhouette is used for internal validation. The experimental results indicate that SVD improves the clustering results. In addition, using the abstract does not necessarily improve clustering performance, and indexing by phrase does not always yield better clusters than indexing by word. Moreover, no cluster has a purity greater than 50%, which means that the existing IPC classification has not been able to accommodate the field of ICT appropriately.
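The purity measure mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common definition of purity (the share of a cluster's members that belong to its most frequent reference class); the paper's exact computation is not given here. The function names and the toy cluster assignments below are hypothetical and are not data from the study.

```python
from collections import Counter, defaultdict


def cluster_purities(cluster_ids, class_labels):
    """Return {cluster_id: purity} for each cluster.

    Purity of a cluster = (size of its most common class) / (cluster size).
    This is the usual textbook definition, assumed here for illustration.
    """
    if len(cluster_ids) != len(class_labels):
        raise ValueError("cluster_ids and class_labels must have equal length")
    members = defaultdict(list)
    for cid, label in zip(cluster_ids, class_labels):
        members[cid].append(label)
    return {
        cid: Counter(labels).most_common(1)[0][1] / len(labels)
        for cid, labels in members.items()
    }


def overall_purity(cluster_ids, class_labels):
    """Return the sum of per-cluster majority counts divided by N."""
    if len(cluster_ids) != len(class_labels):
        raise ValueError("cluster_ids and class_labels must have equal length")
    members = defaultdict(list)
    for cid, label in zip(cluster_ids, class_labels):
        members[cid].append(label)
    majority_total = sum(
        Counter(labels).most_common(1)[0][1] for labels in members.values()
    )
    return majority_total / len(class_labels)


# Hypothetical toy data: eight documents, two clusters, IPC classes as labels.
clusters = [0, 0, 0, 0, 1, 1, 1, 1]
ipc = ["G06", "G06", "H04", "G01", "H04", "H04", "H04", "G06"]

per_cluster = cluster_purities(clusters, ipc)
total = overall_purity(clusters, ipc)
print(per_cluster)  # cluster 0: 2 of 4 are G06; cluster 1: 3 of 4 are H04
print(total)        # (2 + 3) / 8
```

Under this definition, a cluster whose purity stays at or below 0.5 has no single IPC class accounting for a majority of its members, which is the situation the abstract reports for every cluster.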