Text Classification for Patents: Experiments with Unigrams, Bigrams and Different Weighting Methods

Chanjong Im, 김도완, Thomas Mandl
International Journal of Contents, vol. 13, pp. 66-74, 2017. DOI: https://doi.org/10.5392/IJoC.2017.13.2.066

Abstract

Patent classification is becoming more critical as patent filings increase year over year. Despite comprehensive studies in the area, several issues remain in classifying patents at the hierarchical levels of the International Patent Classification (IPC). Both the structural complexity of the hierarchy and the shortage of patents at its lower levels degrade classification performance. We therefore propose a new classification method based on different criteria: categories defined by domain experts in trend analysis reports, i.e., Patent Landscape Reports (PLR). Several experiments were conducted to identify the feature types and weighting methods that lead to the best classification performance with a Support Vector Machine (SVM). Two types of features (nouns and noun phrases) and five weighting schemes (TF-idf, TF-rf, TF-icf, TF-icf-based, and TF-idcef-based) were tested.
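
To make the weighting schemes named above more concrete, the following is a minimal sketch of three of them (TF-idf, TF-rf, TF-icf) on a toy corpus. The abstract does not give the formulas, so the code assumes the commonly cited forms: idf = log(N / df), icf = log(|C| / cf), and rf = log2(2 + a / max(1, c)). The TF-icf-based and TF-idcef-based variants are left out because the abstract does not define them. The corpus, the category labels, and all function names are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

# Toy labelled corpus: (tokens, category). Hypothetical data, not from the paper.
DOCS = [
    (["battery", "cell", "electrode"], "energy"),
    (["battery", "charger", "circuit"], "energy"),
    (["antenna", "signal", "circuit"], "telecom"),
    (["antenna", "signal", "receiver"], "telecom"),
]


def bigrams(tokens):
    """Return adjacent token pairs joined with an underscore."""
    return [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]


def idf(term, docs):
    """Inverse document frequency, log(N / df). Assumes the term occurs in docs."""
    df = sum(1 for tokens, _ in docs if term in tokens)
    return math.log(len(docs) / df)


def icf(term, docs):
    """Inverse category frequency, log(|C| / cf). Assumes the term occurs in docs."""
    categories = {label for _, label in docs}
    cf = len({label for tokens, label in docs if term in tokens})
    return math.log(len(categories) / cf)


def rf(term, docs, positive):
    """Relevance frequency, log2(2 + a / max(1, c)).

    a = documents of the positive category containing the term,
    c = documents of all other categories containing the term.
    """
    a = sum(1 for tokens, label in docs if term in tokens and label == positive)
    c = sum(1 for tokens, label in docs if term in tokens and label != positive)
    return math.log2(2 + a / max(1, c))


def weight(tokens, factor):
    """Multiply each term's raw frequency by a collection-level factor."""
    tf = Counter(tokens)
    return {term: count * factor(term) for term, count in tf.items()}


if __name__ == "__main__":
    doc_tokens, doc_label = DOCS[1]
    print("TF-idf:", weight(doc_tokens, lambda t: idf(t, DOCS)))
    print("TF-icf:", weight(doc_tokens, lambda t: icf(t, DOCS)))
    print("TF-rf: ", weight(doc_tokens, lambda t: rf(t, DOCS, doc_label)))
    print("bigrams:", bigrams(doc_tokens))
```

In this toy corpus, "battery" and "circuit" each occur in two of four documents and so receive the same idf, but "circuit" appears in both categories and its icf drops to zero. This is the kind of difference between document-level and category-level weighting that such experiments compare. The resulting weight dictionaries would be turned into feature vectors before being passed to an SVM.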