The Implementation of K-Means Clustering Method in Classifying Undergraduate Thesis Titles

L. Zahrotun, Nila Hutami Putri, Arfiani Nur Khusna
Published in: 2018 12th International Conference on Telecommunication Systems, Services, and Applications (TSSA), 2018-10-01
DOI: 10.1109/TSSA.2018.8708817
Citations: 4

Abstract

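One of the graduation requirements at a university is completing an undergraduate thesis. In the Industrial Engineering study program at Universitas Ahmad Dahlan, undergraduate thesis titles are documented by the thesis coordinator. The problem is that students know little about possible thesis topics because they are unfamiliar with the topics chosen by previous students. To address this, this research develops a program that groups thesis titles so that trends in thesis topics can be identified. The method used is K-Means clustering, with cosine similarity as the proximity measure, and the result is evaluated with the Silhouette Coefficient. The text-mining stages are tokenizing, filtering, stemming, similarity calculation, classification, and testing. The result of this research is a program that processes the title data into groups that reveal trend patterns in thesis topics. From the 138 titles collected, three clusters were formed, corresponding to fields within the Industrial Engineering study program. The Silhouette Coefficient score of 0.5674 indicates that the clustering quality is low. This is because the title text is very widely dispersed, so titles have relatively low similarity scores to one another.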
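The stages listed in the abstract (tokenizing, filtering, stemming, cosine similarity, K-Means grouping, and Silhouette Coefficient testing) can be sketched in a few dozen lines of Python. This is a minimal sketch under stated assumptions, not the authors' program: the stop-word list, the crude suffix-stripping stemmer, raw term-frequency weighting, farthest-first centroid seeding, and the example titles are all hypothetical, since the abstract does not specify them. The silhouette uses cosine distance d = 1 - cos and s(i) = (b(i) - a(i)) / max(a(i), b(i)), averaged over all titles.

```python
import math
import random
import re
from collections import Counter

# Hypothetical stop-word list; the paper's actual list is not given.
STOPWORDS = {"a", "an", "the", "of", "in", "at", "on", "for", "and", "to",
             "with", "using", "based"}

def preprocess(title):
    """Tokenizing, filtering, and (crude) stemming of one title."""
    tokens = re.findall(r"[a-z0-9]+", title.lower())      # tokenizing
    tokens = [t for t in tokens if t not in STOPWORDS]    # filtering
    stemmed = []
    for t in tokens:
        # Placeholder suffix stripper standing in for a real stemmer.
        for suffix in ("ing", "ed", "s"):
            if t.endswith(suffix) and len(t) - len(suffix) >= 3:
                t = t[: -len(suffix)]
                break
        stemmed.append(t)
    return stemmed

def unit_vector(tokens):
    """Raw term-frequency vector scaled to unit length (sparse dict)."""
    tf = Counter(tokens)
    norm = math.sqrt(sum(c * c for c in tf.values()))
    return {t: c / norm for t, c in tf.items()} if norm else {}

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: w / len(vectors) for t, w in total.items()}

def kmeans_cosine(vectors, k, iters=100, seed=0):
    """K-Means assigning each vector to its most cosine-similar centroid."""
    rng = random.Random(seed)
    centroids = [vectors[rng.randrange(len(vectors))]]
    while len(centroids) < k:
        # Farthest-first seeding (assumed): add the least similar vector.
        far = min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        centroids.append(far)
    labels = None
    for _ in range(iters):
        new_labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break                                   # assignments stable
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:                             # empty cluster keeps old centroid
                centroids[c] = mean_vector(members)
    return labels

def silhouette(vectors, labels):
    """Mean Silhouette Coefficient with cosine distance d = 1 - cos."""
    if len(set(labels)) < 2:
        raise ValueError("silhouette needs at least two clusters")
    n = len(vectors)
    d = [[max(0.0, 1.0 - cosine(vectors[i], vectors[j])) for j in range(n)]
         for i in range(n)]
    scores = []
    for i in range(n):
        same = [d[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)                      # singleton cluster: s(i) = 0
            continue
        a = sum(same) / len(same)                   # mean intra-cluster distance
        b = min(sum(d[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
                for c in set(labels) if c != labels[i])   # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n

if __name__ == "__main__":
    # Invented example titles, not the paper's 138-title dataset.
    titles = ["Production scheduling using genetic algorithm",
              "Job shop scheduling with genetic algorithm",
              "Quality control analysis using six sigma",
              "Six sigma quality improvement in textile production",
              "Ergonomic work posture analysis of operators",
              "Ergonomic evaluation of workstation design"]
    vecs = [unit_vector(preprocess(t)) for t in titles]
    labels = kmeans_cosine(vecs, k=3)
    for lab, title in zip(labels, titles):
        print(lab, title)
    if len(set(labels)) > 1:
        print("silhouette:", round(silhouette(vecs, labels), 4))
```

Because cosine similarity ignores vector length, assigning each title to the cosine-nearest mean of unit vectors gives the same assignments as spherical K-Means, which renormalizes its centroids. A fuller version would more likely use TF-IDF weights and a proper stemmer for the titles' language in place of the placeholder helpers above.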