Improving the Efficiency of Link Prediction on Handling Incomplete Knowledge Graph Using Clustering
Fitri Susanti, N. Maulidevi, K. Surendro
Proceedings of the 2023 12th International Conference on Software and Computer Applications, 2023-02-23
DOI: 10.1145/3587828.3587830 (https://doi.org/10.1145/3587828.3587830)
Abstract
A knowledge graph (KG) stores knowledge as connected facts. Each fact is represented as a triple, written (subject, predicate, object) or (head, relation, tail). KGs are widely used in question answering, information retrieval, classification, recommender systems, and similar tasks. A common problem, however, is incompleteness: a KG is incomplete if a relationship between two entities is missing. An incomplete KG can reduce the accuracy of any task that relies on it. One solution is link prediction, which aims to predict the missing relationship between two entities in a KG. A second problem is size: a KG can consist of hundreds or millions of entities and relationships, so link prediction on a large KG also needs to be efficient. This paper discusses embedding-based link prediction to overcome the incomplete KG problem and proposes clustering to increase the efficiency of the link prediction process. Clustering is used to group the embedding results. Once the embeddings are grouped, the scoring and loss function calculations that predict missing links are carried out only in the groups considered appropriate. With this grouping, link prediction is expected to take less time because there is no need to check every vector in the embedding space.
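The idea of scoring candidates only within an appropriate group can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract names neither the embedding model nor the clustering algorithm, so the sketch assumes a TransE-style score (the distance between head + relation and a candidate tail) and plain k-means. The toy two-dimensional embeddings, the `capital_of` relation vector, and the helper names `kmeans`, `nearest`, and `predict_tail` are all hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(v, centroids):
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda c: dist(v, centroids[c]))

def kmeans(vectors, k, iters=10):
    """Plain k-means; the first k vectors seed the centroids (a simplification)."""
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iters):
        assign = [nearest(v, centroids) for v in vectors]
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, [nearest(v, centroids) for v in vectors]

def predict_tail(head, relation, emb, centroids, labels):
    """Rank candidate tails, scoring only the cluster nearest to head + relation."""
    target = [h + r for h, r in zip(emb[head], relation)]  # assumed TransE-style
    cluster = nearest(target, centroids)
    candidates = [e for e in emb if labels[e] == cluster and e != head]
    return sorted(candidates, key=lambda e: dist(target, emb[e]))

# Hypothetical, hand-made embeddings: cities near the origin, countries near (5, 5).
emb = {
    "paris": (0.0, 0.0), "berlin": (0.2, 0.1), "rome": (0.1, 0.2),
    "france": (5.0, 5.0), "germany": (5.2, 5.1), "italy": (5.1, 5.2),
}
capital_of = (5.0, 5.0)  # hypothetical relation vector

names = list(emb)
centroids, assign = kmeans([emb[n] for n in names], k=2)
labels = dict(zip(names, assign))

ranked = predict_tail("paris", capital_of, emb, centroids, labels)
print(ranked[0], "- scored", len(ranked), "of", len(emb), "entities")
```

In this toy setup only the three entities in the selected cluster are scored instead of all six, which is the source of the expected time saving. The trade-off is that a correct tail assigned to a different cluster is never considered, so the choice of which groups are "appropriate" directly affects accuracy.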