EARLY DETECTION OF BREAST CANCER USING THE K-NEAREST NEIGHBOUR (K-NN) ALGORITHM
Authors: Refli Tiarma Ariani Panggabean, Ledy Octavia, Noormala Dwi, Aripin
Journal: Jusikom : Jurnal Sistem Informasi Ilmu Komputer
Published: 2023-02-09
DOI: 10.34012/jurnalsisteminformasidanilmukomputer.v6i2.3194
URL: https://doi.org/10.34012/jurnalsisteminformasidanilmukomputer.v6i2.3194
Abstract
Cancer is a group of non-communicable diseases characterized by rapid growth and development. One type is breast cancer (carcinoma mammae), a leading cause of death among women. The first breast cancer cells can take 8-12 years to grow into a tumor 1 cm in size. The prevalence of breast cancer in Indonesia is 50 per 100,000 women. This study applies the K-Nearest Neighbor (K-NN) algorithm and compares three values of k: 3, 5, and 7. The dataset was obtained from the UCI Machine Learning Repository; after preprocessing it contains 653 records, each labeled as a benign or a malignant tumor. The variables used are clump thickness, cell size uniformity, cell shape uniformity, marginal adhesion, single epithelial cell size, cell nucleus size, chromatin, normal cell nucleus, and mitosis. The best classification results were obtained with k = 3, with training and testing accuracy of 83.8074% and 75% for a 70:30 split, 84.6743% and 74.8092% for an 80:20 split, and 84.0136% and 84.6154% for a 90:10 split. According to the authors, k = 3 yields comparable training and testing accuracy; the gap is smallest (about 0.6 percentage points) at the 90:10 split.
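The evaluation protocol described in the abstract (K-NN with k = 3, 5, and 7, compared across 70:30, 80:20, and 90:10 train/test splits) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the distance metric, tie-breaking rule, or preprocessing steps, so Euclidean distance and a simple majority vote are assumed here. The data is a small synthetic stand-in produced by the hypothetical helper `make_toy_data`, not the UCI dataset, so the printed accuracies are not comparable to the figures reported above.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k training points closest to `query`.

    Euclidean distance is an assumption; the abstract does not name a metric.
    """
    neighbours = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, X, y, k):
    """Fraction of points in (X, y) that the K-NN rule labels correctly."""
    hits = sum(knn_predict(train_X, train_y, x, k) == label for x, label in zip(X, y))
    return hits / len(y)


def make_toy_data(n, seed=0):
    """Hypothetical stand-in data: nine integer features in 1..10, two classes."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = "malignant" if i % 3 == 0 else "benign"
        lo, hi = (4, 10) if label == "malignant" else (1, 6)
        X.append([rng.randint(lo, hi) for _ in range(9)])
        y.append(label)
    return X, y


def split(X, y, train_fraction, seed=0):
    """Shuffle the indices, then cut them into a training and a testing part."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = round(train_fraction * len(idx))
    tr, te = idx[:cut], idx[cut:]
    return [X[i] for i in tr], [y[i] for i in tr], [X[i] for i in te], [y[i] for i in te]


X, y = make_toy_data(200)
results = {}
for train_fraction in (0.7, 0.8, 0.9):
    tr_X, tr_y, te_X, te_y = split(X, y, train_fraction)
    for k in (3, 5, 7):
        results[(train_fraction, k)] = (
            # Training accuracy: each training point counts itself as a neighbour.
            accuracy(tr_X, tr_y, tr_X, tr_y, k),
            accuracy(tr_X, tr_y, te_X, te_y, k),
        )

for (train_fraction, k), (train_acc, test_acc) in sorted(results.items()):
    print(f"train={train_fraction:.0%} k={k} train_acc={train_acc:.4f} test_acc={test_acc:.4f}")
```

Using odd values of k with two classes, as in the study, means the majority vote can never tie. Comparing the two accuracies per configuration gives the training/testing gap that the abstract uses to judge each setting.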