Learning And Predicting Diabetes Data Sets Using Semi-Supervised Learning
Radhika Tayal, A. Shankar
2020 10th International Conference on Cloud Computing, Data Science & Engineering (Confluence), 2020
DOI: 10.1109/Confluence47617.2020.9058276 (https://doi.org/10.1109/Confluence47617.2020.9058276)
Citation count: 1
Abstract
Researchers have developed many tools to analyze the impact of diabetes on the general population over a defined period. However, these tools base their predictions on labeled or small datasets. Today, large amounts of data are collected through both online and offline media; these data come from heterogeneous sources and are unstructured and voluminous. As a result, traditional prediction algorithms, which work only on structured datasets, cannot make use of such large data. In this paper, we use a semi-supervised learning approach that works on a partially labeled dataset to predict diabetes. The partially labeled dataset combines labeled and unlabeled data: for prediction, we consider 80% unlabeled and 20% labeled data. We developed a user interface that lets users build their own prediction models from labeled and unlabeled datasets and analyze the data according to their requirements and interests. Our main objective is to develop a diabetes prediction system that researchers and the general public can use with minimal labeled data.
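The abstract does not say which semi-supervised algorithm the authors use, so the following is only a minimal sketch of one common approach, self-training, applied to a 20% labeled / 80% unlabeled split. Everything in it is an assumption made for illustration: the data are synthetic two-feature points (not diabetes records), the base classifier is a simple nearest-centroid rule, and the names `make_synthetic`, `self_train`, `margin_threshold`, and `run_demo` are hypothetical.

```python
import random


def make_synthetic(n_per_class=100, seed=0):
    """Generate two Gaussian clusters as stand-in data (NOT real diabetes records)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in ((0, (-2.0, -2.0)), (1, (2.0, 2.0))):
        for _ in range(n_per_class):
            X.append([rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)])
            y.append(label)
    return X, y


def centroid(points):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_centroids(X, y):
    """Return {label: centroid} computed from labeled examples."""
    groups = {}
    for features, label in zip(X, y):
        groups.setdefault(label, []).append(features)
    return {label: centroid(pts) for label, pts in groups.items()}


def predict_with_margin(centroids, x):
    """Return (nearest label, margin between the two closest centroids)."""
    ranked = sorted((sq_dist(x, c), label) for label, c in centroids.items())
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else float("inf")
    return ranked[0][1], margin


def self_train(X_lab, y_lab, X_unlab, margin_threshold=4.0, max_rounds=10):
    """Self-training: repeatedly pseudo-label confident unlabeled points and refit."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        centroids = fit_centroids(X_lab, y_lab)
        remaining = []
        for x in pool:
            label, margin = predict_with_margin(centroids, x)
            if margin >= margin_threshold:
                # Confident prediction: treat it as a label in the next round.
                X_lab.append(x)
                y_lab.append(label)
            else:
                remaining.append(x)
        if len(remaining) == len(pool):
            break  # No new pseudo-labels were added, so further rounds change nothing.
        pool = remaining
    return fit_centroids(X_lab, y_lab)


def run_demo(seed=0, labeled_fraction=0.2):
    """Hide 80% of the labels, self-train, and score on the hidden-label points."""
    X, y = make_synthetic(seed=seed)
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_lab = int(labeled_fraction * len(X))
    lab_idx, unlab_idx = indices[:n_lab], indices[n_lab:]
    centroids = self_train(
        [X[i] for i in lab_idx],
        [y[i] for i in lab_idx],
        [X[i] for i in unlab_idx],
    )
    correct = sum(predict_with_margin(centroids, X[i])[0] == y[i] for i in unlab_idx)
    return correct / len(unlab_idx)


if __name__ == "__main__":
    print(f"Accuracy on the originally unlabeled points: {run_demo():.3f}")
```

The margin threshold controls how confident a prediction must be before it is reused as a label: a higher value adds fewer but more reliable pseudo-labels, while a lower value uses more of the unlabeled pool at the risk of reinforcing early mistakes.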