An Analysis of Properties of Malignant Cases for Imbalanced Breast Thermogram Feature Classification
B. Krawczyk, G. Schaefer
2013 2nd IAPR Asian Conference on Pattern Recognition, published 2013-11-05
DOI: 10.1109/ACPR.2013.45 (https://doi.org/10.1109/ACPR.2013.45)
Citations: 2
Abstract
Medical thermography has been demonstrated to be an effective and inexpensive method for detecting breast cancer, in particular for tumors in early stages and in dense tissue. Image features can be extracted from breast thermograms and used in a pattern classification stage for automated diagnosis, and hence as an objective second opinion or for screening purposes. One of the main challenges in applying machine learning algorithms to this task is the high imbalance ratio between class distributions in the available training data. In this paper, we carefully examine the properties of the malignant minority class in order to gain insight into the nature of the data. We identify different types of minority class samples present in a breast thermogram dataset comprising about 150 cases. Using the knowledge gained, we analyse the performance of three state-of-the-art ensemble classifiers (a cost-sensitive one, one based on over-sampling, and one using under-sampling) to evaluate which samples are the most difficult to classify correctly. Experimental analysis shows that there is a strong correlation between the type of minority sample and the performance of specific classifier ensemble types.
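The abstract does not state how the minority samples are typed. As a rough illustration only, the sketch below assumes one neighbourhood-based scheme known from the imbalanced-learning literature: each minority sample is labelled by how many of its 5 nearest neighbours share its class. The function name `minority_sample_types`, the thresholds, the Euclidean distance, and the toy data are all hypothetical and are not taken from the paper.

```python
import math

K = 5  # neighbourhood size; the thresholds below are tied to this assumed value


def minority_sample_types(features, labels, minority_label=1):
    """Map each minority sample index to 'safe', 'borderline', 'rare' or 'outlier'.

    Assumed scheme (not from the paper): count how many of the K nearest
    neighbours of a minority sample also belong to the minority class.
    """
    types = {}
    for i, (x, y) in enumerate(zip(features, labels)):
        if y != minority_label:
            continue
        # Euclidean distance from sample i to every other sample.
        dists = sorted(
            (math.dist(x, other), j)
            for j, other in enumerate(features)
            if j != i
        )
        same = sum(1 for _, j in dists[:K] if labels[j] == minority_label)
        if same >= 4:
            types[i] = "safe"
        elif same >= 2:
            types[i] = "borderline"
        elif same == 1:
            types[i] = "rare"
        else:
            types[i] = "outlier"
    return types


# Toy data (hypothetical): a tight minority cluster, plus one minority
# sample sitting in the middle of majority samples.
features = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (0.5, 0.0),
    (100.0, 100.0),
    (100.0, 101.0), (101.0, 100.0), (99.0, 100.0),
    (100.0, 99.0), (101.0, 101.0), (99.0, 99.0),
]
labels = [1] * 7 + [0] * 6

types = minority_sample_types(features, labels)
print(types)
```

Under a typing like this, the per-type recall of each ensemble could be compared to see which kinds of malignant cases a cost-sensitive, over-sampling, or under-sampling approach tends to miss.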