Scientific knowledge is possible with small-sample classification.

EURASIP journal on bioinformatics & systems biology. Pub Date: 2013-08-20. DOI: 10.1186/1687-4153-2013-10

Edward R Dougherty, Lori A Dalton


Citations: 14

Abstract

A typical small-sample biomarker classification paper discriminates between types of pathology based on, say, 30,000 genes and a small labeled sample of fewer than 100 points. Some classification rule is used to design the classifier from these data, but we are given no good reason or conditions under which this algorithm should perform well. An error estimation rule is used to estimate the classification error on the population using the same data, but once again we are given no good reason or conditions under which this error estimator should produce a good estimate, and thus we do not know how well the classifier should be expected to perform. In fact, in virtually all such papers the error estimate is expected to be highly inaccurate. In short, we are given no justification for any claims.

Given the ubiquity of vacuous small-sample classification papers in the literature, one could easily conclude that scientific knowledge is impossible in small-sample settings. It is not that thousands of papers overtly claim that scientific knowledge is impossible in regard to their content; rather, it is that they utilize methods that preclude scientific knowledge. In this paper, we argue to the contrary that scientific knowledge in small-sample classification is possible provided there is sufficient prior knowledge. A natural way to proceed, discussed herein, is via a paradigm for pattern recognition in which we incorporate prior knowledge in the whole classification procedure (classifier design and error estimation), optimize each step of the procedure given available information, and obtain theoretical measures of performance for both classifiers and error estimators, the latter being the critical epistemological issue. In sum, we can achieve scientific validation for a proposed small-sample classifier and its error estimate.
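The concern about estimating the error from the same small sample used for design can be illustrated with a minimal simulation sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the synthetic Gaussian model with identity covariance and equal class priors, the nearest-mean classification rule, the resubstitution error estimator, and all function and parameter names (`simulate`, `one_trial`, `shift`, and so on). It is not the prior-knowledge paradigm the abstract advocates; it only shows how far an estimate can sit from the true error, which is known exactly here because the model is known.

```python
import math
import random


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def draw(mean, n, rng):
    """Draw n points from a Gaussian with the given mean and identity covariance."""
    return [[rng.gauss(m, 1.0) for m in mean] for _ in range(n)]


def centroid(points):
    """Coordinate-wise sample mean of a list of points."""
    d = len(points[0])
    return [sum(p[j] for p in points) / len(points) for j in range(d)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def one_trial(mu0, mu1, n_per_class, rng):
    """Design a nearest-mean classifier on one small sample.

    Returns (resubstitution estimate, true error under the assumed model).
    """
    s0 = draw(mu0, n_per_class, rng)
    s1 = draw(mu1, n_per_class, rng)
    m0, m1 = centroid(s0), centroid(s1)

    # Nearest-mean rule written as a linear rule: assign class 1 when w.x > c.
    w = [b - a for a, b in zip(m0, m1)]
    c = 0.5 * (dot(m1, m1) - dot(m0, m0))

    # Resubstitution: error rate counted on the same points used for design.
    wrong = sum(1 for x in s0 if dot(w, x) > c)
    wrong += sum(1 for x in s1 if dot(w, x) <= c)
    resub = wrong / (2 * n_per_class)

    # True error of this fixed linear rule, exact for the assumed Gaussian
    # model (identity covariance, equal class priors).
    norm = math.sqrt(dot(w, w))
    e0 = normal_cdf((dot(w, mu0) - c) / norm)  # class-0 points labeled 1
    e1 = normal_cdf((c - dot(w, mu1)) / norm)  # class-1 points labeled 0
    return resub, 0.5 * (e0 + e1)


def simulate(reps=300, n_per_class=10, dim=5, shift=0.5, seed=0):
    """Repeat the experiment; return (mean estimate, mean true error, RMS deviation)."""
    rng = random.Random(seed)
    mu0 = [0.0] * dim
    mu1 = [shift] * dim
    resub_sum = true_sum = sq_sum = 0.0
    for _ in range(reps):
        resub, true_err = one_trial(mu0, mu1, n_per_class, rng)
        resub_sum += resub
        true_sum += true_err
        sq_sum += (resub - true_err) ** 2
    return resub_sum / reps, true_sum / reps, math.sqrt(sq_sum / reps)


if __name__ == "__main__":
    mean_resub, mean_true, rms = simulate()
    print(f"mean resubstitution estimate: {mean_resub:.3f}")
    print(f"mean true error:              {mean_true:.3f}")
    print(f"RMS deviation:                {rms:.3f}")
```

The RMS deviation between the estimate and the true error is the kind of performance measure for an error estimator that the abstract refers to. In this sketch it can be computed only because the feature-label distribution is assumed known, which is the role the abstract assigns to prior knowledge.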



