Scalable embedding of multiple perspectives for indefinite life-science data analysis

2021 IEEE Symposium Series on Computational Intelligence (SSCI), Pub Date: 2021-12-05, DOI: 10.1109/SSCI50451.2021.9659914

Maximilian Münch, Simon Heilig, Philipp Väth, Frank-Michael Schleif


Citations: 2

Abstract

Life science data analysis frequently encounters challenges that cannot be solved with classical techniques from data analytics or machine learning. The complex inherent structure of the data, and especially its encoding in non-standard ways, e.g., as genome or protein sequences, graph structures, or histograms, often limit the development of appropriate classification models. To address these limitations, the application of domain-specific expert similarity measures has attracted considerable attention. However, the use of such expert measures suffers from two major drawbacks: (a) no single similarity measure guarantees success in all application scenarios, and (b) such similarity functions often lead to indefinite data that cannot be processed by classical machine learning methods. To tackle both limitations, this paper presents a method to embed indefinite life science data with various similarity measures simultaneously into a complex-valued vector space. We test our approach on various life science data sets and evaluate its performance against other competitive methods to show its efficiency.
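The abstract does not spell out the embedding procedure, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes each similarity measure yields a symmetric matrix, factors it by eigendecomposition, and takes complex square roots of the eigenvalues so that negative eigenvalues (the source of indefiniteness) become imaginary coordinates. The function names `complex_embedding` and `embed_perspectives`, the example matrices, and the choice to concatenate the per-measure embeddings are all assumptions made for illustration.

```python
import numpy as np

def complex_embedding(S):
    """Embed a symmetric, possibly indefinite similarity matrix into C^n.

    Returns X such that X @ X.T (plain transpose, no conjugation) equals S.
    """
    S = 0.5 * (S + S.T)                    # enforce exact symmetry
    eigvals, eigvecs = np.linalg.eigh(S)   # real eigenvalues, orthonormal eigenvectors
    # Complex square root: negative eigenvalues give purely imaginary scalings.
    roots = np.sqrt(eigvals.astype(complex))
    return eigvecs * roots                 # row i is the embedding of object i

def embed_perspectives(similarities):
    """Concatenate the complex embeddings of several similarity matrices."""
    return np.hstack([complex_embedding(S) for S in similarities])

# Hypothetical toy data: two similarity "perspectives" on the same 3 objects.
S1 = np.array([[1.0, 2.0, 0.0],
               [2.0, 1.0, 0.0],
               [0.0, 0.0, 1.0]])   # indefinite: eigenvalues are -1, 1, 3
S2 = np.array([[1.0, 0.5, 0.2],
               [0.5, 1.0, 0.5],
               [0.2, 0.5, 1.0]])   # positive definite

X = embed_perspectives([S1, S2])
# The bilinear product of the concatenated embedding recovers the summed similarities.
print(np.allclose(X @ X.T, S1 + S2))
```

Under these assumptions, the non-conjugated product `X @ X.T` reproduces the sum of the input similarity matrices, so no eigenvalue has to be clipped or flipped to make the data usable in a vector-space model.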
