Graph-Based Bidirectional Transformer Decision Threshold Adjustment Algorithm for Class-Imbalanced Molecular Data.

ArXiv Pub Date : 2024-09-04
Nicole Hayes, Ekaterina Merkurjev, Guo-Wei Wei
{"title":"Graph-Based Bidirectional Transformer Decision Threshold Adjustment Algorithm for Class-Imbalanced Molecular Data.","authors":"Nicole Hayes, Ekaterina Merkurjev, Guo-Wei Wei","doi":"","DOIUrl":null,"url":null,"abstract":"<p><p>Data sets with imbalanced class sizes, where one class size is much smaller than that of others, occur exceedingly often in many applications, including those with biological foundations, such as disease diagnosis and drug discovery. Therefore, it is extremely important to be able to identify data elements of classes of various sizes, as a failure to do so can result in heavy costs. Nonetheless, many data classification procedures do not perform well on imbalanced data sets as they often fail to detect elements belonging to underrepresented classes. In this work, we propose the BTDT-MBO algorithm, incorporating Merriman-Bence-Osher (MBO) approaches and a bidirectional transformer, as well as distance correlation and decision threshold adjustments, for data classification tasks on highly imbalanced molecular data sets, where the sizes of the classes vary greatly. The proposed technique not only integrates adjustments in the classification threshold for the MBO algorithm in order to help deal with the class imbalance, but also uses a bidirectional transformer procedure based on an attention mechanism for self-supervised learning. In addition, the model implements distance correlation as a weight function for the similarity graph-based framework on which the adjusted MBO algorithm operates. The proposed method is validated using six molecular data sets and compared to other related techniques. 
The computational experiments show that the proposed technique is superior to competing approaches even in the case of a high class imbalance ratio.</p>","PeriodicalId":93888,"journal":{"name":"ArXiv","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11213158/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ArXiv","FirstCategoryId":"1085","ListUrlMain":"","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Data sets with imbalanced class sizes, in which one class is much smaller than the others, occur frequently in many applications, including those with biological foundations, such as disease diagnosis and drug discovery. It is therefore important to identify data elements of classes of all sizes, since failing to do so can be costly. Nonetheless, many data classification procedures perform poorly on imbalanced data sets because they often fail to detect elements belonging to underrepresented classes. In this work, we propose the BTDT-MBO algorithm, which combines Merriman-Bence-Osher (MBO) approaches and a bidirectional transformer with distance correlation and decision threshold adjustments, for data classification tasks on highly imbalanced molecular data sets, where class sizes vary greatly. The proposed technique integrates adjustments to the classification threshold of the MBO algorithm to help deal with the class imbalance, and it uses a bidirectional transformer procedure based on an attention mechanism for self-supervised learning. In addition, the model uses distance correlation as the weight function for the similarity graph on which the adjusted MBO algorithm operates. The proposed method is validated on six molecular data sets and compared to other related techniques. The computational experiments show that the proposed technique outperforms the competing approaches even at a high class imbalance ratio.
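To make two of the ingredients named in the abstract concrete, the following is a minimal sketch of distance correlation used as a graph weight and of an MBO-style iteration with an adjustable decision threshold. It is not the authors' implementation. It assumes a binary problem, a dense graph, feature vectors that are already available (the bidirectional transformer stage is omitted), and an implicit Euler diffusion step with a fidelity term. The function names `distance_correlation`, `similarity_graph`, and `mbo_binary`, and the parameters `dt`, `mu`, `n_iter`, and `n_diffuse`, are hypothetical and chosen only for illustration.

```python
import numpy as np


def distance_correlation(x, y):
    """Sample distance correlation between two 1-D vectors of equal length.

    Assumption: the components of each feature vector are treated as
    paired scalar observations.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pairwise absolute differences between the components of each vector.
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    # Double-center each distance matrix.
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    dcov2 = (A * B).mean()
    dvar2 = (A * A).mean() * (B * B).mean()
    if dvar2 <= 0.0:
        return 0.0  # a constant vector carries no dependence information
    return float(np.sqrt(max(dcov2, 0.0) / np.sqrt(dvar2)))


def similarity_graph(features):
    """Dense symmetric weight matrix with W[i, j] = dCor(features[i], features[j])."""
    features = np.asarray(features, dtype=float)
    n = features.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            W[i, j] = W[j, i] = distance_correlation(features[i], features[j])
    return W


def mbo_binary(W, labels, known, threshold=0.5, dt=0.1, mu=10.0,
               n_iter=10, n_diffuse=5):
    """Binary MBO-style scheme: graph diffusion followed by thresholding.

    labels: 0/1 array (entries at unlabeled nodes are ignored).
    known: boolean mask of labeled nodes.
    threshold: decision threshold; values below 0.5 favor class 1.
    """
    W = np.asarray(W, dtype=float)
    y = np.asarray(labels, dtype=float)
    m = np.asarray(known, dtype=float)
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W  # unnormalized graph Laplacian
    # Implicit Euler step for du/dt = -L u - mu * m * (u - y).
    system = np.eye(n) + dt * (L + mu * np.diag(m))
    u = np.where(m > 0, y, 0.5)  # unlabeled nodes start undecided
    for _ in range(n_iter):
        for _ in range(n_diffuse):
            u = np.linalg.solve(system, u + dt * mu * m * y)
        u = (u > threshold).astype(float)  # adjustable thresholding step
    return u.astype(int)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(12, 16))      # stand-in for learned molecular features
    W = similarity_graph(X)
    y = np.zeros(12, dtype=int)
    y[0] = 1                           # a single labeled minority sample
    known = np.zeros(12, dtype=bool)
    known[:4] = True
    print(mbo_binary(W, y, known, threshold=0.5))
    print(mbo_binary(W, y, known, threshold=0.3))
```

In this sketch, lowering `threshold` below 0.5 can only enlarge the set of nodes assigned to class 1, which is the sense in which a threshold adjustment can compensate for an underrepresented class. How the thresholds are actually chosen in BTDT-MBO, and how the method handles more than two classes, is not specified in the abstract.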
