FuzzyPPI: Large-Scale Interaction of Human Proteome at Fuzzy Semantic Space
Authors: Anup Kumar Halder; Soumyendu Sekhar Bandyopadhyay; Witold Jedrzejewski; Subhadip Basu; Jacek Sroka
Journal: IEEE Transactions on Big Data, vol. 11, no. 1, pp. 47-58 (published 2024-03-08)
DOI: 10.1109/TBDATA.2024.3375149
URL: https://ieeexplore.ieee.org/document/10463153/
Citation count: 0
Abstract
The large-scale protein-protein interaction (PPI) network of an organism provides key insights into its cellular and molecular functionalities, signaling pathways, and underlying disease mechanisms. For any organism, the unexplored protein interactions significantly outnumber all known positive and negative interactions. For humans, all known PPI datasets contain only $\sim 5.61$ million positive and $\sim 0.76$ million negative interactions, which is $\sim 3.1\%$ of potential interactions. We have implemented a distributed algorithm in Apache Spark that evaluates a human PPI network of $\sim 180$ million potential interactions resulting from 18,994 reviewed proteins for which Gene Ontology (GO) annotations are available. The computed scores have been validated against state-of-the-art methods on benchmark datasets. FuzzyPPI performed significantly better, with an average F1 score of 0.62 compared to GOntoSim (0.39), GOGO (0.38), and Wang (0.38) when tested on the Gold Standard PPI Dataset. The resulting scores are published with a web server for non-commercial use at http://fuzzyppi.mimuw.edu.pl/. Moreover, conventional PPI prediction methods produce binary results, but this is a simplification: PPIs have strengths or probabilities, and recent studies show that protein binding affinities may prove effective in detecting protein complexes, disease association analysis, signaling network reconstruction, etc. With this in mind, our algorithm is based on a fuzzy semantic scoring function and produces probabilities of interaction.
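As a quick sanity check on the scale quoted in the abstract, the sketch below counts the unordered protein pairs that 18,994 proteins can form. It assumes the $\sim 180$ million figure refers to all-against-all pairing; whether self-interactions are included is not stated in the abstract, so both variants are shown. The function name `potential_pairs` and its `include_self` flag are hypothetical and are not taken from the FuzzyPPI code.

```python
from math import comb

# Number of reviewed human proteins with GO annotations (from the abstract).
N_PROTEINS = 18_994


def potential_pairs(n_proteins: int, include_self: bool = False) -> int:
    """Count unordered protein pairs among n_proteins.

    Assumption: the search space is all-against-all pairing.
    include_self adds the n self-pairs (a protein paired with itself).
    """
    pairs = comb(n_proteins, 2)  # n * (n - 1) / 2 distinct pairs
    if include_self:
        pairs += n_proteins
    return pairs


if __name__ == "__main__":
    print(potential_pairs(N_PROTEINS))                     # distinct pairs only
    print(potential_pairs(N_PROTEINS, include_self=True))  # plus self-pairs
```

Under either variant the count is about 180 million, which matches the network size reported in the abstract and shows why a distributed framework such as Apache Spark is a natural fit for scoring every pair.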
About the journal:
The IEEE Transactions on Big Data publishes peer-reviewed articles focusing on big data. These articles present innovative research ideas and application results across disciplines, including novel theories, algorithms, and applications. Research areas cover a wide range, such as big data analytics, visualization, curation, management, semantics, infrastructure, standards, performance analysis, intelligence extraction, scientific discovery, security, privacy, and legal issues specific to big data. The journal also prioritizes applications of big data in fields generating massive datasets.