DeepRetention: A Deep Learning Approach for Intron Retention Detection
Zhenpeng Wu;Jiantao Zheng;Jiashu Liu;Cuixiang Lin;Hong-Dong Li
Pub Date: 2023-01-25 | DOI: 10.26599/BDMA.2022.9020023 | Big Data Mining and Analytics, vol. 6, no. 2, pp. 115-126
As the least understood mode of alternative splicing, Intron Retention (IR) has attracted increasing attention in gene regulation and disease studies. Existing methods detect IR exclusively based on one or a few predefined metrics describing local or summarized characteristics of retained introns. These metrics cannot describe the pattern of sequencing depth of intronic reads, which is an intuitive and informative characteristic of retained introns. We hypothesize that incorporating the distribution pattern of intronic reads will improve the accuracy of IR detection. Here we present DeepRetention, a novel approach that detects IR by modeling the pattern of sequencing depth of introns. Because a gold-standard IR dataset is lacking, we first compare DeepRetention with two state-of-the-art methods, iREAD and IRFinder, on simulated RNA-seq datasets with retained introns. The results show that DeepRetention outperforms both methods. Next, DeepRetention performs well when applied to third-generation long-read RNA-seq data, to which IRFinder and iREAD are not applicable. Further, we show that IRs predicted by DeepRetention are biologically meaningful on an RNA-seq dataset from Alzheimer's Disease (AD) samples: the differential IRs are significantly associated with AD based on statistical evaluation of an AD-specific functional gene network, and their parent genes are enriched in AD-related functions. In summary, DeepRetention detects IR from a new perspective, providing a valuable tool for IR analysis.
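To make the depth-pattern idea concrete, the following minimal sketch classifies an intron from its per-base read-depth profile with a small 1D CNN. This is our illustration, not the authors' released model; the profile length, layer sizes, and toy profiles are assumptions.

```python
# Minimal sketch (assumptions, not DeepRetention itself): classify an
# intron as retained or not from the shape of its per-base sequencing
# depth profile, rather than from summary metrics alone.
import torch
import torch.nn as nn

class DepthPatternClassifier(nn.Module):
    def __init__(self, profile_len: int = 256):
        super().__init__()
        # 1D convolutions scan the depth profile for local shape features
        # (e.g., the flat, exon-level coverage typical of retained introns).
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.head = nn.Linear(32, 1)  # logit: retained vs. not retained

    def forward(self, depth: torch.Tensor) -> torch.Tensor:
        # depth: (batch, profile_len) read depth, resampled to fixed length
        x = depth.unsqueeze(1)            # -> (batch, 1, profile_len)
        z = self.features(x).squeeze(-1)  # -> (batch, 32)
        return self.head(z).squeeze(-1)   # -> (batch,) logits

# Toy usage: uniform high coverage (retention-like) vs. sparse noise.
model = DepthPatternClassifier()
retained = torch.full((1, 256), 30.0) + torch.randn(1, 256)
spliced = torch.relu(torch.randn(1, 256))
print(model(torch.cat([retained, spliced])).shape)  # torch.Size([2])
```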
{"title":"DeepRetention: A Deep Learning Approach for Intron Retention Detection","authors":"Zhenpeng Wu;Jiantao Zheng;Jiashu Liu;Cuixiang Lin;Hong-Dong Li","doi":"10.26599/BDMA.2022.9020023","DOIUrl":"https://doi.org/10.26599/BDMA.2022.9020023","url":null,"abstract":"As the least understood mode of alternative splicing, Intron Retention (IR) is emerging as an interesting area and has attracted more and more attention in the field of gene regulation and disease studies. Existing methods detect IR exclusively based on one or a few predefined metrics describing local or summarized characteristics of retained introns. These metrics are not able to describe the pattern of sequencing depth of intronic reads, which is an intuitive and informative characteristic of retained introns. We hypothesize that incorporating the distribution pattern of intronic reads will improve the accuracy of IR detection. Here we present DeepRetention, a novel approach for IR detection by modeling the pattern of sequencing depth of introns. Due to the lack of a gold standard dataset of IR, we first compare DeepRetention with two state-of-the-art methods, i.e. iREAD and IRFinder, on simulated RNA-seq datasets with retained introns. The results show that DeepRetention outperforms these two methods. Next, DeepRetention performs well when it is applied to third-generation long-read RNA-seq data, while IRFinder and iREAD are not applicable to detecting IR from the third-generation sequencing data. Further, we show that IRs predicted by DeepRetention are biologically meaningful on an RNA-seq dataset from Alzheimer's Disease (AD) samples. The differential IRs are found to be significantly associated with AD based on statistical evaluation of an AD-specific functional gene network. The parent genes of differential IRs are enriched in AD-related functions. In summary, DeepRetention detects IR from a new angle of view, providing a valuable tool for IR analysis.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"6 2","pages":"115-126"},"PeriodicalIF":13.6,"publicationDate":"2023-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/8254253/10026288/10026289.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"67984953","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ultra-Short Wave Communication Squelch Algorithm Based on Deep Neural Network
Yuanxin Xiang;Yi Lv;Wenqiang Lei;Jiancheng Lv
Pub Date: 2022-11-24 | DOI: 10.26599/BDMA.2022.9020025 | Big Data Mining and Analytics, vol. 6, no. 1, pp. 106-114
The squelch problem in ultra-short wave communication under non-stationary noise and low Signal-to-Noise Ratio (SNR) in complex electromagnetic environments remains challenging. To alleviate it, we propose a squelch algorithm for ultra-short wave communication that combines a deep neural network with the traditional energy-decision method. The algorithm first predicts the speech existence probability with a three-layer Gated Recurrent Unit (GRU), using the speech band spectrum as the feature, and then makes the final squelch decision by combining the signal energy with the speech existence probability. Multiple simulations and experiments verify the robustness and effectiveness of the proposed algorithm. We simulate three situations: typical Amplitude Modulation (AM) and Frequency Modulation (FM) in ultra-short wave communication under different SNR environments, non-stationary burst-like noise environments, and real received signals from an ultra-short wave radio. The experimental results show that the proposed algorithm outperforms traditional squelch methods in all simulations and experiments. In particular, its false alarm rate under non-stationary burst-like noise is significantly lower than that of traditional squelch methods.
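A minimal sketch of the two-stage decision described above, assuming per-frame band-spectrum features, a three-layer GRU, and a simple AND fusion with an energy gate; the feature sizes, thresholds, and fusion rule are illustrative, not the paper's values.

```python
# Hedged sketch of the GRU-plus-energy squelch decision; all dimensions
# and thresholds are assumptions for illustration.
import torch
import torch.nn as nn

class SquelchNet(nn.Module):
    def __init__(self, n_bands: int = 40, hidden: int = 64):
        super().__init__()
        self.gru = nn.GRU(n_bands, hidden, num_layers=3, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, band_spec: torch.Tensor) -> torch.Tensor:
        # band_spec: (batch, frames, n_bands) log band energies per frame
        h, _ = self.gru(band_spec)
        return torch.sigmoid(self.out(h)).squeeze(-1)  # (batch, frames)

def squelch_decision(band_spec, frame_energy, net,
                     p_thresh=0.5, e_thresh=1e-3):
    """Open the squelch only if the network and the energy gate agree."""
    p_speech = net(band_spec)                      # (batch, frames)
    return (p_speech > p_thresh) & (frame_energy > e_thresh)

net = SquelchNet()
spec = torch.randn(2, 100, 40)       # 2 signals, 100 frames
energy = torch.rand(2, 100)          # per-frame signal energy
print(squelch_decision(spec, energy, net).shape)  # torch.Size([2, 100])
```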
{"title":"Ultra-Short Wave Communication Squelch Algorithm Based on Deep Neural Network","authors":"Yuanxin Xiang;Yi Lv;Wenqiang Lei;Jiancheng Lv","doi":"10.26599/BDMA.2022.9020025","DOIUrl":"https://doi.org/10.26599/BDMA.2022.9020025","url":null,"abstract":"The squelch problem of ultra-short wave communication under non-stationary noise and low Signal-to-Noise Ratio (SNR) in a complex electromagnetic environment is still challenging. To alleviate the problem, we proposed a squelch algorithm for ultra-short wave communication based on a deep neural network and the traditional energy decision method. The proposed algorithm first predicts the speech existence probability using a three-layer Gated Recurrent Unit (GRU) with the speech banding spectrum as the feature. Then it gets the final squelch result by combining the strength of the signal energy and the speech existence probability. Multiple simulations and experiments are done to verify the robustness and effectiveness of the proposed algorithm. We simulate the algorithm in three situations: the typical Amplitude Modulation (AM) and Frequency Modulation (FM) in the ultra-short wave communication under different SNR environments, the non-stationary burst-like noise environments, and the real received signal of the ultra-short wave radio. The experimental results show that the proposed algorithm performs better than the traditional squelch methods in all the simulations and experiments. In particular, the false alarm rate of the proposed squelch algorithm for non-stationary burst-like noise is significantly lower than that of traditional squelch methods.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"6 1","pages":"106-114"},"PeriodicalIF":13.6,"publicationDate":"2022-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/8254253/9962810/09962958.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68007926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep Convolutional Network Based Machine Intelligence Model for Satellite Cloud Image Classification
Kalyan Kumar Jena;Sourav Kumar Bhoi;Soumya Ranjan Nayak;Ranjit Panigrahi;Akash Kumar Bhoi
Pub Date: 2022-11-24 | DOI: 10.26599/BDMA.2021.9020017 | Big Data Mining and Analytics, vol. 6, no. 1, pp. 32-43
With a huge number of satellites revolving around the earth, there is great potential to observe and track change phenomena on the earth in real time through the analysis of satellite images. Classifying satellite images therefore provides strong assistance to the remote sensing community, for example in predicting tropical cyclones. In this article, a classification approach is proposed using a Deep Convolutional Neural Network (DCNN) comprising numerous layers, which extract features through a downsampling process for classifying satellite cloud images. The DCNN is trained on cloud images and achieves high prediction accuracy. With an appropriate deep convolutional network and a large number of training instances, inference time for test images decreases while prediction accuracy increases. The satellite images are taken from the Meteorological & Oceanographic Satellite Data Archival Centre, the organization responsible for providing satellite cloud images of India and its subcontinent. The proposed cloud image classification achieves 94% prediction accuracy with the DCNN framework.
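The downsampling pipeline the abstract describes can be sketched as follows; the layer counts, channel widths, and class count here are assumptions for illustration, not the paper's architecture.

```python
# Illustrative sketch only: a small downsampling CNN of the kind the
# abstract describes for cloud-image classification.
import torch
import torch.nn as nn

class CloudDCNN(nn.Module):
    def __init__(self, n_classes: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            # Each conv + pool stage downsamples while extracting features.
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, n_classes),
        )

    def forward(self, img):
        return self.net(img)  # (batch, n_classes) logits

model = CloudDCNN()
print(model(torch.randn(1, 3, 128, 128)).shape)  # torch.Size([1, 4])
```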
{"title":"Deep Convolutional Network Based Machine Intelligence Model for Satellite Cloud Image Classification","authors":"Kalyan Kumar Jena;Sourav Kumar Bhoi;Soumya Ranjan Nayak;Ranjit Panigrahi;Akash Kumar Bhoi","doi":"10.26599/BDMA.2021.9020017","DOIUrl":"https://doi.org/10.26599/BDMA.2021.9020017","url":null,"abstract":"As a huge number of satellites revolve around the earth, a great probability exists to observe and determine the change phenomena on the earth through the analysis of satellite images on a real-time basis. Therefore, classifying satellite images plays strong assistance in remote sensing communities for predicting tropical cyclones. In this article, a classification approach is proposed using Deep Convolutional Neural Network (DCNN), comprising numerous layers, which extract the features through a downsampling process for classifying satellite cloud images. DCNN is trained marvelously on cloud images with an impressive amount of prediction accuracy. Delivery time decreases for testing images, whereas prediction accuracy increases using an appropriate deep convolutional network with a huge number of training dataset instances. The satellite images are taken from the Meteorological & Oceanographic Satellite Data Archival Centre, the organization is responsible for availing satellite cloud images of India and its subcontinent. The proposed cloud image classification shows 94% prediction accuracy with the DCNN framework.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"6 1","pages":"32-43"},"PeriodicalIF":13.6,"publicationDate":"2022-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/8254253/9962810/09962954.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"67846975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Method for Bio-Sequence Analysis Algorithm Development Based on the PAR Platform
Haipeng Shi;Huan Chen;Qinghong Yang;Jun Wang;Haihe Shi
Pub Date: 2022-11-24 | DOI: 10.26599/BDMA.2022.9020030 | Big Data Mining and Analytics, vol. 6, no. 1, pp. 11-20
Problems of biological sequence analysis have great theoretical and practical value in modern bioinformatics, and numerous algorithms exist to solve them. Complex similarities and differences exist among the algorithms for the same problem, making it difficult for researchers to select the appropriate one. To address this situation, this paper combines the formal partition-and-recur method, component technology, domain engineering, and generic programming to present a method for developing a family of biological sequence analysis algorithms. It designs highly trustworthy, reusable domain algorithm components and assembles them to generate specific biological sequence analysis algorithms. An experiment on the development of a dynamic-programming-based LCS algorithm family shows that the proposed method improves the reliability, understandability, and development efficiency of particular algorithms.
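For reference, the recurrence that the generated LCS algorithm family realizes is the textbook dynamic-programming one; below is a minimal standalone version (our sketch, independent of the PAR component assembly).

```python
# Textbook LCS dynamic program:
# L[i][j] = L[i-1][j-1] + 1           if x[i-1] == y[j-1]
#         = max(L[i-1][j], L[i][j-1]) otherwise
def lcs_length(x: str, y: str) -> int:
    m, n = len(x), len(y)
    L = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L[m][n]

print(lcs_length("AGGTAB", "GXTXAYB"))  # 4 ("GTAB")
```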
{"title":"A Method for Bio-Sequence Analysis Algorithm Development Based on the PAR Platform","authors":"Haipeng Shi;Huan Chen;Qinghong Yang;Jun Wang;Haihe Shi","doi":"10.26599/BDMA.2022.9020030","DOIUrl":"https://doi.org/10.26599/BDMA.2022.9020030","url":null,"abstract":"The problems of biological sequence analysis have great theoretical and practical value in modern bioinformatics. Numerous solving algorithms are used for these problems, and complex similarities and differences exist among these algorithms for the same problem, causing difficulty for researchers to select the appropriate one. To address this situation, combined with the formal partition-and-recur method, component technology, domain engineering, and generic programming, the paper presents a method for the development of a family of biological sequence analysis algorithms. It designs highly trustworthy reusable domain algorithm components and further assembles them to generate specifific biological sequence analysis algorithms. The experiment of the development of a dynamic programming based LCS algorithm family shows the proposed method enables the improvement of the reliability, understandability, and development efficiency of particular algorithms.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"6 1","pages":"11-20"},"PeriodicalIF":13.6,"publicationDate":"2022-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/8254253/9962810/09962956.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68007737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Predicted Mean Vote of Subway Car Environment Based on Machine Learning
Kangkang Huang;Shihua Lu;Xinjun Li;Ke Feng;Weiwei Chen;Yi Xia
Pub Date: 2022-11-24 | DOI: 10.26599/BDMA.2022.9020028 | Big Data Mining and Analytics, vol. 6, no. 1, pp. 92-105
The thermal comfort of passengers in the carriage cannot be ignored. This research therefore aims to establish a model that predicts the predicted mean vote (PMV) index of the internal environment of a subway car and to find the optimal input combination for that model. Data-driven modeling utilizes data from experiments and questionnaires conducted in the Nanjing Metro. Support vector machine (SVM), decision tree (DT), random forest (RF), and logistic regression (LR) were used to build four models. To select the most appropriate input variables, all possible combinations of 11 input variables were evaluated, with variable selection across the models comprising 102 350 iterations. In the PMV prediction, the RF model was the best when using the squared correlation coefficient (R2) as the evaluation indicator (R2: 0.7680, mean squared error (MSE): 0.2868); its variables were clothing temperature (CT), convective heat transfer coefficient between the surface of the human body and the environment (CHTC), black bulb temperature (BBT), and thermal resistance of clothes (TROC). The RF model with MSE as the evaluation index also had the highest accuracy (R2: 0.7676, MSE: 0.2836); its variables were clothing surface area coefficient (CSAC), CT, BBT, and air velocity (AV). The results show that the RF model can efficiently predict the PMV of the subway car environment.
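A hedged sketch of the exhaustive variable-selection loop described above, using scikit-learn's random forest on placeholder data; the feature names come from the abstract, but the dataset, model settings, and scoring split are our assumptions.

```python
# Sketch: train a random forest on every non-empty subset of candidate
# inputs and keep the subset with the best R^2. Data here is synthetic.
from itertools import combinations
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
features = ["CT", "CHTC", "BBT", "TROC", "CSAC", "AV"]  # subset of the 11
X = rng.normal(size=(500, len(features)))
y = 0.6 * X[:, 0] + 0.3 * X[:, 2] + rng.normal(scale=0.2, size=500)  # toy PMV

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
best_vars, best_r2 = None, -np.inf
for k in range(1, len(features) + 1):
    for idx in combinations(range(len(features)), k):
        cols = list(idx)
        rf = RandomForestRegressor(n_estimators=50, random_state=0)
        rf.fit(X_tr[:, cols], y_tr)
        r2 = r2_score(y_te, rf.predict(X_te[:, cols]))
        if r2 > best_r2:
            best_vars, best_r2 = [features[i] for i in cols], r2
print("best subset:", best_vars, "R2 = %.3f" % best_r2)
```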
{"title":"Predicted Mean Vote of Subway Car Environment Based on Machine Learning","authors":"Kangkang Huang;Shihua Lu;Xinjun Li;Ke Feng;Weiwei Chen;Yi Xia","doi":"10.26599/BDMA.2022.9020028","DOIUrl":"https://doi.org/10.26599/BDMA.2022.9020028","url":null,"abstract":"The thermal comfort of passengers in the carriage cannot be ignored. Thus, this research aims to establish a prediction model for the thermal comfort of the internal environment of a subway car and find the optimal input combination in establishing the prediction model of the predicted mean vote (PMV) index. Data-driven modeling utilizes data from experiments and questionnaires conducted in Nanjing Metro. Support vector machine (SVM), decision tree (DT), random forest (RF), and logistic regression (LR) were used to build four models. This research aims to select the most appropriate input variables for the predictive model. All possible combinations of 11 input variables were used to determine the most accurate model, with variable selection for each model comprising 102 350 iterations. In the PMV prediction, the RF model was the best when using the correlation coefficients square (R2) as the evaluation indicator (R2: 0.7680, mean squared error (MSE): 0.2868). The variables include clothing temperature (CT), convective heat transfer coefficient between the surface of the human body and the environment (CHTC), black bulb temperature (BBT), and thermal resistance of clothes (TROC). The RF model with MSE as the evaluation index also had the highest accuracy (R2: 0.7676, MSE: 0.2836). The variables include clothing surface area coefficient (CSAC), CT, BBT, and air velocity (AV). The results show that the RF model can efficiently predict the PMV of the subway car environment.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"6 1","pages":"92-105"},"PeriodicalIF":13.6,"publicationDate":"2022-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/8254253/9962810/09962959.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68007925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Intelligent Segment Routing: Toward Load Balancing with Limited Control Overheads
Shu Yang;Ruiyu Chen;Laizhong Cui;Xiaolei Chang
Pub Date: 2022-11-24 | DOI: 10.26599/BDMA.2022.9020018 | Big Data Mining and Analytics, vol. 6, no. 1, pp. 55-71
Segment routing has emerged as a novel architecture for traffic engineering in recent years. However, it brings control overheads: additional packet headers must be inserted. For a large network, these overheads can greatly reduce forwarding efficiency when segment headers become too long. To balance these two concerns, we propose the intelligent routing scheme for traffic engineering (IRTE), which achieves load balancing with limited control overheads. To achieve optimal performance, we first formulate the problem as one of mapping different flows to key diversion points. Second, we prove the problem is nondeterministic polynomial (NP)-hard by reducing it to a k-dense subgraph problem. To solve it, we develop an improved ant colony optimization (IACO) algorithm, a technique widely used in network optimization problems. We also design a load balancing algorithm with diversion routing (LBA-DR) and analyze its theoretical performance. Finally, we evaluate IRTE on different real-world topologies; the results show that IRTE outperforms traditional algorithms, e.g., the maximum bandwidth is 24.6% lower than that of traditional algorithms on the BellCanada topology.
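The flow-to-diversion-point mapping can be illustrated with a generic ant-colony loop; this is a simplified sketch of the idea, not the paper's IACO, and the pheromone rules, parameters, and toy instance are our assumptions.

```python
# Generic ACO sketch: assign each flow to one diversion point so that
# the maximum per-point load stays small. tau[f][d] is the pheromone
# biasing flow f toward diversion point d.
import random

def aco_map_flows(flow_sizes, n_points, n_ants=20, n_iters=50,
                  evaporation=0.1, seed=0):
    random.seed(seed)
    n_flows = len(flow_sizes)
    tau = [[1.0] * n_points for _ in range(n_flows)]
    best_assign, best_cost = None, float("inf")
    for _ in range(n_iters):
        for _ant in range(n_ants):
            assign = [random.choices(range(n_points), weights=tau[f])[0]
                      for f in range(n_flows)]
            load = [0.0] * n_points
            for f, d in enumerate(assign):
                load[d] += flow_sizes[f]
            cost = max(load)                    # load-balancing objective
            if cost < best_cost:
                best_assign, best_cost = assign, cost
        for f in range(n_flows):                # evaporate, then reinforce
            for d in range(n_points):
                tau[f][d] *= 1.0 - evaporation
            tau[f][best_assign[f]] += 1.0 / best_cost
    return best_assign, best_cost

flows = [5, 3, 8, 2, 7, 4]                      # toy flow sizes
print(aco_map_flows(flows, n_points=3))         # (assignment, max load)
```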
{"title":"Intelligent Segment Routing: Toward Load Balancing with Limited Control Overheads","authors":"Shu Yang;Ruiyu Chen;Laizhong Cui;Xiaolei Chang","doi":"10.26599/BDMA.2022.9020018","DOIUrl":"https://doi.org/10.26599/BDMA.2022.9020018","url":null,"abstract":"Segment routing has been a novel architecture for traffic engineering in recent years. However, segment routing brings control overheads, i.e., additional packets headers should be inserted. The overheads can greatly reduce the forwarding efficiency for a large network, when segment headers become too long. To achieve the best of two targets, we propose the intelligent routing scheme for traffic engineering (IRTE), which can achieve load balancing with limited control overheads. To achieve optimal performance, we first formulate the problem as a mapping problem that maps different flows to key diversion points. Second, we prove the problem is nondeterministic polynomial (NP)-hard by reducing it to a k-dense subgraph problem. To solve this problem, we develop an ant colony optimization algorithm as improved ant colony optimization (IACO), which is widely used in network optimization problems. We also design the load balancing algorithm with diversion routing (LBA-DR), and analyze its theoretical performance. Finally, we evaluate the IRTE in different real-world topologies, and the results show that the IRTE outperforms traditional algorithms, e.g., the maximum bandwidth is 24.6% lower than that of traditional algorithms when evaluating on BellCanada topology.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"6 1","pages":"55-71"},"PeriodicalIF":13.6,"publicationDate":"2022-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/8254253/9962810/09963625.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68007924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Closed-Form Models of Accuracy Loss due to Subsampling in SVD Collaborative Filtering
Samin Poudel;Marwan Bikdash
Pub Date: 2022-11-24 | DOI: 10.26599/BDMA.2022.9020024 | Big Data Mining and Analytics, vol. 6, no. 1, pp. 72-84
We postulate and analyze a nonlinear subsampling accuracy loss (SSAL) model based on the root mean square error (RMSE) and two SSAL models based on the mean square error (MSE), suggested by extensive preliminary simulations. The SSAL models predict accuracy loss in terms of subsampling parameters such as the fraction of users dropped (FUD) and the fraction of items dropped (FID). We investigate whether the models depend on the characteristics of the dataset in a constant way across datasets when using the SVD collaborative filtering (CF) algorithm. The dataset characteristics considered include various densities of the rating matrix and the numbers of users and items. Extensive simulations and rigorous regression analysis led to empirical symmetric SSAL models in terms of FID and FUD whose coefficients depend only on the data characteristics. The SSAL models turn out to be multilinear in the odds ratios of dropping a user (or an item) versus not dropping it. Moreover, one MSE deterioration model is linear in the FUD and FID odds, with a zero coefficient on their interaction term. Most importantly, the models are constant in the sense that they can be written in closed form using the considered data characteristics (densities and numbers of users and items). The models are validated through extensive simulations based on 850 synthetically generated primary (pre-subsampling) matrices derived from the 25M MovieLens dataset. Nearly 460 000 subsampled rating matrices were then simulated and subjected to the singular value decomposition (SVD) CF algorithm. Further validation was conducted using the 1M MovieLens and the Yahoo! Music Rating datasets. The models were constant and significant across all three datasets.
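In symbols, writing the subsampling odds as below, the model family the abstract describes has the following shape. This records only the functional form; the fitted coefficients, which depend on the rating density and the numbers of users and items, are given in the paper.

```latex
% Functional form only; the coefficients a_0..a_3 are fitted from the
% dataset characteristics (rating density, numbers of users and items).
\[
  u = \frac{\mathrm{FUD}}{1-\mathrm{FUD}}, \qquad
  v = \frac{\mathrm{FID}}{1-\mathrm{FID}}
\]
\[
  \Delta\mathrm{MSE}(u,v) = a_0 + a_1 u + a_2 v + a_3\, u v
\]
% with a_3 = 0 in the MSE model that the abstract reports as linear in
% the FUD and FID odds.
```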
{"title":"Closed-Form Models of Accuracy Loss due to Subsampling in SVD Collaborative Filtering","authors":"Samin Poudel;Marwan Bikdash","doi":"10.26599/BDMA.2022.9020024","DOIUrl":"https://doi.org/10.26599/BDMA.2022.9020024","url":null,"abstract":"We postulate and analyze a nonlinear subsampling accuracy loss (SSAL) model based on the root mean square error (RMSE) and two SSAL models based on the mean square error (MSE), suggested by extensive preliminary simulations. The SSAL models predict accuracy loss in terms of subsampling parameters like the fraction of users dropped (FUD) and the fraction of items dropped (FID). We seek to investigate whether the models depend on the characteristics of the dataset in a constant way across datasets when using the SVD collaborative filtering (CF) algorithm. The dataset characteristics considered include various densities of the rating matrix and the numbers of users and items. Extensive simulations and rigorous regression analysis led to empirical symmetrical SSAL models in terms of FID and FUD whose coefficients depend only on the data characteristics. The SSAL models came out to be multi-linear in terms of odds ratios of dropping a user (or an item) vs. not dropping it. Moreover, one MSE deterioration model turned out to be linear in the FID and FUD odds where their interaction term has a zero coefficient. Most importantly, the models are constant in the sense that they are written in closed-form using the considered data characteristics (densities and numbers of users and items). The models are validated through extensive simulations based on 850 synthetically generated primary (pre-subsampling) matrices derived from the 25M MovieLens dataset. Nearly 460 000 subsampled rating matrices were then simulated and subjected to the singular value decomposition (SVD) CF algorithm. Further validation was conducted using the 1M MovieLens and the Yahoo! Music Rating datasets. The models were constant and significant across all 3 datasets.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"6 1","pages":"72-84"},"PeriodicalIF":13.6,"publicationDate":"2022-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/8254253/9962810/09963626.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"67846982","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Satellite Image Classification Using a Hybrid Manta Ray Foraging Optimization Neural Network
Amit Kumar Rai;Nirupama Mandal;Krishna Kant Singh;Ivan Izonin
Pub Date: 2022-11-24 | DOI: 10.26599/BDMA.2022.9020027 | Big Data Mining and Analytics, vol. 6, no. 1, pp. 44-54
A semi-supervised image classification method for satellite images is proposed in this paper. Satellite images contain enormous amounts of data that can be used in various applications, but analyzing them is tedious due to the volume and heterogeneity of the data. Thus, this paper proposes a Radial Basis Function Neural Network (RBFNN) trained using the Manta Ray Foraging Optimization (MRFO) algorithm. The RBFNN is a three-layer network comprising input, hidden, and output layers that can process large amounts of data, and the trained network can discover hidden patterns in unseen data. The learning algorithm and seed selection play a vital role in the performance of the network; seed selection is done using spectral indices to further improve performance. The manta ray foraging optimization algorithm is inspired by the intelligent behaviour of manta rays and emulates three unique foraging behaviours: chain, cyclone, and somersault foraging. Because satellite images contain enormous amounts of data, they require exploration of a large search space, and the spiral movement of the MRFO algorithm enables it to explore large search spaces effectively. The proposed method is applied to pre- and post-flooding Landsat 8 Operational Land Imager (OLI) images of the New Brunswick area to identify and classify the land cover changes induced by flooding. The images are classified using the proposed method, and a change map is developed using post-classification comparison. The change map shows that a large amount of agricultural area was washed away due to flooding; the affected area is also measured in square kilometres to support mitigation activities. The results show that post flooding the area covered by water increased whereas the vegetated area decreased. The performance of the proposed method is compared with existing state-of-the-art methods.
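Of the three foraging behaviours, chain foraging is the easiest to sketch. The update below follows the commonly cited MRFO formulation, and its use here for tuning parameters against a toy objective (standing in for the RBFNN's classification error) is our illustrative framing, not the paper's code.

```python
# Sketch of MRFO chain foraging as commonly formulated; the objective
# is a toy stand-in for RBFNN training error.
import numpy as np

def chain_foraging_step(pop, best, rng):
    """One chain-foraging move: each ray follows the ray ahead of it in
    the chain while also drifting toward the best solution so far."""
    new = np.empty_like(pop)
    for i in range(len(pop)):
        r = max(rng.random(), 1e-12)            # guard against log(0)
        alpha = 2.0 * r * np.sqrt(abs(np.log(r)))
        ahead = best if i == 0 else pop[i - 1]  # the chain structure
        new[i] = pop[i] + r * (ahead - pop[i]) + alpha * (best - pop[i])
    return new

def sphere(x):  # toy objective standing in for RBFNN error
    return np.sum(x ** 2, axis=1)

rng = np.random.default_rng(1)
pop = rng.normal(size=(5, 3))                   # 5 rays, 3 parameters each
for _ in range(20):
    best = pop[np.argmin(sphere(pop))]
    pop = chain_foraging_step(pop, best, rng)
print("best fitness:", sphere(pop).min())
```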
{"title":"Satellite Image Classification Using a Hybrid Manta Ray Foraging Optimization Neural Network","authors":"Amit Kumar Rai;Nirupama Mandal;Krishna Kant Singh;Ivan Izonin","doi":"10.26599/BDMA.2022.9020027","DOIUrl":"https://doi.org/10.26599/BDMA.2022.9020027","url":null,"abstract":"A semi supervised image classification method for satellite images is proposed in this paper. The satellite images contain enormous data that can be used in various applications. The analysis of the data is a tedious task due to the amount of data and the heterogeneity of the data. Thus, in this paper, a Radial Basis Function Neural Network (RBFNN) trained using Manta Ray Foraging Optimization algorithm (MRFO) is proposed. RBFNN is a three-layer network comprising of input, output, and hidden layers that can process large amounts. The trained network can discover hidden data patterns in unseen data. The learning algorithm and seed selection play a vital role in the performance of the network. The seed selection is done using the spectral indices to further improve the performance of the network. The manta ray foraging optimization algorithm is inspired by the intelligent behaviour of manta rays. It emulates three unique foraging behaviours namelys chain, cyclone, and somersault foraging. The satellite images contain enormous amount of data and thus require exploration in large search space. The spiral movement of the MRFO algorithm enables it to explore large search spaces effectively. The proposed method is applied on pre and post flooding Landsat 8 Operational Land Imager (OLI) images of New Brunswick area. The method was applied to identify and classify the land cover changes in the area induced by flooding. The images are classified using the proposed method and a change map is developed using post classification comparison. The change map shows that a large amount of agricultural area was washed away due to flooding. The measurement of the affected area in square kilometres is also performed for mitigation activities. The results show that post flooding the area covered by water is increased whereas the vegetated area is decreased. The performance of the proposed method is done with existing state-of-the-art methods.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"6 1","pages":"44-54"},"PeriodicalIF":13.6,"publicationDate":"2022-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/8254253/9962810/09962957.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68007735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FingerDTA: A Fingerprint-Embedding Framework for Drug-Target Binding Affinity Prediction
Xuekai Zhu;Juan Liu;Jian Zhang;Zhihui Yang;Feng Yang;Xiaolei Zhang
Pub Date: 2022-11-24 | DOI: 10.26599/BDMA.2022.9020005 | Big Data Mining and Analytics, vol. 6, no. 1, pp. 1-10
Many efforts have been devoted to screening potential drugs for targets, yet wet experiments remain a laborious and time-consuming approach. Artificial intelligence methods, such as the Convolutional Neural Network (CNN), are widely used to facilitate new drug discovery. Owing to the structural limitations of CNNs, the features they extract are local patterns that lack global information. However, both global information extracted from the whole sequence and local patterns extracted from special domains can influence drug-target affinity, and fusing the two brings neural network calculations closer to actual biological processes. This paper proposes a Fingerprint-embedding framework for Drug-Target binding Affinity prediction (FingerDTA), which uses a CNN to extract local patterns and utilizes fingerprints, generated on the basis of the whole sequence of drugs or targets, to characterize global information. FingerDTA achieves comparable performance on the Davis and KIBA datasets. In a case study of screening potential drugs for the spike protein of coronavirus disease 2019 (COVID-19), 7 of the top 10 drugs have been confirmed as potential by the literature. Ultimately, a docking experiment demonstrates that FingerDTA can find novel drug candidates for targets. All code is available at http://lanproxy.biodwhu.cn:9099/mszjaas/FingerDTA.git.
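A hedged sketch of the local/global fusion (not the released FingerDTA code, which is linked above): a CNN branch encodes local sequence patterns, a precomputed whole-sequence fingerprint supplies global information, and the two are concatenated before the affinity head. The vocabulary size, dimensions, and tokenization are assumptions.

```python
# Sketch of fusing CNN local patterns with a global fingerprint vector
# for affinity regression; all sizes are illustrative.
import torch
import torch.nn as nn

class FusionDTA(nn.Module):
    def __init__(self, vocab=64, emb=32, fp_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.cnn = nn.Sequential(  # local-pattern branch
            nn.Conv1d(emb, 32, 5, padding=2), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(  # fused local + global features
            nn.Linear(32 + fp_dim, 64), nn.ReLU(), nn.Linear(64, 1),
        )

    def forward(self, seq_tokens, fingerprint):
        local = self.cnn(self.embed(seq_tokens).transpose(1, 2))
        return self.head(torch.cat([local, fingerprint], dim=1)).squeeze(-1)

model = FusionDTA()
tokens = torch.randint(0, 64, (2, 100))   # e.g., tokenized SMILES
fp = torch.rand(2, 128)                   # whole-sequence fingerprint
print(model(tokens, fp).shape)            # torch.Size([2])
```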
{"title":"FingerDTA: A Fingerprint-Embedding Framework for Drug-Target Binding Affinity Prediction","authors":"Xuekai Zhu;Juan Liu;Jian Zhang;Zhihui Yang;Feng Yang;Xiaolei Zhang","doi":"10.26599/BDMA.2022.9020005","DOIUrl":"https://doi.org/10.26599/BDMA.2022.9020005","url":null,"abstract":"Many efforts have been exerted toward screening potential drugs for targets, and conducting wet experiments remains a laborious and time-consuming approach. Artificial intelligence methods, such as Convolutional Neural Network (CNN), are widely used to facilitate new drug discovery. Owing to the structural limitations of CNN, features extracted from this method are local patterns that lack global information. However, global information extracted from the whole sequence and local patterns extracted from the special domain can influence the drugtarget affinity. A fusion of global information and local patterns can construct neural network calculations closer to actual biological processes. This paper proposes a Fingerprint-embedding framework for Drug-Target binding Affinity prediction (FingerDTA), which uses CNN to extract local patterns and utilize fingerprints to characterize global information. These fingerprints are generated on the basis of the whole sequence of drugs or targets. Furthermore, FingerDTA achieves comparable performance on Davis and KIBA data sets. In the case study of screening potential drugs for the spike protein of the coronavirus disease 2019 (COVID-19), 7 of the top 10 drugs have been confirmed potential by literature. Ultimately, the docking experiment demonstrates that FingerDTA can find novel drug candidates for targets. All codes are available at http://lanproxy.biodwhu.cn:9099/mszjaas/FingerDTA.git.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"6 1","pages":"1-10"},"PeriodicalIF":13.6,"publicationDate":"2022-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/8254253/9962810/09963624.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68007736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
WTASR: Wavelet Transformer for Automatic Speech Recognition of Indian Languages
Tripti Choudhary;Vishal Goyal;Atul Bansal
Pub Date: 2022-11-24 | DOI: 10.26599/BDMA.2022.9020017 | Big Data Mining and Analytics, vol. 6, no. 1, pp. 85-91
Automatic speech recognition systems translate speech signals into their corresponding text representation, a capability used in a variety of applications such as voice-enabled commands, assistive devices, and bots. Efficient technology of this kind is significantly lacking for Indian languages. In this paper, a wavelet transformer for automatic speech recognition (WTASR) of Indian languages is proposed. Speech signals contain both high- and low-frequency content that varies over time owing to variation in the speaker's speech, and wavelets enable the network to analyze the signal at multiple scales. The wavelet decomposition of the signal is fed into the network for generating the text. The transformer network comprises an encoder-decoder system for speech translation. The model is trained on an Indian-language dataset to translate speech into the corresponding text. The proposed method is compared with other state-of-the-art methods; the results show that WTASR has a low word error rate and can be used for effective speech recognition of Indian languages.
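A minimal sketch of a wavelet front end feeding a transformer, assuming pywt for the multiscale decomposition and a stock PyTorch encoder (the text decoder is omitted); the frame size, wavelet choice, and feature pooling are our assumptions, not the paper's pipeline.

```python
# Sketch: wavelet sub-band features per frame -> transformer encoder.
import numpy as np
import pywt
import torch
import torch.nn as nn

def wavelet_frames(signal, wavelet="db4", level=3, n_frames=32):
    """Multiscale front end: per-frame mean magnitude of each wavelet
    sub-band, stacked into one feature vector per frame."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    feats = []
    for band in coeffs:
        per_frame = len(band) // n_frames
        trimmed = band[: per_frame * n_frames]
        feats.append(np.abs(trimmed.reshape(n_frames, per_frame)).mean(axis=1))
    return torch.tensor(np.stack(feats, axis=1), dtype=torch.float32)

signal = np.random.randn(16000)                 # 1 s of audio at 16 kHz
x = wavelet_frames(signal)                      # (32 frames, 4 sub-bands)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=4, nhead=2, batch_first=True),
    num_layers=2,
)
print(encoder(x.unsqueeze(0)).shape)            # torch.Size([1, 32, 4])
```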
{"title":"WTASR: Wavelet Transformer for Automatic Speech Recognition of Indian Languages","authors":"Tripti Choudhary;Vishal Goyal;Atul Bansal","doi":"10.26599/BDMA.2022.9020017","DOIUrl":"https://doi.org/10.26599/BDMA.2022.9020017","url":null,"abstract":"Automatic speech recognition systems are developed for translating the speech signals into the corresponding text representation. This translation is used in a variety of applications like voice enabled commands, assistive devices and bots, etc. There is a significant lack of efficient technology for Indian languages. In this paper, an wavelet transformer for automatic speech recognition (WTASR) of Indian language is proposed. The speech signals suffer from the problem of high and low frequency over different times due to variation in speech of the speaker. Thus, wavelets enable the network to analyze the signal in multiscale. The wavelet decomposition of the signal is fed in the network for generating the text. The transformer network comprises an encoder decoder system for speech translation. The model is trained on Indian language dataset for translation of speech into corresponding text. The proposed method is compared with other state of the art methods. The results show that the proposed WTASR has a low word error rate and can be used for effective speech recognition for Indian language.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"6 1","pages":"85-91"},"PeriodicalIF":13.6,"publicationDate":"2022-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/8254253/9962810/09962811.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68007923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}