A COMPARATIVE STUDY ON REINFORCEMENT LEARNING BASED VISUAL DIALOG SYSTEMS
Ghada M. Elshamy, M. Alfonse, Islam M. Hegazy, Mostafa M. Aref
International Journal of Intelligent Computing and Information Sciences
Pub Date: 2024-07-01 | DOI: 10.21608/ijicis.2024.295310.1339
Abstract: Recently, the intersection of vision and language has given rise to many combined tasks, such as visual question answering and image captioning. In particular, dialog systems grounded in a visual scene play an important role in advancing human-computer interaction technology. At the same time, reinforcement learning has emerged as a very successful paradigm for a variety of machine learning tasks, especially those that aim to develop smart, human-like machines. In this paper, we show how reinforcement learning is applied to conversational agents to build a powerful visual dialog agent. The visual dialog task requires the agent to hold a meaningful conversation about visual content in natural language. For a given image, its caption, dialog history (question/answer pairs)
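The reinforcement-learning framing in the abstract above can be illustrated with a minimal policy-gradient (REINFORCE) update. This is only a sketch: the two-candidate "response" policy, the reward function, and the learning rate are illustrative assumptions, not the paper's actual dialog setup.

```python
import math
import random

random.seed(0)

def softmax(theta):
    exps = [math.exp(t) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def train(episodes=500, lr=0.5):
    # Hypothetical setting: two candidate responses; replying with
    # response 0 earns reward 1, response 1 earns reward 0.
    theta = [0.0, 0.0]
    for _ in range(episodes):
        probs = softmax(theta)
        # Sample a response from the current policy.
        a = random.choices([0, 1], weights=probs)[0]
        reward = 1.0 if a == 0 else 0.0
        # REINFORCE: theta_k += lr * reward * (1[k == a] - pi_k)
        for k in range(2):
            grad = (1.0 if k == a else 0.0) - probs[k]
            theta[k] += lr * reward * grad
    return softmax(theta)

probs = train()
```

After training, the policy concentrates almost all probability on the rewarded response, which is the core mechanism a reward-driven dialog agent relies on.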
LOWER LIMB SEMG DENOISING USING DAUBECHIES WAVELETS
Ghada Kareem
Pub Date: 2023-07-01 | DOI: 10.21608/ijicis.2023.190345.1253
Abstract: This paper presents an approach to denoising lower-limb surface electromyography (sEMG) signals using Daubechies wavelets. As much noise as possible must be removed from the signal for it to be usable. Previous work could not accurately determine the most suitable denoising method for the lower limbs. This paper applies different thresholding approaches and selects the denoising method that yields the highest signal-to-noise ratio (SNR). A complete, detailed survey of techniques for reducing noise in surface electromyography signals is also provided. This research has important implications for the practical application of lower-limb EMG. The paper aims to ascertain the optimal parameters of the wavelet transform (Daubechies wavelets) for achieving the highest possible SNR in lower-limb sEMG. The data came from 11 healthy subjects performing 3 different movements, with 4 electrodes used to record the signals. The best denoising configuration is identified by computing the SNR for different thresholding types, Daubechies levels, and noise structures. The results indicate that the hard rigorous-SURE threshold with scaled white noise provides the highest SNR in every signal tested, although the best Daubechies level differs from one signal to another.
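The wavelet-thresholding pipeline described above can be sketched in a few lines. This simplified example uses the Haar wavelet (db1, the lowest-order Daubechies wavelet) with a single decomposition level, a hard threshold, and a known noise level; the paper itself compares higher Daubechies orders and SURE-based thresholds, which are not reproduced here.

```python
import math
import random

random.seed(1)

def haar_denoise(x, threshold):
    """One-level Haar DWT, hard-threshold the detail band, inverse DWT."""
    s = math.sqrt(2.0)
    approx = [(x[2 * i] + x[2 * i + 1]) / s for i in range(len(x) // 2)]
    detail = [(x[2 * i] - x[2 * i + 1]) / s for i in range(len(x) // 2)]
    # Hard thresholding: keep a coefficient only if it exceeds the threshold.
    detail = [d if abs(d) > threshold else 0.0 for d in detail]
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out

# Toy piecewise-constant "muscle burst" whose Haar detail coefficients are zero,
# so everything removed from the detail band is noise.
clean = [0.0] * 32 + [1.0] * 32
noisy = [c + random.gauss(0.0, 0.1) for c in clean]

# Universal threshold sigma * sqrt(2 ln n); sigma = 0.1 is assumed known here.
thr = 0.1 * math.sqrt(2.0 * math.log(len(noisy)))
denoised = haar_denoise(noisy, thr)

err_noisy = sum((c - y) ** 2 for c, y in zip(clean, noisy))
err_denoised = sum((c - y) ** 2 for c, y in zip(clean, denoised))
```

Comparing the squared error of the noisy and denoised signals against the clean signal is the same comparison the paper makes via SNR, since SNR rises as the residual error falls.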
Innovation initiative of Egyptian E-government system by using Big Data
Hesham Ibrahim
Pub Date: 2023-07-01 | DOI: 10.21608/ijicis.2023.184707.1244
ONTOLOGY-DRIVEN CONCEPTUAL MODEL AND DOMAIN ONTOLOGY FOR EGYPTIAN E-GOVERNMENT
S. Haridy, R. Ismail, N. Badr, M. Hashem
Pub Date: 2023-07-01 | DOI: 10.21608/ijicis.2023.176123.1230
Abstract: In recent years, online services have received considerable attention worldwide. One crucial online service during the coronavirus disease (COVID-19) pandemic was e-governance, in which governments provide various services to their citizens through information and communication technology. However, the residents of Arab countries have faced numerous obstacles and have not received the full benefits of e-governance, one of the main reasons being the absence of integration and information sharing. Therefore, this study proposes a novel domain ontology for the Egyptian e-government. The developed ontology can be used to solve a variety of interoperability problems. The development process starts with building an ontology-driven conceptual model using OntoUML, one of the most widely used ontology-driven conceptual modeling languages. The proposed model is then converted into a computable web ontology via the Web Ontology Language (OWL). The resulting ontology is evaluated with the OntoMetrics quality metrics. The results are compared with metrics collected from 20 e-government ontologies and show that the proposed ontology has better understandability measurements.
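The final artifact of the process above is a computable OWL ontology. As a rough illustration of what "computable" means here, the snippet below emits two OWL classes in RDF/XML using only the standard library; the e-government class names and the example.org namespace are hypothetical, not taken from the paper's ontology.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
for prefix, uri in (("owl", OWL), ("rdf", RDF), ("rdfs", RDFS)):
    ET.register_namespace(prefix, uri)

root = ET.Element(f"{{{RDF}}}RDF")

def add_class(name, parent=None):
    # Hypothetical e-government concepts, for illustration only.
    cls = ET.SubElement(root, f"{{{OWL}}}Class")
    cls.set(f"{{{RDF}}}about", f"http://example.org/egov#{name}")
    if parent:
        sub = ET.SubElement(cls, f"{{{RDFS}}}subClassOf")
        sub.set(f"{{{RDF}}}resource", f"http://example.org/egov#{parent}")
    return cls

add_class("GovernmentService")
add_class("LicenseRenewal", parent="GovernmentService")

xml_out = ET.tostring(root, encoding="unicode")
```

In practice an ontology of this size would be authored with a dedicated editor or library, but the serialized `owl:Class` / `rdfs:subClassOf` structure is exactly what metric suites such as OntoMetrics consume.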
Case Study of Improving English-Arabic Translation Using the Transformer Model
Donia Gamal, Marco Alfonse, Salud María Jiménez-Zafra, Moustafa Aref
Pub Date: 2023-06-01 | DOI: 10.21608/ijicis.2023.210435.1270
Abstract: Arabic is a language with rich morphology and few resources, and it is therefore recognized as one of the most challenging languages for machine translation. Translation into Arabic has received significantly less study than translation into European languages, so the quality of Arabic machine translation requires further investigation. This paper proposes a translation model between Arabic and English based on Neural Machine Translation (NMT). The proposed model employs a transformer that combines a feed-forward network with a multi-head attention mechanism. The proposed NMT model demonstrated its effectiveness in improving translation, achieving an accuracy of 97.68%, a loss of 0.0778, and a near-perfect Bilingual Evaluation Understudy (BLEU) score of 99.95. Future work will focus on exploring more effective ways of addressing the evaluation and quality estimation of NMT for low-resource languages, which is often challenging owing to the scarcity of reference translations and human annotators.
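The BLEU score cited above is the standard MT metric: a geometric mean of modified n-gram precisions with a brevity penalty. A simplified, unsmoothed sentence-level version (real evaluations aggregate over a corpus and usually apply smoothing) can be written with the standard library alone:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: modified n-gram precision + brevity penalty."""
    log_prec = 0.0
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        # Clipped n-gram matches: a candidate n-gram counts at most as often
        # as it appears in the reference.
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        total = max(sum(cand.values()), 1)
        if overlap == 0:
            return 0.0  # no smoothing in this sketch
        log_prec += math.log(overlap / total) / max_n
    # Brevity penalty punishes candidates shorter than the reference.
    bp = 1.0 if len(candidate) >= len(reference) else math.exp(
        1.0 - len(reference) / len(candidate))
    return bp * math.exp(log_prec)

reference = "the cat sat on the mat".split()
perfect = bleu(reference, reference)
```

A perfect match scores 1.0 (reported as 100 on the 0-100 scale used in the abstract), so a BLEU of 99.95 means the system output is nearly identical to the references.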
A SYSTEMATIC REVIEW ON TEXT SUMMARIZATION OF MEDICAL RESEARCH ARTICLES
A. Ibrahim, Marco Alfonse, M. Aref
Pub Date: 2023-06-01 | DOI: 10.21608/ijicis.2023.190004.1252
Abstract: The term "medical text summarization" refers to the process of extracting or collecting the most useful information from medical articles in a concise manner. The number of medical publications grows continuously every day, and applying text summarization techniques can minimize the time needed to manually condense medical papers into a summarized version. The goal of this study is to present a summary of recent work in medical text summarization from 2018 to 2022. It covers 15 papers spanning different methodologies, including Clinical Context-Aware (CCA), Prognosis Quality Recognition (PQR), Bidirectional Encoder Representations from Transformers (BERT), Generative Adversarial Networks (GAN), Recurrent Neural Networks (RNN), and Sequence-to-Sequence (seq2seq) models. The paper also describes the newest datasets (PubMed, arXiv, SUMPUBMED, Evidence-Based Medicine Summarization, COVID-19 Open Research, BioMed Central, Clinical Context-Aware, Biomedical Relation Extraction Dataset, Semantic Scholar Open Research Corpus, and Prognosis Quality Recognition) and evaluation metrics (Recall-Oriented Understudy for Gisting Evaluation (ROUGE), F1, Bilingual Evaluation Understudy (BLEU), BERTScore (BS), and accuracy) used in medical text summarization.
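ROUGE, the first metric listed above, compares n-gram overlap between a generated summary and a reference summary. A minimal unsmoothed ROUGE-N sketch (the example sentences are invented for illustration):

```python
from collections import Counter

def rouge_n(candidate, reference, n=1):
    """ROUGE-N recall, precision, and F1 over token n-grams (unsmoothed sketch)."""
    def grams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand, ref = grams(candidate), grams(reference)
    # Clipped overlap, as in BLEU's modified precision.
    overlap = sum(min(cnt, ref[g]) for g, cnt in cand.items())
    recall = overlap / max(sum(ref.values()), 1)       # ROUGE is recall-oriented
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return recall, precision, f1

summary = "the model summarizes medical articles".split()
reference = "the system summarizes medical research articles".split()
recall, precision, f1 = rouge_n(summary, reference, n=1)
```

Recall answers "how much of the reference did the summary cover?", which is why ROUGE (rather than precision-oriented BLEU) dominates summarization evaluation.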
Advances in Decision Support Systems' design aspects: architecture, applications, and methods
Ahmed H. Abdul-kareem, Z. Fayed, S. Rady, Salsabil Amin, Bashar M. Nema
Pub Date: 2023-06-01 | DOI: 10.21608/ijicis.2023.160460.1216
Abstract: When deciding on significant matters pertaining to their operations, a large number of businesses and organizations rely on what are known as decision support systems (DSS). Both the theory and practice of decision support systems continue to advance, occasionally converging with other significant developments in information technology (IT), such as organizational computing, e-commerce and e-business, and pervasive computing. A well-designed decision support system is an interactive software-based system that assists decision-makers in identifying problems, finding solutions to those problems, and making decisions. This assistance might come in the form of raw data, documentation, personal expertise
DATA FUSION FOR DATA PREDICTION: AN IoT-BASED DATA PREDICTION APPROACH FOR SMART CITIES
D. Fawzy, Sherin M. Moussa, N. Badr
Pub Date: 2023-06-01 | DOI: 10.21608/ijicis.2023.188202.1249
Abstract: With the recent proliferation of Internet of Things (IoT) based systems, an effective data prediction approach for IoT data analysis has become a crucial need for sustainable smart city services. Nevertheless, IoT data add many data perspectives to consider, which complicates the data prediction process. This creates an urgent need for advanced data fusion methods that preserve IoT data characteristics while ensuring prediction accuracy, reliability, and robustness. Although different data prediction approaches have been presented for IoT applications, maintaining IoT data characteristics remains a challenge. This paper presents our proposed domain-independent approach, Data Fusion for Data Prediction (DFDP), which consists of: (1) data fusion, which handles the massive size, faults, spatiotemporality, and freshness of IoT data by employing a data input-data output fusion approach, and (2) data prediction, which applies the K-Nearest Neighbor technique to the fused data. DFDP is validated using IoT data from different smart city datasets. The experiments demonstrate the effective performance of DFDP, which reaches a 91.8% accuracy level.
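The second stage of DFDP, K-Nearest Neighbor prediction over fused readings, can be sketched as below. The fusion stage is not reproduced; the hourly "fused sensor readings" and the choice of k are invented for illustration.

```python
import math

def knn_predict(train, query, k=3):
    """Predict a value for `query` as the mean of its k nearest neighbours.

    train: list of (feature_vector, value) pairs, e.g. fused IoT readings.
    """
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], query))
    nearest = by_distance[:k]
    return sum(value for _, value in nearest) / k

# Hypothetical fused readings: (hour, station-id) -> temperature rising with hour.
train = [((float(h), 0.0), 20.0 + h) for h in range(10)]
pred = knn_predict(train, (4.0, 0.0), k=3)
```

For the query at hour 4 the three nearest readings are hours 3, 4, and 5, so the prediction is their mean, 24.0. KNN's appeal here is that it needs no model training and adapts as fresh fused readings replace stale ones.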
Survey of Liver Fibrosis Prediction Using Machine Learning Techniques
Eslam Sharshar, Huda Amin, N. Badr, E. Abdelsameea
Pub Date: 2023-06-01 | DOI: 10.21608/ijicis.2023.180102.1238
Abstract: The prediction of liver fibrosis stages in hepatitis B virus (HBV) and hepatitis C virus (HCV) infection is an important problem. The gold standard for evaluating liver fibrosis stages is the liver biopsy, which has many drawbacks, so alternative methods for evaluating the stage of liver fibrosis have become necessary. Many machine learning techniques have been used as non-invasive alternatives for liver fibrosis prediction to avoid the disadvantages of the liver biopsy. This study surveys machine learning techniques that have been applied to predicting liver fibrosis and differentiating between the stages of hepatic fibrosis on different medical HBV and HCV datasets, using various blood tests and clinical parameters together with several feature selection techniques. The results and performance of the classifier models are also reviewed and compared with non-invasive methods used for liver fibrosis prediction, such as the FIB-4 index and APRI score.
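The two non-invasive baselines named above have simple closed-form definitions: FIB-4 = (age x AST) / (platelets x sqrt(ALT)), and APRI = (AST / AST upper limit of normal) / platelets x 100, with platelets in 10^9/L. The example inputs below are illustrative, not patient data from the surveyed studies.

```python
import math

def fib4(age_years, ast, alt, platelets_10e9_per_l):
    """FIB-4 index: (age * AST) / (platelets * sqrt(ALT))."""
    return (age_years * ast) / (platelets_10e9_per_l * math.sqrt(alt))

def apri(ast, ast_upper_limit_normal, platelets_10e9_per_l):
    """APRI score: (AST / upper limit of normal) / platelets * 100."""
    return (ast / ast_upper_limit_normal) / platelets_10e9_per_l * 100.0

# Illustrative inputs: age 50, AST 40 U/L, ALT 36 U/L,
# platelets 200 x 10^9/L, AST upper limit of normal 40 U/L.
f = fib4(50, 40, 36, 200)
a = apri(40, 40, 200)
```

Because both scores reduce to a handful of routine blood values, they make natural baselines for the machine-learning classifiers the survey reviews.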
Exploring Self-Supervised Pretraining Datasets for Complex Scene Understanding
Yomna A. Kawashti, D. Khattab, M. Aref
Pub Date: 2023-06-01 | DOI: 10.21608/ijicis.2023.193520.1255
Abstract: With the rapid advancement of deep learning research, many milestones have been achieved in the field of computer vision. However, most of these advances apply only where hand-annotated datasets are available; this is the current bottleneck of deep learning that self-supervised learning aims to overcome. The self-supervised framework consists of proxy and target tasks. The proxy task is a self-supervised task pretrained on unlabeled data, whose weights are transferred to the target task. The prevalent paradigm in self-supervised research is to pretrain on ImageNet, a single-object-centric dataset. In this work, we investigate whether this is the best choice when the target task is multi-object-centric. We pretrain SimSiam, a non-contrastive self-supervised algorithm, on two different pretraining datasets: ImageNet100 (single-object-centric) and COCO (multi-object-centric). The transfer performance of each pretrained model is evaluated on the target task of multi-label classification using PascalVOC. Furthermore, we evaluate the two pretrained models on CityScapes, an autonomous driving dataset, to study the implications of the chosen pretraining dataset in a different domain. Our results show that the SimSiam model pretrained on COCO consistently outperformed the ImageNet100-pretrained model by about 1 percentage point (58.3 vs. 57.4 mAP on CityScapes), which is significant since COCO is the smaller dataset. We conclude that multi-object-centric pretraining datasets are more efficient for self-supervised learning when the target task is multi-object-centric and in complex scene understanding tasks such as autonomous driving applications.
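The mAP numbers above are means of per-class average precision (AP). A minimal AP computation for one class is shown below; the scores and labels are invented, and a real multi-label evaluation would repeat this per class and average the results.

```python
def average_precision(scores, labels):
    """AP for one class: mean of precision at the rank of each true positive.

    scores: predicted confidences; labels: 1 for positive, 0 for negative.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, precisions = 0, []
    for rank, (_, is_positive) in enumerate(ranked, start=1):
        if is_positive:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / max(len(precisions), 1)

# Illustrative scores for one class: positives ranked 1st and 3rd.
ap = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```

Here the positives sit at ranks 1 and 3, giving precisions 1/1 and 2/3 and an AP of 5/6; averaging such per-class APs over all PascalVOC or CityScapes labels yields the mAP values the paper reports.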