Pub Date: 2024-07-22 | DOI: 10.1007/s43674-024-00075-5
Bingzi Jin, Xiaojie Xu
During the last decade, the Chinese housing market has expanded rapidly, and housing price forecasting has become an essential problem for policymakers and investors. In this article, we explore Gaussian process regressions across different kernels and basis functions for monthly office real estate price index forecasts for ten major Chinese cities from July 2005 to April 2021, using cross-validation and Bayesian optimization to endow the forecast models with higher adaptability and better generalization performance. The constructed models offer precise out-of-sample forecasts from May 2019 to April 2021, with relative root mean square errors ranging from 0.0205% to 0.5300% across the ten price indices. Benchmark analysis against the autoregressive model, the autoregressive-generalized autoregressive conditional heteroskedasticity model, the nonlinear autoregressive neural network model, the support vector regression model, and the regression tree model suggests that the Gaussian process regression model achieves statistically significantly higher accuracy. Our findings may be used independently or in conjunction with other projections to form views on office real estate price index movements and to support further policy research.
Title: Office real estate price index forecasts through Gaussian process regressions for ten major Chinese cities (Advances in computational intelligence, 4(3))
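The paper's full pipeline (kernel and basis-function selection via cross-validation and Bayesian optimization) is not reproduced here, but a minimal sketch of a Gaussian process out-of-sample forecast on a synthetic monthly index, using scikit-learn's `GaussianProcessRegressor` with an assumed RBF-plus-noise kernel, looks like this:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Synthetic monthly price index (trend + noise); a stand-in for a real series,
# not the paper's data.
rng = np.random.default_rng(0)
t = np.arange(60, dtype=float)
index = 100.0 + 0.5 * t + rng.normal(0.0, 0.5, size=t.shape)

# Train on the first 48 months, forecast the remaining 12 out-of-sample.
X_train, y_train = t[:48, None], index[:48]
X_test, y_test = t[48:, None], index[48:]

kernel = RBF(length_scale=10.0) + WhiteKernel(noise_level=0.25)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X_train, y_train)
y_pred, y_std = gpr.predict(X_test, return_std=True)

# Relative RMSE (%), the accuracy measure reported in the abstract.
rel_rmse = 100.0 * np.sqrt(np.mean((y_pred - y_test) ** 2)) / np.mean(y_test)
```

The predictive standard deviation `y_std` is a by-product of the GP posterior and is one reason GPs are attractive for forecast-uncertainty analysis.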
Approaches that improve workers' productivity without neglecting their well-being deserve investigation. To elucidate the effects of systematic micro-breaks on intellectual concentration performance, a controlled laboratory experiment collected data from 31 participants performing cognitive comparison tasks. A systematic micro-break of 20 s was given after every 7.5 min of cognitive work, over a total of 25 min of work tasks. Each participant performed the task under both conditions, with and without the micro-break intervention, in a counterbalanced design. Two quantitative evaluations were made: answering time and concentration time ratio. A subjective symptom questionnaire and the NASA task load index were applied for analytical consideration. The average answering time indicates that performance under the influence of micro-breaks tends to be more stable over time and that the intervention mitigates performance degradation compared with the condition without micro-breaks. For concentration time ratio scores, no significant difference was found between the conditions with and without micro-breaks. However, the concentration time ratio score tended to be higher in the micro-break condition, which suggests higher cognitive performance. The subjective symptom questionnaire indicated no significant difference between the conditions. The weighted NASA task load index results indicated a significant difference between the conditions, with lower workload scores in the micro-break condition. These results suggest that implementing systematic micro-breaks can support workers' performance stability over time. Therefore, systematic micro-breaks can be promoted as a promising strategy for work recovery.
Title: Systematic micro-breaks affect concentration during cognitive comparison tasks: quantitative and qualitative measurements
Authors: Orchida Dianita, Kakeru Kitayama, Kimi Ueda, Hirotake Ishii, Hiroshi Shimoda, Fumiaki Obayashi
Pub Date: 2024-06-18 | DOI: 10.1007/s43674-024-00074-6 | Advances in computational intelligence, 4(3)
Open-access PDF: https://link.springer.com/content/pdf/10.1007/s43674-024-00074-6.pdf
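The break protocol (a 20 s micro-break after each 7.5 min block of work, within a 25 min session) can be laid out as a simple timeline. This sketch is only one reading of the stated schedule, not the authors' experimental software:

```python
def micro_break_schedule(total_min=25.0, work_min=7.5, break_s=20.0):
    """Return (start_min, end_min, kind) segments for a session that
    inserts a fixed micro-break after every completed work block."""
    segments, t = [], 0.0
    while t < total_min:
        work_end = min(t + work_min, total_min)
        segments.append((t, work_end, "work"))
        t = work_end
        if t < total_min:  # no break after the final block
            segments.append((t, t + break_s / 60.0, "break"))
            t += break_s / 60.0
    return segments

schedule = micro_break_schedule()
```

For the default parameters this yields four work blocks separated by three 20 s breaks.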
Pub Date: 2024-04-16 | DOI: 10.1007/s43674-024-00073-7
Jari Isohanni
Colour differentiation is crucial in machine learning and computer vision. It is often used when identifying items and objects by their distinct colours. While common colours such as blue, red, green, and yellow are easily distinguishable, some applications require recognising subtle colour variations; such demands arise in sectors such as agriculture, printing, healthcare, and packaging. This research employs prevalent unsupervised learning techniques to detect printed colours on paper, focusing on the CMYK ink (saturation) levels necessary for recognition against a white background. The aim is to assess whether unsupervised clustering can identify colours within QR codes. One use case for this research is functional inks, which change colour in response to environmental factors; embedded in QR codes, they can serve as low-cost IoT sensors. The results indicate that K-means, C-means, Gaussian mixture model (GMM), hierarchical clustering, and spectral clustering all perform well in recognising colour differences when CMYK saturation is 20% or higher in at least one channel. K-means stands out when saturation drops below 10%, although its accuracy diminishes significantly, especially for the yellow and magenta channels. A saturation of at least 10% in one CMYK channel is needed for reliable colour detection using unsupervised learning. To handle ink densities below 5%, further research or alternative unsupervised methods may be necessary.
Title: Recognising small colour changes with unsupervised learning, comparison of methods (Advances in computational intelligence, 4(2))
Open-access PDF: https://link.springer.com/content/pdf/10.1007/s43674-024-00073-7.pdf
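The clustering task above amounts to separating faintly printed pixels from white background. A minimal k-means sketch on synthetic RGB pixel samples (the paper's CMYK pipeline and saturation thresholds are not reproduced here) could look like:

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic pixel sample: white paper plus a faint printed colour
# (roughly a light cyan in RGB terms); a stand-in for a scanned QR module.
rng = np.random.default_rng(1)
background = rng.normal(250.0, 3.0, size=(200, 3))
faint_ink = rng.normal([204.0, 250.0, 250.0], 3.0, size=(200, 3))
pixels = np.clip(np.vstack([background, faint_ink]), 0, 255)

# Two clusters: ink vs. background.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(pixels)
centres = km.cluster_centers_

# The colours are distinguishable if the cluster centres differ clearly
# in at least one channel.
separation = np.abs(centres[0] - centres[1]).max()
```

As the abstract suggests, separation degrades as the ink centre approaches the background; the synthetic gap here corresponds to a clearly detectable saturation level.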
This study builds upon prior research on personal color decision systems. It employs color space logic and reduces the limitations associated with capturing photos, aiming to enhance the existing personal color decision method and to obtain more reliable and objective results for personal color analysis. Our proposed approach focuses on developing a comprehensive color selection framework by leveraging personal color databases and employing decision tree methods. The findings suggest that applying personal color analysis to image creation can help individuals cultivate a positive and confident image, which is significant for interpersonal relationships and social interactions.
Title: Personal color analysis using color space algorithm
Authors: Tanakorn Withurat, Wannapa Sripen, Juntanee Pattanasukkul, Witsarut Wongsim, Suchawalee Jeeratanyasakul, Thitirat Siriborvornratanakul
Pub Date: 2024-04-10 | DOI: 10.1007/s43674-024-00071-9 | Advances in computational intelligence, 4(2)
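Color-space logic of the kind described typically maps a skin-tone sample into a perceptual space before applying decision rules. As a hedged illustration (the hue cutoffs here are invented for the example, not taken from the paper), a warm/cool undertone can be read off the HSV hue channel using only the standard library:

```python
import colorsys

def undertone(r, g, b):
    """Classify an RGB skin sample as 'warm' or 'cool' by its HSV hue.
    The 70/330-degree cutoffs are illustrative only."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    hue_deg = h * 360.0
    return "warm" if hue_deg < 70.0 or hue_deg > 330.0 else "cool"
```

A real system would aggregate many sampled pixels and feed such features into the decision tree mentioned in the abstract.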
Pub Date: 2024-04-06 | DOI: 10.1007/s43674-024-00072-8
Lucas Elias de Andrade Cruvinel, Wanderlei Malaquias Pereira Jr., Amanda Isabela de Campos, Rogério Pinto Espíndola, Antover Panazzolo Sarmento, Daniel de Lima Araújo, Gustavo de Assis Costa, Roberto Viegas Dutra
The concrete mixture design and mix proportioning procedure, along with its influence on the compressive strength of concrete, is a well-known problem in civil engineering that requires the execution of numerous tests. With the emergence of modern machine learning techniques, the possibility of automating this process has become a reality. However, a significant volume of data is necessary to take advantage of existing models and algorithms. Recent literature presents different datasets, each with its own unique details, for training their models. In this paper, we integrated some of these existing datasets to improve training and, consequently, the models' results. Therefore, using this new dataset, we tested various models for the prediction task. The resulting dataset comprises 2358 records with seven input variables related to the mixture design, while the output represents the compressive strength of concrete. The dataset was subjected to several pre-processing techniques, and afterward, machine learning models, such as regressions, trees, and ensembles, were used to estimate the compressive strength. Some of these methods proved satisfactory for the prediction problem, with the best models achieving a coefficient of determination (R²) above 80%. Furthermore, a website with the trained model was created, allowing professionals in the field to utilize the AI technique in their everyday problem-solving.
Title: Application of artificial intelligence models to predict the compressive strength of concrete (Advances in computational intelligence, 4(2))
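The regression setup described (seven mixture-design inputs, compressive strength as output, R² as the quality metric) can be sketched on synthetic stand-in data; the coefficients and noise below are invented for illustration, not fitted to the paper's 2358-record dataset:

```python
import numpy as np

# Synthetic stand-in for mixture-design data: 7 inputs (cement, water,
# aggregates, admixture, age, etc.) and compressive strength as output.
rng = np.random.default_rng(42)
X = rng.uniform(0.0, 1.0, size=(200, 7))
true_w = np.array([35.0, -20.0, 5.0, 3.0, 8.0, 2.0, 4.0])
y = 20.0 + X @ true_w + rng.normal(0.0, 1.0, size=200)

# Ordinary least squares with an intercept column.
A = np.hstack([np.ones((200, 1)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ coef

# Coefficient of determination, the metric reported in the abstract.
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot
```

Tree and ensemble models, as used in the paper, would replace the least-squares step while the R² evaluation stays the same.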
This study introduces a web application designed to address the challenge of ensuring correct posture and performance in weightlifting exercises, with a particular focus on fundamental bodyweight movements targeting various body parts. The problem at hand primarily concerns beginners who require guidance for accurate exercise execution. To tackle this issue, the tool leverages a live camera in conjunction with the MediaPipe and OpenCV frameworks to extract key points from the user's body. It concentrates on seven core exercise postures, using these key points to calculate numerical values and angles. Users are required to adjust their view angles to activate the tool's pose estimation functions. An algorithm, based on predefined rules that determine posture thresholds and angles between three key points, is employed to detect incorrect postures, provide real-time feedback, and track repetition counts. The completion of all required stages is necessary to count a repetition as correct. Additionally, in this study, we have expanded the algorithm to include three new exercise postures: Bent over Dumbbell Row, Seated Triceps Press, and Dumbbell Fly. We have also adapted the system to detect the lying down view, which is essential for the Dumbbell Fly posture. The results of testing this application demonstrate further development potential, particularly in enhancing the model’s framework to accommodate challenges such as high light intensity, pale skin tones, and instances when a body part is obscured by an object.
Title: Real-time weight training counting and correction using MediaPipe
Authors: Thananan Luangaphirom, Sirirat Lueprasert, Phopthorn Kaewvichit, Siraphong Boonphotsiri, Tanakorn Burapasikarin, Thitirat Siriborvornratanakul
Pub Date: 2024-03-18 | DOI: 10.1007/s43674-024-00070-w | Advances in computational intelligence, 4(2)
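The rule-based core described above (angles between three key points, posture thresholds, and stage completion before a repetition counts) can be sketched without the camera pipeline; the thresholds and the down/up stage names are illustrative, not the application's actual values:

```python
import math

def joint_angle(a, b, c):
    """Angle at point b (degrees) formed by points a-b-c, e.g. the elbow
    angle from shoulder, elbow, and wrist key points (x, y)."""
    ang = math.degrees(
        math.atan2(c[1] - b[1], c[0] - b[0]) - math.atan2(a[1] - b[1], a[0] - b[0])
    )
    ang = abs(ang)
    return 360.0 - ang if ang > 180.0 else ang

def count_reps(angles, down_thresh=60.0, up_thresh=150.0):
    """Count a repetition only when the angle passes below the 'down'
    threshold and then back above the 'up' threshold (all stages required)."""
    reps, stage = 0, "up"
    for ang in angles:
        if ang < down_thresh:
            stage = "down"
        elif ang > up_thresh and stage == "down":
            stage = "up"
            reps += 1
    return reps
```

In the real system the angle sequence would come from MediaPipe pose landmarks streamed frame by frame.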
Contemporary font design is a labor-intensive process. To address this, we utilize deep learning, specifically StyleGAN2-ADA and Real-ESRGAN, for automated Thai font generation. StyleGAN2-ADA incorporates adaptive discriminator augmentation (ADA) for image synthesis. By integrating Real-ESRGAN, font quality is enhanced. Our approach produces diverse, high-resolution fonts, as demonstrated in comparative experiments. In a survey with 50 participants, StyleGAN2-ADA without augmentation proves superior in legibility and visual appeal, while StyleGAN2-ADA with augmentation excels in diversity. This research highlights the efficiency of deep learning in creating high-quality Thai fonts and has implications for automated font design advancement.
Title: StyleGAN2-ADA and Real-ESRGAN: Thai font generation with generative adversarial networks
Authors: Nidchapan Nitisukanan, Chotika Boonthaweechok, Prapatsorn Tiawpanichkij, Juthamas Pissakul, Naliya Maneesawangwong, Thitirat Siriborvornratanakul
Pub Date: 2024-02-22 | DOI: 10.1007/s43674-024-00069-3 | Advances in computational intelligence, 4(1)
This study aims to automate the reporting process for house inspections, enabling prospective buyers to make informed decisions. Currently, producing an inspection report involves inserting all defect images into spreadsheet software and manually captioning each image with the identified defects. To the best of our knowledge, no previous works or datasets have automated this process. Therefore, this paper proposes a new image-captioning dataset for house defect inspection, benchmarked with three deep learning-based models. Our models follow the encoder-decoder architecture, in which three image encoders (VGG16, MobileNet, and InceptionV3) and one GRU-based decoder with Bahdanau's additive attention mechanism are evaluated. The experimental results indicate that, despite similar training losses across all models, VGG16 takes the least time to train, while MobileNet achieves the highest BLEU-1 to BLEU-4 scores of 0.866, 0.850, 0.823, and 0.728, respectively. However, InceptionV3 is suggested as the optimal model, since it outperforms the others in producing accurate attention plots and its BLEU scores are comparable to the best scores obtained by MobileNet.
Title: Automatic image captioning in Thai for house defect using a deep learning-based approach
Authors: Manadda Jaruschaimongkol, Krittin Satirapiwong, Kittipan Pipatsattayanuwong, Suwant Temviriyakul, Ratchanat Sangprasert, Thitirat Siriborvornratanakul
Pub Date: 2023-12-29 | DOI: 10.1007/s43674-023-00068-w | Advances in computational intelligence, 4(1)
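BLEU-n, the metric reported above, is a modified n-gram precision combined with a brevity penalty. A compact pure-Python version (single reference, no smoothing; production work would normally use an established implementation such as NLTK's) is:

```python
import math
from collections import Counter

def bleu(reference, hypothesis, max_n=4):
    """Single-reference BLEU with uniform weights over 1..max_n grams."""
    ref, hyp = reference.split(), hypothesis.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        # Clipped overlap: each hypothesis n-gram counts at most as often
        # as it appears in the reference.
        overlap = sum((hyp_ngrams & ref_ngrams).values())
        total = max(sum(hyp_ngrams.values()), 1)
        if overlap == 0:
            return 0.0  # any empty n-gram match zeroes unsmoothed BLEU
        log_precisions.append(math.log(overlap / total))
    brevity = min(1.0, math.exp(1.0 - len(ref) / len(hyp))) if hyp else 0.0
    return brevity * math.exp(sum(log_precisions) / max_n)
```

Setting `max_n` to 1 through 4 yields the BLEU-1 to BLEU-4 variants quoted in the abstract.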
Pub Date: 2023-11-23 | DOI: 10.1007/s43674-023-00067-x
Ammara Khan, Muhammad Tahir Rasheed, Hufsa Khan
Deep learning has played an important role in many real-life applications, especially image classification. Some domain data are highly skewed: most of the data belong to a handful of majority classes, while the minority classes contain only small amounts of information. Such skewed class distributions pose a significant challenge, and most machine learning and deep learning algorithms become ineffective, or fail outright, when the data distribution is highly imbalanced. In this study, a comprehensive analysis of the imbalanced case is performed using well-known deep learning models. In particular, the best feature-extractor model is identified, and current trends in feature-extraction models are investigated. Moreover, to characterise global scientific research on the image classification of imbalanced mushroom datasets, a bibliometric analysis covering 1991 to 2022 is conducted. In summary, our findings may offer researchers a quick benchmarking reference and an alternative approach to assessing trends in imbalanced data distributions in image classification research.
Title: An empirical study of deep learning-based feature extractor models for imbalanced image classification (Advances in computational intelligence, 3(6))
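A common first remedy for the class skew described above is to reweight the training loss by inverse class frequency. The sketch below computes such weights from a label list, following the widely used n_samples / (n_classes * class_count) heuristic; it is a generic remedy, not a method from the study:

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count),
    so rare classes contribute more to a weighted loss."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}

# E.g. a 90/10 split between two mushroom classes.
weights = inverse_frequency_weights(["edible"] * 90 + ["poisonous"] * 10)
```

Most deep learning frameworks accept such per-class weights directly in their cross-entropy losses.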
Text generation from charts is a task that involves automatically generating natural language text descriptions of data presented in chart form. This is a useful capability for tasks such as summarizing data for presentation or providing alternative representations of data for accessibility. In this work, we propose a hybrid deep network approach for text generation from table images in an academic format. The input to the model is a table image, which is first processed using Tesseract OCR (optical character recognition) to extract the data. The data are then passed through a Transformer (i.e., T5, K2T) model to generate the final text output. We evaluate the performance of our model on a dataset of academic papers. Results show that our network is able to generate high-quality text descriptions of charts. Specifically, the average BLEU scores are 0.072355 for T5 and 0.037907 for K2T. Our results demonstrate the effectiveness of the hybrid deep network approach for text generation from table images in an academic format.
Title: Chart-to-text generation using a hybrid deep network
Authors: Nontaporn Wonglek, Siriwalai Maneesinthu, Sivakorn Srichaiyaperk, Teerapon Saengmuang, Thitirat Siriborvornratanakul
Pub Date: 2023-11-02 | DOI: 10.1007/s43674-023-00066-y | Advances in computational intelligence, 3(5)
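Before a Transformer such as T5 can consume OCR output, the extracted table cells must be linearized into a flat string. The serialization format below is a hedged sketch of that preprocessing step (the paper does not specify its exact format, and the "summarize:" prefix is simply T5's conventional task prefix):

```python
def linearize_table(header, rows):
    """Flatten a table into 'col: value' pairs, one '|'-separated
    group per row, as a prompt for a text-generation model."""
    groups = []
    for row in rows:
        cells = ", ".join(f"{h}: {v}" for h, v in zip(header, row))
        groups.append(cells)
    return " | ".join(groups)

prompt = "summarize: " + linearize_table(
    ["model", "BLEU"], [["T5", "0.072"], ["K2T", "0.038"]]
)
```

The resulting `prompt` string would then be tokenized and passed to the seq2seq model to generate the chart description.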