Pub Date: 2024-01-27 DOI: 10.5121/csit.2024.140213
Hemendra Vyas
Data is growing enormously across all industries, and banking and financial institutions are no exception. Financial organizations are increasingly interested in effectively managing and using day-to-day data to make business decisions and to comply with new and existing regulations. General regulatory requirements call for data retention of up to 7 years, which makes the overall data management process challenging. To overcome this challenge, banks and financial institutions rely on regular data backups of individual applications. With new regulations such as the Fundamental Review of the Trading Book (FRTB) being implemented in 2023-24, which affect multiple areas of a bank, there is an immediate need for a centralized database to handle big data. This paper proposes a big data platform for a typical investment bank that can unify the data needs of Trading, Market Risk, Credit Risk, Counterparty Risk, Enterprise Risk Management and Model Risk Management and help with regulatory compliance.
Title: Data Management for Trading, Risk and Regulatory Compliance in Investment Banking
Journal: AI, Machine Learning and Applications
Pub Date: 2024-01-27 DOI: 10.5121/csit.2024.140206
Ram Sivaraman, Joe Xiao
An electrocardiogram (ECG) is a common method for diagnosing heart disease, but ECG alone is not sufficient to detect heart abnormalities early. Heart sound monitoring, or phonocardiogram (PCG), is a non-invasive assessment that can be performed during routine exams. PCG can provide valuable details for both heart disorder diagnosis and perioperative cardiac monitoring. Further, heart murmurs are abnormal signals generated by turbulent blood flow in the heart and are closely associated with specific heart diseases. This paper presents a new machine learning-based evaluation of heart sounds that detects murmurs with high accuracy. A random forest classifier is built using the statistical moments of the coefficients extracted from the heart sounds. The classifier can predict the location of the heart sounds with over 90% accuracy. It has a murmur detection accuracy of over 70% on the test dataset and over 98% on the full dataset.
Title: A Novel Machine Learning-Based Heart Murmur Detection and Classification using Sound Feature Analysis
Journal: AI, Machine Learning and Applications
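The feature step mentioned in this abstract, statistical moments of coefficients extracted from heart sounds, might look roughly like the sketch below. The abstract does not say which coefficients are used or which moments are kept, so the four moments chosen here and the names `statistical_moments`, `feature_vector` and `coeffs` are assumptions for illustration only. The resulting vector is the kind of input a random forest classifier could be trained on.

```python
import math


def statistical_moments(values):
    """Return [mean, variance, skewness, excess kurtosis] of a sequence."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    variance = sum(c ** 2 for c in centered) / n  # population variance
    std = math.sqrt(variance)
    if std == 0.0:
        # A constant track has no spread; higher moments are set to zero.
        return [mean, 0.0, 0.0, 0.0]
    skewness = sum(c ** 3 for c in centered) / (n * std ** 3)
    kurtosis = sum(c ** 4 for c in centered) / (n * std ** 4) - 3.0
    return [mean, variance, skewness, kurtosis]


def feature_vector(coefficient_rows):
    """Concatenate the moments of every coefficient track into one flat vector."""
    features = []
    for row in coefficient_rows:
        features.extend(statistical_moments(row))
    return features


# Hypothetical coefficients: 2 tracks, 5 frames each (not data from the paper).
coeffs = [[1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 2.0, 2.0, 2.0, 2.0]]
print(feature_vector(coeffs))
```

Summarizing each track by a few moments gives every recording a fixed-length vector regardless of its duration, which is what a tree-based classifier needs.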
Pub Date: 2024-01-27 DOI: 10.5121/csit.2024.140204
Patrick H. Gaughan, En Cheng, Taylor C. Burgess, Aine C. Bolton
Over the centuries, the U.S. practice of law has evolved into a complex and amorphous profession. To facilitate improved analysis and understanding, this exploratory study seeks to partition law practice areas into meaningful subgroups. The study applies Latent Dirichlet Allocation (“LDA”) as a soft clustering method to 437,210 individual profiles of U.S. lawyers in private practice in 2000. The profiles came from a nationally recognized directory. The resulting subgroupings contain terms consistent with the hypothesized relationships. The results also suggest the possibility of systematically binning individual practice areas into discrete practice area distributions. As such, this study contributes to the existing literature in at least three ways: 1) it provides support for the existence of the hypothesized law practice relationships; 2) it provides an empirical basis for developing an improved measurement of the U.S. practice of law; and 3) it suggests additional research to advance the field.
Title: Using Latent Dirichlet Allocation to Explore the Dimensionality of the U.S. Practice of Law
Journal: AI, Machine Learning and Applications
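To make "LDA as a soft clustering method" concrete, here is a minimal collapsed Gibbs sampler in plain Python. It is a teaching sketch, not the study's pipeline: the toy profiles, the hyperparameters `alpha` and `beta`, the iteration count and the function name `lda_gibbs` are all assumptions. The point is the output shape: each profile receives a probability distribution over topics rather than a single label.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return per-document topic mixtures."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []
    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                z = assignments[d][i]
                # Remove the token, then resample its topic from the conditional.
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1
    # Smoothed topic mixture per document: the "soft" cluster membership.
    return [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]


# Hypothetical practice-area terms per lawyer profile (not data from the study).
profiles = [
    ["tax", "estate", "trusts"],
    ["patent", "trademark", "copyright"],
    ["tax", "trusts", "probate"],
    ["patent", "copyright", "licensing"],
]
for mixture in lda_gibbs(profiles, num_topics=2):
    print([round(p, 3) for p in mixture])
```

In practice a library implementation would be used for a corpus of several hundred thousand profiles; the sampler above only illustrates the mechanics.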
Pub Date: 2024-01-27 DOI: 10.5121/csit.2024.140202
Kamal Sarkar, S. Chowdhury
Multi-document summarization is the process of creating a single summary from a group of related text documents obtained from many sources. Because the documents may contain redundant information, the efficacy of a multi-document summarization system relies heavily on the sentence similarity measure used to eliminate redundant sentences from the summary. The sentence similarity measure is also crucial for graph-based multi-document summarization, where the presence of an edge between two sentences is decided by how similar they are to one another. To enhance multi-document summarization performance, this study provides a new method for defining a hybrid sentence similarity measure that combines a lexical similarity measure and a BERT-based semantic similarity measure. Tests conducted on benchmark datasets demonstrate that the proposed hybrid sentence similarity measure is effective in enhancing multi-document summarization performance.
Title: Improving Salience-Based Multi-Document Summarization Performance using a Hybrid Sentence Similarity Measure
Journal: AI, Machine Learning and Applications
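A hybrid of lexical and semantic similarity, as described in this abstract, could be sketched as follows. The abstract does not say which lexical measure is used or how the two scores are combined, so Jaccard overlap, the linear interpolation and the `weight` parameter are assumptions. The embedding vectors are taken as given; in a real system they would come from a BERT-style encoder.

```python
import math


def jaccard(a_tokens, b_tokens):
    """Lexical similarity: shared word types over all word types."""
    a, b = set(a_tokens), set(b_tokens)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cosine(u, v):
    """Semantic similarity between two precomputed sentence embeddings."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def hybrid_similarity(a_tokens, b_tokens, a_vec, b_vec, weight=0.5):
    """Interpolate lexical and semantic scores (weight is a hypothetical knob)."""
    lexical = jaccard(a_tokens, b_tokens)
    semantic = cosine(a_vec, b_vec)
    return weight * lexical + (1.0 - weight) * semantic


# Toy sentences with made-up 2-d embeddings standing in for encoder output.
s1 = "the storm closed the airport".split()
s2 = "the airport was closed by the storm".split()
print(hybrid_similarity(s1, s2, [0.9, 0.1], [0.8, 0.2]))
```

A score like this can serve both uses the abstract names: dropping a candidate sentence that is too similar to one already in the summary, and deciding whether two sentences get an edge in a sentence graph.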
Pub Date: 2024-01-27 DOI: 10.5121/csit.2024.140211
Jason S. Chu, Sindhu Ghanta
Exploring the area of multimodal sentiment analysis, this paper addresses the growing significance of this field, driven by the exponential rise in multimodal data across platforms like YouTube. Traditional sentiment analysis, primarily focused on textual data, often overlooks the complexities and nuances of human emotions conveyed through audio and visual cues. Addressing this gap, our study explores a comprehensive approach that integrates data from text, audio, and images, applying state-of-the-art machine learning and deep learning techniques tailored to each modality. Our methodology is tested on the CMU-MOSEI dataset, a multimodal collection from YouTube, offering a diverse range of human sentiments. Our research highlights the limitations of conventional text-based sentiment analysis, especially in the context of the intricate expressions of sentiment that multimodal data encapsulates. By fusing audio and visual information with textual analysis, we aim to capture a more complete spectrum of human emotions. Our experimental results demonstrate notable improvements in precision, recall and accuracy for emotion prediction, validating the efficacy of our multimodal approach over single-modality methods. This study not only contributes to the ongoing advancements in sentiment analysis but also underscores the potential of multimodal approaches in providing more accurate and nuanced interpretations of human emotions.
Title: Integrative Sentiment Analysis: Leveraging Audio, Visual, and Textual Data
Journal: AI, Machine Learning and Applications
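One simple way to combine text, audio and visual predictions is late fusion: each modality's model outputs class probabilities and the results are averaged. The abstract above does not state which fusion strategy the authors use, so this is only an illustrative sketch; the function name `late_fusion`, the weights and the probabilities are hypothetical.

```python
def late_fusion(modality_probs, weights):
    """Weighted average of per-modality class probability vectors."""
    total_weight = sum(weights[name] for name in modality_probs)
    num_classes = len(next(iter(modality_probs.values())))
    fused = [0.0] * num_classes
    for name, probs in modality_probs.items():
        for i, p in enumerate(probs):
            fused[i] += weights[name] * p / total_weight
    return fused


# Hypothetical outputs for classes [negative, positive] from three models.
predictions = {
    "text": [0.6, 0.4],
    "audio": [0.2, 0.8],
    "visual": [0.4, 0.6],
}
equal_weights = {"text": 1.0, "audio": 1.0, "visual": 1.0}
fused = late_fusion(predictions, equal_weights)
print(fused, "-> class", fused.index(max(fused)))
```

Here the text model alone leans negative, while the fused prediction leans positive because the audio and visual cues disagree with it, which is the kind of case where a single-modality method can miss the sentiment.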
At a time when social media is a valuable resource for gaining insights, the COVID-19 pandemic has released a flood of public sentiment, abundant with unstructured text data. This paper introduces CovBERT, a novel adaptation of the BERT model tuned for the analysis of COVID-19-related discourse on X (formerly Twitter). CovBERT incorporates a bespoke vocabulary curated from pandemic-centric tweets, which raises sentiment analysis accuracy from the baseline 72% to 78.64%. The paper presents a detailed comparison of CovBERT with the standard BERT model and also compares it against traditional machine learning approaches, showing its superior proficiency in decoding complex emotional undercurrents in social media data. Furthermore, the integration of a geolocation analysis pipeline adds another layer of depth, offering a panoramic view of global sentiment trends.
Title: COVBERT: Enhancing Sentiment Analysis Accuracy in COVID-19 X Data through Customized BERT
Authors: Vanshaj Gupta, Jaydeep Patel, Safa Shubbar, Kambiz Ghazinour
Pub Date: 2024-01-27 DOI: 10.5121/csit.2024.140212
Journal: AI, Machine Learning and Applications
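The abstract does not describe how the bespoke vocabulary was curated. One plausible first step, shown here purely as an assumption, is to count tokens in the tweet corpus and keep frequent ones that the base vocabulary lacks. Whitespace tokenization is a simplification (BERT itself uses WordPiece), and `candidate_vocab_additions`, `min_count` and the sample tweets are hypothetical.

```python
from collections import Counter


def candidate_vocab_additions(tweets, base_vocab, min_count=2, limit=10):
    """Return frequent corpus tokens that are missing from the base vocabulary."""
    counts = Counter(tok for tweet in tweets for tok in tweet.lower().split())
    missing = [
        (tok, c) for tok, c in counts.items()
        if c >= min_count and tok not in base_vocab
    ]
    # Most frequent first; ties broken alphabetically for a stable order.
    missing.sort(key=lambda item: (-item[1], item[0]))
    return [tok for tok, _ in missing[:limit]]


# Hypothetical mini-corpus and base vocabulary.
tweets = [
    "covid19 lockdown again",
    "lockdown extended covid19",
    "stay home",
]
base_vocab = {"again", "stay", "home", "extended"}
print(candidate_vocab_additions(tweets, base_vocab))
```

Tokens selected this way would then need embeddings of their own, which is typically why a model with an extended vocabulary is trained further on in-domain text before fine-tuning.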
The Internet of Things (IoT) has emerged as the next big technological revolution in recent years, with the potential to transform every sphere of human life. As devices, applications, and communication networks become increasingly connected and integrated, security and privacy concerns in IoT are growing at an alarming rate as well. While existing research has largely focused on centralized systems to detect security attacks, these systems do not scale well with the rapid growth of IoT devices and pose a single-point-of-failure risk. Furthermore, since data is extensively dispersed across huge networks of connected devices, decentralized computing is critical. Federated learning (FL) has recently gained popularity as a distributed machine learning approach that enables IoT edge devices to collaboratively train models in a decentralized manner while ensuring that the data on a user’s device stays private and that neither its contents nor its details ever leave that device. In this paper, we propose a federated learning-based intrusion detection system using an LSTM autoencoder. The proposed technique allows IoT devices to train a global model without revealing their private data, enabling the training model to grow in size while protecting each participant’s local data. We conduct extensive experiments using the BoT-IoT data set and demonstrate that our solution can not only effectively improve IoT security against unknown attacks but also ensure users’ data privacy.
Title: Building a Robust Federated Learning based Intrusion Detection System in Internet of Things
Authors: Afrooz Rahmati, Afra Mashhadi, Geethapriya Thamilarasu
Pub Date: 2024-01-27 DOI: 10.5121/csit.2024.140201
Journal: AI, Machine Learning and Applications
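The two mechanisms this abstract relies on can be sketched in a few lines: a server that merges client model parameters without seeing client data, and an autoencoder-style detector that flags traffic it reconstructs poorly. The abstract does not name the aggregation rule, so sample-size-weighted averaging (the common FedAvg rule) is an assumption, as are the flat parameter lists, the threshold and the function names.

```python
def federated_average(client_weights, client_sizes):
    """Merge client parameter vectors, weighting each by its local sample count."""
    total = sum(client_sizes)
    num_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(num_params)
    ]


def is_anomalous(original, reconstructed, threshold):
    """Flag a sample whose mean squared reconstruction error exceeds the threshold."""
    mse = sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)
    return mse > threshold


# Two hypothetical devices share only parameters, never their traffic records.
global_params = federated_average([[1.0, 2.0], [3.0, 4.0]], client_sizes=[1, 3])
print(global_params)

# A poorly reconstructed sample is treated as a possible attack.
print(is_anomalous([1.0, 1.0], [0.0, 0.0], threshold=0.5))
```

Because the detector is trained to reconstruct normal traffic, it needs no labeled examples of a given attack, which is how such a design can respond to attacks it has not seen before.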
Pub Date: 2024-01-27 DOI: 10.5121/csit.2024.140207
Théo Zangato, A. Osmani, Pegah Alizadeh
Amidst increasing energy demands and growing environmental concerns, the promotion of sustainable and energy-efficient practices has become imperative. This paper introduces a reinforcement learning (RL)-based technique for optimizing energy consumption and its associated costs, with a focus on energy management systems. It presents a three-step approach that combines RL with prior knowledge to manage the charging cycles of energy storage units within buildings efficiently: clustering building load curves to discern typical energy consumption patterns, embedding domain knowledge into the learning algorithm to refine the agent’s action space, and predicting future observations to make real-time decisions. We showcase the effectiveness of our method using real-world data. It enables controlled exploration and efficient training of Energy Management System (EMS) agents. Compared to the benchmark, our model reduces energy costs by up to 15%, cuts consumption during peak periods, and adapts across various building consumption profiles.
Title: Prior-Information Enhanced Reinforcement Learning for Energy Management Systems
Journal: AI, Machine Learning and Applications
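"Embedding domain knowledge to refine the agent's action space" is often implemented as action masking: rules remove actions that are impossible or clearly wasteful before the agent chooses. The rules below (no charging at peak hours or when full, no discharging when empty) are hypothetical examples of such knowledge, not the constraints used in the paper, and the Q-values are made up.

```python
ACTIONS = ("charge", "idle", "discharge")


def allowed_actions(state_of_charge, is_peak_hour):
    """Filter the battery actions with simple, hypothetical domain rules."""
    allowed = []
    for action in ACTIONS:
        if action == "charge" and (state_of_charge >= 1.0 or is_peak_hour):
            continue  # battery full, or grid energy is expensive right now
        if action == "discharge" and state_of_charge <= 0.0:
            continue  # nothing left to discharge
        allowed.append(action)
    return allowed


def masked_greedy(q_values, state_of_charge, is_peak_hour):
    """Pick the highest-valued action among those the rules allow."""
    candidates = allowed_actions(state_of_charge, is_peak_hour)
    return max(candidates, key=lambda a: q_values[a])


# Even if the learned value of charging is highest, the mask blocks it at peak.
q = {"charge": 5.0, "idle": 1.0, "discharge": 2.0}
print(masked_greedy(q, state_of_charge=0.5, is_peak_hour=True))
```

Masking of this kind keeps exploration inside a sensible region of the action space, which matches the "controlled exploration" the abstract mentions.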
Pub Date: 2024-01-27 DOI: 10.5121/csit.2024.140210
Suliman Alnutefy, Ali Alsuwayh
This research focuses on unsupervised anomaly detection using the "ambient_temperature_system_failure.csv" dataset from the Numenta Anomaly Benchmark (NAB). The dataset contains time-series temperature readings from an industrial machine's sensor. The aim is to detect anomalies indicating system failures or aberrant behavior without labeled data. Various algorithms, such as K-means, Gaussian/Elliptic Envelopes, Markov Chain, Isolation Forest, One-Class SVM, and RNNs, are applied to analyze the temperature data. These algorithms are chosen for their ability to identify significant deviations in unlabeled datasets. The study explores how these techniques enhance the understanding of anomalies in time-series data, which is relevant in manufacturing, healthcare, and finance. The novelty of this research lies in employing unsupervised learning techniques on a real-world dataset and understanding their adaptability in anomaly detection. The results are expected to contribute valuable insights to the field, showcasing the practicality and effectiveness of these algorithms across various scenarios.
Title: Unsupervised Anomaly Detection
Journal: AI, Machine Learning and Applications
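Of the methods listed in this abstract, the Gaussian envelope is the simplest to show. In one dimension it reduces to fitting a mean and standard deviation and flagging readings that fall too many standard deviations away. The threshold `k=3.0` and the synthetic readings are assumptions; the snippet does not load the NAB file.

```python
import statistics


def gaussian_envelope_anomalies(readings, k=3.0):
    """Return indices of readings more than k standard deviations from the mean."""
    mean = statistics.fmean(readings)
    std = statistics.pstdev(readings)
    if std == 0.0:
        return []  # constant signal: nothing stands out
    return [i for i, x in enumerate(readings) if abs(x - mean) > k * std]


# Synthetic temperatures: a steady sensor with one spike at the end.
readings = [70.0] * 20 + [100.0]
print(gaussian_envelope_anomalies(readings))
```

No labels are involved: the envelope is fit on the same unlabeled series it scores. One known weakness is that large outliers inflate the estimated spread, which is part of why robust variants and methods such as Isolation Forest are worth comparing.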
Pub Date: 2024-01-27 DOI: 10.5121/csit.2024.140214
Fuping Ren, Jian Chen, Defu Zhang
Understanding complex policy documents can be challenging, highlighting the need for intelligent interpretation of Chinese policies. To enhance Chinese text summarization, this study used the mT5 model as the core framework and for the initial weights. Additionally, it reduced model size through parameter clipping, employed the Gap Sentence Generation (GSG) method as an unsupervised technique, and enhanced the Chinese tokenizer. After training on a carefully processed 30GB Chinese training corpus, the study produced the enhanced mT5-GSG model. When fine-tuning on Chinese policy texts, it adopted the "Dropout Twice" approach and merged the probability distributions of the two dropout passes using the Wasserstein distance. Experimental results indicate that the proposed model achieved ROUGE-1, ROUGE-2, and ROUGE-L scores of 56.13%, 45.76%, and 56.41%, respectively, on the Chinese policy text summarization dataset.
Title: An Improved mT5 Model for Chinese Text Summary Generation
Journal: AI, Machine Learning and Applications
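For readers unfamiliar with the ROUGE-1, ROUGE-2 and ROUGE-L scores reported in this abstract, the sketch below computes F1 variants from token lists. The abstract does not say whether the reported scores are recall, precision or F1, nor how the Chinese text is tokenized, so the F1 choice and the English toy sentences are assumptions made for readability.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, candidate_total, reference_total):
    """Harmonic mean of precision and recall for a given overlap count."""
    if overlap == 0:
        return 0.0
    precision = overlap / candidate_total
    recall = overlap / reference_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    """ROUGE-N F1: clipped n-gram overlap between candidate and reference."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L F1 based on the longest common subsequence."""
    return f1(lcs_length(candidate, reference), len(candidate), len(reference))


candidate = "the policy supports small firms".split()
reference = "the new policy supports firms".split()
print(rouge_n(candidate, reference, 1))
print(rouge_n(candidate, reference, 2))
print(rouge_l(candidate, reference))
```

For Chinese text the token lists would come from character splitting or a word segmenter, and that choice affects the scores, so results are only comparable when the tokenization matches.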