{"title":"Contextual Fine-Tuning of Language Models with Classifier-Driven Content Moderation for Text Generation.","authors":"Matan Punnaivanam, Palani Velvizhy","doi":"10.3390/e26121114","DOIUrl":null,"url":null,"abstract":"<p><p>In today's digital age, ensuring the appropriateness of content for children is crucial for their cognitive and emotional development. The rise of automated text generation technologies, such as Large Language Models like LLaMA, Mistral, and Zephyr, has created a pressing need for effective tools to filter and classify suitable content. However, the existing methods often fail to effectively address the intricate details and unique characteristics of children's literature. This study aims to bridge this gap by developing a robust framework that utilizes fine-tuned language models, classification techniques, and contextual story generation to generate and classify children's stories based on their suitability. Employing a combination of fine-tuning techniques on models such as LLaMA, Mistral, and Zephyr, alongside a BERT-based classifier, we evaluated the generated stories against established metrics like ROUGE, METEOR, and BERT Scores. The fine-tuned Mistral-7B model achieved a ROUGE-1 score of 0.4785, significantly higher than the base model's 0.3185, while Zephyr-7B-Beta achieved a METEOR score of 0.4154 compared to its base counterpart's score of 0.3602. The results indicated that the fine-tuned models outperformed base models, generating content more aligned with human standards. Moreover, the BERT Classifier exhibited high precision (0.95) and recall (0.97) for identifying unsuitable content, further enhancing the reliability of content classification. These findings highlight the potential of advanced language models in generating age-appropriate stories and enhancing content moderation strategies. This research has broader implications for educational technology, content curation, and parental control systems, offering a scalable approach to ensuring children's exposure to safe and enriching narratives.</p>","PeriodicalId":11694,"journal":{"name":"Entropy","volume":"26 12","pages":""},"PeriodicalIF":2.1000,"publicationDate":"2024-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11675295/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Entropy","FirstCategoryId":"101","ListUrlMain":"https://doi.org/10.3390/e26121114","RegionNum":3,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"PHYSICS, MULTIDISCIPLINARY","Score":null,"Total":0}
引用次数: 0
Abstract
In today's digital age, ensuring the appropriateness of content for children is crucial for their cognitive and emotional development. The rise of automated text generation technologies, such as Large Language Models like LLaMA, Mistral, and Zephyr, has created a pressing need for effective tools to filter and classify suitable content. However, the existing methods often fail to effectively address the intricate details and unique characteristics of children's literature. This study aims to bridge this gap by developing a robust framework that utilizes fine-tuned language models, classification techniques, and contextual story generation to generate and classify children's stories based on their suitability. Employing a combination of fine-tuning techniques on models such as LLaMA, Mistral, and Zephyr, alongside a BERT-based classifier, we evaluated the generated stories against established metrics like ROUGE, METEOR, and BERT Scores. The fine-tuned Mistral-7B model achieved a ROUGE-1 score of 0.4785, significantly higher than the base model's 0.3185, while Zephyr-7B-Beta achieved a METEOR score of 0.4154 compared to its base counterpart's score of 0.3602. The results indicated that the fine-tuned models outperformed base models, generating content more aligned with human standards. Moreover, the BERT Classifier exhibited high precision (0.95) and recall (0.97) for identifying unsuitable content, further enhancing the reliability of content classification. These findings highlight the potential of advanced language models in generating age-appropriate stories and enhancing content moderation strategies. This research has broader implications for educational technology, content curation, and parental control systems, offering a scalable approach to ensuring children's exposure to safe and enriching narratives.
期刊介绍:
Entropy (ISSN 1099-4300), an international and interdisciplinary journal of entropy and information studies, publishes reviews, regular research papers and short notes. Our aim is to encourage scientists to publish as much as possible their theoretical and experimental details. There is no restriction on the length of the papers. If there are computation and the experiment, the details must be provided so that the results can be reproduced.