PCTS: Partition Based Clustering for Text Summarization
Authors: Subhransu Dash, Tanuj Mohanty, Sri Rijul Das, Ankit Mohanty, Rasmita Rautray
Venue: 2023 International Conference in Advances in Power, Signal, and Information Technology (APSIT)
Published: 2023-06-09
DOI: 10.1109/APSIT58554.2023.10201655 (https://doi.org/10.1109/APSIT58554.2023.10201655)
The exponential growth of digital data means that an unprecedented amount of information is generated every day. Keeping up with this volume has become increasingly difficult, and manual text summarization is tedious and time-consuming. As a result, text summarization has grown in significance as a field of study in natural language processing. This study presents a text summarization method that identifies a text's key sentences using partition-based clustering and similarity metrics. The sentence similarity score is computed using Euclidean distance (Euc), cosine similarity (Cos), and Jaccard similarity (Jac). The proposed model uses the possible combinations of clustering and similarity algorithms and is validated on the Document Understanding Conference (DUC) dataset. The combination of K-means clustering with cosine similarity shows significantly better results than the other summarizers. Overall, the paper provides an efficient and effective way to generate text summaries that capture the essential information in a given text.
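The abstract names the ingredients (partition-based clustering plus Euclidean, cosine, and Jaccard measures) but not the implementation details, so the following is only a minimal sketch of how such a pipeline could look, not the authors' method. It assumes plain term-frequency vectors, a deterministic choice of initial centroids, cosine-based cluster assignment, and selection of the sentence closest to each centroid; all function names (`tokenize`, `kmeans_summary`, and so on) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two term-weight mappings."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def euclidean_distance(a, b):
    """Euclidean distance between two term-weight mappings."""
    terms = set(a) | set(b)
    return math.sqrt(sum((a.get(t, 0) - b.get(t, 0)) ** 2 for t in terms))


def jaccard_similarity(a, b):
    """Jaccard similarity between the term sets of two mappings."""
    union = set(a) | set(b)
    return len(set(a) & set(b)) / len(union) if union else 0.0


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of term-frequency Counters."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: c / len(vectors) for t, c in total.items()}


def kmeans_summary(sentences, k, iterations=20):
    """Cluster sentences into k groups and return one sentence per group."""
    if not 1 <= k <= len(sentences):
        raise ValueError("k must be between 1 and the number of sentences")
    vectors = [Counter(tokenize(s)) for s in sentences]
    # Assumption: evenly spaced sentences serve as initial centroids,
    # which keeps the sketch deterministic.
    step = len(sentences) / k
    centroids = [dict(vectors[int(i * step)]) for i in range(k)]
    assignment = [0] * len(sentences)
    for round_no in range(iterations):
        # Assign each sentence to the centroid it is most similar to.
        new_assignment = [
            max(range(k), key=lambda c: cosine_similarity(v, centroids[c]))
            for v in vectors
        ]
        if round_no > 0 and new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        # Recompute each centroid as the mean of its member vectors.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = mean_vector(members)
    # Pick the member closest to each centroid; keep document order.
    chosen = set()
    for c in range(k):
        members = [i for i, a in enumerate(assignment) if a == c]
        if members:
            chosen.add(max(
                members,
                key=lambda i: cosine_similarity(vectors[i], centroids[c]),
            ))
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    docs = [
        "Cats chase mice in the barn.",
        "The cats chase small mice.",
        "Stocks fell sharply on Monday.",
        "Stocks fell again on Tuesday.",
    ]
    for line in kmeans_summary(docs, k=2):
        print(line)
```

Swapping `cosine_similarity` for `jaccard_similarity` (or for the negated `euclidean_distance`) in the assignment and selection steps gives the other clustering and similarity combinations the abstract mentions. A real system would likely add stop-word removal and TF-IDF weighting, neither of which is specified in the abstract.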