International journal of information technology : an official journal of Bharati Vidyapeeth's Institute of Computer Applications and Management - Latest Articles

Data analytics and knowledge management approach for COVID-19 prediction and control.

International journal of information technology : an official journal of Bharati Vidyapeeth's Institute of Computer Applications and Management

Pub Date : 2023-01-01 Epub Date: 2022-06-11 DOI: 10.1007/s41870-022-00967-0

Iqbal Hasan, Prince Dhawan, S A M Rizvi, Sanjay Dhir

The Coronavirus Disease (COVID-19), caused by SARS-CoV-2, continues to be a global threat. A major concern for scientists and researchers worldwide is to develop innovative digital solutions for predicting and controlling infection, and to discover drugs to cure it. In this paper we develop a strategic technical solution for the surveillance and control of COVID-19 in the Delhi-National Capital Region (NCR). This work aims to elucidate the Delhi COVID-19 Data Management Framework, the backend mechanism of the integrated Command and Control Center (iCCC), with plugged-in modules for various administrative, medical and field operations. Using time-series data extracted from the iCCC repository, the spread of COVID-19 in Delhi was forecast with the Auto-Regressive Integrated Moving Average (ARIMA) model, chosen because it can effectively predict logistics requirements, active cases, positive patients, and the death rate. The intelligence generated through this research has enabled the Government of National Capital Territory of Delhi to formulate and implement COVID-19 policies in real time. The outcome of this work has led to a drastic reduction in COVID-19 positive cases and deaths in Delhi-NCR.
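The abstract does not give the ARIMA order or the fitting procedure. The block below is a minimal sketch that assumes an ARIMA(1,1,0) model without intercept, estimated by conditional least squares. It shows how differencing, an autoregressive step, and integration combine into a forecast. The function names and the toy series are hypothetical. In practice a library implementation with proper order selection, such as the `ARIMA` class in statsmodels, would normally be used.

```python
from typing import List


def fit_arima_110(series: List[float]) -> float:
    """Estimate phi for ARIMA(1,1,0) without intercept.

    The series is differenced once (d = 1), then phi is fitted by
    conditional least squares on d_t = phi * d_{t-1} + e_t.
    """
    diffs = [b - a for a, b in zip(series, series[1:])]
    num = sum(cur * prev for prev, cur in zip(diffs, diffs[1:]))
    den = sum(prev * prev for prev in diffs[:-1])
    if den == 0:
        return 0.0  # no variation in the differences: fall back to a random walk
    return num / den


def forecast_arima_110(series: List[float], phi: float, steps: int) -> List[float]:
    """Iterate the AR(1) recursion on the differences, then integrate
    (add each forecast difference to the previous level)."""
    level = series[-1]
    diff = series[-1] - series[-2]
    forecasts = []
    for _ in range(steps):
        diff = phi * diff
        level += diff
        forecasts.append(level)
    return forecasts


# Hypothetical cumulative case counts, purely illustrative (not iCCC data).
cases = [0.0, 8.0, 12.0, 14.0, 15.0]
phi = fit_arima_110(cases)
print(phi)                                # 0.5
print(forecast_arima_110(cases, phi, 2))  # [15.5, 15.75]
```

Forecasting differences and then adding them back onto the last observed level is what the "I" (integrated) in ARIMA refers to. The moving-average part is zero in this order.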

Vol. 15, No. 2, pp. 937-954. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9188422/pdf/

Citations: 0

A novel centroid based sentence classification approach for extractive summarization of COVID-19 news reports.

International journal of information technology : an official journal of Bharati Vidyapeeth's Institute of Computer Applications and Management

Pub Date : 2023-01-01 Epub Date: 2023-03-24 DOI: 10.1007/s41870-023-01221-x

Sumanta Banerjee, Shyamapada Mukherjee, Sivaji Bandyopadhyay

A COVID-19 news report covers subtopics such as infections, deaths, the economy, and jobs. The proposed method generates a news summary based on the subtopics that interest a reader. It extracts a centroid that captures the lexical pattern of sentences on those subtopics from the words frequently used in them. The centroid is then used as a query in the vector space model (VSM) to classify and extract sentences, producing a query-focused summarization (QFS) of the documents. Three approaches, TF-IDF, word-vector averaging, and an auto-encoder, are evaluated for generating the sentence embeddings used in the VSM. These embeddings are ranked by their similarity to the query embedding. A novel approach is introduced that uses a supervised technique to find the value of the similarity parameter for classifying the sentences. Finally, the method's performance is assessed in two ways: the first assessment considers all sentences of the dataset together, and the second considers each document's group of sentences separately using fivefold cross-validation. With the three sentence encoding approaches, the proposed method achieves mean F1 scores ranging from 0.60 to 0.63 on the test dataset.
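Centroid-as-query extraction with TF-IDF, the first of the three encodings named above, can be sketched roughly as follows. The assumptions are: a simple alphabetic tokenizer, a centroid built from the `top_k` most frequent words of some seed sentences on the chosen subtopic, and a fixed `threshold` standing in for the similarity parameter that the authors learn with a supervised technique. All function names are hypothetical. This is not the paper's implementation.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Lowercase and keep alphabetic runs only; no stemming or stop-word removal.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(sentences: List[str]) -> List[Dict[str, float]]:
    docs = [Counter(tokenize(s)) for s in sentences]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)
    # Smoothed IDF: terms that occur in every sentence still get a small weight.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in df}
    return [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]


def centroid_query(seed_sentences: List[str], top_k: int) -> Dict[str, float]:
    # The centroid keeps the top_k most frequent words of the subtopic sentences.
    counts = Counter(w for s in seed_sentences for w in tokenize(s))
    return {w: float(c) for w, c in counts.most_common(top_k)}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(sentences: List[str], query: Dict[str, float], threshold: float) -> List[str]:
    # Keep, in original order, every sentence whose cosine similarity to the
    # query reaches the threshold (the "similarity parameter").
    vectors = tfidf_vectors(sentences)
    return [s for s, v in zip(sentences, vectors) if cosine(v, query) >= threshold]


sentences = [
    "Deaths rose as infections spread",
    "Markets fell and jobs were lost",
    "New infections and deaths reported in Delhi",
]
query = centroid_query(["infections and deaths", "deaths from infections"], top_k=2)
# Prints the first and third sentences; the economy sentence has similarity 0.
print(summarize(sentences, query, threshold=0.3))
```

Swapping `tfidf_vectors` for averaged word vectors or auto-encoder outputs would leave the ranking and thresholding step unchanged. That is why the VSM framing lets the three encodings be compared directly.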

Vol. 15, No. 4, pp. 1789-1801. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10036244/pdf/

Citations: 0

Class balancing framework for credit card fraud detection based on clustering and similarity-based selection (SBS).

International journal of information technology : an official journal of Bharati Vidyapeeth's Institute of Computer Applications and Management

Pub Date : 2023-01-01 DOI: 10.1007/s41870-022-00987-w

Hadeel Ahmad, Bassam Kasasbeh, Balqees Aldabaybah, Enas Rawashdeh

Credit card fraud is a growing problem, and it escalated during COVID-19 because authorities in many countries required people to use cashless transactions. Billions of euros are lost to fraudulent credit card transactions every year, so fraud detection systems are essential for financial institutions. Because the classes are not equally represented in credit card datasets, machine learning models are trained mostly on the majority class, which leads to inaccurate fraud predictions. This research therefore focuses on processing imbalanced data with an under-sampling technique to obtain more accurate results across different machine learning algorithms. We propose a framework that clusters the dataset using fuzzy C-means and selects similar fraud and normal instances that have the same features, which guarantees integrity among the data features.
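The abstract does not spell out the exact selection rule, so the block below is only a sketch of the general idea. It runs fuzzy C-means, hard-assigns each row to its highest-membership cluster, and then, for every fraud row, keeps the nearest not-yet-selected normal row from the same cluster. Several choices are assumptions: Euclidean distance as the similarity measure, one normal row per fraud row, farthest-first initialization, and the function names. This should not be read as the authors' SBS procedure.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def farthest_first(points: List[Point], c: int) -> List[List[float]]:
    # Deterministic initialization: start from the first point, then keep adding
    # the point farthest from every center chosen so far.
    centers = [list(points[0])]
    while len(centers) < c:
        nxt = max(points, key=lambda p: min(math.dist(p, q) for q in centers))
        centers.append(list(nxt))
    return centers


def fuzzy_c_means(points: List[Point], c: int, m: float = 2.0,
                  iters: int = 100) -> Tuple[List[List[float]], List[List[float]]]:
    centers = farthest_first(points, c)
    n, dim = len(points), len(points[0])
    u = [[0.0] * c for _ in range(n)]
    for _ in range(iters):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        for i, p in enumerate(points):
            ds = [max(math.dist(p, ctr), 1e-12) for ctr in centers]
            for j in range(c):
                u[i][j] = 1.0 / sum((ds[j] / ds[k]) ** (2.0 / (m - 1.0)) for k in range(c))
        # Center update: mean of all points weighted by membership ** m.
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers[j] = [sum(w[i] * points[i][d] for i in range(n)) / total
                          for d in range(dim)]
    return centers, u


def sbs_undersample(X: List[Point], y: List[int], c: int = 2) -> List[int]:
    """Return indices of a balanced subset: every fraud row (label 1) plus, for
    each fraud row, the most similar not-yet-selected normal row (label 0)
    in the same fuzzy cluster."""
    _, u = fuzzy_c_means(X, c)
    cluster = [max(range(c), key=lambda j: row[j]) for row in u]  # hard assignment
    fraud = [i for i, label in enumerate(y) if label == 1]
    chosen = set()
    for i in fraud:
        candidates = [k for k, label in enumerate(y)
                      if label == 0 and cluster[k] == cluster[i] and k not in chosen]
        if candidates:
            chosen.add(min(candidates, key=lambda k: math.dist(X[i], X[k])))
    return sorted(fraud + list(chosen))


X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (0.2, 0.2),
     (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (10.2, 10.2)]
y = [0, 0, 0, 1, 0, 0, 0, 1]
print(sbs_undersample(X, y))  # [0, 3, 4, 7]
```

Restricting the search to the same cluster means each fraud row is paired only with normal rows from the same region of feature space. This is one way to read the abstract's "similar fraud and normal instances that have the same features".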

Vol. 15, No. 1, pp. 325-333. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9209320/pdf/

Citations: 15

Improved local descriptor (ILD): a novel fusion method in face recognition.

International journal of information technology : an official journal of Bharati Vidyapeeth's Institute of Computer Applications and Management

Pub Date : 2023-01-01 Epub Date: 2023-04-16 DOI: 10.1007/s41870-023-01245-3

Shekhar Karanwal

The literature suggests that fusing multiple features substantially improves recognition rates compared with a single descriptor. This motivates researchers to develop more and more fused descriptors by combining multiple features. Inspired by this work, the present study introduces a novel local descriptor, the Improved Local Descriptor (ILD), which joins the features of four local descriptors: LBP, ELBP, MBP and LPQ. LBP captures local details. ELBP captures robust features in the horizontal and vertical directions (elliptically) using 3 × 5 and 5 × 3 patches. MBP minimizes the effect of image noise by comparing all the pixels with their median, and LPQ quantizes the frequency components to obtain its features. The essential merits of these four descriptors are encapsulated in one framework in the form of a histogram feature. PCA is then used for compression, and SVMs and NN are used for classification. Results on ORL, GT and Faces94 confirm the strength of ILD, which outperforms the separately implemented descriptors and various benchmark methods.
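Histogram-level fusion can be sketched for two of the four descriptors, the basic 3x3 LBP and a median binary pattern (MBP), by concatenating their normalized code histograms. ELBP, LPQ, PCA and the classifiers are omitted. The neighbor ordering, the `>=` comparisons, and the use of one whole-image histogram instead of regional histograms are all assumptions of this sketch, and the function names are hypothetical.

```python
from statistics import median
from typing import List

Image = List[List[int]]

# Neighbor offsets of a 3x3 patch, clockwise from the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(img: Image) -> List[int]:
    """Basic 3x3 LBP: each neighbor >= the center pixel sets one bit (256 codes)."""
    codes = []
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            code = 0
            for bit, (dr, dc) in enumerate(OFFSETS):
                if img[r + dr][c + dc] >= img[r][c]:
                    code |= 1 << bit
            codes.append(code)
    return codes


def mbp_codes(img: Image) -> List[int]:
    """Median binary pattern: all 9 patch pixels, center included, are
    compared with the patch median (9 bits, 512 codes)."""
    codes = []
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            patch = [img[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            med = median(patch)
            code = 0
            for bit, value in enumerate(patch):
                if value >= med:
                    code |= 1 << bit
            codes.append(code)
    return codes


def histogram(codes: List[int], bins: int) -> List[float]:
    # Normalized histogram so that images of different sizes are comparable.
    counts = [0] * bins
    for code in codes:
        counts[code] += 1
    return [v / len(codes) for v in counts]


def fused_descriptor(img: Image) -> List[float]:
    # Feature-level fusion: concatenate the per-descriptor histograms.
    return histogram(lbp_codes(img), 256) + histogram(mbp_codes(img), 512)


img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(lbp_codes(img), mbp_codes(img))  # [120] [496]
print(len(fused_descriptor(img)))      # 768
```

The fused vector grows with each added descriptor. This is consistent with the abstract applying PCA for compression before classification.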

Vol. 15, No. 4, pp. 1885-1894. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10106113/pdf/

Citations: 0

A novel GCL hybrid classification model for paddy diseases.

International journal of information technology : an official journal of Bharati Vidyapeeth's Institute of Computer Applications and Management

Pub Date : 2023-01-01 DOI: 10.1007/s41870-022-01094-6

Shweta Lamba, Anupam Baliyan, Vinay Kukreja

The demand for agricultural products increased exponentially as the global population grew. The rapid development of computer vision-based artificial intelligence and deep learning technologies has affected a wide range of industries, including disease detection and classification. This paper introduces a novel neural network-based hybrid model, GCL, which fuses a generative adversarial network (GAN) for dataset augmentation with a convolutional neural network (CNN) and a long short-term memory (LSTM) network. The GAN augments the dataset, the CNN extracts the features, and the LSTM classifies the various paddy diseases. The GCL model is investigated as a way to improve the classification model's accuracy and reliability. The dataset was compiled from secondary sources such as Mendeley, Kaggle, UCI, and GitHub and contains images of bacterial blight, leaf smut, and rice blast. Experiments on the efficacy of the GCL model show that it is suitable for disease classification, achieving 97% testing accuracy. GCL can further be used to classify more paddy diseases.

Vol. 15, No. 2, pp. 1127-1136. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9484355/pdf/

Citations: 18

Editorial.

International journal of information technology : an official journal of Bharati Vidyapeeth's Institute of Computer Applications and Management

Pub Date : 2023-01-01 DOI: 10.1007/s41870-023-01182-1

M N Hoda

Citations: 0

A novel stock counting system for detecting lot numbers using Tesseract OCR.

International journal of information technology : an official journal of Bharati Vidyapeeth's Institute of Computer Applications and Management

Pub Date : 2023-01-01 DOI: 10.1007/s41870-022-01107-4

Parkpoom Lertsawatwicha, Phumidon Phathong, Napatsorn Tantasanee, Kotchakorn Sarawutthinun, Thitirat Siriborvornratanakul

Stock counting is one of the methods warehouses use to prevent stock shortages. It can also help a company forecast how many products it needs to store and predict the goods to replenish for customers. In the medical business, which sells specialized medical equipment used to treat patients, stock counting needs particular attention, because inventory shortages must not happen. Even in normal conditions, counting stock at some hospitals is difficult for salespeople, especially at distant upcountry hospitals. During the COVID-19 situation, many restrictions had to be strictly enforced, which caused shortages of goods in many hospitals. In this paper, we show how computer vision can help this process: when a hospital officer sends images of the stock to our system, the system recognizes the quantity and lot number of the goods remaining in the hospital, so salespeople can visit hospitals less often. For text detection and text recognition in this specific use case, our prototype system achieves 84.17% accuracy.
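After OCR, the recognized text still has to be turned into per-lot quantities. The block below sketches that post-processing step only. It assumes the text lines have already been extracted, for example with Tesseract through pytesseract's `image_to_string`. The "LOT ... QTY ..." label layout and the regular expressions are hypothetical and are not taken from the paper.

```python
import re
from typing import Dict, List

# Hypothetical label layout: "LOT" followed by an alphanumeric lot number and
# "QTY" followed by an integer count, on the same OCR line.
LOT_RE = re.compile(r"\bLOT\s*[:#]?\s*([A-Z0-9-]+)", re.IGNORECASE)
QTY_RE = re.compile(r"\bQTY\s*[:#]?\s*(\d+)", re.IGNORECASE)


def count_stock(ocr_lines: List[str]) -> Dict[str, int]:
    """Sum quantities per lot number from OCR text lines; lines without both
    a lot number and a quantity are ignored."""
    totals: Dict[str, int] = {}
    for line in ocr_lines:
        lot = LOT_RE.search(line)
        qty = QTY_RE.search(line)
        if lot and qty:
            key = lot.group(1).upper()  # normalize case so "a123" and "A123" match
            totals[key] = totals.get(key, 0) + int(qty.group(1))
    return totals


lines = ["LOT: A12345 QTY: 10", "lot#a12345 qty 5", "LOT B-77 QTY:3", "Expiry 2024-01"]
print(count_stock(lines))  # {'A12345': 15, 'B-77': 3}
```

Real OCR output tends to confuse characters such as `O` and `0`. A deployed system would likely need lot-number validation against the hospital's expected inventory, which this sketch does not attempt.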

Vol. 15, No. 1, pp. 393-398. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9540281/pdf/

Citations: 4

Closed-set automatic speaker identification using multi-scale recurrent networks in non-native children.

International journal of information technology : an official journal of Bharati Vidyapeeth's Institute of Computer Applications and Management

Pub Date : 2023-01-01 Epub Date: 2023-03-18 DOI: 10.1007/s41870-023-01224-8

Kodali Radha, Mohan Bansal

Children may benefit from automatic speaker identification in a variety of applications, including child security, safety, and education. The key focus of this study is to develop a closed-set child speaker identification system for non-native speakers of English, in both text-dependent and text-independent speech tasks, in order to track how the speaker's fluency affects the system. The multi-scale wavelet scattering transform is used to compensate for problems such as the loss of high-frequency information in the most widely used feature extractor, mel-frequency cepstral coefficients. The proposed large-scale speaker identification system performs well by applying a Bi-LSTM to wavelet scattering features. When this procedure is used to identify non-native children across multiple classes, the model's performance in text-independent and text-dependent tasks, assessed with average accuracy, precision, recall, and F-measure, exceeds that of existing models.
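The abstract reports "average values" of accuracy, precision, recall, and F-measure over many speaker classes, but it does not say which averaging is used. The block below assumes macro-averaging, an unweighted mean over speakers, which is common for closed-set identification. The function name is hypothetical, and the wavelet scattering features and Bi-LSTM are not shown.

```python
from typing import Dict, Sequence


def macro_metrics(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall and F1 over all speaker labels."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for lab in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == lab and p == lab)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != lab and p == lab)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == lab and p != lab)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    k = len(labels)
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),
        "precision": sum(precisions) / k,  # each speaker counts equally
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }


y_true = ["s1", "s1", "s2", "s2", "s3", "s3"]
y_pred = ["s1", "s2", "s2", "s2", "s3", "s1"]
print({name: round(v, 3) for name, v in macro_metrics(y_true, y_pred).items()})
# {'accuracy': 0.667, 'precision': 0.722, 'recall': 0.667, 'f1': 0.656}
```

Macro-averaging keeps speakers with few test utterances from being outweighed by well-represented ones. Weighted averaging would give different numbers on imbalanced data.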

Vol. 15, No. 3, pp. 1375-1385. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10023307/pdf/

Citations: 0

Editorial.

International journal of information technology : an official journal of Bharati Vidyapeeth's Institute of Computer Applications and Management

Pub Date : 2023-01-01 DOI: 10.1007/s41870-023-01239-1

M N Hoda

Citations: 0

An integrated clustering and BERT framework for improved topic modeling.

International journal of information technology : an official journal of Bharati Vidyapeeth's Institute of Computer Applications and Management

Pub Date : 2023-01-01 Epub Date: 2023-05-06 DOI: 10.1007/s41870-023-01268-w

Lijimol George, P Sumathy

Topic modeling is a machine learning technique that is extensively used in Natural Language Processing (NLP) applications to infer topics within unstructured textual data. Latent Dirichlet Allocation (LDA) is one of the most widely used topic modeling techniques and can automatically detect topics in a large collection of text documents. However, LDA-based topic models alone do not always provide promising results. Clustering is an effective unsupervised machine learning technique that is extensively used in applications such as extracting information from unstructured textual data and topic modeling. A hybrid model of Bidirectional Encoder Representations from Transformers (BERT) and LDA for topic modeling, with clustering based on dimensionality reduction, has been studied in detail. Because clustering algorithms are computationally complex and their complexity increases with the number of features, dimensionality reduction based on PCA, t-SNE and UMAP is also performed. Finally, this study proposes a unified clustering-based framework using BERT and LDA for mining a set of meaningful topics from massive text corpora. Experiments that simulate user input on benchmark datasets demonstrate the effectiveness of this cluster-informed topic modeling framework. The results show that clustering with dimensionality reduction helps infer more coherent topics, so this unified clustering and BERT-LDA approach can be used effectively to build topic modeling applications.
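The cluster-then-describe step can be sketched as follows. Document embeddings are clustered, and each cluster is summarized by its most frequent words. The abstract does not name the clustering algorithm, so k-means with farthest-first initialization is used here as a stand-in. The 2-D vectors are toy values standing in for BERT embeddings after dimensionality reduction (PCA, t-SNE or UMAP). The LDA component is not shown, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def kmeans(points: List[Sequence[float]], k: int, iters: int = 50) -> List[int]:
    # Farthest-first initialization keeps the sketch deterministic.
    centers = [list(points[0])]
    while len(centers) < k:
        centers.append(list(max(points, key=lambda p: min(math.dist(p, c) for c in centers))))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center for every point.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_topics(docs: List[str], embeddings: List[Sequence[float]], k: int,
                   top_n: int = 2) -> Tuple[List[int], List[List[str]]]:
    """Cluster document embeddings, then describe each cluster by its most
    frequent words."""
    labels = kmeans(embeddings, k)
    topics = []
    for j in range(k):
        counts = Counter(w for d, lab in zip(docs, labels) if lab == j
                         for w in d.lower().split())
        topics.append([w for w, _ in counts.most_common(top_n)])
    return labels, topics


docs = ["vaccine dose vaccine", "vaccine trial", "stock market stock", "stock crash"]
embeddings = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.1, 5.0)]
print(cluster_topics(docs, embeddings, k=2, top_n=1))
# ([0, 0, 1, 1], [['vaccine'], ['stock']])
```

Clustering in a reduced space keeps the distance computations cheap. This matches the abstract's motivation for applying dimensionality reduction before clustering.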

Vol. 15, No. 4, pp. 2187-2195. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10163298/pdf/

Citations: 4
