
Artificial Intelligence, NLP, Data Science and Cloud Computing Technology: Latest Publications

Evaluating The Performance of Feature Extraction Techniques Using Classification Techniques
Pub Date : 2023-08-19 DOI: 10.5121/csit.2023.131402
Harshit Mittal
Dimensionality reduction techniques are widely used in machine learning to reduce the computational complexity of a model and improve its performance by identifying the most relevant features. In this research paper, we compare various dimensionality reduction techniques, including Principal Component Analysis (PCA), Independent Component Analysis (ICA), Local Linear Embedding (LLE), Local Binary Patterns (LBP), and a Simple Autoencoder, on the Olivetti dataset, a popular benchmark in the field of face recognition. We evaluate the performance of these techniques using various classification algorithms, including Support Vector Classifier (SVC), Linear Discriminant Analysis (LDA), Logistic Regression (LR), K-Nearest Neighbors (KNN), and Support Vector Machine (SVM). The goal of this research is to determine which combination of dimensionality reduction technique and classification algorithm is the most effective for the Olivetti dataset. Our research provides insights into the performance of these dimensionality reduction techniques and classification algorithms on the Olivetti dataset; the results can be useful in improving face recognition systems and other applications that deal with high-dimensional data.
Citations: 0
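The abstract above pairs dimensionality reduction with classification. As a minimal sketch of one such pairing, the Python snippet below runs PCA followed by an SVC on the Olivetti faces dataset via scikit-learn; the component count and SVC settings are illustrative assumptions, not the paper's reported configuration.

from sklearn.datasets import fetch_olivetti_faces
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

faces = fetch_olivetti_faces()  # 400 images of 40 subjects, 64x64 pixels
X_train, X_test, y_train, y_test = train_test_split(
    faces.data, faces.target, test_size=0.25, stratify=faces.target,
    random_state=0)

# Reduce the 4096 pixel features to 100 principal components, then classify.
model = make_pipeline(PCA(n_components=100, whiten=True, random_state=0),
                      SVC(kernel="rbf"))
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))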
Drift Detection in Models Applied to the Recognition of Intentions in Short Sentences Using Convolutional Neural Networks for Classification
Pub Date : 2023-08-19 DOI: 10.5121/csit.2023.131404
Jairo R. Junior, Leandro A Silva
Significant advancements have been achieved in natural language processing models for text classification with the emergence of pre-trained transformers and deep learning. Despite promising results, deploying these models in production environments still faces challenges. Classification models evolve continuously, adapting to new data and predictions. However, changes in data distribution over time can degrade performance, indicating that the model is outdated. This article analyzes the lifecycle of a natural language processing model by employing multivariate statistical methods capable of detecting model drift over time; these methods can be integrated into the training and workflow management of machine learning models. Preliminary results show that the statistical method Maximum Mean Discrepancy performs best at detecting drift in models trained on data from multiple domains, operating on high-dimensional vector spaces produced by an untrained auto-encoder. Using accuracy as the evaluation metric, the classifier model reached 93% in predicting intentions.
Citations: 0
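Maximum Mean Discrepancy, the statistic the abstract singles out, compares two samples in a kernel feature space. A minimal NumPy sketch follows; the RBF bandwidth, sample shapes, and the idea of eyeballing the statistic are assumptions (in practice a permutation test on reference data would set a drift threshold).

import numpy as np

def rbf_kernel(a, b, gamma):
    # Pairwise squared distances, then the Gaussian kernel.
    d2 = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
    return np.exp(-gamma * d2)

def mmd2(x, y, gamma=1.0):
    # Biased estimator of the squared MMD between samples x and y.
    return (rbf_kernel(x, x, gamma).mean()
            + rbf_kernel(y, y, gamma).mean()
            - 2 * rbf_kernel(x, y, gamma).mean())

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, (200, 64))   # e.g. auto-encoder embeddings at training time
production = rng.normal(0.5, 1.0, (200, 64))  # embeddings of incoming traffic
print("MMD^2:", mmd2(reference, production))  # a large value suggests drift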
Subverting Two Character Stereotypes at Once: Exploring AI's Role in Subverting Stereotypes
Pub Date : 2023-08-19 DOI: 10.5121/csit.2023.131401
Xiaohan Feng, Makoto Murakami
The aim of this paper is to explore different ways of using AI to subvert stereotypes more efficiently and effectively. It also enumerates the advantages and disadvantages of each approach, helping creators select the most appropriate method for their specific situations. AI opens up new possibilities, enabling anyone to effortlessly generate visually stunning images without the need for artistic skills. However, it also leads to the creation of more stereotypes when trained on large amounts of data. Consequently, stereotypes are becoming more prevalent and serious than ever before. Our belief is that we can use this situation in reverse: summarize stereotypes with AI and then subvert them through elemental exchange. In this study, we attempted to develop a less time-consuming method to challenge character stereotypes while embracing the concept of "exchange." We selected two character archetypes, the "tyrant" and the "mad scientist," and summarized their stereotypes by generating AI images and asking ChatGPT questions. Additionally, we conducted a survey of real historical tyrants to gain insights into their behavior and characteristics. This step helped us understand the reasons behind stereotyping in artwork depicting tyrants, and on that basis we chose which stereotypes to retain, so that the audience could better evaluate the identity of the character. Finally, the two remaining character stereotypes were exchanged and the design was completed. This paper documents the last and most time-consuming method: by examining a large number of sources and identifying which stereotypical influences were at work, we achieved a stronger subversion of stereotypes. The other methods are much less time-consuming but somewhat more random; whether one chooses by subjective experience or by the most frequent responses, the best outcome is not guaranteed. The documented method is the one that best guarantees that the audience can quickly identify the original character while moving the two characters furthest away from their original stereotypical images. In conclusion, if the designer has sufficient time, AI portrait + research or ChatGPT + research can be chosen; if not, the remaining methods take less time and the designer can try them all to get the desired result.
Citations: 0
Chunker Based Sentiment Analysis for Nepali Text
Pub Date : 2023-08-19 DOI: 10.5121/csit.2023.131406
A. Yajnik, Sabu Lama Tamang
This article presents Sentiment Analysis (SA) of Nepali sentences. A Skip-gram model is used for word-to-vector encoding. In the first experiment, the vector representation of each sentence is generated with the Skip-gram model and classified with a Multi-Layer Perceptron (MLP); an F1 score of 0.6486 is achieved for positive-negative classification, with an overall accuracy of 68%. In the second experiment, verb chunks are extracted using a Nepali parser and the same procedure is carried out on the verb chunks; an F1 score of 0.6779 is observed for positive-negative classification, with an overall accuracy of 85%. Hence, chunker-based sentiment analysis proves better than sentiment analysis over whole sentences.
Citations: 0
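As a rough illustration of the first pipeline in the abstract, the sketch below averages Skip-gram (sg=1) word vectors per sentence with gensim and feeds them to scikit-learn's MLP classifier. The toy corpus, labels, and hyperparameters are placeholders; the paper worked on Nepali sentences and verb chunks.

import numpy as np
from gensim.models import Word2Vec
from sklearn.neural_network import MLPClassifier

corpus = [["ramro", "chha"], ["naramro", "chha"]]  # tokenized sentences (placeholder)
labels = [1, 0]                                    # 1 = positive, 0 = negative

# sg=1 selects the Skip-gram architecture rather than CBOW.
w2v = Word2Vec(corpus, vector_size=50, sg=1, min_count=1, seed=0)

def sentence_vector(tokens):
    # Average the Skip-gram vectors of the tokens in one sentence or chunk.
    return np.mean([w2v.wv[t] for t in tokens], axis=0)

X = np.array([sentence_vector(s) for s in corpus])
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
clf.fit(X, labels)
print(clf.predict(X))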
Unveiling the Power of TAG Using Statistical Parsing for Natural Languages
Pub Date : 2023-08-19 DOI: 10.5121/csit.2023.131407
Pavan Kurariya, Prashant Chaudhary, Jahnavi Bodhankar, Lenali Singh, Ajai Kumar
The revolution in Artificial Intelligence (AI) began when machines could decipher enigmatic symbols concealed within messages. Subsequently, with the progress of Natural Language Processing (NLP), machines attained the capacity to understand and comprehend human language. Tree Adjoining Grammar (TAG) has become a powerful grammatical formalism for processing large-scale grammars. However, TAG mostly relies on grammars created by language experts, and due to structural ambiguity in natural languages the computational complexity of TAG parsing is very high, O(n^6). We observed that the rule-based approach has serious flaws: first, language evolves over time, and it is impossible to create a grammar extensive enough to represent every structure of a language in the real world; second, developing a practical solution takes too much time and too many language resources. These difficulties motivated us to explore an alternative approach instead of relying completely on the rule-based method. In this paper, we propose a statistical parsing algorithm for natural languages (NL) using the TAG formalism, in which the parser makes crucial use of a data-driven model for identifying syntactic dependencies of complex structures. We observed that using a probabilistic model along with limited training data can significantly improve both the quality and the performance of the TAG parser. We also demonstrate that the new parser outperforms the previous rule-based parser on a given sample corpus. Our experiments on many Indian languages provide further support for the claim that this approach may be a solution for problems that require rich structural analysis of a corpus, constructing syntactic dependencies for any natural language without depending heavily on the manual process of grammar creation. Finally, we present results of our ongoing research, in which a probability model is applied to the selection of the adjunction at any given node of the elementary trees, and state-chart representations are shared across derivations.
Citations: 0
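The closing sentence describes applying a probability model to the selection of adjunctions at elementary-tree nodes. The toy sketch below illustrates only that selection step, under a strong simplification: it estimates P(auxiliary tree | node label) from invented derivation counts and picks the most likely adjunction. Real TAG parsing involves far more machinery; nothing here reflects the paper's actual implementation.

from collections import Counter, defaultdict

# (node label, auxiliary tree) adjunction pairs from a derivation bank (invented).
observed = [("VP", "beta_adv"), ("VP", "beta_adv"), ("VP", "beta_modal"),
            ("NP", "beta_adj"), ("NP", "beta_adj"), ("NP", "beta_rel")]

counts = defaultdict(Counter)
for node, aux in observed:
    counts[node][aux] += 1

def adjunction_probs(node):
    # Relative-frequency estimate of P(auxiliary tree | node label).
    total = sum(counts[node].values())
    return {aux: c / total for aux, c in counts[node].items()}

probs = adjunction_probs("VP")
print(probs)                      # beta_adv ~ 0.67, beta_modal ~ 0.33
print(max(probs, key=probs.get))  # most likely adjunction at a VP node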
What's in a Domain? Analysis of URL Features
Pub Date : 2023-08-19 DOI: 10.5121/csit.2023.131409
John Hawkins
Many data science problems require processing log data derived from web pages, APIs, or other internet traffic sources. URLs are one of the few ubiquitous data fields that describe internet activity, hence they require effective processing for a wide variety of machine learning applications. While URLs are structurally rich, the structure can be both domain-specific and subject to change over time, making feature engineering for URLs an ongoing challenge. In this research we outline the key structural components of URLs and discuss the information available within each. We describe methods for generating features on these URL components and share an open-source implementation of these ideas. In addition, we describe a method for exploring URL feature importance that allows for comparison and analysis of the information available inside URLs. We experiment with a collection of URL classification datasets and demonstrate the utility of these tools. The package and source code are available at https://pypi.org/project/url2features.
Citations: 0
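As a minimal sketch of the kind of URL component features discussed above, the snippet below derives a few structural features with only the Python standard library. The specific feature choices are illustrative assumptions; the url2features package linked in the abstract provides a richer, ready-made set.

from urllib.parse import urlparse

def url_features(url: str) -> dict:
    parts = urlparse(url)
    host = parts.netloc.split(":")[0]                 # drop any port
    path_segments = [s for s in parts.path.split("/") if s]
    return {
        "scheme": parts.scheme,
        "domain_tokens": host.count(".") + 1,         # sub.example.com -> 3
        "tld": host.rsplit(".", 1)[-1] if "." in host else "",
        "path_depth": len(path_segments),
        "path_length": len(parts.path),
        "num_query_params": len([p for p in parts.query.split("&") if p]),
        "has_digits_in_host": any(c.isdigit() for c in host),
    }

print(url_features("https://blog.example.co.uk/2023/08/post.html?ref=rss&utm=1"))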
Umeed: VR Game Using NLP Models and Latent Semantic Analysis for Conversation Therapy for People with Speech Disorders
Pub Date : 2023-08-19 DOI: 10.5121/csit.2023.131408
Umeed VR Game
UmeedVR aims to create a conversational-therapy VR game using natural language processing for patients with speech disorders such as autism or aphasia. This study developed 5 psychological task sets and 3 environments via Maya and Unity. The topic-modeling AI, employing 25 live participants' recordings and 980+ TwineAI datasets, generated initial VR grading with a coherence score averaging 6.98 themes in 5-minute conversations across scenarios, forming a foundation for enhancements. Employing latent semantic analysis (gensim corpus, Python) and Term Frequency-Inverse Document Frequency (TF-IDF), grammatical errors and user-specific improvements were addressed. Results were visualized via audio-visual plots, highlighting conversation topics based on occurrence and interpretability. UMEED enhances cognitive and intuitive skills, elevating average topics from 6.98 to 13.56 in a 5-minute conversation with a 143.12 coherence score. LSA achieved 98.39% accuracy, topic modeling 100%. Significantly, real-time grammatical correction was integrated into the game.
Citations: 0
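The LSA step described above can be sketched as TF-IDF weighting followed by a truncated SVD. The snippet below uses scikit-learn rather than the gensim pipeline named in the abstract, and the toy utterances and component count are placeholders.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

utterances = ["i want to order food", "please order some food for me",
              "the weather is nice today"]

tfidf = TfidfVectorizer()
X = tfidf.fit_transform(utterances)   # term counts weighted by TF-IDF

lsa = TruncatedSVD(n_components=2, random_state=0)
topics = lsa.fit_transform(X)         # project into a low-rank semantic space
print(topics.round(2))                # nearby rows indicate similar topics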
Kidney CT Image Analysis Using CNN
Pub Date : 2023-08-19 DOI: 10.5121/csit.2023.131403
Harshit Mittal
Medical image analysis is a vital component of modern medical practice, and the accuracy of such analysis is critical for accurate diagnosis and treatment. Computed tomography (CT) scans are commonly used to visualize the kidneys and identify abnormalities such as cysts, tumors, and stones. Manual interpretation of CT images can be time-consuming and subject to human error, leading to inaccurate diagnosis and treatment. Deep learning models based on Convolutional Neural Networks (CNNs) have shown promise in improving the accuracy and speed of medical image analysis. In this study, we present a CNN-based model to accurately classify CT images of the kidney into four categories: Normal, Cyst, Tumor, and Stone, using the CT KIDNEY DATASET. The proposed CNN model achieved an accuracy of 99.84% on the test set, with a precision of 0.9964, a recall of 0.9986, and an F1-score of 0.9975 across all categories. The model was able to accurately classify all images in the test set, indicating its high accuracy in identifying abnormalities in CT images of the kidney. The results of this study demonstrate the potential of CNN-based deep learning models in accurately classifying CT images of the kidney, which could lead to improved diagnosis and treatment outcomes for patients. This study contributes to the growing body of literature on the use of deep learning models in medical image analysis, highlighting their potential to improve the accuracy and efficiency of medical diagnosis.
Citations: 0
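As a minimal sketch of a four-class CNN of the kind the abstract describes, the Keras model below classifies a CT slice as Normal, Cyst, Tumor, or Stone. The input size, layer widths, and training settings are illustrative assumptions, not the paper's architecture.

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(128, 128, 1)),      # grayscale CT slice (assumed size)
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(4, activation="softmax"),  # Normal, Cyst, Tumor, Stone
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit(train_images, train_labels, epochs=10, validation_split=0.1)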
Brands, Verticals and Contexts: Coherence Patterns in Consumer Attention
Pub Date : 2023-08-19 DOI: 10.5121/csit.2023.131410
John Hawkins
Consumers are expected to partially reveal their preferences and interests through the media they consume. The development of visual attention measurement with eye-tracking technologies allows us to investigate the consistency of these preferences across the creative executions of a given brand and across all brands within a given vertical. In this study we use a large-scale attention measurement dataset to analyse a collection of digital display advertising impressions across a variety of industry verticals. We evaluate the extent to which the high-attention contexts for a given brand's ads remain consistent for that brand, and the extent to which those contexts remain consistent across many brands within an industry vertical. The results illustrate that consumer attention on advertising can vary significantly across creatives for a specific brand, and across a vertical. Nevertheless, there are coherence effects across campaigns that are stronger than random and that contain actionable information at the level of industry vertical categorisation.
Citations: 0
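One way to quantify the coherence described above is to rank contexts by mean attention for each creative of a brand and measure agreement between the rankings. The pandas sketch below does this with a Spearman correlation on invented data; the column names and the choice of statistic are assumptions, not the study's method.

import pandas as pd

impressions = pd.DataFrame({
    "creative": ["A", "A", "A", "B", "B", "B"],
    "context":  ["news", "sport", "food", "news", "sport", "food"],
    "attention_seconds": [1.9, 0.7, 1.2, 1.6, 0.9, 1.1],
})

# Mean attention per context, one column per creative.
table = impressions.pivot_table(index="context", columns="creative",
                                values="attention_seconds", aggfunc="mean")

# Spearman correlation between creatives: +1 means identical context ranking.
print(table.corr(method="spearman"))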