
Annals of Data Science: Latest Articles

Identifying the Intents Behind Website Visits by Employing Unsupervised Machine Learning Models
Q1 Decision Sciences Pub Date : 2025-01-09 DOI: 10.1007/s40745-024-00586-5
Judah Soobramoney, Retius Chifurira, Temesgen Zewotir, Knowledge Chinhamu

With digitisation on the rise globally, corporates are compelled to better understand how their websites are used. Doing so helps them understand consumers and make the adjustments needed to improve their position in a competitive global landscape. However, website visit data is highly complex, large in volume, and highly transactional, with users exhibiting unique behaviours, so extracting insight from it is a difficult problem. This study employed unsupervised machine learning models to identify the intentions behind visits to the observed website. The data was sourced from the Google Analytics tracking tool deployed on a corporate informative website. Three models were applied: k-means, hierarchical clustering and DBSCAN. All three detected five major intents in the observed data, which were labelled “accidentals”, “drop-offs”, “engrossed”, “get-in-touch” and “seekers”. All three methods performed well on the observed data. However, in the context of the study, which investigated the intents that drove online visits, hierarchical clustering yielded superior results by maintaining the best balance between cluster homogeneity (stronger silhouette coefficients) and cluster size.
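The abstract above compares clusterings by their silhouette coefficients. As a minimal sketch of how that score is computed for a given labelling, the following uses only the Python standard library. The function name `silhouette_score`, the Euclidean distance, and the toy points are assumptions made for illustration; they are not taken from the paper, and the study's actual features and tooling are not stated in the abstract.

```python
from math import dist
from statistics import mean


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance).

    Assumes at least two distinct clusters. Points in singleton clusters
    score 0 by convention.
    """
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)

    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point itself contributes a distance of 0 to the sum).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            mean(dist(p, q) for q in members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


# Toy data: two tight groups far apart (hypothetical, for illustration only).
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = [0, 0, 1, 1]  # labelling that matches the two groups
bad = [0, 1, 0, 1]   # labelling that splits each group
print(silhouette_score(points, good), silhouette_score(points, bad))
```

A labelling that matches the natural groups scores close to 1, while one that cuts across them scores below 0, which is the sense in which a "stronger" silhouette indicates more homogeneous clusters.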

Annals of Data Science, vol. 12, no. 1, pp. 413-437. PDF: https://link.springer.com/content/pdf/10.1007/s40745-024-00586-5.pdf
Citations: 0
A Novel Finite Mixture Model Based on the Generalized t Distributions with Two-Sided Censored Data
Q1 Decision Sciences Pub Date : 2024-09-25 DOI: 10.1007/s40745-024-00572-x
Ruijie Guan, Yaohua Rong, Weihu Cheng, Zhenyu Xin

Rapid technological advances in recent decades have left many disciplines with large datasets characterized by multimodality, heavy-tailed distributions, and prevalent missing information. Modeling such data effectively is a difficult but necessary task. This paper addresses it by introducing a novel finite mixture model based on the generalized t distribution, tailored to two-sided censored observations, thereby establishing a foundational framework for modeling this complex data structure. For parameter estimation, we devise a variant of the EM-type algorithm that combines the profile likelihood approach with the classical Expectation Conditional Maximization algorithm. This hybrid method yields analytical expressions in the E-step and a tractable M-step, which substantially improves computational efficiency. We also provide closed-form expressions for the observed information matrix, which is needed to approximate the asymptotic covariance matrix of the MLEs in this mixture model. A series of simulation studies evaluates the proposed algorithm and shows promising performance across various artificial datasets. The practical applicability of the method is illustrated on two real-world datasets.

Annals of Data Science, vol. 12, no. 1, pp. 341-379.
Citations: 0
Gated Graph Attention-based Crossover Snake (GGA-CS) Algorithm for Hyperspectral Image Classification
Q1 Decision Sciences Pub Date : 2024-08-20 DOI: 10.1007/s40745-024-00567-8
R. Ablin, G. Prabin

Hyperspectral image classification assigns pixels or regions within a hyperspectral image to specific classes based on the spectral information captured across multiple bands. Traditional methods face several challenges, including high dimensionality, scalability, spectral variability, and limited contextual information. To address these issues, a novel Gated Graph Attention-based Crossover Snake (GGA-CS) algorithm is proposed for classifying hyperspectral images. A Graph Neural Network (GNN) captures both spectral and spatial relationships between pixels, and a gated attention mechanism enhances specific spectral bands. After training, a crossover-based snake optimization tunes the parameters, obtains the classification output of the GNN, and adjusts the pixels to improve the performance of the GGA-CS method. The study is validated on the Indian Pines, University of Pavia, and Salinas datasets. The method is evaluated using key metrics and compared with state-of-the-art methods to gauge its efficacy in hyperspectral image classification.

Annals of Data Science, vol. 12, no. 1, pp. 281-305.
Citations: 0
Kernel-free Reduced Quadratic Surface Support Vector Machine with 0-1 Loss Function and $L_p$-norm Regularization
Q1 Decision Sciences Pub Date : 2024-08-19 DOI: 10.1007/s40745-024-00573-w
Mingyang Wu, Zhixia Yang

This paper presents a novel nonlinear binary classification method, namely the kernel-free reduced quadratic surface support vector machine with 0-1 loss function and $L_p$-norm regularization ($L_p$-RQSSVM$_{0/1}$). It uses a kernel-free trick to find a reduced quadratic surface that separates the samples, without considering the cross terms in the quadratic form. This saves computational cost and provides better interpretability than methods using kernel functions. In addition, constructing $L_p$-RQSSVM$_{0/1}$ with the 0-1 loss function and $L_p$-norm regularization enables both sample sparsity and feature sparsity. The support vector (SV) of $L_p$-RQSSVM$_{0/1}$ is defined, and it is shown that all SVs fall on the support hypersurfaces. Moreover, the optimality condition is explored theoretically, and a new iterative algorithm based on the alternating direction method of multipliers (ADMM) framework is used to solve $L_p$-RQSSVM$_{0/1}$ on the selected working set. The computational complexity and convergence of the algorithm are discussed. Numerical experiments demonstrate that $L_p$-RQSSVM$_{0/1}$ achieves better classification accuracy, fewer SVs, and higher computational efficiency than other methods on most datasets. It also has feature sparsity under certain conditions.
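To make the idea of a reduced quadratic surface concrete, the sketch below evaluates a quadratic decision function that keeps only the squared terms and drops every cross term. The exact parameterization (the factor 0.5 and the names `a`, `b`, `c`) is an assumption chosen for illustration; the abstract does not give the paper's formulation, and no training procedure is shown here.

```python
def reduced_quadratic_decision(x, a, b, c):
    """f(x) = 0.5 * sum(a_i * x_i**2) + sum(b_i * x_i) + c.

    Only squared terms appear; there are no x_i * x_j cross terms,
    so the second-order part needs d coefficients instead of d*(d+1)/2.
    """
    quadratic = 0.5 * sum(ai * xi * xi for ai, xi in zip(a, x))
    linear = sum(bi * xi for bi, xi in zip(b, x))
    return quadratic + linear + c


def classify(x, a, b, c):
    """Assign +1 on or outside the surface f(x) = 0 and -1 inside it."""
    return 1 if reduced_quadratic_decision(x, a, b, c) >= 0 else -1


def second_order_parameter_counts(d):
    """Return (full quadratic, reduced quadratic) second-order coefficient counts."""
    return d * (d + 1) // 2, d


# Hypothetical surface: the unit circle x1^2 + x2^2 - 1 = 0.
a, b, c = (2.0, 2.0), (0.0, 0.0), -1.0
print(classify((0.0, 0.0), a, b, c), classify((2.0, 0.0), a, b, c))
print(second_order_parameter_counts(100))
```

Dropping the cross terms is what reduces the second-order parameter count from quadratic to linear in the number of features, which is where the stated savings in computation and the gain in interpretability would come from.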

Annals of Data Science, vol. 12, no. 1, pp. 381-412.
Citations: 0
Non-negative Sparse Matrix Factorization for Soft Clustering of Territory Risk Analysis
Q1 Decision Sciences Pub Date : 2024-08-10 DOI: 10.1007/s40745-024-00570-z
Shengkun Xie, Chong Gan, Anna T. Lawniczak

Developing effective methodologies for territory design and relativity estimation is crucial in auto insurance rate filings and reviews. This study introduces a novel approach utilizing fuzzy clustering to enhance the design process of territories for auto insurance rate-making and regulation. By adopting a soft clustering method, we aim to overcome the limitations of traditional hard clustering techniques and improve the assessment of territory risk. Furthermore, we employ non-negative sparse matrix approximation techniques to refine the estimates of risk relativities for basic rating units. This method promotes sparsity in the fuzzy membership matrix by eliminating small membership values, leading to more robust and interpretable results. We also compare the outcomes with those obtained using non-negative sparse principal component analysis, a technique explored in our previous research. Integrating fuzzy clustering with non-negative sparse matrix decomposition offers a promising approach for auto insurance rate filings. The combined methodology enhances decision-making and provides sparse estimates, which can be advantageous in various data science applications where fuzzy clustering is relevant.
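The abstract describes promoting sparsity in the fuzzy membership matrix by eliminating small membership values. The sketch below illustrates that effect with a plain hard threshold followed by row renormalization. This is a simplified stand-in, not the non-negative sparse matrix approximation used in the paper; the function name `sparsify_memberships`, the threshold value, and the sample matrix are all hypothetical.

```python
def sparsify_memberships(U, threshold=0.15):
    """Zero out memberships below `threshold` and renormalize each row to sum to 1.

    U is a list of rows, one per basic rating unit, each row holding
    non-negative memberships across clusters.
    """
    result = []
    for row in U:
        kept = [u if u >= threshold else 0.0 for u in row]
        total = sum(kept)
        if total == 0.0:
            # Every entry was below the threshold: keep only the largest one.
            j = max(range(len(row)), key=row.__getitem__)
            kept = [0.0] * len(row)
            kept[j] = 1.0
            total = 1.0
        result.append([u / total for u in kept])
    return result


# Hypothetical memberships of two rating units across three territories.
U = [
    [0.85, 0.10, 0.05],  # dominated by one territory
    [0.40, 0.35, 0.25],  # genuinely shared between territories
]
sparse_U = sparsify_memberships(U)
print(sparse_U)
```

A unit dominated by one territory collapses to a single non-zero membership, while a unit that is genuinely shared keeps its soft assignment, which is the kind of sparse yet still fuzzy result the abstract refers to.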

Annals of Data Science, vol. 12, no. 1, pp. 307-340.
Citations: 0
Partial Label Learning with Noisy Labels
Q1 Decision Sciences Pub Date : 2024-07-31 DOI: 10.1007/s40745-024-00552-1
Pan Zhao, Long Tang, Zhigeng Pan

Partial label learning (PLL) is a particular problem setting within weakly supervised learning. In PLL, each sample corresponds to a candidate label set in which only one label is true. However, in some practical application scenarios, label noise can cause some candidate sets to lose their true labels, leading to a decline in model performance. In this work, a robust training strategy for PLL, derived from joint training with co-regularization (JoCoR), is proposed to address this issue. Specifically, the proposed approach constructs two separate PLL models and a joint loss. The joint loss consists of not only two PLL losses but also a co-regularization term measuring the disagreement of the two models. By automatically selecting samples with small joint loss and using them to update the two models, the proposed approach filters out more and more samples suspected of having noisy candidate label sets. Gradually, the robustness of the PLL models to label noise strengthens as the disagreement between the two models shrinks. Experiments are conducted on two state-of-the-art PLL models using benchmark datasets under various noise levels. The results show that the proposed method can effectively stabilize the training process and reduce the model's overfitting to noisy candidate label sets.
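The selection step described above (two models, a joint loss with a disagreement term, and updates on the small-loss samples only) can be sketched as follows. The specific choices here are assumptions for illustration: the PLL loss is taken as the negative log of the probability mass on the candidate set, the disagreement term is a symmetric KL divergence weighted by `lam`, and the names `joint_loss` and `select_small_loss` are hypothetical. The paper's actual losses may differ.

```python
from math import log

EPS = 1e-12  # guards against log(0)


def pll_loss(probs, candidates):
    """Assumed PLL surrogate: -log of the probability mass on the candidate set."""
    return -log(sum(probs[k] for k in candidates) + EPS)


def symmetric_kl(p, q):
    """Symmetric KL divergence, used here as the disagreement measure."""
    return sum(
        pi * log((pi + EPS) / (qi + EPS)) + qi * log((qi + EPS) / (pi + EPS))
        for pi, qi in zip(p, q)
    )


def joint_loss(p1, p2, candidates, lam=0.5):
    """Weighted sum of both models' PLL losses and their disagreement."""
    supervised = pll_loss(p1, candidates) + pll_loss(p2, candidates)
    return (1.0 - lam) * supervised + lam * symmetric_kl(p1, p2)


def select_small_loss(batch, keep_ratio=0.5, lam=0.5):
    """Return indices of the samples with the smallest joint loss."""
    losses = [joint_loss(p1, p2, cands, lam) for p1, p2, cands in batch]
    n_keep = max(1, int(keep_ratio * len(batch)))
    return sorted(range(len(batch)), key=losses.__getitem__)[:n_keep]


# Each item: (model 1 probabilities, model 2 probabilities, candidate label set).
batch = [
    ([0.90, 0.05, 0.05], [0.80, 0.10, 0.10], {0, 1}),  # models agree, inside the set
    ([0.90, 0.05, 0.05], [0.10, 0.80, 0.10], {2}),     # models disagree, outside the set
]
selected = select_small_loss(batch, keep_ratio=0.5)
print(selected)
```

Only the selected indices would be used to update the two models in a training step, so samples whose candidate sets look inconsistent with both models are left out of the update.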

Annals of Data Science, vol. 12, no. 1, pp. 199-212.
Citations: 0
Kernel Method for Estimating Matusita Overlapping Coefficient Using Numerical Approximations
Q1 Decision Sciences Pub Date : 2024-07-27 DOI: 10.1007/s40745-024-00563-y
Omar M. Eidous, Enas A. Ananbeh
Citations: 0
Maximum Likelihood Estimation for Generalized Inflated Power Series Distributions
Q1 Decision Sciences Pub Date : 2024-07-23 DOI: 10.1007/s40745-024-00560-1
Robert L. Paige
Citations: 0
Farm-Level Smart Crop Recommendation Framework Using Machine Learning
Q1 Decision Sciences Pub Date : 2024-07-20 DOI: 10.1007/s40745-024-00534-3
Amit Bhola, Prabhat Kumar

Agriculture is the primary source of food, fuel, and raw materials and is vital to any country’s economy. Farmers, the backbone of agriculture, primarily rely on instinct to determine what crops to plant in any given season. They are comfortable following customary farming practices and standards and are often unaware that crop yield is highly dependent on current environmental and soil conditions. Crop recommendations involve multifaceted factors such as weather, soil quality, crop production, market demand, and prices, making it crucial for farmers to make well-informed decisions. An improper or imprudent crop recommendation can affect them, their families, and the entire agricultural sector. Modern technologies like artificial intelligence, machine learning, and data science have emerged as efficient solutions to combat issues like declining crop production and lower profits. This research proposes a Smart Crop Recommendation framework that leverages machine learning to empower farmers to make informed decisions about optimal crop selection. The framework consists of two phases: crop filtration and yield prediction. In the first phase, crops are filtered using an artificial neural network based on local input parameters. The second phase estimates yield for the filtered crops, considering the season, farm area, and location data. The final recommendation provides farmers with crops aimed at maximizing profit. In experiments, the artificial neural network reaches 99.10% accuracy and the random forest attains an $R^2$ of 0.99. The uniqueness of this framework lies in its distinctive focus on the farm level and its consideration of the challenges and various agricultural features that change over time. The experimental results affirm the effectiveness of the framework, and its lightweight nature enhances its practicality, making it an efficient real-time recommendation solution.

Annals of Data Science, vol. 12, no. 1, pp. 117-140.
Citations: 0
Reaction Function for Financial Market Reacting to Events or Information
Q1 Decision Sciences Pub Date : 2024-07-17 DOI: 10.1007/s40745-024-00565-w
Bo Li, Guangle Du

Observations indicate that the distributions of stock returns in financial markets usually do not conform to normal distributions, but rather exhibit high peaks, fat tails and biases. In this work, we assume that the effects of events or information on prices obey a normal distribution, while financial markets often overreact or underreact to events or information, resulting in non-normal distributions of stock returns. Based on these assumptions, we propose, for the first time, a reaction function for a financial market reacting to events or information, and a model based on it to describe the distribution of real stock returns. Our analysis of the returns of the China Securities Index 300 (CSI 300), the Standard & Poor’s 500 Index (SPX or S&P 500) and the Nikkei 225 Index (N225) at different time scales shows that financial markets often underreact to events or information with minor impacts, overreact to events or information with relatively significant impacts, and react slightly more strongly to positive events or information than to negative ones. In addition, differences in financial markets and time scales of returns can also affect the shapes of the reaction functions.
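The mechanism described above (normally distributed shocks passed through a market reaction that underreacts to small shocks and overreacts to large ones) can be illustrated with a small simulation. The functional form below, `gain * x * |x|` with a slightly larger gain for positive shocks, is purely hypothetical and is not the reaction function proposed in the paper; it is only chosen because it has the three qualitative properties listed in the abstract.

```python
import random


def reaction(x, gain_pos=1.1, gain_neg=1.0):
    """Hypothetical reaction function: r(x) = gain * x * |x|.

    |r(x)| < |x| for small shocks (underreaction), |r(x)| > |x| for large
    shocks (overreaction), and positive shocks get a slightly larger gain.
    """
    gain = gain_pos if x > 0 else gain_neg
    return gain * x * abs(x)


def kurtosis(values):
    """Sample kurtosis m4 / m2**2 (equals 3 for a normal distribution)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m4 / (m2 * m2)


rng = random.Random(0)  # fixed seed so the run is repeatable
shocks = [rng.gauss(0.0, 1.0) for _ in range(50_000)]  # normally distributed effects
returns = [reaction(x) for x in shocks]                # market's reaction to them

print(kurtosis(shocks), kurtosis(returns))
```

Under this assumed form the transformed returns have a much higher kurtosis than the underlying shocks, that is, a sharper peak and fatter tails, which matches the qualitative picture in the abstract without reproducing the paper's fitted model.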

Annals of Data Science, vol. 11, no. 4, pp. 1265-1290. Published 2024-07-17.
Citations: 0