Upgraded Thoth: Software for Data Visualization and Statistics
Pub Date : 2023-03-16 DOI: 10.3390/analytics2010015
Journal: Big data analytics
R. Laher, F. Masci, L. Rebull, S. Schurr, Wendy Burt, A. Laity, M. Swain, D. Shupe, S. Groom, B. Rusholme, M. Kong, J. Good, V. Gorjian, R. Akeson, B. Fulton, D. Ciardi, S. Carey
Thoth is a free desktop/laptop software application with a friendly graphical user interface that facilitates routine data-visualization and statistical-calculation tasks for astronomy and astrophysical research (and other fields where numbers are visualized). This software has been upgraded with many significant improvements and new capabilities. The major upgrades are: (1) six new graph types, including 3D stacked-bar charts and 3D surface plots, rendered with the Orson 3D Charts library; (2) saving and loading of graph settings; (3) a batch-mode or command-line operation; (4) graph-data annotation functions; (5) additional options for importing data files; and (6) a built-in FITS-image viewer. Thoth now requires Java 1.8 or higher. Many other minor upgrades and bug fixes have also been made. The newly implemented plotting options make it relatively easy to construct and reuse graphs without writing code. The illustrative astronomy case study in this paper demonstrates one of the many ways the software can be used. These new features and refinements help astronomers work more efficiently when interpreting data.
Application of Mixture Models for Doubly Inflated Count Data
Pub Date : 2023-03-11 DOI: 10.3390/analytics2010014
Monika Arora, N. Chaganty
In health and social sciences and other fields where count data analysis is important, zero-inflated models are employed when the frequency of zero counts is high (inflated). For various reasons, there are scenarios in which an additional count value k > 0 also occurs with high frequency. The zero- and k-inflated Poisson (ZkIP) distribution model is more appropriate for such situations. The ZkIP model is a mixture distribution with three components: degenerate distributions at the counts 0 and k, and a Poisson distribution. In this article, we propose an alternative and computationally fast expectation–maximization (EM) algorithm to obtain the parameter estimates for grouped zero- and k-inflated count data. The asymptotic standard errors are derived using the complete data approach. We compare the zero- and k-inflated Poisson model with its zero-inflated and non-inflated counterparts. The best model is selected based on commonly used criteria. The theoretical results are supplemented with the analysis of two real-life datasets from health sciences.
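The three-component mixture described in the abstract can be illustrated with a short sketch. This is not the authors' EM algorithm; it only evaluates the ZkIP probability mass function and the log-likelihood of grouped data. The parameter names `pi0`, `pik`, and `lam` (the mixing weights at 0 and k, and the Poisson mean) and the example frequencies are assumptions made for illustration.

```python
import math


def poisson_pmf(y, lam):
    """Poisson probability P(Y = y), computed in log space for stability."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))


def zkip_pmf(y, k, pi0, pik, lam):
    """P(Y = y) under a zero- and k-inflated Poisson mixture.

    pi0 and pik are the (assumed) weights of the point masses at 0 and k;
    the remaining weight 1 - pi0 - pik goes to a Poisson(lam) component.
    """
    if k <= 0 or lam <= 0 or pi0 < 0 or pik < 0 or pi0 + pik > 1:
        raise ValueError("invalid ZkIP parameters")
    p = (1.0 - pi0 - pik) * poisson_pmf(y, lam)
    if y == 0:
        p += pi0
    if y == k:
        p += pik
    return p


def grouped_log_likelihood(freq, k, pi0, pik, lam):
    """Log-likelihood for grouped data given as {count value: frequency}."""
    return sum(n * math.log(zkip_pmf(y, k, pi0, pik, lam))
               for y, n in freq.items())


if __name__ == "__main__":
    # Hypothetical grouped counts with extra mass at 0 and at k = 3.
    freq = {0: 40, 1: 15, 2: 14, 3: 22, 4: 6, 5: 3}
    print(grouped_log_likelihood(freq, k=3, pi0=0.3, pik=0.15, lam=1.8))
```

Setting `pik = 0` recovers the zero-inflated Poisson model, and setting both weights to 0 recovers the plain Poisson model, which is how the three nested models in the comparison relate to one another.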
A Voronoi-Based Semantically Balanced Dummy Generation Framework for Location Privacy
Pub Date : 2023-03-03 DOI: 10.3390/analytics2010013
Aditya Tadakaluru, Xiao Qin
Location-based services (LBS) require users to provide their current location for service delivery and customization. Location privacy protection addresses concerns associated with the potential mishandling of location information submitted to the LBS provider. Location accuracy has a direct impact on the quality of service (QoS): higher location accuracy results in better QoS. In general, the main goal of any location privacy technique is to achieve maximum QoS while providing minimal or, if possible, no location information; using dummy locations is one such technique. In this paper, we introduced a temporal constraint attack whereby an adversary can exploit the temporal constraints associated with the semantic category of locations to eliminate dummy locations and identify the true location. We demonstrated how an adversary can devise a temporal constraint attack to breach the location privacy of a residential location. We addressed this major limitation of the current dummy approaches with a novel Voronoi-based semantically balanced framework (VSBDG) capable of generating dummy locations that can withstand a temporal constraint attack. Built on real-world geospatial datasets, the VSBDG framework leverages spatial relationships and operations. Our results show a high physical dispersion cosine similarity of 0.988 between the semantic categories, even with larger location set sizes. This indicates a strong and scalable semantic balance for each semantic category within the VSBDG's output location set. The VSBDG algorithm is capable of producing location sets with high average minimum dispersion distance values of 5861.894 m for residential locations and 6258.046 m for POI locations. The findings demonstrate that the locations within each semantic category are scattered farther apart, entailing optimized location privacy.
Survey of Distances between the Most Popular Distributions
Pub Date : 2023-03-01 DOI: 10.3390/analytics2010012
M. Kelbert
We present a number of upper and lower bounds for the total variation distances between the most popular probability distributions. In particular, we give estimates of the total variation distances for multivariate Gaussian distributions, Poisson distributions, binomial distributions, a binomial and a Poisson distribution, and negative binomial distributions. Next, estimates of the Lévy–Prohorov distance in terms of Wasserstein metrics are discussed, and the Fréchet, Wasserstein, and Hellinger distances for multivariate Gaussian distributions are evaluated. Some novel context-sensitive distances are introduced, and a number of bounds mimicking classical results from information theory are proved.
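As a concrete instance of one case the survey lists, the total variation distance between a binomial and a Poisson distribution can be computed numerically and compared with a bound. The sketch below uses Le Cam's classical inequality, TV ≤ n·p², as the reference bound; this is a well-known result chosen for illustration, not necessarily the bound derived in the paper. The function names and the truncation rule are assumptions.

```python
import math


def binom_pmf(k, n, p):
    """Binomial(n, p) probability at k for 0 < p < 1 (zero outside 0..n)."""
    if k < 0 or k > n:
        return 0.0
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1.0 - p))
    return math.exp(log_pmf)


def poisson_pmf(k, lam):
    """Poisson(lam) probability at k, computed in log space."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def tv_binomial_poisson(n, p):
    """Total variation distance between Binomial(n, p) and Poisson(n * p).

    TV = 0.5 * sum_k |P(k) - Q(k)|. The infinite sum is truncated well past
    n, where the remaining Poisson mass is negligible.
    """
    lam = n * p
    upper = n + int(10 * math.sqrt(lam)) + 50
    total = sum(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam))
                for k in range(upper + 1))
    return 0.5 * total


if __name__ == "__main__":
    for n, p in [(50, 0.1), (1000, 0.001)]:
        tv = tv_binomial_poisson(n, p)
        print(f"n={n}, p={p}: TV={tv:.6f}, Le Cam bound n*p^2={n * p * p:.6f}")
```

The probabilities are evaluated through `math.lgamma` so that large factorials never need to be converted to floats.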
Smart Multimedia Information Retrieval
Pub Date : 2023-02-20 DOI: 10.3390/analytics2010011
Stefan Wagenpfeil
The area of multimedia information retrieval (MMIR) faces two major challenges: the enormously growing number of multimedia objects (i.e., images, videos, audio, and text files), and the fast-increasing level of detail of these objects (e.g., the number of pixels in images). Both challenges lead to a high demand for scalability, semantic representations, and explainability of MMIR processes. Smart MMIR addresses these challenges by employing graph codes as an indexing structure, attaching semantic annotations for explainability, and employing application profiling for scaling, which results in human-understandable, expressive, and interoperable MMIR. This paper presents the mathematical foundation, modeling, implementation details, and experimental results, which confirm that Smart MMIR improves MMIR in terms of efficiency, effectiveness, and human understandability.
The SP Theory of Intelligence, and Its Realisation in the SP Computer Model, as a Foundation for the Development of Artificial General Intelligence
Pub Date : 2023-02-17 DOI: 10.3390/analytics2010010
J. Wolff
The theme of this paper is that the SP Theory of Intelligence (SPTI), and its realisation in the SP Computer Model, is a promising foundation for the development of artificial intelligence at the level of people or higher, also known as ‘artificial general intelligence’ (AGI). The SPTI, and alternatives to the SPTI, are considered and compared as potential foundations for the development of AGI. The alternatives include ‘Gato’ from DeepMind, ‘DALL·E 2’ from OpenAI, ‘Soar’ from Allen Newell, John Laird, and others, and ACT-R from John Anderson, Christian Lebiere, and others. A key principle in the SPTI and its development is the importance of information compression in human learning, perception, and cognition. Since there are many uncertainties between where we are now and, far into the future, anything that might qualify as an AGI, a multi-pronged attack on the problem is needed. The SPTI qualifies as the basis for one of those prongs. Although it will take time to achieve AGI, there is potential along the road for many useful benefits and applications of the research.
Dynamic Skyline Computation with LSD Trees
Pub Date : 2023-02-09 DOI: 10.3390/analytics2010009
D. Köppl
Given a set of high-dimensional feature vectors S ⊂ R^n, the skyline or Pareto problem is to report the subset of vectors in S that are not dominated by any vector of S. Vectors closer to the origin are preferred: we say a vector x is dominated by another distinct vector y if x is equally far or further away from the origin than y in every dimension. The dynamic skyline problem allows us to shift the origin, which changes the answer set. This problem is crucial for dynamic recommender systems, where users can shift the parameters and thus shift the origin. Recomputing the answer set from scratch for each origin shift is time intensive. To tackle this problem, we propose a parallel algorithm for dynamic skyline computation that uses multiple local split decision (LSD) trees concurrently. The geometric nature of the LSD trees allows us to reuse previous results. Experiments show that our proposed algorithm works well if the dimension is small in relation to the number of tuples to process.
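The dominance relation and the effect of shifting the origin can be made concrete with a naive quadratic-time baseline. This is only the from-scratch recomputation that the abstract calls time intensive, not the LSD-tree algorithm of the paper. It assumes that, after a shift, distance from the origin is measured per dimension as an absolute difference, and that two vectors with identical distance profiles do not dominate each other; the function names and sample points are hypothetical.

```python
def dominated(x, y, origin):
    """True if x is dominated by y relative to the given origin.

    x is dominated when it is at least as far from the origin as y in every
    dimension and the two distance profiles are not identical.
    """
    dx = [abs(a - o) for a, o in zip(x, origin)]
    dy = [abs(b - o) for b, o in zip(y, origin)]
    return dx != dy and all(a >= b for a, b in zip(dx, dy))


def skyline(points, origin):
    """Naive O(n^2) skyline: keep every point no other point dominates."""
    return [x for x in points
            if not any(dominated(x, y, origin) for y in points)]


if __name__ == "__main__":
    points = [(1, 4), (2, 2), (4, 1), (3, 3), (5, 5)]
    # Shifting the origin changes the answer set.
    print(skyline(points, origin=(0, 0)))
    print(skyline(points, origin=(5, 5)))
```

Every origin shift here triggers a full pairwise comparison, which is the cost that an index reusing earlier results is meant to avoid.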
Untangling Energy Consumption Dynamics with Renewable Energy Using Recurrent Neural Network
Pub Date : 2023-02-01 DOI: 10.3390/analytics2010008
Munshi Md. Shafwat Yazdan, Shah Saki, Raaghul Kumar
The environmental issues we are currently facing require long-term prospective efforts for sustainable growth. Renewable energy sources seem to be one of the most practical and efficient alternatives in this regard. Understanding a nation's pattern of energy use and renewable energy production is crucial for developing strategic plans. No previous study has explored the dynamics of power consumption with the change in renewable energy production on a country-wide scale. Meanwhile, a number of deep learning algorithms have demonstrated acceptable performance in handling sequential data in the era of data-driven predictions. In this study, we developed a scheme to investigate and predict total power consumption and renewable energy production time series for eleven years of data using a recurrent neural network (RNN). The dynamics of the interaction between the total annual power consumption and renewable energy production were investigated through extensive exploratory data analysis (EDA) and a feature engineering framework. The performance of the model was found to be satisfactory through the comparison of the predicted data with the observed data, the visualization of the distribution of the errors and root mean squared error (RMSE), and the R² values of 0.084 and 0.82. Higher performance was achieved by increasing the number of epochs and by hyperparameter tuning. The proposed framework has the potential to be used and transferred to investigate the trend of renewable energy production and power consumption and to predict future scenarios for different communities. Incorporating a cloud-based platform into the proposed pipeline to perform predictive studies from data acquisition to outcome generation may lead to real-time forecasting.
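The two evaluation metrics named in the abstract, RMSE and R², have standard definitions that can be sketched in a few lines. The sample values below are made up for illustration and are not data from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical observed and predicted values.
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.1, 1.9, 3.2, 3.8]
    print(rmse(observed, predicted), r_squared(observed, predicted))
```

RMSE is expressed in the units of the target variable, so its magnitude depends on how the series was scaled, whereas R² is unitless and equals 1 for a perfect fit.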
Theory-Guided Analytics Process: Using Theories to Underpin an Analytics Process for New Banking Product Development Using Segmentation-Based Marketing Analytics Leveraging on Marketing Intelligence
Pub Date : 2023-02-01 DOI: 10.3390/analytics2010007
Tristan Lim, Tao Pan, C. Ong, Shuaiwei Chen, Jie Jun Jeremy Chia
Retail banking is undergoing considerable product competition and disruption. New product development (NPD) is necessary to tackle such challenges and reinvigorate product lines. This study presents an instrumental real-life banking case study in which marketing analytics was used to drive a product differentiation strategy. In particular, the study applied the unsupervised machine learning techniques of link analysis, latent class analysis, and association analysis to undertake behavior-based market segmentation, with a view to attaining a profitable competitive advantage. To underpin the product development process with well-grounded theoretical framing, this study asked the research question: “How may we establish a theory-driven approach for an analytics-driven process?” The findings of this study include a theoretical conceptual framework that underpins the end-to-end segmentation-driven new product development process, backed by the empirical literature. The study hopes to provide: (i) for managerial practitioners, the use of case-based reasoning for practice-oriented new product development design, planning, and diagnosis efforts, and (ii) for researchers, the potential to test the validity and robustness of an analytics-driven NPD process. The study also hopes to drive wider research interest in theory-driven approaches for analytics-driven processes.
MAFFN_YOLOv5: Multi-Scale Attention Feature Fusion Network on the YOLOv5 Model for the Health Detection of Coral-Reefs Using a Built-In Benchmark Dataset
Pub Date : 2023-01-19 DOI: 10.3390/analytics2010006
Sivamani Kalyana Sundara Rajan, Nedumaran Damodaran
Coral reefs are a significant part of marine life and are affected by multiple diseases due to stress and variations in ocean heat. The autonomous monitoring and detection of coral health are crucial for researchers to protect reefs at an early stage. Detecting coral diseases is a difficult task because the available coral-reef datasets are inadequate. Therefore, we have developed a coral-reef benchmark dataset and proposed a Multi-scale Attention Feature Fusion Network (MAFFN) as the neck of the YOLOv5 network, called “MAFFN_YOLOv5”. The MAFFN_YOLOv5 model outperforms state-of-the-art object detectors, such as YOLOv5, YOLOX, and YOLOR, improving the detection accuracy by 8.64%, 3.78%, and 18.05%, respectively, based on the mean average precision (mAP@.5), and by 7.8%, 3.72%, and 17.87%, respectively, based on the mAP@.5:.95. Consequently, we have tested a hardware-based deep neural network for the detection of coral-reef health.
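The two metrics quoted in the abstract depend on the intersection over union (IoU) between a predicted and a ground-truth box. By the usual convention, mAP@.5 counts a detection as correct when its IoU is at least 0.5, and mAP@.5:.95 averages the result over thresholds from 0.5 to 0.95 in steps of 0.05. The sketch below shows only the IoU test, not a full mAP computation; the `(x1, y1, x2, y2)` corner format and the function names are assumptions.

```python
def box_area(box):
    """Area of an axis-aligned box given as (x1, y1, x2, y2)."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred, truth, threshold=0.5):
    """A prediction counts as a true positive if IoU >= threshold."""
    return iou(pred, truth) >= threshold


# Thresholds averaged over for mAP@.5:.95 (0.50, 0.55, ..., 0.95).
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]

if __name__ == "__main__":
    pred, truth = (0, 0, 2, 2), (1, 1, 3, 3)
    print(iou(pred, truth), is_match(pred, truth))
```

Because the stricter thresholds demand tighter localization, mAP@.5:.95 is typically lower than mAP@.5 for the same detector.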