Big data analytics: Latest Articles

Upgraded Thoth: Software for Data Visualization and Statistics
Pub Date : 2023-03-16 DOI: 10.3390/analytics2010015
R. Laher, F. Masci, L. Rebull, S. Schurr, Wendy Burt, A. Laity, M. Swain, D. Shupe, S. Groom, B. Rusholme, M. Kong, J. Good, V. Gorjian, R. Akeson, B. Fulton, D. Ciardi, S. Carey
Thoth is a free desktop/laptop software application with a friendly graphical user interface that facilitates routine data-visualization and statistical-calculation tasks for astronomy and astrophysical research (and other fields where numbers are visualized). This software has been upgraded with many significant improvements and new capabilities. The major upgrades consist of: (1) six new graph types, including 3D stacked-bar charts and 3D surface plots, made by the Orson 3D Charts library; (2) new saving and loading of graph settings; (3) a new batch-mode or command-line operation; (4) new graph-data annotation functions; (5) new options for data-file importation; and (6) a new built-in FITS-image viewer. There is now the requirement that Thoth be run under Java 1.8 or higher. Many other miscellaneous minor upgrades and bug fixes have also been made to Thoth. The newly implemented plotting options generally make possible graph construction and reuse with relative ease, without resorting to writing computer code. The illustrative astronomy case study of this paper demonstrates one of the many ways the software can be utilized. These new software features and refinements help make astronomers more efficient in their work of elucidating data.
Citations: 0
Application of Mixture Models for Doubly Inflated Count Data
Pub Date : 2023-03-11 DOI: 10.3390/analytics2010014
Monika Arora, N. Chaganty
In health and social science and other fields where count data analysis is important, zero-inflated models have been employed when the frequency of zero count is high (inflated). Due to multiple reasons, there are scenarios in which an additional count value of k > 0 occurs with high frequency. The zero- and k-inflated Poisson distribution model (ZkIP) is more appropriate for such situations. The ZkIP model is a mixture distribution with three components: degenerate distributions at 0 and k count and a Poisson distribution. In this article, we propose an alternative and computationally fast expectation–maximization (EM) algorithm to obtain the parameter estimates for grouped zero and k-inflated count data. The asymptotic standard errors are derived using the complete data approach. We compare the zero- and k-inflated Poisson model with its zero-inflated and non-inflated counterparts. The best model is selected based on commonly used criteria. The theoretical results are supplemented with the analysis of two real-life datasets from health sciences.
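
The three-component mixture described above can be written out as a probability mass function. The following is a minimal sketch, not the authors' code: the parameter names `pi0`, `pik`, and `lam` (mixing weights for the point masses at 0 and k, and the Poisson rate) are assumptions chosen for illustration, and the EM estimation step from the paper is not reproduced.

```python
import math


def zkip_pmf(y, k, pi0, pik, lam):
    """P(Y = y) for a zero- and k-inflated Poisson mixture.

    pi0, pik and lam are illustrative names (assumptions), not the paper's notation:
      pi0 -- weight of the degenerate component at 0
      pik -- weight of the degenerate component at k (k > 0)
      lam -- rate of the Poisson component, weighted by 1 - pi0 - pik
    """
    if k <= 0:
        raise ValueError("k must be a positive count")
    if pi0 < 0 or pik < 0 or pi0 + pik > 1:
        raise ValueError("mixing weights must be non-negative and sum to at most 1")
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    # Point masses contribute only at y == 0 and y == k; the Poisson part contributes everywhere.
    return pi0 * (y == 0) + pik * (y == k) + (1.0 - pi0 - pik) * poisson


# Example: extra mass at 0 and at k = 3 on top of a Poisson(2) baseline.
probabilities = [zkip_pmf(y, k=3, pi0=0.2, pik=0.1, lam=2.0) for y in range(10)]
```

Setting `pik = 0` gives the zero-inflated Poisson model, and setting both weights to 0 gives the plain Poisson model, which are the two counterparts the abstract compares against.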
Citations: 0
A Voronoi-Based Semantically Balanced Dummy Generation Framework for Location Privacy
Pub Date : 2023-03-03 DOI: 10.3390/analytics2010013
Aditya Tadakaluru, Xiao Qin
Location-based services (LBS) require users to provide their current location for service delivery and customization. Location privacy protection addresses concerns associated with the potential mishandling of location information submitted to the LBS provider. Location accuracy has a direct impact on the quality of service (QoS), where higher location accuracy results in better QoS. In general, the main goal of any location privacy technique is to achieve maximum QoS while providing minimum or no location information if possible, and using dummy locations is one such location privacy technique. In this paper, we introduced a temporal constraint attack whereby an adversary can exploit the temporal constraints associated with the semantic category of locations to eliminate dummy locations and identify the true location. We demonstrated how an adversary can devise a temporal constraint attack to breach the location privacy of a residential location. We addressed this major limitation of the current dummy approaches with a novel Voronoi-based semantically balanced framework (VSBDG) capable of generating dummy locations that can withstand a temporal constraint attack. Built based on real-world geospatial datasets, the VSBDG framework leverages spatial relationships and operations. Our results show a high physical dispersion cosine similarity of 0.988 between the semantic categories even with larger location set sizes. This indicates a strong and scalable semantic balance for each semantic category within the VSBDG’s output location set. The VSBDG algorithm is capable of producing location sets with high average minimum dispersion distance values of 5861.894 m for residential locations and 6258.046 m for POI locations. The findings demonstrate that the locations within each semantic category are scattered farther apart, entailing optimized location privacy.
Citations: 0
Survey of Distances between the Most Popular Distributions
Pub Date : 2023-03-01 DOI: 10.3390/analytics2010012
M. Kelbert
We present a number of upper and lower bounds for the total variation distances between the most popular probability distributions. In particular, some estimates of the total variation distances in the cases of multivariate Gaussian distributions, Poisson distributions, binomial distributions, between a binomial and a Poisson distribution, and also in the case of negative binomial distributions are given. Next, the estimations of Lévy–Prohorov distance in terms of Wasserstein metrics are discussed, and Fréchet, Wasserstein and Hellinger distances for multivariate Gaussian distributions are evaluated. Some novel context-sensitive distances are introduced and a number of bounds mimicking the classical results from the information theory are proved.
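
To make the notion of a total variation bound concrete, the sketch below evaluates the total variation distance between two Poisson distributions numerically, as half the sum of absolute pmf differences, and compares it with the standard coupling bound 1 - exp(-|λ - μ|). This bound is a textbook fact and is not necessarily one of the bounds derived in the paper; the function names and the truncation point `cutoff` are assumptions made for illustration.

```python
import math


def poisson_pmf(x, lam):
    """Poisson pmf evaluated in log space to avoid overflow for larger x (lam > 0)."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))


def tv_poisson(lam, mu, cutoff=200):
    """Total variation distance between Poisson(lam) and Poisson(mu).

    The infinite sum is truncated at `cutoff` (an illustrative choice); the
    neglected tail is negligible when lam and mu are far below the cutoff.
    """
    return 0.5 * sum(abs(poisson_pmf(x, lam) - poisson_pmf(x, mu)) for x in range(cutoff))


lam, mu = 2.0, 2.5
distance = tv_poisson(lam, mu)
coupling_bound = 1.0 - math.exp(-abs(lam - mu))  # upper bound from a simple coupling argument
```

The same truncated-sum approach works for any pair of discrete distributions with known pmfs, such as a binomial against a Poisson.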
Citations: 1
Smart Multimedia Information Retrieval
Pub Date : 2023-02-20 DOI: 10.3390/analytics2010011
Stefan Wagenpfeil
The area of multimedia information retrieval (MMIR) faces two major challenges: the enormously growing number of multimedia objects (i.e., images, videos, audio, and text files), and the rapidly increasing level of detail of these objects (e.g., the number of pixels in images). Both challenges lead to a high demand for scalability, semantic representations, and explainability of MMIR processes. Smart MMIR addresses these challenges by employing graph codes as an indexing structure, attaching semantic annotations for explainability, and employing application profiling for scaling, which results in human-understandable, expressive, and interoperable MMIR. The mathematical foundation, the modeling, implementation detail, and experimental results are shown in this paper, which confirm that Smart MMIR improves MMIR in terms of efficiency, effectiveness, and human understandability.
Citations: 1
The SP Theory of Intelligence, and Its Realisation in the SP Computer Model, as a Foundation for the Development of Artificial General Intelligence
Pub Date : 2023-02-17 DOI: 10.3390/analytics2010010
J. Wolff
The theme of this paper is that the SP Theory of Intelligence (SPTI), and its realisation in the SP Computer Model, is a promising foundation for the development of artificial intelligence at the level of people or higher, also known as ‘artificial general intelligence’ (AGI). The SPTI, and alternatives to the SPTI, are considered and compared as potential foundations for the development of AGI. The alternatives include ‘Gato’ from DeepMind, ‘DALL·E 2’ from OpenAI, ‘Soar’ from Allen Newell, John Laird, and others, and ACT-R from John Anderson, Christian Lebiere, and others. A key principle in the SPTI and its development is the importance of information compression in human learning, perception, and cognition. Since there are many uncertainties between where we are now and, far into the future, anything that might qualify as an AGI, a multi-pronged attack on the problem is needed. The SPTI qualifies as the basis for one of those prongs. Although it will take time to achieve AGI, there is potential along the road for many useful benefits and applications of the research.
Citations: 0
Dynamic Skyline Computation with LSD Trees
Pub Date : 2023-02-09 DOI: 10.3390/analytics2010009
D. Köppl
Given a set of high-dimensional feature vectors S ⊂ R^n, the skyline or Pareto problem is to report the subset of vectors in S that are not dominated by any vector of S. Vectors closer to the origin are preferred: we say a vector x is dominated by another distinct vector y if x is equally or further away from the origin than y with respect to all its dimensions. The dynamic skyline problem allows us to shift the origin, which changes the answer set. This problem is crucial for dynamic recommender systems where users can shift the parameters and thus shift the origin. For each origin shift, a recomputation of the answer set from scratch is time intensive. To tackle this problem, we propose a parallel algorithm for dynamic skyline computation that uses multiple local split decision (LSD) trees concurrently. The geometric nature of the LSD trees allows us to reuse previous results. Experiments show that our proposed algorithm works well if the dimension is small in relation to the number of tuples to process.
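
The dominance definition above, and the effect of shifting the origin, can be illustrated with a brute-force baseline. This sketch is the from-scratch recomputation that the abstract calls time intensive, not the LSD-tree algorithm the paper proposes; the function names and sample points are assumptions made for illustration.

```python
def dominates(y, x, origin):
    """True if y dominates x relative to `origin`, following the abstract's wording:
    y is distinct from x, and x is equally far or further from the origin than y
    in every dimension. Note that under this wording two distinct points that tie
    in every dimension dominate each other."""
    return y != x and all(
        abs(yi - oi) <= abs(xi - oi) for xi, yi, oi in zip(x, y, origin)
    )


def skyline(points, origin):
    """Brute-force O(n^2) skyline: keep every point that no other point dominates."""
    return [x for x in points if not any(dominates(y, x, origin) for y in points)]


points = [(1, 4), (2, 2), (4, 1), (3, 3), (5, 5)]
static_answer = skyline(points, origin=(0, 0))   # [(1, 4), (2, 2), (4, 1)]
shifted_answer = skyline(points, origin=(3, 0))  # [(4, 1), (3, 3)]
```

Moving the origin from (0, 0) to (3, 0) changes the answer set even though the points are unchanged, which is why each shift would otherwise require a full recomputation.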
Citations: 0
Untangling Energy Consumption Dynamics with Renewable Energy Using Recurrent Neural Network
Pub Date : 2023-02-01 DOI: 10.3390/analytics2010008
Munshi Md. Shafwat Yazdan, Shah Saki, Raaghul Kumar
The environmental issues we are currently facing require long-term prospective efforts for sustainable growth. Renewable energy sources seem to be one of the most practical and efficient alternatives in this regard. Understanding a nation’s pattern of energy use and renewable energy production is crucial for developing strategic plans. No previous study has been performed to explore the dynamics of power consumption with the change in renewable energy production on a country-wide scale. In contrast, a number of deep learning algorithms have demonstrated acceptable performance while handling sequential data in the era of data-driven predictions. In this study, we developed a scheme to investigate and predict total power consumption and renewable energy production time series for eleven years of data using a recurrent neural network (RNN). The dynamics of the interaction between the total annual power consumption and renewable energy production were investigated through extensive exploratory data analysis (EDA) and a feature engineering framework. The performance of the model was found to be satisfactory through the comparison of the predicted data with the observed data, the visualization of the distribution of the errors and root mean squared error (RMSE), and the R² values of 0.084 and 0.82. Higher performance was achieved by increasing the number of epochs and hyperparameter tuning. The proposed framework has the potential to be used and transferred to investigate the trend of renewable energy production and power consumption and predict future scenarios for different communities. The incorporation of a cloud-based platform into the proposed pipeline to perform predictive studies from data acquisition to outcome generation may lead to real-time forecasting.
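
The two evaluation metrics named above, RMSE and the coefficient of determination, can be computed from paired observed and predicted values as sketched below. The sample numbers are made up purely for illustration and are not data from the study; the function names are likewise assumptions.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between paired observations and predictions."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical values, for illustration only (not taken from the paper).
observed = [10.0, 12.0, 15.0, 19.0]
predicted = [11.0, 12.0, 14.0, 18.0]
error = rmse(observed, predicted)
fit = r_squared(observed, predicted)
```

RMSE is expressed in the units of the series being predicted, while the coefficient of determination is unitless, so the two are usually reported together.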
Citations: 0
Theory-Guided Analytics Process: Using Theories to Underpin an Analytics Process for New Banking Product Development Using Segmentation-Based Marketing Analytics Leveraging on Marketing Intelligence
Pub Date : 2023-02-01 DOI: 10.3390/analytics2010007
Tristan Lim, Tao Pan, C. Ong, Shuaiwei Chen, Jie Jun Jeremy Chia
Retail banking is undergoing considerable product competition and disruption. New product development is necessary to tackle such challenges and reinvigorate product lines. This study presents an instrumental real-life banking case study, where marketing analytics was utilized to drive a product differentiation strategy. In particular, the study applied the unsupervised machine learning techniques of link analysis, latent class analysis, and association analysis to undertake behavioral-based market segmentation, with a view to attaining a profitable competitive advantage. To underpin the product development process with well-grounded theoretical framing, this study asked the research question: “How may we establish a theory-driven approach for an analytics-driven process?” Findings of this study include a theoretical conceptual framework that underpinned the end-to-end segmentation-driven new product development (NPD) process, backed by the empirical literature. The study hopes to provide: (i) for managerial practitioners, the use of case-based reasoning for practice-oriented new product development design, planning, and diagnosis efforts, and (ii) for researchers, the potential to test the validity and robustness of an analytics-driven NPD process. The study also hopes to drive a wider research interest that studies the theory-driven approach for analytics-driven processes.
Citations: 0
MAFFN_YOLOv5: Multi-Scale Attention Feature Fusion Network on the YOLOv5 Model for the Health Detection of Coral-Reefs Using a Built-In Benchmark Dataset
Pub Date : 2023-01-19 DOI: 10.3390/analytics2010006
Sivamani Kalyana Sundara Rajan, Nedumaran Damodaran
Coral reefs are a significant part of marine life and are affected by multiple diseases caused by stress and temperature variation in the ocean. Autonomous monitoring and detection of coral health are crucial if researchers are to protect reefs at an early stage. Detecting coral diseases is difficult because existing coral-reef datasets are inadequate. Therefore, we have developed a coral-reef benchmark dataset and proposed a Multi-scale Attention Feature Fusion Network (MAFFN) as the neck of the YOLOv5 network, called “MAFFN_YOLOv5”. The MAFFN_YOLOv5 model outperforms state-of-the-art object detectors such as YOLOv5, YOLOX, and YOLOR, improving detection accuracy by 8.64%, 3.78%, and 18.05%, respectively, in terms of mean average precision (mAP@.5), and by 7.8%, 3.72%, and 17.87%, respectively, in terms of mAP@.5:.95. Consequently, we have tested a hardware-based deep neural network for the detection of coral-reef health.
Citations: 2