Data Technologies and Applications: latest articles, page 8

Research on the generalization of social bot detection from two dimensions: feature extraction and detection approaches

IF 1.6 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

Data Technologies and Applications

Pub Date : 2022-09-02 DOI: 10.1108/dta-02-2022-0084

Ziming Zeng, Tingting Li, Jingjing Sun, Shouqiang Sun, Yu Zhang

Purpose: The proliferation of bots in social networks has profoundly affected the interactions of legitimate users. Detecting and rejecting these unwelcome bots has become part of the collective Internet agenda. Unfortunately, as bot creators use more sophisticated approaches to avoid being discovered, it has become increasingly difficult to distinguish social bots from legitimate users. Therefore, this paper proposes a novel social bot detection mechanism that adapts to new and different kinds of bots.
Design/methodology/approach: This paper proposes a research framework to enhance the generalization of social bot detection from two dimensions: feature extraction and detection approaches. First, 36 features are extracted from four views for social bot detection. Then, the paper analyzes the feature contribution for different kinds of social bots and proposes the features with stronger generalization. Finally, it introduces outlier detection approaches to improve the detection of ever-changing social bots.
Findings: The experimental results show that the more important features generalize more effectively to different social bot detection tasks. Compared with the traditional binary-class classifier, the proposed outlier detection approaches adapt better to ever-changing social bots, reaching an F1 score of 89.23 per cent.
Originality/value: Based on a visual interpretation of the feature contribution, the features with stronger generalization across different detection tasks are identified. The outlier detection approaches are first introduced to enhance the detection of ever-changing social bots.
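
The contrast between a binary classifier and an outlier detector can be illustrated with a minimal Python sketch. It is not the authors' method: the two features (posts per day, follower/followee ratio), the toy values, the z-score distance and the threshold are all assumptions chosen for illustration. The profile is fitted on legitimate accounts only, so an unseen kind of bot can still be flagged as long as it deviates from that profile.

```python
import math
import statistics

def fit_profile(legit_rows):
    """Per-feature mean and standard deviation of legitimate accounts only."""
    columns = list(zip(*legit_rows))
    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]  # avoid division by zero
    return means, stdevs

def outlier_score(row, profile):
    """Euclidean norm of the per-feature z-scores (larger means less typical)."""
    means, stdevs = profile
    return math.sqrt(sum(((x - m) / s) ** 2 for x, m, s in zip(row, means, stdevs)))

def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive (bot) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical toy data: (posts per day, follower/followee ratio).
LEGIT = [(3, 1.0), (5, 1.2), (4, 0.8), (6, 1.1), (2, 0.9)]
ACCOUNTS = [(4, 1.0), (120, 0.01), (5, 0.9), (90, 15.0)]
IS_BOT = [False, True, False, True]
THRESHOLD = 3.0  # assumed cut-off on the outlier score

profile = fit_profile(LEGIT)
flagged = [outlier_score(row, profile) > THRESHOLD for row in ACCOUNTS]
print(flagged, f1_score(IS_BOT, flagged))
```

Because no bot examples are needed for fitting, the two bot rows could follow entirely different strategies and still be detected, which is the generalization property the abstract attributes to outlier detection.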

Citations: 0

Impact on recommendation performance of online review helpfulness and consistency

IF 1.6 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

Data Technologies and Applications

Pub Date : 2022-09-02 DOI: 10.1108/dta-04-2022-0172

Jaeseung Park, Xinzhe Li, Qing Li, Jaekyeong Kim

Purpose: Existing collaborative filtering algorithms may select an insufficiently representative customer as the neighbor of a target customer, which makes the resulting recommendations insufficiently accurate. This study investigates the impact on recommendation performance of selecting influential and representative customers.
Design/methodology/approach: Some studies have shown that review helpfulness and consistency significantly affect purchase decision-making. This study therefore focuses on customers who have written helpful and consistent reviews in order to select influential and representative neighbors. The authors apply a text-mining approach to analyze review helpfulness and consistency, and they evaluate the proposed methodology on several real-world Amazon review data sets for experimental utility and reliability.
Findings: This study is the first to propose a methodology for investigating the effect of review consistency and helpfulness on recommendation performance. The experimental results confirmed that recommendation performance was better when neighbors were selected from customers who wrote consistent or helpful reviews than when neighbors were selected from all customers.
Originality/value: This study investigates the effect of review consistency and helpfulness on recommendation performance. Online reviews can enhance recommendation performance because they reflect the purchasing behavior of customers who consider reviews when purchasing items. The experimental results indicate that review helpfulness and consistency can enhance the performance of personalized recommendation services, increase customer satisfaction and increase confidence in a company.
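
The idea of restricting the neighbor pool can be sketched as user-based collaborative filtering with a helpfulness filter. This is an illustrative sketch, not the authors' pipeline: the `helpfulness` scores stand in for whatever the text-mining step produces, and the customers, items, threshold and cosine similarity are assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse rating dictionaries."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def predict(target, item, ratings, helpfulness, min_helpfulness=0.0, k=2):
    """Similarity-weighted rating from the k nearest eligible neighbors."""
    candidates = [
        (cosine(ratings[target], other), user)
        for user, other in ratings.items()
        if user != target
        and item in other
        and helpfulness.get(user, 0.0) >= min_helpfulness  # neighbor filter
    ]
    neighbors = sorted(candidates, reverse=True)[:k]
    total = sum(sim for sim, _ in neighbors)
    if total == 0:
        return None
    return sum(sim * ratings[user][item] for sim, user in neighbors) / total

# Hypothetical ratings (1-5) and review-helpfulness scores in [0, 1].
RATINGS = {
    "alice": {"a": 5, "b": 4},
    "bob": {"a": 5, "b": 4, "c": 2},
    "carol": {"a": 5, "b": 4, "c": 5},
    "dave": {"a": 1, "b": 2, "c": 4},
}
HELPFULNESS = {"bob": 0.9, "carol": 0.1, "dave": 0.8}

all_neighbors = predict("alice", "c", RATINGS, HELPFULNESS)
helpful_only = predict("alice", "c", RATINGS, HELPFULNESS, min_helpfulness=0.5)
print(all_neighbors, helpful_only)
```

With the filter switched on, carol is excluded despite her high similarity, so the prediction shifts toward the ratings of customers whose reviews were judged helpful. Whether that shift improves accuracy is the empirical question the study addresses.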

Citations: 7

SeaRank: relevance prediction based on click models in a reinforcement learning framework

IF 1.6 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

Data Technologies and Applications

Pub Date : 2022-09-01 DOI: 10.1108/dta-01-2022-0001

A. Keyhanipour, F. Oroumchian

Purpose: User feedback inferred from the user's search-time behavior could improve learning to rank (L2R) algorithms. Click models (CMs) present probabilistic frameworks for describing and predicting the user's clicks during search sessions. Most CMs are based on common assumptions such as Attractiveness, Examination and User Satisfaction. CMs usually consider Attractiveness and Examination as pre- and post-estimators of the actual relevance. They also assume that User Satisfaction is a function of the actual relevance. This paper extends the authors' previous work by building a reinforcement learning (RL) model to predict the relevance. Attractiveness, Examination and User Satisfaction are estimated using a limited number of the features of the utilized benchmark data set and are then incorporated in the construction of an RL agent. The proposed RL model learns to predict the relevance label of documents with respect to a given query more effectively than the baseline RL models for those data sets.
Design/methodology/approach: In this paper, User Satisfaction is used as an indication of the relevance level of a query to a document. User Satisfaction itself is estimated through Attractiveness and Examination, and in turn, Attractiveness and Examination are calculated by the random forest algorithm. In this process, only a small subset of top information retrieval (IR) features is used, selected based on their mean average precision and normalized discounted cumulative gain values. Based on the authors' observations, the product of the Attractiveness and Examination values of a given query–document pair closely approximates the User Satisfaction and hence the relevance level. The RL model is designed so that the current state of the RL agent is determined by discretizing the estimated Attractiveness and Examination values. In this way, each query–document pair is mapped into a specific state based on its Attractiveness and Examination values. Then, based on the reward function, the RL agent tries to choose an action (relevance label) that maximizes the received reward in its current state. Using temporal difference (TD) learning algorithms, such as Q-learning and SARSA, the learning agent gradually learns to identify an appropriate relevance label in each state. The reward used in the RL agent is proportional to the difference between the User Satisfaction and the selected action.
Findings: Experimental results on the MSLR-WEB10K and WCL2R benchmark data sets demonstrate that the proposed algorithm, named SeaRank, outperforms baseline algorithms. The improvement is more noticeable in top-ranked results, which usually receive more attention from users.
Originality/value: This research provides a mapping from IR features to the CM features and thereafter utilizes these newly generated features to build an RL model. This RL model is proposed with the definition of the states, actions and reward function. By applying TD learning algorithms, such as Q-learning and SARSA, within several learning episodes, the RL agent would be able to learn how to choose the most appropriate relevance label for a given query–document pair.
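
The state, action and reward design described above can be sketched with tabular Q-learning. This is a minimal illustration under several assumptions that are not stated in the abstract: relevance labels run from 0 to 4, User Satisfaction (Attractiveness times Examination) is rescaled to that range, the reward is the negative absolute difference between the rescaled satisfaction and the chosen label, and the next state is simply the next pair in the list. For determinism the sketch tries every label in every state instead of exploring ε-greedily.

```python
from collections import defaultdict

LABELS = range(5)  # assumed relevance grades 0..4
BINS = 5           # assumed number of discretization bins per dimension

def to_state(attractiveness, examination, bins=BINS):
    """Discretize the two click-model estimates into a grid cell."""
    def clip(value):
        return min(int(value * bins), bins - 1)
    return clip(attractiveness), clip(examination)

def reward(attractiveness, examination, label):
    """Negative gap between rescaled User Satisfaction and the chosen label."""
    satisfaction = attractiveness * examination
    return -abs(satisfaction * max(LABELS) - label)

def train(pairs, episodes=50, alpha=0.5, gamma=0.9):
    """Tabular Q-learning over query-document pairs visited in a fixed order."""
    q = defaultdict(lambda: [0.0] * len(LABELS))
    for _ in range(episodes):
        for i, (a, e) in enumerate(pairs):
            state = to_state(a, e)
            next_state = to_state(*pairs[(i + 1) % len(pairs)])
            for label in LABELS:  # exhaustive exploration of actions
                target = reward(a, e, label) + gamma * max(q[next_state])
                q[state][label] += alpha * (target - q[state][label])
    return q

def predict(q, attractiveness, examination):
    """Greedy action: the label with the highest Q-value in this state."""
    values = q[to_state(attractiveness, examination)]
    return max(LABELS, key=lambda label: values[label])

# Hypothetical (Attractiveness, Examination) estimates for four pairs.
PAIRS = [(0.9, 0.9), (0.15, 0.25), (0.65, 0.85), (0.5, 0.5)]
q_table = train(PAIRS)
predicted = [predict(q_table, a, e) for a, e in PAIRS]
print(predicted)
```

Because the transition in this toy setup does not depend on the chosen label, the greedy label in each state is the one closest to the rescaled satisfaction. A SARSA variant would replace `max(q[next_state])` with the value of the action actually taken in the next state.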

Citations: 0

Property Assertion Constraints for ontologies and knowledge graphs

IF 1.6 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

Data Technologies and Applications

Pub Date : 2022-08-13 DOI: 10.1108/dta-05-2022-0209

H. Dibowski

Purpose: The curation of ontologies and knowledge graphs (KGs) is an essential task for industrial knowledge-based applications, as they rely on the contained knowledge being correct and error-free. Often, a significant portion of a KG is curated by humans. Established validation methods, such as Shapes Constraint Language, Shape Expressions or Web Ontology Language, can detect wrong statements only after their materialization, which can be too late. Instead, an approach that avoids errors and adequately supports users is required.
Design/methodology/approach: To solve that problem, Property Assertion Constraints (PACs) have been developed. PACs extend the range definition of a property with additional logic expressed in SPARQL. For the context of a given instance and property, a tailored PAC query is dynamically built and triggered on the KG. It can determine all values that will result in valid property value assertions.
Findings: PACs can effectively prevent KGs from being expanded with invalid property value assertions, as the expertise they contain narrows down the valid options a user can choose from. This simplifies knowledge curation and, most notably, relieves users or machines from knowing and applying this expertise, enabling a computer to take care of it instead.
Originality/value: PACs are fundamentally different from existing approaches. Instead of detecting erroneous materialized facts, they can determine all semantically correct assertions before materializing them. This avoids invalid property value assertions and provides users with informed, purposeful assistance. To the author's knowledge, PACs are the only such approach.
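
The principle of computing the valid values before an assertion is made can be sketched in plain Python. This is only an analogy: real PACs express the additional logic in SPARQL and run it against the KG, whereas here a Python predicate over an in-memory triple set is swapped in. The vocabulary (`reads`, `locatedIn`, `Sensor`) and the same-room rule are hypothetical.

```python
# Hypothetical knowledge graph as (subject, predicate, object) triples.
TRIPLES = {
    ("ctrl1", "type", "Controller"), ("ctrl1", "locatedIn", "room1"),
    ("s1", "type", "Sensor"), ("s1", "locatedIn", "room1"),
    ("s2", "type", "Sensor"), ("s2", "locatedIn", "room2"),
    ("v1", "type", "Valve"), ("v1", "locatedIn", "room1"),
}

def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}

def instances_of(cls):
    """All subjects typed with the given class."""
    return {s for s, p, o in TRIPLES if p == "type" and o == cls}

def same_room(instance, candidate):
    """Extra constraint: both resources share at least one location."""
    return bool(objects(instance, "locatedIn") & objects(candidate, "locatedIn"))

# Plain range definition plus the additional constraint (stand-in for SPARQL logic).
PROPERTY_RANGE = {"reads": "Sensor"}
PROPERTY_CONSTRAINT = {"reads": same_room}

def valid_values(instance, prop):
    """Values that would yield a valid (instance, prop, value) assertion."""
    in_range = instances_of(PROPERTY_RANGE[prop])
    check = PROPERTY_CONSTRAINT.get(prop, lambda _i, _c: True)
    return sorted(c for c in in_range if check(instance, c))

print(valid_values("ctrl1", "reads"))
```

A range check alone would offer both sensors. The extra constraint narrows the choice to the sensor in the controller's room, so an editor would never be shown the invalid option in the first place.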

Citations: 1

TMsDP: two-stage density peak clustering based on multi-strategy optimization

IF 1.6 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

Data Technologies and Applications

Pub Date : 2022-08-10 DOI: 10.1108/dta-08-2021-0222

Jie Ma, Zhiyuan Hao, Mo Hu

Purpose: The density peak clustering algorithm (DP) identifies cluster centers by two parameters, i.e. the ρ value (local density) and the δ value (the distance between a point and another point with a higher ρ value). According to the center-identifying principle of the DP, the potential cluster centers should have a higher ρ value and a higher δ value than other points. However, this principle may prevent the DP from identifying some categories with multiple centers or with centers in lower-density regions. In addition, the improper assignment strategy of the DP can assign non-center points to the wrong cluster. This paper aims to address these issues and improve the clustering performance of the DP.
Design/methodology/approach: First, to identify as many potential cluster centers as possible, the authors construct a point-domain by introducing the pinhole imaging strategy to extend the search range for potential cluster centers. Second, they design novel methods for calculating the domain distance, point-domain density and domain similarity. Third, they use domain similarity to merge domains and optimize the final clustering results.
Findings: Experiments on 12 synthetic data sets and 12 real-world data sets show that two-stage density peak clustering based on multi-strategy optimization (TMsDP) outperforms the DP and other state-of-the-art algorithms.
Originality/value: The authors propose a novel DP-based clustering method, i.e. TMsDP, and transform the relationship between points into that between domains to further optimize the clustering performance of the DP.
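
The two quantities used by the baseline DP can be computed in a few lines. The sketch below covers only the original DP center-selection step, not the TMsDP extensions (point-domains, pinhole imaging, domain merging). It assumes a cutoff-kernel density, takes δ as the distance to the nearest point of higher density, and ranks candidates by the product ρ·δ; the toy points and the cutoff are hypothetical.

```python
import math

def density_peaks(points, cutoff):
    """Return (rho, delta) for each point under a cutoff-kernel density."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # rho: number of other points closer than the cutoff distance.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff) for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # Points with maximal density get the largest distance to any point.
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta

def pick_centers(rho, delta, k):
    """Indices of the k points with the largest rho * delta."""
    ranked = sorted(range(len(rho)), key=lambda i: rho[i] * delta[i], reverse=True)
    return sorted(ranked[:k])

# Two hypothetical square-shaped clusters, each with one central point.
POINTS = [
    (0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
    (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5),
]
rho, delta = density_peaks(POINTS, cutoff=1.1)
centers = pick_centers(rho, delta, k=2)
print(rho, centers)
```

Here each central point has both the highest density and a large δ, so it is selected. A cluster whose densest point is still sparser than a neighboring cluster would receive a small δ and could be missed, which is the limitation the abstract describes.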

Citations: 1

A hybrid approach for predicting missing follower-followee links in social networks using topological features with ensemble learning

IF 1.6 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

Data Technologies and Applications

Pub Date : 2022-07-09 DOI: 10.1108/dta-02-2022-0072

Riju Bhattacharya, N. K. Nagwani, Sarsij Tripathi

Purpose: Social networking platforms increasingly use follower–followee link prediction (FFLP) in an effort to expand their user base. FFLP facilitates the discovery of previously unidentified individuals and can be employed to determine the relationships among the nodes in a social network. Choosing the appropriate person to follow becomes crucial as the number of users increases. A hybrid model employing an ensemble learning algorithm for FFLP (HMELA) is proposed to advise the formation of new follower links in large networks.
Design/methodology/approach: HMELA includes fundamental classification techniques for treating link prediction as a binary classification problem. The data sets are represented using a variety of machine-learning-friendly hybrid graph features. HMELA is evaluated using six real-world social network data sets.
Findings: The first set of experiments used exploratory data analysis on a directed graph (di-graph) to produce a balanced matrix. The second set of experiments compared the benchmark and hybrid features on the data sets. This was followed by experiments with benchmark classifiers and ensemble learning methods. The experiments show that the proposed HMELA method predicts missing links better than other methods.
Practical implications: A hybrid model for link prediction is proposed in this paper. The HMELA model makes use of AUC scores to predict new future links. The proposed approach facilitates comprehension of and insight into the domain of link prediction. This work is aimed primarily at academics, practitioners and others involved in the field of social networks. The model is also effective for product recommendation and for recommending new friends and users on social networks.
Originality/value: Results on six benchmark data sets show that applying the HMELA strategy to all of the selected data sets yielded higher area under the curve (AUC) scores than applying individual techniques to the same data sets. Using HMELA, the maximum AUC score on the Facebook data set increased by 10.3 percentage points, from 0.8449 to 0.9479. There was also an 8.53 per cent increase in accuracy on the Net Science, Karate Club and USAir data sets. As a result, the HMELA strategy outperforms every other strategy tested in the study.
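
Treating link prediction as binary classification requires turning each candidate (source, target) pair into a feature vector. The sketch below shows a few well-known topological features on a directed follower graph. The abstract does not list the paper's feature set, so the choice of features, their names and the toy graph are assumptions made for illustration.

```python
def followees(edges, node):
    """Accounts that `node` follows (out-neighbors)."""
    return {v for u, v in edges if u == node}

def followers(edges, node):
    """Accounts that follow `node` (in-neighbors)."""
    return {u for u, v in edges if v == node}

def link_features(edges, source, target):
    """Topological features for the candidate link source -> target."""
    out_s, out_t = followees(edges, source), followees(edges, target)
    in_t = followers(edges, target)
    union = out_s | out_t
    return {
        "common_followees": len(out_s & out_t),
        "jaccard_followees": len(out_s & out_t) / len(union) if union else 0.0,
        "preferential_attachment": len(out_s) * len(in_t),
        "follows_back": int((target, source) in edges),
    }

# Hypothetical directed graph: (u, v) means u follows v.
EDGES = {("a", "b"), ("a", "c"), ("d", "b"), ("d", "c"), ("c", "a"), ("b", "d")}

features = link_features(EDGES, "a", "d")
print(features)
```

Feature dictionaries like this one, computed for existing links (positive class) and sampled non-links (negative class), would form the training table that base classifiers and an ensemble are fitted on.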

Citations: 0

Mining the determinants of review helpfulness: a novel approach using intelligent feature engineering and explainable AI

IF 1.6 · Zone 4, Computer Science · Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

Data Technologies and Applications

Pub Date : 2022-07-05 DOI: 10.1108/dta-12-2021-0359

Jiho Kim, Hanjun Lee, Hongchul Lee

Purpose: This paper aims to find determinants that can predict the helpfulness of online customer reviews (OCRs) with a novel approach. Design/methodology/approach: The approach consists of feature engineering using various text mining techniques, including BERT, and machine learning models that classify OCRs according to their potential helpfulness. Explainable artificial intelligence methodologies are then used to identify the determinants of helpfulness. Findings: The boosting-based ensemble model showed the highest prediction performance. In addition, the sentiment features of OCRs and the reputation of reviewers were confirmed to be important determinants that increase review helpfulness. Research limitations/implications: Each online community has different purposes, fields and characteristics, so the results of this study cannot be generalized. However, the approach is expected to be integrable with any platform where online reviews are used. Originality/value: This paper incorporates feature engineering methodologies for online reviews, including the latest methodology. It also includes novel techniques that contribute to ongoing research on mining the determinants of review helpfulness.

Citations: 1

A cascaded deep-learning-based model for face mask detection

IF 1.6 · Zone 4, Computer Science · Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

Data Technologies and Applications

Pub Date : 2022-06-28 DOI: 10.1108/dta-02-2022-0076

Akhil Kumar

Purpose: This work presents a deep learning model for face mask detection in surveillance environments such as automatic teller machines (ATMs) and banks, to identify persons wearing face masks. In such environments, complete visibility of the face area is a guideline, and criminals and law offenders commit crimes while hiding their faces behind a face mask. The proposed face mask detector can be integrated with surveillance cameras in autonomous surveillance environments to identify and catch law offenders and criminals. Design/methodology/approach: The proposed face mask detector is developed by integrating the residual network (ResNet)34 feature extractor on top of three You Only Look Once (YOLO) detection layers, along with a spatial pyramid pooling (SPP) layer to extract a rich and dense feature map. Furthermore, at training time, data augmentation operations such as Mosaic and MixUp are applied to the feature extraction network so that it is trained with images of varying complexities. The detector is trained and tested on a custom face mask detection dataset consisting of 52,635 images. For validation, its performance is compared with YOLO v1, v2, tiny YOLO v1, v2, v3 and v4 and other benchmark work in the literature, using precision, recall, F1 score, mean average precision (mAP) for the overall dataset and average precision (AP) for each class of the dataset. Findings: The proposed face mask detector achieved 4.75–9.75 per cent higher detection accuracy in terms of mAP, 5–31 per cent higher AP for detection of faces with masks and, specifically, 2–30 per cent higher AP for detection of face masks on the face region, compared with the tested baseline variants of YOLO. Furthermore, the ResNet34 feature extractor and SPP layer reduced both the training time and the detection time. The proposed model performs detection on an image in 0.45 s, which is 0.15–0.2 s less than the other tested YOLO variants. Research limitations/implications: The proposed face mask detector can be used as a tool to detect persons with face masks who are a potential threat in automatic surveillance environments such as ATMs, banks and airport security checks. It can also be trained and tested for other object detection problems such as cancer detection in images, fish species detection and vehicle detection. Practical implications: The proposed face mask detector can be integrated with automatic surveillance systems and used as a tool to detect persons with face masks who are potential threats to ATMs, banks, etc. and in the present times of COVID-19 to detect if the people are following a COVID-appropria

Practical implications: The proposed face mask detector can be integrated with automatic surveillance systems and used as a tool to detect persons with face masks who are potential threats to ATMs, banks, etc., and, in the present times of COVID-19, to detect whether people are following the COVID-appropriate behavior of wearing face masks in public places. Originality/value: The novelty of this work lies in the use of the ResNet34 feature extractor with YOLO detection layers, which makes the proposed model a compact and powerful convolutional neural network-based face mask detector. Furthermore, the SPP layer is applied to the ResNet34 feature extractor so that it can extract a rich and dense feature map. Another novelty is the use of Mosaic and MixUp data augmentation in the training network, which provides the feature extractor with three times as many images of varying complexities and orientations and further helps to achieve higher detection accuracy. The proposed model extracts rich features, is augmented at training time and achieves high detection accuracy while maintaining detection speed.
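The abstract names MixUp as one of the training-time augmentations. As a rough illustration of the general technique only, the sketch below blends two images and their label vectors with a coefficient drawn from a Beta distribution. The function name `mixup`, the `alpha` default and the list-of-lists image format are assumptions made for this example; the paper's exact variant (for instance, how bounding boxes are combined for detection) is not described in the abstract and is not reproduced here.

```python
import random


def mixup(image_a, label_a, image_b, label_b, alpha=0.2, rng=random):
    """Blend two samples with a Beta(alpha, alpha) mixing coefficient.

    Images are 2-D lists of pixel intensities with identical shapes;
    labels are equal-length lists (e.g. one-hot class vectors).
    """
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    mixed_image = [
        [lam * pa + (1.0 - lam) * pb for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(image_a, image_b)
    ]
    mixed_label = [lam * ya + (1.0 - lam) * yb for ya, yb in zip(label_a, label_b)]
    return mixed_image, mixed_label, lam


# Hypothetical 2x2 "images": an all-black sample of class 0 and an all-white sample of class 1.
rng = random.Random(0)
image, label, lam = mixup(
    [[0.0, 0.0], [0.0, 0.0]], [1.0, 0.0],
    [[1.0, 1.0], [1.0, 1.0]], [0.0, 1.0],
    rng=rng,
)
```

Because the same coefficient weights both pixels and labels, the mixed label stays a valid distribution: its entries sum to 1 for any draw of `lam`.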


Citations: 1

A boosting-based transfer learning method to address absolute-rarity in skin lesion datasets and prevent weight-drift for melanoma detection

IF 1.6 · Zone 4, Computer Science · Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

Data Technologies and Applications

Pub Date : 2022-06-20 DOI: 10.1108/dta-10-2021-0296

L. Singh, R. Janghel, S. Sahu

Purpose: Automated skin lesion analysis plays a vital role in early detection. Relatively small, imbalanced skin lesion datasets impede learning and dominate research in automated skin lesion analysis. The unavailability of adequate data makes it difficult to develop classification methods because of the skewed class distribution. Design/methodology/approach: Boosting-based transfer learning (TL) paradigms such as the Transfer AdaBoost algorithm can compensate for such a lack of samples by taking advantage of auxiliary data. However, in such methods, beneficial source instances representing the target have a fast and stochastic weight convergence, which results in “weight-drift” that negates transfer. In this paper, a framework is designed using “Rare-Transfer” (RT), a boosting-based TL algorithm that prevents “weight-drift” and simultaneously addresses absolute-rarity in skin lesion datasets. RT prevents the weights of source samples from converging quickly. It addresses absolute-rarity using an instance transfer approach that incorporates the best-fit set of auxiliary examples, which improves balanced error minimization. It compensates simultaneously for class imbalance and the scarcity of training samples under absolute-rarity to induce balanced error optimization. Findings: RT obtains promising results compared with state-of-the-art techniques on absolute-rare skin lesion datasets, with an accuracy of 92.5%. A Wilcoxon signed-rank test examines the significance of differences between the proposed RT algorithm and the conventional algorithms used in the experiment. Originality/value: Experiments are performed on four absolute-rare skin lesion datasets, and the effectiveness of RT is assessed based on accuracy, sensitivity, specificity and area under the curve. The performance is compared with existing ensemble and boosting-based TL methods.
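To make the “weight-drift” problem concrete, the sketch below applies one reweighting step of the standard Transfer AdaBoost (TrAdaBoost) update as it is commonly stated: misclassified source instances are down-weighted by a fixed factor, misclassified target instances are up-weighted, and all weights are renormalized. This is an illustration of the baseline behavior that the abstract says RT corrects; it is not the RT algorithm, whose correction is not given in the abstract. The function name, argument layout and toy numbers are assumptions for this example.

```python
from math import log, sqrt


def tradaboost_step(src_w, tgt_w, src_err, tgt_err, n_rounds):
    """One TrAdaBoost reweighting step followed by normalization.

    src_err / tgt_err hold |h(x) - y| in {0, 1} for each instance,
    i.e. 1 where the current weak learner misclassifies it.
    """
    # Weighted error of the weak learner on the target set only.
    eps = sum(w * e for w, e in zip(tgt_w, tgt_err)) / sum(tgt_w)
    if not 0.0 < eps < 0.5:
        raise ValueError("target error must lie in (0, 0.5)")
    beta_src = 1.0 / (1.0 + sqrt(2.0 * log(len(src_w)) / n_rounds))
    beta_tgt = eps / (1.0 - eps)
    # Source: shrink misclassified instances. Target: grow misclassified instances.
    new_src = [w * beta_src ** e for w, e in zip(src_w, src_err)]
    new_tgt = [w * beta_tgt ** (-e) for w, e in zip(tgt_w, tgt_err)]
    total = sum(new_src) + sum(new_tgt)
    return [w / total for w in new_src], [w / total for w in new_tgt]


# Toy case: 4 source and 4 target instances with uniform weights.
# Every source instance is classified correctly; one target instance is not.
src, tgt = tradaboost_step(
    [0.125] * 4, [0.125] * 4, [0, 0, 0, 0], [1, 0, 0, 0], n_rounds=10
)
source_share = sum(src)  # share of total weight held by the source set
```

In this toy case the source set starts with half of the total weight and ends with 0.4 after a single step, even though every source instance was classified correctly. Repeated over many rounds, this shrinkage is the drift that the abstract describes.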

Citations: 0

Construction of public security indicators based on characteristics of shared group behavior patterns

IF 1.6 · Zone 4, Computer Science · Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

Data Technologies and Applications

Pub Date : 2022-06-03 DOI: 10.1108/dta-12-2021-0389

Xiyue Deng, Xiaoming Li, Zhenzhen Chen, Meng Zhu, N. Xiong, Li Shen

Purpose: Human group behavior is the driving force behind many complex social and economic phenomena. Few studies have integrated multi-dimensional travel patterns and city points of interest to construct urban security risk indicators. This paper combines traffic data and urban alarm data to analyze the safe travel characteristics of the urban population. The results help to explore the diversity of human group behavior, grasp its temporal and spatial laws and reveal regional security risks, and they provide a reference for optimizing resource deployment and group intelligence analysis in emergency management. Design/methodology/approach: Based on the dynamics index of group behavior, this paper mines large-scale shared-bike and ride-hailing data from a big city in China. We integrate urban points of interest and dynamic travel characteristics, construct an urban traffic safety index based on alarm behavior and further calculate the urban safety index. Findings: This study found significant differences in the travel power index among ride-sharing users. There is a positive correlation between users' shared-bike trips and the power-law bimodal phenomenon in the logarithmic coordinate system, which is closely related to the urban public security index. Originality/value: Based on the group-shared dynamic index integrated with alarm data, we constructed an urban public safety index and analyzed the correlation of travel alarm behavior. The results reveal the internal mechanism of the group behavior safety index and provide a valuable supplement for police intelligence analysis.

Citations: 0