ACM Transactions on Internet Technology: Latest Articles, Page 7

A Multi-type Classifier Ensemble for Detecting Fake Reviews Through Textual-based Feature Extraction

IF 5.3 · Zone 3, Computer Science · Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Internet Technology

Pub Date : 2023-04-05 DOI: https://dl.acm.org/doi/10.1145/3568676

Gregorius Satia Budhi, Raymond Chiong

The financial impact of online reviews has prompted some fraudulent sellers to generate fake consumer reviews, either to promote their own products or to discredit competing products. In this study, we propose a novel ensemble model, the Multi-type Classifier Ensemble (MtCE), combined with a text-based feature extraction method that is relatively independent of the system, to detect fake online consumer reviews. Unlike other ensemble models, which use a single type of base classifier, our proposed ensemble uses several customised machine learning classifiers (including deep learning models) as its base classifiers. The results of our experiments show that the MtCE can adequately detect fake reviews, and that it outperforms other single and ensemble methods in terms of accuracy and other measures on all the relevant public datasets used in this study. Moreover, if set correctly, the parameters of MtCE, such as the base-classifier types, the total number of base classifiers, bootstrapping, and the method used to vote on the output (e.g., majority or priority), can further improve the performance of the proposed ensemble.
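
The abstract names two ways of combining the base classifiers' outputs, majority and priority voting, without defining them. The following is a minimal sketch under stated assumptions, not the authors' implementation: `majority_vote` and `priority_vote` are hypothetical names, and "priority" is assumed here to mean that ties are broken in favour of the most trusted classifier.

```python
from collections import Counter
from typing import Sequence


def majority_vote(labels: Sequence[str]) -> str:
    """Return the most frequent label; ties go to the label seen first."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    best = max(counts.values())
    return next(label for label in labels if counts[label] == best)


def priority_vote(labels: Sequence[str], priority: Sequence[int]) -> str:
    """Majority vote with ties broken by classifier rank (assumed semantics).

    priority[i] is the rank of classifier i; 0 is the most trusted.
    """
    if not labels or len(labels) != len(priority):
        raise ValueError("labels and priority must be non-empty and aligned")
    counts = Counter(labels)
    best = max(counts.values())
    tied = {label for label, n in counts.items() if n == best}
    # Walk the classifiers from most to least trusted and take the first
    # one whose prediction is among the tied winners.
    order = sorted(range(len(labels)), key=lambda i: priority[i])
    return next(labels[i] for i in order if labels[i] in tied)


if __name__ == "__main__":
    votes = ["fake", "genuine", "genuine", "fake"]  # one label per base classifier
    print(majority_vote(votes))                 # fake (first label among the tie)
    print(priority_vote(votes, [2, 0, 1, 3]))   # genuine (classifier 1 is most trusted)
```

With an odd number of base classifiers and two classes, the two rules agree; they differ only when the vote is tied.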

Citations: 0

Real-time Pricing-based Resource Allocation in Open Market Environments

IF 5.3 · Zone 3, Computer Science · Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Internet Technology

Pub Date : 2023-04-05 DOI: https://dl.acm.org/doi/10.1145/3465237

Pankaj Mishra, Ahmed Moustafa, Takayuki Ito

Open market environments consist of a set of participants (vendors and consumers) that dynamically leave or join the market. The resulting dynamism leads to uncertainty in the supply of and demand for resources in these open markets. Specifically, in such uncertain markets, vendors attempt to maximise their revenue by dynamically changing their selling prices according to market demand. An optimal resource allocation approach is therefore needed to optimise selling prices based on the supply of and demand for resources in the open market. Optimal selling prices should maximise the revenue of vendors while protecting the utility of buyers. In this context, we propose a real-time pricing approach for resource allocation in open market environments. The proposed approach introduces a priority-based fairness mechanism to allocate the available resources in a reverse-auction paradigm. Finally, we compare the proposed approach with two state-of-the-art resource allocation approaches. The experimental results show that the proposed approach outperforms the other two in its ability to maximise the vendors’ revenue.

Citations: 0

Finding the Source in Networks: An Approach Based on Structural Entropy

IF 5.3 · Zone 3, Computer Science · Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Internet Technology

Pub Date : 2023-03-27 DOI: https://dl.acm.org/doi/10.1145/3568309

Chong Zhang, Qiang Guo, Luoyi Fu, Jiaxin Ding, Xinde Cao, Fei Long, Xinbing Wang, Chenghu Zhou

The popularity of intelligent devices provides straightforward access to the Internet and online social networks. However, the quick and easy propagation of data through networks also facilitates the spread of risks such as rumors, malware, or computer viruses. To this end, this article studies the problem of source detection, which is to infer the source node from the aftermath of a cascade, that is, from the infected graph G_N of the network observed at some time. Prior work has adopted various statistical quantities such as degree, distance, or infection size to reflect the structural centrality of the source. In this article, we propose a new metric, which we call the infected tree entropy (ITE), to exploit richer underlying structural features for source detection. Our idea of ITE is inspired by the concept of structural entropy [21], which demonstrated that minimizing the average number of bits needed to encode the network structure under different partitions is the principle for detecting the natural or true structures in real-world networks. Accordingly, our proposed ITE-based estimator for the source tries to minimize the coding of network partitions induced by the infected trees rooted at the potential sources, thus minimizing the structural deviation between the cascades from the potential sources and the actual infection process contained in G_N. On polynomially growing geometric trees, as tree heterogeneity increases, the ITE estimator yields more reliable detection even at moderate infection sizes and achieves asymptotically complete detection. In contrast, for regular expanding trees, the ITE estimator still has a guaranteed detection probability even with an infinite infection size, thanks to the degree regularity property. We also implement ITE-based detection with linear time complexity via a message-passing scheme and further extend it to general graphs. Extensive experiments on synthetic and real datasets confirm the superiority of ITE over the baselines. For example, on scale-free networks ITE ranks the source among the top 10% with an accuracy of 85%, far exceeding the 55% of the classic algorithm.

Citations: 0

Concept Drift in Software Defect Prediction: A Method for Detecting and Handling the Drift

IF 5.3 · Zone 3, Computer Science · Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Internet Technology

Pub Date : 2023-03-27 DOI: 10.1145/3589342

Arvind Kumar Gangwar, Surinder Kumar

Software Defect Prediction (SDP) is crucial to software quality assurance in software engineering. SDP analyzes software metrics data for timely prediction of defect-prone software modules. The prediction process is automated by constructing defect prediction classification models using machine learning techniques. These models are trained using metrics data from historical projects of similar types. Based on the learned experience, the models are used to predict defect-prone modules in the software currently under test. These models perform well if the concept is stationary in a dynamic software development environment, but their performance degrades unexpectedly when the concept changes (concept drift). Therefore, concept drift (CD) detection is an important activity for improving the overall accuracy of the prediction model. Previous studies on SDP have shown that CD may occur in software defect data and that the defect prediction model in use may need to be updated to deal with CD. This process of handling CD is known as CD adaptation. More work is still needed in this direction in the SDP domain. In this article, we propose a pair of paired learners (PoPL) approach for handling CD in SDP. We combine the drift detection capabilities of two independent paired learners and use the paired learner (PL) with the best recent performance for the next prediction. We experimented on various publicly available software defect datasets gathered from public data repositories. The experimental results show that our proposed approach performs better than existing similar works and the base PL model on various performance measures.
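
The abstract does not describe the internals of PoPL. To illustrate the building block it refers to, the sketch below shows one paired learner in its general form: a stable learner trained on all data, a reactive learner trained on a recent window, and a drift signal raised when the reactive learner is right where the stable one is wrong too often. The class names, the toy majority-class learner, and the `window` and `threshold` values are all assumptions made for illustration, not the authors' configuration.

```python
from collections import Counter, deque


class MajorityLearner:
    """Toy learner that predicts the most common label it has seen."""

    def __init__(self, window=None):
        # window=None keeps every label; an integer keeps only the latest ones.
        self.seen = deque(maxlen=window)

    def predict(self):
        if not self.seen:
            return 0  # arbitrary default before any training
        return Counter(self.seen).most_common(1)[0][0]

    def learn(self, label):
        self.seen.append(label)


class PairedLearner:
    """One stable/reactive pair with a simple disagreement-based drift test."""

    def __init__(self, window=10, threshold=0.5):
        self.window = window
        self.threshold = threshold
        self.stable = MajorityLearner()
        self.reactive = MajorityLearner(window)
        self.flags = deque(maxlen=window)  # True where only the reactive learner was right
        self.drifts = 0

    def step(self, label):
        """Predict, check for drift, then train both learners on the true label."""
        stable_pred = self.stable.predict()
        reactive_pred = self.reactive.predict()
        self.flags.append(stable_pred != label and reactive_pred == label)
        if sum(self.flags) / self.window > self.threshold:
            # Drift: rebuild the stable learner from the reactive learner's window.
            self.stable = MajorityLearner()
            self.stable.seen.extend(self.reactive.seen)
            self.flags.clear()
            self.drifts += 1
        self.stable.learn(label)
        self.reactive.learn(label)
        return stable_pred


if __name__ == "__main__":
    learner = PairedLearner(window=10, threshold=0.5)
    stream = [0] * 30 + [1] * 30  # the concept flips halfway through
    predictions = [learner.step(y) for y in stream]
    print(learner.drifts, predictions[-1])  # 1 1
```

A PoPL-style scheme would run two such pairs side by side and use whichever has performed better recently; that selection step is left out here because the abstract gives no detail on it.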

Citations: 1

Uncertainty-Aware Personal Assistant for Making Personalized Privacy Decisions

IF 5.3 · Zone 3, Computer Science · Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Internet Technology

Pub Date : 2023-03-23 DOI: https://dl.acm.org/doi/10.1145/3561820

Gonul Ayci, Murat Sensoy, Arzucan Özgür, Pinar Yolum

Many software systems, such as online social networks, enable users to share information about themselves. Although the action of sharing is simple, it requires an elaborate thought process on privacy: what to share, with whom to share, and for what purposes. Thinking about these for each piece of content to be shared is tedious. Recent approaches to tackling this problem build personal assistants that can help users by learning what is private over time and recommending privacy labels, such as private or public, for individual content that a user considers sharing. However, privacy is inherently ambiguous and highly personal. Existing approaches to recommending privacy decisions do not address these aspects of privacy sufficiently. Ideally, a personal assistant should be able to adjust its recommendation based on a given user, considering that user’s privacy understanding. Moreover, the personal assistant should be able to assess when its recommendation would be uncertain and let the user make the decision on her own. Accordingly, this article proposes a personal assistant that uses evidential deep learning to classify content based on its privacy label. An important characteristic of the personal assistant is that it can model the uncertainty in its decisions explicitly, determine that it does not know the answer, and refrain from making a recommendation when its uncertainty is high. By factoring in the user’s own understanding of privacy, such as risk factors or own labels, the personal assistant can personalize its recommendations per user. We evaluate our proposed personal assistant using a well-known dataset. Our results show that our personal assistant can accurately identify uncertain cases, personalize its recommendations to its user’s needs, and thus help users preserve their privacy well.
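
The abstract gives no formulas, so the sketch below is only an illustration of how an evidential classifier's output can be turned into an explicit uncertainty and an abstain decision. It uses the subjective-logic formulation commonly associated with evidential deep learning: for K classes with non-negative evidence e_k, the Dirichlet strength is S = sum(e_k + 1), the belief in class k is e_k / S, and the uncertainty is K / S. The function names and the `max_uncertainty` threshold are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def evidential_summary(evidence: Sequence[float]) -> Tuple[List[float], float]:
    """Convert per-class evidence into belief masses and an uncertainty mass."""
    k = len(evidence)
    strength = sum(evidence) + k  # Dirichlet strength S = sum(e_k + 1)
    beliefs = [e / strength for e in evidence]
    uncertainty = k / strength  # beliefs and uncertainty sum to 1
    return beliefs, uncertainty


def recommend(
    evidence: Sequence[float],
    labels: Sequence[str] = ("private", "public"),
    max_uncertainty: float = 0.5,  # hypothetical cut-off, not from the paper
) -> Optional[str]:
    """Return a privacy label, or None to leave the decision to the user."""
    beliefs, uncertainty = evidential_summary(evidence)
    if uncertainty > max_uncertainty:
        return None
    return labels[beliefs.index(max(beliefs))]


if __name__ == "__main__":
    print(recommend([8.0, 0.0]))  # private (uncertainty 2 / 10 = 0.2)
    print(recommend([0.5, 0.5]))  # None (uncertainty 2 / 3 exceeds the cut-off)
```

In a real system the evidence vector would come from the network's output layer; here it is passed in directly so the decision rule can be read in isolation.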

Citations: 0

SAM: Multi-turn Response Selection Based on Semantic Awareness Matching

IF 5.3 · Zone 3, Computer Science · Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Internet Technology

Pub Date : 2023-03-23 DOI: https://dl.acm.org/doi/10.1145/3545570

Rongjunchen Zhang, Tingmin Wu, Sheng Wen, Surya Nepal, Cecile Paris, Yang Xiang

Multi-turn response selection is a key issue in retrieval-based chatbots and has attracted considerable attention in the natural language processing (NLP) field. So far, researchers have developed many solutions that can select appropriate responses for multi-turn conversations. However, these approaches still suffer from the semantic mismatch problem when responses and context share similar words with different meanings. In this article, we propose a novel chatbot model based on Semantic Awareness Matching, called SAM. SAM can capture both similarity and semantic features in the context using a two-layer matching network. Appropriate responses are selected according to the matching probability obtained by aggregating the two feature types. In the evaluation, we pick 4 widely used datasets and compare SAM’s performance to that of 12 other models. Experiment results show that SAM achieves substantial improvements, of up to 1.5% in R₁₀@1 on Ubuntu Dialogue Corpus V2, 0.5% in R₁₀@1 on Douban Conversation Corpus, and 1.3% in R₁₀@1 on E-commerce Corpus.
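
The reported metric, R₁₀@1, is the usual recall-at-k for response selection: for each context the model scores 10 candidate responses, and the metric is the share of correct responses that land in the top 1. The sketch below shows that computation; the function names and the toy scores are illustrative only and are not taken from the paper.

```python
from typing import Sequence


def recall_at_k(scores: Sequence[float], labels: Sequence[int], k: int = 1) -> float:
    """Recall at k for one context: labels[i] is 1 if candidate i is correct."""
    positives = sum(labels)
    if positives == 0:
        return 0.0
    # Candidate indices ordered from highest to lowest score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = sum(labels[i] for i in ranked[:k])
    return hits / positives


def mean_recall_at_k(batch, k: int = 1) -> float:
    """Average recall at k over (scores, labels) pairs, one pair per context."""
    return sum(recall_at_k(s, l, k) for s, l in batch) / len(batch)


if __name__ == "__main__":
    # Two contexts with 10 candidates each; exactly one correct response per context.
    hit = ([0.1, 0.9, 0.3, 0.2, 0.0, 0.4, 0.5, 0.1, 0.2, 0.3],
           [0, 1, 0, 0, 0, 0, 0, 0, 0, 0])
    miss = ([0.8, 0.2, 0.3, 0.1, 0.0, 0.4, 0.5, 0.1, 0.2, 0.3],
            [0, 0, 0, 0, 0, 0, 1, 0, 0, 0])
    print(mean_recall_at_k([hit, miss], k=1))  # 0.5
    print(mean_recall_at_k([hit, miss], k=2))  # 1.0
```

When a context has several correct responses, the same definition counts the fraction of them retrieved in the top k, which is why the denominator is the number of positives rather than 1.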


Real-time Pricing-based Resource Allocation in Open Market Environments

IF 5.3 · Zone 3, Computer Science · Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Internet Technology

Pub Date : 2023-03-14 DOI: 10.1145/3465237

P. Mishra, Ahmed Moustafa, T. Ito

Open market environments consist of a set of participants (vendors and consumers) that dynamically leave or join the market. The resulting dynamism leads to uncertainty in the supply of and demand for resources in these open markets. Specifically, in such uncertain markets, vendors attempt to maximise their revenue by dynamically changing their selling prices according to market demand. An optimal resource allocation approach is therefore needed to optimise selling prices based on the supply of and demand for resources in the open market. Optimal selling prices should maximise the revenue of vendors while protecting the utility of buyers. In this context, we propose a real-time pricing approach for resource allocation in open market environments. The proposed approach introduces a priority-based fairness mechanism to allocate the available resources in a reverse-auction paradigm. Finally, we compare the proposed approach with two state-of-the-art resource allocation approaches. The experimental results show that the proposed approach outperforms the other two in its ability to maximise the vendors’ revenue.

Citations: 0

Taming Internet of Things Application Development with the IoTvar Middleware

IF 5.3 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Internet Technology

Pub Date : 2023-02-28 DOI: 10.1145/3586010

P. Borges, C. Taconet, S. Chabridon, D. Conan, Everton Cavalcante, T. Batista

In recent years, Internet of Things (IoT) platforms have been designed to provide IoT applications with services such as device discovery, context management, and data filtering. The lack of standardization has led each IoT platform to propose its own abstractions, APIs, and data models. As a consequence, programming the interactions between an IoT consumer application and an IoT platform is time-consuming and error-prone, and it depends on how well the developers know the platform. To address these issues, this article introduces IoTvar, a middleware library deployed on the IoT consumer application that manages all of its interactions with IoT platforms. IoTvar relies on declaring variables that are automatically mapped to sensors and whose values are transparently updated with sensor observations through client-side proxies. This article presents the IoTvar architecture and shows how it has been integrated into the FIWARE, OM2M, and muDEBS platforms. We also report the results of experiments performed to evaluate IoTvar, showing that it reduces the effort required to declare and manage IoT variables and has no considerable impact on CPU, memory, and energy consumption.
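
The idea of variables bound to sensors and refreshed by a client-side proxy can be illustrated with a minimal sketch. This is not IoTvar's actual API: the class names `SensorVariable` and `PlatformProxy`, the method names, and the sensor identifier are all hypothetical, and the platform connection is replaced by a direct method call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class SensorVariable:
    """A local variable bound to one sensor (hypothetical, not IoTvar's API)."""
    sensor_id: str
    value: Optional[float] = None
    handlers: List[Callable[[float], None]] = field(default_factory=list)

    def _update(self, observation: float) -> None:
        # Called by the proxy only; application code simply reads `value`.
        self.value = observation
        for handler in self.handlers:
            handler(observation)


class PlatformProxy:
    """Client-side stand-in for one IoT platform (hypothetical)."""

    def __init__(self) -> None:
        self._bindings: Dict[str, List[SensorVariable]] = {}

    def declare(self, sensor_id: str) -> SensorVariable:
        # Declaring a variable registers it for future observations.
        variable = SensorVariable(sensor_id)
        self._bindings.setdefault(sensor_id, []).append(variable)
        return variable

    def on_observation(self, sensor_id: str, observation: float) -> None:
        # A real proxy would be driven by platform notifications or polling;
        # here the caller simulates an incoming observation.
        for variable in self._bindings.get(sensor_id, []):
            variable._update(observation)


proxy = PlatformProxy()
temperature = proxy.declare("room-42/temperature")  # assumed sensor id
seen: List[float] = []
temperature.handlers.append(seen.append)

proxy.on_observation("room-42/temperature", 21.5)   # simulated observation
print(temperature.value)
```

The application code only declares the variable and reads it; everything platform-specific would sit behind the proxy, which is the separation the abstract describes.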

Citations: 2

A Low-code Development Framework for Cloud-native Edge Systems

IF 5.3 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Internet Technology

Pub Date : 2023-02-27 DOI: https://dl.acm.org/doi/10.1145/3563215

Wenzhao Zhang, Yuxuan Zhang, Hongchang Fan, Yi Gao, Wei Dong

Customizing and deploying an edge system are time-consuming and complex tasks because of hardware heterogeneity, third-party software compatibility, and diverse performance requirements, among other factors. In this article, we present TinyEdge, a holistic framework for the low-code development of edge systems. The key idea of TinyEdge is to use a top-down approach for designing edge systems. Developers select and configure TinyEdge modules to specify their interaction logic without dealing with the specific hardware or software. Taking the configuration as input, TinyEdge automatically generates the deployment package and estimates the performance with sufficient profiling. TinyEdge provides a unified development toolkit to specify module dependencies, functionalities, interactions, and configurations. We implement TinyEdge and evaluate its performance using real-world edge systems. Results show that: (1) TinyEdge achieves rapid customization of edge systems, reducing development time by 44.15% and lines of code by 67.79% on average compared with state-of-the-art edge computing platforms; (2) TinyEdge builds compact modules and optimizes latent circular dependency detection and message routing efficiency; (3) TinyEdge's performance estimation has low absolute errors in various settings.
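
The abstract mentions circular dependency detection among modules without giving the method. As a generic illustration only, the sketch below uses a standard depth-first search with three visit states; it is not TinyEdge's algorithm, and the function name `find_cycle` and the module names are hypothetical.

```python
from typing import Dict, List, Optional


def find_cycle(dependencies: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of module names, or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {module: WHITE for module in dependencies}
    stack: List[str] = []

    def visit(module: str) -> Optional[List[str]]:
        state[module] = GREY
        stack.append(module)
        for dep in dependencies.get(module, []):
            if state.get(dep, WHITE) == GREY:
                # `dep` is already on the current path, so the path closes a cycle.
                return stack[stack.index(dep):] + [dep]
            if state.get(dep, WHITE) == WHITE:
                found = visit(dep)
                if found:
                    return found
        stack.pop()
        state[module] = BLACK
        return None

    for module in list(dependencies):
        if state[module] == WHITE:
            found = visit(module)
            if found:
                return found
    return None


# Hypothetical module configuration: each module lists the modules it depends on.
modules = {
    "mqtt-broker": ["auth"],
    "auth": ["database"],
    "database": ["mqtt-broker"],
    "dashboard": ["database"],
}
print(find_cycle(modules))  # ['mqtt-broker', 'auth', 'database', 'mqtt-broker']
```

A check of this kind would run on the declared module dependencies before a deployment package is generated, so that a misconfigured selection is rejected early.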

Citations: 0

Breaking CaptchaStar Using the BASECASS Methodology

IF 5.3 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Internet Technology

Pub Date : 2023-02-27 DOI: https://dl.acm.org/doi/10.1145/3546867

Carlos Hernández-Castro, David F. Barrero, Maria Dolores R-Moreno

In this article, we present fundamental design flaws of CaptchaStar. We also present a full analysis using the BASECASS methodology that employs machine learning techniques. By means of this methodology, we find an attack that bypasses CaptchaStar with almost 100% accuracy.

Citations: 0