Frontiers of Computer Science: Latest Articles


A disk I/O optimized system for concurrent graph processing jobs

IF 4.2, Zone 3 (Computer Science), Q1 Mathematics

Frontiers of Computer Science

Pub Date : 2024-01-22 DOI: 10.1007/s11704-023-2361-0

Xianghao Xu, Fang Wang, Hong Jiang, Yongli Cheng, Dan Feng, Peng Fang

To analyze and process large graphs cost-effectively, researchers have in recent years developed a number of out-of-core graph processing systems that run on a single commodity computer. At the same time, with the rapidly growing need to analyze real-world graphs, graph processing systems must efficiently handle massive numbers of concurrent graph processing (CGP) jobs. Unfortunately, because they are inherently designed for a single graph processing job, existing out-of-core graph processing systems usually incur unnecessary data accesses and severe contention for I/O bandwidth when handling CGP jobs. In this paper, we propose GraphCP, a disk I/O optimized out-of-core graph processing system that efficiently supports the processing of CGP jobs. GraphCP proposes a benefit-aware sharing execution model that shares the I/O access and processing of graph data among the CGP jobs and adaptively schedules graph data loading based on the states of vertices, which efficiently overcomes the above challenges faced by existing out-of-core graph processing systems. Moreover, GraphCP adopts a dependency-based future-vertex updating model to reduce disk I/Os in future iterations. In addition, GraphCP organizes the graph data with a Source-Sorted Sub-Block graph representation for better processing capacity and I/O access locality. Extensive evaluation results show that GraphCP is 20.5× and 8.9× faster than two out-of-core graph processing systems, GridGraph and GraphZ, respectively, and 3.5× and 1.7× faster than two state-of-the-art concurrent graph processing systems, Seraph and GraphSO, respectively.
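The general idea of sharing one pass of graph I/O among concurrent jobs, and skipping blocks that no job currently needs, can be sketched as follows. This is a hypothetical toy illustration, not GraphCP's implementation: the `Job` class, the BFS-style jobs, the in-memory `blocks` dict standing in for on-disk edge blocks, and the simple "any frontier vertex in the block's source range" test are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """Hypothetical concurrent job: BFS-style reachability from a set of source vertices."""
    name: str
    active: set                      # frontier: vertices to propagate from this iteration
    visited: set = field(default_factory=set)

    def __post_init__(self):
        self.visited |= self.active  # sources count as reached

    def process(self, edges):
        """Propagate from the active frontier over one loaded block of (src, dst) edges."""
        return {dst for src, dst in edges if src in self.active and dst not in self.visited}


def run_iteration(blocks, jobs, load_log):
    """Run one iteration in which each block is read at most once and shared by all jobs needing it.

    blocks maps block_id -> (lo, hi, edges), where every edge source lies in [lo, hi);
    this dict stands in for edge blocks that an out-of-core system would read from disk.
    """
    next_frontier = {job.name: set() for job in jobs}
    for block_id, (lo, hi, edges) in blocks.items():
        # Simplified state-aware scheduling: a job needs a block only if its frontier hits [lo, hi).
        interested = [job for job in jobs if any(lo <= v < hi for v in job.active)]
        if not interested:
            continue                 # no job needs this block, so the (simulated) read is skipped
        load_log.append(block_id)    # one read, shared by every interested job
        for job in interested:
            next_frontier[job.name] |= job.process(edges)
    for job in jobs:
        job.visited |= next_frontier[job.name]
        job.active = next_frontier[job.name]


blocks = {
    0: (0, 2, [(0, 1), (1, 2)]),
    1: (2, 4, [(2, 3), (3, 0)]),
    2: (4, 6, [(4, 5)]),
}
jobs = [Job("bfs_from_0", {0}), Job("bfs_from_1", {1})]
load_log = []
run_iteration(blocks, jobs, load_log)   # both jobs share a single read of block 0
run_iteration(blocks, jobs, load_log)
```

In the first iteration of this toy, block 0 is read once for both jobs and blocks 1 and 2 are skipped because no frontier vertex falls in their source ranges; running the two jobs independently would read block 0 twice.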


Citations: 0

Hybrid concurrency control protocol for data sharing among heterogeneous blockchains

IF 4.2, Zone 3 (Computer Science), Q1 Mathematics

Frontiers of Computer Science

Pub Date : 2024-01-22 DOI: 10.1007/s11704-022-2327-7

Tiezheng Guo, Zhiwei Zhang, Ye Yuan, Xiaochun Yang, Guoren Wang

With the development of information technology and cloud computing, data sharing has become an important part of scientific research. In traditional data sharing, data is stored on a third-party storage platform, which causes the owner to lose control of the data. As a result, third parties may intentionally leak or tamper with the data, and any private information it contains may lead to even more significant problems. Furthermore, data is frequently maintained on multiple storage platforms, which makes it difficult to enlist multiple parties in data sharing while maintaining consistency. In this work, we propose a new architecture that applies blockchains to data sharing and achieves efficient and reliable data sharing among heterogeneous blockchains. Based on this architecture, we design a new data sharing transaction mechanism that protects the security of the raw data and of its processing. We also design and implement a hybrid concurrency control protocol that overcomes problems caused by the large performance differences between the blockchains in our system and improves the success rate of data sharing transactions. Using Ethereum and Hyperledger Fabric as examples, we conduct cross-blockchain data sharing experiments. The results show that our system achieves data sharing across heterogeneous blockchains with reasonable performance and high scalability.


Citations: 0

A probabilistic generative model for tracking multi-knowledge concept mastery probability

IF 4.2, Zone 3 (Computer Science), Q1 Mathematics

Frontiers of Computer Science

Pub Date : 2024-01-22 DOI: 10.1007/s11704-023-3008-x

Hengyu Liu, Tiancheng Zhang, Fan Li, Minghe Yu, Ge Yu

Knowledge tracing aims to track students’ knowledge status over time in order to accurately predict their future performance. In real settings, teachers expect knowledge tracing models to provide interpretable results about knowledge status. Markov chain-based knowledge tracing (MCKT) models, such as Bayesian Knowledge Tracing, can track the mastery probability of knowledge concepts over time. However, as the number of tracked knowledge concepts increases, the time complexity of MCKT for predicting student performance grows exponentially (also called the explaining-away problem). When many knowledge concepts are tracked, MCKT therefore cannot be used to track their mastery probabilities over time. In addition, when modeling students’ responses, existing MCKT models consider only the relationship between students’ knowledge status and problems, ignoring the relationships among knowledge concepts in the same problem. To address these challenges, we propose an inTerpretable pRobAbilistiC gEnerative moDel (TRACED), which can track students’ mastery probabilities for numerous knowledge concepts over time. To solve the explaining-away problem, we design long short-term memory (LSTM)-based networks to approximate the posterior distribution and predict students’ future performance, and we propose a heuristic algorithm to jointly train the LSTMs and the probabilistic graphical model. To better model students’ exercise responses, we propose a log-linear model with three interactive strategies, which models students’ exercise responses by considering the relationships among students’ knowledge status, knowledge concepts, and problems. We conduct experiments with four real-world datasets on three knowledge-driven tasks. The experimental results show that TRACED outperforms existing knowledge tracing methods in predicting students’ future performance and can learn the relationships among students, knowledge concepts, and problems from students’ exercise sequences. We also conduct several case studies, which show that TRACED exhibits excellent interpretability and thus has the potential to provide personalized automatic feedback in real-world educational environments.
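Bayesian Knowledge Tracing, the MCKT example named above, updates the mastery probability of a single concept with Bayes' rule on the observed response followed by a learning transition. The sketch below uses the standard BKT parameters (guess, slip, learn); the numeric values are illustrative assumptions, and none of this code is part of TRACED. The `joint_states` helper only illustrates why exact joint tracking of K binary concepts must consider 2^K mastery configurations.

```python
from itertools import product


def bkt_update(p_mastery, correct, guess, slip, learn):
    """One Bayesian Knowledge Tracing step for a single knowledge concept.

    p_mastery: P(concept mastered) before observing the response.
    Returns P(concept mastered) at the next time step.
    """
    if correct:
        num = p_mastery * (1 - slip)               # mastered and did not slip
        den = num + (1 - p_mastery) * guess        # ... or not mastered but guessed
    else:
        num = p_mastery * slip                     # mastered but slipped
        den = num + (1 - p_mastery) * (1 - guess)  # ... or not mastered and did not guess
    posterior = num / den                          # Bayes' rule on the observed response
    return posterior + (1 - posterior) * learn     # chance of learning before the next step


def joint_states(num_concepts):
    """All joint mastery configurations an exact multi-concept model must sum over."""
    return list(product((0, 1), repeat=num_concepts))


p = 0.3  # hypothetical prior mastery
for observed_correct in [True, True, False]:
    p = bkt_update(p, observed_correct, guess=0.2, slip=0.1, learn=0.15)
```

A correct answer raises the estimate and an incorrect one lowers it; with 10 concepts, `joint_states(10)` already has 1024 entries.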


Citations: 0

GRAMO: geometric resampling augmentation for monocular 3D object detection

IF 4.2, Zone 3 (Computer Science), Q1 Mathematics

Frontiers of Computer Science

Pub Date : 2024-01-15 DOI: 10.1007/s11704-023-3242-2

He Guan, Chunfeng Song, Zhaoxiang Zhang

Data augmentation is widely recognized as an effective means of bolstering model robustness. However, when applied to monocular 3D object detection, non-geometric image augmentation neglects the critical link between the image and physical space, resulting in the semantic collapse of the extended scene. To address this issue, we propose two geometric-level data augmentation operators, Geometric-Copy-Paste (Geo-CP) and Geometric-Crop-Shrink (Geo-CS). Both operators introduce geometric consistency based on the principle of perspective projection, complementing the data augmentation options available for monocular 3D detection. Specifically, Geo-CP replicates local patches by reordering object depths to mitigate perspective occlusion conflicts, and Geo-CS re-crops local patches to scale distance and size simultaneously, so that appearance and annotations remain consistent. These operations ameliorate class imbalance in the monocular paradigm by increasing the number of geometrically consistent samples and broadening their distribution. Experiments demonstrate that our geometric-level augmentation operators effectively improve robustness and performance on the KITTI and Waymo monocular 3D detection benchmarks.
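The perspective-projection constraint behind both operators can be illustrated with a pinhole camera. This sketch is an assumption-laden illustration, not the GRAMO code: it shows (1) that scaling an image about the principal point by a factor s moves every projected point exactly to where it would project if its depth were Z/s (X and Y unchanged), so depth labels must be rescaled along with the pixels, and (2) pasting patches in far-to-near depth order so that nearer objects occlude farther ones. The intrinsics `f`, `cx`, `cy` and the `(depth, patch_id)` pairs are hypothetical.

```python
def project(point, f, cx, cy):
    """Pinhole projection of a camera-frame 3D point (X, Y, Z) to pixel coordinates (u, v)."""
    x, y, z = point
    return (f * x / z + cx, f * y / z + cy)


def scale_about_principal_point(uv, s, cx, cy):
    """Pixel position after resizing the image by factor s around the principal point."""
    u, v = uv
    return (cx + s * (u - cx), cy + s * (v - cy))


def rescaled_depth(z, s):
    """Depth label consistent with an image rescaled by s: projected sizes scale by s, so Z -> Z / s."""
    return z / s


def paste_order(patches):
    """Order (depth, patch_id) pairs far-to-near, so nearer patches are pasted last and stay on top."""
    return [pid for depth, pid in sorted(patches, key=lambda p: p[0], reverse=True)]


f, cx, cy = 700.0, 600.0, 180.0   # hypothetical camera intrinsics (pixels)
obj = (2.0, 1.5, 20.0)            # hypothetical object center in camera coordinates (meters)
s = 0.5                           # shrink factor

shrunk = scale_about_principal_point(project(obj, f, cx, cy), s, cx, cy)
moved = project((obj[0], obj[1], rescaled_depth(obj[2], s)), f, cx, cy)
order = paste_order([(10.0, "car_a"), (35.0, "car_b"), (20.0, "ped_c")])
```

Here `shrunk` and `moved` coincide, which is why a geometry-aware crop or shrink must update the 3D depth annotation rather than only the 2D box.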


Citations: 0

Partially-hiding functional encryption for degree-2 polynomials with fine-grained access control

IF 4.2, Zone 3 (Computer Science), Q1 Mathematics

Frontiers of Computer Science

Pub Date : 2024-01-15 DOI: 10.1007/s11704-023-3461-6

Haifeng Qian, Cheng Lin, Qiaohan Chu, Jie Chen

Citations: 0

Single depth image 3D face reconstruction via domain adaptive learning

IF 4.2, Zone 3 (Computer Science), Q1 Mathematics

Frontiers of Computer Science

Pub Date : 2024-01-13 DOI: 10.1007/s11704-023-3541-7

Xiaoxu Cai, Jianwen Lou, Jiajun Bu, Junyu Dong, Haishuai Wang, Hui Yu

Citations: 0

Degradation-adaptive neural network for jointly single image dehazing and desnowing

IF 4.2, Zone 3 (Computer Science), Q1 Mathematics

Frontiers of Computer Science

Pub Date : 2024-01-13 DOI: 10.1007/s11704-023-2764-y

Erkang Chen, Sixiang Chen, Tian Ye, Yun Liu

Citations: 0

Traceable ring signature schemes based on SM2 digital signature algorithm and its applications in the data sharing scheme

IF 4.2, Zone 3 (Computer Science), Q1 Mathematics

Frontiers of Computer Science

Pub Date : 2024-01-13 DOI: 10.1007/s11704-023-3318-z

Yongxin Zhang, Hong Lei, Bin Wang, Qinghao Wang, Ning Lu, Wenbo Shi, Bangdao Chen, Qiuling Yue

Citations: 0

Rts: learning robustly from time series data with noisy label

IF 4.2, Zone 3 (Computer Science), Q1 Mathematics

Frontiers of Computer Science

Pub Date : 2023-12-28 DOI: 10.1007/s11704-023-3200-z

Zhi Zhou, Yi-Xuan Jin, Yu-Feng Li

Significant progress has been made in machine learning with large amounts of clean labels and static data. However, in many real-world applications, the data often changes over time and massive clean annotations are difficult to obtain; that is, noisy labels and time series must be handled simultaneously. For example, in product-buyer evaluation, each sample records a user's daily behavior, but the long transaction period makes analysis difficult, and salespeople often mislabel users' purchase behavior. To the best of our knowledge, this novel setting has not yet been thoroughly studied, and effective machine learning methods for it are still lacking. In this paper, we present RTS, a systematic approach supported both theoretically and empirically, which consists of two components: Noise-Tolerant Time Series Representation and Purified Oversampling Learning. Specifically, we reduce the destructive impact of label noise to obtain robust feature representations and potentially clean samples. Then, a novel learning method based on the purified data and time series oversampling is adopted to train an unbiased model. Theoretical analysis proves that our proposal can improve the quality of the noisy data set. Empirical experiments on diverse tasks, including a house-buyer evaluation task from a real-world application and various benchmark tasks, demonstrate that our new algorithm robustly outperforms many competitive methods.
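The two-stage pattern of first keeping likely-clean samples and then oversampling can be sketched with two widely used generic techniques: the small-loss criterion (keep the fraction of samples with the lowest loss under a current model) and SMOTE-style interpolation between two series of the same class (here random same-class pairs rather than nearest neighbours). This is not RTS itself; the keep ratio, the per-sample losses, and the interpolation scheme are assumptions chosen for illustration.

```python
import random


def select_small_loss(samples, losses, keep_ratio):
    """Keep the keep_ratio fraction of samples with the smallest loss (treated as likely clean)."""
    order = sorted(range(len(samples)), key=lambda i: losses[i])
    k = max(1, int(len(samples) * keep_ratio))
    return [samples[i] for i in order[:k]]


def interpolate_series(a, b, lam):
    """Pointwise convex combination of two equal-length time series of the same class."""
    return [(1 - lam) * x + lam * y for x, y in zip(a, b)]


def oversample_class(series_list, target_count, rng):
    """Synthesize interpolated series until the class has target_count members (needs >= 2 series)."""
    if len(series_list) < 2:
        raise ValueError("need at least two series to interpolate")
    out = list(series_list)
    while len(out) < target_count:
        a, b = rng.sample(series_list, 2)
        out.append(interpolate_series(a, b, rng.random()))
    return out


rng = random.Random(0)
data = [([0.0, 1.0, 2.0], 1), ([0.1, 1.1, 2.1], 1), ([5.0, 5.0, 5.0], 0), ([2.0, 1.0, 0.0], 0)]
losses = [0.05, 0.10, 2.30, 0.20]   # hypothetical per-sample losses from a current model
clean = select_small_loss(data, losses, keep_ratio=0.75)
positives = [series for series, label in clean if label == 1]
augmented = oversample_class(positives, 4, rng)
```

The high-loss sample is dropped as a suspected label error, and the synthetic series stay within the pointwise range of the clean same-class series they were built from.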


Citations: 0

A general tail item representation enhancement framework for sequential recommendation

IF 4.2, Zone 3 (Computer Science), Q1 Mathematics

Frontiers of Computer Science

Pub Date : 2023-12-28 DOI: 10.1007/s11704-023-3112-y

Mingyue Cheng, Qi Liu, Wenyu Zhang, Zhiding Liu, Hongke Zhao, Enhong Chen

Recent advancements in deep learning models have significantly facilitated the development of sequential recommender systems (SRS). However, current deep model structures are limited in their ability to learn high-quality embeddings from insufficient data. Meanwhile, highly skewed long-tail distributions are very common in recommender systems. Therefore, in this paper, we focus on enhancing the representation of tail items to improve sequential recommendation performance. Through empirical studies on benchmarks, we surprisingly observe that both ranking performance and the training procedure are greatly hindered by poorly optimized tail item embeddings. To address this issue, we propose a sequential recommendation framework named TailRec that makes full use of the contextual information of tail items and greatly improves their representations. Given the characteristics of the sequential recommendation task, the surrounding interaction records of each tail item are regarded as contextual information, without leveraging any additional side information. This approach mines contextual information from cross-sequence behaviors to boost the performance of sequential recommendation. This lightweight contextual filtering component is plug-and-play for a series of SRS models. To verify the effectiveness of TailRec, we conduct extensive experiments with several popular benchmark recommenders. The experimental results demonstrate that TailRec can greatly improve recommendation results and speed up the training process. The code for our method has been made available.
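Treating a tail item's surrounding interactions as context can be illustrated by collecting the items within a small window around each occurrence of the tail item across all user sequences, then blending their mean embedding into the tail item's own embedding. This is a hypothetical, simplified sketch and not TailRec's filtering component: the popularity threshold `max_count`, the window size, the mixing weight `alpha`, and the toy embeddings are all assumptions.

```python
from collections import Counter


def tail_items(sequences, max_count):
    """Items interacted with at most max_count times across all sequences."""
    counts = Counter(item for seq in sequences for item in seq)
    return {item for item, c in counts.items() if c <= max_count}


def context_items(sequences, target, window):
    """Items within `window` positions of each occurrence of `target`, gathered across sequences."""
    ctx = []
    for seq in sequences:
        for i, item in enumerate(seq):
            if item == target:
                lo, hi = max(0, i - window), min(len(seq), i + window + 1)
                ctx.extend(x for j, x in enumerate(seq[lo:hi], start=lo) if j != i)
    return ctx


def enhance(embeddings, sequences, target, window=1, alpha=0.5):
    """Blend a tail item's embedding with the mean embedding of its context items."""
    ctx = context_items(sequences, target, window)
    if not ctx:
        return list(embeddings[target])  # no context: keep the original embedding
    dim = len(embeddings[target])
    mean = [sum(embeddings[c][d] for c in ctx) / len(ctx) for d in range(dim)]
    return [(1 - alpha) * e + alpha * m for e, m in zip(embeddings[target], mean)]


sequences = [["a", "b", "t", "c"], ["b", "a", "c", "b"], ["c", "t", "a"]]
embeddings = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0], "t": [0.0, 0.0]}
tails = tail_items(sequences, max_count=2)
enhanced_t = enhance(embeddings, sequences, "t", window=1, alpha=0.5)
```

Only side-information-free interaction records are used here, matching the setting described in the abstract; a learned model would replace the fixed average and fixed `alpha`.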


Citations: 0
