Data Intelligence: Latest Articles

Relation Extraction Based on Prompt Information and Feature Reuse
IF 3.9 Zone 3 Computer Science Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2023-02-22 DOI: 10.1162/dint_a_00192
Ping Feng, Xin Zhang, Jian Zhao, Yingying Wang, Biao Huang
ABSTRACT To alleviate the under-utilization of features in sentence-level relation extraction, which limits the performance of the pre-trained language model and leaves the feature vectors underused, a sentence-level relation extraction method based on prompt information and feature reuse is proposed. First, in addition to the pair of nominals and the sentence information, a piece of prompt information is added, so that the overall feature information consists of sentence information, entity-pair information, and prompt information; these features are then encoded by the pre-trained language model RoBERTa. A BiGRU is also introduced into the network alongside the pre-trained language model to extract information, and the feature information is passed through the network to form several sets of feature vectors. These feature vectors are then reused in different combinations to form multiple outputs, which are aggregated by ensemble-learning soft voting to perform relation classification. In addition, the sum of the cross-entropy, KL-divergence, and negative log-likelihood losses is used as the final loss function. In the comparison experiments, the model based on prompt information and feature reuse achieved higher results on the SemEval-2010 Task 8 relation dataset.
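The soft-voting aggregation mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each reused feature combination yields one probability vector over the relation classes; the function names and the three example vectors are hypothetical and are not taken from the paper, whose implementation is not given in the abstract.

```python
from typing import List, Sequence


def soft_vote(prob_sets: Sequence[Sequence[float]]) -> List[float]:
    """Average the class-probability vectors produced by several outputs."""
    n_outputs = len(prob_sets)
    n_classes = len(prob_sets[0])
    return [sum(p[c] for p in prob_sets) / n_outputs for c in range(n_classes)]


def predict(prob_sets: Sequence[Sequence[float]]) -> int:
    """Return the index of the class with the highest averaged probability."""
    avg = soft_vote(prob_sets)
    return max(range(len(avg)), key=avg.__getitem__)


# Hypothetical probabilities from three outputs over three relation classes.
outputs = [
    [0.90, 0.10, 0.00],  # strongly prefers class 0
    [0.40, 0.60, 0.00],  # weakly prefers class 1
    [0.45, 0.55, 0.00],  # weakly prefers class 1
]

avg = soft_vote(outputs)
label = predict(outputs)
print(avg, label)
```

In this toy case two of the three outputs prefer class 1, yet the averaged probabilities favour class 0. That is the behaviour that distinguishes soft voting from majority (hard) voting.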
Journal: Data Intelligence Volume: 5 1 Pages: 824-840
Citations: 1
Achieving Transparency: A Metadata Perspective
IF 3.9 Zone 3 Computer Science Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2023-02-22 DOI: 10.1162/dint_a_00188
Daniel W. Gillman
ABSTRACT Transparency is vital to realizing the promise of evidence-based policymaking, where “evidence-based” means including information as to what data mean and why they should be trusted. Transparency, in turn, requires that enough of this information is provided. Loosely speaking, then, transparency is achieved when sufficient documentation is provided. Sufficiency is situation-specific, for both the provider and the consumer of the documentation. These ideas are presented in two recent US-commissioned reports: “The Promise of Evidence-Based Policymaking” and “Transparency in Statistical Information for the National Center for Science and Engineering Statistics and All Federal Statistical Agencies”. Metadata are a more formalized kind of documentation, and in this paper we provide and demonstrate necessary, sufficient, and general conditions for achieving transparency from the metadata perspective: conforming to a specification, providing quality metadata, and creating a usable interface to the metadata. These conditions are important for any metadata system, but here the specification is tied to our framework for metadata quality, which is based on the situation-specific needs for transparency. These ideas are described, and their interrelationships are explored.
Journal: Data Intelligence Volume: 5 1 Pages: 261-274
Citations: 1
Computer-aided Detection of Tuberculosis from Microbiological and Radiographic Images
IF 3.9 Zone 3 Computer Science Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2023-02-22 DOI: 10.1162/dint_a_00198
Abdullahi Umar Ibrahim, Ayse Gunay Kibarer, Fadi M. Al-Turjman
Tuberculosis, caused by Mycobacterium tuberculosis, has been a major challenge for the medical and healthcare sectors in many underdeveloped countries with limited diagnostic tools. Tuberculosis can be detected from microscopic slides and chest X-rays, but because of the high number of cases, manual examination can be tedious for both microbiologists and radiologists and can lead to misdiagnosis. These challenges can be addressed by employing Computer-Aided Detection (CAD) via AI-driven models, which learn features based on convolution and produce an output with high accuracy. In this paper, we describe the automated discrimination of X-ray and microscope slide images into tuberculosis and non-tuberculosis cases using pretrained AlexNet models. The study employed a chest X-ray dataset made available on the Kaggle repository and microscopic slide images from both Near East University Hospital and the Kaggle repository. For classification of tuberculosis using microscopic slide images, the model achieved 90.56% accuracy, 97.78% sensitivity and 83.33% specificity for a 70:30 split. For classification of tuberculosis using X-ray images, the model achieved 93.89% accuracy, 96.67% sensitivity and 91.11% specificity for a 70:30 split. Our result is in line with the notion that CNN models can be used for classifying medical images with higher accuracy and precision.
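The three reported metrics are related through the confusion matrix, which the following minimal sketch makes explicit. The counts are an assumption: the abstract does not state the test-set size, and the values below (90 positive and 90 negative test images per modality) were chosen only because they reproduce the reported percentages after rounding. The function name is hypothetical.

```python
def metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Return accuracy, sensitivity and specificity as percentages (2 decimals)."""
    values = {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
    }
    return {name: round(100 * value, 2) for name, value in values.items()}


# Assumed counts (not from the paper), consistent with the reported figures.
slide = metrics(tp=88, fn=2, tn=75, fp=15)  # microscopic slide images
xray = metrics(tp=87, fn=3, tn=82, fp=8)    # chest X-ray images

print(slide)
print(xray)
```

With balanced classes, accuracy is the mean of sensitivity and specificity, which is why the slide model's high sensitivity and lower specificity average out to roughly 90.56%.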
Citations: 1
RS-SVM Machine Learning Approach Driven by Case Data for Selecting Urban Drainage Network Restoration Scheme
IF 3.9 Zone 3 Computer Science Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2023-02-22 DOI: 10.1162/dint_a_00208
Li Jiang, Zheng Geng, Dong-Hwan Gu, Shuai Guo, Rongmin Huang, Haoke Cheng, Kaixuan Zhu
ABSTRACT The urban drainage pipe network is the backbone of urban drainage, flood control and water pollution prevention, and is also an essential indicator of the level of urban modernization. A large number of underground drainage pipe networks in older urban areas were laid long ago and have reached, or nearly reached, the end of their service life. The repair of drainage pipe networks has attracted extensive attention from all walks of life. Since the Ministry of Ecology and Environment and the National Development and Reform Commission jointly issued the Action Plan for Yangtze River Protection and Restoration in 2019, various provinces in the Yangtze River Basin, such as Anhui, Jiangxi and Hunan, have extensively carried out PPP projects for urban pipeline restoration in order to improve the quality and efficiency of sewage treatment. Based on the management practice of the urban pipe network restoration project in Wuhu City, Anhui Province, this paper analyzes the problems of lengthy construction periods and repeated operations caused by the mismatch between the design schedule of the restoration scheme and the construction schedule of the pipe network restoration under the existing project management mode, and proposes a model for selecting urban drainage pipe network restoration schemes based on an improved support vector machine. The validity and feasibility of the model are analyzed and verified using data collected from project practice. The results show that the model performs well in selecting urban drainage pipeline restoration schemes, with an accuracy that can reach 90%. The results can provide methodological guidance and technical support for rapid decision-making in urban drainage pipeline restoration projects.
Journal: Data Intelligence Volume: 5 1 Pages: 413-437
Citations: 1
Three Heads Better than One: Pure Entity, Relation Label and Adversarial Training for Cross-domain Few-shot Relation Extraction
IF 3.9 Zone 3 Computer Science Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2023-02-22 DOI: 10.1162/dint_a_00190
Wenlong Fang, Chunping Ouyang, Qiang Lin, Yue Yuan
ABSTRACT In this paper, we study cross-domain relation extraction. Because new data mapped into the feature space always differ from previously seen data due to domain shift, few-shot relation extraction often performs poorly. To address the problems caused by domain shift, we propose a method that combines the pure entity, relation labels and adversarial training (PERLA). We first encode entities and complete sentences separately to obtain context-independent entity features. Then, we incorporate relation labels, which are useful for relation extraction, to mitigate context noise. Finally, we add adversarial training to reduce the noise caused by the change of domain. We conducted experiments on the publicly available cross-domain relation extraction dataset FewRel 2.0 [1], and the results show that our approach improves accuracy and transfers better, adapting more readily to cross-domain tasks.
Journal: Data Intelligence Volume: 5 1 Pages: 807-823
Citations: 0
Adversarial Neural Collaborative Filtering with Embedding Dimension Correlations
IF 3.9 Zone 3 Computer Science Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2023-02-22 DOI: 10.1162/dint_a_00151
Yi Gao, Jianxia Chen, Liang Xiao, Hongyang Wang, Liwei Pan, Xuan Wen, Zhiwei Ye, Xinyun Wu
ABSTRACT Recently, convolutional neural networks (CNNs) have achieved excellent performance in recommendation systems by extracting deep features and building collaborative filtering models. However, CNNs have been shown to be susceptible to adversarial examples, because adversarial samples are subtle non-random perturbations that cause machine learning models to produce incorrect outputs. Therefore, we propose a novel model of Adversarial Neural Collaborative Filtering with Embedding Dimension Correlations, ANCF for short, to address the adversarial problem of CNN-based recommendation systems. In particular, the proposed ANCF model adopts matrix factorization to train the adversarial personalized ranking in the prediction layer. Matrix factorization supposes that the linear interaction of the latent factors captured between the user and the item can describe the observable feedback, so the proposed ANCF model can learn more complicated representations of these latent factors to improve recommendation performance. In addition, the ANCF model uses the outer product instead of the inner product or concatenation to explicitly learn pairwise correlations between embedding dimensions and obtain an interaction map, on which CNNs can use their strengths to learn high-order correlations. As a result, the proposed ANCF model can improve robustness through the adversarial personalized ranking, and obtain more information by encoding correlations between different embedding layers. Experimental results on three public datasets demonstrate that the ANCF model outperforms other existing recommendation models.
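The difference between the outer product and the inner product mentioned in the abstract can be shown with a minimal sketch. The embedding values and helper names are made up for illustration and are not taken from the paper; the sketch covers only the construction of the interaction map, not the CNN or the adversarial training.

```python
from typing import List, Sequence


def interaction_map(user: Sequence[float], item: Sequence[float]) -> List[List[float]]:
    """Outer product: entry [i][j] pairs user dimension i with item dimension j."""
    return [[u * v for v in item] for u in user]


def inner_product(user: Sequence[float], item: Sequence[float]) -> float:
    """Inner product: only matching dimensions interact, summed to one scalar."""
    return sum(u * v for u, v in zip(user, item))


# Hypothetical 3-dimensional user and item embeddings.
user_emb = [1.0, 2.0, 3.0]
item_emb = [0.5, -1.0, 2.0]

E = interaction_map(user_emb, item_emb)
trace = sum(E[i][i] for i in range(len(E)))

print(E)
print(trace, inner_product(user_emb, item_emb))
```

The diagonal of the map holds the element-wise products that the inner product sums, so the map contains the inner-product signal plus all cross-dimension pairs. This two-dimensional layout is what allows a convolutional network to be applied to it.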
Journal: Data Intelligence Volume: 5 1 Pages: 786-806
Citations: 0
Continuous Metadata in Continuous Integration, Stream Processing and Enterprise DataOps
IF 3.9 Zone 3 Computer Science Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2023-02-22 DOI: 10.1162/dint_a_00193
M. Underwood
ABSTRACT Implementations of metadata tend to favor centralized, static metadata. This depiction is at variance with the past decade's focus on big data, cloud-native architectures and streaming platforms. Big data velocity can demand a correspondingly dynamic view of metadata. These trends, which include DevOps, CI/CD, DataOps and data fabric, are surveyed. Several specific cloud-native tools are reviewed, and weaknesses in their current metadata use are identified. Implementations are suggested that better exploit the capabilities of streaming platform paradigms, in which metadata is continuously collected in dynamic contexts. Future cloud-native software features are identified that could enable streamed metadata to power real-time data fusion or fine-tune automated reasoning through real-time ontology updates.
Journal: Data Intelligence Volume: 5 1 Pages: 275-288
Citations: 1
Deep Learning for Medication Recommendation: A Systematic Survey
IF 3.9 Zone 3 Computer Science Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2023-02-17 DOI: 10.1162/dint_a_00197
Z. Ali, Y. Huang, Irfan Ullah, Junlan Feng, Chao Deng, Nimbeshaho Thierry, Asad Khan, Asim Ullah Jan, Xiaoli Shen, Wu Rui, G. Qi
ABSTRACT Making medication prescriptions in response to the patient's diagnosis is a challenging task. The number of pharmaceutical companies, their inventory of medicines, and the recommended dosages confront a doctor with the well-known problem of information and cognitive overload. To assist medical practitioners in making informed decisions about a patient's prescription, researchers have exploited electronic health records (EHRs) to recommend medication automatically. In recent years, medication recommendation using EHRs has been a salient research direction, attracting researchers to apply various deep learning (DL) models to patients' EHRs in order to recommend prescriptions. Yet, in the absence of a holistic survey article, it takes considerable effort and time to study these publications in order to understand the current state of research and identify the best-performing models along with the trends and challenges. To fill this research gap, this survey reports on state-of-the-art DL-based medication recommendation methods. It reviews the classification of DL-based medication recommendation (MR) models, compares their performance, and discusses the unavoidable issues they face. It reports on the most common datasets and metrics used in evaluating MR models. The findings of this study have implications for researchers interested in MR models.
Journal: Data Intelligence Volume: 5 1 Pages: 303-354
Citations: 2
HUSS: A Heuristic Method for Understanding the Semantic Structure of Spreadsheets
Zone 3 Computer Science Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2023-02-11 DOI: 10.1162/dint_a_00201
Xindong Wu, Hao Chen, Chenyang Bu, Shengwei Ji, Zan Zhang, Victor S. Sheng
Abstract Spreadsheets contain a lot of valuable data and have many practical applications. The key to these applications is enabling machines to understand the semantic structure of spreadsheets, e.g., identifying cell function types and discovering relationships between cell pairs. Most existing methods for understanding the semantic structure of spreadsheets do not make use of the semantic information of cells. A few studies do, but they ignore the layout structure of spreadsheets, which hurts the performance of cell function classification and the discovery of different relationship types between cell pairs. In this paper, we propose a Heuristic algorithm for Understanding the Semantic Structure of spreadsheets (HUSS). Specifically, to improve cell function classification, we propose an error correction mechanism (ECM) based on an existing cell function classification model [11] and the layout features of spreadsheets. To improve table structure analysis, we propose five types of heuristic rules to extract four different types of cell pairs, based on cell style and spatial location information. Our experimental results on five real-world datasets demonstrate that HUSS can effectively understand the semantic structure of spreadsheets and outperforms the corresponding baselines.
Citations: 0
Source-Aware Embedding Training on Heterogeneous Information Networks
IF 3.9 Zone 3 Computer Science Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2023-02-11 DOI: 10.1162/dint_a_00200
Tsai Hor Chan, Chi Ho Wong, Jiajun Shen, Guosheng Yin
ABSTRACT Heterogeneous information networks (HINs) have been extensively applied to real-world tasks, such as recommendation systems, social networks, and citation networks. While existing HIN representation learning methods can effectively learn the semantic and structural features in the network, little attention has been given to the distribution discrepancy of subgraphs within a single HIN. However, we find that ignoring such distribution discrepancy among subgraphs from multiple sources hinders the effectiveness of graph embedding learning algorithms. This motivates us to propose SUMSHINE (Scalable Unsupervised Multi-Source Heterogeneous Information Network Embedding), a scalable unsupervised framework that aligns the embedding distributions among multiple sources of an HIN. Experimental results on real-world datasets in a variety of downstream tasks validate the performance of our method against state-of-the-art heterogeneous information network embedding algorithms.
Source-Aware Embedding Training on Heterogeneous Information Networks. Tsai Hor Chan, Chi Ho Wong, Jiajun Shen, Guosheng Yin. Data Intelligence, volume "5 1" (as listed), pages 611-635. Published 2023-02-11. DOI: 10.1162/dint_a_00200 (https://doi.org/10.1162/dint_a_00200)
Citations: 1