
Latest Publications from the International Journal of Data Warehousing and Mining

An Engineering Domain Knowledge-Based Framework for Modelling Highly Incomplete Industrial Data
IF 1.2 | CAS Tier 4 (Computer Science) | Q3 Computer Science | Pub Date: 2021-10-01 | DOI: 10.4018/ijdwm.2021100103
Han Li, Zhao Liu, P. Zhu
Missing values in industrial data limit its applications. Although incomplete data often contains enough information for engineers to support subsequent development, too many values are missing for algorithms to establish precise models, because engineering domain knowledge is not considered and valuable information is not fully captured. This article therefore proposes an engineering domain knowledge-based framework for modelling incomplete industrial data. The raw datasets are partitioned and processed at different scales. First, hierarchical features are combined to decrease the missing ratio. To fill the missing values in the special data used to classify the samples, samples with only part of their features present are fully utilized, instead of being removed, to establish a local imputation model. The samples are then divided into groups so that information can be transferred between them. A series of industrial datasets is analyzed to verify the feasibility of the proposed method.
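Two ideas in the abstract — lowering the missing ratio by combining hierarchical features, and transferring information within sample groups during imputation — can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: the mean-based aggregation and group-mean imputation are assumptions.

```python
import numpy as np

def combine_hierarchical(child_cols):
    """Combine several child-level feature columns into one parent feature.

    A row gets a value whenever at least one child is observed, so the
    combined column has a lower missing ratio than any single child.
    (Aggregation by mean is a hypothetical choice.)
    """
    stacked = np.vstack(child_cols)  # shape: (n_children, n_samples)
    return np.nanmean(stacked, axis=0)

def groupwise_impute(x, groups):
    """Fill missing entries of x with the mean of the sample's group,
    transferring information between similar samples."""
    x = x.copy()
    for g in np.unique(groups):
        mask = groups == g
        fill = np.nanmean(x[mask])
        x[mask] = np.where(np.isnan(x[mask]), fill, x[mask])
    return x
```

For example, two partially observed child columns `[1.0, NaN]` and `[NaN, 4.0]` combine into a fully observed `[1.0, 4.0]`.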
Citations: 1
P2P-COVID-GAN: Classification and Segmentation of COVID-19 Lung Infections From CT Images Using GAN
IF 1.2 | CAS Tier 4 (Computer Science) | Q3 Computer Science | Pub Date: 2021-10-01 | DOI: 10.4018/ijdwm.2021100105
R. Abirami, M. DuraiRajVincentP., S. Kadry
Early, automatic segmentation of lung infections from computed tomography (CT) images of COVID-19 patients is crucial for timely quarantine and effective treatment. However, automating the segmentation of lung infections from CT slices is challenging because of the low contrast between normal and infected tissue. A CNN- and GAN-based framework is presented to classify and then segment lung infections automatically from COVID-19 lung CT slices. The authors propose a novel method named P2P-COVID-SEG that automatically classifies COVID-19 and normal CT images and then segments COVID-19 lung infections from CT images using a GAN. The proposed model outperformed existing classification models with an accuracy of 98.10%. The segmentation results also outperformed existing methods, yielding infection regions with accurate boundaries; the Dice coefficient achieved with GAN segmentation is 81.11%. These results demonstrate that the proposed model achieves state-of-the-art performance.
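The reported Dice coefficient of 81.11% measures the overlap between a predicted infection mask and the ground truth. A minimal sketch of the metric itself (this shows only the evaluation measure, not the P2P-COVID-SEG model):

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice similarity between two binary masks:
    2*|P ∩ T| / (|P| + |T|), with eps guarding against empty masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)
```

Identical masks score 1.0, disjoint masks score (near) 0.0.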
Citations: 11
A Novel Filter-Wrapper Algorithm on Intuitionistic Fuzzy Set for Attribute Reduction From Decision Tables
IF 1.2 | CAS Tier 4 (Computer Science) | Q3 Computer Science | Pub Date: 2021-10-01 | DOI: 10.4018/ijdwm.2021100104
Thang Truong Nguyen, Long Giang Nguyen, D. T. Tran, T. T. Nguyen, Huy Quang Nguyen, Anh Viet Pham, T. D. Vu
Attribute reduction from decision tables is one of the crucial topics in data mining. The problem is NP-hard, and many approximation algorithms based on filter or filter-wrapper approaches have been designed to find reducts. The intuitionistic fuzzy set (IFS) is regarded as an effective tool for this problem because it attaches two degrees, membership and non-membership, to each data element. Separating attributes into these two counterparts, as in an IFS, can increase classification quality and shrink the reducts. Motivated by this, the paper proposes a new filter-wrapper algorithm based on IFS for attribute reduction from decision tables. The contributions include a new intuitionistic fuzzy distance between partitions, accompanied by theoretical analysis. The filter-wrapper algorithm is designed around that distance, with a new stopping condition based on the concept of delta-equality. Experiments are conducted on benchmark datasets from the UCI machine learning repository.
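For readers unfamiliar with IFS arithmetic, a textbook normalized Hamming distance over (membership, non-membership) pairs looks like the following; the paper defines its own distance between partitions, which this sketch does not reproduce.

```python
def ifs_distance(a, b):
    """Normalized Hamming distance between two intuitionistic fuzzy sets,
    each a list of (membership mu, non-membership nu) pairs; the hesitancy
    degree is pi = 1 - mu - nu. A standard textbook IFS distance, not the
    partition distance contributed by the paper."""
    total = 0.0
    for (mu1, nu1), (mu2, nu2) in zip(a, b):
        pi1, pi2 = 1 - mu1 - nu1, 1 - mu2 - nu2
        total += abs(mu1 - mu2) + abs(nu1 - nu2) + abs(pi1 - pi2)
    return total / (2 * len(a))
```

The distance is 0 for identical sets and 1 for fully opposed ones, e.g. `[(1, 0)]` versus `[(0, 1)]`.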
Citations: 3
ETL Logs Under a Pattern-Oriented Approach
IF 1.2 | CAS Tier 4 (Computer Science) | Q3 Computer Science | Pub Date: 2021-10-01 | DOI: 10.4018/ijdwm.2021100102
Bruno Oliveira, Óscar Oliveira, O. Belo
Since extract-transform-load (ETL) is a complex and evolving process, development teams must carefully and rigorously design logging strategies to extract the most value from the events that occur throughout an ETL workflow. Efficient logging strategies must be structured so that metrics, logs, and alerts provide insight into the system beyond their troubleshooting role. This paper presents a configurable and flexible ETL component for creating logging mechanisms in ETL workflows. A pattern-oriented approach is followed to abstract ETL activities and enable their mapping to physical primitives that commercial ETL tools can interpret.
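The idea of attaching a reusable logging pattern to every ETL activity can be illustrated with a small decorator that emits a structured event per run. The event fields (step, status, row counts, duration) are hypothetical, not the component's actual schema.

```python
import json
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("etl")

def logged_step(name):
    """Wrap an ETL activity so each run emits a structured JSON log event,
    usable both for troubleshooting and for metrics on the pipeline."""
    def decorator(fn):
        def wrapper(rows):
            start = time.time()
            try:
                out = fn(rows)
                log.info(json.dumps({"step": name, "status": "ok",
                                     "rows_in": len(rows), "rows_out": len(out),
                                     "seconds": round(time.time() - start, 3)}))
                return out
            except Exception as exc:
                log.error(json.dumps({"step": name, "status": "error",
                                      "error": str(exc)}))
                raise
        return wrapper
    return decorator

@logged_step("drop_nulls")
def drop_nulls(rows):
    """Example activity: remove records containing null fields."""
    return [r for r in rows if all(v is not None for v in r.values())]
```

Each decorated activity keeps its original signature, so the pattern can be applied across a workflow without changing the activities themselves.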
Citations: 1
A Novel Approach Using Non-Synonymous Materialized Queries for Data Warehousing
IF 1.2 | CAS Tier 4 (Computer Science) | Q3 Computer Science | Pub Date: 2021-07-01 | DOI: 10.4018/IJDWM.2021070102
S. Chakraborty
Data from multiple sources are loaded into an organization's data warehouse for analysis. Since some OLAP queries are fired on the warehouse quite frequently, their execution time is reduced by storing the queries and their results in a relational database, referred to as the materialized query database (MQDB). If the tables, fields, functions, and criteria of an input query and a stored query are the same, but the criteria values specified in the WHERE or HAVING clause do not match, the two queries are considered non-synonymous. In the present research, the results of non-synonymous queries are generated by reusing the existing stored results after applying UNION or MINUS operations to them, which reduces the execution time of non-synonymous queries: for superset criteria values of the input query, a UNION operation is applied, and for subset values, a MINUS operation. Incremental processing of existing stored results, where required, is performed using data marts.
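The UNION/MINUS reuse rule can be mimicked with set operations. In this sketch, `fetch` is a hypothetical stand-in for querying the base tables for a given set of criteria values; it is not part of the paper's system.

```python
def reuse_result(stored_criteria, stored_rows, input_criteria, fetch):
    """Answer a non-synonymous query from a stored (materialized) result.

    Superset criteria: UNION the stored rows with rows fetched for the
    extra criteria values only. Subset criteria: MINUS the rows belonging
    to the dropped values. Otherwise, fall back to the base tables.
    """
    stored_criteria, input_criteria = set(stored_criteria), set(input_criteria)
    if input_criteria >= stored_criteria:            # superset -> UNION
        extra = input_criteria - stored_criteria
        return set(stored_rows) | fetch(extra)
    if input_criteria <= stored_criteria:            # subset -> MINUS
        dropped = stored_criteria - input_criteria
        return set(stored_rows) - fetch(dropped)
    raise ValueError("criteria overlap only partially; query base tables")
```

For instance, with a stored result for criteria {A, B}, a query for {A, B, C} fetches only the C rows and unions them in, while a query for {A} subtracts the B rows.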
Citations: 1
Scalable Biclustering Algorithm Considers the Presence or Absence of Properties
IF 1.2 | CAS Tier 4 (Computer Science) | Q3 Computer Science | Pub Date: 2021-01-01 | DOI: 10.4018/IJDWM.2021010103
Abdélilah Balamane
Most existing biclustering algorithms consider only the properties that hold for a set of objects. However, in several application domains, such as organized crime, genetics, or digital marketing, it can be useful to identify homogeneous groups of similar objects in terms of both the presence and the absence of attributes. In this paper, the author proposes a scalable and efficient biclustering algorithm that exploits a binary matrix to produce at least three types of biclusters, in which the selected columns are (1) filled with 1s, (2) filled with 0s, or (3) a mixture of columns filled with 1s and columns filled with 0s. The procedure is scalable and runs without having to consider the complement of the initial binary context. The implementation and validation of the method on several datasets illustrate its potential for discovering relevant patterns.
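The three bicluster types can be checked directly on a binary matrix. This classifier of candidate biclusters is an illustrative sketch, not the paper's mining algorithm.

```python
import numpy as np

def bicluster_type(matrix, rows, cols):
    """Classify a candidate bicluster of a binary matrix by column content:
    'presence' if every selected column is all 1s, 'absence' if all 0s,
    'mixed' if each column is constant but both kinds occur, and None if
    some column is not homogeneous (not a bicluster of these types)."""
    sub = matrix[np.ix_(rows, cols)]
    col_all_one = (sub == 1).all(axis=0)
    col_all_zero = (sub == 0).all(axis=0)
    if not (col_all_one | col_all_zero).all():
        return None
    if col_all_one.all():
        return "presence"
    if col_all_zero.all():
        return "absence"
    return "mixed"
```

Type (3) biclusters capture objects that agree on which attributes they share and which they jointly lack, without ever building the complemented matrix.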
Citations: 2
Enhancing Data Quality at ETL Stage of Data Warehousing
IF 1.2 | CAS Tier 4 (Computer Science) | Q3 Computer Science | Pub Date: 2021-01-01 | DOI: 10.4018/IJDWM.2021010105
Neha Gupta, Sakshi Jolly
Data usually comes into a data warehouse from multiple sources in different formats and is commonly categorized into three groups: structured, semi-structured, and unstructured. Various data mining technologies are used to collect, refine, and analyze the data, which raises the problem of data quality management. Data purgation occurs when the data is subjected to the ETL methodology in order to maintain and improve data quality. The data may contain unnecessary information and inappropriate symbols, which can be classified as dummy values, cryptic values, or missing values. The present work improves the expectation-maximization algorithm with a dot product to handle cryptic data, the DBSCAN method with the Gower metric to handle dummy values, Ward's algorithm with the Minkowski distance to improve the treatment of contradictory data, and the K-means algorithm with the Euclidean distance metric to handle missing values in a dataset. These distance metrics improve data quality and help provide consistent data to be loaded into the data warehouse.
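Two of the distance metrics mentioned can be written compactly: Minkowski (with Euclidean as the p = 2 case used with K-means) and a numeric-only simplification of Gower. The full Gower metric also handles categorical fields, which this sketch omits.

```python
def minkowski(x, y, p):
    """Minkowski distance between two numeric vectors;
    p = 2 recovers the Euclidean metric, p = 1 the Manhattan distance."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def gower_numeric(x, y, ranges):
    """Numeric-only Gower distance: the mean of per-feature absolute
    differences, each scaled by that feature's observed range so features
    on different scales contribute equally."""
    return sum(abs(a - b) / r for a, b, r in zip(x, y, ranges)) / len(x)
```

For example, `minkowski([0, 0], [3, 4], 2)` gives the Euclidean distance 5.0.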
Citations: 2
Development of a Framework for Preserving the Disease-Evidence-Information to Support Efficient Disease Diagnosis
IF 1.2 | CAS Tier 4 (Computer Science) | Q3 Computer Science | Pub Date: 2021-01-01 | DOI: 10.4018/IJDWM.2021040104
V. Rajinikanth, S. Kadry
In the medical domain, detecting acute diseases from medical data plays a vital role in identifying the nature, cause, and severity of a disease with suitable accuracy; this information supports the doctor during decision making and treatment planning. This research aims to develop a framework for preserving disease-evidence information (DEvI) to support an automated disease detection process. The phases of DEvI include (1) data collection, (2) data pre- and post-processing, (3) disease information mining, and (4) implementation of a deep neural network (DNN) architecture to detect the disease. To demonstrate the proposed framework, an assessment of lung nodules (LN) is presented, and the attained results confirm that the framework helps achieve better segmentation as well as classification results. The technique is clinically significant and helps reduce the doctor's diagnostic burden during malignant LN detection.
Citations: 18
Volunteer Data Warehouse: State of the Art
IF 1.2 | CAS Tier 4 (Computer Science) | Q3 Computer Science | Pub Date: 2021-01-01 | DOI: 10.4018/IJDWM.2021070101
Amir Sakka, S. Bimonte, F. Pinet, Lucile Sautot
With the maturity of crowdsourcing systems, new analysis possibilities appear in which volunteers play a crucial role by contributing the implicit knowledge gained from practical, daily experience. At the same time, data warehouse (DW) and OLAP systems are first-class citizens among decision-support systems: they allow huge volumes of data to be analyzed according to the multidimensional model, and the more that model reflects decision-makers' analysis needs, the more successful the DW project. However, when volunteers are involved in the design of DWs, existing DW design methodologies show some limitations. In this work, the authors present the main features of volunteer data warehouse (VDW) design and study the main existing DW design methodologies to find out how they can contribute to the features this particular DW approach requires. To provide a formal framework for classifying existing work, they study the differences between classical DW users and volunteers. The paper also presents a set of open issues for VDW.
Citations: 1
Ranking News Feed Updates on Social Media: A Review and Expertise-Aware Approach
IF 1.2 | CAS Tier 4 (Computer Science) | Q3 Computer Science | Pub Date: 2021-01-01 | DOI: 10.4018/IJDWM.2021010102
S. Belkacem, K. Boukhalfa
Social media are used by hundreds of millions of users worldwide. On these platforms, any user can post updates and share them with individuals in their social network. Because of the sheer volume of data, users are overwhelmed by the updates displayed chronologically in their newsfeed, and most of those updates are irrelevant. Ranking newsfeed updates in order of relevance has been proposed to help users quickly catch up on the relevant ones. In this work, the authors first survey approaches proposed in this area according to four main criteria: the features that may influence relevance, relevance prediction models, training and evaluation methods, and evaluation platforms. They then propose an approach that leverages an additional type of feature: the expertise of the update's author on the corresponding topics. Experimental results on Twitter highlight that judging expertise, which had not previously been considered in either the academic or the industrial community, is crucial for maximizing the relevance of newsfeed updates.
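A linear, expertise-aware relevance score illustrates the general shape of such a ranker. The feature names and the weighted-sum form are assumptions for illustration, not the paper's trained prediction model.

```python
def rank_updates(updates, weights):
    """Score each newsfeed update as a weighted sum of its features —
    including the author's expertise on the update's topic — and return
    the updates sorted by descending relevance."""
    def score(u):
        return sum(weights[f] * u["features"].get(f, 0.0) for f in weights)
    return sorted(updates, key=score, reverse=True)
```

With weights favouring expertise, an update from a topic expert outranks a fresher update from a non-expert, which is the effect the paper's experiments measure.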
Citations: 0