
Latest Publications: J. Univers. Comput. Sci.

Big Data Provenance Using Blockchain for Qualitative Analytics via Machine Learning
Pub Date : 2023-05-28 DOI: 10.3897/jucs.93533
Kashif Mehboob Khan, Warda Haider, N. A. Khan, Darakhshan Saleem
The amount of data is increasing rapidly as more and more devices are linked to the Internet. Big data has a variety of uses and benefits, but it also brings numerous challenges that must be resolved to raise the quality of available services, including data integrity and security, analytics, insight, and the organization of Big data. While seeking the best way to manage, systemize, integrate, and append Big data, we concluded that blockchain methodology contributes significantly. Its approaches to decentralized data management, digital property reconciliation, and Internet of Things data interchange have a massive impact on how Big data will advance. Unauthorized access to the data is very difficult because data in the blockchain network is preserved in encrypted, decentralized form. This paper proposes insights into specific Big data applications that can be analyzed by machine learning algorithms, driven by data provenance, and coupled with blockchain technology to increase data trustworthiness by providing tamper-resistant information about the lineage and chronology of data records. The scenario of record tampering and Big data provenance is illustrated here using diabetes prediction. The study carries out an empirical analysis of hundreds of patient records to evaluate the impact of tampered records on Big data analysis, i.e., diabetes model prediction. From our experiments, we infer that under our blockchain-based system the immutable, tamper-proof metadata connected to the source and evolution of records makes the acquired data verifiable and thus keeps our diabetes prediction model accurate.
Citations: 0
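The tamper-evident provenance the abstract describes can be illustrated with a minimal hash-chain sketch. This is a toy stand-in, not the authors' blockchain implementation, and the patient record fields are hypothetical; it only shows why chained, content-derived metadata makes later tampering detectable:

```python
import hashlib
import json

def record_hash(record: dict, prev_hash: str) -> str:
    """Hash a record together with its predecessor's hash (hash chaining)."""
    payload = json.dumps(record, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode()).hexdigest()

def build_chain(records):
    """Attach provenance metadata (a chained hash) to each record."""
    chain, prev = [], "0" * 64
    for rec in records:
        h = record_hash(rec, prev)
        chain.append({"record": rec, "hash": h})
        prev = h
    return chain

def verify_chain(chain) -> bool:
    """Recompute every link; a tampered record breaks its own and all later hashes."""
    prev = "0" * 64
    for link in chain:
        if record_hash(link["record"], prev) != link["hash"]:
            return False
        prev = link["hash"]
    return True

# Hypothetical patient records feeding a diabetes model
patients = [{"id": 1, "glucose": 148}, {"id": 2, "glucose": 85}]
chain = build_chain(patients)
assert verify_chain(chain)
chain[0]["record"]["glucose"] = 200  # simulate tampering
assert not verify_chain(chain)
```

A prediction pipeline that only accepts records from a chain that verifies would thus train on unaltered data, which is the property the paper's evaluation exercises.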
Topological Similarity and Centrality Driven Hybrid Deep Learning for Temporal Link Prediction
Pub Date : 2023-05-28 DOI: 10.3897/jucs.99169
Abubakhari Sserwadda, Alper Ozcan, Y. Yaslan
Several real-world phenomena, including social, communication, transportation, and biological networks, can be efficiently expressed as graphs. This enables the deployment of graph algorithms that infer information from such complex network interactions to enhance the accuracy of graph applications, including link prediction, node classification, and clustering. However, the large size and complexity of network data limit the efficiency of learning algorithms that make decisions from such graph datasets. To overcome these limitations, graph embedding techniques are usually adopted. However, many studies not only assume static networks but also pay little attention to preserving network topological and centrality information, which is key to analyzing networks. To fill these gaps, we propose a novel end-to-end unified Topological Similarity and Centrality driven Hybrid Deep Learning model for Temporal Link Prediction (TSC-TLP). First, we extract topological similarity and centrality-based features from the raw networks. Next, we systematically aggregate these topological and centrality features to act as inputs for the encoder. In addition, we leverage a long short-term memory (LSTM) layer to learn the underlying temporal information in the graph snapshots. Lastly, we impose topological similarity and centrality constraints on model learning to enforce preservation of the topological structure and node centrality roles of the input graphs in the learned embeddings. The proposed TSC-TLP is tested on three real-world temporal social networks. On average, it exhibits a 4% improvement in link prediction accuracy and a 37% reduction in MSE for centrality prediction over the best benchmark.
Citations: 0
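The first two steps of the pipeline, extracting centrality and topological-similarity features from each graph snapshot, can be sketched in a few lines of plain Python. This is an illustrative reading of the abstract, not the authors' code; the tiny edge lists are made up, and a sequence of such per-snapshot feature vectors is what would feed the LSTM encoder:

```python
def degree_centrality(edges, nodes):
    """Fraction of other nodes each node connects to in one snapshot."""
    deg = {n: 0 for n in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    n = len(nodes)
    return {node: d / (n - 1) for node, d in deg.items()}

def jaccard_similarity(edges, u, v):
    """Topological similarity of two nodes: overlap of their neighbourhoods."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    nu, nv = nbrs.get(u, set()), nbrs.get(v, set())
    return len(nu & nv) / len(nu | nv) if nu | nv else 0.0

# Two hypothetical temporal snapshots of the same 4-node network
snapshots = [
    [(0, 1), (1, 2), (2, 3)],
    [(0, 1), (1, 2), (2, 3), (0, 3)],
]
nodes = [0, 1, 2, 3]
features = [degree_centrality(edges, nodes) for edges in snapshots]
```

The constraint terms described last in the abstract would then penalise embeddings whose pairwise similarities and centrality scores drift from these precomputed values.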
Application of Electronic Nose and Eye Systems for Detection of Adulteration in Olive Oil based on Chemometrics and Optimization Approaches
Pub Date : 2023-04-28 DOI: 10.3897/jucs.90346
Seyedeh Mahsa Mirhoseini-Moghaddam, Mohammad Reza Yamaghani, A. Bakhshipour
In this study, a combined system of electronic nose (e-nose) and computer vision was developed for the detection of adulteration in extra virgin olive oil (EVOO). Canola oil was blended with pure EVOO to produce adulteration at four levels of 5, 10, 15, and 20%. Data collection was carried out using an e-nose system containing 13 metal oxide gas sensors and a computer vision system. Applying principal component analysis (PCA) to the e-nose-extracted features showed that 93% and 92% of the total data variance was covered by the first three PCs generated from the Maximum Sensor Response (MSR) and Area Under Curve (AUC) features, respectively. Cluster analysis verified that pure and impure EVOO samples can be categorized by e-nose properties. PCA-Quadratic Discriminant Analysis (PCA-QDA) classified the EVOOs with an accuracy of 100%. Multiple Linear Regression (MLR) was able to estimate the adulteration percentage with an R2 of 0.8565 and an RMSE of 2.7125 on the validation dataset. Moreover, factor analysis using Partial Least Squares (PLS) identified the MQ3 and TGS2620 sensors as the most important e-nose sensors for EVOO adulteration monitoring. Application of Response Surface Methodology (RSM) to the RGB, HSV, L*, a*, and b* color parameters of the EVOO images revealed that the color parameters are at their optimal state at up to 0.1% canola impurity, where the obtained desirability index was 97%. The results of this study demonstrate the high capability of e-nose and computer vision systems for accurate, fast, and non-destructive detection of adulteration in EVOO; detection of food adulteration may be more reliable using these artificial senses.
Citations: 0
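The reported R2 of 0.8565 and RMSE of 2.7125 are standard regression metrics; as a minimal sketch (with made-up adulteration levels and MLR estimates, not the paper's data), they are computed as:

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between targets and predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual variance / total variance."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical adulteration levels (%) vs. MLR estimates on a validation set
y_true = [0, 5, 10, 15, 20]
y_pred = [1.2, 4.1, 11.0, 14.2, 18.9]
error = rmse(y_true, y_pred)
fit = r_squared(y_true, y_pred)
```

An RMSE of 2.7125 on a 0-20% scale means the MLR estimate of the canola fraction is typically off by under three percentage points.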
Cost-Effective Scheduling in Fog Computing: An Environment Based on Modified PROMETHEE Technique
Pub Date : 2023-04-28 DOI: 10.3897/jucs.90429
Shefali Varshney, Rajinder Sandhu, P. K. Gupta
With the rising use of Internet of Things (IoT)-enabled devices, there is a significant increase in the use of smart applications that respond in real time. This rising demand raises many issues such as scheduling, cost, and overloading of servers. To overcome these, a cost-effective scheduling technique has been proposed for the allocation of smart applications. The aim of this paper is to provide better profit for the Fog environment and to minimize the cost of smart applications on the user side. The proposed framework has been evaluated on a test bed comprising four analysis phases and is compared on the basis of five metrics: average allocation time, average profit for the Fog environment, average cost of smart applications, resource utilization, and number of applications run within a given latency. The proposed framework performs better under all of these metrics.
Citations: 0
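PROMETHEE, the multi-criteria method the title refers to, ranks alternatives by net outranking flow. The following is a generic PROMETHEE II sketch with the simple "usual" (step) preference function and hypothetical fog nodes and criteria; the paper's modified variant and its actual criteria are not reproduced here:

```python
def promethee_ii(alternatives, weights, maximize):
    """Rank alternatives by net outranking flow (usual preference function).

    alternatives: {name: [criterion values]}; weights should sum to 1;
    maximize[j] is True if criterion j is to be maximised.
    """
    names = list(alternatives)
    n = len(names)

    def pref(a, b):
        # Weighted share of criteria on which a strictly beats b
        total = 0.0
        for j, w in enumerate(weights):
            diff = alternatives[a][j] - alternatives[b][j]
            if not maximize[j]:
                diff = -diff
            total += w * (1.0 if diff > 0 else 0.0)
        return total

    flows = {}
    for a in names:
        plus = sum(pref(a, b) for b in names if b != a) / (n - 1)
        minus = sum(pref(b, a) for b in names if b != a) / (n - 1)
        flows[a] = plus - minus  # net flow: leaving minus entering
    return sorted(names, key=flows.get, reverse=True)

# Hypothetical fog nodes scored on cost (minimise) and capacity (maximise)
nodes = {"fog1": [3, 95], "fog2": [7, 90], "fog3": [5, 60]}
ranking = promethee_ii(nodes, weights=[0.5, 0.5], maximize=[False, True])
```

Here "fog1" dominates on both criteria, so it tops the ranking; a scheduler would allocate the next smart application to it.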
Human Mobility Prediction with Region-based Flows and Road Traffic Data
Pub Date : 2023-04-28 DOI: 10.3897/jucs.94514
Fernando Terroso-Sáenz, Andrés Muñoz
Predicting human mobility is a key element in the development of intelligent transport systems. Current digital technologies enable capturing a wealth of data on mobility flows between geographic areas, which is then used to train machine learning models to predict these flows. However, most works have considered either a single data source for building these models or different sources covering the same spatial area. In this paper we propose to augment a macro open-data mobility study based on cellular phones with data from a road traffic sensor located on a specific motorway within one of the study's mobility areas. The results show that models trained with the fusion of both types of data, especially long short-term memory (LSTM) and Gated Recurrent Unit (GRU) neural networks, provide a more reliable prediction than models based only on the open data source. These results show that it is possible to predict the traffic entering a particular city in the next 30 minutes with an absolute error of less than 10%. Thus, this work is a further step towards improving the prediction of human mobility in interurban areas by fusing open data with data from IoT systems.
Citations: 0
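One plausible way to prepare fused inputs for the LSTM/GRU models is to pair each 30-minute step's region-level flow with the road-sensor reading and slide a history window over the joint series. This is an assumed preprocessing sketch, not the authors' pipeline, and all numbers are invented:

```python
def make_windows(flows, sensor, width):
    """Build (history window, next flow) training pairs from two fused series."""
    X, y = [], []
    for i in range(len(flows) - width):
        # Fusion: one (flow, sensor) tuple per time step in the window
        window = [(flows[j], sensor[j]) for j in range(i, i + width)]
        X.append(window)
        y.append(flows[i + width])  # target: the next region-level flow
    return X, y

flows = [120, 130, 125, 140, 150, 145]   # hypothetical trips into a city per 30 min
sensor = [80, 85, 82, 90, 96, 94]        # hypothetical road-sensor vehicle counts
X, y = make_windows(flows, sensor, width=3)
```

Each element of `X` is then a short multivariate sequence, the shape recurrent models such as LSTM and GRU consume directly.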
Feature Fusion and NRML Metric Learning for Facial Kinship Verification
Pub Date : 2023-04-28 DOI: 10.3897/jucs.89254
Fahimeh Ramazankhani, Mahdi Yazdian Dehkordi, M. Rezaeian
Features extracted from facial images are used in various fields such as kinship verification. A kinship verification system determines the kin or non-kin relation between a pair of facial images by analysing their facial features. In this research, different texture and color features are used together with a metric learning method to verify kinship for the four relations of father-son, father-daughter, mother-son, and mother-daughter. First, effective features are fused and NRML metric learning is used to generate a discriminative feature vector; an SVM classifier is then used to verify the kinship relation. To measure the accuracy of the proposed method, the KinFaceW-I and KinFaceW-II databases are used. The evaluation results show that the feature fusion and NRML metric learning methods improve the performance of the kinship verification system. In addition to the proposed approach, the effect of extracting features from image blocks rather than the whole image is investigated, and the results are presented. The results indicate that block-wise feature extraction can be effective in improving the final accuracy of kinship verification.
Citations: 0
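The "feature fusion" step, concatenating several per-image descriptors into one vector before comparing a pair, can be sketched as below. Cosine similarity with a threshold stands in for the learned NRML metric plus SVM here, purely for illustration; the descriptors and threshold are hypothetical:

```python
import math

def fuse(*feature_vectors):
    """Feature fusion by concatenating per-image descriptors (e.g. texture + color)."""
    fused = []
    for v in feature_vectors:
        fused.extend(v)
    return fused

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def verify(parent_descriptors, child_descriptors, threshold=0.8):
    """Declare 'kin' if the fused descriptors are similar enough.

    Stand-in decision rule: the paper instead learns an NRML metric
    and classifies the pair with an SVM.
    """
    return cosine(fuse(*parent_descriptors), fuse(*child_descriptors)) >= threshold
```

NRML would effectively replace the fixed cosine geometry with a metric learned so that true kin pairs are pulled together and non-kin neighbours pushed apart.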
Automatic assignment of diagnosis codes to free-form text medical note
Pub Date : 2023-04-28 DOI: 10.3897/jucs.89923
Stefan Strydom, Andrei Michael Dreyer, Brink van der Merwe
International Classification of Disease (ICD) coding plays a significant role in classifying morbidity and mortality rates. Currently, ICD codes are assigned to a patient’s medical record by hand by medical practitioners or specialist clinical coders. This practice is prone to errors, and training skilled clinical coders requires time and human resources. Automatic prediction of ICD codes can help alleviate this burden. In this paper, we propose a transformer-based architecture with label-wise attention for predicting ICD codes on a medical dataset. The transformer model is first pre-trained from scratch on a medical dataset. Once this is done, the pre-trained model is used to generate representations of the tokens in the clinical documents, which are fed into the label-wise attention layer. Finally, the outputs from the label-wise attention layer are fed into a feed-forward neural network to predict appropriate ICD codes for the input document. We evaluate our model using hospital discharge summaries and their corresponding ICD-9 codes from the MIMIC-III dataset. Our experimental results show that our transformer model outperforms all previous models in terms of micro-F1 on the full label set from the MIMIC-III dataset. This is also the first successful application of a pre-trained transformer architecture to the auto-coding problem on the full MIMIC-III dataset.
Citations: 0
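Label-wise attention, the layer between the encoder and the feed-forward classifier, computes one attention distribution over the document's tokens per ICD label, so each label pools its own view of the text. A minimal dependency-free sketch of that operation (token encodings and label queries are tiny made-up vectors, not the paper's learned parameters):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def label_wise_attention(tokens, label_queries):
    """Return one attention-pooled vector per label.

    tokens: encoder outputs, one vector per token.
    label_queries: one (learned, here fixed) query vector per ICD label.
    """
    pooled = []
    for q in label_queries:
        scores = [sum(t_i * q_i for t_i, q_i in zip(t, q)) for t in tokens]
        weights = softmax(scores)          # per-label distribution over tokens
        dim = len(tokens[0])
        pooled.append([sum(w * t[d] for w, t in zip(weights, tokens))
                       for d in range(dim)])
    return pooled

tokens = [[0.1, 0.9], [0.8, 0.2], [0.4, 0.4]]   # hypothetical token encodings
queries = [[1.0, 0.0], [0.0, 1.0]]              # one query per ICD code
per_label = label_wise_attention(tokens, queries)
```

Each pooled vector then goes through the feed-forward network to produce that label's probability, which is what lets a single document receive many ICD codes at once.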
Undergraduate research in software engineering. An experience and evaluation report
Pub Date : 2023-03-28 DOI: 10.3897/jucs.95718
Gerardo Matturro
The purpose of this paper is to present an undergraduate research experience process model and the evaluation of seven years of its application in an undergraduate research program in software engineering. Undergraduate students who participated in research projects between 2015 and 2022 were surveyed to find out a) their motivations for participating in research projects in software engineering, b) the skills they consider they have acquired or improved by participating in those projects, and c) their perception of benefits and utility for their future work and professional activities. Results reveal that participation in real research projects in software engineering is highly valued by undergraduate students, who perceive benefits in the development of research and soft skills, and for their future professional activity. In addition, these undergraduate research projects and the process followed show that it is feasible to make original contributions to the body of knowledge of software engineering.
Citations: 0
Leveraging Structural and Semantic Measures for JSON Document Clustering
Pub Date : 2023-03-28 DOI: 10.3897/jucs.86563
Uma Priya D, P. S. Thilagam
In recent years, the increased use of smart devices and digital business opportunities has generated massive heterogeneous JSON data daily, making efficient data storage and management more difficult. Existing research uses different similarity metrics and clusters the documents to support the above tasks effectively. However, extant approaches have focused on either structural or semantic similarity of schemas. As JSON documents are application-specific, differently annotated JSON schemas are not only structurally heterogeneous but also differ by the context of the JSON attributes. Therefore, there is a need to consider the structural, semantic, and contextual properties of JSON schemas to perform meaningful clustering of JSON documents. This work proposes an approach to cluster heterogeneous JSON documents using the similarity fusion method. The similarity fusion matrix is constructed using structural, semantic, and contextual measures of JSON schemas. The experimental results demonstrate that the proposed approach outperforms the existing approaches significantly. 
{"title":"Leveraging Structural and Semantic Measures for JSON Document Clustering","authors":"Uma Priya D, P. S. Thilagam","doi":"10.3897/jucs.86563","DOIUrl":"https://doi.org/10.3897/jucs.86563","url":null,"abstract":"In recent years, the increased use of smart devices and digital business opportunities has generated massive heterogeneous JSON data daily, making efficient data storage and management more difficult. Existing research uses different similarity metrics and clusters the documents to support the above tasks effectively. However, extant approaches have focused on either structural or semantic similarity of schemas. As JSON documents are application-specific, differently annotated JSON schemas are not only structurally heterogeneous but also differ by the context of the JSON attributes. Therefore, there is a need to consider the structural, semantic, and contextual properties of JSON schemas to perform meaningful clustering of JSON documents. This work proposes an approach to cluster heterogeneous JSON documents using the similarity fusion method. The similarity fusion matrix is constructed using structural, semantic, and contextual measures of JSON schemas. The experimental results demonstrate that the proposed approach outperforms the existing approaches significantly. ","PeriodicalId":14652,"journal":{"name":"J. Univers. Comput. Sci.","volume":"27 1","pages":"222-241"},"PeriodicalIF":0.0,"publicationDate":"2023-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81175532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
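The fusion idea in the abstract above can be sketched in a few lines: flatten each JSON document into its set of key paths, compute a structural and a semantic similarity, and combine them into one fused score used for clustering. The paper's actual measures are not reproduced here; Jaccard over key paths (structural), token overlap of attribute names (a crude semantic stand-in), the fusion weights, and the greedy single-link clustering are all illustrative assumptions.

```python
# Illustrative similarity-fusion clustering of JSON documents.
# Measures, weights, and the clustering procedure are stand-ins,
# not the method from the paper.

def key_paths(doc, prefix=""):
    """Flatten a JSON-like object into its set of key paths (structure only)."""
    paths = set()
    if isinstance(doc, dict):
        for k, v in doc.items():
            p = f"{prefix}/{k}"
            paths.add(p)
            paths |= key_paths(v, p)
    elif isinstance(doc, list):
        for item in doc:
            paths |= key_paths(item, prefix + "/[]")
    return paths

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 1.0

def fused_similarity(d1, d2, w_struct=0.6, w_sem=0.4):
    """Weighted fusion of a structural and a (crude) semantic measure."""
    struct = jaccard(key_paths(d1), key_paths(d2))
    # Semantic stand-in: overlap of the attribute-name vocabulary.
    toks1 = {p.rsplit("/", 1)[-1].lower() for p in key_paths(d1)}
    toks2 = {p.rsplit("/", 1)[-1].lower() for p in key_paths(d2)}
    sem = jaccard(toks1, toks2)
    return w_struct * struct + w_sem * sem

def cluster(docs, threshold=0.5):
    """Greedy single-link clustering on the fused similarities."""
    clusters = []
    for i, doc in enumerate(docs):
        for c in clusters:
            if any(fused_similarity(doc, docs[j]) >= threshold for j in c):
                c.append(i)
                break
        else:
            clusters.append([i])
    return clusters

docs = [
    {"user": {"name": "a", "age": 1}},
    {"user": {"name": "b", "age": 2}},
    {"order": {"id": 7, "items": [{"sku": "x"}]}},
]
print(cluster(docs))  # the two "user" documents group together: [[0, 1], [2]]
```

In a fuller implementation, the semantic measure would come from an embedding or a lexical resource, and the fused matrix would feed a proper clustering algorithm rather than this greedy pass.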
CIMLA: A Modular and Modifiable Data Preparation, Organization, and Fusion Infrastructure to Partially Support the Development of Context-aware MMLA Solutions
Pub Date : 2023-03-28 DOI: 10.3897/jucs.84558
Shashi Kant Shankar, Adolfo Ruiz-Calleja, L. Prieto, M. Rodríguez-Triana, Pankaj Chejara, Sandesh Tripathi
Multimodal Learning Analytics (MMLA) solutions aim to provide a more holistic picture of a learning situation by processing multimodal educational data. Considering contextual information of a learning situation is known to help in providing more relevant outputs to educational stakeholders. However, most of the MMLA solutions are still in prototyping phase and dealing with different dimensions of an authentic MMLA situation that involve multiple cross-disciplinary stakeholders like teachers, researchers, and developers. One of the reasons behind still being in prototyping phase of the development lifecycle is related to the challenges that software developers face at different levels in developing context-aware MMLA solutions. In this paper, we identify the requirements and propose a data infrastructure called CIMLA. It includes different data processing components following a standard data processing pipeline and considers contextual information following a data structure. It has been evaluated in three authentic MMLA scenarios involving different cross-disciplinary stakeholders following the Software Architecture Analysis Method. Its fitness was analyzed in each of the three scenarios and developers were interviewed to assess whether it meets functional and non-functional requirements. Results showed that CIMLA supports modularity in developing context-aware MMLA solutions and each of its modules can be reused with required modifications in the development of other solutions. In the future, the current involvement of a developer in customizing the configuration file to consider contextual information can be investigated.
{"title":"CIMLA: A Modular and Modifiable Data Preparation, Organization, and Fusion Infrastructure to Partially Support the Development of Context-aware MMLA Solutions","authors":"Shashi Kant Shankar, Adolfo Ruiz-Calleja, L. Prieto, M. Rodríguez-Triana, Pankaj Chejara, Sandesh Tripathi","doi":"10.3897/jucs.84558","DOIUrl":"https://doi.org/10.3897/jucs.84558","url":null,"abstract":"Multimodal Learning Analytics (MMLA) solutions aim to provide a more holistic picture of a learning situation by processing multimodal educational data. Considering contextual information of a learning situation is known to help in providing more relevant outputs to educational stakeholders. However, most of the MMLA solutions are still in prototyping phase and dealing with different dimensions of an authentic MMLA situation that involve multiple cross-disciplinary stakeholders like teachers, researchers, and developers. One of the reasons behind still being in prototyping phase of the development lifecycle is related to the challenges that software developers face at different levels in developing context-aware MMLA solutions. In this paper, we identify the requirements and propose a data infrastructure called CIMLA. It includes different data processing components following a standard data processing pipeline and considers contextual information following a data structure. It has been evaluated in three authentic MMLA scenarios involving different cross-disciplinary stakeholders following the Software Architecture Analysis Method. Its fitness was analyzed in each of the three scenarios and developers were interviewed to assess whether it meets functional and non-functional requirements. Results showed that CIMLA supports modularity in developing context-aware MMLA solutions and each of its modules can be reused with required modifications in the development of other solutions. 
In the future, the current involvement of a developer in customizing the configuration file to consider contextual information can be investigated.","PeriodicalId":14652,"journal":{"name":"J. Univers. Comput. Sci.","volume":"25 1","pages":"265-297"},"PeriodicalIF":0.0,"publicationDate":"2023-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82094807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
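The modular, context-aware pipeline the abstract describes can be illustrated with a small sketch: interchangeable stages for data preparation, organization, and fusion, each of which may consult a shared context object. The stage names, the `Context` fields, and the fusion rule below are hypothetical examples, not CIMLA's actual components or data structure.

```python
# Hypothetical sketch of a modular, context-aware processing pipeline in the
# spirit of CIMLA. All names and the context structure are illustrative.
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class Context:
    """Contextual information about the learning situation."""
    activity: str
    group_size: int
    extras: dict = field(default_factory=dict)

Stage = Callable[[Any, Context], Any]

class Pipeline:
    """Chain of interchangeable stages; each stage may consult the context."""
    def __init__(self):
        self.stages: list[Stage] = []

    def add(self, stage: Stage) -> "Pipeline":
        self.stages.append(stage)  # modules can be swapped independently
        return self

    def run(self, data: Any, ctx: Context) -> Any:
        for stage in self.stages:
            data = stage(data, ctx)
        return data

# Example stages: prepare (clean), organize (group by modality), fuse.
def prepare(records, ctx):
    return [r for r in records if r.get("value") is not None]

def organize(records, ctx):
    groups = {}
    for r in records:
        groups.setdefault(r["modality"], []).append(r["value"])
    return groups

def fuse(groups, ctx):
    # Context-aware fusion: normalize speech counts by the group size.
    return {m: (sum(v) / ctx.group_size if m == "speech" else sum(v))
            for m, v in groups.items()}

ctx = Context(activity="collaborative writing", group_size=4)
pipe = Pipeline().add(prepare).add(organize).add(fuse)
records = [
    {"modality": "speech", "value": 8},
    {"modality": "keystrokes", "value": 120},
    {"modality": "speech", "value": None},  # dropped by prepare
]
print(pipe.run(records, ctx))  # {'speech': 2.0, 'keystrokes': 120}
```

The point of the design is the one the paper evaluates: because each stage has the same signature, a module can be replaced (for a new sensor, a different grouping, another fusion rule) without touching the rest of the pipeline.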