Latest papers from 2021 the 5th International Conference on Information System and Data Mining

Weighted Ensemble of Neural and Probabilistic Graphical Models for Click Prediction
Pub Date : 2021-05-27 DOI: 10.1145/3471287.3471307
Kritarth Bisht, Seba Susan
Predicting user behavior in web mining is an important concept with commercial implications. The user response to search engine results is crucial for understanding the relative popularity of websites and market trends. The most popular way of understanding user interests is via click models that can predict whether a user will click on a search engine result or not, based on past observations. There are two main categories of click models, namely, the neural network based models and the probabilistic graphical models. In this paper, we combine the goodness of both approaches by presenting a weighted ensemble of both types of models. The weighted sum of softmax scores integrates the predictions of the individual models. Assigning higher weights to the neural models is found to improve the performance of the ensemble. The AUC and perplexity scores of our weighted ensemble model are higher than the state of the art, as proved by experiments on the benchmark Tiangong-ST dataset.
Citations: 3
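The weighted sum of scores described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `weighted_ensemble`, the example probabilities, and the 0.7/0.3 weights are all assumptions chosen only to show a heavier weight on the neural model.

```python
from typing import List, Sequence


def weighted_ensemble(
    scores: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Combine per-model click probabilities with a normalized weighted sum.

    scores[m][i] is model m's predicted click probability for result i.
    """
    if len(scores) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_items = len(scores[0])
    if any(len(s) != n_items for s in scores):
        raise ValueError("all models must score the same results")
    # Dividing by the weight total keeps the output a valid probability.
    return [
        sum(w * s[i] for w, s in zip(weights, scores)) / total
        for i in range(n_items)
    ]


# Hypothetical outputs of one neural and one probabilistic graphical click model.
neural = [0.80, 0.30, 0.10]
pgm = [0.60, 0.50, 0.20]
ensemble = weighted_ensemble([neural, pgm], weights=[0.7, 0.3])
print(ensemble)
```

In practice the weights would be tuned on held-out data; the values here are placeholders.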
Characterization of the organizational climate in public schools from the teacher's perception using the Estanones scale
Pub Date : 2021-05-27 DOI: 10.1145/3471287.3471310
Karim Roca, Belinda Navarro, Hector Carlos, Edwin Delgado, M. Ore
The objective of this work is to determine the degree, direction and significance of the relationship between transformational leadership and the organizational climate among teachers in publicly managed educational institutions. The stratified random sample consisted of 120 teachers. The research followed a quantitative approach with a correlational, cross-sectional design. The information was collected with the Transformational Leadership Scale and the Organizational Climate Scale. The content validity and reliability of the instruments were corroborated according to the standards of the scientific community with the Aiken validity coefficient, Cronbach's alpha coefficient and the Kuder-Richardson coefficient (KR-20), respectively. The statistical analysis of the data used the Estanones scale to describe the qualitative levels of the variables, and the parametric Pearson correlation coefficient (r) for the hypothesis test. The results showed direct correlations of moderate intensity, while the dimension of inspirational communication showed a low direct correlation with the organizational climate. Finally, the findings were statistically significant at a probability level of 0.05.
Citations: 0
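For readers unfamiliar with the Pearson correlation coefficient (r) named in the abstract above, a minimal sketch of its computation follows. The two score lists are invented for illustration and do not come from the study; `pearson_r` is a hypothetical helper name.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (spread_x * spread_y)


# Hypothetical questionnaire scores for five teachers.
leadership = [1, 2, 3, 4, 5]
climate = [2, 4, 5, 4, 5]
r = pearson_r(leadership, climate)
print(round(r, 3))
```

A positive r indicates a direct correlation; whether it is statistically significant depends on the sample size and the chosen probability level.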
An Application of Generative Adversarial Networks for Robust Inference in Computational Fluid Dynamics
Pub Date : 2021-05-27 DOI: 10.1145/3471287.3471304
Chaity Banerjee, Chad Lilian, D. Reasor, E. Pasiliao, Tathagata Mukherjee
In this paper we propose a robust learning pipeline for inference in computational fluid dynamics (CFD) systems in the presence of faulty sensor data. The standard methods for handling faulty sensor data involve outlier detection techniques which assume that the faulty data is generated from the tail regions of the underlying data distribution and hence can be eliminated by modeling the high probability regions of the distribution. However this assumption is not always true and subtle faults in sensors can lead to recording of faulty data which can be thought of as being generated from a subtly perturbed version of the underlying distribution. Methods based on outlier detection techniques will fail to work under these settings and hence novel approaches are required for eliminating faulty data in such systems. In this work we explore the use of a Generative Adversarial Network (GAN) for this purpose. We train the generator network of the GAN to generate “fake” sensor data that mimics the distribution of the real data, albeit, a slightly perturbed one. We use this to train a discriminator network which learns to distinguish between the “real” and “fake” data generated from the generator. This discriminator is then used to filter out faulty sensor data generated from a perturbed version of the distribution generating the real data. We also build a simple regressor that uses the trained discriminator to perform robust regression on the CFD data after eliminating faulty sensor data. We tested the robust regression pipeline with CFD data for predicting fluid flow characteristics (specifically the angle of attack (AoA)) over a 2D foil. Our discriminator trained in a GAN framework could eliminate faulty sensor data, generated using the trained generator, with ∼ 100 % efficiency. The filtered data is then used for inference of the fluid flow parameters using the regressor.
Citations: 1
The Application of Machine Learning in Bitcoin Ransomware Family Prediction
Pub Date : 2021-05-27 DOI: 10.1145/3471287.3471300
Shengyun Xu
In recent years, ransomware attacks have become increasingly rampant, causing heavy losses for many large companies and financial institutions. Bitcoin is a means of payment demanded by ransomware families. By comparing and analyzing the characteristics of bitcoin transactions, we can predict the ransomware family involved. In this paper, machine learning algorithms are therefore used to propose a prediction method for ransomware families, with the aim of helping attacked institutions avoid extortion. In the traditional approach, identifying the ransomware family relies on human experience and subjective judgment rather than on accurate, batch analysis of Bitcoin transactions and prediction results. In this paper, a large number of known data sets of bitcoin transaction features are used for analysis and modeling. First, we carried out a descriptive statistical analysis to explore how the bitcoin trading behavior of different ransomware families differs. Next, we used a series of machine learning models to build a ransomware family prediction model and perform identification and classification, so as to help avoid financial losses from ransomware. Finally, we found that the ransomware family was most significantly affected by year. In addition, the Boosting model had the highest accuracy, with a test error of only about 3%.
Citations: 2
Selection and Verification of Privacy Parameters for Local Differentially Private Data Aggregation
Pub Date : 2021-05-27 DOI: 10.1145/3471287.3471306
Snehkumar Shahani, Abraham Jibi, R. Venkateswaran
Acquiring and aggregating data from a group of individuals is crucial for studying their general behavior. Differentially Private (DP) techniques, characterized by the parameter ϵ, help to protect Individually Identifiable Data (IID) of individuals participating in such data collection. However, such techniques affect the usefulness of the data, leading to a trade-off between usefulness and privacy and making the selection of ϵ an important problem before data acquisition. In this work, we use a mathematical formalism to estimate usefulness and privacy for the sum query as the aggregate analysis under the local model of privacy. The mathematical relation enables the application of a variety of optimization techniques, discussed in the work, to select an optimal value of ϵ. Existing methods for selecting ϵ are based on financial parameters, but they rely heavily on past data and domain knowledge, which may not be available in many cases. To address this, we provide Knee-point based recommendations along with a selection criterion to choose the method of recommendation depending on the availability of information. This allows analysts to make informed decisions while negotiating the value of ϵ. Our experiments on synthetic and real-world datasets demonstrate the strength of the mathematical model and the recommended values.
Citations: 1
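The trade-off between usefulness and privacy for a sum query under the local model, as described in the abstract above, can be illustrated with a small sketch. The abstract does not say which mechanism the authors use; the Laplace mechanism, the assumed value range [0, 1], and every function name below are assumptions made only for this illustration.

```python
import math
import random
from typing import Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(value: float, epsilon: float, rng: random.Random) -> float:
    """Perturb one user's value (assumed to lie in [0, 1]) before it is sent."""
    sensitivity = 1.0  # width of the assumed value range [0, 1]
    return value + laplace_noise(sensitivity / epsilon, rng)


def private_sum(values: Sequence[float], epsilon: float, rng: random.Random) -> float:
    """The aggregator only ever sees the already-perturbed reports."""
    return sum(privatize(v, epsilon, rng) for v in values)


def noise_std_of_sum(n_users: int, epsilon: float) -> float:
    """Each Laplace(b) draw has variance 2*b**2 with b = 1/epsilon."""
    return math.sqrt(2.0 * n_users) / epsilon


rng = random.Random(0)
values = [rng.random() for _ in range(1000)]  # synthetic user data
true_sum = sum(values)
for epsilon in (0.1, 1.0, 10.0):
    noisy = private_sum(values, epsilon, rng)
    print(epsilon, abs(noisy - true_sum), noise_std_of_sum(len(values), epsilon))
```

Under these assumptions the expected error of the sum shrinks in proportion to 1/ϵ, which is the kind of relation an analyst could feed into a knee-point or other optimization method when choosing ϵ.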
Development of Virtual Skill Trainers and Their Validation Study Analysis Using Machine Learning
Pub Date : 2021-05-27 DOI: 10.1145/3471287.3471296
Seema Shedage, Jake Farmer, Doga Demirel, Tansel Halic, S. Kockara, V. Arikatla, K. Sexton, Shahryar Ahmadi
Minimally invasive skills assessment is important in developing competent surgical simulators and executing reliable skills evaluation [9]. Arthroscopic and laparoscopic surgeries are considered Minimally Invasive Surgeries (MIS). In MIS, the surgeon operates through small incisions with specialized narrow instruments, fiberoptic lights, and a monitor. Arthroscopic surgery is used to diagnose and treat joint problems, and laparoscopic procedures are performed on the abdominal cavity. Due to non-natural hand-eye coordination, a narrow field of view, and limited instrument control, MIS is challenging to master. We analyze data from two simulators, the Virtual Arthroscopic Tear Diagnosis and Evaluation Platform (VATDEP) and the Gentleness Simulator. Both simulators went through validation studies with human subjects. We recorded simulation data during the validation studies, such as tool motion, position, and task time. The recorded data went through preprocessing; after data cleaning, we extracted features from the recorded data and normalized them. The normalized features were used as input to various machine learning algorithms, including K-nearest neighbor (KNN), support vector machine (SVM), and logistic regression (LR). The average accuracy was evaluated through k-fold cross-validation. The proposed methods were validated using 10 subjects (5 experts, 5 novices) for the VATDEP simulator and 23 subjects (4 experts and 19 novices) for the Gentleness Simulator. The results show a significant difference between the expert and novice populations, with p < 0.05 using the Mann-Whitney U-test. The average accuracy of the classification algorithms is 74% for the VATDEP simulator and 80% for the Gentleness Simulator. The results show that the normalized features with KNN, SVM, and LR classifiers can provide accurate classification of experts and novices. The evaluation technique proposed in this study can advance surgical training by providing appropriate feedback to trainees to evaluate proficiency.
Citations: 0
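The Mann-Whitney U-test mentioned in the abstract above compares two groups by rank rather than by raw value. The sketch below computes only the U statistics, with tied values given their average rank. The task times are made up for illustration and are not the study's data, and the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def mann_whitney_u(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (U_a, U_b), computed from the rank sum of the first sample."""
    ranks = average_ranks(list(a) + list(b))
    rank_sum_a = sum(ranks[: len(a)])
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a
    return u_a, u_b


# Hypothetical task times in seconds for five experts and five novices.
experts = [10, 12, 11, 9, 13]
novices = [20, 18, 25, 22, 19]
u_experts, u_novices = mann_whitney_u(experts, novices)
print(u_experts, u_novices)
```

A U value near 0 for one group means its values almost never exceed the other group's; turning U into a p-value additionally requires the exact distribution or a normal approximation, which this sketch leaves out.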
Nepal Stock Market Movement Prediction with Machine Learning
Pub Date : 2021-05-27 DOI: 10.1145/3471287.3471289
Shu-Fei Zhao
Financial market prediction has been a popular research theme in recent years. However, the majority of previous studies focus on markets in large countries like China and the United States, while some small countries have drawn less attention. To fill this gap in the current literature, we use and compare 17 types of machine learning models to predict the Nepal stock market in this paper. Based on stock prices, 10 technical indicators were computed as input features. In addition, we added emotional factors extracted from financial news to improve the prediction performance, which was evaluated by accuracy and F1 score. We predicted whether the closing price would rise or fall over three horizons: 1-day movement, 15-day movement and 30-day movement. From our experimental results, we found that linear SVM and XGBoost perform best and are the best options for further consideration in the trading process.
Citations: 0
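The prediction target in the abstract above (whether the closing price rises or falls after a given horizon) and a price-based input feature can be sketched as below. The abstract does not list the 10 technical indicators, so the simple moving average is used purely as an assumed example; the prices and function names are likewise hypothetical.

```python
from typing import List, Sequence


def movement_labels(closes: Sequence[float], horizon: int) -> List[int]:
    """1 if the close `horizon` days later is higher, else 0.

    The last `horizon` days have no future price and receive no label.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1 day")
    return [
        1 if closes[t + horizon] > closes[t] else 0
        for t in range(len(closes) - horizon)
    ]


def simple_moving_average(closes: Sequence[float], window: int) -> List[float]:
    """Mean of the most recent `window` closes, starting once a full window exists."""
    if window < 1:
        raise ValueError("window must be at least 1 day")
    return [
        sum(closes[t - window + 1 : t + 1]) / window
        for t in range(window - 1, len(closes))
    ]


# Hypothetical daily closing prices.
closes = [100, 102, 101, 105, 107, 106]
labels_1d = movement_labels(closes, horizon=1)
labels_2d = movement_labels(closes, horizon=2)
sma_3 = simple_moving_average(closes, window=3)
print(labels_1d, labels_2d, sma_3)
```

With a real price series the same labeling function would be called with horizons of 1, 15 and 30 days, and features must be aligned so that each row uses only information available on or before day t.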
Recommender System: Personalizing User Experience or Scientifically Deceiving Users?
Pub Date : 2021-05-27 DOI: 10.1145/3471287.3471303
Ramachandran Trichur Narayanan
Recommender systems are taking the lead among the many things that the digital world offers today to every customer visiting online portals for any service. Since gaining popularity at the time of the Netflix competition, recommender systems have become more visible and an important marketing and sales tool for corporations augmenting their offers online. Ongoing research initiatives in recommender systems, large datasets available for users across the globe, and corporate collaborations have led to improved algorithms and reduced errors in estimating predictions. Software and hardware tools that enable easy gathering of implicit and explicit data have helped recommender systems adapt quickly to the needs of users. Against this background, the possibility that a recommender system induces the customer toward pre-determined items by presenting fabricated predictions, as if they resulted from scientific principles, needs to be considered. In this paper, we give an overview of the recommender system, discuss how its various components may be manipulated to lure innocent customers with false ratings, and discuss the importance of engaging stakeholders to develop a robust recommender system.
Citations: 1
Email Clustering & Generating Email Templates Based on Their Topics
Pub Date : 2021-05-27 DOI: 10.1145/3471287.3471298
Fatih Coşkun, C. Gezer, V. C. Gungor
Email templates have a significant impact on user productivity. A well-produced email template conveys the main information and makes a strong impression. While previous studies focused on generating emails from text differences in email content, templates generated from email topics can provide better productivity for companies. This article proposes a system in which user emails are clustered according to their topics, and introduces an email template generation system that uses sample emails belonging to the resulting clusters. For this purpose, the Enron email dataset was used, and the performance of different text preprocessing and topic modeling algorithms, such as DMM, GPU-DMM, GPU-PDMM, LF-DMM, LDA, LF-LDA, BTM, WNTM, PTM, and SATM, was investigated and compared to determine the most efficient one. After obtaining the email topics, the system shows examples of emails representing the selected topics and enables authorized users to create templates that generalize these topics.
Citations: 0
LASTD: A Manually Annotated and Tested Large Arabic Sentiment Tweets Dataset
Pub Date : 2021-05-27 DOI: 10.1145/3471287.3471293
Kariman Elshakankery, M. Fayek, Mona Farouk
With the growing attention towards Arabic Sentiment Analysis (SA), the availability of annotated datasets has risen. Although acquiring data from social media platforms, microblogs and similar sources is an easy task, annotation is the hard part. Dataset annotation requires a lot of tedious manual work, which is a major problem. In addition, some datasets are built in-house and are not available for public access. This paper introduces LASTD, a manually annotated dataset for sentiment analysis of Arabic tweets, along with insight into its statistics and benchmarks. It consists of more than 15K Arabic tweets annotated as positive, negative or neutral. Using 10-fold cross-validation, three different classifiers were trained and tested on a 3-class classification problem and a 2-class classification problem. The support vector machine (SVM) classifier tends to have the highest accuracy. LASTD is made public for academic research.
Citations: 4