Proceedings of the 22nd International Conference on Information Integration and Web-based Applications & Services: Latest Articles


A Rule-based Skyline Computation over a Dynamic Database


Pub Date : 2020-11-30 DOI: 10.1145/3428757.3429117

Ghazaleh Babanejad Dehaki, H. Ibrahim, F. Sidi, N. Udzir, A. Alwan

A skyline query, which relies on the notion of Pareto dominance, filters the data items of a database so that only those items not dominated by any other item are selected as skylines. However, databases are dynamic: their states and/or structures change throughout their lifetime to incorporate the latest information of database applications, so a new set of skylines must be derived after each change. Blindly recomputing skylines on the new state/structure of a database is inefficient, as not all data items are affected by the changes. Hence, this paper proposes a rule-based approach to this issue, with the main aim of avoiding unnecessary skyline computations. A set of rules is defined based on the type of operation that changes the state/structure of the database, i.e. inserting, deleting, or updating data items, or adding or removing dimensions. In addition, the prominent dominance relationships found when pairwise comparisons are performed are retained and then utilised in computing the new set of skylines. Several analyses have been conducted to evaluate the performance and demonstrate the efficiency of the proposed solution.
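To make the notion of Pareto dominance concrete, the following is a minimal sketch of a naive skyline computation and of one well-known shortcut for the insert case. It is not the rule set proposed in the paper, which the abstract does not spell out. The function names (`dominates`, `skyline`, `insert_point`) are hypothetical, and the sketch assumes that smaller values are better in every dimension.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a Pareto-dominates b (assumption: smaller is better).

    a dominates b when a is no worse than b in every dimension and
    strictly better in at least one dimension.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def insert_point(current_skyline: List[Point], new_point: Point) -> List[Point]:
    """Update a skyline after one insertion without rescanning the database.

    Illustrative shortcut only (not the paper's rules): if an existing
    skyline point dominates the new point, the skyline is unchanged;
    otherwise the new point joins the skyline and evicts the points it dominates.
    """
    if any(dominates(s, new_point) for s in current_skyline):
        return list(current_skyline)
    survivors = [s for s in current_skyline if not dominates(new_point, s)]
    return survivors + [new_point]


points = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4)]
print(skyline(points))  # [(1, 5), (2, 2), (5, 1)]
```

Deletions and updates are harder than insertions, because removing a skyline point can promote items it used to dominate; this is one reason why retaining dominance relationships, as the abstract describes, can pay off.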


Citations: 1

Analysis of Relationship between Confirmation Bias and Web Search Behavior


Pub Date : 2020-11-30 DOI: 10.1145/3428757.3429086

Masaki Suzuki, Yusuke Yamamoto

In this paper, we analyze the relationship between web search behavior and confirmation bias, in which people prefer to browse information that supports their existing opinions and beliefs. We conducted an online user experiment in which 89 participants were asked to perform a web search task to obtain health information. In this experiment, we controlled the participants' prior beliefs by presenting them with information that manipulated their impressions of a search topic before they performed the search task. We then analyzed their behavioral logs during the search task. The results demonstrate that participants with confirmation bias frequently browsed only the top search results and completed the search task quickly. The results also indicate that participants with confirmation bias did not utilize their health literacy even when they possessed it, although such literacy is essential for critically viewing health information on the web.


Citations: 7

Extracting Rhetorical Question from Twitter


Pub Date : 2020-11-30 DOI: 10.1145/3428757.3429123

Rinji Suzuki, Akiyo Nadamoto

Many types of content exist on social networking services (SNSs). Sometimes an author's opinion is not properly communicated to the reader, and the content might be inflammatory, a phenomenon known as flaming. We therefore consider it important to extract passages in which the author's opinion is not communicated correctly to the reader. This study examines tweets, the messages posted on the Twitter SNS, and specifically examines "rhetorical questions." Rhetorical questions are sometimes known as mandarin sentences. People might misunderstand them and might flame the author. We consider it important to extract rhetorical question tweets automatically and present them. This paper proposes a method to extract rhetorical question tweets. First, we propose two definitions of rhetorical question tweets based on our preliminary experiment. Next, we propose a method for extracting rhetorical question tweets based on the two definitions. Definition 1 is the inclusion of the author's opinion in a question. Definition 2 is the inclusion of an author's opinion sentence, commentary sentence, or sentiment reversal in a sentence. Specifically, we propose methods for opinion sentence extraction, commentary sentence extraction, and sentiment reversal extraction. Furthermore, we conducted two experiments and measured the benefits of our proposed methods.


Citations: 0

A Methodological Approach to Compare Ontologies: Proposal and Application for SLAM Ontologies


Pub Date : 2020-11-30 DOI: 10.1145/3428757.3429091

Yudith Cardinale, M. Cornejo-Lupa, Regina P. Ticona-Herrera, D. Barrios-Aranibar

Representing the knowledge of a domain with flexible and well-defined models, such as ontologies, provides the basis for developing efficient and interoperable solutions. Hence, ontologies have proliferated in many domains. It is therefore necessary to define how to compare such ontologies in order to decide which one is the most suitable for the specific needs of users/developers. Since the emergence of ontologies, several studies have proposed criteria to evaluate them. Nevertheless, there is still a lack of practical and reproducible guidelines to drive a comparative evaluation of ontologies as a systematic process. In this paper, we propose a methodological process to qualitatively and quantitatively compare ontologies at the Lexical, Structural, and Domain Knowledge levels, considering the Correctness and Quality perspectives. Since the evaluation methods of our proposal are based on a gold standard, the process can be customized to compare ontologies in any domain. To show the suitability of our proposal, we apply our methodological approach to conduct a comparative study of ontologies in the robotic domain, in particular for the Simultaneous Localization and Mapping (SLAM) problem. With this case study, we demonstrate that this methodological comparative process allows us to identify the strengths and weaknesses of ontologies, as well as the gaps that still need to be filled in the target domain (SLAM in our case study).


Citations: 2

A Resume Generator with Augmented Reality Features


Pub Date : 2020-11-30 DOI: 10.1145/3428757.3429094

Mary Chew Jia Yi, Ong Huey Fang

A resume is an essential tool for job seekers. This paper presents the development of a web-based resume generator with augmented reality (AR) features, known as AResume. The web-based application is built for job applicants who have difficulty creating a professional resume from scratch, as well as for those who attempt a 'one-size-fits-all' approach. AR.js and A-Frame are the main libraries, or web AR frameworks, employed in the development of AResume to enrich the augmented reality experience. Web-based AR was chosen over mobile AR because it is lightweight, supports multiple platforms, and requires no installation. A generated resume is embedded with a QR code and AR markers. The QR code can be scanned using a smartphone to direct users to the AR scanner website. Users can then move the scanner from marker to marker to view different content such as videos, photos, and documents. AResume not only enables job applicants to create a resume with augmented features but also provides a better user experience for hiring managers when reviewing resumes.


Citations: 0

An Analysis of Confidentiality Issues in Data Lakes


Pub Date : 2020-11-30 DOI: 10.1145/3428757.3429109

João Luiz Monteiro Joaquim, R. Mello

A data lake is a relatively recent technology for maintaining and providing access to voluminous and heterogeneous data sources. Governments, large corporations and startups have increasingly considered it for storing useful data and obtaining valuable business trends. However, data lake management still has a long way to evolve, and data security remains an open issue. In this paper we investigate confidentiality issues in the context of data lakes, with a focus on authentication and authorization. We apply a systematic review methodology focusing on approaches that provide some technology for authentication and authorization management. We then compare the selected studies with respect to the technologies used, and we also analyze how they are positioned with respect to a reference architecture for a data lake management system. This is the first paper that presents this kind of analysis for data lakes.


Citations: 0

Web Scraping versus Twitter API: A Comparison for a Credibility Analysis


Pub Date : 2020-11-30 DOI: 10.1145/3428757.3429104

Irvin Dongo, Yudith Cardinale, A. Aguilera, F. Martínez, Yuni Quintero, Sergio Barrios

Twitter is one of the most popular information sources available on the Web. Thus, many studies have focused on analyzing the credibility of the information shared on it. Most proposals use either the Twitter API or web scraping to extract the data for such analysis. Both extraction techniques have advantages and disadvantages. In this work, we present a study to evaluate their performance and behavior. The motivation for this research comes from the need to know how to extract online information in order to analyze, in real time, the credibility of content posted on the Web. To do so, we develop a framework that offers both data extraction alternatives and implements a previously proposed credibility model. Our framework is implemented as a Google Chrome extension able to analyze tweets in real time. The results show that both methods produce identical credibility values when a robust normalization process is applied to the text (i.e., the tweet). Moreover, in terms of time performance, web scraping is faster than the Twitter API, and it is more flexible in terms of obtaining data; however, web scraping is very sensitive to website changes.


Citations: 13

Tailored Graph Embeddings for Entity Alignment on Historical Data


Pub Date : 2020-11-30 DOI: 10.1145/3428757.3429111

J. Baas, M. Dastani, A. Feelders

In the domain of Dutch cultural heritage, various data sets describe different aspects of life during the Dutch Golden Age. These data sets, in the form of RDF graphs, use different standards and contain noise in the values of literal nodes, such as misspelled names and uncertainty in dates. The Golden Agents project aims at answering queries about the Dutch Golden Age using these distributed and independently maintained data sets. One problem in this project, among many others, is the identification of persons who occur in multiple data sets but under different URIs. This paper aims to solve this specific problem and generate a linkset, i.e. a set of pairs of URIs which are judged to represent the same person. We use domain knowledge in the application of an existing node context generation algorithm, whose output serves as input for GloVe, an algorithm originally designed for embedding words. This embedding is then used to train a classifier on pairs of URIs which are known duplicates and non-duplicates. Using just the cosine similarity between URI pairs in embedding space for prediction, we obtain a simple classifier with an F½-score of around 0.85, even when very few training examples are provided. On larger training sets, more complex classifiers are shown to reach an F½-score of up to 0.88.
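As a rough illustration of the simplest classifier mentioned above, the sketch below predicts that two URIs are duplicates when the cosine similarity of their embedding vectors reaches a threshold, and scores predictions with the F½ measure (F-beta with beta = 0.5, which weights precision more heavily than recall). The threshold value of 0.8 and all function names are assumptions made for this example; the abstract does not state how the authors chose a threshold or implemented the classifier.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict_duplicate(u: Sequence[float], v: Sequence[float],
                      threshold: float = 0.8) -> bool:
    """Predict 'same person' when similarity reaches the threshold.

    The default threshold is a hypothetical value, not taken from the paper.
    """
    return cosine_similarity(u, v) >= threshold


def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta score from true positives, false positives and false negatives.

    F_beta = (1 + beta^2) * TP / ((1 + beta^2) * TP + beta^2 * FN + FP).
    With beta = 0.5 this is the F½-score.
    """
    b2 = beta * beta
    denominator = (1 + b2) * tp + b2 * fn + fp
    if denominator == 0:
        return 0.0
    return (1 + b2) * tp / denominator
```

For example, 8 true positives, 2 false positives and 4 false negatives give a precision of 0.8, a recall of about 0.67, and `f_beta(8, 2, 4)` of 10/13, roughly 0.77, which sits closer to the precision than to the recall.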


Citations: 5

Extraction Method for a Recipe's Uniqueness based on Recipe Frequency and LexRank of Procedures


Pub Date : 2020-11-30 DOI: 10.1145/3428757.3429128

Tatsuya Oonita, D. Kitayama

Users often obtain recipes from culinary websites when they are cooking. A search typically returns various recipes for the same dish, so users must compare the recipes and decide which one they want to use. We believe that it would be easier to select a recipe if the points of uniqueness of each recipe were extracted and presented in the search results. In this paper, we propose a method of extracting the uniqueness of a recipe by analyzing the ingredients and procedures used. Specifically, we reference a basic recipe that describes the standard cooking methods of a dish and extract the differences between it and other recipes to ascertain the points of uniqueness of each recipe, using the importance of procedures and the correspondence between them. By implementing the proposed method with several recipes, we confirmed that it is able to extract this uniqueness.


Citations: 0

Chemoinformatics for Data Scientists: an Overview


Pub Date : 2020-11-30 DOI: 10.1145/3428757.3429147

Shrooq A. Alsenan, Isra M. Al-Turaiki, Alaaeldin M. Hafez

Shrooq A. Alsenan (436203869@student.ksu.edu.sa), Information Systems Department, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia

Isra Al-Turaiki (ialturaiki@ksu.edu.sa), Information Technology Department, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia

Alaaeldin Hafez (ahafez@ksu.edu.sa), Information Systems Department, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia


Citations: 0
