
2017 3th International Conference on Web Research (ICWR): Latest Publications

A framework for linked data fusion and quality assessment
Pub Date : 2017-04-01 DOI: 10.1109/ICWR.2017.7959307
M. K. Nahari, Nasser Ghadiri, Zahra Jafarifard, A. B. Dastjerdi, J. Sack
The growth of semantic web technologies underpins the ever-increasing development of linked data and their applications. In recent years, the number of linked data sources has grown from 12 to more than 2973 datasets. The datasets are managed as decentralized sources, and their quality is a serious concern. Assessing the quality of linked data is key to adopting it in different fields, because each dataset has been developed by a different group using various methods and tools. Moreover, crowdsourcing is one of the main strategies in data collection. This contribution is seen in the tourism industry and in e-commerce, and it deserves attention. The qualitative and quantitative diversity of such data is higher than that of data generated by official organizations and firms. In this paper, we first survey and evaluate the dimensions and measures for data quality assessment. Then, we present a novel framework for improving linked data quality evaluation and data fusion. Finally, we adopt several tools to assess the data quality of some reputable data sources using the proposed framework.
Citations: 6
A bottom-up algorithm to create structurally balanced social networks by modifying the sources of tension
Pub Date : 2017-04-01 DOI: 10.1109/ICWR.2017.7959315
Sajjad Salehi, F. Taghiyareh
The study of social structure and its effect on members is an attractive area of social network research. Structural balance theory focuses on patterns of signed links and the frequency/popularity of these patterns. In recent years, several works have tried to define approximations for the distance between an unbalanced graph and the nearest balanced one. However, these works do not identify the links with unstable signs, that is, links whose sign change would make the network more balanced. Other works introduce centralized algorithms to detect these links. In this paper, we introduce a localized algorithm that detects these links as sources of tension and changes their signs. Simulation results for several scale-free networks with different features show that the proposed algorithm is able to move the network toward a balanced one. Because the proposed algorithm computes localized measures on components of the social network, it is suitable for agent-based models that study other social phenomena.
Citations: 2
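For readers new to structural balance, the core test can be sketched in a few lines: a triangle in a signed network is balanced when the product of its three edge signs is positive. The sketch below is not the authors' algorithm. It only counts unbalanced triangles and reports how many of them each edge takes part in, which is one plausible local measure of "tension". The edge representation and the function names are assumptions made for illustration.

```python
from itertools import combinations


def unbalanced_triangles(signs):
    """Return the triangles whose edge-sign product is negative.

    `signs` maps frozenset({u, v}) -> +1 or -1 (a hypothetical representation).
    """
    nodes = sorted({n for edge in signs for n in edge})
    bad = []
    for a, b, c in combinations(nodes, 3):
        edges = [frozenset(p) for p in ((a, b), (b, c), (a, c))]
        # Only fully connected triples form a triangle.
        if all(e in signs for e in edges):
            if signs[edges[0]] * signs[edges[1]] * signs[edges[2]] < 0:
                bad.append((a, b, c))
    return bad


def tension_per_edge(signs):
    """Count, for each edge, how many unbalanced triangles it belongs to."""
    counts = {e: 0 for e in signs}
    for a, b, c in unbalanced_triangles(signs):
        for p in ((a, b), (b, c), (a, c)):
            counts[frozenset(p)] += 1
    return counts


# Two friendships and one hostile link: the classic unbalanced triangle.
signs = {
    frozenset(("A", "B")): +1,
    frozenset(("B", "C")): +1,
    frozenset(("A", "C")): -1,
}
```

Flipping the sign of any single edge in this toy network removes the only unbalanced triangle. A localized algorithm would have to pick which edge to flip using information available near that edge.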
Plagiarism detection of flowchart images in the texts
Pub Date : 2017-04-01 DOI: 10.1109/ICWR.2017.7959317
Behnam Hadi, M. Kargar
Today, plagiarism in research is discussed far more than in the past. The Web and the possibility of complex, smart searches in a short time have contributed to this, and as a result research has suffered significant damage. Tools designed to deal with plagiarism act on text and ignore images. On the other hand, images are an inseparable part of information transfer and convey a large volume of the information in an article or scientific study. Images cover a very wide range; flowchart images in particular are found in large numbers in computer-related texts, and because flowcharts carry a lot of information, they can be a target of plagiarism. The purpose of this paper is to examine the plagiarism rate of a paper in terms of flowchart image plagiarism using an artificial neural network. The average flowchart image recognition accuracy of the proposed method in terms of structure, nodes, and edges is 81.91 percent, indicating the success of this method.
Citations: 3
Using the opinion leaders in social networks to improve the cold start challenge in recommender systems
Pub Date : 2017-04-01 DOI: 10.1109/ICWR.2017.7959306
Seyed Ali Mohammadi, Azam Andalib
The increasing volume of information about goods and services has caused growing confusion for online buyers, and this problem persists. One of the most important ways to deal with information overload is to use a recommender system. The task of a recommender system is to offer the product that most closely matches the user's demands and needs. One of the main problems in such a system is the cold start challenge. It occurs when a new user logs on: because the system does not have sufficient information about the user, it cannot provide appropriate recommendations, and the system error rises. In this paper, we propose a new measure called opinion leaders to alleviate this problem. An opinion leader is a person whose opinion has an impact on the target user. Consequently, when a new user logs in and the user-item matrix is sparse, we can use the opinions of opinion leaders to offer appropriate recommendations to new users and thereby increase the accuracy of the recommender system. The results of several tests showed that combining opinion leaders with recommender systems effectively reduces recommendation errors.
Citations: 18
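The general idea of falling back on opinion leaders for a user with no rating history can be sketched as follows. The abstract does not say how leaders are selected or how their opinions are combined, so two things here are assumptions: leaders are the users with the most followers, and a new user's predicted score for an item is the plain average of the leaders' ratings. All names and data are hypothetical.

```python
def top_opinion_leaders(followers, k):
    """Pick the k users with the most followers (an assumed proxy for influence)."""
    ranked = sorted(followers, key=lambda u: (-len(followers[u]), u))
    return ranked[:k]


def cold_start_scores(ratings, leaders):
    """Average the leaders' ratings per item; items no leader rated are omitted."""
    totals = {}
    for leader in leaders:
        for item, rating in ratings.get(leader, {}).items():
            running_sum, count = totals.get(item, (0.0, 0))
            totals[item] = (running_sum + rating, count + 1)
    return {item: s / n for item, (s, n) in totals.items()}


# Hypothetical social graph (user -> set of followers) and rating data.
followers = {"ann": {"u1", "u2", "u3"}, "bob": {"u1"}, "cat": {"u2", "u3"}}
ratings = {
    "ann": {"book": 5, "film": 2},
    "bob": {"film": 4},
    "cat": {"book": 3},
}

leaders = top_opinion_leaders(followers, 2)
scores = cold_start_scores(ratings, leaders)
```

A new user with an empty row in the user-item matrix could then be shown the items with the highest entries in `scores` until enough personal ratings accumulate.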
Linked data partitioning for RDF processing on Apache Spark
Pub Date : 2017-04-01 DOI: 10.1109/ICWR.2017.7959308
Amir Hossein Atashkar, Nasser Ghadiri, Mehdi Joodaki
RDF models are widely used in the web of data due to their flexibility and similarity to graph patterns. As the use of RDF grows, the volume and content of RDF data are increasing. Processing such massive amounts of data on a single machine is therefore not efficient enough, because of response time and limited hardware resources. A common approach to overcome this limitation is cluster processing, and huge datasets can benefit from distributed cluster processing on Apache Hadoop. However, because Hadoop relies heavily on hard disks, its processing time is usually inadequate. In this paper, we propose a partitioning approach based on Apache Spark for rapid processing of RDF data models. A key feature of Apache Spark is that it uses main memory instead of hard disk, so the speed of data processing in our method is improved. We evaluated the proposed method by running SQL queries on RDF data partitioned across the cluster, and the results demonstrate improved performance.
Citations: 6
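The abstract does not state which partitioning scheme the authors use. One common baseline, shown here purely as an illustration and in plain Python rather than Spark, is to hash each triple's subject so that all triples about the same resource land in the same partition. Queries that join on the subject can then be answered without moving data between partitions. The function name and the sample triples are hypothetical.

```python
import zlib
from collections import defaultdict


def partition_by_subject(triples, num_partitions):
    """Group (subject, predicate, object) triples by a stable hash of the subject."""
    parts = defaultdict(list)
    for s, p, o in triples:
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        idx = zlib.crc32(s.encode("utf-8")) % num_partitions
        parts[idx].append((s, p, o))
    return dict(parts)


triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:carol", "foaf:knows", "ex:alice"),
]
parts = partition_by_subject(triples, 3)
```

Subject hashing keeps star-shaped query patterns local but can produce skewed partitions when a few subjects have very many triples, which is one reason more elaborate schemes exist.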
Presenting a model based on social network analysis in order to offer a diet to users proper to their mood
Pub Date : 2017-04-01 DOI: 10.1109/ICWR.2017.7959318
Maryam Tasviri, S. Golpayegani, Hoda Ghavamipoor
This study presents a model that suggests to people which food is healthier and more satisfying for them, based on their moods and food consumption behaviors. Social network analysis techniques are applied to people's food consumption as recorded in information systems. The method proceeds as follows. First, people are classified into 6 groups based on their nutrition style, drawing on Islamic traditional medicine and some previous papers in modern medicine. Then, a data network is built from people's relationships. Afterwards, a model is derived from the analysis of that network. To evaluate this model, the proposed method is applied to a university's self-service restaurant system. The results show that people with a healthy or "hot and wet" temperament nutrition style have the personality traits of "extraversion", "openness", and "conscientiousness". Moreover, people with a traditional or "cold and wet" temperament nutrition style have the personality traits of "introversion" and "neuroticism".
Citations: 0
Assessments Sqli and Xss vulnerability in several organizational websites of North khorasan in Iran and offer solutions to fix these vulnerabilities
Pub Date : 2017-04-01 DOI: 10.1109/ICWR.2017.7959303
Fatemeh Talebzadeh Pirvadlu, Ghodrat Sepidnam
Vulnerabilities in web applications are due to various factors. Failure to properly validate user input is one of the factors that lead to unauthorized code being run in these programs. SQLi and XSS are two common vulnerabilities in web applications that result from a lack of proper input validation. Therefore, in this paper we study how to protect organizational websites of North Khorasan in Iran against SQLi and XSS vulnerabilities. We analyzed eleven websites, ten of which belong to government organizations and one to a private organization. These websites were tested with permission obtained from the relevant organizations.
Citations: 1
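The abstract does not list the specific fixes the authors recommend. Two widely used defenses against these vulnerability classes are parameterized queries (for SQLi) and output escaping (for XSS). The sketch below shows both with the Python standard library. The table, the data, and the function names are hypothetical.

```python
import html
import sqlite3

# In-memory database with one hypothetical user.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")


def find_user(conn, name):
    """Look up a user with a bound parameter instead of string concatenation."""
    cursor = conn.execute("SELECT name, role FROM users WHERE name = ?", (name,))
    return cursor.fetchall()


def render_greeting(name):
    """Escape user-supplied text before placing it in HTML."""
    return "<p>Hello, " + html.escape(name) + "</p>"


# A classic injection string is treated as a literal name and matches nothing.
injection_result = find_user(conn, "' OR '1'='1")
# A script tag is rendered as inert text rather than executed.
greeting = render_greeting("<script>alert(1)</script>")
```

Because the `?` placeholder sends the value separately from the SQL text, the input cannot change the structure of the query. Escaping on output likewise keeps markup characters from being interpreted by the browser.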
Fast and scalable protein motif sequence clustering based on Hadoop framework
Pub Date : 2017-04-01 DOI: 10.1109/ICWR.2017.7959300
Erfan Farhangi, Nasser Ghadiri, Mahsa Asadi, M. Nikbakht, Sylvain Pitre
In recent years, we have been faced with large amounts of sporadic unstructured data on the web. With the explosive growth of such data, there is a growing need for effective methods such as clustering to analyze it and extract information. Biological data forms an important part of the unstructured data on the web. Protein sequence databases are considered a primary source of biological data. Clustering can help organize sequences into homologous and functionally similar groups and can improve the speed of data processing and analysis. Proteins are responsible for most of the activities in cells. The majority of proteins show their function through interaction with other proteins. Hence, prediction of protein interactions is an important research area in the biomedical sciences. Motifs are fragments that occur frequently in protein sequences. A well-known method for identifying protein interactions is based on motif clustering. Existing motif clustering methods share the problem of a limited number of clusters. However, given the vast number of motifs and the need for a large number of clusters, an efficient, scalable, and fast method is necessary to cluster such a large number of sequences. In this paper, we propose a novel approach to cluster a large number of motifs. Our approach includes extracting motifs from protein sequences, feature selection, preprocessing, dimension reduction, and running BigFCM (a large-scale fuzzy clustering algorithm) on several distributed nodes with the Hadoop framework to take advantage of MapReduce programming. Experimental results show very good performance of our approach.
Citations: 5
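As background on fuzzy clustering, the sketch below shows the membership update of standard fuzzy c-means (FCM) for a single one-dimensional point. It is not BigFCM and involves no Hadoop or MapReduce. It only illustrates how a point receives a degree of membership in every cluster instead of a single hard label. The function name is hypothetical, and `m` is the usual fuzzifier parameter with the common default of 2.

```python
def fcm_memberships(point, centers, m=2.0):
    """Return the fuzzy membership of `point` in each cluster center.

    Standard FCM rule: u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)),
    where d_i is the distance from the point to center i.
    """
    dists = [abs(point - c) for c in centers]
    zero_hits = [d == 0.0 for d in dists]
    if any(zero_hits):
        # The point sits exactly on one or more centers: share membership among them.
        share = 1.0 / sum(zero_hits)
        return [share if hit else 0.0 for hit in zero_hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


# A point at 1.0 between centers at 0.0 and 4.0 belongs mostly to the nearer one.
memberships = fcm_memberships(1.0, [0.0, 4.0])
```

In a full FCM loop this step alternates with recomputing each center as a membership-weighted mean of the points. Distributed variants differ mainly in how that loop is split across nodes.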
Optimizing multi objective based workflow scheduling in cloud computing using black hole algorithm
Pub Date : 2017-04-01 DOI: 10.1109/ICWR.2017.7959313
F. Ebadifard, S. M. Babamir
Cloud computing employs parallel and distributed computing concepts to provide users with shared resources through the internet. One of the most important issues in a cloud environment is scheduling tasks on existing resources so that, on the one hand, user requirements such as minimum run time or cost are met and, on the other hand, proper use of resources benefits service providers. In this paper, we extend a recent heuristic algorithm called Black Hole Optimization (BHO) and present a multi-objective scheduling method for workflow applications based on a Pareto optimizer algorithm. Our proposed method considers both user requirements and the interests of service providers. Using balanced and unbalanced workflows, we compared our proposed method with the SPEA2 and NSGA2 algorithms in terms of completion time, cost, and resource efficiency.
Citations: 19
Persian multimedia search services' users propensities
Pub Date : 2017-04-01 DOI: 10.1109/ICWR.2017.7959304
M. Mahmoudi, M. Azimzade, M. Esnaashari, M. Farhoodi, Reza Badie
Search engines are now essential tools for finding information on the web. Multimedia search engines are especially important for two reasons: (1) the appeal of multimedia content and (2) the growing rate at which such content is created and disseminated online. This paper analyzes the propensities of users of Persian multimedia search services. To this end, the behavior of Iranian users of Parsijoo's image, voice, and video search services was studied by analyzing the services' usage log files. The analyses, which cover three months of user queries, fall into two types: holistic analyses and analyses based on frequently used queries. The results show that users of multimedia search services mostly look for entertainment and amusement topics.
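The abstract does not describe the log format or tooling, so the following is only a minimal sketch of the two kinds of analysis it mentions: holistic counts over the whole log and a ranking of frequently used queries. It assumes the log has already been reduced to a list of raw query strings; the function names and the sample queries are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def normalize(query: str) -> str:
    """Collapse repeated whitespace and lowercase so trivially different queries match."""
    return " ".join(query.split()).lower()


def holistic_stats(queries: Iterable[str]) -> Dict[str, int]:
    """Whole-log figures: total number of non-empty queries and number of distinct ones."""
    cleaned = [normalize(q) for q in queries if q.strip()]
    return {"total": len(cleaned), "distinct": len(set(cleaned))}


def top_queries(queries: Iterable[str], k: int) -> List[Tuple[str, int]]:
    """The k most frequent normalized queries with their counts."""
    counts = Counter(normalize(q) for q in queries if q.strip())
    return counts.most_common(k)


# Illustrative sample only; it is not data from the study.
sample_log = ["funny video", "Funny  video", "football", "song", "football", "funny video", "  "]
stats = holistic_stats(sample_log)
top = top_queries(sample_log, 2)
print(stats, top)
```

Topic-level conclusions such as the share of entertainment queries would need an additional labeling step for each frequent query, which is not sketched here.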
Citations: 0