
2017 International Conference on Progress in Informatics and Computing (PIC): latest papers

Towards fuzzy QoS driven service selection with user requirements
Pub Date : 2017-12-01 DOI: 10.1109/PIC.2017.8359548
Jiajun Xu, Lin Guo, Ruxia Zhang, Yin Zhang, Hualang Hu, Fei Wang, Zhiyuan Pei
Many QoS-aware service selection approaches assume that QoS attributes are crisp values and do not take actual user requirements into account when service-oriented applications are constructed. As a result, users' search results may be neither correct nor good, because the data contain uncertainties and because an optimal solution that fails to satisfy some requirements may be unacceptable to some users. In this paper, we propose using Fuzzy Set Theory (FST) and a fuzzy genetic algorithm (FGA) for QoS-based service selection. FST is applied to specify triangular fuzzy-valued descriptions of the QoS properties. An FGA is proposed to solve the QoS-aware service composition problem; it considers users' actual QoS requirements during the selection process. Empirical comparisons with two algorithms on composite services of different scales indicate that the FGA is highly competitive with regard to search capability.
Citations: 8
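The triangular fuzzy-valued QoS description mentioned in the abstract above can be illustrated with a minimal sketch. The class name `TriangularFuzzy`, the centroid defuzzification, and the response-time figures are assumptions made for illustration; the paper's own formulation and its FGA are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriangularFuzzy:
    """Triangular fuzzy number (low, peak, high) with low <= peak <= high."""
    low: float
    peak: float
    high: float

    def membership(self, x: float) -> float:
        """Degree (0..1) to which the crisp value x belongs to this fuzzy number."""
        if x == self.peak:
            return 1.0
        if x <= self.low or x >= self.high:
            return 0.0
        if x < self.peak:
            return (x - self.low) / (self.peak - self.low)
        return (self.high - x) / (self.high - self.peak)

    def __add__(self, other: "TriangularFuzzy") -> "TriangularFuzzy":
        """Component-wise sum, e.g. response times of services invoked in sequence."""
        return TriangularFuzzy(self.low + other.low,
                               self.peak + other.peak,
                               self.high + other.high)

    def centroid(self) -> float:
        """Centroid defuzzification: one crisp value that summarizes the number."""
        return (self.low + self.peak + self.high) / 3.0


# Hypothetical response times (ms) of two services composed in sequence.
s1 = TriangularFuzzy(80.0, 100.0, 150.0)
s2 = TriangularFuzzy(40.0, 50.0, 90.0)
total = s1 + s2
```

Under these assumptions, a user requirement such as "response time around 150 ms" could be checked with `total.membership(150.0)` instead of a crisp comparison.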
Lexicon-based semi-CRF for Chinese clinical text word segmentation
Pub Date : 2017-12-01 DOI: 10.1109/PIC.2017.8359512
Guoqing Xia, Yao Shen, Qian-Xiang Lin
Word segmentation is in most cases the basis for text analysis and is vital to the accuracy of subsequent natural language processing (NLP) tasks. While word segmentation for general text has been intensively studied and quite a few algorithms have been proposed, these algorithms do not work well in specialized fields such as clinical text analysis. In addition, most state-of-the-art methods have difficulty identifying out-of-vocabulary (OOV) words. For these two reasons, we propose a semi-supervised CRF (semi-CRF) algorithm for Chinese clinical text word segmentation. Semi-CRF is implemented by modifying the learning objective to accommodate partially labeled data. Training data are obtained by applying a bidirectional lexicon matching scheme. A modified Viterbi algorithm that uses the lexicon matching scheme is also proposed for segmenting raw sentences. Experiments show that our model achieves a precision of 93.88% on test data and outperforms two popular open-source Chinese word segmentation tools, HanLP and THULAC. Because it relies on a lexicon, our model can be adapted to word segmentation of text from other domains.
Citations: 3
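The abstract above does not spell out its bidirectional lexicon matching scheme. The following is a minimal sketch under the assumption that it combines forward and backward maximum matching and keeps only the spans on which both directions agree as partial labels; the function names, the toy lexicon, and the example sentence are hypothetical.

```python
def forward_max_match(text, lexicon, max_len):
    """Scan left to right, taking the longest lexicon word; fall back to one character."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(text, lexicon, max_len):
    """Scan right to left, taking the longest lexicon word; fall back to one character."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                words.append(piece)
                j -= size
                break
    return words[::-1]


def spans(words):
    """Convert a word list into a set of (start, end) character spans."""
    out, pos = set(), 0
    for w in words:
        out.add((pos, pos + len(w)))
        pos += len(w)
    return out


def agreed_spans(text, lexicon, max_len):
    """Spans on which both directions agree; everything else would stay unlabeled."""
    fwd = forward_max_match(text, lexicon, max_len)
    bwd = backward_max_match(text, lexicon, max_len)
    return sorted(spans(fwd) & spans(bwd))


# Toy lexicon and sentence (not clinical data), chosen because the two scans disagree.
lexicon = {"研究", "研究生", "生命", "起源"}
sentence = "研究生命起源"
fwd = forward_max_match(sentence, lexicon, 3)
bwd = backward_max_match(sentence, lexicon, 3)
agreed = agreed_spans(sentence, lexicon, 3)
```

Here the two scans disagree on the first four characters, so only the final word would be treated as a reliable label, which is the kind of partially labeled data a semi-CRF objective is meant to handle.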
The study of image feature extraction and classification
Pub Date : 2017-12-01 DOI: 10.1109/PIC.2017.8359537
Jingjin Guo, Lizhen Liu, Wei Song, Chao Du, Xinlei Zhao
Research on image classification in computer vision continues to advance. Researchers hope to achieve perfect classification accuracy; however, many experimental results are less than satisfactory because of a number of complex factors. To identify these factors and improve classification accuracy, we describe in detail classification methods based on logistic regression and the support vector machine algorithm, then discuss the impact of different methods on classification results and the factors that affect classification accuracy within a single method. Before that, we briefly introduce image feature extraction, which plays an essential role in image classification.
Citations: 3
Mineral identification and geological mapping using near-infrared spectroscopy analysis
Pub Date : 2017-12-01 DOI: 10.1109/PIC.2017.8359526
Xiu Lian-cun, Zheng Zhizhong, Che Chunxia, Gao Yang
In recent years, near-infrared spectroscopy measurement technology in China has made considerable progress and has been widely used in agriculture, chemistry, geology, and medical measurement. Near-infrared spectroscopy is very sensitive to the characteristics of C-H (methyl, methylene, methoxy, carboxyl, aryl groups, etc.), hydroxy O-H, mercapto S-H, and amino N-H. It can therefore distinguish the crystallinity of single minerals (clay minerals, chlorite, serpentine, etc.), hydroxyl-bearing silicate minerals (epidote, amphibole, etc.), sulfate minerals (alunite, pyritepotassium alum, gypsum, etc.), and carbonate minerals (calcite, dolomite, etc.) in layered silicates. These characteristics of near-infrared spectroscopy are an important basis for developing small portable near-infrared mineral analyzers and for their fast, accurate identification of rock samples. In this paper, we focus on the near-infrared spectral characteristics, identification method, and quantitative analysis method for low-temperature alteration minerals; a portable near-infrared spectrometer and its principle are also presented. In the experiment, spectral parameters were acquired from the characteristic spectra of altered minerals, and the relationship between altered minerals and ore formation was established. Measured spectral data from drill cores at the Zijinshan Mine in Fujian Province are used as an example to illustrate the credibility of the proposed method.
Citations: 4
Research on the complexity of technological innovation supported by Internet
Pub Date : 2017-12-01 DOI: 10.1109/PIC.2017.8359570
Cuilan Wang, Deji Wang
This paper analyzes the complexity of Internet innovation using scale-free network theory. The evolutionary model is also simulated numerically. The conclusion is that the model has small-world and scale-free features. The small-world effect shows that Internet resources are integrated widely and efficiently. The scale-free features indicate that a few core nodes become central. Therefore, we should not only recognize the characteristics that enhance innovation performance, but also avoid the risks of lock-in and vulnerability.
Citations: 1
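The abstract above does not specify its evolutionary model. As a rough illustration of how a few core nodes can become central in a scale-free network, the sketch below grows a graph by degree-proportional (Barabási-Albert style) preferential attachment. The choice of this growth rule, the function name, and the parameters `n`, `m`, and `seed` are assumptions, not the authors' model.

```python
import random
from collections import Counter


def preferential_attachment(n, m, seed=0):
    """Grow an n-node graph in which each new node links to m distinct
    existing nodes chosen with probability proportional to their degree."""
    rng = random.Random(seed)
    # Start from a complete graph on m + 1 nodes so every node has degree > 0.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node appears in `stubs` once per incident edge, so a uniform draw
    # from `stubs` is a degree-proportional draw over nodes.
    stubs = [v for e in edges for v in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((new, t))
            stubs.extend((new, t))
    return edges


edges = preferential_attachment(n=2000, m=2)
degree = Counter(v for e in edges for v in e)
hubs = degree.most_common(5)  # the five best-connected nodes and their degrees
```

Inspecting `hubs` against the typical degree gives a feel for the lock-in and vulnerability risk the abstract mentions: removing a handful of hubs affects a disproportionate share of links.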
An approach based on decision tree for analysis of behavior with combined cycle power plant
Pub Date : 2017-12-01 DOI: 10.1109/PIC.2017.8359583
Abshukirov Zhandos, Jian Guo
This paper discusses the Combined Cycle Power Plant (CCPP) and decision trees. CCPPs are considered among the most effective power suppliers owing to the large temperature gradient between the gas turbine passage and the environment or cooling process, and to the work of their engineers, who are able to make optimal use of the present temperature level. We also compare four decision tree algorithms: Decision Stump, Hoeffding Tree, logistic model trees (LMT), and J48. Based on these algorithms, we analyze the behavior of the CCPP, particularly its temperature. Temperature is the target variable among the variables, which is why it was divided into three classes. Theoretical analysis and experimental results show that J48 predicts the attributes of given instances more precisely than the other three algorithms. Based on our findings, the J48 algorithm can predict the temperature of a Combined Cycle Power Plant precisely, classifying 97.1676% of cases correctly.
Citations: 5
Analysis of SNS photo data taken by foreign tourists to Japan and a proposed adaptive tourism recommendation system
Pub Date : 2017-12-01 DOI: 10.1109/PIC.2017.8359565
Shuang Li, Shizune Takahashi, Keizo Yamada, Masanori Takagi, Jun Sasaki
As the number of foreign travelers to Japan rapidly increases, the Japanese government expects inbound tourism to boost economic growth across rural areas. However, some rural areas are difficult for foreign travelers to access because the areas are unfamiliar and because of language problems and inconvenient transportation, among other factors. Thus, a tourism support system for foreigners traveling in rural areas is necessary. To investigate the most effective way to support foreign travelers, we propose a viewpoint in which tourism resources are recommended to foreign travelers according to differences in nationality and season. We confirm the feasibility of this system by analyzing social networking service photo data taken by foreign tourists. Finally, we propose a tourism information system that considers several needs of foreign travelers.
Citations: 6
PGM-WV: A context-aware hybrid model for heuristic and semantic question classification in question-answering system
Pub Date : 2017-12-01 DOI: 10.1109/PIC.2017.8359550
Hengxun Li, Ning Wang, Guangjun Hu, Weiqing Yang
In the field of information retrieval, with the rapid growth in the number of questions and answers, automatic question answering has become a hot research direction. It consists of three procedures: question classification, information retrieval, and answer extraction. Question classification is the first and most important part of the whole task. Currently, two kinds of algorithms are employed: rule-based algorithms and statistical-model-based algorithms. Rule-based algorithms perform well in accuracy and pertinence, but rely on professional knowledge and scale poorly. Statistical-model-based algorithms learn classification models from a training dataset; they extract syntactic features heuristically and scale better, so most question classification algorithms are based on statistical models. However, semantic features have largely been overlooked in existing statistical-model-based question classification algorithms. In this paper, we propose a context-aware hybrid model based on a statistical model, PGM, and a semantic language model, word2vec. The experimental evaluations demonstrate the capability of the proposed model.
Citations: 2
A discourse coherence model for analyzing Chinese students' essay
Pub Date : 2017-12-01 DOI: 10.1109/PIC.2017.8359586
Guimin Huang, Min Tan, Sirui Huang, Ruyu Mo, Ya Zhou
Many attempts have been made to improve the entity grid model, boosting research on text coherence. However, only a few of them have applied it to noisy data domains and students' essays. We therefore propose a novel discourse coherence model, based on the entity grid model, to evaluate coherence in Chinese students' essays. Because Chinese students frequently use lexical repetition and coreference in coherent essays, we designed a coreference module, instead of a clustering algorithm or knowledge-base search, and integrated it with the enhanced coherence model. In addition, we fully merged the coreference feature into the similarity assessment of adjacent sentences and into the semantic coherence. Experiments show that our model outperforms the entity-based model and LSA methods and is effective for automatic assessment of students' essays.
Citations: 1
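For readers unfamiliar with the entity grid model that the abstract above builds on, here is a minimal sketch of its usual feature extraction: each entity is tracked across sentences by grammatical role, and the relative frequencies of role transitions form the document's feature vector. The role labels, function name, and toy grid are assumptions for illustration; the paper's coreference module is not reproduced.

```python
from collections import Counter
from itertools import product

ROLES = ("S", "O", "X", "-")  # subject, object, other, absent


def transition_features(grid, length=2):
    """Return the relative frequency of every role transition of the given length.

    `grid` maps each entity to its role in each sentence, in document order.
    """
    counts = Counter()
    for roles in grid.values():
        for k in range(len(roles) - length + 1):
            counts[tuple(roles[k:k + length])] += 1
    total = sum(counts.values())
    return {t: (counts[t] / total if total else 0.0)
            for t in product(ROLES, repeat=length)}


# Hypothetical three-sentence essay with three entities.
grid = {
    "student": ["S", "S", "O"],
    "essay":   ["O", "-", "S"],
    "teacher": ["-", "X", "-"],
}
features = transition_features(grid)
```

A coherent text tends to keep salient entities in prominent roles across adjacent sentences, so transitions such as `("S", "S")` carry more weight than in an incoherent one; a ranking or scoring model would then be trained on these vectors.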
Sequence-based protein-Ca2+ binding site prediction using SVM classifier ensemble with random under-sampling
Pub Date : 2017-12-01 DOI: 10.1109/PIC.2017.8359520
Liang Qiao, Dongqing Xie
Calcium ions (Ca2+) are crucial for protein function. They participate in enzyme catalysis, play regulatory roles, and help maintain protein structure. Accurately recognizing Ca2+-binding sites is of significant importance for protein function analysis. Although much progress has been made, challenges remain, especially in the post-genome era, in which large volumes of proteins without functional annotation are quickly accumulating. In this study, we design a new ab initio predictor, CaSite, to identify Ca2+-binding residues from protein sequence. CaSite first uses evolutionary information, predicted secondary structure, predicted solvent accessibility, and Jensen-Shannon divergence information to represent the features of each residue sample. A mean ensemble classifier, built from support vector machines (SVMs) trained on multiple random under-samplings, is used as the final prediction model; this effectively relieves the negative influence of the imbalance between positive and negative training samples. Experimental results demonstrate that the proposed CaSite achieves better prediction performance and outperforms the existing sequence-based predictor, Targets.
Citations: 1