
Latest publications: International Journal of Database Theory and Application

A Conceptual Framework for the Mining and Analysis of the Social Media Data
Pub Date : 2017-10-31 DOI: 10.14257/ijdta.2017.10.10.02
S. R. Joseph, Keletso J. Letsholo, H. Hlomani
Social media data possess the characteristics of Big Data such as volume, veracity, velocity, variability and value. These characteristics make its analysis more challenging than that of conventional data. Manual analysis approaches cannot cope with the fast pace at which data is generated; processing data manually is also time-consuming and requires far more effort than computational methods. However, computational analysis methods usually cannot capture in-depth meanings (semantics) within data. On its own, each approach is insufficient. As a solution, we propose a Conceptual Framework that integrates traditional and computational approaches to the mining and analysis of social media data. This allows us to leverage the strengths of traditional content analysis, with its careful rigour and contextual understanding, whilst exploiting the extensive capacity of Big Data analytics and the accuracy of computational methods. The proposed Conceptual Framework was evaluated in two stages using, as an example case, data on the political landscape of Botswana collected from the Facebook and Twitter platforms. Firstly, a user study was carried out through the Inductive Content Analysis (ICA) process using the collected data; additionally, a questionnaire evaluated the usability of ICA as perceived by the participants. Secondly, an experimental study evaluated the performance of data mining algorithms on the data from the ICA process. The results of the user study showed that the ICA process is flexible and systematic in allowing users to analyse social media data, reducing the time and effort required for manual analysis. The users' perception of the ease of use and usefulness of ICA for analysing social media data is positive.
The results of the experimental study show that data mining algorithms classified data more accurately when supplied with data from the ICA process. That is, when data mining algorithms are integrated with the ICA process, they are able to overcome their difficulty in capturing semantics within data. Overall, the results of this study, including the proposed Conceptual Framework, are useful to scholars and practitioners who wish to conduct research on social media data mining and analysis. The Framework serves as a guide to mining and analysing social media data in a systematic manner.
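The integration the abstract describes, where manually produced ICA categories feed a computational classifier, can be sketched roughly as follows. The posts, category labels, and naive word-count model below are illustrative assumptions for demonstration only, not the authors' actual pipeline.

```python
from collections import Counter, defaultdict

# Categories produced by a manual Inductive Content Analysis (ICA) stage serve
# as labeled training data for a simple classifier, which then scales the
# coding to the rest of the corpus. All examples here are fabricated.

ica_labeled = [                      # hypothetical output of the manual ICA stage
    ("the election rally drew crowds", "politics"),
    ("party manifesto and election debate", "politics"),
    ("new stadium opens for football fans", "sport"),
    ("football match ends in a draw", "sport"),
]

def train(examples):
    counts = defaultdict(Counter)    # label -> word frequencies
    for text, label in examples:
        counts[label].update(text.split())
    return counts

def classify(model, text):
    # Score each label by how often it has seen the post's words.
    return max(model, key=lambda lab: sum(model[lab][w] for w in text.split()))

model = train(ica_labeled)
label = classify(model, "voters follow the election debate")
```

The manual stage supplies the semantics the abstract says pure computation misses; the classifier supplies the scale manual coding lacks.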
Citations: 4
Security Alarm Systems: Modeling and Analysis of SIA Protocol
Pub Date : 2017-09-30 DOI: 10.14257/ijdta.2017.10.9.04
Shankar Raman Ravindran
Citations: 0
A Proposal for an Economic Attendance Management System
Pub Date : 2017-09-30 DOI: 10.14257/ijdta.2017.10.9.01
Bokrae Jang, J. Yim, Seunghyun Oh
This paper proposes an attendance management system (AMS) after reviewing existing AMSs. The proposed AMS uses the smartphones carried by students and the wireless access points already installed to let Wi-Fi compliant devices such as smartphones access the local area network; there is almost no university campus where such a network is unavailable. Since the proposed AMS uses no devices other than smartphones and access points, it is easy and economical to install. The system recognizes as attendees all students in the class, provided every student carries a smartphone registered with the AMS, and recognizes as absentees all students who are not in the classroom. During the preparation stage, each access point is assigned a unique dynamic Internet Protocol (IP) address allocation range.
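The presence check implied by the abstract can be sketched as follows: since each classroom access point has its own dynamic IP allocation range, a student's device counts as present when its current IP falls inside that range. The AP range, student IDs, and IP leases below are illustrative assumptions, not details from the paper.

```python
import ipaddress

# Hypothetical range assigned to this classroom's access point.
CLASSROOM_AP_RANGE = ipaddress.ip_network("10.20.30.0/26")

registered_devices = {           # student id -> IP currently leased to their phone
    "s001": "10.20.30.17",       # inside the classroom AP's range -> attendee
    "s002": "10.20.99.4",        # leased by some other AP -> absentee
    "s003": "10.20.30.61",       # inside the range -> attendee
}

def take_attendance(devices, ap_range):
    """Split registered students into attendees and absentees by IP range."""
    attendees = {s for s, ip in devices.items()
                 if ipaddress.ip_address(ip) in ap_range}
    absentees = set(devices) - attendees
    return attendees, absentees

attendees, absentees = take_attendance(registered_devices, CLASSROOM_AP_RANGE)
```

This is why the uniqueness of each AP's allocation range matters in the preparation stage: overlapping ranges would make two classrooms indistinguishable.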
Citations: 0
Optimal Predictive analytics of Pima Diabetics using Deep Learning
Pub Date : 2017-09-30 DOI: 10.14257/IJDTA.2017.10.9.05
H. Balaji, N. Iyengar, Ronnie D. Caytiles
An intelligent predictive model using deep learning is proposed to predict patient risk factors and the severity of diabetes using a conditional data set. The model employs a deep neural network that applies predictive analytics to the diabetes data set to obtain optimal results. Existing predictive models predict the severity and risk factors of diabetes from the processed data. In our approach, firstly, a feature selection algorithm is run for the selection process. Secondly, the deep neural network employs a Restricted Boltzmann Machine (RBM) as its basic unit, analysing the data by assigning weights to each branch of the network. This deep neural network, implemented in Python, yields numeric results on the severity and risk factors of diabetes in the data set. Finally, a comparative study is conducted between the implementation of this model on type 1 diabetes mellitus, the Pima Indians diabetes data set, and a rough set theory model. The results add value to the literature because few, if any, studies on diabetes have used a deep learning model. As the obtained results show, this helps to predict diabetes with much greater precision.
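The first stage of the pipeline, feature selection, can be sketched with a simple filter method: rank features by the absolute Pearson correlation of each feature with the outcome label. The abstract does not name the actual selection algorithm, and the tiny Pima-like rows below (glucose, BMI, age, diabetic flag) are fabricated for demonstration.

```python
import math

# Filter-style feature selection: score each feature by |corr(feature, label)|
# and rank. Data below is illustrative, not from the Pima data set.
rows = [
    # (glucose, bmi, age, diabetic?)
    (85, 26.6, 31, 0),
    (183, 23.3, 32, 1),
    (89, 28.1, 21, 0),
    (137, 43.1, 33, 1),
    (116, 25.6, 30, 0),
    (78, 31.0, 26, 1),
]

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def rank_features(rows, names):
    label = [r[-1] for r in rows]
    scores = {name: abs(pearson([r[i] for r in rows], label))
              for i, name in enumerate(names)}
    return sorted(scores, key=scores.get, reverse=True)

ranking = rank_features(rows, ["glucose", "bmi", "age"])
```

The top-ranked features would then be fed to the RBM-based deep network described in the abstract.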
Citations: 30
Efficient Filtering Technique for Reducing Time Overhead of Dynamic Data Race Detection in Multithread Programs
Pub Date : 2017-09-30 DOI: 10.14257/IJDTA.2017.10.9.03
Ok-Kyoon Ha, Se-Won Park, S. Heo
Data races are among the hardest defects to handle in multithread programs because nondeterministic interleaving of concurrent threads may lead to unpredictable program results. The main drawback of dynamic data race detection is the heavy additional overhead of monitoring and analyzing memory and thread operations during program execution, so reducing this overhead is important for debugging concurrency bugs. This paper presents a monitoring filtering technique that excludes repeatedly executed regions of parallel loops from the monitoring targets.
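The filtering idea can be sketched as follows: accesses issued from the same static site inside a parallel loop repeat identically on later iterations, so the filter forwards an access to the expensive race detector only the first time its (loop region, access site) pair is seen. The monitor interface below is an illustrative assumption, not the paper's actual implementation.

```python
class FilteredMonitor:
    """Forwards each (loop_region, site) pair to analysis only once."""

    def __init__(self):
        self.seen = set()        # (loop_region, site) pairs already analyzed
        self.analyzed = []       # accesses actually passed to the detector

    def on_access(self, loop_region, site, addr, is_write):
        key = (loop_region, site)
        if key in self.seen:
            return               # repeated loop iteration: skip costly analysis
        self.seen.add(key)
        self.analyzed.append((loop_region, site, addr, is_write))

mon = FilteredMonitor()
# Simulate a parallel loop hitting the same two access sites for 1000 iterations:
for i in range(1000):
    mon.on_access("loop1", "siteA", 0x1000 + i, True)
    mon.on_access("loop1", "siteB", 0x2000 + i, False)
```

Here 2000 dynamic accesses collapse to 2 analyzed ones, which is exactly the time-overhead reduction the abstract targets.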
Citations: 0
Performance Gain in HIVE through Query Optimization using Index Joins
Pub Date : 2017-09-30 DOI: 10.14257/ijdta.2017.10.9.02
Stephen Neal Joshua Eali, N. Thirupathi Rao, Swathi Kalam, D. Bhattacharyya, Hye-jin Kim
Index joins are pivotal for efficiency and quality when processing queries over massive data. HIVE is a cluster-oriented big data management engine well suited to data analysis applications and to OLAP for highly selective queries whose output is a small fraction of the contributing data; there, the brute-force approach performs poorly because of repetitive disk I/O operations or the launching of additional map operations. In this paper we propose an index join technique to speed up query processing and incorporate it into Hive by mapping our design onto its original transformation flow. To assess the performance, we present and measure benchmark queries on data sets generated using the TPC-H benchmark. Our results show significant performance gains on moderately large data sets and/or highly selective queries having a two-way join and one join condition.
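The Hive-internal details are not given in the abstract, but the core benefit of an index join over the brute-force plan can be illustrated generically: build an index on the join key once, then probe it per row, instead of rescanning one table for every row of the other. The tables and column names below are made up for the example.

```python
# Brute-force nested-loop join vs. an index (hash) join on the same data.
orders = [(1, "c1"), (2, "c2"), (3, "c1"), (4, "c3")]      # (order_id, cust_id)
customers = [("c1", "Ada"), ("c2", "Ben"), ("c3", "Cam")]  # (cust_id, name)

def nested_loop_join(left, right):
    # O(|left| * |right|): scans `right` once per row of `left`.
    return [(oid, name) for oid, cid in left
            for cid2, name in right if cid == cid2]

def index_join(left, right):
    # O(|left| + |right|): build an index on `right`, then probe per left row.
    index = {cid: name for cid, name in right}
    return [(oid, index[cid]) for oid, cid in left if cid in index]

assert nested_loop_join(orders, customers) == index_join(orders, customers)
result = index_join(orders, customers)
```

For the highly selective queries the abstract targets, the probe side touches only a small fraction of rows, which is where the repeated-scan I/O cost of the brute-force plan hurts most.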
Citations: 1
Personal Information Protection Issues and Its Solutions
Pub Date : 2017-08-31 DOI: 10.14257/IJDTA.2017.10.8.11
Seung-Il Moon, Ki-Min Song, J. Shim, Ho-young Choi
This study reviews the legal system concerning information efficiency and privacy protection in the construction of U-Health infrastructure for the disabled. Methodologically, related provisions such as those on U-Health and the legal definitions of the disabled, as well as the Privacy Protection Act, are analyzed. As a result, the personal information control right of the information agent should be secured in the gathering, processing, use and provision of medical information. Legal norms are also required to protect against leakage of personal medical information caused by inadequate administrative and technical measures.
Citations: 0
Machine Learning Approach for Text Summarization
Pub Date : 2017-08-31 DOI: 10.14257/ijdta.2017.10.8.08
Amita Arora, Akanksha Diwedy, Manjeet Singh, N. Chauhan
With the abundance of interminable text documents, summaries can help retrieve relevant information very quickly. The technique is to extract from the document those sentences that contain important information. This paper presents the results of our research on extractive summarization using a method based on Support Vector Machines (SVMs). The SVMs are trained on the DUC-2002 dataset, and the importance of sentences is judged on the basis of salient features. To evaluate the performance of our system, we compare it with two existing methods. ROUGE scores are used to compare the system-generated summaries with human-generated summaries, and the experimental results show that our system achieves high scores.
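The extract-and-rank structure of such a summarizer can be sketched with a stand-in scorer: the paper trains an SVM over sentence features on DUC-2002, but the simple content-word-frequency heuristic below shows the same pipeline shape (split into sentences, score, keep top-k, restore document order). The example document and stopword list are fabricated.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "in", "is", "are", "and", "to", "it"}

def summarize(text, k=1):
    """Extractive summary: top-k sentences by average content-word frequency."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    freq = Counter(words)

    def score(sentence):
        toks = [w for w in re.findall(r"[a-z']+", sentence.lower())
                if w not in STOPWORDS]
        return sum(freq[w] for w in toks) / (len(toks) or 1)

    ranked = sorted(sentences, key=score, reverse=True)[:k]
    # Keep the selected sentences in their original document order.
    return [s for s in sentences if s in ranked]

doc = ("Support vector machines classify sentences. "
       "Sentences with frequent content words matter. "
       "Unrelated trivia rarely helps.")
summary = summarize(doc, k=1)
```

An SVM-based version would replace `score` with a model trained on labeled sentence features; the surrounding extraction logic stays the same.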
Citations: 1
Fragment Allocation and Replication in Distributed Databases
Pub Date : 2017-08-31 DOI: 10.14257/ijdta.2017.10.8.05
A. Amiri
We study the problem of designing a distributed database system. We develop optimization models for the problem that deals simultaneously with two major design issues, namely which fragments to replicate, and where to store those fragments and replicas. Given the difficulty of the problem, we propose a solution algorithm based on a new formulation of the problem in which every server is allocated a fragment combination from a set of combinations generated by a randomized greedy heuristic. The results of a computational study show that the algorithm outperforms a standard branch & bound technique for large instances of the problem.
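A greedy step of the kind the abstract alludes to can be sketched as follows: assign each fragment to a server under a capacity limit, preferring the server that accesses it most and breaking ties randomly. The cost model, access-frequency matrix, and capacities below are assumptions for demonstration; the paper's optimization model (and its combination-generation step) is richer.

```python
import random

def greedy_allocate(servers, fragments, access_freq, capacity, rng):
    """access_freq[server][fragment] = how often that server queries the fragment."""
    placement = {s: set() for s in servers}
    for frag in fragments:
        # Only servers with remaining capacity are candidates.
        candidates = [s for s in servers if len(placement[s]) < capacity]
        # Greedy choice: highest access frequency wins; random tie-break.
        best = max(candidates,
                   key=lambda s: (access_freq[s].get(frag, 0), rng.random()))
        placement[best].add(frag)
    return placement

servers = ["s1", "s2"]
fragments = ["f1", "f2", "f3"]
access_freq = {"s1": {"f1": 9, "f2": 1}, "s2": {"f2": 8, "f3": 5}}
plan = greedy_allocate(servers, fragments, access_freq, capacity=2,
                       rng=random.Random(0))
```

Running such a randomized heuristic repeatedly yields the pool of candidate fragment combinations from which the paper's algorithm selects per-server allocations.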
Citations: 0
A Study on the Visualizing Time Series Data Using R
Pub Date : 2017-08-31 DOI: 10.14257/ijdta.2017.10.8.01
Eunmi Jung, A. Kim, Hyenki Kim
With the recent increase in data volume, interest is growing both in Big Data technology and in techniques for visualizing the results of big data processing. The vast majority of people absorb visual information more quickly than text, so visualization is an important focus of big data analysis. This study therefore examined various visualization methods using R, an open-source statistical analysis program. It explored how to configure data sets and how to implement various graphs in R, so that patterns in data can be determined and the characteristics of data understood at a glance through visualization. In this way it was possible to identify characteristics of data that simple regression analysis alone could not reveal, and to show that, rather than interpreting data as-is, it can be visualized in various ways through the conversion of data sets; this is expected to help users make various decisions.
Citations: 0