
2012 11th International Conference on Machine Learning and Applications: Latest Papers

A Cooperative Learning Scheme for Energy Efficient Routing in Wireless Sensor Networks
Pub Date : 2012-12-12 DOI: 10.1109/ICMLA.2012.143
S. Al-Wakeel, N. Al-Nabhan
Wireless sensor networks (WSNs) are gaining more interest in a variety of applications. Among their various characteristics and challenges, network lifetime and efficiency are the most commonly considered issues in WSN-based systems. The scarcest WSN resource is energy, and route discovery and data transmission are among the most energy-expensive operations. This paper presents a novel design of a cooperative node learning scheme for cooperative energy-efficient routing (CEERA) in wireless sensor networks. In CEERA, nodes perform cooperative learning in delivering data to the base station. The retransmission of packets is controlled through an address-based timer. CEERA achieves overhead reduction and energy conservation by controlling various parameters that affect overall network efficiency. Performance is evaluated using the NS2 simulator and our own event-driven simulation. The simulation results show that our algorithm minimizes the overall energy consumption of the WSN, extends network operational lifetime, and improves network efficiency and throughput.
Citations: 1
PQL: Protein Query Language
Pub Date : 2012-12-12 DOI: 10.1109/ICMLA.2012.217
S. Elfayoumy, Paul Bathen
This paper introduces a Protein Query Language (PQL) for querying protein structures in an expressive yet concise manner. One objective of the paper is to demonstrate how such a language would help protein researchers obtain in-depth protein data from a relational database without extensive SQL knowledge. The language offers options for limiting query results by key protein characteristics, such as methyl-donated hydrogen bond interactions, minimum and maximum phi and psi angles, repulsive forces, CH/Pi calculations, and other pertinent factors. A backend data model was designed to support storage and retrieval of protein primary and secondary sequences, atomic-level data, and calculations on that data. A relational DBMS is used as the persistent storage backend, with every effort made to ensure transparent portability to most relational database systems. In addition, front-end applications can be developed to support retrieving, transforming, and preprocessing information from the Research Collaboratory for Structural Bioinformatics (RCSB) into the backend data repository. The new language and associated architecture allow users to load additional protein files from RCSB into the database, issue standard queries to download pertinent data in user-friendly formats including CSV files, issue non-standard queries against secondary structures via the protein query language, and run error-detection routines against data in the database. Query results may include normalized or denormalized data, model and chain data, residue data, atom detail data, and primary as well as secondary structure data.
Citations: 0
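The PQL abstract above describes limiting results by minimum and maximum phi and psi angles over a relational backend. The listing gives neither the PQL syntax nor the data model, so the following is only a sketch of the kind of SQL such a query might stand in for. The `residue` table, its columns, the `DEMO1`/`DEMO2` identifiers, and the sample rows are all hypothetical and invented for illustration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical residue table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE residue ("
        " structure_id TEXT, chain TEXT, seq_num INTEGER,"
        " name TEXT, phi REAL, psi REAL)"
    )
    # Invented rows, not taken from any real structure file.
    rows = [
        ("DEMO1", "A", 1, "ALA", -60.0, -45.0),
        ("DEMO1", "A", 2, "GLY", 80.0, 10.0),
        ("DEMO1", "A", 3, "VAL", -120.0, 130.0),
        ("DEMO2", "B", 1, "LEU", -65.0, -40.0),
    ]
    conn.executemany("INSERT INTO residue VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


def residues_in_angle_window(conn, phi_min, phi_max, psi_min, psi_max):
    """Return residues whose phi and psi angles fall inside the given window."""
    sql = (
        "SELECT structure_id, chain, seq_num, name FROM residue"
        " WHERE phi BETWEEN ? AND ? AND psi BETWEEN ? AND ?"
        " ORDER BY structure_id, chain, seq_num"
    )
    return conn.execute(sql, (phi_min, phi_max, psi_min, psi_max)).fetchall()


conn = build_demo_db()
# An illustrative window around typical alpha-helix angles.
helical = residues_in_angle_window(conn, -90.0, -30.0, -77.0, -17.0)
```

A query language like the one described would presumably hide the `WHERE` clause behind a shorter, domain-specific expression; parameter binding is used here so that the angle limits are never spliced into the SQL string.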
Generation of Tests for Programming Challenge Tasks on Graph Theory Using Evolution Strategy
Pub Date : 2012-12-12 DOI: 10.1109/ICMLA.2012.194
M. Buzdalov
In this paper, an automated method is proposed for generating tests against inefficient solutions to programming challenge tasks on graph theory. The method is based on a (1+1) evolution strategy and is able to defeat several kinds of inefficient solutions. The proposed method was applied to a task from an Internet problem archive, the Timus Online Judge.
Citations: 5
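The (1+1) evolution strategy named in the abstract above keeps a single parent test, mutates it into a single child, and keeps the child when it is at least as hard for the targeted solution. The sketch below illustrates that loop only. The targeted "inefficient solution" (a Bellman-Ford with early exit), the fitness (number of edge inspections), and the mutation operator are assumptions chosen for brevity; the abstract does not say which task or operators the paper used.

```python
import random


def count_edge_inspections(n, edges):
    """Hypothetical inefficient solution: Bellman-Ford with early exit.

    Returns how many edges it inspects; this count serves as the fitness.
    """
    inf = float("inf")
    dist = [inf] * n
    dist[0] = 0
    inspections = 0
    for _ in range(n - 1):
        changed = False
        for u, v, w in edges:
            inspections += 1
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            break
    return inspections


def random_edge(n, rng):
    """Draw a directed edge with a positive weight."""
    return (rng.randrange(n), rng.randrange(n), rng.randint(1, 100))


def mutate(edges, n, rng):
    """Assumed mutation: replace one randomly chosen edge."""
    child = list(edges)
    child[rng.randrange(len(child))] = random_edge(n, rng)
    return child


def one_plus_one_es(n, m, generations, seed=0):
    """Evolve an m-edge test on n vertices that maximizes the fitness."""
    rng = random.Random(seed)
    parent = [random_edge(n, rng) for _ in range(m)]
    parent_fit = count_edge_inspections(n, parent)
    initial_fit = parent_fit
    for _ in range(generations):
        child = mutate(parent, n, rng)
        child_fit = count_edge_inspections(n, child)
        # (1+1) selection: the child replaces the parent if it is not worse.
        if child_fit >= parent_fit:
            parent, parent_fit = child, child_fit
    return parent, parent_fit, initial_fit


best_test, best_fit, initial_fit = one_plus_one_es(n=8, m=16, generations=300)
```

Accepting children of equal fitness lets the search drift across plateaus, which is a common choice for this strategy; the fitness can never exceed `(n - 1) * m` for this particular target.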
Monitoring and Determination of Wind Energy Potential by Web Based Wireless Network
Pub Date : 2012-12-12 DOI: 10.1109/ICMLA.2012.205
Onur Keskin, Ismet Ates, Z. H. Karadeniz, A. Turgut, Z. Kıral
In this paper, we develop a web-based interface that communicates wirelessly over the ZigBee protocol to monitor wind energy potential and to gather custom reports for assessing the wind field of interest. A custom printed circuit board layer is designed for interfacing with all the sensors in use. The web-based interface follows a responsive design for platform and device independence. The system provides a scalable, accessible, reliable, low-cost, and low-power solution for renewable energy systems.
Citations: 1
Approach to Control of the Output Voltage in Renewable Energy Sources on the Basis of AE-method Using Genetic Algorithm
Pub Date : 2012-12-12 DOI: 10.1109/ICMLA.2012.168
Viktor Ten, N. Isembergenov, Y. Akhmetbekov, D. Sarbassov, A. Iglikov, B. Matkarimov
The problem statement for a renewable power system test site located at Nazarbayev University is formulated taking into account uncertain disturbances from the consumer grid side. The proposed controller is based on the method of additional equilibria. A genetic algorithm is proposed for adjusting the parameters of the controller and the control plant. Results of a MATLAB simulation of the designed control system are presented.
Citations: 2
Automated Storage Tiering Using Markov Chain Correlation Based Clustering
Pub Date : 2012-12-12 DOI: 10.1109/ICMLA.2012.71
Malak Alshawabkeh, Alma Riska, Adnan Sahin, Motasem Awwad
In this paper, we develop an automated and adaptive framework that aims to move active data to high-performance storage tiers and inactive data to low-cost/high-capacity storage tiers by learning patterns of the storage workloads. The proposed framework is designed using an efficient Markov chain correlation based clustering method (MCC), which can quickly predict or detect changes in the current workload based on what the system has experienced before. The workload data is first normalized, and Markov chains are constructed from the dynamics of the IO loads of the data storage units. Based on the correlation of one-step Markov chain transition probabilities, the k-means method is employed to group the storage units that have similar behavior at each point. Such a framework can then easily be incorporated into various resource management policies that aim at enhancing performance, reliability, and availability. The predictive nature of the model makes a storage system both faster and lower-cost at the same time, because it uses high-performance tiers only when needed and low-cost/high-capacity tiers when possible.
Citations: 4
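The pipeline in the storage tiering abstract above (normalize the workload, build a Markov chain per storage unit, cluster units by their one-step transition probabilities with k-means) can be sketched in a few functions. This is a simplified illustration under stated assumptions: the number of load states, the equal-width discretization, the use of plain Euclidean distance in place of the paper's correlation measure, the farthest-point initialization, and the sample IO series are all choices made here, not details taken from the paper.

```python
def normalize(series):
    """Scale a load series to the range [0, 1]."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0] * len(series)
    return [(x - lo) / (hi - lo) for x in series]


def to_states(series, n_states):
    """Discretize a normalized series into equal-width load states (assumed)."""
    return [min(int(x * n_states), n_states - 1) for x in normalize(series)]


def transition_vector(states, n_states):
    """Estimate one-step transition probabilities and flatten them row by row."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1
    vec = []
    for row in counts:
        total = sum(row)
        vec.extend(c / total if total else 0.0 for c in row)
    return vec


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Invented IO loads for four storage units: two oscillating, two step-like.
io_loads = [
    [1, 9, 1, 9, 1, 9, 1, 9],
    [2, 20, 2, 20, 2, 20, 2, 20],
    [1, 1, 1, 1, 9, 9, 9, 9],
    [3, 3, 3, 3, 3, 30, 30, 30],
]
vectors = [transition_vector(to_states(s, 2), 2) for s in io_loads]
labels = kmeans(vectors, k=2)
```

Because each series is normalized before discretization, units with the same switching pattern land in the same cluster even when their absolute load levels differ, which matches the abstract's emphasis on grouping units by similar behavior.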
Classification, Segmentation and Chronological Prediction of Cinematic Sound
Pub Date : 2012-12-12 DOI: 10.1109/ICMLA.2012.172
Pedro Silva
This paper presents work done on classification, segmentation and chronological prediction of cinematic sound employing support vector machines (SVM) with sequential minimal optimization (SMO). Speech, music, environmental sound and silence, plus all pairwise combinations excluding silence, are considered as classes. A model considering simple adjacency rules and probabilistic output from logistic regression is used for segmenting fixed-length parts into auditory scenes. Evaluation of the proposed methods on a 44-film dataset against k-nearest neighbor, Naive Bayes and standard SVM classifiers shows superior results of the SMO classifier on all performance metrics. Subsequently, we propose sample size optimizations for building similar datasets. Finally, we use meta-features built from classification as descriptors in a chronological model for predicting the period of production of a given soundtrack. A decision table classifier is able to estimate the year of production of an unknown soundtrack with a mean absolute error of approximately five years.
Citations: 5
Animal Cognition, Epistemic Fluency, Social Networks and the Scientific Habit of Mind
Pub Date : 2012-12-12 DOI: 10.1109/ICMLA.2012.166
D. M. Morrison, Xiangen Hu
This concept paper suggests a new way of thinking about the origin, growth, and spread of a general-purpose "scientific habit of mind" in humans, and discusses how intelligent coaching agents may help. The argument begins with a description of the role of the cognitive cycle in animal thinking. We then examine critical differences between non-human and human cognition, especially with respect to the crucial and yet problematic role of language and linguistic interaction in the "widening spread and deepening hold" of scientific thinking and discourse in human populations. The paper concludes with a vision of a new kind of open, networked learning community inhabited by human learners, human experts, and intelligent agents, and suggests ways of evaluating the development of scientific thinking within these communities using a combination of social network and semantic space analysis.
Citations: 0
On the Validity of a New SMS Spam Collection
Pub Date : 2012-12-12 DOI: 10.1109/ICMLA.2012.211
J. M. G. Hidalgo, Tiago A. Almeida, A. Yamakami
Mobile phones are becoming the latest target of electronic junk mail. Recent reports clearly indicate that the volume of SMS spam messages is increasing dramatically year by year. One of the major concerns in academic settings has been the scarcity of public SMS spam datasets, which are sorely needed for validation and comparison of different classifiers. To address this issue, we have recently proposed a new SMS Spam Collection that, to the best of our knowledge, is the largest public, real SMS dataset available for academic studies. However, as it has been created by augmenting a previously existing database built using roughly the same sources, it is sensible to certify that there are no duplicates coming from them. So, in this paper we offer a comprehensive analysis of the new SMS Spam Collection in order to ensure that this does not happen, since duplicates may ease the task of learning SMS spam classifiers and hence could compromise the evaluation of methods. The analysis of the results indicates that the procedure followed does not lead to near-duplicates and, consequently, the proposed dataset is reliable to use for evaluating and comparing the performance achieved by different classifiers.
Citations: 42
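The SMS Spam Collection abstract above turns on checking a merged dataset for near-duplicate messages. The abstract does not say which matching technique the authors used, so the sketch below substitutes a common one: Jaccard similarity over word 3-gram shingles. The tokenizer, the shingle length, the 0.6 threshold, and the three sample messages are assumptions made for illustration.

```python
import re
from itertools import combinations


def shingles(text, n=3):
    """Return the set of word n-grams of a message, ignoring case and punctuation."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if len(tokens) < n:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(messages, threshold=0.6):
    """Return index pairs whose shingle sets are at least `threshold` similar."""
    sets = [shingles(m) for m in messages]
    return [
        (i, j)
        for i, j in combinations(range(len(messages)), 2)
        if jaccard(sets[i], sets[j]) >= threshold
    ]


# Invented messages: the first two differ only in case, punctuation and a suffix.
messages = [
    "WIN a free prize now, call 0800 123 456 today",
    "Win a FREE prize now! Call 0800 123 456 today to claim.",
    "Are we still meeting for lunch tomorrow?",
]
pairs = near_duplicates(messages)
```

The pairwise comparison is quadratic in the number of messages, which is acceptable for a collection of a few thousand SMS texts; a larger corpus would usually call for an approximate method such as MinHash.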
A Hybrid Approach to Coping with High Dimensionality and Class Imbalance for Software Defect Prediction
Pub Date : 2012-12-12 DOI: 10.1109/ICMLA.2012.145
Kehan Gao, T. Khoshgoftaar, Amri Napolitano
High dimensionality and class imbalance are the two main problems affecting many software defect prediction tasks. In this paper, we propose a new technique, named SelectRUSBoost, which is a form of ensemble learning that incorporates data sampling to alleviate class imbalance and feature selection to resolve high dimensionality. To evaluate the effectiveness of the new technique, we apply it to a group of datasets in the context of software defect prediction. We employ two classification learners and six feature selection techniques. We compare the technique to the approach where feature selection and data sampling are used together, as well as the case where feature selection is used alone (no sampling used at all). The experimental results demonstrate that the SelectRUSBoost technique is more effective in improving classification performance compared to the other approaches.
Citations: 15
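The SelectRUSBoost abstract above combines data sampling with feature selection inside an ensemble learner. The sketch below shows only one plausible data-preparation step for a single boosting round: random undersampling of the majority class followed by a feature ranking on the sampled data. It is not the authors' algorithm; the ranking criterion (absolute gap between class means), the 1:1 sampling ratio, the order of the two steps, and the toy dataset are assumptions, and the boosting loop itself is omitted.

```python
import random


def random_undersample(X, y, rng):
    """Keep every minority instance and an equal-sized random majority sample."""
    ones = [i for i, label in enumerate(y) if label == 1]
    zeros = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (ones, zeros) if len(ones) <= len(zeros) else (zeros, ones)
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in kept], [y[i] for i in kept]


def mean_gap_scores(X, y):
    """Assumed ranking criterion: |mean over class 1 - mean over class 0| per feature.

    Requires both classes to be present in y.
    """
    scores = []
    for f in range(len(X[0])):
        pos = [row[f] for row, label in zip(X, y) if label == 1]
        neg = [row[f] for row, label in zip(X, y) if label == 0]
        scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    return scores


def select_top_k(X, y, k):
    """Return the indices of the k highest-scoring features, in ascending order."""
    scores = mean_gap_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda f: scores[f], reverse=True)
    return sorted(ranked[:k])


def prepare_round(X, y, k, seed=0):
    """Balance the classes, then project the sample onto the top-k features."""
    rng = random.Random(seed)
    X_bal, y_bal = random_undersample(X, y, rng)
    features = select_top_k(X_bal, y_bal, k)
    return [[row[f] for f in features] for row in X_bal], y_bal, features


# Invented toy data: 2 defective modules (label 1) and 6 clean ones (label 0).
X = [[10, 1, 5], [10, 2, 4], [0, 1, 5], [0, 2, 4],
     [0, 1, 4], [0, 2, 5], [0, 1, 5], [0, 2, 4]]
y = [1, 1, 0, 0, 0, 0, 0, 0]
X_round, y_round, chosen = prepare_round(X, y, k=1)
```

In a full ensemble, a step like this would run once per boosting iteration with a fresh random sample, and a weak learner would then be trained on `X_round` and `y_round`.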