GUAN Xiaoqiang, PANG Jifang, LIANG Jiye. Randomization of Classes Based Random Forest Algorithm[J]. Computer Science, 2019, 46(2): 196-201
Randomization of Classes Based Random Forest Algorithm
Received: 2018-09-07  Revised: 2018-11-23
Keywords: Random forest, Multi-class classification problems, Randomization of classes, Diversity
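Funding: This work was supported by the National Natural Science Foundation of China (61876103), the Shanxi Province Youth Science and Technology Fund (201701D221098), the Shanxi Province Key Research and Development Program (201603D111014) and the Shanxi Province Overseas Study Fund (2016-003).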
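Authors, affiliations and e-mail:
GUAN Xiaoqiang: School of Computer and Information Technology, Shanxi University, Taiyuan 030006; Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006; gxq0079@sxu.edu.cn
PANG Jifang: School of Computer and Information Technology, Shanxi University, Taiyuan 030006; Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006
LIANG Jiye: School of Computer and Information Technology, Shanxi University, Taiyuan 030006; Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006; ljy@sxu.edu.cn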
Abstract:
      Random forest is a widely used classification method in data mining and machine learning. It has attracted broad research attention worldwide and has been applied to many practical problems. Traditional random forest methods do not consider how the number of classes affects classification performance, and they ignore the relationship between base classifiers and classes. This limits the performance of random forests on multi-class classification problems. To address this, and taking the characteristics of multi-class problems into account, this paper proposes a randomization of classes based random forest algorithm (RCRF). Working from the perspective of classes, RCRF adds randomization of classes to the two traditional randomizations of random forest (bootstrap sampling of training instances and random selection of candidate features), and designs base classifiers that emphasize different classes. Because different base classifiers focus on distinguishing different classes, the decision trees they generate have different structures. This preserves the performance of each individual base classifier while further increasing the diversity among base classifiers. To verify the effectiveness of the proposed algorithm, RCRF is compared with other algorithms on 21 data sets from the UCI repository. The experiments cover two aspects. First, accuracy, F1-measure and the Kappa coefficient are used to evaluate the performance of RCRF. Second, κ-error diagrams are used to compare and analyze the algorithms in terms of diversity. Experimental results show that the proposed algorithm effectively improves the overall performance of the ensemble model and has clear advantages on multi-class classification problems.
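The abstract evaluates RCRF with accuracy, F1-measure, the Kappa coefficient and κ-error diagrams. The Python sketch below shows one common way to compute these quantities from predicted labels. It is not taken from the paper. It assumes a macro-averaged F1 (a frequent choice for multi-class tasks) and the usual pairwise κ-error construction: one point per pair of base classifiers, with x = Cohen's κ between the pair's predictions and y = the pair's mean error rate. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(pred_a, pred_b):
    """Chance-corrected agreement between two label sequences (Cohen's kappa)."""
    n = len(pred_a)
    if n == 0 or n != len(pred_b):
        raise ValueError("label sequences must be non-empty and of equal length")
    observed = sum(a == b for a, b in zip(pred_a, pred_b)) / n
    count_a, count_b = Counter(pred_a), Counter(pred_b)
    # Expected agreement if both sequences were independent with these label frequencies.
    chance_num = sum(count_a[c] * count_b[c] for c in count_a)
    if chance_num == n * n:
        return 1.0  # both sequences use the same single label everywhere
    expected = chance_num / (n * n)
    return (observed - expected) / (1.0 - expected)


def error_rate(pred, truth):
    """Fraction of misclassified samples."""
    return sum(p != t for p, t in zip(pred, truth)) / len(truth)


def macro_f1(pred, truth):
    """Unweighted mean of per-class F1 scores (assumed macro averaging)."""
    scores = []
    for c in set(truth) | set(pred):
        tp = sum(p == c and t == c for p, t in zip(pred, truth))
        fp = sum(p == c and t != c for p, t in zip(pred, truth))
        fn = sum(p != c and t == c for p, t in zip(pred, truth))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def kappa_error_points(predictions, truth):
    """One (i, j, kappa, mean_error) tuple per pair of base classifiers."""
    points = []
    for i, j in combinations(range(len(predictions)), 2):
        kappa = cohen_kappa(predictions[i], predictions[j])
        mean_err = (error_rate(predictions[i], truth)
                    + error_rate(predictions[j], truth)) / 2
        points.append((i, j, kappa, mean_err))
    return points


if __name__ == "__main__":
    # Hypothetical toy data: 3 base classifiers on 6 samples with 3 classes.
    truth = [0, 1, 2, 0, 1, 2]
    predictions = [
        [0, 1, 2, 0, 1, 2],
        [0, 1, 1, 0, 1, 2],
        [0, 2, 2, 0, 1, 1],
    ]
    for i, j, kappa, err in kappa_error_points(predictions, truth):
        print(f"pair ({i},{j}): kappa={kappa:.3f}, mean error={err:.3f}")
    print("accuracy of tree 1:", 1 - error_rate(predictions[1], truth))
    print("macro F1 of tree 1:", round(macro_f1(predictions[1], truth), 3))
    print("Kappa vs. truth:", round(cohen_kappa(predictions[1], truth), 3))
```

On a κ-error diagram, a lower κ means the two classifiers disagree more, so they are more diverse. A point cloud that lies further left at a comparable height therefore indicates a more diverse ensemble of similarly accurate trees. When used as a performance metric, the Kappa coefficient is typically computed between a model's predictions and the true labels, as in the last line of the demo.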