Hu Chuang, Yang Geng, Bai Yunlu. Clustering Algorithm in Differential Privacy Preserving[J]. Computer Science, 2019, 46(2): 120-126.
Clustering Algorithm in Differential Privacy Preserving
Received: 2018-01-29  Revised: 2018-04-19
Keywords: Differential privacy, k-means, Clustering algorithms, Privacy preserving
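Funding: This work was supported by the National Natural Science Foundation of China (61572263) and the Natural Science Foundation of Jiangsu Province Policy Guidance Program, Prospective Joint Research Project (2016ZS04).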
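Authors, affiliations and e-mail:
Hu Chuang: School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210003; Jiangsu Key Laboratory of Big Data Security and Intelligent Processing, Nanjing 210023
Yang Geng: School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210003; Jiangsu Key Laboratory of Big Data Security and Intelligent Processing, Nanjing 210023; yangg@njupt.edu.cn
Bai Yunlu: School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210003; School of Information Technology, Nanjing University of Chinese Medicine, Nanjing 210023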
Abstract:
      Data mining techniques have advanced considerably in both research and applications in the big data era, but the disclosure of sensitive information can expose users to many threats and losses. How to protect data privacy during cluster analysis has therefore become a key issue in both data mining and data privacy protection. Traditional differentially private k-means is sensitive to the choice of its initial centers, and the number of clusters k is chosen somewhat arbitrarily; both problems reduce the usability of the clustering results. To further improve the usability of differentially private k-means clustering, this paper proposes DPk-means-up, a new clustering algorithm based on differential privacy, and supports it with theoretical analysis and comparative experiments. The theoretical analysis shows that the algorithm satisfies ε-differential privacy and can be applied to datasets of different sizes and dimensions. The experimental results further show that, at the same level of privacy protection, the proposed algorithm yields more usable clusterings than other differentially private k-means clustering methods.
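For reference, a mechanism M satisfies ε-differential privacy if, for all neighboring datasets D and D' and every set of outputs S, Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D') ∈ S]; the Laplace mechanism achieves this by adding noise with scale Δf/ε, where Δf is the L1 sensitivity of the query. The abstract does not describe either the traditional baseline or DPk-means-up in detail, so the following is only a minimal sketch of a common baseline (a DPLloyd-style iteration that adds Laplace noise to per-cluster counts and coordinate sums), not the authors' DPk-means-up. It assumes data normalized to [0,1]^d, neighboring datasets that differ by adding or removing one record, ε split evenly over a fixed number of iterations, and randomly drawn initial centers. The names `dp_kmeans`, `laplace`, and `nearest` are hypothetical.

```python
import random
from typing import List, Sequence

Point = List[float]


def laplace(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) noise: difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def nearest(point: Sequence[float], centers: Sequence[Sequence[float]]) -> int:
    """Index of the center closest to `point` (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centers[j])))


def dp_kmeans(data: Sequence[Sequence[float]], k: int, epsilon: float,
              iterations: int = 5, seed: int = 0) -> List[Point]:
    """Hypothetical baseline DP k-means sketch (DPLloyd-style), NOT DPk-means-up.

    Assumptions: every coordinate lies in [0, 1]; neighboring datasets differ by
    adding or removing one record; epsilon is split evenly across iterations.
    """
    rng = random.Random(seed)
    d = len(data[0])
    # Data-independent random initial centers cost no privacy budget, but the
    # final result depends on them (the weakness the abstract points out).
    centers = [[rng.random() for _ in range(d)] for _ in range(k)]
    eps_iter = epsilon / iterations   # sequential composition over iterations
    scale = (d + 1) / eps_iter        # L1 sensitivity: 1 (count) + d (sum of a point in [0,1]^d)
    for _ in range(iterations):
        counts = [0.0] * k
        sums = [[0.0] * d for _ in range(k)]
        for x in data:
            j = nearest(x, centers)   # assignment uses the previous (already private) centers
            counts[j] += 1
            for i in range(d):
                sums[j][i] += x[i]
        new_centers: List[Point] = []
        for j in range(k):
            noisy_count = counts[j] + laplace(scale, rng)
            noisy_sum = [s + laplace(scale, rng) for s in sums[j]]
            if noisy_count < 1.0:
                # Too few noisy members: keep the previous center (post-processing).
                new_centers.append(list(centers[j]))
            else:
                # Noisy mean, clipped back into the assumed [0, 1] domain.
                new_centers.append([min(1.0, max(0.0, s / noisy_count)) for s in noisy_sum])
        centers = new_centers
    return centers
```

For example, `dp_kmeans(points, k=3, epsilon=1.0)` returns three centers inside [0,1]^d. Running it with different `seed` values shows how much the output depends on the random initial centers, and because ε is divided across iterations, adding iterations increases the noise added in each one.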