CUI Kai, ZHOU Bin, JIA Yan, LIANG Zheng. LDA-based Model for Online Topic Evolution Mining [J]. Computer Science (计算机科学), 2010, 37(11): 156-159
LDA-based Model for Online Topic Evolution Mining (一种基于LDA的在线主题演化挖掘模型)
Received: 2009-12-28  Revised: 2010-03-16
DOI:
Chinese keywords: 主题模型, LDA, 演化, 舆情
English keywords: Topic model, LDA, Evolution, Public opinion
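Funding: This work was supported by the National Natural Science Foundation of China, Key Program (60933005) and General Program (60873204).
Author  Affiliation  E-mail
CUI Kai, ZHOU Bin, JIA Yan, LIANG Zheng (College of Computer, National University of Defense Technology, Changsha 410073) cuikai186@gmail.com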
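Chinese abstract (translated):
      An online computational model of topic evolution is built on latent semantic analysis of text content, and topic evolution is analyzed by tracking how topics change across time slices. The Latent Dirichlet Allocation (LDA) model is extended to online text streams, and an online LDA model is established and implemented. Continuity between topics is maintained by letting the posterior of the previous time slice influence the prior of the current time slice. Inference uses an improved incremental Gibbs algorithm to obtain the topic-word and document-topic probability distributions, and Kullback-Leibler (KL) relative entropy measures the similarity between topics, which reveals "topic inheritance" and "topic mutation" during topic evolution. Experimental results show that the model can find topic evolution trends in Internet corpora and performs well.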
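The idea of letting the previous time slice's posterior shape the current slice's prior can be illustrated with a small sketch. This is not the authors' formulation: the function name `next_topic_word_prior`, the additive form, and the parameters `base_beta` and `weight` are all hypothetical choices made only for illustration.

```python
def next_topic_word_prior(prev_topic_word, base_beta=0.01, weight=10.0):
    """Build per-topic Dirichlet pseudo-counts for the next time slice.

    prev_topic_word: list of topic-word distributions estimated on the
    previous slice (each inner list sums to 1).
    base_beta: symmetric smoothing so unseen words keep nonzero mass
    (assumed value).
    weight: how strongly the previous posterior is carried over
    (assumed value; 0 would give a plain symmetric prior).
    """
    return [
        [base_beta + weight * p for p in topic]
        for topic in prev_topic_word
    ]


# Toy posterior from the previous slice: 2 topics over a 3-word vocabulary.
prev_phi = [
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
]
prior = next_topic_word_prior(prev_phi)
```

With a larger `weight`, topics in the new slice stay closer to their predecessors; with a smaller one, they are freer to drift.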
English abstract:
      A computational model for online topic evolution mining is established through latent semantic analysis of textual data. Topic evolution is analyzed by tracking topic trends across time slices. Latent Dirichlet Allocation (LDA) is extended to online text streams, and an online LDA model is proposed and implemented. The main idea is to use the posterior topic-word distribution of each time slice to influence inference in the next time slice, which also maintains the continuity between topics. The topic-word and document-topic distributions are inferred with an incremental Gibbs algorithm. Kullback-Leibler (KL) relative entropy is used to measure the similarity between topics in order to identify topic inheritance and topic mutation. Experiments show that the proposed model can discover meaningful topic evolution trends on both English and Chinese corpora.
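As a rough illustration of how KL relative entropy can compare topics from adjacent time slices, the sketch below matches each current topic to its closest previous topic. The threshold rule, the value 0.2, and the names `kl_divergence` and `match_topics` are assumptions for this example; the abstract does not specify how the authors separate inheritance from mutation.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary.

    eps guards against log(0) when q has zero entries.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def match_topics(prev_topics, curr_topics, threshold=0.2):
    """Label each current topic by its closest previous topic.

    Returns tuples (current index, closest previous index, divergence, label).
    The label is "inheritance" when the divergence is below the threshold
    and "mutation" otherwise (an assumed rule, not taken from the paper).
    """
    results = []
    for k, curr in enumerate(curr_topics):
        divs = [kl_divergence(curr, prev) for prev in prev_topics]
        best = min(range(len(divs)), key=divs.__getitem__)
        label = "inheritance" if divs[best] < threshold else "mutation"
        results.append((k, best, divs[best], label))
    return results


# Toy topic-word distributions over a 3-word vocabulary.
prev_topics = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
curr_topics = [[0.6, 0.3, 0.1], [0.34, 0.33, 0.33]]

matches = match_topics(prev_topics, curr_topics)
for k, best, div, label in matches:
    print(f"current topic {k} -> previous topic {best}: KL={div:.3f} ({label})")
```

KL divergence is asymmetric, so the direction (current relative to previous) is itself a choice; a symmetrized variant would be equally plausible here.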