Zheng Feiran, Miao Duoqian, Zhang Zhifei, Gao Can. News Topic Detection Approach on Chinese Microblog[J]. Computer Science, 2012, 39(1): 138-141
News Topic Detection Approach on Chinese Microblog
  
Keywords: Microblog, News, Topic detection, Clustering
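Author affiliations:
Zheng Feiran, Miao Duoqian, Zhang Zhifei, Gao Can (Department of Computer Science and Technology, Tongji University, Shanghai 201804) (Key Laboratory of Embedded System and Service Computing, Ministry of Education, Tongji University, Shanghai 201804)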
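Chinese abstract:
      The rapid growth of microblogging has given rise to another, socialized form of news media. This paper proposes a method for mining news topics from microblogs: it detects, online, the keywords that suddenly appear in large numbers in microblog messages and clusters them to find news topics. To extract news topic words, a compound weight is constructed that jointly considers a word's frequency in short texts and its growth rate, quantifying the degree to which the word is a news term. For topic construction, a contextual relevance model is used to support an incremental clustering algorithm; compared with a semantic similarity model, it better fits the characteristics of this problem. Experiments on real microblog data show that the method can effectively detect news topics from a large volume of messages.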
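The compound weight described above can be illustrated with a minimal sketch. The abstract does not give the authors' formula, so everything here is an assumption: frequency is taken as the share of messages in the current time window that contain the word (counted once per message, since short posts rarely repeat a word), growth is a +1-smoothed ratio of the current count to the previous window's count, and the weight is their product. The names `doc_freq` and `compound_weights` and the `smoothing` parameter are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def doc_freq(messages: Sequence[List[str]]) -> Counter:
    """Count how many messages contain each word (presence counted once per message)."""
    counts: Counter = Counter()
    for tokens in messages:
        counts.update(set(tokens))
    return counts


def compound_weights(current: Sequence[List[str]],
                     previous: Sequence[List[str]],
                     smoothing: float = 1.0) -> Dict[str, float]:
    """Hypothetical compound weight: frequency share in the current window times growth.

    Growth compares the word's message count now with the previous window,
    smoothed so that words absent from the previous window do not divide by zero.
    """
    cur = doc_freq(current)
    prev = doc_freq(previous)
    n_cur = max(len(current), 1)
    weights: Dict[str, float] = {}
    for word, count in cur.items():
        freq = count / n_cur
        growth = (count + smoothing) / (prev.get(word, 0) + smoothing)
        weights[word] = freq * growth
    return weights


# Toy windows of already-segmented messages (illustrative data only).
previous = [["weather", "sunny"], ["lunch", "noodles"]]
current = [["earthquake", "sichuan"], ["earthquake", "rescue"],
           ["lunch", "noodles"], ["earthquake", "news"]]
weights = compound_weights(current, previous)
print(sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:3])
```

In this toy run the bursting word "earthquake" outranks "lunch", which is equally present in both windows and so gets no growth boost. A product is only one way to combine the two signals; the paper may weight them differently.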
English abstract:
      The popularity of microblogging has brought about another form of social news media. This paper proposes an approach to mining news topics from microblogs. News topics are formed by finding keywords that emerge in large numbers and clustering them. To extract news keywords, a compound weight combining word frequency and growth is introduced to measure the likelihood that a word is a news keyword. To construct topics, a contextual relevance model is used to support incremental clustering; this model is better suited to the problem than semantic similarity. Experiments on real-world microblog data show that the approach is effective at detecting news topics from massive numbers of messages.
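The incremental clustering step can be sketched as follows. The abstract does not define the contextual relevance model, so this sketch assumes one: a keyword's context is the set of messages containing it, and the relevance of two keywords is the Jaccard overlap of those sets. Keywords arrive one at a time; each joins the existing cluster with the highest average relevance if that score reaches a threshold, otherwise it starts a new cluster. The functions `message_index`, `context_relevance`, `incremental_cluster` and the `threshold` value are hypothetical.

```python
from typing import Dict, List, Sequence, Set


def message_index(messages: Sequence[Sequence[str]]) -> Dict[str, Set[int]]:
    """Map each word to the ids of the messages that contain it (its context)."""
    index: Dict[str, Set[int]] = {}
    for i, tokens in enumerate(messages):
        for word in set(tokens):
            index.setdefault(word, set()).add(i)
    return index


def context_relevance(a: Set[int], b: Set[int]) -> float:
    """Assumed relevance: Jaccard overlap of two words' message sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def incremental_cluster(keywords: Sequence[str],
                        messages: Sequence[Sequence[str]],
                        threshold: float = 0.3) -> List[List[str]]:
    """Single-pass clustering: each keyword is compared only with clusters built so far."""
    index = message_index(messages)
    clusters: List[List[str]] = []
    for word in keywords:  # e.g. emerging keywords in descending weight order
        ctx = index.get(word, set())
        best, best_score = None, 0.0
        for cluster in clusters:
            # Average relevance between the new keyword and the cluster's members.
            score = sum(context_relevance(ctx, index.get(m, set()))
                        for m in cluster) / len(cluster)
            if score > best_score:
                best, best_score = cluster, score
        if best is not None and best_score >= threshold:
            best.append(word)
        else:
            clusters.append([word])
    return clusters


messages = [["earthquake", "sichuan", "rescue"], ["earthquake", "rescue"],
            ["sichuan", "earthquake"], ["oscar", "award"], ["oscar", "award", "actor"]]
topics = incremental_cluster(["earthquake", "oscar", "rescue", "sichuan", "award"], messages)
print(topics)
```

Co-occurrence in the same posts groups "earthquake", "rescue" and "sichuan" into one topic and "oscar" with "award" into another, without any lexical-semantic resource. This illustrates why a context-based measure can suit bursty news vocabulary, but it is not presented as the paper's exact model.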