Yu Shanshan, Su Jindian, Li Pengfei. Improved TextRank-based Method for Automatic Summarization[J]. Computer Science, 2016, 43(6): 240-247
Improved TextRank-based Method for Automatic Summarization
Received: 2016-01-20  Revised: 2016-03-20
DOI:10.11896/j.issn.1002-137X.2016.06.048
Keywords: Chinese texts, automatic summarization extraction, TextRank, discourse structure, unsupervised learning methods
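Funding: This work was supported by the Natural Science Foundation of Guangdong Province (2015A030310318), the Medical Science and Technology Research Fund of Guangdong Province (A2015065), and the National Natural Science Foundation of China.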
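Authors, affiliations, and e-mail:
Yu Shanshan, School of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou 510006, susyu@139.com
Su Jindian, School of Computer Science and Engineering, South China University of Technology, Guangzhou 510640
Li Pengfei, School of Computer Science and Engineering, South China University of Technology, Guangzhou 510640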
Abstract:
      When extracting summaries from documents, the canonical TextRank algorithm usually considers only the similarity between sentence nodes and neglects both the discourse structure of the document and the context of each sentence. To address these shortcomings, we propose iTextRank, an improved TextRank algorithm that exploits the structural characteristics of Chinese texts. When building the TextRank network graph, iTextRank incorporates information about titles, paragraphs, special sentences, and the positions and lengths of sentences. On this basis it defines an improved sentence-similarity measure and weight adjustment factors for the nodes. We apply iTextRank to the automatic summarization of Chinese texts and analyze its time complexity. Experimental results show that iTextRank achieves higher accuracy and lower recall than canonical TextRank.
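As a rough illustration of the graph ranking that the abstract builds on, here is a minimal sketch of classic TextRank sentence extraction. It uses the sentence-similarity measure and damped score update from Mihalcea and Tarau's original TextRank, applied to sentences that are assumed to be already word-segmented. It does not reproduce iTextRank. The paper's improved similarity formula and weight adjustment factors are not given in this abstract, so `adjust_factor`, with its first/last-sentence and title-overlap boosts and its `alpha`/`beta` values, is a hypothetical stand-in chosen only to show where structural information could enter the ranking.

```python
import math
from typing import List, Sequence


def similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Classic TextRank sentence similarity: shared words / (log|a| + log|b|)."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    # Two one-word sentences give a zero denominator; treat them as unrelated.
    return len(set(a) & set(b)) / denom if denom > 0 else 0.0


def adjust_factor(idx: int, n: int, sent: Sequence[str], title: Sequence[str],
                  alpha: float = 0.2, beta: float = 0.2) -> float:
    """HYPOTHETICAL weight-adjustment factor, NOT the paper's formula:
    boosts the first/last sentence and sentences sharing words with the title."""
    factor = 1.0
    if idx == 0 or idx == n - 1:
        factor += alpha
    title_set = set(title)
    if title_set:
        factor += beta * len(set(sent) & title_set) / len(title_set)
    return factor


def textrank_scores(sentences: List[Sequence[str]], title: Sequence[str] = (),
                    d: float = 0.85, tol: float = 1e-6,
                    max_iter: int = 100) -> List[float]:
    n = len(sentences)
    if n == 0:
        return []
    # Undirected weighted graph: w[i][j] is the similarity of sentences i and j.
    w = [[similarity(sentences[i], sentences[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(max_iter):
        # WS(V_i) = (1 - d) + d * sum_j (w_ji / sum_k w_jk) * WS(V_j)
        new = [(1 - d) + d * sum(w[j][i] / out_sum[j] * scores[j]
                                 for j in range(n) if out_sum[j] > 0)
               for i in range(n)]
        delta = max(abs(x - y) for x, y in zip(new, scores))
        scores = new
        if delta < tol:
            break
    # Apply the hypothetical structural factor to the converged scores.
    return [s * adjust_factor(i, n, sentences[i], title)
            for i, s in enumerate(scores)]


def summarize(sentences: List[Sequence[str]], title: Sequence[str] = (),
              k: int = 3) -> List[int]:
    """Return the indices of the k top-scoring sentences, in document order."""
    scores = textrank_scores(sentences, title)
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


if __name__ == "__main__":
    sents = [["a", "b", "c"], ["a", "b", "d"], ["x", "y", "z"], ["a", "c", "e"]]
    print(summarize(sents, k=2))  # [0, 3]
```

In this sketch the isolated sentence `["x", "y", "z"]` keeps only the baseline score `1 - d`, and the hypothetical position boost breaks the tie between sentences 1 and 3. Building the n × n similarity matrix and each power iteration both cost O(n²) similarity or multiply-add operations here; the paper's own complexity analysis is not reproduced.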