Avenue U


2.5.1 Word-Rank

Word-rank is one implementation of the weighted graph ranking algorithm. It treats each single word/term as a vertex and the whole content as a graph, and it works on both undirected and directed graphs with weighted edges. A window size parameter ‘w’ determines which vertices are connected. In the undirected weighted graph, each word is connected to the words within the window distance on either side, that is, the previous w words and the following w words. In the directed weighted graph, each word is connected only to the following w words. Taking Figure 2.14 as an example with the window size set to 2, ‘packing’ is connected to ‘ferocious’, ‘storm’, ‘freezing’ and ‘rain’ in the undirected weighted graph, but only to ‘freezing’ and ‘rain’ in the directed weighted graph. The score of each vertex is initialized to 1, and the ranking algorithm (2-6 for the undirected weighted graph, 2-7 for the directed weighted graph) is run on the graph repeatedly until it converges, usually after 20-30 iterations at a threshold of 0.0001 [9].

Figure 2.14
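The window-based graph construction and the iterative ranking described above can be sketched in Python. This is only an illustration under stated assumptions, not the thesis implementation: formulas 2-6 and 2-7 are not reproduced in this section, so the update below assumes the usual weighted TextRank form with a damping factor of 0.85, and it assumes an edge weight equal to the number of times two words co-occur inside the window. The names `build_graph` and `rank` are hypothetical, and the five-word token list is made up from the words named in the example.

```python
from collections import defaultdict


def build_graph(tokens, window=2, directed=False):
    """Connect each word to the words that follow it within `window` positions.

    Assumption: an edge weight counts how often the pair co-occurs in a window.
    For the undirected graph the reverse edge is added as well, so each word
    ends up linked to the previous and the following `window` words.
    """
    graph = {t: defaultdict(float) for t in tokens}
    for i, u in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            v = tokens[j]
            if u == v:
                continue  # no self-loops
            graph[u][v] += 1.0
            if not directed:
                graph[v][u] += 1.0
    return graph


def rank(graph, d=0.85, threshold=1e-4, max_iter=100):
    """Iterate the (assumed) weighted ranking update until scores stabilise."""
    if not graph:
        return {}
    scores = {v: 1.0 for v in graph}  # every vertex starts at 1
    out_weight = {u: sum(nbrs.values()) for u, nbrs in graph.items()}
    for _ in range(max_iter):
        new = {v: 1.0 - d for v in graph}
        for u, nbrs in graph.items():
            if out_weight[u] == 0:
                continue  # vertex without outgoing edges passes nothing on
            for v, w in nbrs.items():
                new[v] += d * (w / out_weight[u]) * scores[u]
        delta = max(abs(new[v] - scores[v]) for v in graph)
        scores = new
        if delta < threshold:
            break
    return scores


tokens = ["ferocious", "storm", "packing", "freezing", "rain"]
undirected = build_graph(tokens, window=2, directed=False)
directed = build_graph(tokens, window=2, directed=True)
print(sorted(undirected["packing"]))  # neighbours on both sides
print(sorted(directed["packing"]))    # following words only
print(rank(undirected))
```

With a window of 2, the undirected graph links ‘packing’ to all four other words, while the directed graph links it only to ‘freezing’ and ‘rain’, matching the example in the text.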

The expected end result for this application is a set of words or phrases that are representative of a natural language text. The terms to be ranked are therefore sequences of one or more lexical units extracted from the text, and these are the vertices added to the text graph. If two or more ranked terms are neighbors in the text, they are joined into a key phrase, which preserves the wording of the original content. Rada Mihalcea and Paul Tarau illustrate this in their paper “TextRank: Bringing Order into Texts” with the following passage: “Compatibility of systems of linear constraints over the set of natural numbers. Criteria of compatibility of a system of linear Diophantine equations, strict inequations, and nonstrict inequations are considered. Upper bounds for components of a minimal set of solutions and algorithms of construction of minimal generating sets of solutions for all types of systems are given. These criteria and the corresponding algorithms for constructing a minimal supporting set of solutions can be used in solving all the considered types systems and systems of mixed types.” From it they extracted the following terms: “linear constraints”, “linear Diophantine equations”, “natural numbers”, “nonstrict inequations”, “strict inequations” and “upper bounds” [9].
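The step of joining neighboring terms into key phrases can be sketched as follows. This is a minimal illustration, assuming that only the `top_n` highest-scoring words are kept and that kept words adjacent in the text are merged. The function name `extract_phrases`, the `top_n` parameter and the scores in the example are all hypothetical values chosen for the demonstration, not results from the thesis or from [9].

```python
def extract_phrases(tokens, scores, top_n):
    """Merge runs of adjacent top-ranked words into key phrases."""
    # Keep the top_n words by score (assumed selection rule).
    keep = set(sorted(scores, key=scores.get, reverse=True)[:top_n])
    phrases, current = [], []
    for tok in tokens:
        if tok in keep:
            current.append(tok)  # extend the current run of kept words
        elif current:
            phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    # Drop repeated phrases while preserving first-seen order.
    return list(dict.fromkeys(phrases))


tokens = "linear diophantine equations and strict inequations are considered".split()
# Made-up scores, for illustration only.
scores = {
    "linear": 1.2, "diophantine": 1.1, "equations": 1.3,
    "strict": 0.9, "inequations": 1.0, "considered": 0.4,
}
print(extract_phrases(tokens, scores, top_n=5))
# → ['linear diophantine equations', 'strict inequations']
```

Because the merge follows the original word order, a phrase such as "linear diophantine equations" comes out exactly as it appears in the text.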

posted on 2009-06-18 02:54 JosephQuinn  Category: My Master-degree Project

Copyright © JosephQuinn