2.5.1 Word-Rank


Word-rank is an implementation of the weighted graph ranking algorithm in which each word/term is a vertex and the whole content is a graph. It covers two cases: an undirected graph with weighted edges and a directed graph with weighted edges. A window size parameter ‘w’ determines which vertices are connected. In the undirected weighted graph, each word is connected only to the words within the window distance on either side, that is, the previous w words and the following w words. In the directed weighted graph, each word is connected only to the following w words. Take Figure 2.14 as an example and set the window size to 2: ‘packing’ is connected to ‘ferocious’, ‘storm’, ‘freezing’ and ‘rain’ in the undirected weighted graph, but only to ‘freezing’ and ‘rain’ in the directed weighted graph. The score of each vertex is initialized to 1, and the ranking formula (2-6 for the undirected weighted graph, 2-7 for the directed weighted graph) is applied to the graph repeatedly until the scores converge, usually after 20-30 iterations at a threshold of 0.0001 ^[9].

Figure 2.14
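The window-based graph construction and the iterative ranking described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `build_graph` and `rank` are hypothetical, the edge weight is taken to be the number of co-occurrences inside the window, and the update is a PageRank-style weighted formula with a damping factor of 0.85. Whether this matches formulas 2-6 and 2-7 term for term is an assumption. The five-word token list is read off the Figure 2.14 example.

```python
from collections import defaultdict


def build_graph(tokens, w, directed=False):
    """Return {source: {target: weight}} for a window of size w.

    Assumption: the edge weight is the co-occurrence count within the window.
    """
    out_edges = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        _ = out_edges[word]  # make sure every word appears as a vertex
        for j in range(i + 1, min(i + w + 1, len(tokens))):
            other = tokens[j]
            if other == word:
                continue  # no self-loops
            out_edges[word][other] += 1.0      # word -> following word
            if not directed:
                out_edges[other][word] += 1.0  # mirror edge for the undirected case
    return {v: dict(nbrs) for v, nbrs in out_edges.items()}


def rank(graph, d=0.85, threshold=1e-4, max_iter=100):
    """Iterate a weighted PageRank-style update until the largest change
    in any score drops below the threshold. d = 0.85 is an assumed value."""
    if not graph:
        return {}, 0
    scores = {v: 1.0 for v in graph}  # initial value of 1 for every vertex
    out_sum = {v: sum(nbrs.values()) for v, nbrs in graph.items()}
    iteration = 0
    for iteration in range(1, max_iter + 1):
        new_scores = {v: 1.0 - d for v in graph}
        for src, nbrs in graph.items():
            for dst, weight in nbrs.items():
                # src passes its score along each edge in proportion to the weight
                new_scores[dst] += d * weight / out_sum[src] * scores[src]
        delta = max(abs(new_scores[v] - scores[v]) for v in graph)
        scores = new_scores
        if delta < threshold:
            break
    return scores, iteration


tokens = ["ferocious", "storm", "packing", "freezing", "rain"]
undirected = build_graph(tokens, w=2)
directed = build_graph(tokens, w=2, directed=True)
print(sorted(undirected["packing"]))  # ['ferocious', 'freezing', 'rain', 'storm']
print(sorted(directed["packing"]))    # ['freezing', 'rain']
scores, iterations = rank(undirected)
print(max(scores, key=scores.get), iterations)
```

In the undirected case every forward edge is mirrored, so each word ends up linked to the previous w and the following w words, as the text describes.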

The expected end result for this application is a set of words or phrases that are representative of a natural language text. The terms to be ranked are therefore sequences of one or more lexical units extracted from the text, and these are the vertices added to the text graph. If two or more selected terms are adjacent in the text, they are joined into a single key phrase, which preserves the linguistic consistency of the content. Rada Mihalcea and Paul Tarau, in their paper “TextRank: Bringing Order into Texts”, illustrate this with the following passage: “Compatibility of systems of linear constraints over the set of natural numbers. Criteria of compatibility of a system of linear Diophantine equations, strict inequations, and nonstrict inequations are considered. Upper bounds for components of a minimal set of solutions and algorithms of construction of minimal generating sets of solutions for all types of systems are given. These criteria and the corresponding algorithms for constructing a minimal supporting set of solutions can be used in solving all the considered types systems and systems of mixed types.” From it they extracted the following terms as results: “linear constraints”, “linear Diophantine equations”, “natural numbers”, “nonstrict inequations”, “strict inequations” and “upper bounds”^[9].
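The step that joins neighboring terms into key phrases can be sketched as follows. This is a minimal illustration, not the exact post-processing used in [9]: the function name `merge_keyphrases` is hypothetical, and the keyword set below is hand-picked for the example rather than produced by an actual ranking run. Punctuation is kept as separate tokens so that a comma breaks a phrase.

```python
def merge_keyphrases(tokens, keywords):
    """Join runs of adjacent selected terms into key phrases.

    tokens   -- the text as a list of lowercase tokens, punctuation included
    keywords -- the set of top-ranked single terms (hypothetical here)
    """
    phrases = []
    current = []
    for token in tokens:
        if token in keywords:
            current.append(token)  # extend the current run of selected terms
        else:
            if current:
                phrases.append(" ".join(current))
            current = []           # any other token ends the run
    if current:
        phrases.append(" ".join(current))
    # drop repeated phrases while keeping first-seen order
    return list(dict.fromkeys(phrases))


tokens = ["linear", "diophantine", "equations", ",", "strict", "inequations",
          ",", "and", "nonstrict", "inequations"]
keywords = {"linear", "diophantine", "equations",
            "strict", "nonstrict", "inequations"}
print(merge_keyphrases(tokens, keywords))
# ['linear diophantine equations', 'strict inequations', 'nonstrict inequations']
```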

posted on 2009-06-18 02:54 by JosephQuinn, category: My Master-degree Project
