Avenue U


4.4 Word Rank

Xiaojun Wang’s paper applied word rank to all words in a document, to all nouns and verbs, and to all nouns and adjectives, and also combined word rank with DF in two variants, WordRank3DF2 and WordRank4DF1. In this project, the tests cover word rank on all words and on nouns and verbs; the nouns-and-adjectives variant is not included, and neither is word rank on a directed weighted graph, so every graph below is undirected. Besides WordRank3DF2 and WordRank4DF1, a third DF combination, WordRank5DF5, is tested. Because TF is also taken into consideration, three TF-IDF groups are added as well: WordRank3TFIDF2, WordRank4TFIDF1 and WordRank5TFIDF5. Each table below gives, for queries of 3 to 15 words, the number of pages out of 225 that Google and Yahoo retrieved successfully, together with the corresponding percentage.
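The paragraph above names the methods but not their formulas. As a rough illustration only, the sketch below ranks words with a PageRank-style iteration on an undirected co-occurrence graph (a TextRank-style construction swapped in for illustration) and then blends the result with a DF score. The co-occurrence window, the damping factor 0.85, the max-normalisation, and reading WordRank{x}DF{y} as the weighted sum x·WR + y·DF are all assumptions, not the project's confirmed definitions; `word_rank`, `normalise`, `combine` and `build_query` are hypothetical helper names.

```python
from collections import defaultdict


def word_rank(tokens, window=2, damping=0.85, iterations=50):
    """PageRank-style word scores on an undirected co-occurrence graph.

    Assumption (TextRank-style): two distinct words are linked when they
    occur within `window` consecutive tokens of each other.
    """
    neighbours = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            other = tokens[j]
            if other != word:
                neighbours[word].add(other)
                neighbours[other].add(word)

    nodes = sorted(set(tokens))
    score = {w: 1.0 for w in nodes}
    for _ in range(iterations):
        # Each word shares its current score evenly among its neighbours;
        # sorted() keeps the summation order deterministic.
        score = {
            w: (1 - damping)
            + damping * sum(score[v] / len(neighbours[v]) for v in sorted(neighbours[w]))
            for w in nodes
        }
    return score


def normalise(scores):
    """Scale scores into [0, 1] by dividing by the maximum (assumed step)."""
    top = max(scores.values(), default=0.0) or 1.0
    return {w: s / top for w, s in scores.items()}


def combine(wr_scores, df_scores, x, y):
    """Hypothetical WordRank{x}DF{y}: x * WR + y * DF on normalised scores.

    `df_scores` is whatever DF-based score the caller supplies; blending a
    TF-IDF score instead would give a WordRank{x}TFIDF{y}-style variant.
    """
    wr = normalise(wr_scores)
    df = normalise(df_scores)
    return {w: x * wr[w] + y * df.get(w, 0.0) for w in wr}


def build_query(scores, n):
    """Top-n words by score, ties broken alphabetically."""
    return sorted(scores, key=lambda w: (-scores[w], w))[:n]


tokens = "search engine ranks pages search engine indexes pages".split()
ranks = word_rank(tokens)
print(build_query(ranks, 3))
```

In this sketch a query of n words (n from 3 to 15, as in the tables) would be the top n words by combined score; that selection step is likewise an assumption about the pipeline.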

| WordRank: query words | Google pages (of 225) | Google rate | Yahoo pages (of 225) | Yahoo rate |
|---|---|---|---|---|
| 3 | 47 | 20.89% | 46 | 20.44% |
| 4 | 73 | 32.44% | 68 | 30.22% |
| 5 | 93 | 41.33% | 88 | 39.11% |
| 6 | 99 | 44.00% | 98 | 43.56% |
| 7 | 119 | 52.89% | 119 | 52.89% |
| 8 | 133 | 59.11% | 119 | 52.89% |
| 9 | 145 | 64.44% | 127 | 56.44% |
| 10 | 152 | 67.56% | 133 | 59.11% |
| 11 | 149 | 66.22% | 129 | 57.33% |
| 12 | 155 | 68.89% | 129 | 57.33% |
| 13 | 155 | 68.89% | 129 | 57.33% |
| 14 | 156 | 69.33% | 135 | 60.00% |
| 15 | 156 | 69.33% | 130 | 57.78% |
| Average | 125.54 | 55.79% | 111.54 | 49.57% |

Table 4.16

 

Figure 4.17 (a), (b): counts of successfully retrieved pages out of 225 and the corresponding percentages, using word rank on an undirected graph.
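Each cell in these tables is a count out of 225 pages, and the percentage is that count divided by 225. The short sketch below uses the Google column of Table 4.16 to reproduce the 3-word rate and the Average row; because every cell shares the same 225-page denominator, the average percentage equals the percentage of the average count. The names `google_word_rank`, `success_rate` and `summarise` are illustrative only.

```python
PAGES_PER_RUN = 225  # each table cell counts hits out of 225 pages

# Table 4.16, Google column: words per query -> successfully retrieved pages
google_word_rank = {3: 47, 4: 73, 5: 93, 6: 99, 7: 119, 8: 133, 9: 145,
                    10: 152, 11: 149, 12: 155, 13: 155, 14: 156, 15: 156}


def success_rate(pages, total=PAGES_PER_RUN):
    """Percentage of the page set that was retrieved successfully."""
    return 100.0 * pages / total


def summarise(column):
    """Average count over all query lengths and its percentage, to 2 d.p."""
    avg_pages = sum(column.values()) / len(column)
    return round(avg_pages, 2), round(success_rate(avg_pages), 2)


print(round(success_rate(google_word_rank[3]), 2))  # 20.89
print(summarise(google_word_rank))                  # (125.54, 55.79)
```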

| NounsVerbs: query words | Google pages (of 225) | Google rate | Yahoo pages (of 225) | Yahoo rate |
|---|---|---|---|---|
| 3 | 29 | 12.89% | 24 | 10.67% |
| 4 | 52 | 23.11% | 51 | 22.67% |
| 5 | 72 | 32.00% | 67 | 29.78% |
| 6 | 85 | 37.78% | 82 | 36.44% |
| 7 | 100 | 44.44% | 108 | 48.00% |
| 8 | 110 | 48.89% | 109 | 48.44% |
| 9 | 125 | 55.56% | 117 | 52.00% |
| 10 | 129 | 57.33% | 120 | 53.33% |
| 11 | 134 | 59.56% | 121 | 53.78% |
| 12 | 134 | 59.56% | 121 | 53.78% |
| 13 | 138 | 61.33% | 130 | 57.78% |
| 14 | 136 | 60.44% | 123 | 54.67% |
| 15 | 140 | 62.22% | 130 | 57.78% |
| Average | 106.46 | 47.32% | 100.23 | 44.55% |

Table 4.17

 

Figure 4.18 (a), (b): counts of successfully retrieved pages out of 225 and the corresponding percentages, using nouns-and-verbs rank on an undirected graph.

| WR3DF2: query words | Google pages (of 225) | Google rate | Yahoo pages (of 225) | Yahoo rate |
|---|---|---|---|---|
| 3 | 128 | 56.89% | 122 | 54.22% |
| 4 | 151 | 67.11% | 139 | 61.78% |
| 5 | 165 | 73.33% | 148 | 65.78% |
| 6 | 168 | 74.67% | 145 | 64.44% |
| 7 | 170 | 75.56% | 141 | 62.67% |
| 8 | 169 | 75.11% | 141 | 62.67% |
| 9 | 167 | 74.22% | 136 | 60.44% |
| 10 | 165 | 73.33% | 127 | 56.44% |
| 11 | 165 | 73.33% | 125 | 55.56% |
| 12 | 165 | 73.33% | 131 | 58.22% |
| 13 | 163 | 72.44% | 131 | 58.22% |
| 14 | 160 | 71.11% | 132 | 58.67% |
| 15 | 161 | 71.56% | 134 | 59.56% |
| Average | 161.31 | 71.69% | 134.77 | 59.90% |

Table 4.18

 

Figure 4.19 (a), (b): counts of successfully retrieved pages out of 225 and the corresponding percentages, using WordRank3DF2 on an undirected graph.

| WR4DF1: query words | Google pages (of 225) | Google rate | Yahoo pages (of 225) | Yahoo rate |
|---|---|---|---|---|
| 3 | 47 | 20.89% | 47 | 20.89% |
| 4 | 73 | 32.44% | 68 | 30.22% |
| 5 | 150 | 66.67% | 139 | 61.78% |
| 6 | 155 | 68.89% | 130 | 57.78% |
| 7 | 159 | 70.67% | 135 | 60.00% |
| 8 | 161 | 71.56% | 129 | 57.33% |
| 9 | 165 | 73.33% | 132 | 58.67% |
| 10 | 167 | 74.22% | 138 | 61.33% |
| 11 | 169 | 75.11% | 144 | 64.00% |
| 12 | 170 | 75.56% | 148 | 65.78% |
| 13 | 170 | 75.56% | 154 | 68.44% |
| 14 | 171 | 76.00% | 149 | 66.22% |
| 15 | 172 | 76.44% | 148 | 65.78% |
| Average | 148.38 | 65.95% | 127.77 | 56.79% |

Table 4.19

 

Figure 4.20 (a), (b): counts of successfully retrieved pages out of 225 and the corresponding percentages, using WordRank4DF1 on an undirected graph.

| WR5DF5: query words | Google pages (of 225) | Google rate | Yahoo pages (of 225) | Yahoo rate |
|---|---|---|---|---|
| 3 | 128 | 56.89% | 121 | 53.78% |
| 4 | 151 | 67.11% | 141 | 62.67% |
| 5 | 157 | 69.78% | 140 | 62.22% |
| 6 | 168 | 74.67% | 144 | 64.00% |
| 7 | 167 | 74.22% | 142 | 63.11% |
| 8 | 169 | 75.11% | 140 | 62.22% |
| 9 | 168 | 74.67% | 142 | 63.11% |
| 10 | 167 | 74.22% | 134 | 59.56% |
| 11 | 165 | 73.33% | 142 | 63.11% |
| 12 | 163 | 72.44% | 131 | 58.22% |
| 13 | 163 | 72.44% | 137 | 60.89% |
| 14 | 163 | 72.44% | 131 | 58.22% |
| 15 | 163 | 72.44% | 131 | 58.22% |
| Average | 160.92 | 71.52% | 136.62 | 60.72% |

Table 4.20

 

Figure 4.21 (a), (b): counts of successfully retrieved pages out of 225 and the corresponding percentages, using WordRank5DF5 on an undirected graph.

| WR3TFIDF2: query words | Google pages (of 225) | Google rate | Yahoo pages (of 225) | Yahoo rate |
|---|---|---|---|---|
| 3 | 38 | 16.89% | 38 | 16.89% |
| 4 | 55 | 24.44% | 47 | 20.89% |
| 5 | 85 | 37.78% | 86 | 38.22% |
| 6 | 91 | 40.44% | 85 | 37.78% |
| 7 | 120 | 53.33% | 92 | 40.89% |
| 8 | 124 | 55.11% | 99 | 44.00% |
| 9 | 139 | 61.78% | 111 | 49.33% |
| 10 | 146 | 64.89% | 114 | 50.67% |
| 11 | 145 | 64.44% | 119 | 52.89% |
| 12 | 155 | 68.89% | 129 | 57.33% |
| 13 | 152 | 67.56% | 123 | 54.67% |
| 14 | 159 | 70.67% | 136 | 60.44% |
| 15 | 160 | 71.11% | 131 | 58.22% |
| Average | 120.69 | 53.64% | 100.77 | 44.79% |

Table 4.21

 

Figure 4.22 (a), (b): counts of successfully retrieved pages out of 225 and the corresponding percentages, using WordRank3TFIDF2 on an undirected graph.

| WR4TFIDF1: query words | Google pages (of 225) | Google rate | Yahoo pages (of 225) | Yahoo rate |
|---|---|---|---|---|
| 3 | 48 | 21.33% | 48 | 21.33% |
| 4 | 69 | 30.67% | 64 | 28.44% |
| 5 | 82 | 36.44% | 72 | 32.00% |
| 6 | 96 | 42.67% | 84 | 37.33% |
| 7 | 110 | 48.89% | 102 | 45.33% |
| 8 | 126 | 56.00% | 111 | 49.33% |
| 9 | 137 | 60.89% | 125 | 55.56% |
| 10 | 143 | 63.56% | 128 | 56.89% |
| 11 | 156 | 69.33% | 136 | 60.44% |
| 12 | 158 | 70.22% | 139 | 61.78% |
| 13 | 160 | 71.11% | 131 | 58.22% |
| 14 | 161 | 71.56% | 129 | 57.33% |
| 15 | 159 | 70.67% | 132 | 58.67% |
| Average | 123.46 | 54.87% | 107.77 | 47.90% |

Table 4.22

 

Figure 4.23 (a), (b): counts of successfully retrieved pages out of 225 and the corresponding percentages, using WordRank4TFIDF1 on an undirected graph.

| WR5TFIDF5: query words | Google pages (of 225) | Google rate | Yahoo pages (of 225) | Yahoo rate |
|---|---|---|---|---|
| 3 | 38 | 16.89% | 38 | 16.89% |
| 4 | 55 | 24.44% | 47 | 20.89% |
| 5 | 78 | 34.67% | 67 | 29.78% |
| 6 | 92 | 40.89% | 85 | 37.78% |
| 7 | 105 | 46.67% | 97 | 43.11% |
| 8 | 124 | 55.11% | 99 | 44.00% |
| 9 | 128 | 56.89% | 106 | 47.11% |
| 10 | 139 | 61.78% | 107 | 47.56% |
| 11 | 144 | 64.00% | 114 | 50.67% |
| 12 | 143 | 63.56% | 122 | 54.22% |
| 13 | 146 | 64.89% | 124 | 55.11% |
| 14 | 155 | 68.89% | 129 | 57.33% |
| 15 | 156 | 69.33% | 123 | 54.67% |
| Average | 115.62 | 51.38% | 96.77 | 43.01% |

Table 4.23

 

Figure 4.24 (a), (b): counts of successfully retrieved pages out of 225 and the corresponding percentages, using WordRank5TFIDF5 on an undirected graph.

The above charts show the following two facts:

1. The initial success retrieval rate (with 3 query words) increases as DF takes a larger share in WordRankxDFy or WordRankxTFIDFy. This behaves very similarly to the basic methods in section 4.1, such as TFxDFy and TFIDFxDFy.

2. Once a query exceeds 10 words, the success retrieval rates level off and stay stable. This also matches the behaviour of the basic methods.
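The second observation can be checked directly from the tables. The sketch below takes two Google columns (Tables 4.16 and 4.18) and measures the spread, maximum minus minimum, of the retrieved-page counts for 3 to 9 query words and for 10 to 15 query words; a small spread in the second range is what "flat and stable" means in numbers. Only the data comes from the tables; the `spread` helper is illustrative.

```python
# Google columns copied from Tables 4.16 and 4.18 (pages out of 225),
# one entry per query length from 3 to 15 words.
QUERY_LENGTHS = list(range(3, 16))
google = {
    "WordRank": [47, 73, 93, 99, 119, 133, 145, 152, 149, 155, 155, 156, 156],
    "WR3DF2": [128, 151, 165, 168, 170, 169, 167, 165, 165, 165, 163, 160, 161],
}


def spread(counts, lo, hi):
    """Max minus min of the counts for query lengths lo..hi inclusive."""
    window = [c for n, c in zip(QUERY_LENGTHS, counts) if lo <= n <= hi]
    return max(window) - min(window)


for method, counts in google.items():
    print(method, spread(counts, 3, 9), spread(counts, 10, 15))
# WordRank 98 7
# WR3DF2 42 5
```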

Figure 4.25: comparison of all word-rank-related methods

The average success retrieval rates are shown in Figure 4.25. WR3DF2 and WR5DF5 give the two best results among the word-rank-related methods: on Google both average above 70% (71.69% and 71.52%), and on Yahoo both average about 60% (59.90% and 60.72%).
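The ranking summarised in Figure 4.25 can also be recomputed from the Average rows of Tables 4.16 to 4.23. The sketch below sorts the eight methods by average success rate for each engine; the values are copied from the tables, and `top_methods` is an illustrative helper name.

```python
# Average success rate (%) per method from Tables 4.16 to 4.23: (Google, Yahoo)
averages = {
    "WordRank": (55.79, 49.57),
    "NounsVerbs": (47.32, 44.55),
    "WR3DF2": (71.69, 59.90),
    "WR4DF1": (65.95, 56.79),
    "WR5DF5": (71.52, 60.72),
    "WR3TFIDF2": (53.64, 44.79),
    "WR4TFIDF1": (54.87, 47.90),
    "WR5TFIDF5": (51.38, 43.01),
}


def top_methods(index, k=2):
    """Names of the k methods with the highest average for one engine."""
    return sorted(averages, key=lambda m: averages[m][index], reverse=True)[:k]


for engine, index in (("Google", 0), ("Yahoo", 1)):
    print(engine, top_methods(index))
# Google ['WR3DF2', 'WR5DF5']
# Yahoo ['WR5DF5', 'WR3DF2']
```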

posted on 2009-06-18 11:44 by JosephQuinn, filed under: My Master-degree Project

Copyright © JosephQuinn