Efficient Groovy Programming: Counting Word Frequencies

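Fields such as search engines and speech recognition often need to count how frequently words occur. The Groovy implementation below prints the six most frequent words together with their counts: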

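// Sample text; tokenize() splits on any run of whitespace, so the extra spaces are harmless
def content = """
    The Java Collections API is the basis   for   all the nice support that Groovy gives you
    through lists and maps. In fact, Groovy not only uses the same abstractions, it
    even works on the very same classes that make up the Java Collections API.
"""

// Split the text into whitespace-separated tokens
def words = content.tokenize()

// Map from each word to its number of occurrences
def wordFrequency = [:]

words.each {
    wordFrequency[it] = wordFrequency.get(it, 0) + 1
}

// Sort the distinct words by ascending frequency (sort reorders the list in place)
def wordList = wordFrequency.keySet().toList()
wordList.sort { wordFrequency[it] }

// Walk the last six entries backwards, i.e. the six most frequent words
def result = ''
wordList[-1..-6].each {
    result += it.padLeft(12) + " : " + wordFrequency[it] + "\n"
}

println result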

Output:

         the : 5
      Groovy : 2
        that : 2
 Collections : 2
        Java : 2
        same : 2
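
The counting and sorting steps can also be written more compactly. The following is only a sketch, not the post's original approach, and it assumes Groovy 1.8.1 or later, where the GDK provides countBy on collections and take on maps. The sample sentence and the variable names are made up for illustration.

```groovy
// Illustrative input, not taken from the post
def text = "the quick brown fox jumps over the lazy dog and the cat"

// countBy groups equal tokens and returns a map of token -> count
def counts = text.tokenize().countBy { it }

// Sort the entries by descending count and keep the two most frequent
def top = counts.sort { -it.value }.take(2)

top.each { word, count ->
    println word.padLeft(12) + " : " + count
}
```

Compared with the explicit loop, this avoids the mutable result string and keeps the word and its count together in one map entry.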

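If the text to be processed is more complex (for example, tokenize() splits only on whitespace, so "API" and "API." are counted as different words, as are "The" and "the"), you can use regular expressions. Incidentally, Groovy supports regular expressions at the language level!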
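
As a rough sketch of what regex-based tokenizing might look like, the example below uses Groovy's =~ find operator with a slashy-string pattern to pull out runs of letters, ignoring punctuation and case. The helper name countWords, the pattern, and the sample sentence are assumptions made for illustration; they do not come from the original post.

```groovy
// Hypothetical helper, not from the original post
def countWords(String text) {
    def freq = [:]
    // =~ yields a java.util.regex.Matcher; iterating it visits each match in turn
    def matcher = text =~ /[A-Za-z]+/
    matcher.each { match ->
        // Lower-case so that "The" and "the" share one entry
        def word = match.toLowerCase()
        freq[word] = freq.get(word, 0) + 1
    }
    return freq
}

def sample = "The Java Collections API. In fact, the API works on the very same classes."
def freq = countWords(sample)

// Print the two most frequent words, most frequent first
freq.sort { -it.value }.take(2).each { word, count ->
    println word.padLeft(12) + " : " + count
}
```

With this pattern, "API." and "API" fall into the same entry because the trailing period is never part of a match. Real text with apostrophes, hyphens, or non-ASCII letters would need a broader pattern.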

Posted on 2007-02-01 23:31 by 山风小子. Category: Groovy & Grails
