﻿<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/"><channel><title>BlogJava-&lt;h3 style="font-family: Comic Sans MS"&gt;&lt;font color="#FA1A0A" size="10"&gt;︻┳═一Java&lt;/font&gt;&lt;/h3&gt;-随笔分类-Lucene</title><link>http://www.blogjava.net/rain1102/category/37646.html</link><description>&lt;br/&gt;&lt;font color="green" style="font-family: 华文行楷;font-size:16px;"&gt;子曰：危邦不入，乱邦不居。天下有道则见，无道则隐。&lt;/font&gt;&lt;font color="#3C1435"&gt;&lt;/font&gt;</description><language>zh-cn</language><lastBuildDate>Sun, 09 Aug 2009 17:08:31 GMT</lastBuildDate><pubDate>Sun, 09 Aug 2009 17:08:31 GMT</pubDate><ttl>60</ttl><item><title>当前几个主要的Lucene中文分词器的比较【转载】</title><link>http://www.blogjava.net/rain1102/archive/2009/08/09/290409.html</link><dc:creator>Eric.Zhou</dc:creator><author>Eric.Zhou</author><pubDate>Sun, 09 Aug 2009 02:15:00 GMT</pubDate><guid>http://www.blogjava.net/rain1102/archive/2009/08/09/290409.html</guid><wfw:comment>http://www.blogjava.net/rain1102/comments/290409.html</wfw:comment><comments>http://www.blogjava.net/rain1102/archive/2009/08/09/290409.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/rain1102/comments/commentRss/290409.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/rain1102/services/trackbacks/290409.html</trackback:ping><description><![CDATA[转载地址：http://www.javaeye.com/news/9637<br />
<p><strong>1. Overview:</strong></p>
<p><a href="http://code.google.com/p/paoding/" target="_blank">paoding</a>: Paoding Analysis (&#8220;庖丁解牛&#8221;), a Chinese analyzer for Lucene<br />
<a href="http://code.google.com/p/imdict-chinese-analyzer/" target="_blank">imdict</a>: the intelligent Chinese word segmenter used by the imdict smart dictionary<br />
<a href="http://code.google.com/p/mmseg4j/" target="_blank">mmseg4j</a>: a Chinese word segmenter that implements Chih-Hao Tsai's <a href="http://technology.chtsai.org/mmseg/" target="_blank">MMSeg algorithm</a><br />
<a href="http://code.google.com/p/ik-analyzer/" target="_blank">ik</a>: uses its own &#8220;forward-iterating, finest-granularity segmentation algorithm&#8221; with a multi-sub-processor analysis mode</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p><strong>2. Developers and development activity:</strong></p>
<p><a href="http://code.google.com/p/paoding/" target="_blank">paoding</a>: <a style="white-space: nowrap" href="http://code.google.com/u/qieqie.wang/" target="_blank">qieqie.wang</a>; last commit on Google Code: 2008-06-12, svn revision 132<br />
<a href="http://code.google.com/p/imdict-chinese-analyzer/" target="_blank">imdict</a>: <a href="http://code.google.com/p/imdict-chinese-analyzer/source/detail?r=2" target="_blank">XiaoPingGao</a>; accepted into Lucene contrib; last commit to contrib/analyzers/smartcn/ in Lucene trunk: 2009-07-24<br />
<a href="http://code.google.com/p/mmseg4j/" target="_blank">mmseg4j</a>: <a style="white-space: nowrap" href="http://code.google.com/u/chenlb2008/" target="_blank">chenlb2008</a>; last commit on Google Code: 2009-08-03 (the day before this comparison was written), revision 57, log message: created the mmseg4j-1.7 branch<br />
<a href="http://code.google.com/p/ik-analyzer/" target="_blank">ik</a>: <a style="white-space: nowrap" href="http://code.google.com/u/linliangyi2005/" target="_blank">linliangyi2005</a>; last commit on Google Code: 2009-07-31, revision 41</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
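<p><strong>3. User-defined dictionaries:</strong></p>
<p><a href="http://code.google.com/p/paoding/" target="_blank">paoding</a>: supports any number of user-defined dictionaries in plain text, one word per line. A background thread detects dictionary updates, automatically compiles updated dictionaries to a binary form, and loads them.<br />
<a href="http://code.google.com/p/imdict-chinese-analyzer/" target="_blank">imdict</a>: no user-defined dictionaries for now, although the original <a title="Chinese Academy of Sciences Chinese word segmentation system" href="http://ictclas.org/" target="_blank">ICTCLAS</a> supports them. User-defined stop words are supported.<br />
<a href="http://code.google.com/p/mmseg4j/" target="_blank">mmseg4j</a>: ships with the Sogou dictionary and supports user-defined dictionaries named wordsxxx.dic, UTF-8 text, one word per line. Updates are not detected automatically. The dictionary directory is set with -Dmmseg.dic.path.<br />
<a href="http://code.google.com/p/ik-analyzer/" target="_blank">ik</a>: supports loading user dictionaries through the API and naming dictionary files in the configuration; files are UTF-8 without a BOM, with entries separated by \r\n. Updates are not detected automatically.</p>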
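<p>The plain-text dictionary format described above (UTF-8, one word per line) is simple to read. The following is only a minimal sketch and is not taken from any of the four projects: the class <code>UserDictionary</code> and its methods are hypothetical names, and skipping blank lines and a leading BOM is an assumption about what a lenient loader might do. <code>changedSince</code> shows the kind of timestamp check a background reloader could poll.</p>

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of paoding, imdict, mmseg4j or ik.
class UserDictionary {

    // Reads one word per line; accepts both "\n" and "\r\n" line endings.
    static Set<String> parse(Reader source) {
        Set<String> words = new LinkedHashSet<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            boolean firstLine = true;
            String line;
            while ((line = reader.readLine()) != null) {
                if (firstLine && line.startsWith("\uFEFF")) {
                    line = line.substring(1); // drop a leading byte order mark
                }
                firstLine = false;
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word); // blank lines are skipped
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    // Loads a UTF-8 dictionary file from disk.
    static Set<String> load(Path file) {
        try {
            return parse(Files.newBufferedReader(file, StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // True when the file's modification time differs from the one recorded at load time.
    static boolean changedSince(Path file, long lastSeenMillis) {
        return file.toFile().lastModified() != lastSeenMillis;
    }

    public static void main(String[] args) {
        // Two Chinese words (written as Unicode escapes) and one Latin word.
        Set<String> words = parse(new StringReader("\u4E2D\u6587\r\n\u5206\u8BCD\n\nlucene\n"));
        System.out.println(words.size() + " words: " + words);
    }
}
```

<p>A reloader would remember the modification time seen at load time and call <code>load</code> again whenever <code>changedSince</code> reports a difference.</p>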
<p>&nbsp;</p>
<p>&nbsp;</p>
<p><strong>4. Speed (taken from each project's own documentation, not measured independently)</strong></p>
<p><a href="http://code.google.com/p/paoding/" target="_blank">paoding</a>: on a PIII PC with 1 GB of RAM, accurately segments <strong>1 million</strong> Chinese characters in <strong>1 second</strong><br />
<a href="http://code.google.com/p/imdict-chinese-analyzer/" target="_blank">imdict</a>: <strong>483.64</strong> (given as bytes/s, although KB/s would be more consistent with the character rate), <strong>259517</strong> (Chinese characters/s)<br />
<a href="http://code.google.com/p/mmseg4j/" target="_blank">mmseg4j</a>: about 1200 kb/s in complex mode, about 1900 kb/s in simple mode<br />
<a href="http://code.google.com/p/ik-analyzer/" target="_blank">ik</a>: claims a processing speed of 500,000 characters/s</p>
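<p>The figures above mix characters per second with bytes per second, so they are hard to compare directly. As a rough sketch, the rates can be put on one scale by assuming a fixed number of bytes per character. The value of 2 bytes per Chinese character (as in GBK) is an assumption made here, not something the projects state, and <code>Throughput</code> is a hypothetical helper.</p>

```java
// Hypothetical helper for comparing the published rates on one scale.
class Throughput {

    // Converts characters/s to KB/s for a fixed bytes-per-character encoding.
    static double kbPerSecond(long charsPerSecond, int bytesPerChar) {
        return charsPerSecond * (double) bytesPerChar / 1024.0;
    }

    public static void main(String[] args) {
        String[] names = {"paoding", "imdict", "ik"};
        long[] charsPerSecond = {1_000_000L, 259_517L, 500_000L}; // rates as published
        for (int i = 0; i < names.length; i++) {
            // Assumption: 2 bytes per character.
            System.out.printf("%-8s %.1f KB/s%n", names[i], kbPerSecond(charsPerSecond[i], 2));
        }
    }
}
```

<p>Under that assumption the imdict rate of 259517 characters/s corresponds to roughly 507 KB/s, which can be set beside the byte-based mmseg4j figures.</p>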
<p>&nbsp;</p>
<p>&nbsp;</p>
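<p><strong>5. Algorithm and code complexity</strong></p>
<p><a href="http://code.google.com/p/paoding/" target="_blank">paoding</a>: the svn src directory totals 1.3 MB, with 6 properties files and 48 Java files, 6895 lines. It uses different Knife implementations to cut different kinds of character streams; not very complex.<br />
<a href="http://code.google.com/p/imdict-chinese-analyzer/" target="_blank">imdict</a>: dictionary 6.7 MB (the dictionary is required), src directory 152 KB, 20 Java files, 2399 lines. It uses the HHMM hidden Markov model of <a title="Chinese Academy of Sciences Chinese word segmentation system" href="http://ictclas.org/" target="_blank">ICTCLAS</a>: &#8220;training on a large corpus yields word frequencies and transition probabilities for Chinese vocabulary, and these statistics are then used to compute the most likely segmentation of a whole Chinese sentence&#8221;<br />
<a href="http://code.google.com/p/mmseg4j/" target="_blank">mmseg4j</a>: the svn src directory totals 132 KB, with 23 Java files, 2089 lines. It implements the <a href="http://technology.chtsai.org/mmseg/" target="_blank">MMSeg algorithm</a>, which is somewhat complex.<br />
<a href="http://code.google.com/p/ik-analyzer/" target="_blank">ik</a>: the svn src directory totals 6.6 MB (dictionary files included), with 22 Java files, 4217 lines. It uses multi-sub-processor analysis, similar to paoding; I have not yet worked out how its ambiguity resolution works.</p>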
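<p>To make the dictionary-based approach concrete, here is a minimal sketch of forward maximum matching, the idea behind the &#8220;simple&#8221; mode of MMSeg. It is not code from mmseg4j or any other project above: the class name, the tiny dictionary and the <code>maxWordLength</code> parameter are assumptions chosen for illustration, and the &#8220;complex&#8221; mode's disambiguation rules are not covered.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: greedy forward maximum matching over a word set.
class ForwardMaxMatch {

    static List<String> segment(String text, Set<String> dictionary, int maxWordLength) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int end = Math.min(text.length(), pos + maxWordLength);
            // Shrink the candidate until it is a dictionary word or a single character.
            while (end > pos + 1 && !dictionary.contains(text.substring(pos, end))) {
                end--;
            }
            tokens.add(text.substring(pos, end));
            pos = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Hypothetical dictionary: 研究, 研究生, 生命, 起源 (written as Unicode escapes).
        Set<String> dictionary = new HashSet<>(Arrays.asList(
                "\u7814\u7A76", "\u7814\u7A76\u751F", "\u751F\u547D", "\u8D77\u6E90"));
        // Input: 研究生命起源
        String text = "\u7814\u7A76\u751F\u547D\u8D77\u6E90";
        System.out.println(segment(text, dictionary, 3));
    }
}
```

<p>On this input the greedy match takes the three-character word first and leaves a single character behind, which is the kind of ambiguity the more elaborate algorithms try to resolve.</p>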
<p>&nbsp;</p>
<p>&nbsp;</p>
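<p><strong>6. Documentation</strong></p>
<p><a href="http://code.google.com/p/paoding/" target="_blank">paoding</a>: almost none. The code has some comments, but the implementation is involved enough that reading it still takes effort.<br />
<a href="http://code.google.com/p/imdict-chinese-analyzer/" target="_blank">imdict</a>: almost none. <a title="Chinese Academy of Sciences Chinese word segmentation system" href="http://ictclas.org/" target="_blank">ICTCLAS</a> has no detailed documentation either, and the HHMM hidden Markov model is mathematically heavy and hard to follow.<br />
<a href="http://code.google.com/p/mmseg4j/" target="_blank">mmseg4j</a>: the <a href="http://technology.chtsai.org/mmseg/" target="_blank">MMSeg algorithm</a> description is in English, but the principle is fairly simple and the implementation is fairly clear.<br />
<a href="http://code.google.com/p/ik-analyzer/" target="_blank">ik</a>: has a PDF user manual with usage examples and configuration notes.</p>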
<p>&nbsp;</p>
<p>&nbsp;</p>
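<p><strong>7. Other notes</strong></p>
<p><a href="http://code.google.com/p/paoding/" target="_blank">paoding</a>: built around a metaphor, and reasonably well designed. search 1.0 used it. Its main advantage is native detection of dictionary updates; its main drawback is that the author no longer updates or even maintains it.<br />
<a href="http://code.google.com/p/imdict-chinese-analyzer/" target="_blank">imdict</a>: has been merged into Lucene trunk. The original ictclas has performed well in various evaluations and rests on a solid theoretical foundation; it is not a one-person knock-off. Its drawback is that user dictionaries are not supported for now.<br />
<a href="http://code.google.com/p/mmseg4j/" target="_blank">mmseg4j</a>: implements max-word segmentation on top of the complex mode, but this is not yet mature and leaves much room for improvement.<br />
<a href="http://code.google.com/p/ik-analyzer/" target="_blank">ik</a>: provides IKQueryParser, a query parser optimized for Lucene full-text search.</p>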
<p>&nbsp;</p>
<p>&nbsp;</p>
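<p><strong>8. Conclusion</strong></p>
<p>Personally, I would pick either mmseg4j or paoding. For a comparison of their segmentation results, see:</p>
<p><a title="Comparison of mmseg4j and paoding segmentation results" href="http://blog.chenlb.com/2009/04/mmseg4j-max-word-segment-compare-with-paoding-in-effect.html" target="_blank">http://blog.chenlb.com/2009/04/mmseg4j-max-word-segment-compare-with-paoding-in-effect.html</a></p>
<p>Alternatively, add a wrapper of your own: implement paoding's dictionary update detection as a separate module, and you can then switch seamlessly between all the dictionary-based segmentation algorithms.</p>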
<p>&nbsp;</p>
<p>&nbsp;</p>
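<p>The closing note of this post suggests that a field such as tags needs nothing more than whitespace splitting. A minimal sketch of that idea in plain Java follows; <code>TagTokenizer</code> is a hypothetical name and this is not a Lucene analyzer, only the splitting rule such an analyzer would apply.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: splits a tag string on runs of whitespace.
class TagTokenizer {

    static List<String> tokenize(String tags) {
        List<String> tokens = new ArrayList<>();
        if (tags == null) {
            return tokens;
        }
        for (String token : tags.trim().split("\\s+")) {
            if (!token.isEmpty()) { // "".split(...) yields one empty string
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize(" java  lucene\tsearch "));
    }
}
```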
<p><strong>P.S.</strong> Using a different analyzer for each field is worth considering. A tag field, for example, should use the simplest possible analyzer: splitting on whitespace is enough.</p><img src ="http://www.blogjava.net/rain1102/aggbug/290409.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/rain1102/" target="_blank">Eric.Zhou</a> 2009-08-09 10:15 <a href="http://www.blogjava.net/rain1102/archive/2009/08/09/290409.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>A first try at full-text search with Lucene</title><link>http://www.blogjava.net/rain1102/archive/2007/01/29/96436.html</link><dc:creator>Eric.Zhou</dc:creator><author>Eric.Zhou</author><pubDate>Mon, 29 Jan 2007 01:57:00 GMT</pubDate><guid>http://www.blogjava.net/rain1102/archive/2007/01/29/96436.html</guid><wfw:comment>http://www.blogjava.net/rain1102/comments/96436.html</wfw:comment><comments>http://www.blogjava.net/rain1102/archive/2007/01/29/96436.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/rain1102/comments/commentRss/96436.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/rain1102/services/trackbacks/96436.html</trackback:ping><description><![CDATA[<p>
				<strong>
						<font color="#006400">HTML parser</font>
				</strong>
				<br />
				<strong>
						<font color="#000000">package com.rain.util;</font>
				</strong>
		</p>
		<p>
				<strong>
						<font color="#000000">import java.io.FileInputStream;<br />import java.io.FileNotFoundException;<br />import java.io.IOException;<br />import java.io.InputStream;<br />import java.io.InputStreamReader;<br />import java.io.Reader;<br />import java.io.UnsupportedEncodingException;</font>
				</strong>
		</p>
		<p>
				<strong>
						<font color="#000000">import org.apache.lucene.demo.html.HTMLParser;</font>
				</strong>
		</p>
		<p>
				<strong>
						<font color="#000000">public class HTMLDocParser {</font>
				</strong>
		</p>
		<p>
				<font color="#000000">
						<strong>&#160;private String htmlPath;<br />&#160;private HTMLParser htmlParser;<br />&#160;<br />&#160;public HTMLDocParser(String htmlPath){<br />&#160;&#160;this.htmlPath=htmlPath;<br />&#160;&#160;initHtmlParser();<br />&#160;}<br />&#160;public void initHtmlParser(){<br />&#160;&#160;InputStream inputStream=null;<br />&#160;&#160;try{<br />&#160;&#160;&#160;inputStream=new FileInputStream(htmlPath);<br />&#160;&#160;}catch(FileNotFoundException e){<br />&#160;&#160;&#160;e.printStackTrace();<br />&#160;&#160;}<br />&#160;&#160;if(null!=inputStream){<br />&#160;&#160;&#160;try{<br />&#160;&#160;&#160;&#160;htmlParser=new HTMLParser(new InputStreamReader(inputStream,"utf-8"));<br />&#160;&#160;&#160;}catch(UnsupportedEncodingException e){<br />&#160;&#160;&#160;&#160;e.printStackTrace();<br />&#160;&#160;&#160;}<br />&#160;&#160;}<br />&#160;}<br />&#160;public String getTitle(){<br />&#160;&#160;if(null!=htmlParser){<br />&#160;&#160;&#160;try{<br />&#160;&#160;&#160;&#160;return htmlParser.getTitle();<br />&#160;&#160;&#160;}catch(IOException e){<br />&#160;&#160;&#160;&#160;e.printStackTrace();<br />&#160;&#160;&#160;}catch(InterruptedException e){<br />&#160;&#160;&#160;&#160;e.printStackTrace();<br />&#160;&#160;&#160;}<br />&#160;&#160;}<br />&#160;&#160;return "";<br />&#160;}<br />&#160;public Reader getContent(){<br />&#160;&#160;if(null!=htmlParser){<br />&#160;&#160;&#160;try{<br />&#160;&#160;&#160;&#160;return htmlParser.getReader();<br />&#160;&#160;&#160;}catch(IOException e){<br />&#160;&#160;&#160;&#160;e.printStackTrace();<br />&#160;&#160;&#160;}<br />&#160;&#160;}<br />&#160;&#160;return null;<br />&#160;}<br />&#160;public String getPath(){<br />&#160;&#160;return this.htmlPath;<br />&#160;}<br />}<br /></strong>
				</font>
		</p>
		<p>
		</p>
		<hr />
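		<p>The class above delegates the actual parsing to <code>HTMLParser</code> from the Lucene demo package. To show what a title lookup amounts to without that dependency, here is a rough stand-alone sketch. It is not how <code>HTMLParser</code> is implemented: <code>TitleExtractor</code> is a hypothetical name, and a regular expression is only an approximation that will fail on unusual markup.</p>

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: pulls the text of the first <title> element out of an HTML string.
class TitleExtractor {

    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the title with whitespace collapsed, or "" when there is none.
    static String extractTitle(String html) {
        if (html == null) {
            return "";
        }
        Matcher matcher = TITLE.matcher(html);
        if (!matcher.find()) {
            return "";
        }
        return matcher.group(1).trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String html = "<html><head><TITLE>\n Hello  Lucene </TITLE></head><body></body></html>";
        System.out.println(extractTitle(html));
    }
}
```

		<p>Like <code>getTitle()</code> above, the sketch returns an empty string rather than null when no title is available, so callers can index the value without a null check.</p>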
		<p>
		</p>
		<p>
				<font style="BACKGROUND-COLOR: #ffffff" color="#006400">Entity bean describing a search result<br /><font color="#000000">package com.rain.search;</font></font>
		</p>
		<p>
				<font style="BACKGROUND-COLOR: #ffffff" color="#000000">public class SearchResultBean {<br />&#160;&#160;&#160; private String htmlPath;<br />&#160;&#160;&#160; <br />&#160;&#160;&#160; private String htmlTitle;</font>
		</p>
		<p>
				<font style="BACKGROUND-COLOR: #ffffff" color="#000000">&#160;public String getHtmlPath() {<br />&#160;&#160;return htmlPath;<br />&#160;}</font>
		</p>
		<p>
				<font style="BACKGROUND-COLOR: #ffffff" color="#000000">&#160;public void setHtmlPath(String htmlPath) {<br />&#160;&#160;this.htmlPath = htmlPath;<br />&#160;}</font>
		</p>
		<p>
				<font style="BACKGROUND-COLOR: #ffffff" color="#000000">&#160;public String getHtmlTitle() {<br />&#160;&#160;return htmlTitle;<br />&#160;}</font>
		</p>
		<p>
				<font style="BACKGROUND-COLOR: #ffffff" color="#006400">
						<font color="#000000">&#160;public void setHtmlTitle(String htmlTitle) {<br />&#160;&#160;this.htmlTitle = htmlTitle;<br />&#160;}<br />}</font>
						<br />
				</font>
		</p>
		<p>
		</p>
		<hr />
		<p>
		</p>
		<p>
				<font color="#000000">
						<font color="#006400">Implementation of the indexing subsystem</font>
						<br />
						<br />package com.rain.index;</font>
		</p>
		<p>
				<font color="#000000">import java.io.File;<br />import java.io.IOException;<br />import java.io.Reader;</font>
		</p>
		<p>
				<font color="#000000">import org.apache.lucene.analysis.Analyzer;<br />import org.apache.lucene.analysis.standard.StandardAnalyzer;<br />import org.apache.lucene.document.Document;<br />import org.apache.lucene.index.IndexWriter;<br />import org.apache.lucene.store.Directory;<br />import org.apache.lucene.store.FSDirectory;<br />import org.apache.lucene.document.Field;</font>
		</p>
		<p>
				<font color="#000000">import com.rain.util.HTMLDocParser;</font>
		</p>
		<p>
				<font color="#000000">public class IndexManager {<br />&#160;<br />&#160;//the directory that stores HTML files<br />&#160;private final String dataDir="E:\\dataDir";<br />&#160;<br />&#160;//the directory that is used to store a Lucene index<br />&#160;private final String indexDir="E:\\indexDir";<br />&#160;<br />&#160;public boolean creatIndex()throws IOException{<br />&#160;&#160;//nothing to do if an index already exists<br />&#160;&#160;if(inIndexExist()){<br />&#160;&#160;&#160;return true;<br />&#160;&#160;}<br />&#160;&#160;File dir=new File(dataDir);<br />&#160;&#160;File[] htmls=dir.listFiles();<br />&#160;&#160;//listFiles() returns null when dataDir is missing or is not a directory<br />&#160;&#160;if(null==htmls){<br />&#160;&#160;&#160;return false;<br />&#160;&#160;}<br />&#160;&#160;Directory fsDirectory=FSDirectory.getDirectory(indexDir,true);<br />&#160;&#160;Analyzer analyzer=new StandardAnalyzer();<br />&#160;&#160;IndexWriter indexWriter=new IndexWriter(fsDirectory,analyzer,true);<br />&#160;&#160;for(int i=0;i&lt;htmls.length;i++){<br />&#160;&#160;&#160;String htmlPath=htmls[i].getAbsolutePath();<br />&#160;&#160;&#160;//match the extension including the dot<br />&#160;&#160;&#160;if(htmlPath.endsWith(".html")||htmlPath.endsWith(".htm")){<br />&#160;&#160;&#160;&#160;addDocument(htmlPath,indexWriter);<br />&#160;&#160;&#160;}<br />&#160;&#160;}<br />&#160;&#160;indexWriter.optimize();<br />&#160;&#160;indexWriter.close();<br />&#160;&#160;return true;<br />&#160;}<br />&#160;<br />&#160;public void addDocument(String htmlPath,IndexWriter indexWriter){<br />&#160;&#160;HTMLDocParser htmlParser=new HTMLDocParser(htmlPath);<br />&#160;&#160;String path=htmlParser.getPath();<br />&#160;&#160;String title=htmlParser.getTitle();<br />&#160;&#160;Reader content=htmlParser.getContent();<br />&#160;&#160;<br />&#160;&#160;Document document=new Document();<br />&#160;&#160;//path is stored but not searchable; title is stored and tokenized<br />&#160;&#160;document.add(new Field("path",path,Field.Store.YES,Field.Index.NO));<br />&#160;&#160;document.add(new Field("title",title,Field.Store.YES,Field.Index.TOKENIZED));<br />&#160;&#160;//getContent() returns null when the file could not be parsed<br />&#160;&#160;if(null!=content){<br />&#160;&#160;&#160;document.add(new Field("content",content));<br />&#160;&#160;}<br />&#160;&#160;try{<br />&#160;&#160;&#160;indexWriter.addDocument(document);<br />&#160;&#160;}catch(IOException e){<br />&#160;&#160;&#160;e.printStackTrace();<br />&#160;&#160;}<br />&#160;}<br />&#160;public String getDataDir(){<br />&#160;&#160;return this.dataDir;<br />&#160;}<br />&#160;<br />&#160;public String getIndexDir(){<br />&#160;&#160;return this.indexDir;<br />&#160;}<br />&#160;<br />&#160;public boolean inIndexExist(){<br />&#160;&#160;File directory=new File(indexDir);<br />&#160;&#160;//listFiles() returns null when indexDir does not exist yet<br />&#160;&#160;File[] files=directory.listFiles();<br />&#160;&#160;return null!=files&amp;&amp;0&lt;files.length;<br />&#160;}<br />}<br /></font>
		</p>
		<p>
		</p>
		<hr />
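		<p>Two details deserve care when scanning directories in code like this: an extension test should include the dot, and <code>File.listFiles()</code> returns null when the path is not a directory. The sketch below shows both checks using only the standard library; <code>IndexFiles</code> and its method names are hypothetical, and matching extensions case-insensitively is an added assumption.</p>

```java
import java.io.File;
import java.util.Locale;

// Illustrative only: defensive versions of the file checks used by an indexer.
class IndexFiles {

    // True for names ending in ".html" or ".htm", ignoring case.
    static boolean isHtmlFile(String path) {
        String lower = path.toLowerCase(Locale.ROOT);
        return lower.endsWith(".html") || lower.endsWith(".htm");
    }

    // True when the directory exists and contains at least one entry.
    static boolean hasFiles(File directory) {
        File[] entries = directory.listFiles(); // null if not a directory or on I/O error
        return entries != null && entries.length > 0;
    }

    public static void main(String[] args) {
        System.out.println(isHtmlFile("index.HTML"));
        System.out.println(isHtmlFile("readme_htm"));
        System.out.println(hasFiles(new File("no-such-directory-for-this-demo")));
    }
}
```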
		<p>
		</p>
		<p>Implementation of the search function<br /><font color="#000000">package com.rain.search;</font></p>
		<p>
				<font color="#000000">import java.io.IOException;<br />import java.util.ArrayList;<br />import java.util.List;</font>
		</p>
		<p>
				<font color="#000000">import org.apache.lucene.analysis.Analyzer;<br />import org.apache.lucene.analysis.standard.StandardAnalyzer;<br />import org.apache.lucene.queryParser.ParseException;<br />import org.apache.lucene.queryParser.QueryParser;<br />import org.apache.lucene.search.Hits;<br />import org.apache.lucene.search.IndexSearcher;<br />import org.apache.lucene.search.Query;</font>
		</p>
		<p>
				<font color="#000000">import com.rain.index.IndexManager;</font>
		</p>
		<p>
				<font color="#000000">public class SearchManager {<br />&#160;private String searchWord;<br />&#160;private IndexManager indexManager;<br />&#160;private Analyzer analyzer;<br />&#160;<br />&#160;public SearchManager(String searchWord){<br />&#160;&#160;this.searchWord=searchWord;<br />&#160;&#160;this.indexManager=new IndexManager();<br />&#160;&#160;this.analyzer=new StandardAnalyzer();<br />&#160;}<br />&#160;<br />&#160;/**<br />&#160;&#160;&#160;&#160; * do search<br />&#160;&#160;&#160;&#160; */<br />&#160;public List search(){<br />&#160;&#160;List searchResult=new ArrayList();<br />&#160;&#160;if(false==indexManager.inIndexExist()){<br />&#160;&#160;&#160;try{<br />&#160;&#160;&#160;&#160;if(false==indexManager.creatIndex()){<br />&#160;&#160;&#160;&#160;&#160;return searchResult;<br />&#160;&#160;&#160;&#160;}<br />&#160;&#160;&#160;}catch(IOException e){<br />&#160;&#160;&#160;&#160;e.printStackTrace();<br />&#160;&#160;&#160;&#160;return searchResult;<br />&#160;&#160;&#160;}<br />&#160;&#160;}<br />&#160;&#160;IndexSearcher indexSearcher=null;<br />&#160;&#160;try{<br />&#160;&#160;&#160;indexSearcher=new IndexSearcher(indexManager.getIndexDir());<br />&#160;&#160;}catch(IOException e){<br />&#160;&#160;&#160;e.printStackTrace();<br />&#160;&#160;}<br />&#160;&#160;QueryParser queryParser=new QueryParser("content",analyzer);<br />&#160;&#160;Query query=null;<br />&#160;&#160;try{<br />&#160;&#160;&#160;query=queryParser.parse(searchWord);<br />&#160;&#160;}catch(ParseException e){<br />&#160;&#160;&#160;e.printStackTrace();<br />&#160;&#160;}<br />&#160;&#160;if(null!=query&amp;&amp;null!=indexSearcher){<br />&#160;&#160;&#160;try{<br />&#160;&#160;&#160;&#160;Hits hits=indexSearcher.search(query);<br />&#160;&#160;&#160;&#160;for(int i=0;i&lt;hits.length();i++){<br />&#160;&#160;&#160;&#160;&#160;SearchResultBean resultBean=new SearchResultBean();<br />&#160;&#160;&#160;&#160;&#160;resultBean.setHtmlPath(hits.doc(i).get("path"));<br 
/>&#160;&#160;&#160;&#160;&#160;resultBean.setHtmlTitle(hits.doc(i).get("title"));<br />&#160;&#160;&#160;&#160;&#160;searchResult.add(resultBean);<br />&#160;&#160;&#160;&#160;}<br />&#160;&#160;&#160;}catch(IOException e){<br />&#160;&#160;&#160;&#160;e.printStackTrace();<br />&#160;&#160;&#160;}<br />&#160;&#160;}<br />&#160;&#160; return searchResult;<br />&#160;}</font>
		</p>
		<p>
				<font color="#000000">}<br /><p></p><hr /></font>
		</p>
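		<p>A searcher keeps index files open, so it is worth closing it once the results have been copied into beans. The sketch below shows only the shape of that pattern. <code>FakeSearcher</code> is a stand-in written for this example and is not a Lucene class; with a real searcher the same try block would wrap the search call.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: copy results out, then release the searcher.
class SearcherLifecycle {

    // Stand-in for an index searcher; NOT a Lucene type.
    static class FakeSearcher implements AutoCloseable {
        boolean closed = false;

        List<String> search(String term) {
            if (closed) {
                throw new IllegalStateException("searcher is closed");
            }
            List<String> hits = new ArrayList<>();
            hits.add("doc-for-" + term);
            return hits;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    // Runs the search and closes the searcher even if search() throws.
    static List<String> searchAndClose(FakeSearcher searcher, String term) {
        try (FakeSearcher s = searcher) {
            return new ArrayList<>(s.search(term)); // copy before the searcher is closed
        }
    }

    public static void main(String[] args) {
        FakeSearcher searcher = new FakeSearcher();
        System.out.println(searchAndClose(searcher, "lucene") + " closed=" + searcher.closed);
    }
}
```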
		<p>
				<font color="#006400">Implementation of the request controller</font>
				<br />
				<br />
				<font color="#000000">package com.rain.servlet;</font>
		</p>
		<p>
				<font color="#000000">import java.io.IOException;<br />import java.util.List;</font>
		</p>
		<p>
				<font color="#000000">import javax.servlet.RequestDispatcher;<br />import javax.servlet.ServletException;<br />import javax.servlet.http.HttpServlet;<br />import javax.servlet.http.HttpServletRequest;<br />import javax.servlet.http.HttpServletResponse;</font>
		</p>
		<p>
				<font color="#000000">import com.rain.search.SearchManager;</font>
		</p>
		<p>
				<font color="#000000">/**<br />&#160;* @author zhourui<br />&#160;* 2007-1-28<br />&#160;*/<br />public class SearchController extends HttpServlet {<br />&#160;private static final long serialVersionUID=1L;<br />&#160;<br />&#160;/* (non-Javadoc)<br />&#160; * @see javax.servlet.http.HttpServlet#doPost(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse)<br />&#160; */<br />&#160;@Override<br />&#160;protected void doPost(HttpServletRequest arg0, HttpServletResponse arg1) throws ServletException, IOException {<br />&#160;&#160;// TODO Auto-generated method stub<br />&#160;&#160;String searchWord=arg0.getParameter("searchWord");<br />&#160;&#160;SearchManager searchManager=new SearchManager(searchWord);<br />&#160;&#160;List searchResult=null;<br />&#160;&#160;searchResult=searchManager.search();<br />&#160;&#160;RequestDispatcher dispatcher=arg0.getRequestDispatcher("search.jsp");<br />&#160;&#160;arg0.setAttribute("searchResult",searchResult);<br />&#160;&#160;&#160;&#160;&#160;&#160;&#160; dispatcher.forward(arg0, arg1);<br />&#160;}<br />&#160;<br />}</font>
				<br />
				<br />
		</p>
		<hr />
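		<p>A request parameter can be missing or blank, and anything read from the index and written into a page should be escaped. The two small helpers below sketch how a controller and a JSP like the ones in this post might guard both directions. <code>RequestText</code> and its methods are hypothetical names, not part of the servlet API or of Lucene.</p>

```java
// Illustrative only: input normalization and output escaping for a search page.
class RequestText {

    // Returns the trimmed search word, or null when there is nothing to search for.
    static String normalize(String raw) {
        if (raw == null) {
            return null;
        }
        String trimmed = raw.trim();
        return trimmed.isEmpty() ? null : trimmed;
    }

    // Escapes the characters that are significant in HTML text and attribute values.
    static String escapeHtml(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("  lucene  "));
        System.out.println(escapeHtml("<b>Tom & Jerry</b>"));
    }
}
```

		<p>A controller could skip the search entirely when <code>normalize</code> returns null, and a result page could pass each title and path through <code>escapeHtml</code> before writing it out.</p>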
		<br />
		<strong>Submitting a search request to the web server</strong>
		<br />
		<strong>&lt;form action="SearchController" method="post"&gt;<br />&#160;&#160;&#160;&#160;&#160; &lt;table&gt;<br />&#160;&#160;&#160;&#160;&#160;&#160;&#160; &lt;tr&gt;<br />&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &lt;td colspan="3"&gt;<br />&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; SearchWord:&lt;input type="text" name="searchWord" id="searchWord" size="40"&gt;<br />&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &lt;input id="doSearch" type="submit" value="search"&gt;<br />&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &lt;/td&gt;<br />&#160;&#160;&#160;&#160;&#160;&#160;&#160; &lt;/tr&gt;<br />&#160;&#160;&#160;&#160;&#160; &lt;/table&gt;<br />&#160;&#160;&#160; &lt;/form&gt;<br />Displaying the search results<br />&#160;&lt;table class="result"&gt;<br />&#160;&#160;&#160;&#160;&#160; &lt;%<br />&#160;&#160;&#160;&#160;&#160;&#160;&#160; List searchResult=(List)request.getAttribute("searchResult");<br />&#160;&#160;&#160;&#160;&#160;&#160;&#160; int resultCount=0;<br />&#160;&#160;&#160;&#160;&#160;&#160;&#160; if(null!=searchResult){<br />&#160;&#160;&#160;&#160;&#160;&#160;&#160; &#160;resultCount=searchResult.size();<br />&#160;&#160;&#160;&#160;&#160;&#160;&#160; }<br />&#160;&#160;&#160;&#160;&#160;&#160;&#160; for(int i=0;i&lt;resultCount;i++){<br />&#160;&#160;&#160;&#160;&#160;&#160;&#160; &#160;SearchResultBean resultBean=(SearchResultBean)searchResult.get(i);<br />&#160;&#160;&#160;&#160;&#160;&#160;&#160; &#160;String title=resultBean.getHtmlTitle();<br />&#160;&#160;&#160;&#160;&#160;&#160;&#160; &#160;String path=resultBean.getHtmlPath();<br />&#160;&#160;&#160;&#160;&#160;&#160;&#160; &#160;%&gt;<br />&#160;&#160;&#160;&#160;&#160;&#160;&#160; &#160;&lt;tr&gt;<br />&#160;&#160;&#160;&#160;&#160;&#160;&#160; &#160;&#160; &lt;td class="title"&gt;&lt;h3&gt;&lt;a href="&lt;%=path%&gt;"&gt;&lt;%=title%&gt;&lt;/a&gt;&lt;/h3&gt;&lt;/td&gt;<br />&#160;&#160;&#160;&#160;&#160;&#160;&#160; &#160;&lt;/tr&gt;<br 
/>&#160;&#160;&#160;&#160;&#160;&#160;&#160; &#160;&lt;%<br />&#160;&#160;&#160;&#160;&#160;&#160;&#160; }<br />&#160;&#160;&#160;&#160;&#160; %&gt;<br />&#160;&#160;&#160; &lt;/table&gt;</strong><img src ="http://www.blogjava.net/rain1102/aggbug/96436.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/rain1102/" target="_blank">Eric.Zhou</a> 2007-01-29 09:57 <a href="http://www.blogjava.net/rain1102/archive/2007/01/29/96436.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>An introduction to basic Lucene usage</title><link>http://www.blogjava.net/rain1102/archive/2007/01/28/96356.html</link><dc:creator>Eric.Zhou</dc:creator><author>Eric.Zhou</author><pubDate>Sun, 28 Jan 2007 02:38:00 GMT</pubDate><guid>http://www.blogjava.net/rain1102/archive/2007/01/28/96356.html</guid><wfw:comment>http://www.blogjava.net/rain1102/comments/96356.html</wfw:comment><comments>http://www.blogjava.net/rain1102/archive/2007/01/28/96356.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/rain1102/comments/commentRss/96356.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/rain1102/services/trackbacks/96356.html</trackback:ping><description><![CDATA[<p>1.&#160; Overview</p>
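		<p>As a system accumulates more and more information, fishing the one needle you want out of that ocean becomes very important. Full-text search is the usual answer to this kind of problem, and Lucene is a tool for implementing it: any application can embed Lucene to gain full-text search.</p>
		<p>2.&#160; Setting up the environment</p>
		<p>Download the latest lucene.jar from lucene.apache.org and add it to the project's build path; Lucene can then be used directly in the project.</p>
		<p>3.&#160; Usage</p>
		<p>3.1.&#160;&#160;&#160;&#160;&#160;&#160; Basic concepts</p>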
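		<p>The concepts introduced here are the ones you meet most often in practice, explained by analogy with databases, which most readers know well. Full-text search with Lucene resembles the database workflow: table &#8594; query on fields or conditions &#8594; matching records returned. First comes IndexWriter, which builds the index, the counterpart of a database table. When building the index you must specify how it is built, that is, how the fields of each record are split into terms; Lucene calls this an Analyzer. Lucene provides Analyzers for several settings, including SimpleAnalyzer, StandardAnalyzer and GermanAnalyzer. StandardAnalyzer is the one most often used because it supports Chinese. Once the index exists, the records to be indexed are inserted into it; Lucene calls such a record a Document, roughly a row in a database table. The fields added to a record are called Fields, much as in a database. Lucene classifies a Field by combinations of indexed or not indexed and tokenized or not tokenized. These elements are essentially all you need to build an index.</p>
		<p>Querying involves a few more concepts. The first is Query. Lucene provides several commonly used Query types: TermQuery, MultiTermQuery, BooleanQuery, WildcardQuery, PhraseQuery, PrefixQuery, PhrasePrefixQuery, FuzzyQuery, RangeQuery and SpanQuery. A Query specifies how a field is to be searched: fuzzy, semantic, phrase, range or combined queries, and so on. QueryParser creates the different kinds of Query, and MultiFieldQueryParser searches several fields for the same keyword. IndexSearcher determines which index directory is searched and how the search is analyzed, rather like choosing which table to query and how its fields are broken down. With an IndexSearcher and a Query you can retrieve the results you need, which Lucene returns as Hits. Iterating over the Hits yields each matching Document, and from a Document you can read the values of its Fields.<br /></p>
		<p>Comparing Lucene with a database:</p>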
		<p>
		</p>
		<table width="100%" border="1">
				<tbody>
						<tr>
								<td align="middle" width="50%">Lucene</td>
								<td align="middle" width="50%">Database</td>
						</tr>
						<tr>
								<td width="50%">
										<pre>Index data source: doc(field1,field2...) doc(field1,field2...)<br />                  \  indexer /<br />                 _____________<br />                | Lucene Index|<br />                --------------<br />                 / searcher \<br /> Result output: Hits(doc(field1,field2) doc(field1...))</pre>
								</td>
								<td width="50%">
										<pre> Index data source: record(field1,field2...) record(field1..)<br />              \  SQL: insert/<br />               _____________<br />              | DB  Index   |<br />               -------------<br />              / SQL: select \<br />Result output: results(record(field1,field2..) record(field1...))</pre>
								</td>
						</tr>
						<tr>
								<td width="50%">Document: a &#8220;unit&#8221; to be indexed<br />A Document consists of multiple fields</td>
								<td width="50%">Record: a row, containing multiple fields</td>
						</tr>
						<tr>
								<td width="50%">Field: a field</td>
								<td width="50%">Field: a column</td>
						</tr>
						<tr>
								<td width="50%">Hits: the query result set, made up of the matching Documents</td>
								<td width="50%">RecordSet: the query result set, made up of multiple Records</td>
						</tr>
				</tbody>
		</table>
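		<p>The analogy in the table can be made concrete with a toy in-memory index. This is a sketch for intuition only and uses no Lucene classes: <code>TinyIndex</code> is a hypothetical name, a document is a plain map of field names to text, and tokenization is a naive split on non-word characters, so it handles only ASCII-style text.</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: documents go in, an inverted index maps terms to documents,
// and a search returns the matching documents (the role Hits plays in Lucene).
class TinyIndex {

    private final List<Map<String, String>> documents = new ArrayList<>();
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Stores the document and indexes the terms of one of its fields; returns the document id.
    int add(Map<String, String> document, String indexedField) {
        int id = documents.size();
        documents.add(document);
        String text = document.get(indexedField);
        if (text != null) {
            for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!term.isEmpty()) {
                    postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
                }
            }
        }
        return id;
    }

    // Returns the stored documents whose indexed field contains the term.
    List<Map<String, String>> search(String term) {
        List<Map<String, String>> hits = new ArrayList<>();
        Set<Integer> ids = postings.get(term.toLowerCase(Locale.ROOT));
        if (ids != null) {
            for (int id : ids) {
                hits.add(documents.get(id));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        Map<String, String> doc = new HashMap<>();
        doc.put("path", "/a.html");
        doc.put("content", "Lucene full-text search");
        index.add(doc, "content");
        for (Map<String, String> hit : index.search("lucene")) {
            System.out.println(hit.get("path"));
        }
    }
}
```

		<p>Here the stored <code>path</code> plays the role of a stored but unindexed field, while <code>content</code> is indexed and searchable.</p>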
		<br />通过对于上面在建立索引和全文检索的基本概念的介绍希望能让你对Lucene建立一定的了解。<br /><p>需要熟悉几个接口：<br /><font color="#006400">分析器Analyzer</font><span class="oblog_text"><br />&#160;&#160;&#160;&#160;&#160;&#160;&#160; 分析器主要工作是筛选，一段文档进来以后，经过它，出去的时候只剩下那些有用的部分，其他则剔除。而这个分析器也可以自己根据需要而编写。<br />&#160;&#160;&#160;&#160;&#160;&#160;&#160; org.apache.lucene.analysis.Analyzer：这是一个虚构类，以下两个借口均继承它而来。</span><span class="oblog_text"><br />&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;org.apache.lucene.analysis.SimpleAnalyzer：分析器，支持最简单拉丁语言。<br /></span><span class="oblog_text">&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;org.apache.lucene.analysis.standard.StandardAnalyzer：标准分析器，除了拉丁语言还支持亚洲语言，并在一些匹配功能上进行完善。在这个接口中还有一个很重要的构造函数：StandardAnalyzer(String[] stopWords)，可以对分析器定义一些使用词语，这不仅可以免除检索一些无用信息，而且还可以在检索中定义禁止的政治性、非法性的检索关键词。</span><br /><font color="#006400">IndexWriter</font><span class="oblog_text"><br />&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;IndexWriter的构造函数有三种接口，针对目录Directory、文件File、文件路径String三种情况。<br />例如IndexWriter(String path, Analyzer a, boolean create)，path为文件路径，a为分析器，create标志是否重建索引（true：建立或者覆盖已存在的索引，false：扩展已存在的索引。）<br />&#160;&#160;&#160;&#160;&#160;&#160; 一些重要的方法：</span></p><p></p><table style="BORDER-RIGHT: medium none; BORDER-TOP: medium none; BORDER-LEFT: medium none; BORDER-BOTTOM: medium none; BORDER-COLLAPSE: collapse; mso-border-alt: solid windowtext .5pt; mso-padding-alt: 0cm 5.4pt 0cm 5.4pt" cellspacing="0" cellpadding="0" border="1"><tbody><tr><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: windowtext 0.5pt solid; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: windowtext 0.5pt solid; WIDTH: 167.4pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent" valign="top" width="223"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span class="oblog_text">接口名<?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" /?><o:p></o:p></span></p></td><td style="BORDER-RIGHT: windowtext 0.5pt solid; 
PADDING-RIGHT: 5.4pt; BORDER-TOP: windowtext 0.5pt solid; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 243pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-left-alt: solid windowtext .5pt" valign="top" width="324"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span class="oblog_text">备注<o:p></o:p></span></p></td></tr><tr><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: windowtext 0.5pt solid; WIDTH: 167.4pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-top-alt: solid windowtext .5pt" valign="top" width="223"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span class="oblog_text">addDocument(Document doc)<o:p></o:p></span></p></td><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 243pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt" valign="top" width="324"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span class="oblog_text">索引添加一个文档<o:p></o:p></span></p></td></tr><tr><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: windowtext 0.5pt solid; WIDTH: 167.4pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-top-alt: solid windowtext .5pt" valign="top" width="223"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span class="oblog_text">addIndexes(Directory[] dirs)<o:p></o:p></span></p></td><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 
243pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt" valign="top" width="324"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span class="oblog_text">将目录中已存在索引添加到这个索引<o:p></o:p></span></p></td></tr><tr><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: windowtext 0.5pt solid; WIDTH: 167.4pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-top-alt: solid windowtext .5pt" valign="top" width="223"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span class="oblog_text">addIndexes(IndexReader[] readers)<o:p></o:p></span></p></td><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 243pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt" valign="top" width="324"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span class="oblog_text">将提供的索引添加到这个索引<o:p></o:p></span></p></td></tr><tr><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: windowtext 0.5pt solid; WIDTH: 167.4pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-top-alt: solid windowtext .5pt" valign="top" width="223"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span class="oblog_text">optimize()<o:p></o:p></span></p></td><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 243pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; 
BACKGROUND-COLOR: transparent; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt" valign="top" width="324"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span class="oblog_text">Merges the index segments and optimizes the index<o:p></o:p></span></p></td></tr><tr><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: windowtext 0.5pt solid; WIDTH: 167.4pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-top-alt: solid windowtext .5pt" valign="top" width="223"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span class="oblog_text">close()<o:p></o:p></span></p></td><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 243pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt" valign="top" width="324"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span class="oblog_text">Closes the writer<o:p></o:p></span></p></td></tr></tbody></table><span class="oblog_text"><br />&#160;&#160;&#160;&#160;&#160;&#160; To cut down on I/O maintenance, IndexWriter writes a new small index file each time it has buffered a certain number of documents (in the author's tests the smallest batch was 10), and then periodically merges these small files into a single index file. For that reason, writer.optimize() should be called once indexing is finished, so that all the segments are merged and optimized.<br /></span><font color="#006400">org.apache.lucene.document</font><span class="oblog_text"><br />&#160;The two main classes are introduced below:<br />&#160;a) org.apache.lucene.document.Document:<br />&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;A Document is similar to a record in a database: it is made up of several fields (Field), and each field can be of a different type (see b for details). The main Document methods:&#160;</span><p></p><table style="BORDER-RIGHT: medium none; BORDER-TOP: medium none; BORDER-LEFT: medium none; BORDER-BOTTOM: medium none; BORDER-COLLAPSE: collapse; mso-border-alt: solid windowtext .5pt; mso-padding-alt: 0cm 5.4pt 0cm 5.4pt" cellspacing="0" cellpadding="0" border="1"><tbody><tr><td 
style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: windowtext 0.5pt solid; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: windowtext 0.5pt solid; WIDTH: 167.4pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent" valign="top" width="223"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span class="oblog_text">Method<o:p></o:p></span></p></td><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: windowtext 0.5pt solid; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 243pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-left-alt: solid windowtext .5pt" valign="top" width="324"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span class="oblog_text">Description<o:p></o:p></span></p></td></tr><tr><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: windowtext 0.5pt solid; WIDTH: 167.4pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-top-alt: solid windowtext .5pt" valign="top" width="223"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span class="oblog_text">add(Field field)<o:p></o:p></span></p></td><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 243pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt" valign="top" width="324"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span class="oblog_text">Adds a field (Field) to the Document<o:p></o:p></span></p></td></tr><tr><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: windowtext 0.5pt 
solid; WIDTH: 167.4pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-top-alt: solid windowtext .5pt" valign="top" width="223"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span class="oblog_text">String get(String name)<o:p></o:p></span></p></td><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 243pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt" valign="top" width="324"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-INDENT: 5.25pt; mso-char-indent-count: .5; mso-char-indent-size: 10.5pt"><span class="oblog_text">Returns the text value of the named field in the document<o:p></o:p></span></p></td></tr><tr><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: windowtext 0.5pt solid; WIDTH: 167.4pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-top-alt: solid windowtext .5pt" valign="top" width="223"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span class="oblog_text">Field getField(String name)<o:p></o:p></span></p></td><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 243pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt" valign="top" width="324"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span class="oblog_text">Returns the Field with the given name<o:p></o:p></span></p></td></tr><tr><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: windowtext 0.5pt 
solid; WIDTH: 167.4pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-top-alt: solid windowtext .5pt" valign="top" width="223"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span class="oblog_text">Field[] getFields(String name)<o:p></o:p></span></p></td><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 243pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt" valign="top" width="324"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span class="oblog_text">Returns all the Fields with the given name<o:p></o:p></span></p></td></tr></tbody></table><span class="oblog_text"><br />&#160;b) org.apache.lucene.document.Field<br />&#160;&#160;&#160;&#160;&#160;&#160;&#160; This is the &#8220;field&#8221; mentioned above: a section of a Document.<br />&#160;&#160;&#160;&#160;&#160;&#160;&#160; The Field constructor:<br />&#160;&#160;&#160;&#160;&#160;&#160; Field(String name, String string, boolean store, boolean index, boolean token).<br />&#160;&#160;&#160;&#160;&#160;&#160;&#160; Indexed: if a field is Indexed, it can be searched.<br />&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;Stored: if a field is Stored, its value can be retrieved from the search results.<br />&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;Tokenized: if a field is Tokenized, an Analyzer has converted it into a sequence of tokens. During this conversion (tokenization), the Analyzer extracts the text to be indexed and discards redundant words (for example a, the, they; see the API of org.apache.lucene.analysis.StopAnalyzer.ENGLISH_STOP_WORDS and org.apache.lucene.analysis.standard.StandardAnalyzer(String[] stopWords)). A token is the basic unit of indexing and represents one indexed term, such as an English word or a Chinese character. Any text that contains Chinese must therefore be Tokenized.<br />&#160;&#160;&#160;&#160; The Field factory methods:</span><p></p><table style="BORDER-RIGHT: medium none; BORDER-TOP: medium none; BORDER-LEFT: medium none; BORDER-BOTTOM: medium none; BORDER-COLLAPSE: collapse; mso-border-alt: solid windowtext .5pt; mso-padding-alt: 0cm 5.4pt 0cm 
5.4pt; mso-table-layout-alt: fixed" cellspacing="0" cellpadding="0" border="1"><tbody><tr><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: windowtext 0.5pt solid; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: windowtext 0.5pt solid; WIDTH: 167.4pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent" valign="top" width="223"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span class="oblog_text">Name<o:p></o:p></span></p></td><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: windowtext 0.5pt solid; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 45pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-left-alt: solid windowtext .5pt" valign="top" width="60"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-INDENT: -0.15pt"><span class="oblog_text">Stored<o:p></o:p></span></p></td><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: windowtext 0.5pt solid; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 43.95pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-left-alt: solid windowtext .5pt" valign="top" width="59"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-INDENT: -6.4pt; mso-char-indent-count: -.61; mso-char-indent-size: 10.45pt"><span class="oblog_text">Indexed<o:p></o:p></span></p></td><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: windowtext 0.5pt solid; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 64.05pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-left-alt: solid windowtext .5pt" valign="top" width="85"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-INDENT: 0.05pt"><span class="oblog_text">Tokenized<o:p></o:p></span></p></td><td 
style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: windowtext 0.5pt solid; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 108pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-left-alt: solid windowtext .5pt" valign="top" width="144"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span class="oblog_text">use<o:p></o:p></span></p></td></tr><tr><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: windowtext 0.5pt solid; WIDTH: 167.4pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-top-alt: solid windowtext .5pt" valign="top" width="223"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span class="oblog_text">Keyword(String name,<o:p></o:p></span></p><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span class="oblog_text">&#160;&#160;&#160;&#160;&#160;&#160;&#160; String value)<o:p></o:p></span></p></td><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 45pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt" valign="top" width="60"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: center" align="center"><span class="oblog_text">Y<o:p></o:p></span></p></td><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 43.95pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt" valign="top" width="59"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; 
TEXT-ALIGN: center" align="center"><span class="oblog_text">Y<o:p></o:p></span></p></td><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 64.05pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt" valign="top" width="85"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: center" align="center"><span class="oblog_text">N<o:p></o:p></span></p></td><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 108pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt" valign="top" width="144"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span class="oblog_text">date,url<o:p></o:p></span></p></td></tr><tr><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: windowtext 0.5pt solid; WIDTH: 167.4pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-top-alt: solid windowtext .5pt" valign="top" width="223"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span class="oblog_text">Text(String name, Reader value)<o:p></o:p></span></p></td><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 45pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt" valign="top" width="60"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: center" 
align="center"><span class="oblog_text">N<o:p></o:p></span></p></td><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 43.95pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt" valign="top" width="59"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: center" align="center"><span class="oblog_text">Y<o:p></o:p></span></p></td><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 64.05pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt" valign="top" width="85"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: center" align="center"><span class="oblog_text">Y<o:p></o:p></span></p></td><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 108pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt" valign="top" width="144"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span class="oblog_text">longer text fields,<o:p></o:p></span></p><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span class="oblog_text">like &#8220;body&#8221;<o:p></o:p></span></p></td></tr><tr><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: windowtext 0.5pt solid; WIDTH: 167.4pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-top-alt: solid 
windowtext .5pt" valign="top" width="223"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span class="oblog_text">Text(String name, String value)<o:p></o:p></span></p></td><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 45pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt" valign="top" width="60"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: center" align="center"><span class="oblog_text">Y<o:p></o:p></span></p></td><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 43.95pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt" valign="top" width="59"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: center" align="center"><span class="oblog_text">Y<o:p></o:p></span></p></td><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 64.05pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt" valign="top" width="85"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: center" align="center"><span class="oblog_text">Y<o:p></o:p></span></p></td><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 108pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-left-alt: solid windowtext .5pt; 
mso-border-top-alt: solid windowtext .5pt" valign="top" width="144"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span class="oblog_text">short text fields:<o:p></o:p></span></p><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span class="oblog_text">title, subject<o:p></o:p></span></p></td></tr><tr><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: windowtext 0.5pt solid; WIDTH: 167.4pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-top-alt: solid windowtext .5pt" valign="top" width="223"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span class="oblog_text">UnIndexed(String name,<o:p></o:p></span></p><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-INDENT: 52.5pt; mso-char-indent-count: 5.0; mso-char-indent-size: 10.5pt"><span class="oblog_text">String value)<o:p></o:p></span></p></td><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 45pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt" valign="top" width="60"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: center" align="center"><span class="oblog_text">Y<o:p></o:p></span></p></td><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 43.95pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt" valign="top" width="59"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: center" align="center"><span class="oblog_text">N<o:p></o:p></span></p></td><td 
style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 64.05pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt" valign="top" width="85"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: center" align="center"><span class="oblog_text">N<o:p></o:p></span></p></td><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 108pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt" valign="top" width="144"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span class="oblog_text">&#160;<o:p></o:p></span></p></td></tr><tr><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: windowtext 0.5pt solid; WIDTH: 167.4pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-top-alt: solid windowtext .5pt" valign="top" width="223"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span class="oblog_text">UnStored(String name,<o:p></o:p></span></p><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span class="oblog_text">&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; String value)<o:p></o:p></span></p></td><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 45pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt" valign="top" width="60"><p 
class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: center" align="center"><span class="oblog_text">N<o:p></o:p></span></p></td><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 43.95pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt" valign="top" width="59"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: center" align="center"><span class="oblog_text">Y<o:p></o:p></span></p></td><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 64.05pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt" valign="top" width="85"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: center" align="center"><span class="oblog_text">Y<o:p></o:p></span></p></td><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 108pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt" valign="top" width="144"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span class="oblog_text">&#160;<o:p></o:p></span></p></td></tr></tbody></table><font color="#006400">Hits与Searcher</font><span class="oblog_text"><br />&#160;&#160;&#160;&#160;&#160;&#160; Hits的主要使用接口：</span><p></p><table style="BORDER-RIGHT: medium none; BORDER-TOP: medium none; BORDER-LEFT: medium none; BORDER-BOTTOM: medium none; BORDER-COLLAPSE: collapse; mso-border-alt: solid windowtext .5pt; mso-padding-alt: 0cm 5.4pt 
0cm 5.4pt" cellspacing="0" cellpadding="0" border="1"><tbody><tr><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: windowtext 0.5pt solid; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: windowtext 0.5pt solid; WIDTH: 86.4pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent" valign="top" width="115"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span class="oblog_text">Method<o:p></o:p></span></p></td><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: windowtext 0.5pt solid; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 324pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-left-alt: solid windowtext .5pt" valign="top" width="432"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span class="oblog_text">Description<o:p></o:p></span></p></td></tr><tr><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: windowtext 0.5pt solid; WIDTH: 86.4pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-top-alt: solid windowtext .5pt" valign="top" width="115"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span class="oblog_text">doc(int n)<o:p></o:p></span></p></td><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 324pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt" valign="top" width="432"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span class="oblog_text">Returns the nth document, with all of its stored fields<o:p></o:p></span></p></td></tr><tr><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; 
PADDING-BOTTOM: 0cm; BORDER-LEFT: windowtext 0.5pt solid; WIDTH: 86.4pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-top-alt: solid windowtext .5pt" valign="top" width="115"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span class="oblog_text">length()<o:p></o:p></span></p></td><td style="BORDER-RIGHT: windowtext 0.5pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: #d4d0c8; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: #d4d0c8; WIDTH: 324pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt" valign="top" width="432"><p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span class="oblog_text">Returns the number of hits available in this set<o:p></o:p></span></p></td></tr></tbody></table><br />3.2.&#160;&#160;&#160;&#160;&#160;&#160; Implementing the full-text search requirements<p>Code for building the index:</p><p><br />private void createIndex(String indexFilePath) throws Exception{</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; IndexWriter iwriter=getWriter(indexFilePath);</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; Document doc=new Document();</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; doc.add(Field.Keyword("name","jerry"));</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; doc.add(Field.Text("sender","<a href="mailto:bluedavy@gmail.com">bluedavy@gmail.com</a>"));</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; doc.add(Field.Text("receiver","<a href="mailto:google@gmail.com">google@gmail.com</a>"));</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; doc.add(Field.Text("title","用于索引的标题"));</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; doc.add(Field.UnIndexed("content","不建立索引的内容"));</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; Document doc2=new Document();</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; doc2.add(Field.Keyword("name","jerry.lin"));</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; doc2.add(Field.Text("sender","<a 
href="mailto:bluedavy@hotmail.com">bluedavy@hotmail.com</a>"));</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; doc2.add(Field.Text("receiver","<a href="mailto:msn@hotmail.com">msn@hotmail.com</a>"));</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; doc2.add(Field.Text("title","用于索引的第二个标题"));</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; doc2.add(Field.Text("content","建立索引的内容"));</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; iwriter.addDocument(doc);</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; iwriter.addDocument(doc2);</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; iwriter.optimize();</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; iwriter.close();</p><p>&#160;&#160;&#160; }</p><p>&#160;&#160;&#160; </p><p>&#160;&#160;&#160; private IndexWriter getWriter(String indexFilePath) throws Exception{</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; // The third IndexWriter argument is "create": true builds a new index, false opens the existing one.</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; boolean create=true;</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; File file=new File(indexFilePath+File.separator+"segments");</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; if(file.exists())</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; create=false; </p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; return new IndexWriter(indexFilePath,analyzer,create);</p><p>&#160;&#160;&#160; }</p><p><br />3.2.1.&#160;&#160;&#160;&#160;&#160;&#160; Wildcard (fuzzy-match) query on a keyword in a single field</p><p><br />Query query=new WildcardQuery(new Term("sender","*davy*"));</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; </p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; Searcher searcher=new IndexSearcher(indexFilePath);</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; Hits hits=searcher.search(query);</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; for (int i = 0; i &lt; hits.length(); i++) {</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; System.out.println(hits.doc(i).get("name"));</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; }</p><p><br />3.2.2.&#160;&#160;&#160;&#160;&#160;&#160; 
Parsed (semantic) query on a keyword in a single field</p><p><br />Query query=QueryParser.parse("索引","title",analyzer);</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; </p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; Searcher searcher=new IndexSearcher(indexFilePath);</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; Hits hits=searcher.search(query);</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; for (int i = 0; i &lt; hits.length(); i++) {</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; System.out.println(hits.doc(i).get("name"));</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; }</p><p><br />3.2.3.&#160;&#160;&#160;&#160;&#160;&#160; Keyword query across multiple fields</p><p><br />Query query=MultiFieldQueryParser.parse("索引",new String[]{"title","content"},analyzer);</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; </p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; Searcher searcher=new IndexSearcher(indexFilePath);</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; Hits hits=searcher.search(query);</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; for (int i = 0; i &lt; hits.length(); i++) {</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; System.out.println(hits.doc(i).get("name"));</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; }</p><p><br />3.2.4.&#160;&#160;&#160;&#160;&#160;&#160; Compound query (combining several query conditions)</p><p><br />Query query=MultiFieldQueryParser.parse("索引",new String[]{"title","content"},analyzer);</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; Query mquery=new WildcardQuery(new Term("sender","bluedavy*"));</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; TermQuery tquery=new TermQuery(new Term("name","jerry"));</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; </p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; BooleanQuery bquery=new BooleanQuery();</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; bquery.add(query,true,false);</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; bquery.add(mquery,true,false);</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; 
bquery.add(tquery,true,false);</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; </p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; Searcher searcher=new IndexSearcher(indexFilePath);</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; Hits hits=searcher.search(bquery);</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; for (int i = 0; i &lt; hits.length(); i++) {</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; System.out.println(hits.doc(i).get("name"));</p><p>&#160;&#160;&#160;&#160;&#160;&#160;&#160; }</p><p><br />4.&#160; Summary</p><p>The walkthrough above should give you a basic idea of how to use Lucene. For full-text search, I suggest starting with a parsed (semantic) query to find the meaningful matches, and only then falling back to wildcard-style searches ^_^, although the right choice ultimately depends on your search requirements. Lucene offers many other useful features that are left for you to explore as you use it: getting more fluent with the Query classes that Lucene ships with, using Filter and sorting, extending Analyzer with your own implementation, implementing your own Query, and even learning about search-engine techniques such as word segmentation and index ordering.</p><img src ="http://www.blogjava.net/rain1102/aggbug/96356.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/rain1102/" target="_blank">Eric.Zhou</a> 2007-01-28 10:38 <a href="http://www.blogjava.net/rain1102/archive/2007/01/28/96356.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item></channel></rss>