BlogJava - BaoYaEr - Article category - Lucene

Lucene full-text search: an application example and a brief look at the code

大田斗 — Mon, 14 Jan 2008 02:38:00 GMT

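Summary: Implementing full-text search with Lucene involves three main steps: 1. Build the index: create the Lucene index files from the data already in the site's news database. 2. Search through the index: once the index exists, full-text search can be performed using the standard lexical analyzer or a direct lexical analyzer. 3. Maintain the index: the information in the site's news database changes constantly (additions, modifications and deletions), and these changes must also be reflected in the Lucene index files. ...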

大田斗 2008-01-14 10:38

A Lucene usage example

大田斗 — Mon, 14 Jan 2008 02:32:00 GMT

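Summary: A note first: this article uses Lucene 2.0, mainly because querying in 2.0 differs somewhat from earlier versions. The code was actually written several months ago; I am lazy, so it only reaches my blog today. I cannot write profound articles, only jot down simple notes and bits and pieces. The code here is just for my own amusement, so I hope the experts will not throw refactoring and the like at me... 1. On a Windows system, create a folder named s on the C drive, in...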

大田斗 2008-01-14 10:32

An introduction to basic Lucene usage

大田斗 — Tue, 13 Feb 2007 03:50:00 GMT

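I tried out Lucene today and found that, although there are quite a few documents about it online, most lean toward concepts, design or more advanced topics, and few explain how to get started with it. So I wrote this one.

This article does not aim to introduce Lucene's concepts and design. It only shows how to use Lucene to meet a few common full-text search needs, so if you want a deep understanding of Lucene it will not give you much. To go deeper after reading it, visit http://lucene.apache.org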

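1. Overview

As a system accumulates more and more information, fishing the one needle you want out of that sea becomes very important. Full-text search is the usual solution to this kind of problem, and Lucene is a tool for implementing it: any application can embed it to provide full-text search.

2. Setting up the environment

Download lucene.jar from lucene.apache.org and add it to the project's build path; Lucene can then be used directly in the project. (The code in this article uses the Field.Keyword and Field.Text factory methods of the Lucene 1.4-era API, which later releases removed, so it needs a matching version of the jar.)

3. Usage

3.1. Basic concepts

This section covers the concepts you meet most often when using Lucene, explained by analogy with a database, which most readers know well. Full-text search with Lucene resembles the database workflow: table → query the relevant fields or conditions → return the matching records.

First comes IndexWriter, which builds the index, the counterpart of a database table. When building the index you must specify how it is built, that is, how the fields of each record are split into terms. Lucene calls this the Analyzer and ships several for different settings: SimpleAnalyzer, StandardAnalyzer, GermanAnalyzer and others. StandardAnalyzer is the one used most often, because it offers basic support for Chinese.

Once the index exists, the records to be indexed are inserted into it. Lucene calls such a record a Document, roughly a row in a database table. The fields of a record are added as Field objects, much as in a database. A Field can be indexed or not indexed, and tokenized or not tokenized, in several combinations. These elements are essentially all you need to build an index.

Querying involves a few more concepts. The first is Query. Lucene provides several commonly used ones: TermQuery, MultiTermQuery, BooleanQuery, WildcardQuery, PhraseQuery, PrefixQuery, PhrasePrefixQuery, FuzzyQuery, RangeQuery and SpanQuery. A Query expresses how the target field is searched: fuzzy query, semantic query, phrase query, range query, combined query and so on. Then there is QueryParser, which can create the different kinds of Query, and MultiFieldQueryParser, which searches several fields for the same keyword. IndexSearcher determines which directory's index files are searched, somewhat like choosing which indexed table to query in a database. With an IndexSearcher and a Query you can run the search, and Lucene returns the results as Hits. Iterating over the Hits yields the Document of each result, and from the Document you can read the values of its Fields.

I hope this overview of the basic concepts behind indexing and searching gives you a working understanding of Lucene.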

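3.2. Implementing full-text search requirements

Index-building code:

import java.io.File;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

// Builds a two-document sample index under indexFilePath.
private void createIndex(String indexFilePath) throws Exception {
    IndexWriter iwriter = getWriter(indexFilePath);

    Document doc = new Document();
    doc.add(Field.Keyword("name", "jerry"));                  // indexed as one term, not tokenized
    doc.add(Field.Text("sender", "bluedavy@gmail.com"));      // tokenized, indexed and stored
    doc.add(Field.Text("receiver", "google@gmail.com"));
    doc.add(Field.Text("title", "用于索引的标题"));
    doc.add(Field.UnIndexed("content", "不建立索引的内容"));    // stored only, not searchable

    Document doc2 = new Document();
    doc2.add(Field.Keyword("name", "jerry.lin"));
    doc2.add(Field.Text("sender", "bluedavy@hotmail.com"));
    doc2.add(Field.Text("receiver", "msn@hotmail.com"));
    doc2.add(Field.Text("title", "用于索引的第二个标题"));
    doc2.add(Field.Text("content", "建立索引的内容"));          // this content is searchable

    iwriter.addDocument(doc);
    iwriter.addDocument(doc2);
    iwriter.optimize();
    iwriter.close();
}

// Opens a writer on indexFilePath and creates a new index only if none exists yet.
// "analyzer" is a field of the enclosing class; its declaration is not shown here.
private IndexWriter getWriter(String indexFilePath) throws Exception {
    // The third constructor argument means "create": true starts a new index,
    // false appends to the existing one.
    boolean create = true;
    File file = new File(indexFilePath + File.separator + "segments");
    if (file.exists()) {
        create = false;   // a "segments" file means an index is already there
    }
    return new IndexWriter(indexFilePath, analyzer, create);
}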

3.2.1. Fuzzy (wildcard) query on a keyword in one field

// Match every document whose sender contains "davy".
Query query = new WildcardQuery(new Term("sender", "*davy*"));
Searcher searcher = new IndexSearcher(indexFilePath);
Hits hits = searcher.search(query);
for (int i = 0; i < hits.length(); i++) {
    System.out.println(hits.doc(i).get("name"));
}
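
To make the pattern syntax concrete, here is a minimal plain-Java sketch of the matching rule that a pattern such as *davy* expresses: * stands for any run of characters (possibly empty) and ? for exactly one character. It is not Lucene's implementation. The class WildcardSketch and its methods are made up for illustration, and the sketch assumes that each sender address ends up in the index as a single term.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class WildcardSketch {
    // Returns true if the whole text matches the pattern.
    // '*' matches any run of characters (including none), '?' exactly one.
    static boolean matches(String pattern, String text) {
        int p = 0, t = 0;
        int star = -1;   // position of the most recent '*' in the pattern
        int mark = 0;    // text position where that '*' started matching
        while (t < text.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                star = p++;          // remember the star, first try matching nothing
                mark = t;
            } else if (p < pattern.length()
                    && (pattern.charAt(p) == '?' || pattern.charAt(p) == text.charAt(t))) {
                p++;
                t++;
            } else if (star >= 0) {
                p = star + 1;        // backtrack: let the last '*' absorb one more character
                t = ++mark;
            } else {
                return false;
            }
        }
        while (p < pattern.length() && pattern.charAt(p) == '*') {
            p++;                     // trailing stars may match the empty string
        }
        return p == pattern.length();
    }

    // Filters a list of (assumed) index terms down to those matching the pattern.
    static List<String> matchingTerms(String pattern, List<String> terms) {
        List<String> result = new ArrayList<>();
        for (String term : terms) {
            if (matches(pattern, term)) {
                result.add(term);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> senderTerms = Arrays.asList(
                "bluedavy@gmail.com", "bluedavy@hotmail.com", "google@gmail.com");
        System.out.println(matchingTerms("*davy*", senderTerms));
    }
}
```

A pattern that starts with * cannot narrow the candidate terms by prefix, so every term has to be checked; that is one reason the summary of this article suggests trying an analyzed query before a fuzzy one.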

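3.2.2. "Semantic" (analyzed) query on a keyword in one field

// QueryParser parses the expression and runs it through the analyzer
// before searching the "title" field.
Query query = QueryParser.parse("索引", "title", analyzer);
Searcher searcher = new IndexSearcher(indexFilePath);
Hits hits = searcher.search(query);
for (int i = 0; i < hits.length(); i++) {
    System.out.println(hits.doc(i).get("name"));
}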

3.2.3. Query on a keyword across several fields

// Search both "title" and "content" for the same keyword.
Query query = MultiFieldQueryParser.parse("索引", new String[]{"title", "content"}, analyzer);
Searcher searcher = new IndexSearcher(indexFilePath);
Hits hits = searcher.search(query);
for (int i = 0; i < hits.length(); i++) {
    System.out.println(hits.doc(i).get("name"));
}

3.2.4. Compound query (combining several query conditions)

Query query = MultiFieldQueryParser.parse("索引", new String[]{"title", "content"}, analyzer);
Query mquery = new WildcardQuery(new Term("sender", "bluedavy*"));
TermQuery tquery = new TermQuery(new Term("name", "jerry"));

// add(query, required, prohibited): here every clause is required and none is prohibited.
BooleanQuery bquery = new BooleanQuery();
bquery.add(query, true, false);
bquery.add(mquery, true, false);
bquery.add(tquery, true, false);

Searcher searcher = new IndexSearcher(indexFilePath);
Hits hits = searcher.search(bquery);
for (int i = 0; i < hits.length(); i++) {
    System.out.println(hits.doc(i).get("name"));
}
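
The three-argument add call above takes two flags, required and prohibited. The following plain-Java sketch shows one way to picture how such flags combine the result sets of the individual clauses: required clauses are intersected, prohibited ones are subtracted, and optional ones only decide the result when nothing is required. It ignores scoring completely and is not how Lucene evaluates the query internally. BooleanClauseSketch, Clause and the document numbers are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class BooleanClauseSketch {
    /** One clause: the documents it matches plus its two flags (hypothetical type). */
    static final class Clause {
        final Set<Integer> docs;
        final boolean required;
        final boolean prohibited;

        Clause(boolean required, boolean prohibited, Integer... docs) {
            this.docs = new TreeSet<>(Arrays.asList(docs));
            this.required = required;
            this.prohibited = prohibited;
        }
    }

    // Combines the per-clause result sets into the final set of matching documents.
    static Set<Integer> evaluate(List<Clause> clauses) {
        Set<Integer> mustMatch = null;                 // intersection of required clauses
        Set<Integer> mayMatch = new TreeSet<>();       // union of optional clauses
        Set<Integer> mustNotMatch = new TreeSet<>();   // union of prohibited clauses
        for (Clause c : clauses) {
            if (c.prohibited) {
                mustNotMatch.addAll(c.docs);
            } else if (c.required) {
                if (mustMatch == null) {
                    mustMatch = new TreeSet<>(c.docs);
                } else {
                    mustMatch.retainAll(c.docs);
                }
            } else {
                mayMatch.addAll(c.docs);
            }
        }
        // With no required clause, any optional match qualifies.
        Set<Integer> result = (mustMatch != null) ? mustMatch : mayMatch;
        result.removeAll(mustNotMatch);
        return result;
    }

    // Mirrors the three required clauses of section 3.2.4 with assumed result sets.
    static Set<Integer> demo() {
        return evaluate(Arrays.asList(
                new Clause(true, false, 0, 1),   // assumed: both documents match the keyword
                new Clause(true, false, 0, 1),   // assumed: both senders start with "bluedavy"
                new Clause(true, false, 0)));    // only document 0 has the name "jerry"
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because all three clauses are required, only documents that satisfy every condition survive, which is why the compound query is narrower than any of its parts.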

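4. Summary

I trust the explanation above shows the basics of using Lucene. For full-text search I suggest starting with a semantic search to find the meaningful content, and only then falling back to fuzzy searches and the like ^_^, though in the end this depends on what the search needs to do. Lucene offers many other useful features that are left for you to explore as you use it: mastering the built-in Query types, using Filter and Sort, writing your own Analyzer, implementing your own Query and so on. You may even want to study some search engine techniques (word segmentation, index ordering, etc.).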

大田斗 2007-02-13 11:50

Lucene In Action ch 4 notes (I) -- Analysis

大田斗 — Tue, 13 Feb 2007 03:32:00 GMT

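This chapter discusses Lucene's analysis process and several Analyzers in detail.

During indexing, the text to be indexed is first analyzed: it is processed and split into terms, and the index is then built from those terms. Different Analyzers apply different analysis rules, so choosing the right Analyzer matters when using Lucene in a program.

1. Using Analyzers

Before using an Analyzer, let's look at what text looks like after an Analyzer has processed it: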

Listing 4.1 Visualizing analyzer effects
Analyzing "The quick brown fox jumped over the lazy dogs"
WhitespaceAnalyzer:
    [The] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dogs]
SimpleAnalyzer:
    [the] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dogs]
StopAnalyzer:
    [quick] [brown] [fox] [jumped] [over] [lazy] [dogs]
StandardAnalyzer:
    [quick] [brown] [fox] [jumped] [over] [lazy] [dogs]

Analyzing "XY&Z Corporation - xyz@example.com"
WhitespaceAnalyzer:
    [XY&Z] [Corporation] [-] [xyz@example.com]
SimpleAnalyzer:
    [xy] [z] [corporation] [xyz] [example] [com]
StopAnalyzer:
    [xy] [z] [corporation] [xyz] [example] [com]
StandardAnalyzer:
    [xy&z] [corporation] [xyz@example.com]

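The output above comes from an example discussed below. It shows how different Analyzers treat the same text. For "The quick brown fox jumped over the lazy dogs", WhitespaceAnalyzer and SimpleAnalyzer simply split the text into words and build Terms (SimpleAnalyzer also lowercases them), whereas the other two Analyzers remove the stop words as well. For "XY&Z Corporation - xyz@example.com", the Analyzers differ in how they treat & and -. With this intuition for analysis in hand, let's look at how analysis works at the different processing stages.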
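
The first three rows of each listing can be reproduced with a few lines of plain Java. The sketch below imitates the behaviour of WhitespaceAnalyzer, SimpleAnalyzer and StopAnalyzer as described above; it does not use Lucene, the class AnalyzerEffectSketch is invented for illustration, and its stop-word set is a small assumed subset of an English stop list. StandardAnalyzer is left out because its grammar (e-mail addresses, company names and so on) is more involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class AnalyzerEffectSketch {
    // Assumed subset of an English stop list, enough for the sample sentences.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "is", "of", "the", "to"));

    // Like WhitespaceAnalyzer: split on whitespace only, keep case and punctuation.
    static List<String> whitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Like SimpleAnalyzer: tokens are runs of letters, converted to lowercase.
    static List<String> simple(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());   // a non-letter ends the current token
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Like StopAnalyzer: the simple tokens with the stop words removed.
    static List<String> stop(String text) {
        List<String> tokens = simple(text);
        tokens.removeAll(STOP_WORDS);
        return tokens;
    }

    public static void main(String[] args) {
        String[] samples = {
            "The quick brown fox jumped over the lazy dogs",
            "XY&Z Corporation - xyz@example.com"
        };
        for (String sample : samples) {
            System.out.println("Analyzing \"" + sample + "\"");
            System.out.println("  whitespace: " + whitespace(sample));
            System.out.println("  simple:     " + simple(sample));
            System.out.println("  stop:       " + stop(sample));
        }
    }
}
```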

I. Indexing Analysis

Recall from ch2 (indexing) that an index is built with IndexWriter, and that constructing an IndexWriter requires an Analyzer, as shown here:

Analyzer analyzer = new StandardAnalyzer();
IndexWriter writer = new IndexWriter(directory, analyzer, true);

The writer can then be used to index a document, as follows:

Document doc = new Document();
doc.add(Field.Text("title", "This is the title"));
doc.add(Field.UnStored("contents", "...document contents..."));
writer.addDocument(doc);

This uses the Analyzer that was given when the IndexWriter was constructed. To use a specific Analyzer for one document, call this method instead:

writer.addDocument(doc, analyzer);

II. QueryParser Analysis

Analysis is the key to term searches. The terms produced by the Analyzer at query time must be the same as the terms that were indexed; only then does the search return results. When QueryParser parses the search expression entered by a user, an Analyzer can be specified as follows:

Query query = QueryParser.parse(expression, "contents", analyzer);

That goes through QueryParser's static method. With a QueryParser instance, the Analyzer is supplied when the QueryParser is constructed:

QueryParser parser = new QueryParser("contents", analyzer);
Query query = parser.parse(expression);

QueryParser analyzes individual pieces of the expression, not the expression as a whole, which may include operators, parenthesis, and other special expression syntax to denote range, wildcard, and fuzzy searches.

QueryParser analyzes all text in the same way; it does not know how each field was indexed. Searching a field that was indexed as a Keyword can therefore run into problems.
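
The Keyword problem is easy to reproduce without Lucene. In the plain-Java sketch below, a Keyword-style field keeps each value verbatim as one term, while the query side runs the user's input through a letters-only, lowercasing analysis step similar to the SimpleAnalyzer output shown earlier. The class KeywordMismatchSketch, its methods and the two sample values are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class KeywordMismatchSketch {
    // Terms of a Keyword-style field: every value is one verbatim term.
    static final Set<String> KEYWORD_TERMS =
            new HashSet<>(Arrays.asList("jerry.lin", "1930110995"));

    // Stand-in for a letters-only, lowercasing analyzer applied to query text.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String piece : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!piece.isEmpty()) {
                terms.add(piece);
            }
        }
        return terms;
    }

    // Query path that analyzes the input first, as a query parser would.
    static boolean analyzedLookup(String input) {
        for (String term : analyze(input)) {
            if (KEYWORD_TERMS.contains(term)) {
                return true;
            }
        }
        return false;
    }

    // Query path that uses the input verbatim as a single term.
    static boolean exactLookup(String input) {
        return KEYWORD_TERMS.contains(input);
    }

    public static void main(String[] args) {
        for (String input : new String[] {"jerry.lin", "1930110995"}) {
            System.out.println(input + " -> analyzed terms " + analyze(input)
                    + ", analyzed lookup finds it: " + analyzedLookup(input)
                    + ", exact lookup finds it: " + exactLookup(input));
        }
    }
}
```

The analyzed path splits "jerry.lin" into two terms and reduces the all-digit value to nothing, so neither matches the stored term. Building the term directly, as a TermQuery does, avoids the mismatch.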

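Another question is how to handle text that contains other elements, such as HTML or XML documents: they carry element tags, and these tags are generally not indexed. A related question is indexing by field, for example how to search the header and the body of an HTML document separately. The Analyzer cannot solve this at present, because each call to it processes a single field. We will come back to this problem later.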

2. Analyzing the Analyzer

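To understand in detail how Lucene analyzes text, you need to know how an Analyzer works, so let's take a look. Analyzer is the base class of all the XXXAnalyzer classes. It is surprisingly simple (much simpler than I expected), with a single method: tokenStream(String fieldName, Reader reader). Some Analyzer implementations ignore the fieldName parameter, for example SimpleAnalyzer, whose code is:

public final class SimpleAnalyzer extends Analyzer {
    public TokenStream tokenStream(String fieldName, Reader reader) {
        return new LowerCaseTokenizer(reader);
    }
}

This class is surprisingly simple too: it only uses LowerCaseTokenizer. What does LowerCaseTokenizer do? The name gives most of it away: it splits the text at non-letter characters (discarding them) and converts all of the text to lowercase.

The returned TokenStream is an enumerator-like class that yields successive Tokens and returns null when the end is reached.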
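
The "enumerator-like" contract can be illustrated with a small pull-style tokenizer in plain Java: each call to next() returns one more token, and null signals the end of the input. This is only a sketch of the idea under the description above, not Lucene's LowerCaseTokenizer. The class LowerCaseTokenStreamSketch is hypothetical and returns plain Strings rather than Token objects.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

final class LowerCaseTokenStreamSketch {
    private final Reader reader;

    LowerCaseTokenStreamSketch(Reader reader) {
        this.reader = reader;
    }

    // Returns the next lowercased run of letters, or null once the input is exhausted.
    String next() throws IOException {
        StringBuilder token = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            if (Character.isLetter(c)) {
                token.append((char) Character.toLowerCase(c));
            } else if (token.length() > 0) {
                break;   // a non-letter ends the token that is being built
            }
            // non-letters before a token starts are simply skipped
        }
        return token.length() > 0 ? token.toString() : null;
    }

    // Convenience helper: pulls tokens until next() returns null.
    static List<String> drain(String text) {
        List<String> tokens = new ArrayList<>();
        LowerCaseTokenStreamSketch stream =
                new LowerCaseTokenStreamSketch(new StringReader(text));
        try {
            for (String token = stream.next(); token != null; token = stream.next()) {
                tokens.add(token);
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);   // not expected for an in-memory reader
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(drain("XY&Z Corporation - xyz@example.com"));
    }
}
```

Pulling tokens one at a time from a Reader means the text never has to be held in memory as a whole, which suits large documents.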

大田斗 2007-02-13 11:32

Lucene In Action ch 3 notes -- Add search

大田斗 — Tue, 13 Feb 2007 03:31:00 GMT

Today we look at ch3, Add search to your Application, where we really start using Lucene search to find what we are looking for.

1. Implementing a simple search feature

This chapter only covers the simple Lucene search API, which involves the following classes:

Basic Lucene search API:

Class	Purpose
IndexSearcher	Entry point for searching an index. All searches go through the overloaded search methods of an IndexSearcher instance.
Query (and subclasses)	Each subclass encapsulates the logic of a particular search type. A Query instance is passed to IndexSearcher's search method.
QueryParser	Parses a human-readable expression into a concrete Query instance.
Hits	Holds the search results. Returned by IndexSearcher's search method.

Let's look at a few examples from the book:

LiaTestCase.java is a class that extends TestCase with a few helpers; the examples below all inherit from it.

package lia.common;

import junit.framework.TestCase;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.search.Hits;
import org.apache.lucene.document.Document;

import java.io.IOException;
import java.util.Date;
import java.text.ParseException;
import java.text.SimpleDateFormat;

/**
 * LIA base class for test cases.
 */
public abstract class LiaTestCase extends TestCase {
  private String indexDir = System.getProperty("index.dir");  // the test index has already been built
  protected Directory directory;

  protected void setUp() throws Exception {
    directory = FSDirectory.getDirectory(indexDir, false);
  }

  protected void tearDown() throws Exception {
    directory.close();
  }

  /**
   * For troubleshooting
   */
  protected final void dumpHits(Hits hits) throws IOException {
    if (hits.length() == 0) {
      System.out.println("No hits");
    }

    for (int i = 0; i < hits.length(); i++) {
      Document doc = hits.doc(i);
      System.out.println(hits.score(i) + ":" + doc.get("title"));
    }
  }

  protected final void assertHitsIncludeTitle(Hits hits, String title)
      throws IOException {
    for (int i = 0; i < hits.length(); i++) {
      Document doc = hits.doc(i);
      if (title.equals(doc.get("title"))) {
        assertTrue(true);
        return;
      }
    }

    fail("title '" + title + "' not found");
  }

  protected final Date parseDate(String s) throws ParseException {
    return new SimpleDateFormat("yyyy-MM-dd").parse(s);
  }
}

I. Searching for a specific Term, and parsing a user-entered expression with QueryParser

To search for a specific term, TermQuery is all you need; a single term is especially suited to searching Keyword fields. Parsing an expression entered by the user fits the way users naturally search better, and this parsing is done by QueryParser. If the expression cannot be parsed, an exception is thrown, from which you can get detailed error information to give the user a suitable hint. Parsing also requires an Analyzer to analyze the user's input; depending on the Analyzer, the corresponding Terms are produced and assembled into a Query instance.

Here is an example, BasicSearchingTest.java:

package lia.searching;

import lia.common.LiaTestCase;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class BasicSearchingTest extends LiaTestCase {

  public void testTerm() throws Exception {
    IndexSearcher searcher = new IndexSearcher(directory);
    Term t = new Term("subject", "ant");       // build a Term
    Query query = new TermQuery(t);
    Hits hits = searcher.search(query);        // search
    assertEquals("JDwA", 1, hits.length());    // check the result

    t = new Term("subject", "junit");
    hits = searcher.search(new TermQuery(t));
    assertEquals(2, hits.length());

    searcher.close();
  }

  public void testKeyword() throws Exception {   // test a keyword search
    IndexSearcher searcher = new IndexSearcher(directory);
    Term t = new Term("isbn", "1930110995");     // keyword term
    Query query = new TermQuery(t);
    Hits hits = searcher.search(query);
    assertEquals("JUnit in Action", 1, hits.length());
  }

  public void testQueryParser() throws Exception {   // test QueryParser
    IndexSearcher searcher = new IndexSearcher(directory);

    Query query = QueryParser.parse("+JUNIT +ANT -MOCK",
                                    "contents",
                                    new SimpleAnalyzer());   // parse the search expression into a Query instance
    Hits hits = searcher.search(query);
    assertEquals(1, hits.length());
    Document d = hits.doc(0);
    assertEquals("Java Development with Ant", d.get("title"));

    query = QueryParser.parse("mock OR junit",
                              "contents",
                              new SimpleAnalyzer());         // parse the search expression into a Query instance
    hits = searcher.search(query);
    assertEquals("JDwA and JIA", 2, hits.length());
  }
}

大田斗 2007-02-13 11:31

Lucene In Action ch 2 notes -- indexing in detail

大田斗 — Tue, 13 Feb 2007 03:29:00 GMT

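Ch2 of Lucene In Action covers indexing systematically. Let's take a look.

1. The indexing process

The data to be indexed must first be converted to text, because Lucene can only index text. Analysis then filters the text, removing the so-called stop words mentioned in ch1, and the index is built. The resulting index is an inverted index.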
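
As a rough illustration of what "inverted" means, the plain-Java sketch below maps each term to the set of documents that contain it, after lowercasing the text, splitting it at non-letters and dropping stop words. It is a toy model rather than Lucene's index format. The class InvertedIndexSketch and its stop-word set are assumptions, and the two sample sentences are borrowed from the test data used in this chapter.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

final class InvertedIndexSketch {
    // Assumed, deliberately tiny stop list.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "of", "the"));

    // term -> ids of the documents containing it (the "postings")
    private final Map<String, SortedSet<Integer>> postings = new TreeMap<>();
    private int nextDocId = 0;

    // Analyzes the text and records every remaining term under a new document id.
    int add(String text) {
        int docId = nextDocId++;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;   // stop words never reach the index
            }
            SortedSet<Integer> docs = postings.get(token);
            if (docs == null) {
                docs = new TreeSet<>();
                postings.put(token, docs);
            }
            docs.add(docId);
        }
        return docId;
    }

    // Looks a term up directly; no scan over the documents is needed.
    SortedSet<Integer> lookup(String term) {
        SortedSet<Integer> docs = postings.get(term);
        return docs == null ? new TreeSet<Integer>() : docs;
    }

    static InvertedIndexSketch build() {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add("Amsterdam has lots of bridges");   // document 0
        index.add("Venice has lots of canals");       // document 1
        return index;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = build();
        System.out.println("lots   -> " + index.lookup("lots"));
        System.out.println("venice -> " + index.lookup("venice"));
        System.out.println("of     -> " + index.lookup("of"));
    }
}
```

A search for a term is a single map lookup instead of a scan through every document, which is the point of inverting the document-to-words relation.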

2. Basic index operations

The basic operations are add, delete and update.

I. Adding

Here is the example code, BaseIndexingTestCase.java:

package lia.indexing;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.SimpleAnalyzer;

import junit.framework.TestCase;
import java.io.IOException;

/**
 *
 */
public abstract class BaseIndexingTestCase extends TestCase {
  protected String[] keywords = {"1", "2"};
  protected String[] unindexed = {"Netherlands", "Italy"};
  protected String[] unstored = {"Amsterdam has lots of bridges",
                                 "Venice has lots of canals"};
  protected String[] text = {"Amsterdam", "Venice"};
  protected Directory dir;

  // setUp: create the index directory and add the documents
  protected void setUp() throws IOException {
    String indexDir =
      System.getProperty("java.io.tmpdir", "tmp") +
      System.getProperty("file.separator") + "index-dir";
    dir = FSDirectory.getDirectory(indexDir, true);
    addDocuments(dir);
  }

  protected void addDocuments(Directory dir)
    throws IOException {
    IndexWriter writer = new IndexWriter(dir, getAnalyzer(),
      true);    // obtain an IndexWriter instance
    writer.setUseCompoundFile(isCompound());
    for (int i = 0; i < keywords.length; i++) {
      Document doc = new Document();        // add a document
      doc.add(Field.Keyword("id", keywords[i]));
      doc.add(Field.UnIndexed("country", unindexed[i]));
      doc.add(Field.UnStored("contents", unstored[i]));
      doc.add(Field.Text("city", text[i]));
      writer.addDocument(doc);
    }
    writer.optimize();   // optimize the index
    writer.close();
  }

  // Subclasses can override this method to supply a different Analyzer
  protected Analyzer getAnalyzer() {
    return new SimpleAnalyzer();
  }

  // Subclasses can also override this method to choose whether the compound file format is used
  protected boolean isCompound() {
    return true;
  }

  // Test adding documents
  public void testIndexWriter() throws IOException {
    IndexWriter writer = new IndexWriter(dir, getAnalyzer(),
      false);
    assertEquals(keywords.length, writer.docCount());
    writer.close();
  }

  // Test IndexReader
  public void testIndexReader() throws IOException {
    IndexReader reader = IndexReader.open(dir);
    assertEquals(keywords.length, reader.maxDoc());
    assertEquals(keywords.length, reader.numDocs());
    reader.close();
  }
}

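This is a test superclass that other test cases can extend to test different features. It is commented in detail above.

When adding Fields you may need to deal with synonyms. There are two ways to add them:

a. Build the group of synonyms and loop over it, appending each one to a single String that is then added as a Field.

b. Add each synonym as another value of the base word's Field, as follows:

String baseWord = "fast";
String[] synonyms = new String[] {"quick", "rapid", "speedy"};

Document doc = new Document();
doc.add(Field.Text("word", baseWord));
for (int i = 0; i < synonyms.length; i++) {
    doc.add(Field.Text("word", synonyms[i]));   // same field name, another value
}

Internally Lucene adds each of these words to a Field named word, so a search can use any one of the given words.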

大田斗 2007-02-13 11:29

Lucene In Action ch 1 notes -- basic concepts

大田斗 — Tue, 13 Feb 2007 03:28:00 GMT

In chapter 1 the author mainly explains what Lucene is and what it can be used for, and gives an indexing example and a searching example that introduce a few basic (core) concepts, so that the reader gets an overview of Lucene. The chapter then introduces the currently popular search frameworks.

We will mainly look at the indexing and searching example and then go over some basic concepts.

package lia.meetlucene;

import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

import java.io.File;
import java.io.IOException;
import java.io.FileReader;
import java.util.Date;

/**
 * This code was originally written for
 * Erik's Lucene intro java.net article
 */
public class Indexer {

  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      throw new Exception("Usage: java " + Indexer.class.getName()
        + " <index dir> <data dir>");
    }
    File indexDir = new File(args[0]);   // the Lucene index is created in this directory
    File dataDir = new File(args[1]);    // this directory holds the files to be indexed

    long start = new Date().getTime();
    int numIndexed = index(indexDir, dataDir);
    long end = new Date().getTime();

    System.out.println("Indexing " + numIndexed + " files took "
      + (end - start) + " milliseconds");
  }

  public static int index(File indexDir, File dataDir)
    throws IOException {
    if (!dataDir.exists() || !dataDir.isDirectory()) {
      throw new IOException(dataDir
        + " does not exist or is not a directory");
    }

    IndexWriter writer = new IndexWriter(indexDir,
      new StandardAnalyzer(), true);     // (1) create the Lucene index
    writer.setUseCompoundFile(false);

    indexDirectory(writer, dataDir);

    int numIndexed = writer.docCount();
    writer.optimize();
    writer.close();                      // close index
    return numIndexed;
  }

  private static void indexDirectory(IndexWriter writer, File dir)
    throws IOException {
    File[] files = dir.listFiles();

    for (int i = 0; i < files.length; i++) {
      File f = files[i];
      if (f.isDirectory()) {
        indexDirectory(writer, f);       // (2) recurse
      } else if (f.getName().endsWith(".txt")) {
        indexFile(writer, f);
      }
    }
  }

  private static void indexFile(IndexWriter writer, File f)
    throws IOException {
    if (f.isHidden() || !f.exists() || !f.canRead()) {
      return;
    }

    System.out.println("Indexing " + f.getCanonicalPath());

    Document doc = new Document();
    doc.add(Field.Text("contents", new FileReader(f)));          // (3) index file content
    doc.add(Field.Keyword("filename", f.getCanonicalPath()));    // (4) index file name
    writer.addDocument(doc);                                     // (5) add document in Lucene index
  }
}

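The Indexer above uses a few lines of the Lucene API to index the files under a directory. It takes two arguments at run time: the directory in which to store the index, and the directory of files to index.

The class above needs the following Lucene classes to perform the indexing:

■ IndexWriter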

■