Lucene In Action ch 2 notes: indexing in detail

Chapter 2 of Lucene In Action gives a systematic explanation of indexing. Let's take a look.

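1. The indexing process

The data to be indexed must first be converted to text, because Lucene can only index text. Analysis then filters that text, removing (among other things) the stop words mentioned in ch1, and finally the index is built. The index that Lucene builds is an inverted index.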
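
To make the text, analysis, inverted index pipeline concrete, here is a minimal sketch in plain Java. It does not use Lucene at all: the class name InvertedIndexSketch, the tiny stop-word list, and the letter-only tokenizer are assumptions made for illustration, and Lucene's real analyzers and index format are far more elaborate.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class InvertedIndexSketch {
    // Illustrative stop-word list (an assumption, not Lucene's actual list).
    public static final Set<String> STOP_WORDS =
        new HashSet<>(Arrays.asList("a", "an", "and", "of", "the"));

    // Analysis step: lowercase, split on non-letters, drop stop words.
    public static String[] analyze(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^a-z]+"))
            .filter(t -> !t.isEmpty() && !STOP_WORDS.contains(t))
            .toArray(String[]::new);
    }

    // Inverted index: term -> sorted ids of the documents that contain it.
    public static Map<String, SortedSet<Integer>> build(String[] docs) {
        Map<String, SortedSet<Integer>> index = new TreeMap<>();
        for (int docId = 0; docId < docs.length; docId++) {
            for (String term : analyze(docs[docId])) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        String[] docs = {
            "Amsterdam has lots of bridges",
            "Venice has lots of canals"
        };
        // Print each term followed by the ids of the documents containing it.
        build(docs).forEach((term, ids) -> System.out.println(term + " -> " + ids));
    }
}
```

The point of the inversion is that a lookup starts from a term and goes straight to the documents containing it, instead of scanning every document for the term.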

2. Basic index operations

The basic operations are adding, deleting, and updating.

I. Adding

Let's look at some example code, the BaseIndexingTestCase class:

package lia.indexing;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.SimpleAnalyzer;

import junit.framework.TestCase;
import java.io.IOException;

/**
 * Base class for the indexing tests: builds a small two-document index
 * before each test. Written against the Lucene 1.4 API used by the book.
 */
public abstract class BaseIndexingTestCase extends TestCase {
  protected String[] keywords = {"1", "2"};
  protected String[] unindexed = {"Netherlands", "Italy"};
  protected String[] unstored = {"Amsterdam has lots of bridges",
                                 "Venice has lots of canals"};
  protected String[] text = {"Amsterdam", "Venice"};
  protected Directory dir;

  // setUp: create the index directory and add the documents
  protected void setUp() throws IOException {
    String indexDir =
      System.getProperty("java.io.tmpdir", "tmp") +
      System.getProperty("file.separator") + "index-dir";
    dir = FSDirectory.getDirectory(indexDir, true);
    addDocuments(dir);
  }

  protected void addDocuments(Directory dir)
    throws IOException {
    IndexWriter writer = new IndexWriter(dir, getAnalyzer(),
      true);    // obtain an IndexWriter instance (true = create a new index)
    writer.setUseCompoundFile(isCompound());
    for (int i = 0; i < keywords.length; i++) {
      Document doc = new Document();        // build the document to add
      doc.add(Field.Keyword("id", keywords[i]));
      doc.add(Field.UnIndexed("country", unindexed[i]));
      doc.add(Field.UnStored("contents", unstored[i]));
      doc.add(Field.Text("city", text[i]));
      writer.addDocument(doc);
    }
    writer.optimize();   // optimize the index
    writer.close();
  }

  // Subclasses can override this method to supply a different Analyzer
  protected Analyzer getAnalyzer() {
    return new SimpleAnalyzer();
  }

  // Subclasses can also override this method to choose whether the
  // index uses the compound file format
  protected boolean isCompound() {
    return true;
  }

  // Test adding documents
  public void testIndexWriter() throws IOException {
    IndexWriter writer = new IndexWriter(dir, getAnalyzer(),
      false);
    assertEquals(keywords.length, writer.docCount());
    writer.close();
  }

  // Test IndexReader
  public void testIndexReader() throws IOException {
    IndexReader reader = IndexReader.open(dir);
    assertEquals(keywords.length, reader.maxDoc());
    assertEquals(keywords.length, reader.numDocs());
    reader.close();
  }
}

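This is a test superclass: other test cases can extend it to test different functionality. The code above carries detailed comments.

When adding Fields you may need to handle synonyms. There are two ways to add them: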

a. Loop over the synonyms, append them all to a single String, and create one Field from that String.

b. Add each synonym to the same Field as the base word, as follows:

String baseWord = "fast";
String[] synonyms = new String[] {"quick", "rapid", "speedy"};
Document doc = new Document();
doc.add(Field.Text("word", baseWord));
for (int i = 0; i < synonyms.length; i++) {
  doc.add(Field.Text("word", synonyms[i]));
}

Internally, Lucene appends each of these words to a single Field named word, so a search can use any one of the given words.
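
Why repeating a field name makes every value searchable can be sketched with a toy index in plain Java. This is not Lucene code: MultiValuedFieldSketch, addDocument, and search are hypothetical names invented for this illustration, and the whitespace tokenizer stands in for a real Analyzer.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class MultiValuedFieldSketch {
    // Postings keyed by "fieldName:term" -> ids of documents containing that term.
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private int nextDocId = 0;

    // Each entry of fields is {name, value}; the same name may appear many times.
    public int addDocument(String[][] fields) {
        int docId = nextDocId++;
        for (String[] field : fields) {
            String name = field[0];
            for (String token : field[1].toLowerCase(Locale.ROOT).split("\\s+")) {
                if (!token.isEmpty()) {
                    // Values that share a field name land in the same term space.
                    postings.computeIfAbsent(name + ":" + token, k -> new TreeSet<>())
                            .add(docId);
                }
            }
        }
        return docId;
    }

    // Returns the ids of documents whose field `name` contains `term`.
    public Set<Integer> search(String name, String term) {
        return postings.getOrDefault(
            name + ":" + term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        MultiValuedFieldSketch index = new MultiValuedFieldSketch();
        index.addDocument(new String[][] {
            {"word", "fast"}, {"word", "quick"}, {"word", "rapid"}, {"word", "speedy"}
        });
        for (String term : new String[] {"fast", "quick", "rapid", "speedy", "slow"}) {
            System.out.println(term + " -> " + index.search("word", term));
        }
    }
}
```

Because the postings are keyed by field name plus term, it makes no difference to a search whether the terms arrived in one value or in several values with the same field name.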

Posted 2007-02-13 11:29 by 大田斗. Category: Lucene
