﻿<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/"><channel><title>BlogJava-csusky-Post category-LUCENE</title><link>http://www.blogjava.net/csusky/category/29489.html</link><description /><language>zh-cn</language><lastBuildDate>Fri, 30 May 2008 06:55:50 GMT</lastBuildDate><pubDate>Fri, 30 May 2008 06:55:50 GMT</pubDate><ttl>60</ttl><item><title>Tokenization in Lucene: the analysis package</title><link>http://www.blogjava.net/csusky/archive/2008/05/30/204087.html</link><dc:creator>晓宇</dc:creator><author>晓宇</author><pubDate>Fri, 30 May 2008 06:47:00 GMT</pubDate><guid>http://www.blogjava.net/csusky/archive/2008/05/30/204087.html</guid><wfw:comment>http://www.blogjava.net/csusky/comments/204087.html</wfw:comment><comments>http://www.blogjava.net/csusky/archive/2008/05/30/204087.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/csusky/comments/commentRss/204087.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/csusky/services/trackbacks/204087.html</trackback:ping><description><![CDATA[<p style="font-size: 10pt">In a search engine, tokenization is an important component. It covers extracting proper nouns, splitting text into words, normalizing the words, and so on.<br />
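The TokenStream class is the base class of almost all the classes discussed here.<br />
Its subclasses implement two methods: Token next()&nbsp;and&nbsp;close().<br />
Let us start with the analysis package, which mainly provides simple tokenization.<br />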
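The classes whose names end in <span style="color: #339966">Tokenizer</span> split the string to be processed into a stream of Tokens. The different splitting criteria give rise to the following Tokenizer classes.<br />
First, Tokenizer is the base class of all <span style="color: #000000"><span style="color: #008080">classes ending in Tokenizer</span>.<span style="color: #000000"><br />
Next comes CharTokenizer. <span style="color: #339966">The character-based tokenizers below (LetterTokenizer, WhitespaceTokenizer, LowerCaseTokenizer) all inherit from it.<br />
<span style="color: #000000">This class declares one abstract method:<br />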
<span style="color: #ff0000">&nbsp; protected abstract boolean isTokenChar(char c);</span><br />
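and one method that subclasses may override:<br />
&nbsp;<span style="color: #ff0000">&nbsp;protected char normalize(char c) { return c; }</span><br />
normalize() processes a single character, for example converting letters to lower case. By default it returns the character unchanged.<br />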
<br />
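There is also a field (declared in Tokenizer and inherited):<br />
<span style="color: #ff0000">protected Reader input;<br />
</span><span style="color: #0000ff">This Reader is the source of the data these classes process:<br />
a Reader goes in, and a stream of Tokens comes out.</span><br />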
<br />
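isTokenChar() is the criterion for splitting. Characters are read from the stream one by one, and each is tested with this method. When it returns false, the characters collected so far in the token buffer are returned as one Token. A false result with an empty buffer simply skips the character.<br />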
<span style="color: #ff0000">LetterTokenizer:</span><br />
&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: #ff99cc">&nbsp;<span style="color: #333399">&nbsp;protected boolean isTokenChar(char c) {<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; return Character.isLetter(c);<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; }<br />
</span></span><span style="color: #ff0000">WhitespaceTokenizer:</span><br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: #333399">&nbsp;protected boolean isTokenChar(char c) {<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;return !Character.isWhitespace(c);<br />
&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;}&nbsp;<br />
</span><span style="color: #ff0000">LowerCaseTokenizer extends LetterTokenizer:</span><br />
<span style="color: #333399">protected char normalize(char c) {<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; return Character.toLowerCase(c);<br />
&nbsp;&nbsp; }</span><br />
&nbsp;&nbsp;&nbsp;Its constructor calls super(in), so it splits text exactly like LetterTokenizer. The difference is that every character is converted to lower case as the token is built.<br />
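To make the splitting rule concrete, here is a minimal sketch of the CharTokenizer idea in plain Java, so it runs without the Lucene jar. SimpleToken, SimpleTokenStream, SimpleCharTokenizer, the three Simple...Tokenizer subclasses and TokenDemo are hypothetical names that only mirror the Lucene classes. The real CharTokenizer also records token offsets and reads through an internal buffer; this sketch leaves both out.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for Lucene's Token: only the term text is kept. */
final class SimpleToken {
    final String termText;
    SimpleToken(String termText) { this.termText = termText; }
}

/** Hypothetical stand-in for TokenStream: next() returns null at the end of the stream. */
abstract class SimpleTokenStream {
    abstract SimpleToken next() throws IOException;
    void close() throws IOException { }
}

/** Character-driven tokenizer in the spirit of CharTokenizer. */
abstract class SimpleCharTokenizer extends SimpleTokenStream {
    protected final Reader input;   // the data source: a Reader goes in, Tokens come out

    SimpleCharTokenizer(Reader input) { this.input = input; }

    /** true: c belongs to a token; false: c is a separator. */
    protected abstract boolean isTokenChar(char c);

    /** Per-character normalization hook; the default leaves c unchanged. */
    protected char normalize(char c) { return c; }

    @Override
    SimpleToken next() throws IOException {
        StringBuilder buffer = new StringBuilder();
        int c;
        while ((c = input.read()) != -1) {
            char ch = (char) c;
            if (isTokenChar(ch)) {
                buffer.append(normalize(ch));               // collect the token
            } else if (buffer.length() > 0) {
                return new SimpleToken(buffer.toString());  // a separator ends the token
            }                                               // separators before a token are skipped
        }
        // End of input: return the last token, or null to signal the end.
        return buffer.length() > 0 ? new SimpleToken(buffer.toString()) : null;
    }

    @Override
    void close() throws IOException { input.close(); }
}

class SimpleLetterTokenizer extends SimpleCharTokenizer {
    SimpleLetterTokenizer(Reader in) { super(in); }
    @Override protected boolean isTokenChar(char c) { return Character.isLetter(c); }
}

class SimpleWhitespaceTokenizer extends SimpleCharTokenizer {
    SimpleWhitespaceTokenizer(Reader in) { super(in); }
    @Override protected boolean isTokenChar(char c) { return !Character.isWhitespace(c); }
}

class SimpleLowerCaseTokenizer extends SimpleLetterTokenizer {
    SimpleLowerCaseTokenizer(Reader in) { super(in); }
    @Override protected char normalize(char c) { return Character.toLowerCase(c); }
}

final class TokenDemo {
    /** Drains a stream and joins its terms with '|', for quick inspection. */
    static String tokens(SimpleTokenStream ts) {
        try {
            StringBuilder sb = new StringBuilder();
            for (SimpleToken t = ts.next(); t != null; t = ts.next()) {
                if (sb.length() > 0) sb.append('|');
                sb.append(t.termText);
            }
            ts.close();
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, TokenDemo.tokens(new SimpleLowerCaseTokenizer(new StringReader("Hello, World 2008"))) returns hello|world. The comma and the spaces end tokens, and the digits are skipped because they are not letters. SimpleWhitespaceTokenizer on the same text returns Hello,|World|2008.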
&nbsp;<br />
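Next come the classes whose names end in Filter. This family further processes a Token stream that a tokenizer has already produced:<br />
&nbsp;the input is a Token stream, and the output is again a Token stream.<br />
TokenFilter extends TokenStream and is the parent class of all these filters.<br />
protected TokenStream input;<br />
This TokenStream field of TokenFilter is the data source the filters work on. Since the filters themselves extend TokenStream, one filter can serve as another filter's input.<br />
Each filter has a public final Token next() method. It takes the Tokens produced by input.next() and again returns Tokens, with some processing applied in between.<br />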
<span style="color: #ff0000">LowerCaseFilter: converts the text of every Token to lower case<br />
</span>&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: #333399">&nbsp;t.termText = t.termText.toLowerCase();</span><br />
<span style="color: #ff0000">StopFilter: removes stop words, which are passed to its constructor</span><br />
&nbsp;<span style="color: #333399">&nbsp;&nbsp;&nbsp; for (Token token = input.next(); token != null; token = input.next())<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; if (!stopWords.contains(token.termText))<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; return token;<br />
&nbsp;&nbsp;&nbsp;&nbsp; return null;&nbsp;&nbsp; // input exhausted: end of the stream<br />
</span><br />
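The filter chain can be sketched the same way. In this minimal, self-contained version, TermStream, ArrayTermStream, TermFilter, LowerCaseTermFilter, StopTermFilter and FilterDemo are hypothetical names. Terms are plain Strings rather than Token objects, ArrayTermStream plays the role of a Tokenizer, and a small array stands in for StopFilter's stop-word collection.

```java
import java.util.Locale;

/** Hypothetical minimal token stream: next() returns the next term, or null at the end. */
interface TermStream {
    String next();
}

/** Source stream over a fixed list of terms; it plays the Tokenizer's role here. */
final class ArrayTermStream implements TermStream {
    private final String[] terms;
    private int pos = 0;

    ArrayTermStream(String... terms) { this.terms = terms.clone(); }

    @Override
    public String next() { return pos < terms.length ? terms[pos++] : null; }
}

/** Like TokenFilter: a stream that reads from another stream held in input. */
abstract class TermFilter implements TermStream {
    protected final TermStream input;

    TermFilter(TermStream input) { this.input = input; }
}

/** Like LowerCaseFilter: lower-cases each term it passes on. */
final class LowerCaseTermFilter extends TermFilter {
    LowerCaseTermFilter(TermStream input) { super(input); }

    @Override
    public String next() {
        String t = input.next();
        return t == null ? null : t.toLowerCase(Locale.ROOT);
    }
}

/** Like StopFilter: skips terms that appear in the stop-word list. */
final class StopTermFilter extends TermFilter {
    private final String[] stopWords;

    StopTermFilter(TermStream input, String... stopWords) {
        super(input);
        this.stopWords = stopWords.clone();
    }

    @Override
    public String next() {
        for (String t = input.next(); t != null; t = input.next()) {
            if (!isStopWord(t)) return t;   // first non-stop word is passed on
        }
        return null;                        // the wrapped stream is exhausted
    }

    private boolean isStopWord(String t) {
        for (String s : stopWords) {
            if (s.equals(t)) return true;   // exact, case-sensitive match
        }
        return false;
    }
}

final class FilterDemo {
    /** Drains a stream and joins its terms with spaces. */
    static String join(TermStream ts) {
        StringBuilder sb = new StringBuilder();
        for (String t = ts.next(); t != null; t = ts.next()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(t);
        }
        return sb.toString();
    }
}
```

Because every filter is itself a stream, filters nest freely, and the order matters. Take the terms The Quick Fox AND the Dog with the stop words "the" and "and". Lower-casing first and then filtering gives quick fox dog. Filtering first gives the quick fox and dog, because the case-sensitive check lets The and AND through.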
<br />
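<span style="color: #800080">Comparing the Tokenizer family with the Filter family:<br />
The Tokenizers split the input Reader, which is really a character stream, according to some rule and produce a Token stream.<br />
Their input is text in the form of a Reader, and their output is a Token stream.<br />
<br />
The Filters further process an incoming Token stream, for example by removing stop words or converting to lower case.<br />
These are mostly normalization operations.<br />
Because a Filter's input and output are of the same kind, several different Filters can be nested to achieve the processing you want,<br />
with the output of one Filter serving as the input of the next.<br />
Tokenizers cannot be nested like this, because their input and output differ. A Tokenizer can only be the first stage of such a chain.<br />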
</span><br />
</span></span></span></span></p>
<br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/csusky/" target="_blank">晓宇</a> 2008-05-30 14:47</div>]]></description></item><item><title>Three performance parameters in IndexWriter</title><link>http://www.blogjava.net/csusky/archive/2008/05/15/200706.html</link><dc:creator>晓宇</dc:creator><author>晓宇</author><pubDate>Thu, 15 May 2008 11:27:00 GMT</pubDate><guid>http://www.blogjava.net/csusky/archive/2008/05/15/200706.html</guid><wfw:comment>http://www.blogjava.net/csusky/comments/200706.html</wfw:comment><comments>http://www.blogjava.net/csusky/archive/2008/05/15/200706.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/csusky/comments/commentRss/200706.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/csusky/services/trackbacks/200706.html</trackback:ping><description><![CDATA[<span style="font-size: 10pt"><font size="2">IndexWriter</font> has three important performance parameters:<br />
mergeFactor&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; default 10<br />
minMergeDocs&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; default 10<br />
maxMergeDocs&nbsp;&nbsp;&nbsp;&nbsp; default Integer.MAX_VALUE<br />
<br />
maxMergeDocs&nbsp;&nbsp;&nbsp;&nbsp; the maximum number of documents in one segment. Segments that reach this size are no longer merged. This value is usually left unchanged.<br />
minMergeDocs&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; the number of documents buffered in the RAMDirectory. Once minMergeDocs documents have accumulated, they are merged out to disk, which creates a new segment on disk.<br />
mergeFactor&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; the merge factor, which controls merging of the segments on disk. Each time a new segment is created on disk, targetMergeDocs*=mergeFactor is applied (targetMergeDocs starts at minMergeDocs). If the most recently created segments that are each smaller than targetMergeDocs together hold at least targetMergeDocs documents, they are merged into one segment. With the default settings, these are the last mergeFactor segments created on disk. The check then repeats at the next level.<br />
<br />
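An example with the default parameters:<br />
Suppose the disk already holds 9 segments of 10 documents each (90 documents in all). The program then merges one more 10-document segment to disk, and after that merge targetMergeDocs becomes 10*10. The program checks whether the last mergeFactor segments (ordered by creation time) together hold at least targetMergeDocs documents. Their total is 100 and targetMergeDocs is also 100, so the condition is just met, and the program merges those last 10 segments on disk into one new segment.<br />
<br />
Another example:<br />
Docs per segment&nbsp;&nbsp;&nbsp;&nbsp; Number of segments<br />
&nbsp; 1000 ---------------- 9<br />
&nbsp; 100 ----------------- 9<br />
&nbsp; 10 ------------------ 9<br />
If a new segment containing 10 documents is now created on disk, we have:<br />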
&nbsp;&nbsp;&nbsp; Docs per segment&nbsp;&nbsp;&nbsp;&nbsp; Number of segments<br />
&nbsp; (1) 1000 ------------- 9<br />
&nbsp; (2) 100 -------------- 9<br />
&nbsp; (3) 10 --------------- 9<br />
&nbsp; (4) 10 --------------- 1<br />
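First, (3) and (4) are merged into a new segment (3-4) holding 100 documents.<br />
&nbsp;Then (2) and (3-4) are merged into a new segment (2-3-4) holding 1000 documents.<br />
Then (1) and (2-3-4) are merged into a new segment holding 10000 documents.<br />
In the end, everything has been merged into a single segment.<br />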
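Both examples can be replayed with a small simulation. This is a hypothetical sketch; MergeSimulator and its methods are invented names. Segments are reduced to their document counts, oldest first, and the merge loop follows maybeMergeSegments(). The sketch assumes that every added document first becomes a one-document segment, which is how I understand Lucene 1.x's IndexWriter to behave. That is why the lowest level is checked against minMergeDocs.

```java
import java.util.Arrays;

/**
 * Hypothetical simulation of the merge rule. Segments are modelled only by
 * their document counts, oldest first; no files are written.
 */
final class MergeSimulator {
    private final int mergeFactor;
    private final int minMergeDocs;
    private final int maxMergeDocs;
    private int[] segments = new int[0];

    MergeSimulator(int mergeFactor, int minMergeDocs, int maxMergeDocs) {
        this.mergeFactor = mergeFactor;
        this.minMergeDocs = minMergeDocs;
        this.maxMergeDocs = maxMergeDocs;
    }

    /** Assumption: each added document first becomes a one-document segment. */
    void addDocument() {
        segments = Arrays.copyOf(segments, segments.length + 1);
        segments[segments.length - 1] = 1;
        maybeMergeSegments();
    }

    /** Same control flow as maybeMergeSegments() in the post. */
    private void maybeMergeSegments() {
        long targetMergeDocs = minMergeDocs;
        while (targetMergeDocs <= maxMergeDocs) {
            // Walk back from the newest segment over segments smaller than the target.
            int minSegment = segments.length;
            long mergeDocs = 0;
            while (--minSegment >= 0) {
                if (segments[minSegment] >= targetMergeDocs) break;
                mergeDocs += segments[minSegment];
            }
            if (mergeDocs < targetMergeDocs) break;  // not enough small segments yet
            mergeSegments(minSegment + 1);           // merge everything after minSegment
            targetMergeDocs *= mergeFactor;          // try the next, larger level
        }
    }

    /** Replaces the segments from index first to the end with one segment holding their total. */
    private void mergeSegments(int first) {
        int total = 0;
        for (int i = first; i < segments.length; i++) total += segments[i];
        segments = Arrays.copyOf(segments, first + 1);
        segments[first] = total;
    }

    int[] segments() { return segments.clone(); }
}
```

The simulation uses the defaults (mergeFactor 10, minMergeDocs 10, maxMergeDocs Integer.MAX_VALUE). After 90 documents there are nine 10-document segments, and the 100th document collapses them into one segment of 100. After 9990 documents there are nine segments each of 1000, 100 and 10 documents. Ten more documents then cascade into a single segment of 10000, as in the second example.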
<br />
<br />
<div style="border-right: #cccccc 1px solid; padding-right: 5px; border-top: #cccccc 1px solid; padding-left: 4px; font-size: 13px; padding-bottom: 4px; border-left: #cccccc 1px solid; width: 98%; word-break: break-all; padding-top: 4px; border-bottom: #cccccc 1px solid; background-color: #eeeeee"><pre>private final void maybeMergeSegments() throws IOException {
    long targetMergeDocs = minMergeDocs;
    while (targetMergeDocs &lt;= maxMergeDocs) {
        // find segments smaller than current target size
        int minSegment = segmentInfos.size();
        int mergeDocs = 0;
        while (--minSegment &gt;= 0) {
            SegmentInfo si = segmentInfos.info(minSegment);
            if (si.docCount &gt;= targetMergeDocs)
                break;
            mergeDocs += si.docCount;
        }

        if (mergeDocs &gt;= targetMergeDocs)   // found a merge to do
            mergeSegments(minSegment + 1);
        else
            break;

        targetMergeDocs *= mergeFactor;      // increase target size

        // Debugging aid: print the new target and pause 5 s so each merge can be observed
        System.out.println("- -- - -targetMergeDocs:" + targetMergeDocs);
        try { Thread.sleep(5000); } catch (InterruptedException e) { }
    }
}</pre></div>
</span>
<br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/csusky/" target="_blank">晓宇</a> 2008-05-15 19:27</div>]]></description></item><item><title>The format of Lucene's index files</title><link>http://www.blogjava.net/csusky/archive/2008/04/21/194564.html</link><dc:creator>晓宇</dc:creator><author>晓宇</author><pubDate>Mon, 21 Apr 2008 09:52:00 GMT</pubDate><guid>http://www.blogjava.net/csusky/archive/2008/04/21/194564.html</guid><wfw:comment>http://www.blogjava.net/csusky/comments/194564.html</wfw:comment><comments>http://www.blogjava.net/csusky/archive/2008/04/21/194564.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/csusky/comments/commentRss/194564.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/csusky/services/trackbacks/194564.html</trackback:ping><description><![CDATA[<p><span style="font-size: 10pt"><span style="color: #ccffcc"><span style="color: #ccffcc"><span style="color: #99ccff"><span style="color: #3366ff">Format of the segments file (segment information):<br />
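int:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; format marker, -1; used to check that the file is in a valid Lucene format<br />
long:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; version number, incremented by 1 every time the file is updated<br />
int:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; counter used to name new segments<br />
int:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; number of segments<br />
String + int&nbsp; one entry per segment: the String is the segment name, and the int is the number of documents in the segment<br />
String + int&nbsp; the same for the next segment, and so on</span></span></span></span></span></p>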
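To see the layout byte by byte, here is a hypothetical sketch that writes such a file with java.io.DataOutputStream and reads it back; SegmentsFileSketch and its methods are invented names. It assumes that Lucene stores int and long values big-endian, as DataOutputStream does. It also assumes that segment names are ASCII, written as a VInt length followed by one byte per character, which is how I understand Lucene's string encoding to look for ASCII text. The names _0 and _1 are example values.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that writes and dumps a segments file with the layout listed above. */
final class SegmentsFileSketch {
    static final int FORMAT = -1;   // the marker int at the start of the file

    /** VInt: 7 bits per byte, low-order group first; a set high bit means "more bytes follow". */
    static void writeVInt(DataOutput out, int i) throws IOException {
        while ((i & ~0x7F) != 0) {
            out.writeByte((i & 0x7F) | 0x80);
            i >>>= 7;
        }
        out.writeByte(i);
    }

    static int readVInt(DataInput in) throws IOException {
        byte b = in.readByte();
        int i = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = in.readByte();
            i |= (b & 0x7F) << shift;
        }
        return i;
    }

    /** Assumption: ASCII names, stored as a VInt length followed by one byte per char. */
    static void writeString(DataOutput out, String s) throws IOException {
        writeVInt(out, s.length());
        out.writeBytes(s);
    }

    static String readString(DataInput in) throws IOException {
        byte[] bytes = new byte[readVInt(in)];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    /** int FORMAT, long version, int counter, int count, then (String name, int docCount) pairs. */
    static byte[] write(long version, int counter, String[] names, int[] docCounts) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);  // big-endian ints and longs
            out.writeInt(FORMAT);
            out.writeLong(version);
            out.writeInt(counter);
            out.writeInt(names.length);
            for (int i = 0; i < names.length; i++) {
                writeString(out, names[i]);
                out.writeInt(docCounts[i]);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the fields back in the same order and renders them as text. */
    static String dump(byte[] file) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
            StringBuilder sb = new StringBuilder();
            sb.append("format=").append(in.readInt());
            sb.append(" version=").append(in.readLong());
            sb.append(" counter=").append(in.readInt());
            int n = in.readInt();
            for (int i = 0; i < n; i++) {
                sb.append(' ').append(readString(in)).append(':').append(in.readInt());
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Suppose you write version 3, counter 2 and the segments _0 (10 documents) and _1 (5 documents). The result is a 34-byte file: 20 bytes of header plus 7 bytes per segment. dump() renders it as format=-1 version=3 counter=2 _0:10 _1:5.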
<p><br />
<span style="font-size: 10pt"><span style="color: #99cc00"><span style="color: #ccffcc"><span style="color: #ccffcc"><span style="color: #99ccff"><span style="color: #3366ff">.fnm file format (field information):<br />
VInt:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; number of fields, read with readVInt(). It is at least 1, because one Field("", false) is always written when the field information is initialized (I do not yet know why). That field's name is the empty string, and it is neither indexed nor vectorized.<br />
String, byte&nbsp;&nbsp;&nbsp; the String is the field name. The byte records whether the field is indexed and whether it stores a term vector: the low-order bit (0x1) means indexed, and the next bit (0x2) means a term vector is stored. For example, binary 01 is an indexed field and binary 11 an indexed field with a term vector.<br />
String, byte&nbsp;&nbsp;&nbsp; the next field, same layout</span></span></span></span></span></span></p>
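Reading the flag byte as a bit set avoids the question of which bit counts as the "first" one. The sketch below is hypothetical; FieldFlags and its methods are invented for illustration. The constants, 0x1 for an indexed field and 0x2 for a stored term vector, reflect my understanding of Lucene 1.x's FieldInfos rather than anything verified in this post.

```java
/**
 * Hypothetical decoder for the per-field flag byte in .fnm.
 * Assumed bit values (Lucene 1.x FieldInfos, as I understand it): 0x1 = indexed, 0x2 = term vector.
 */
final class FieldFlags {
    static final byte IS_INDEXED = 0x1;
    static final byte STORE_TERMVECTOR = 0x2;

    static byte encode(boolean indexed, boolean storeTermVector) {
        byte bits = 0;
        if (indexed) bits |= IS_INDEXED;
        if (storeTermVector) bits |= STORE_TERMVECTOR;
        return bits;
    }

    static boolean isIndexed(byte bits) { return (bits & IS_INDEXED) != 0; }

    static boolean storesTermVector(byte bits) { return (bits & STORE_TERMVECTOR) != 0; }

    /** One .fnm entry as text: the field name, then the flag byte in binary and decoded. */
    static String describe(String name, byte bits) {
        return "\"" + name + "\" flags=" + Integer.toBinaryString(bits)
                + " indexed=" + isIndexed(bits) + " termVector=" + storesTermVector(bits);
    }
}
```

The empty field written at initialization is neither indexed nor vectorized, so its flag byte is 0. An indexed field with a term vector has flag byte 3, binary 11.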
<p><span style="font-size: 10pt"><span style="color: #3366ff">.fdx file format: it provides random access to the documents stored in .fdt<br />
long:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; position of the first document in the .fdt file<br />
long:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; position of the second document in the .fdt file, and so on. Every entry is one 8-byte long, so the entry for document n starts at byte 8*n of .fdx.</span></span></p>
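<p><span style="font-size: 10pt"><span style="color: #3366ff">.fdt file format: the .fdt file stores the information of a sequence of documents. For each document:<br />
VInt:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; the number of fields in this document whose isStored property is true<br />
(VInt:)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; for each field whose isStored property is true, its fieldNumber. I have not yet worked out exactly how fieldNumber is assigned or what it is used for. My first guess is that it follows the order in which the fields were created, with each field's number being the previous one plus 1.<br />
byte:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 1 if the field's isTokenized property is true, otherwise 0.<br />
String:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; the field's stringValue().<br />
Then the document ends, and the data that follows starts a new document. The file position at which each document starts is recorded in .fdx, which makes random access possible.</span></span></p>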
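Putting the .fdt record and the .fdx offsets together gives the hypothetical in-memory sketch below. StoredFieldsSketch, StoredField, startOf and readDocument are invented names, and a long[] stands in for the .fdx file. The sketch writes each document in the order described above, records where the document starts, and reads a document back by jumping straight to that offset. String values are assumed to be ASCII, written as a VInt length followed by one byte per character.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical in-memory writer/reader for the .fdt record layout, with .fdx offsets. */
final class StoredFieldsSketch {
    /** One stored field: its field number, tokenized flag and string value (assumed ASCII). */
    static final class StoredField {
        final int number;
        final boolean tokenized;
        final String value;

        StoredField(int number, boolean tokenized, String value) {
            this.number = number;
            this.tokenized = tokenized;
            this.value = value;
        }
    }

    private final ByteArrayOutputStream fdtBytes = new ByteArrayOutputStream();
    private final DataOutputStream fdt = new DataOutputStream(fdtBytes);
    private long[] fdx = new long[0];   // stands in for .fdx: one start offset per document

    void addDocument(StoredField... fields) {
        try {
            fdx = Arrays.copyOf(fdx, fdx.length + 1);
            fdx[fdx.length - 1] = fdt.size();        // where this document starts in .fdt
            writeVInt(fields.length);                // number of stored fields
            for (StoredField f : fields) {
                writeVInt(f.number);                 // field number
                fdt.writeByte(f.tokenized ? 1 : 0);  // 1 if tokenized, otherwise 0
                writeVInt(f.value.length());         // value: VInt length + one byte per char
                fdt.writeBytes(f.value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    long startOf(int docNumber) { return fdx[docNumber]; }

    /** Random access: jump to the offset recorded for docNumber and decode one record. */
    String readDocument(int docNumber) {
        byte[] bytes = fdtBytes.toByteArray();
        int start = (int) fdx[docNumber];
        try (DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(bytes, start, bytes.length - start))) {
            StringBuilder sb = new StringBuilder();
            int count = readVInt(in);
            for (int i = 0; i < count; i++) {
                int number = readVInt(in);
                boolean tokenized = in.readByte() == 1;
                byte[] value = new byte[readVInt(in)];
                in.readFully(value);
                if (i > 0) sb.append(';');
                sb.append(number).append(tokenized ? 'T' : 'U').append('=')
                  .append(new String(value, StandardCharsets.US_ASCII));
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private void writeVInt(int i) throws IOException {
        while ((i & ~0x7F) != 0) {          // 7 bits per byte, high bit = more bytes follow
            fdt.writeByte((i & 0x7F) | 0x80);
            i >>>= 7;
        }
        fdt.writeByte(i);
    }

    private static int readVInt(DataInputStream in) throws IOException {
        byte b = in.readByte();
        int i = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = in.readByte();
            i |= (b & 0x7F) << shift;
        }
        return i;
    }
}
```

Suppose the first document stores hello world as field 1 (tokenized) and 2008-04-21 as field 2 (not tokenized). Its record takes 28 bytes: 1 byte for the field count, 14 for the first field and 13 for the second. The second document therefore starts at offset 28, and readDocument(1) decodes it without reading the first record.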
<br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/csusky/" target="_blank">晓宇</a> 2008-04-21 17:52</div>]]></description></item><item><title>org.apache.lucene.index.SegmentInfos</title><link>http://www.blogjava.net/csusky/archive/2008/04/18/194072.html</link><dc:creator>晓宇</dc:creator><author>晓宇</author><pubDate>Fri, 18 Apr 2008 09:02:00 GMT</pubDate><guid>http://www.blogjava.net/csusky/archive/2008/04/18/194072.html</guid><wfw:comment>http://www.blogjava.net/csusky/comments/194072.html</wfw:comment><comments>http://www.blogjava.net/csusky/archive/2008/04/18/194072.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/csusky/comments/commentRss/194072.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/csusky/services/trackbacks/194072.html</trackback:ping><description><![CDATA[<span style="font-size: 10pt">final class SegmentInfos extends Vector<br />
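So the class is really a Vector, plus some operations on that Vector that it encapsulates.<br />
What it actually encapsulates is reading and writing the segments file.<br />
First, the format of the segments file:<br />
<br />
segments file format:<br />
int:&nbsp;&nbsp; =-1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; format marker, showing whether the file is a valid Lucene file; normally -1<br />
long:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; version number, incremented by 1 every time the file is updated<br />
int:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; counter used to name new segments<br />
int:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; number of segments<br />
String + int&nbsp; one entry per segment: the String is the segment name, and the int is the number of documents in the segment<br />
String + int&nbsp; the same for the next segment, and so on<br />
<br />
So with Lucene's API we can easily print all the information in the segments file:<br />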
<p>import java.io.IOException;<br />
import org.apache.lucene.store.FSDirectory;<br />
import org.apache.lucene.store.InputStream;&nbsp;&nbsp; // Lucene 1.x's own InputStream, not java.io.InputStream<br />
public class DumpSegments {<br />
&nbsp; public static void main(String[] args) {<br />
&nbsp;&nbsp;&nbsp; InputStream input = null;<br />
&nbsp;&nbsp;&nbsp; try {<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; FSDirectory dir = FSDirectory.getDirectory("C:/sf/snow", false);&nbsp;&nbsp; // false: open the existing index, do not create one<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; input = dir.openFile("segments");<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; System.out.println("Format:" + input.readInt());&nbsp;&nbsp; // format marker: -1 for a valid segments file<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; System.out.println("version:" + input.readLong());&nbsp;&nbsp; // version number<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; System.out.println("name:" + input.readInt());&nbsp;&nbsp; // <span style="color: red">counter used to name new segments; its exact purpose is not yet clear to me</span><br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; int n = input.readInt();&nbsp;&nbsp; // number of segments<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; System.out.println("SegmentNum:" + n);<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; for (int i = 0; i &lt; n; i++) {&nbsp;&nbsp; // print every segment's name and document count<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; System.out.println("segment " + i + " - name:" + input.readString() + " num:" + input.readInt());<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; }<br />
&nbsp;&nbsp;&nbsp; } catch (IOException e) {<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; e.printStackTrace();<br />
&nbsp;&nbsp;&nbsp; } finally {<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; if (input != null) try { input.close(); } catch (IOException ignored) { }<br />
&nbsp;&nbsp;&nbsp; }<br />
&nbsp; }<br />
}</p>
<p>The class itself also provides richer methods for reading and updating the segments file:<br />
final void read(Directory directory)&nbsp;&nbsp;&nbsp; reads the segments file and stores the information of every segment in this Vector<br />
final void write(Directory directory)&nbsp;&nbsp;&nbsp; updates the segments file, mainly after segments have been added. It rewrites the file with an incremented version number and the current segment count, followed by the name and document count of every segment in this Vector.<br />
<br />
</p>
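The read/write behaviour can be sketched without a Directory. In this hypothetical SegmentListSketch, the segment list lives in two arrays and is serialized to a byte[]. newSegmentName() assumes that new names are built as "_" plus the counter in base 36. That is my understanding of Lucene 1.x's IndexWriter, and one plausible answer to what the counter is for. Names are assumed to be ASCII and shorter than 128 characters, so their VInt length fits in one byte.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical stand-in for SegmentInfos, serialized to a byte[] instead of a Directory. */
final class SegmentListSketch {
    static final int FORMAT = -1;
    long version = 0;
    int counter = 0;                      // used to name new segments
    String[] names = new String[0];
    int[] docCounts = new int[0];

    /** Assumption: "_" + counter in base 36, then the counter is incremented. */
    String newSegmentName() {
        return "_" + Integer.toString(counter++, Character.MAX_RADIX);
    }

    void add(String name, int docCount) {
        if (name.length() > 127) throw new IllegalArgumentException("name too long for this sketch");
        names = Arrays.copyOf(names, names.length + 1);
        docCounts = Arrays.copyOf(docCounts, docCounts.length + 1);
        names[names.length - 1] = name;
        docCounts[docCounts.length - 1] = docCount;
    }

    /** Rewrites the whole file: FORMAT, incremented version, counter, count, every segment. */
    byte[] write() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(FORMAT);
            out.writeLong(++version);             // every write changes the version
            out.writeInt(counter);
            out.writeInt(names.length);
            for (int i = 0; i < names.length; i++) {
                out.writeByte(names[i].length()); // one-byte VInt length (names under 128 chars)
                out.writeBytes(names[i]);         // ASCII: one byte per char
                out.writeInt(docCounts[i]);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Like read(Directory): replaces the in-memory list with the file's contents. */
    void read(byte[] file) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
            if (in.readInt() != FORMAT) throw new IOException("not a segments file");
            version = in.readLong();
            counter = in.readInt();
            int n = in.readInt();
            names = new String[n];
            docCounts = new int[n];
            for (int i = 0; i < n; i++) {
                byte[] name = new byte[in.readUnsignedByte()];
                in.readFully(name);
                names[i] = new String(name, StandardCharsets.US_ASCII);
                docCounts[i] = in.readInt();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Each write() increments version, so two successive writes of the same list produce files with versions 1 and 2. read() replaces the in-memory list with whatever a file contains. Once the counter reaches 10, newSegmentName() returns _a.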
</span>
<br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/csusky/" target="_blank">晓宇</a> 2008-04-18 17:02</div>]]></description></item><item><title>org.apache.lucene.index.SegmentInfo</title><link>http://www.blogjava.net/csusky/archive/2008/04/18/194062.html</link><dc:creator>晓宇</dc:creator><author>晓宇</author><pubDate>Fri, 18 Apr 2008 08:45:00 GMT</pubDate><guid>http://www.blogjava.net/csusky/archive/2008/04/18/194062.html</guid><wfw:comment>http://www.blogjava.net/csusky/comments/194062.html</wfw:comment><comments>http://www.blogjava.net/csusky/archive/2008/04/18/194062.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/csusky/comments/commentRss/194062.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/csusky/services/trackbacks/194062.html</trackback:ping><description><![CDATA[<p style="font-size: 10pt">Information about one segment.<br />
The class is quite simple, so here is its complete code:</p>
<p>import org.apache.lucene.store.Directory;</p>
<p><span style="font-size: 10pt">final class SegmentInfo {<br />
&nbsp; public String name;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // name, unique within the index directory<br />
&nbsp; public int docCount;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // number of documents in this segment<br />
&nbsp; public Directory dir;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // the Directory this segment is stored in<br />
</span><span style="font-size: 10pt"><br />
&nbsp; public SegmentInfo(String name, int docCount, Directory dir) {<br />
&nbsp;&nbsp;&nbsp; this.name = name;<br />
&nbsp;&nbsp;&nbsp; this.docCount = docCount;<br />
&nbsp;&nbsp;&nbsp; this.dir = dir;<br />
&nbsp; }<br />
}</span></p>
<br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/csusky/" target="_blank">晓宇</a> 2008-04-18 16:45</div>]]></description></item><item><title>org.apache.lucene.store.RAMInputStream</title><link>http://www.blogjava.net/csusky/archive/2008/04/18/193996.html</link><dc:creator>晓宇</dc:creator><author>晓宇</author><pubDate>Fri, 18 Apr 2008 03:45:00 GMT</pubDate><guid>http://www.blogjava.net/csusky/archive/2008/04/18/193996.html</guid><wfw:comment>http://www.blogjava.net/csusky/comments/193996.html</wfw:comment><comments>http://www.blogjava.net/csusky/archive/2008/04/18/193996.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/csusky/comments/commentRss/193996.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/csusky/services/trackbacks/193996.html</trackback:ping><description><![CDATA[<p style="font-size: 10pt">This class is used to read data from a RAMFile.<br />
Its most important method is readInternal.<br />
A single call may have to read from several of the RAMFile's byte[1024] buffers, so the work is done in a loop:<br />
<br />
&nbsp;public void readInternal(byte[] dest, int destOffset, int len) {<br />
&nbsp;&nbsp;&nbsp; int remainder = len;<br />
&nbsp;&nbsp;&nbsp; int start = pointer;<br />
&nbsp;&nbsp;&nbsp; while (remainder != 0) {<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; int bufferNumber = start/BUFFER_SIZE; //&nbsp; index of the buffer (page)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; int bufferOffset = start%BUFFER_SIZE; //&nbsp; offset within that buffer<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; int bytesInBuffer = BUFFER_SIZE - bufferOffset; // bytes left in the current buffer<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // if the current buffer holds all the remaining requested bytes, copy them all; otherwise copy what is left in this buffer<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // bytesToCopy is the number of bytes actually copied in this pass<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; int bytesToCopy = bytesInBuffer &gt;= remainder ? remainder : bytesInBuffer;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; byte[] buffer = (byte[])file.buffers.elementAt(bufferNumber);<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; System.arraycopy(buffer, bufferOffset, dest, destOffset, bytesToCopy);<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; destOffset += bytesToCopy;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // advance the write position in dest by the bytes just copied<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; start += bytesToCopy;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // position in the RAMFile; it determines bufferNumber and bufferOffset (and so bytesInBuffer), much like paging in memory<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; remainder -= bytesToCopy;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // bytes still to be copied<br />
&nbsp;&nbsp;&nbsp; }<br />
&nbsp;&nbsp;&nbsp; pointer += len; // advance the file pointer<br />
&nbsp; }</p>
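The same paging arithmetic can be tried in isolation. The sketch below is hypothetical: PagedReader, its 4-byte PAGE_SIZE (Lucene 1.x uses 1024) and its ArrayList of pages (the original uses a Vector) are stand-ins chosen so that a single read visibly crosses page boundaries.<br />

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory file made of fixed-size pages, like RAMFile's byte[1024] buffers. */
class PagedReader {
    static final int PAGE_SIZE = 4;               // tiny page size; Lucene 1.x uses 1024
    final List<byte[]> pages = new ArrayList<>();
    long pointer = 0;                             // current read position in the file

    /** Copies len bytes starting at pointer into dest[destOffset..], walking across pages. */
    void readInternal(byte[] dest, int destOffset, int len) {
        int remainder = len;
        long start = pointer;
        while (remainder != 0) {
            int pageNumber = (int) (start / PAGE_SIZE);   // which page
            int pageOffset = (int) (start % PAGE_SIZE);   // offset inside that page
            int bytesInPage = PAGE_SIZE - pageOffset;     // bytes left in this page
            int bytesToCopy = Math.min(bytesInPage, remainder);
            System.arraycopy(pages.get(pageNumber), pageOffset, dest, destOffset, bytesToCopy);
            destOffset += bytesToCopy;
            start += bytesToCopy;
            remainder -= bytesToCopy;
        }
        pointer += len;
    }

    public static void main(String[] args) {
        PagedReader r = new PagedReader();
        r.pages.add(new byte[] {0, 1, 2, 3});
        r.pages.add(new byte[] {4, 5, 6, 7});
        r.pages.add(new byte[] {8, 9, 10, 11});
        r.pointer = 2;
        byte[] out = new byte[7];
        r.readInternal(out, 0, 7);                // spans pages 0, 1 and 2
        System.out.println(java.util.Arrays.toString(out)); // [2, 3, 4, 5, 6, 7, 8]
    }
}
```

Reading 7 bytes from position 2 copies 2 bytes from page 0, 4 bytes from page 1 and 1 byte from page 2, the same way the loop in readInternal does.<br />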
<br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/csusky/" target="_blank">晓宇</a> 2008-04-18 11:45</div>]]></description></item><item><title>org.apache.lucene.store.RAMOutputStream</title><link>http://www.blogjava.net/csusky/archive/2008/04/18/193988.html</link><dc:creator>晓宇</dc:creator><author>晓宇</author><pubDate>Fri, 18 Apr 2008 03:38:00 GMT</pubDate><guid>http://www.blogjava.net/csusky/archive/2008/04/18/193988.html</guid><wfw:comment>http://www.blogjava.net/csusky/comments/193988.html</wfw:comment><comments>http://www.blogjava.net/csusky/archive/2008/04/18/193988.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/csusky/comments/commentRss/193988.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/csusky/services/trackbacks/193988.html</trackback:ping><description><![CDATA[<p style="font-size: 10pt">This is a subclass of OutputStream whose output device is memory, more precisely a RAMFile: the data is written into the RAMFile's Vector of buffers.<br />
Its most important method is flushBuffer, reproduced here in full:</p>
<p><span style="font-size: 10pt">public void flushBuffer(byte[] src, int len) {<br />
&nbsp;&nbsp;&nbsp; int bufferNumber = pointer/BUFFER_SIZE;&nbsp;&nbsp; // index of the buffer being written, i.e. its position in the RAMFile's Vector<br />
&nbsp;&nbsp;&nbsp; int bufferOffset = pointer%BUFFER_SIZE;&nbsp;&nbsp; // offset of the next byte inside the current buffer<br />
&nbsp;&nbsp;&nbsp; int bytesInBuffer = BUFFER_SIZE - bufferOffset; // free bytes left in the current buffer<br />
&nbsp;&nbsp; // bytesToCopy is the number of bytes written into the current buffer: if it has room for all len bytes, write them all;<br />
&nbsp;&nbsp; // otherwise fill the current buffer, and the remaining bytes go into the next buffer<br />
&nbsp;&nbsp;&nbsp; int bytesToCopy = bytesInBuffer &gt;= len ? len : bytesInBuffer; </span></p>
<p><span style="font-size: 10pt">&nbsp;&nbsp;&nbsp; if (bufferNumber == file.buffers.size())<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; file.buffers.addElement(new byte[BUFFER_SIZE]); // append a new byte[1024] to the RAMFile</span></p>
<p><span style="font-size: 10pt">&nbsp;&nbsp;&nbsp; byte[] buffer = (byte[])file.buffers.elementAt(bufferNumber);<br />
&nbsp;&nbsp;&nbsp; System.arraycopy(src, 0, buffer, bufferOffset, bytesToCopy);</span></p>
<p><span style="font-size: 10pt">&nbsp;&nbsp;&nbsp; if (bytesToCopy &lt; len) {&nbsp;&nbsp;&nbsp;&nbsp; // not all in one buffer,<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; int srcOffset = bytesToCopy;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; bytesToCopy = len - bytesToCopy;&nbsp;&nbsp;&nbsp; // remaining bytes not yet written<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; bufferNumber++;&nbsp;&nbsp;&nbsp; // move on to the next buffer<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; if (bufferNumber == file.buffers.size())<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; file.buffers.addElement(new byte[BUFFER_SIZE]);<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; buffer = (byte[])file.buffers.elementAt(bufferNumber); // the remaining bytes go into the next buffer<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; System.arraycopy(src, srcOffset, buffer, 0, bytesToCopy);<br />
&nbsp;&nbsp;&nbsp; }<br />
&nbsp;&nbsp;&nbsp; pointer += len;&nbsp;&nbsp;&nbsp; // advance the file pointer by the number of bytes written<br />
&nbsp;&nbsp;&nbsp; if (pointer &gt; file.length)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; file.length = pointer;&nbsp;&nbsp;&nbsp; // the file grows only when writing past its current end</span></p>
<p><span style="font-size: 10pt">&nbsp;&nbsp;&nbsp; file.lastModified = System.currentTimeMillis(); // set the last-modified time to now<br />
&nbsp; }<br />
<br />
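The method copies len bytes from the given byte array into the RAMFile. Because each element of the RAMFile's Vector is a byte[1024], a single call may touch<br />
two Vector elements: it first fills the current byte[1024], then appends a new element to hold the remaining bytes.<br />
<br />
There is also a writeTo(OutputStream out) method that writes the RAMFile's data to another output stream.<br />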
</span></p>
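The two-step copy above can handle at most one page boundary per call, which appears to rely on flushBuffer never receiving more than BUFFER_SIZE bytes (the size of OutputStream's own buffer). The hypothetical sketch below builds the same page layout with a loop instead, so it does not depend on that assumption; PagedWriter and its 4-byte PAGE_SIZE are illustrative names, not Lucene API.<br />

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for RAMFile + RAMOutputStream.flushBuffer, looping over pages. */
class PagedWriter {
    static final int PAGE_SIZE = 4;               // tiny page size; Lucene 1.x uses 1024
    final List<byte[]> pages = new ArrayList<>(); // plays the role of RAMFile.buffers
    long pointer = 0;                             // next write position in the file
    long length = 0;                              // plays the role of RAMFile.length

    /** Writes len bytes of src at pointer, adding pages as needed. */
    void flushBuffer(byte[] src, int len) {
        int srcOffset = 0;
        int remainder = len;
        while (remainder > 0) {
            int pageNumber = (int) (pointer / PAGE_SIZE);
            int pageOffset = (int) (pointer % PAGE_SIZE);
            if (pageNumber == pages.size()) {
                pages.add(new byte[PAGE_SIZE]);   // grow the file by one page
            }
            int bytesToCopy = Math.min(PAGE_SIZE - pageOffset, remainder);
            System.arraycopy(src, srcOffset, pages.get(pageNumber), pageOffset, bytesToCopy);
            srcOffset += bytesToCopy;
            remainder -= bytesToCopy;
            pointer += bytesToCopy;
        }
        if (pointer > length) {
            length = pointer;                     // length only grows
        }
    }

    public static void main(String[] args) {
        PagedWriter w = new PagedWriter();
        w.flushBuffer(new byte[] {1, 2, 3}, 3);
        w.flushBuffer(new byte[] {4, 5, 6, 7, 8, 9}, 6); // crosses two page boundaries
        System.out.println(w.pages.size() + " pages, length " + w.length); // 3 pages, length 9
    }
}
```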
<br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/csusky/" target="_blank">晓宇</a> 2008-04-18 11:38</div>]]></description></item><item><title>org.apache.lucene.store.RAMFile</title><link>http://www.blogjava.net/csusky/archive/2008/04/18/193982.html</link><dc:creator>晓宇</dc:creator><author>晓宇</author><pubDate>Fri, 18 Apr 2008 03:23:00 GMT</pubDate><guid>http://www.blogjava.net/csusky/archive/2008/04/18/193982.html</guid><wfw:comment>http://www.blogjava.net/csusky/comments/193982.html</wfw:comment><comments>http://www.blogjava.net/csusky/archive/2008/04/18/193982.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/csusky/comments/commentRss/193982.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/csusky/services/trackbacks/193982.html</trackback:ping><description><![CDATA[<p style="font-size: 10pt">This class is quite simple:<br />
<span style="font-size: 10pt">import java.util.Vector;<br />
</span><span style="font-size: 10pt">class RAMFile {<br />
&nbsp; Vector buffers = new Vector();<br />
&nbsp; long length;<br />
&nbsp; long lastModified = System.currentTimeMillis();<br />
}<br />
<br />
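It can be thought of as a file held in memory: buffers is the container holding the data, length is the total number of bytes stored in it,<br />
and lastModified is the time of the last modification.<br />
<br />
In practice, every element stored in buffers is a byte[1024] array.<br />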
<br />
</span></p>
<br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/csusky/" target="_blank">晓宇</a> 2008-04-18 11:23</div>]]></description></item><item><title>org.apache.lucene.store.OutputStream</title><link>http://www.blogjava.net/csusky/archive/2008/04/16/193574.html</link><dc:creator>晓宇</dc:creator><author>晓宇</author><pubDate>Wed, 16 Apr 2008 13:24:00 GMT</pubDate><guid>http://www.blogjava.net/csusky/archive/2008/04/16/193574.html</guid><wfw:comment>http://www.blogjava.net/csusky/comments/193574.html</wfw:comment><comments>http://www.blogjava.net/csusky/archive/2008/04/16/193574.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/csusky/comments/commentRss/193574.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/csusky/services/trackbacks/193574.html</trackback:ping><description><![CDATA[<span style="font-size: 10pt">OutputStream<br />
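This is an abstract class: Lucene's own base class for file output streams (not java.io.OutputStream).<br />
BUFFER_SIZE = 1024&nbsp; buffer size: 1024 bytes<br />
bufferStart = 0&nbsp; position in the file of the first byte in the buffer<br />
bufferPosition = 0&nbsp; current position inside the in-memory buffer<br />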
<br />
&nbsp;public final void writeByte(byte b) throws IOException {<br />
&nbsp;&nbsp;&nbsp; if (bufferPosition &gt;= BUFFER_SIZE)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; flush();<br />
&nbsp;&nbsp;&nbsp; buffer[bufferPosition++] = b;<br />
&nbsp; }<br />
Almost every write method goes through this one: when the buffer is full, its contents are flushed to the file before the new byte is stored.<br />
<br />
public final void writeBytes(byte[] b, int length) throws IOException<br />
writes a run of bytes into the in-memory buffer<br />
<br />
public final void writeInt(int i) throws IOException <br />
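writes an int as 4 bytes, most significant byte first<br />
<br />
public final void writeLong(long i) throws IOException<br />
writes a long by calling writeInt twice, using shifts: first the high 32 bits (i &gt;&gt; 32), then the low 32 bits<br />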
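A minimal sketch of this byte order, assuming a ByteArrayOutputStream as the sink in place of Lucene's buffer (FixedWidthWriter is a hypothetical name): writeInt emits the most significant byte first, and writeLong emits the high int and then the low int.<br />

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch: big-endian int/long writing built on a writeByte primitive. */
class FixedWidthWriter {
    final ByteArrayOutputStream out = new ByteArrayOutputStream();

    void writeByte(byte b) {
        out.write(b);
    }

    void writeInt(int i) {                        // 4 bytes, most significant first
        writeByte((byte) (i >> 24));
        writeByte((byte) (i >> 16));
        writeByte((byte) (i >> 8));
        writeByte((byte) i);
    }

    void writeLong(long i) {                      // high 32 bits, then low 32 bits
        writeInt((int) (i >> 32));
        writeInt((int) i);
    }

    public static void main(String[] args) {
        FixedWidthWriter w = new FixedWidthWriter();
        w.writeLong(0x0102030405060708L);
        System.out.println(java.util.Arrays.toString(w.out.toByteArray())); // [1, 2, 3, 4, 5, 6, 7, 8]
    }
}
```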
<br />
The most notable methods in this class are the two variable-length writers<br />
writeVInt(int i) /&nbsp;&nbsp; writeVLong(long i).<br />
First, writeVInt:<br />
public final void writeVInt(int i) throws IOException {<br />
&nbsp;&nbsp;&nbsp; while ((i &amp; ~0x7F) != 0) {<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; writeByte((byte)((i &amp; 0x7f) | 0x80));<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; i &gt;&gt;&gt;= 7;<br />
&nbsp;&nbsp;&nbsp; }<br />
&nbsp;&nbsp;&nbsp; writeByte((byte)i);<br />
}<br />
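<span style="color: #ff0000">For an int, ~0x7F == 0xFFFFFF80: every bit except the low 7.<br />
((i &amp; ~0x7F) != 0) therefore tests whether any bit above the low 7 is set, i.e. whether i, read as unsigned, is at least 0x80. If not, the whole value fits in 7 bits and is written directly as a single byte.<br />
Otherwise the loop writes<br />
(i &amp; 0x7f) | 0x80<br />
i&amp;0x7f keeps the low 7 bits and |0x80 sets the 8th (high) bit. The high bit marks the byte as incomplete: more bytes follow, and they must be combined to rebuild the integer. i &gt;&gt;&gt;= 7 then discards the 7 bits just written.<br />
In effect the 32-bit integer is split into 7-bit groups, lowest group first, one group per byte, so it takes 1 to 5 bytes depending on its size; a negative int always takes the full 5 bytes.</span><br />
writeVLong(long i) works the same way and takes 1 to 10 bytes.<br />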
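The encoding can be checked with a small round trip. This is a sketch, not Lucene's class: VIntCodec is a hypothetical name, the output goes to a ByteArrayOutputStream, and readVInt is a reconstruction of the inverse a reader has to perform (take 7 bits per byte while the high bit is set).<br />

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch of the 7-bits-per-byte variable-length int format and its decoder. */
class VIntCodec {
    static byte[] writeVInt(int i) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((i & ~0x7F) != 0) {                // more than 7 significant bits left
            out.write((i & 0x7F) | 0x80);         // low 7 bits, high bit = "more bytes follow"
            i >>>= 7;                             // unsigned shift, so negative ints terminate too
        }
        out.write(i);                             // last byte: high bit clear
        return out.toByteArray();
    }

    static int readVInt(byte[] buf) {
        int pos = 0;
        byte b = buf[pos++];
        int i = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = buf[pos++];
            i |= (b & 0x7F) << shift;             // place the next 7-bit group
        }
        return i;
    }

    public static void main(String[] args) {
        System.out.println(writeVInt(127).length); // 1
        System.out.println(writeVInt(128).length); // 2 (bytes 0x80 0x01)
        System.out.println(writeVInt(-1).length);  // 5
    }
}
```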
<br />
final void writeChars(String s, int start, int length)<br />
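<span style="color: #ff0000">Writes the chars from start to start+length-1 in a UTF-8-style encoding, one Java char at a time.<br />
</span>For reference:<br />
<p>Unicode value range: UTF-8 byte pattern<br />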
U-00000000 - U-0000007F: 0xxxxxxx<br />
U-00000080 - U-000007FF: 110xxxxx 10xxxxxx <br />
U-00000800 - U-0000FFFF: 1110xxxx 10xxxxxx 10xxxxxx <br />
<br />
So values in 0x00-0x7F (at most 7 significant bits) are encoded in a single byte, which saves a great deal of space for ASCII text.<br />
Values in 0x80-0x7FF (at most 11 significant bits) are encoded in two bytes. The first byte holds the high 5 bits behind the prefix 110 (writeByte((byte)(0xC0 | (code &gt;&gt; 6)));), and the second holds the low 6 bits behind the prefix 10 (writeByte((byte)(0x80 | (code &amp; 0x3F)));).<br />
All remaining values (up to 0xFFFF, 16 bits) take three bytes:<br />
writeByte((byte)(0xE0 | (code &gt;&gt;&gt; 12)));&nbsp;&nbsp;&nbsp;&nbsp; // high 4 bits, prefix 1110<br />
&nbsp;writeByte((byte)(0x80 | ((code &gt;&gt; 6) &amp; 0x3F)));&nbsp;&nbsp; // middle 6 bits, prefix 10<br />
&nbsp;writeByte((byte)(0x80 | (code &amp; 0x3F)));&nbsp;&nbsp;&nbsp;&nbsp; // low 6 bits, prefix 10<br />
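The table translates directly into a small encoder. This hypothetical sketch (CharEncoder is not a Lucene class) encodes each Java char independently, so for characters in the Basic Multilingual Plane it produces the same bytes as standard UTF-8, while a supplementary character, stored in Java as a surrogate pair, comes out as two 3-byte sequences instead of one 4-byte sequence.<br />

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch: encode each Java char with 1, 2 or 3 bytes following the table above. */
class CharEncoder {
    static byte[] writeChars(String s, int start, int length) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int end = start + length;
        for (int i = start; i < end; i++) {
            int code = s.charAt(i);
            if (code <= 0x7F) {                           // 0xxxxxxx
                out.write(code);
            } else if (code <= 0x7FF) {                   // 110xxxxx 10xxxxxx
                out.write(0xC0 | (code >> 6));            // high 5 bits
                out.write(0x80 | (code & 0x3F));          // low 6 bits
            } else {                                      // 1110xxxx 10xxxxxx 10xxxxxx
                out.write(0xE0 | (code >>> 12));          // high 4 bits
                out.write(0x80 | ((code >> 6) & 0x3F));   // middle 6 bits
                out.write(0x80 | (code & 0x3F));          // low 6 bits
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String s = "a\u00e9\u4e2d";                       // 'a', e-acute, a CJK character
        System.out.println(writeChars(s, 0, s.length()).length); // 6 (1 + 2 + 3 bytes)
    }
}
```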
<br />
final void writeString(String s) throws IOException<br />
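<span style="color: #ff0000">The method first takes s.length(), the number of chars in the String (not the number of encoded bytes),<br />
writes that count with writeVInt,<br />
and then writes the chars with writeChars(s, 0, length).</span><br />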
<br />
InputStream.readString() does the reverse: it reads the char count len with readVInt(), then decodes len chars.<br />
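A round-trip sketch of this length-prefixed layout. StringCodec is hypothetical, bytes go to a ByteArrayOutputStream, and the chars follow the 1/2/3-byte table above. Because the prefix counts chars rather than bytes, the reader has to decode char by char instead of reading a fixed number of bytes.<br />

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch of writeString/readString: VInt char count, then 1-3 bytes per char. */
class StringCodec {
    static byte[] writeString(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int length = s.length();                  // number of chars, not bytes
        int i = length;                           // writeVInt(length)
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80);
            i >>>= 7;
        }
        out.write(i);
        for (int k = 0; k < length; k++) {        // writeChars(s, 0, length)
            int code = s.charAt(k);
            if (code <= 0x7F) {
                out.write(code);
            } else if (code <= 0x7FF) {
                out.write(0xC0 | (code >> 6));
                out.write(0x80 | (code & 0x3F));
            } else {
                out.write(0xE0 | (code >>> 12));
                out.write(0x80 | ((code >> 6) & 0x3F));
                out.write(0x80 | (code & 0x3F));
            }
        }
        return out.toByteArray();
    }

    static String readString(byte[] buf) {
        int pos = 0;
        byte b = buf[pos++];                      // readVInt: the char count
        int len = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = buf[pos++];
            len |= (b & 0x7F) << shift;
        }
        char[] chars = new char[len];             // then decode exactly len chars
        for (int k = 0; k < len; k++) {
            int c = buf[pos++] & 0xFF;
            if ((c & 0x80) == 0) {                                // 0xxxxxxx
                chars[k] = (char) c;
            } else if ((c & 0xE0) != 0xE0) {                      // 110xxxxx 10xxxxxx
                chars[k] = (char) (((c & 0x1F) << 6) | (buf[pos++] & 0x3F));
            } else {                                              // 1110xxxx 10xxxxxx 10xxxxxx
                int c2 = buf[pos++] & 0x3F;
                int c3 = buf[pos++] & 0x3F;
                chars[k] = (char) (((c & 0x0F) << 12) | (c2 << 6) | c3);
            }
        }
        return new String(chars);
    }
}
```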
<br />
protected final void flush() throws IOException<br />
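Calls flushBuffer to write out the buffered bytes, then empties the buffer.<br />
<br />
abstract void flushBuffer(byte[] b, int len) throws IOException<br />
flushBuffer is abstract: each subclass overrides it to decide where the output actually goes (a file, memory, and so on).<br />
<br />
final long getFilePointer() throws IOException<br />
Returns the current file position, i.e. the number of bytes the stream has written so far (as long as seek has not been used).<br />
<br />
public void seek(long pos) throws IOException<br />
Flushes the buffer, then moves the file pointer to pos.<br />
<br />
abstract long length() throws IOException<br />
Returns the number of bytes currently in the file; subclasses must implement it.<br />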
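The buffering logic described in this post fits in a short base class. The sketch below is hypothetical: BufferedOut and MemoryOut are illustrative names, and MemoryOut simply appends and ignores seek positions. writeByte, flush, getFilePointer and seek follow the behavior described above.<br />

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

/** Hypothetical sketch of the buffered base class: subclasses only decide where bytes go. */
abstract class BufferedOut {
    static final int BUFFER_SIZE = 1024;          // bytes
    private final byte[] buffer = new byte[BUFFER_SIZE];
    private long bufferStart = 0;                 // file position of buffer[0]
    private int bufferPosition = 0;               // next free slot in buffer

    final void writeByte(byte b) throws IOException {
        if (bufferPosition >= BUFFER_SIZE) {
            flush();                              // buffer full: hand it to the subclass first
        }
        buffer[bufferPosition++] = b;
    }

    final void flush() throws IOException {
        flushBuffer(buffer, bufferPosition);
        bufferStart += bufferPosition;
        bufferPosition = 0;
    }

    final long getFilePointer() {
        return bufferStart + bufferPosition;
    }

    void seek(long pos) throws IOException {
        flush();
        bufferStart = pos;
    }

    abstract void flushBuffer(byte[] b, int len) throws IOException;
}

/** Hypothetical subclass: appends to memory and counts flushes (seek positions are ignored). */
class MemoryOut extends BufferedOut {
    final ByteArrayOutputStream sink = new ByteArrayOutputStream();
    int flushes = 0;

    @Override
    void flushBuffer(byte[] b, int len) {
        sink.write(b, 0, len);
        flushes++;
    }

    public static void main(String[] args) throws IOException {
        MemoryOut out = new MemoryOut();
        for (int i = 0; i < 1500; i++) {
            out.writeByte((byte) i);
        }
        out.flush();
        System.out.println(out.sink.size() + " bytes in " + out.flushes + " flushes"); // 1500 bytes in 2 flushes
    }
}
```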
<br />
<br />
<br />
</p>
</span>
<br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/csusky/" target="_blank">晓宇</a> 2008-04-16 21:24</div>]]></description></item><item><title>org.apache.lucene.store.FSDirectory</title><link>http://www.blogjava.net/csusky/archive/2008/04/10/191977.html</link><dc:creator>晓宇</dc:creator><author>晓宇</author><pubDate>Thu, 10 Apr 2008 13:35:00 GMT</pubDate><guid>http://www.blogjava.net/csusky/archive/2008/04/10/191977.html</guid><wfw:comment>http://www.blogjava.net/csusky/comments/191977.html</wfw:comment><comments>http://www.blogjava.net/csusky/archive/2008/04/10/191977.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/csusky/comments/commentRss/191977.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/csusky/services/trackbacks/191977.html</trackback:ping><description><![CDATA[<p style="font-size: 10pt">FSDirectory extends the abstract class Directory.<br />
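The class mixes static code (a shared cache of instances and the factory methods that fill it) with ordinary operations on a single FSDirectory object. That is part of the reason its constructor is private: instances are obtained only through getDirectory, so each path maps to one cached instance.<br />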
<br />
static final Hashtable DIRECTORIES = new Hashtable();<br />
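Every newly created FSDirectory is put into this Hashtable: the key is the File of the directory, the value is the FSDirectory itself.<br />
Note: final makes only the reference constant; the Hashtable object it refers to can still be modified.<br />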
<br />
static final String LOCK_DIR =<br />
&nbsp;&nbsp;&nbsp; System.getProperty("org.apache.lucene.lockdir",<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; System.getProperty("java.io.tmpdir"));<br />
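If the user has set the system property "org.apache.lucene.lockdir", its value is used; otherwise the JVM's built-in "java.io.tmpdir" property is used.<br />
The value is a path: the directory where Lucene keeps its lock files.<br />
<br />
static final boolean DISABLE_LOCKS =<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Boolean.getBoolean("disableLuceneLocks") || Constants.JAVA_1_1;<br />
Locks are disabled if the system property "disableLuceneLocks" is set to "true" (Boolean.getBoolean returns true only in that case), <span style="color: red">or</span> if the JVM version is 1.1.<br />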
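A sketch of the two property lookups. LockSettings is a hypothetical name; unlike the static final fields above, these methods re-read the properties on every call, and the Java 1.1 check (Lucene's internal Constants.JAVA_1_1) is left out.<br />

```java
/** Hypothetical sketch of the lock-directory and lock-disabling property lookups. */
class LockSettings {
    static String lockDir() {
        // Use org.apache.lucene.lockdir if it is set, otherwise fall back to java.io.tmpdir.
        return System.getProperty("org.apache.lucene.lockdir",
                System.getProperty("java.io.tmpdir"));
    }

    static boolean locksDisabled() {
        // Boolean.getBoolean is true only when the property exists and equals "true" (ignoring case).
        return Boolean.getBoolean("disableLuceneLocks");
    }

    public static void main(String[] args) {
        System.out.println(lockDir() + " " + locksDisabled());
    }
}
```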
<br />
static FSDirectory getDirectory(String path, boolean create)<br />
static FSDirectory getDirectory(File file, boolean create)<br />
Return the FSDirectory for the given path or File: if one is already in the cache it is returned, otherwise a new one is built with the private constructor.<br />
The class also has 3 instance fields:<br />
&nbsp; private File directory = null;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // index directory<br />
&nbsp; private int refCount;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // reference count: how many callers currently hold this instance<br />
&nbsp; private File lockDir;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // lock directory<br />
Initializing an FSDirectory essentially means initializing these 3 fields.<br />
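The cache and the reference count can be sketched as follows. CachedDir is hypothetical and leaves out the create flag and all file operations; it only shows the pattern of a private constructor, a static map keyed by canonical File, and a close() that evicts the instance when its last holder is done. The close() part reflects a reading of the Lucene 1.x source and should be treated as an assumption.<br />

```java
import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: one shared instance per canonical directory, with reference counting. */
class CachedDir {
    private static final Map<File, CachedDir> DIRECTORIES = new HashMap<>();

    private final File directory;                 // canonical path, also the cache key
    private int refCount;                         // callers currently holding this instance

    private CachedDir(File directory) {           // private: instances come only from get()
        this.directory = directory;
    }

    static CachedDir get(File path) throws IOException {
        File key = path.getCanonicalFile();       // "dir/." and "dir" map to the same entry
        CachedDir dir;
        synchronized (DIRECTORIES) {
            dir = DIRECTORIES.get(key);
            if (dir == null) {
                dir = new CachedDir(key);
                DIRECTORIES.put(key, dir);
            }
        }
        synchronized (dir) {
            dir.refCount++;
        }
        return dir;
    }

    synchronized void close() {
        if (--refCount <= 0) {
            synchronized (DIRECTORIES) {
                DIRECTORIES.remove(directory);    // last holder gone: evict from the cache
            }
        }
    }

    synchronized int refCount() {
        return refCount;
    }
}
```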
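If create is true: if the index directory already exists, its files are listed and each one is deleted; if the lock directory already exists, its files are likewise listed with list() and removed with file.delete(). A directory that does not exist is created.<br />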
<br />
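Note: File.list() makes no guarantee about the order of the returned names (on many file systems they happen to come back in alphabetical order, e.g. a.txt before b.txt). Also, File.delete() can remove a directory only if it is empty. When a delete fails, the method throws an IOException and stops, so entries listed after the failing one are not deleted. For example, with aa.txt, ab/b.txt and b.txt listed alphabetically: aa.txt is deleted, deleting the non-empty directory ab fails and the method aborts, so b.txt is never deleted.<br />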
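The stop-at-first-failure behavior can be reproduced with a short sketch. DirectoryCleaner is hypothetical; it sorts the names only to make the example deterministic, since File.list() itself guarantees no order.<br />

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;

/** Hypothetical sketch of emptying a directory that stops at the first failed delete. */
class DirectoryCleaner {
    static void clear(File dir) throws IOException {
        String[] names = dir.list();              // order is unspecified by the JDK
        Arrays.sort(names);                       // sorted here only for a deterministic example
        for (String name : names) {
            File f = new File(dir, name);
            if (!f.delete()) {                    // File.delete() fails on a non-empty directory
                throw new IOException("Cannot delete " + f);
            }
        }
    }
}
```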
<br />
private FSDirectory(File path, boolean create) throws IOException<br />
the private constructor<br />
<br />
private synchronized void create() throws IOException<br />
creates the directory and lockDir directories: an existing directory is emptied, a missing one is created.<br />
<br />
final String[] list() throws IOException<br />
returns the names of all files in the index directory as strings<br />
<br />
final boolean fileExists(String name) throws IOException<br />
tests whether a file with the given name exists in the index directory<br />
<br />
final long fileModified(String name) throws IOException<br />
static final long fileModified(File directory, String name)<br />
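returns the file's last-modified time; the static version resolves name inside the given directory, while the instance version resolves it inside the index directory<br />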
<br />
void touchFile(String name) throws IOException <br />
sets the file's last-modified time to the current time<br />
<br />
final long fileLength(String name) throws IOException<br />
returns the file's length<br />
<br />
final void deleteFile(String name) throws IOException <br />
deletes the file<br />
<br />
final synchronized void renameFile(String from, String to) throws IOException<br />
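renames the file<br />
It first checks whether a file with the new name already exists and, if so, deletes it; then it renames the old file to the new name.<br />
Doug Cutting's comments on this method note that:<br />
1. The delete and the rename together are not atomic.<br />
2. Renaming does not work correctly on some JVMs; if the rename fails, the method falls back to a manual copy, writing the old file's contents into the new file through streams and then deleting the old file.<br />
Note: the method must be synchronized.<br />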
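A sketch of rename-with-fallback along the lines described above. Renamer is a hypothetical name, and the method takes the directory as a parameter instead of using a field; as in the original, the sequence is not atomic.<br />

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical sketch of renameFile: delete target, try renameTo, else copy and delete. */
class Renamer {
    static synchronized void renameFile(File dir, String from, String to) throws IOException {
        File old = new File(dir, from);
        File nu = new File(dir, to);
        if (nu.exists() && !nu.delete()) {        // step 1: remove the existing target
            throw new IOException("Cannot delete " + nu);
        }
        if (!old.renameTo(nu)) {                  // step 2: try a plain rename
            // Fallback: copy the bytes through streams, then delete the source.
            try (InputStream in = new FileInputStream(old);
                 OutputStream out = new FileOutputStream(nu)) {
                byte[] buffer = new byte[1024];
                int len;
                while ((len = in.read(buffer)) >= 0) {
                    out.write(buffer, 0, len);
                }
            }
            if (!old.delete()) {
                throw new IOException("Cannot delete " + old);
            }
        }
    }
}
```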
<br />
final OutputStream createFile(String name) throws IOException<br />
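creates a new, empty, writable file with the given name. The object actually returned is an FSOutputStream. Note that OutputStream here is not the java.io base class but a random-access output class Doug Cutting wrote for Lucene; likewise FSInputStream and InputStream are Lucene's own classes.<br />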
<br />
final InputStream openFile(String name) throws IOException<br />
opens an input stream on an existing file<br />
<br />
getLockPrefix() <br />
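FSDirectory also has<br />
&nbsp;private static MessageDigest DIGESTER; a static field that provides message-digest (hashing) functionality<br />
DIGESTER=MessageDigest.getInstance("MD5"),-----the MD5 algorithm<br />
or alternatively DIGESTER=MessageDigest.getInstance("SHA"),-----the SHA algorithm<br />
It is used to hash the directory name that becomes part of the lock file name.<br />
Usage:<br />
digest = DIGESTER.digest(dirName.getBytes());&nbsp; where dirName is the string to hash, here the name of the index directory.<br />
In FSDirectory it is used by getLockPrefix(), which builds the lock file name for a given index directory.<br />
The method first obtains the hashed byte[] array digest, then turns every 4 bits of digest into one hexadecimal character and appends it to a StringBuffer.<br />
This is plain hexadecimal encoding: similar in spirit to Base64, but much simpler.<br />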
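The hex encoding step can be sketched as below. LockPrefix is hypothetical; it hashes the name as UTF-8 bytes, whereas dirName.getBytes() uses the platform default charset, so results can differ for non-ASCII directory names on platforms whose default charset is not UTF-8.<br />

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch of a lock-file prefix: "lucene-" + hex(MD5(directory name)). */
class LockPrefix {
    private static final char[] HEX_DIGITS = "0123456789abcdef".toCharArray();

    static String lockPrefix(String dirName) {
        MessageDigest md5;
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);   // every Java platform is required to provide MD5
        }
        byte[] digest = md5.digest(dirName.getBytes(StandardCharsets.UTF_8));
        StringBuilder buf = new StringBuilder("lucene-");
        for (byte b : digest) {
            buf.append(HEX_DIGITS[(b >> 4) & 0xF]); // high 4 bits
            buf.append(HEX_DIGITS[b & 0xF]);        // low 4 bits
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        System.out.println(lockPrefix("abc"));    // lucene-900150983cd24fb0d6963f7d28e17f72
    }
}
```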
<br />
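The method<br />
Lock makeLock(String name)<br />
implements the abstract method declared in Directory and returns a Lock object; Lock will be covered after Lucene's input and output streams.<br />
The method is simple: it calls getLockPrefix() to get the first part of the lock name, then appends the name identifying the kind of lock,<br />
such as the write lock and the commit lock.<br />
The resulting names look like:<br />
lucene-12c90c2c381bc7acbc4846b4ce97593b-write.lock<br />
lucene-12c90c2c381bc7acbc4846b4ce97593b-commit.lock<br />
These two locking mechanisms will be covered later.<br />
Finally it returns a Lock implemented as an anonymous inner class whose methods obtain, release and check the lock, plus a toString() method that returns the lock's name.<br />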
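What such a Lock does can be sketched with a file whose existence is the lock. SimpleFileLock is a hypothetical class and the anonymous class in FSDirectory may differ in detail; the key point is that File.createNewFile() atomically creates the file only if it does not already exist, so only one caller can obtain the lock at a time.<br />

```java
import java.io.File;
import java.io.IOException;

/** Hypothetical lock backed by the existence of a file in the lock directory. */
class SimpleFileLock {
    private final File lockFile;

    SimpleFileLock(File lockDir, String prefix, String name) {
        this.lockFile = new File(lockDir, prefix + "-" + name); // e.g. lucene-<md5>-write.lock
    }

    boolean obtain() throws IOException {
        return lockFile.createNewFile();          // atomic: false if the file already exists
    }

    void release() {
        lockFile.delete();
    }

    boolean isLocked() {
        return lockFile.exists();
    }

    @Override
    public String toString() {
        return "Lock@" + lockFile;
    }
}
```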
<br />
<br />
<br />
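FSDirectory.java also contains implementations of OutputStream and InputStream. Those two abstract classes only define operations and do not specify a device to read from or write to;<br />
in Lucene's I/O design, subclasses supply the input or output device.<br />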
FSOutputStream<br />
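FSOutputStream holds a RandomAccessFile file = new RandomAccessFile(path, "rw");<br />
Every operation on the stream is carried out by calling the corresponding method of this file.<br />
The most important method:<br />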
&nbsp; public final void flushBuffer(byte[] b, int size) throws IOException {<br />
&nbsp;&nbsp;&nbsp; file.write(b, 0, size);<br />
&nbsp; }<br />
Each call to flushBuffer writes bytes b[0] through b[size-1] to the file at path.<br />
<br />
<br />
FSInputStream<br />
The most important method:<br />
protected final void readInternal(byte[] b, int offset, int len)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; throws IOException {<br />
&nbsp;&nbsp;&nbsp; synchronized (file) {<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; long position = getFilePointer();&nbsp;&nbsp;&nbsp;&nbsp; // current (logical) file pointer<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; if (position != file.position) {<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; file.seek(position);<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; file.position = position;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; }<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; int total = 0;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; do {<br />
<br />
&nbsp;&nbsp; // read the requested number of bytes from the file into the byte array<br />
&nbsp;&nbsp; // InputStream.refill() in the base class calls readInternal(buffer, 0, bufferLength) to fill its buffer from the file;<br />
&nbsp; //&nbsp; every read in InputStream reaches readInternal, either directly or through refill().<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; int i = file.read(b, offset+total, len-total);&nbsp;&nbsp;&nbsp; // read from the file into b; may return fewer bytes than requested<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; if (i == -1)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; throw new IOException("read past EOF");<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; file.position += i;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; total += i;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; } while (total &lt; len);<br />
&nbsp;&nbsp;&nbsp; }<br />
&nbsp; }<br />
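The loop matters because RandomAccessFile.read may return fewer bytes than requested. The sketch below is hypothetical: RafReader takes the logical file pointer as a parameter instead of calling getFilePointer(), and it synchronizes on itself rather than on a shared file object. The usage writes its data with RandomAccessFile.write(b, 0, size), which is what FSOutputStream.flushBuffer does.<br />

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical sketch of readInternal: seek only when needed, then loop until len bytes arrive. */
class RafReader implements AutoCloseable {
    private final RandomAccessFile file;
    private long position = 0;                    // where the OS file pointer currently is

    RafReader(File path) throws IOException {
        file = new RandomAccessFile(path, "r");
    }

    synchronized void readInternal(long filePointer, byte[] b, int offset, int len) throws IOException {
        if (filePointer != position) {            // the cached position is stale: seek
            file.seek(filePointer);
            position = filePointer;
        }
        int total = 0;
        do {
            int i = file.read(b, offset + total, len - total);
            if (i == -1) {
                throw new IOException("read past EOF");
            }
            position += i;
            total += i;
        } while (total < len);
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```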
</p>
<br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/csusky/" target="_blank">晓宇</a> 2008-04-10 21:35</div>]]></description></item><item><title>org.apache.lucene.document.DateField</title><link>http://www.blogjava.net/csusky/archive/2008/04/10/191963.html</link><dc:creator>晓宇</dc:creator><author>晓宇</author><pubDate>Thu, 10 Apr 2008 11:26:00 GMT</pubDate><guid>http://www.blogjava.net/csusky/archive/2008/04/10/191963.html</guid><wfw:comment>http://www.blogjava.net/csusky/comments/191963.html</wfw:comment><comments>http://www.blogjava.net/csusky/archive/2008/04/10/191963.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/csusky/comments/commentRss/191963.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/csusky/services/trackbacks/191963.html</trackback:ping><description><![CDATA[<p style="font-size: 10pt">This class converts between dates and strings; in fact it converts between long and String values, using the rarely used<br />
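Long.toString(long, int) method, which formats a long in a given radix.<br />
The first argument is the long to convert and the second is the radix; with radix 10 it is equivalent to Long.toString(long).<br />
Here the largest allowed radix, Character.MAX_RADIX = 36 (10 digits + 26 letters), is used, so the resulting strings are short and take little space.<br />
In addition, all results are padded to the same length: a string shorter than 9 characters gets leading zeros (a date's long value is at most 9 characters in base 36, and only dates from 1970 onward are converted correctly).<br />
Because all strings have the same length, and digits sort before letters, comparing the strings compares the dates.<br />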
<br />
<br />
A date converted to a string looks like<br />
0fev8eza3<br />
The unpadded value is fev8eza3; it is left-padded with 0 to 9 characters.<br />
<br />
&nbsp; private static int DATE_LEN = Long.toString(1000L*365*24*60*60*1000,<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Character.MAX_RADIX).length();<br />
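This computes the length of the base-36 string for the time 1000 years after 1970 (counting every year as 365 days); no converted time may be longer than this, and shorter ones are left-padded with 0.<br />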
<br />
Using the string-to-date function, the largest representable date is<br />
stringToTime("zzzzzzzzz");<br />
which prints as Fri Apr 22 19:04:28 CST 5188<br />
so the range of dates the class can convert is 1970-01-01 to 5188-04-22.<br />
<br />
<br />
Date to string:<br />
public static String timeToString(long time)<br />
<br />
String to date:<br />
public static long stringToTime(String s) <br />
<br />
<br />
In fact, Long.toString(long i, int radix) returns the representation of i in base radix. radix must lie between<br />
Character.MIN_RADIX (2) and Character.MAX_RADIX (36); any other value falls back to base 10.<br />
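A sketch of the encoding described above. DateCodec is a hypothetical name, and rejecting out-of-range values with IllegalArgumentException is an assumption; the range check itself mirrors the 1970 to 5188 limits computed above. The sketch uses Long.toString with Character.MAX_RADIX, left-pads with zeros to DATE_LEN, and uses Long.parseLong with the same radix for the reverse direction.<br />

```java
/** Hypothetical sketch of DateField-style encoding: base 36, left-padded to a fixed width. */
class DateCodec {
    // Width of "1000 years after 1970" in base 36; this evaluates to 9.
    static final int DATE_LEN =
            Long.toString(1000L * 365 * 24 * 60 * 60 * 1000, Character.MAX_RADIX).length();

    static String timeToString(long time) {
        if (time < 0) {
            throw new IllegalArgumentException("time too early");
        }
        String s = Long.toString(time, Character.MAX_RADIX);
        if (s.length() > DATE_LEN) {
            throw new IllegalArgumentException("time too late");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = s.length(); i < DATE_LEN; i++) {
            sb.append('0');                       // left-pad so string order matches numeric order
        }
        return sb.append(s).toString();
    }

    static long stringToTime(String s) {
        return Long.parseLong(s, Character.MAX_RADIX);
    }

    public static void main(String[] args) {
        System.out.println(DATE_LEN);             // 9
        System.out.println(timeToString(35));     // 00000000z
        System.out.println(timeToString(36));     // 000000010
    }
}
```

Without the padding, "z" (35) would sort after "10" (36); with it, "00000000z" sorts before "000000010" as it should.<br />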
</p>
<br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/csusky/" target="_blank">晓宇</a> 2008-04-10 19:26</div>]]></description></item><item><title>org.apache.lucene.document.Document</title><link>http://www.blogjava.net/csusky/archive/2008/04/08/191555.html</link><dc:creator>晓宇</dc:creator><author>晓宇</author><pubDate>Tue, 08 Apr 2008 12:27:00 GMT</pubDate><guid>http://www.blogjava.net/csusky/archive/2008/04/08/191555.html</guid><wfw:comment>http://www.blogjava.net/csusky/comments/191555.html</wfw:comment><comments>http://www.blogjava.net/csusky/archive/2008/04/08/191555.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/csusky/comments/commentRss/191555.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/csusky/services/trackbacks/191555.html</trackback:ping><description><![CDATA[<p style="font-size: 10pt">A Document is a set of Fields. Each Field has a name and a text value, and some Fields may be stored with the Document. Therefore each Document should normally contain at least one stored Field that uniquely identifies it.<br />
<br />
// the collection of Fields<br />
List fields = new Vector();<br />
// boost factor, applied to all Fields of this Document<br />
private float boost = 1.0f;<br />
// adds a Field to the Document<br />
public final void add(Field field) {<br />
&nbsp;&nbsp;&nbsp; fields.add(field);<br />
&nbsp; }<br />
// removes the first Field with the given name<br />
public final void removeField(String name) <br />
// removes all Fields with the given name<br />
public final void removeFields(String name) <br />
// returns the first Field with the given name<br />
public final Field getField(String name) <br />
// returns all Fields with the given name as an array<br />
public final Field[] getFields(String name) <br />
// returns an Enumeration of all Fields<br />
public final Enumeration fields() <br />
<br />
The class also overrides toString(),<br />
printing information about all of its Fields.<br />
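The lookup and removal semantics listed above can be sketched with a small hypothetical class. MiniDocument keeps fields in an ArrayList with plain String values; it is not Lucene's Document, and its getFields returns an empty array when nothing matches.<br />

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical minimal Document: an ordered list of (name, value) fields. */
class MiniDocument {
    static final class Field {
        final String name;
        final String value;

        Field(String name, String value) {
            this.name = name;
            this.value = value;
        }

        @Override
        public String toString() {
            return name + ":" + value;
        }
    }

    private final List<Field> fields = new ArrayList<>();

    void add(Field field) {
        fields.add(field);
    }

    void removeField(String name) {               // first match only
        for (Iterator<Field> it = fields.iterator(); it.hasNext(); ) {
            if (it.next().name.equals(name)) {
                it.remove();
                return;
            }
        }
    }

    void removeFields(String name) {              // every match
        fields.removeIf(f -> f.name.equals(name));
    }

    Field getField(String name) {                 // first match, or null
        for (Field f : fields) {
            if (f.name.equals(name)) return f;
        }
        return null;
    }

    Field[] getFields(String name) {              // all matches, in insertion order
        List<Field> result = new ArrayList<>();
        for (Field f : fields) {
            if (f.name.equals(name)) result.add(f);
        }
        return result.toArray(new Field[0]);
    }

    @Override
    public String toString() {
        return "Document" + fields;
    }
}
```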
</p>
<br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/csusky/" target="_blank">晓宇</a> 2008-04-08 20:27</div>]]></description></item><item><title>org.apache.lucene.document.Field class</title><link>http://www.blogjava.net/csusky/archive/2008/04/08/191550.html</link><dc:creator>晓宇</dc:creator><author>晓宇</author><pubDate>Tue, 08 Apr 2008 12:07:00 GMT</pubDate><guid>http://www.blogjava.net/csusky/archive/2008/04/08/191550.html</guid><wfw:comment>http://www.blogjava.net/csusky/comments/191550.html</wfw:comment><comments>http://www.blogjava.net/csusky/archive/2008/04/08/191550.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/csusky/comments/commentRss/191550.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/csusky/services/trackbacks/191550.html</trackback:ping><description><![CDATA[package org.apache.lucene.document;<br />
Field<br />
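A Field is part of a Document. Each Field is a name-value pair: the name is a String and the value is a String or a Reader. For a Keyword Field the value is not processed (tokenized) any further, which suits values such as URLs and dates. Stored Fields are kept in the index so that the original Document's values can be returned with the Hits.<br />
A Field has 3 boolean flags:<br />
&nbsp; private boolean isStored = false;&nbsp;&nbsp;&nbsp;&nbsp; // stored<br />
&nbsp; private boolean isIndexed = true;&nbsp;&nbsp;&nbsp; // indexed<br />
&nbsp; private boolean isTokenized = true;&nbsp; // tokenized<br />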
Setting these 3 values determines the kind of Field (values given in the order isStored, isIndexed, isTokenized):<br />
&nbsp; Keyword&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;true, true, false&nbsp;&nbsp;&nbsp; typically URLs, dates and other keywords<br />
&nbsp; UnIndexed&nbsp;&nbsp; &nbsp;true, false, false&nbsp;&nbsp; typically information returned with the Hits but never searched<br />
&nbsp; Text&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; true, true, true<br />
&nbsp; UnStored&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; false, true, true<br />
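The table can also be read as a mapping from kind to flag triple. The enum below is only an illustration; Lucene 1.x itself exposes these kinds as the static factory methods Field.Keyword, Field.UnIndexed, Field.Text and Field.UnStored.<br />

```java
/** Illustrative enum: the four Field kinds as (stored, indexed, tokenized) triples. */
enum FieldKind {
    KEYWORD(true, true, false),       // searchable as a single, untokenized term (URLs, dates)
    UN_INDEXED(true, false, false),   // returned with hits, never searched
    TEXT(true, true, true),           // tokenized, searchable and returned with hits
    UN_STORED(false, true, true);     // tokenized and searchable, not returned with hits

    final boolean stored;
    final boolean indexed;
    final boolean tokenized;

    FieldKind(boolean stored, boolean indexed, boolean tokenized) {
        this.stored = stored;
        this.indexed = indexed;
        this.tokenized = tokenized;
    }
}
```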
<br />
There is also an overridden toString method that prints the Field's kind.<br />
<br />
float boost = 1.0f;&nbsp;&nbsp;&nbsp;&nbsp;boost factor used when scoring: it scales the contribution of matches in this field of the document that contains it<br />
<br />
<br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/csusky/" target="_blank">晓宇</a> 2008-04-08 20:07</div>]]></description></item></channel></rss>