﻿<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/"><channel><title>BlogJava-joe -- Focused on Java, open source, architecture, project management - Post category - lucene</title><link>http://www.blogjava.net/freeman1984/category/47801.html</link><description>         
        STANDING ON THE SHOULDERS OF GIANTS</description><language>zh-cn</language><lastBuildDate>Fri, 25 Feb 2011 09:24:33 GMT</lastBuildDate><pubDate>Fri, 25 Feb 2011 09:24:33 GMT</pubDate><ttl>60</ttl><item><title>Understanding slop in Lucene's PhraseQuery (repost)</title><link>http://www.blogjava.net/freeman1984/archive/2011/02/25/345116.html</link><dc:creator>@joe</dc:creator><author>@joe</author><pubDate>Fri, 25 Feb 2011 05:51:00 GMT</pubDate><guid>http://www.blogjava.net/freeman1984/archive/2011/02/25/345116.html</guid><wfw:comment>http://www.blogjava.net/freeman1984/comments/345116.html</wfw:comment><comments>http://www.blogjava.net/freeman1984/archive/2011/02/25/345116.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/freeman1984/comments/commentRss/345116.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/freeman1984/services/trackbacks/345116.html</trackback:ping><description><![CDATA[Article from: http://myzhangjl.blog.sohu.com/95911870.html<br />
I have been reading about Lucene for the past few days, and when I reached the search part, PhraseQuery gave me some trouble. The code in my copy of Lucene in Action targets an old version, and I am not sure whether the translation or my own understanding was at fault, but setting the slop on a PhraseQuery took me a long time to work out. I finally understand it, so here is a post to share what I learned:
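<p><font size="3">&nbsp;&nbsp;&nbsp; A PhraseQuery searches by phrase. For example, if I search for the phrase &#8220;big car&#8221;, a document matches when the specified field contains the phrase &#8220;big car&#8221;. If the field contains &#8220;big black car&#8221; instead, the match fails. To make that match as well, you need to set the slop. First, the concept: slop is the maximum distance allowed between the positions of the query terms, counted as the number of one-position moves needed to rearrange the query phrase into the text found in the document. The examples below make this concrete:</font></p>
<p><font size="3">&nbsp;&nbsp; The sentence to match is: <strong><em>the quick brown fox jumped over the lazy dog.</em></strong></font></p>
<p><font size="3">&nbsp; &nbsp;<strong>Example 1:</strong> Suppose I want &#8220;<em><strong>quick fox</strong></em>&#8221; to match the sentence above. The sentence contains <em><strong>quick [brown] fox</strong></em>, so it differs from my &#8220;<em><strong>quick fox</strong></em>&#8221; by a gap of one word. I therefore set the slop to 1, which allows at most one word between <strong><em>quick</em></strong> and <em><strong>fox</strong></em>, and every &#8220;<em><strong>quick [***] fox</strong></em>&#8221; will then match.</font></p>
<p><font size="3">&nbsp;&nbsp;<strong> Example 2:</strong> Matching the sentence with &#8220;<em><strong>fox quick</strong></em>&#8221; is also possible, but it takes more work than Example 1. We need to see how the terms of &#8220;<em><strong>fox quick</strong></em>&#8221; must move to form &#8220;<em><strong>quick [***] fox</strong></em>&#8221;. As the table below shows, moving <strong><em>fox</em></strong> to the right 3 times is enough, so the slop must be at least 3:</font></p>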
<div align="center">
<table border="1" align="center">
    <tbody>
        <tr>
            <td>&nbsp;&nbsp;</td>
            <td>fox</td>
            <td>quick</td>
            <td>&nbsp;&nbsp;</td>
            <td>&nbsp;&nbsp;</td>
        </tr>
        <tr>
            <td>1</td>
            <td>&nbsp;&nbsp;</td>
            <td>fox|quick</td>
            <td>&nbsp;&nbsp;</td>
            <td>&nbsp;&nbsp;</td>
        </tr>
        <tr>
            <td>2</td>
            <td>&nbsp;&nbsp;</td>
            <td>quick</td>
            <td>fox</td>
            <td>&nbsp;&nbsp;</td>
        </tr>
        <tr>
            <td>3</td>
            <td>&nbsp;&nbsp;</td>
            <td>quick</td>
            <td>&nbsp;&nbsp;</td>
            <td>fox</td>
        </tr>
    </tbody>
</table>
</div>
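<p><font size="3">&nbsp;&nbsp; The two-term cases above reduce to a small rule. In the query the second term sits one position after the first, so the number of moves is the difference between that and their real distance in the sentence: a gap of g words in the same order costs g moves, and a gap of g words in reversed order costs g + 2 moves. The following is a minimal Java sketch of that rule, not Lucene code. The class and method names are hypothetical, and it assumes whitespace tokenization and uses the first occurrence of each word.</font></p>

```java
import java.util.Arrays;
import java.util.List;

class TwoTermSlop {
    // Number of one-position moves needed so that the two-term query
    // "first second" lines up with the sentence (illustrative model only).
    static int movesNeeded(String sentence, String first, String second) {
        List<String> doc = Arrays.asList(sentence.split(" "));
        int p1 = doc.indexOf(first);   // position of the first query term
        int p2 = doc.indexOf(second);  // position of the second query term
        if (p1 < 0 || p2 < 0) {
            throw new IllegalArgumentException("term not in sentence");
        }
        // In the query, `second` is exactly 1 position after `first`.
        return Math.abs((p2 - p1) - 1);
    }

    public static void main(String[] args) {
        String s = "the quick brown fox jumped over the lazy dog";
        System.out.println(movesNeeded(s, "quick", "fox"));           // 1
        System.out.println(movesNeeded(s, "fox", "quick"));           // 3
        System.out.println(movesNeeded("big black car", "big", "car")); // 1
    }
}
```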
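<p>&nbsp;&nbsp;&nbsp;<font size="3"> <strong>Example 3:</strong> How can &#8220;<strong><em>lazy jumped quick</em></strong>&#8221; match the sentence above? This takes more work than Example 2, because there are 3 words to consider. However many words there are, slop still means the maximum distance, so to be thorough we look at each combination in turn: (<font size="2">the sentence to match is: <strong><em>the quick brown fox jumped over the lazy dog.</em></strong></font>)</font></p>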
<ul>
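    <li><font size="3"><strong><em>lazy jumped:</em></strong> the sentence has <strong><em>jumped [over] [the] lazy</em></strong>, so there are 2 words between them. As shown below, <em><strong>lazy</strong></em> has to move 4 positions to the right (2 moves to get past <em><strong>jumped</strong></em>, then 2 more to open the gap):</font></li>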
</ul>
<p>&nbsp;</p>
<table border="1" align="center">
    <tbody>
        <tr>
            <td>&nbsp;&nbsp;</td>
            <td>lazy</td>
            <td>jumped</td>
            <td>&nbsp;&nbsp;</td>
            <td>&nbsp;&nbsp;</td>
            <td>&nbsp;&nbsp;</td>
        </tr>
        <tr>
            <td>1</td>
            <td>&nbsp;&nbsp;</td>
            <td>lazy|jumped</td>
            <td>&nbsp;&nbsp;</td>
            <td>&nbsp;&nbsp;</td>
            <td>&nbsp;&nbsp;</td>
        </tr>
        <tr>
            <td>2</td>
            <td>&nbsp;&nbsp;</td>
            <td>jumped</td>
            <td>lazy</td>
            <td>&nbsp;&nbsp;</td>
            <td>&nbsp;&nbsp;</td>
        </tr>
        <tr>
            <td>3</td>
            <td>&nbsp;&nbsp;</td>
            <td>jumped</td>
            <td>&nbsp;&nbsp;</td>
            <td>lazy</td>
            <td>&nbsp;&nbsp;</td>
        </tr>
        <tr>
            <td>4</td>
            <td>&nbsp;&nbsp;</td>
            <td>jumped</td>
            <td>&nbsp;&nbsp;</td>
            <td>&nbsp;&nbsp;</td>
            <td>lazy</td>
        </tr>
    </tbody>
</table>
<p>&nbsp;</p>
<ul>
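    <li>&nbsp; <font size="3"><strong><em>lazy jumped&nbsp;quick:</em></strong> here we mainly look at <em><strong>lazy</strong></em> and <strong><em>quick</em></strong>, but because <em><strong>jumped</strong></em> sits between them in the query, it still has to be taken into account when moving. In the sentence, the relation between <em><strong>lazy</strong></em> and <strong><em>quick</em></strong> is <strong><em>quick [brown] [fox] [jumped] [over] [the] lazy</em></strong>: there are 5 words between <strong><em>quick</em></strong> and <strong><em>lazy</em></strong>. So, as the table below shows, lazy moves 8 times to the right:</font></li>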
</ul>
<div align="center">
<table border="1" align="center">
    <tbody>
        <tr>
            <td>&nbsp;&nbsp;</td>
            <td>&nbsp;lazy</td>
            <td>
            <p align="center">jumped</p>
            </td>
            <td>quick</td>
            <td>&nbsp;&nbsp;</td>
            <td>&nbsp;&nbsp;</td>
            <td>&nbsp;&nbsp;</td>
            <td>&nbsp;&nbsp;</td>
            <td>&nbsp;&nbsp;</td>
            <td></td>
        </tr>
        <tr>
            <td>
            <p align="center">1</p>
            </td>
            <td>
            <p align="center">&nbsp;&nbsp;</p>
            </td>
            <td>
            <p align="center">lazy|jumped</p>
            </td>
            <td>
            <p align="center">quick</p>
            </td>
            <td>
            <p align="center">&nbsp;&nbsp;</p>
            </td>
            <td>
            <p align="center">&nbsp;&nbsp;</p>
            </td>
            <td>
            <p align="center">&nbsp;&nbsp;</p>
            </td>
            <td>
            <p align="center">&nbsp;&nbsp;</p>
            </td>
            <td>
            <p align="center">&nbsp;&nbsp;</p>
            </td>
            <td>
            <p align="center">&nbsp;</p>
            </td>
        </tr>
        <tr>
            <td>
            <p align="center">2</p>
            </td>
            <td>
            <p align="center">&nbsp;&nbsp;</p>
            </td>
            <td>
            <p align="center">jumped</p>
            </td>
            <td>
            <p align="center">lazy|quick</p>
            </td>
            <td>
            <p align="center">&nbsp;&nbsp;</p>
            </td>
            <td>
            <p align="center">&nbsp;&nbsp;</p>
            </td>
            <td>
            <p align="center">&nbsp;&nbsp;</p>
            </td>
            <td>
            <p align="center">&nbsp;&nbsp;</p>
            </td>
            <td>
            <p align="center">&nbsp;&nbsp;</p>
            </td>
            <td>
            <p align="center">&nbsp;</p>
            </td>
        </tr>
        <tr>
            <td>
            <p align="center">&nbsp;3&nbsp;</p>
            </td>
            <td>
            <p align="center">&nbsp;&nbsp;</p>
            </td>
            <td>
            <p align="center">jumped</p>
            </td>
            <td>
            <p align="center">quick</p>
            </td>
            <td>
            <p align="center">&nbsp;lazy&nbsp;</p>
            </td>
            <td>
            <p align="center">&nbsp;&nbsp;</p>
            </td>
            <td>
            <p align="center">&nbsp;&nbsp;</p>
            </td>
            <td>
            <p align="center">&nbsp;&nbsp;</p>
            </td>
            <td>
            <p align="center">&nbsp;&nbsp;</p>
            </td>
            <td>
            <p align="center">&nbsp;</p>
            </td>
        </tr>
        <tr>
            <td>
            <p align="center">4</p>
            </td>
            <td>
            <p align="center">&nbsp;&nbsp;</p>
            </td>
            <td>
            <p align="center">jumped</p>
            </td>
            <td>
            <p align="center">quick</p>
            </td>
            <td>
            <p align="center">&nbsp;&nbsp;</p>
            </td>
            <td>
            <p align="center">lazy&nbsp;</p>
            </td>
            <td>
            <p align="center">&nbsp;&nbsp;</p>
            </td>
            <td>
            <p align="center">&nbsp;&nbsp;</p>
            </td>
            <td>
            <p align="center">&nbsp;&nbsp;</p>
            </td>
            <td>
            <p align="center">&nbsp;</p>
            </td>
        </tr>
        <tr>
            <td>
            <p align="center">&nbsp;5&nbsp;</p>
            </td>
            <td>
            <p align="center">&nbsp;&nbsp;</p>
            </td>
            <td>
            <p align="center">jumped</p>
            </td>
            <td>
            <p align="center">quick</p>
            </td>
            <td>
            <p align="center">&nbsp;&nbsp;</p>
            </td>
            <td>
            <p align="center">&nbsp;&nbsp;</p>
            </td>
            <td>
            <p align="center">lazy&nbsp;</p>
            </td>
            <td>
            <p align="center">&nbsp;&nbsp;</p>
            </td>
            <td>
            <p align="center">&nbsp;&nbsp;</p>
            </td>
            <td>
            <p align="center">&nbsp;</p>
            </td>
        </tr>
        <tr>
            <td>
            <p align="center">6</p>
            </td>
            <td>
            <p align="center">&nbsp;&nbsp;</p>
            </td>
            <td>
            <p align="center">jumped</p>
            </td>
            <td>
            <p align="center">quick</p>
            </td>
            <td>
            <p align="center">&nbsp;&nbsp;</p>
            </td>
            <td>
            <p align="center">&nbsp;&nbsp;</p>
            </td>
            <td>
            <p align="center">&nbsp;&nbsp;</p>
            </td>
            <td>
            <p align="center">lazy&nbsp;</p>
            </td>
            <td>
            <p align="center">&nbsp;&nbsp;</p>
            </td>
            <td>
            <p align="center">&nbsp;</p>
            </td>
        </tr>
        <tr>
            <td>
            <p align="center">7</p>
            </td>
            <td>
            <p align="center">&nbsp;&nbsp;</p>
            </td>
            <td>
            <p align="center">jumped</p>
            </td>
            <td>
            <p align="center">quick</p>
            </td>
            <td>
            <p align="center">&nbsp;&nbsp;</p>
            </td>
            <td>
            <p align="center">&nbsp;&nbsp;</p>
            </td>
            <td>
            <p align="center">&nbsp;&nbsp;</p>
            </td>
            <td>
            <p align="center">&nbsp;&nbsp;</p>
            </td>
            <td>
            <p align="center">lazy&nbsp;</p>
            </td>
            <td>
            <p align="center">&nbsp;</p>
            </td>
        </tr>
        <tr>
            <td>
            <p align="center">8</p>
            </td>
            <td>&nbsp;&nbsp;</td>
            <td>
            <p align="center">jumped</p>
            </td>
            <td>
            <p align="center">quick</p>
            </td>
            <td>&nbsp;&nbsp;</td>
            <td>&nbsp;&nbsp;</td>
            <td>&nbsp;&nbsp;</td>
            <td>&nbsp;&nbsp;</td>
            <td>&nbsp;&nbsp;</td>
            <td>lazy&nbsp;</td>
        </tr>
    </tbody>
</table>
</div>
<p>&nbsp;</p>
<ul>
    <li>&nbsp;<font size="3">Last is <strong><em>jumped quick</em></strong>. I will not draw the table here; you can try it yourself. It should come out as moving jumped 4 times to the right.</font></li>
</ul>
<p><font size="3">&nbsp;&nbsp; Taking the 3 cases together, the largest number of moves is 8, so the slop must be set to 8 for &#8220;<font size="3"><strong><em>lazy jumped&nbsp;quick</em></strong></font>&#8221; to match the sentence.</font></p>
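<p><font size="3">&nbsp;&nbsp; The pairwise reasoning can be folded into one calculation for any number of terms. For each query term, subtract its offset in the query from its position in the sentence; the spread (largest minus smallest) of those values is the smallest slop that matches. The sketch below is a simplified model that agrees with the move tables above, not Lucene's actual matching code. The names are hypothetical, and it assumes whitespace tokenization and uses the first occurrence of each word.</font></p>

```java
import java.util.Arrays;
import java.util.List;

class MinSlop {
    // Smallest slop for which `phrase` would match `sentence`
    // under the simplified position model described above.
    static int minSlop(String sentence, String phrase) {
        List<String> doc = Arrays.asList(sentence.split(" "));
        String[] terms = phrase.split(" ");
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int offset = 0; offset < terms.length; offset++) {
            int pos = doc.indexOf(terms[offset]);
            if (pos < 0) {
                throw new IllegalArgumentException("term not in sentence: " + terms[offset]);
            }
            // Where the phrase would have to start for this term to be in place.
            int shifted = pos - offset;
            min = Math.min(min, shifted);
            max = Math.max(max, shifted);
        }
        return max - min;
    }

    public static void main(String[] args) {
        String s = "the quick brown fox jumped over the lazy dog";
        System.out.println(minSlop(s, "quick fox"));         // 1
        System.out.println(minSlop(s, "fox quick"));         // 3
        System.out.println(minSlop(s, "lazy jumped quick")); // 8
    }
}
```

<p><font size="3">&nbsp;&nbsp; For &#8220;lazy jumped quick&#8221; the shifted values are 7, 3 and -1, and their spread is 8, the same number the tables arrive at.</font></p>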
<br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/freeman1984/" target="_blank">@joe</a> 2011-02-25 13:51 <a href="http://www.blogjava.net/freeman1984/archive/2011/02/25/345116.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item></channel></rss>