﻿<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/"><channel><title>BlogJava-CONAN ZONE-Article Category-Solr</title><link>http://www.blogjava.net/conans/category/51839.html</link><description>The more you struggle, the more excited I get</description><language>zh-cn</language><lastBuildDate>Sun, 17 Jun 2012 03:08:34 GMT</lastBuildDate><pubDate>Sun, 17 Jun 2012 03:08:34 GMT</pubDate><ttl>60</ttl><item><title>Analysis of How Solr Obtains a Searcher Instance (Repost)</title><link>http://www.blogjava.net/conans/articles/380686.html</link><dc:creator>CONAN</dc:creator><author>CONAN</author><pubDate>Wed, 13 Jun 2012 06:17:00 GMT</pubDate><guid>http://www.blogjava.net/conans/articles/380686.html</guid><description><![CDATA[<strong>Each search request holds a reference to a searcher instead of creating a new searcher, and releases that reference when it is done</strong>.<br /><br />During initialization, Solr's core class SolrCore does a lot of setup work, including reading the contents of solrconfig.xml. The code is as follows:<br /><pre>
booleanQueryMaxClauseCount();  // set the maximum number of clauses in a boolean query
initListeners();               // read the searcher event listeners from the configuration
initDeletionPolicy();
initIndex();
initWriters();
initQParsers();
initValueSourceParsers();
this.searchComponents = loadSearchComponents();
// Processors initialized before the handlers
updateProcessorChains = loadUpdateProcessorChains();
reqHandlers = new RequestHandlers(this);
reqHandlers.initHandlersFromConfig( solrConfig );
highlighter = initHighLighter();
// Handle things that should eventually go away
initDeprecatedSupport();
</pre><br />loadSearchComponents loads the search components; the searcher instance itself is obtained through getSearcher, described in detail below.<br />getSearcher(forceNew, returnSearcher, waitSearcher)<br />Three places in Solr call getSearcher: SolrCore initialization (false, false, null), QueryComponent handling a query request (false, true, null), and UpdateHandler handling a commit request (true, false, new Future[1]).<br />---------<br />1. During SolrCore initialization<br />Using the IndexReaderFactory and DirectoryFactory configured in solrconfig.xml, Solr obtains an IndexReader for the index, wraps that reader in a SolrIndexReader, builds a SolrIndexSearcher on the SolrIndexReader, and wraps the searcher in a RefCounted. RefCounted is the searcher's reference counter: the count goes up by one when a search component acquires the searcher and down by one when the component releases it, and when the count reaches 0 the reference is removed from the core's list of searchers currently in use and searcher.close is called to free its resources. The new reference is added to that list of in-use searchers. If firstSearcherListeners is not empty, these listeners are called back; the callbacks are handed to the core's newSingleThreadExecutor. One more task is then added to that executor: make this RefCounted the core's current (newest) searcher reference. Finally getSearcher returns null, because returnSearcher=false.<br />The main purpose of doing this while SolrCore initializes is to have the IndexSearcher loaded in advance, so that search requests can be answered immediately instead of waiting for an IndexSearcher to load.<br />---------<br />2. When QueryComponent handles a query request<br />The core's current searcher reference is not null, the request does not force a new searcher, and returnSearcher=true, so getSearcher directly returns the core's current searcher reference after incrementing its count.<br />There is also a code path for when the current searcher reference is null, but I have not found any situation that leads to it, so I will not go into it.<br />---------<br />3. When UpdateHandler handles a commit request<br />First, getSearcher takes the newest searcher from the core's list of in-use searchers. It also loads the index.properties file in the index directory (if it exists) and reads the value of the key 'index', which says where the index is currently stored. If that directory is the same as the one used by the newest searcher and solrConfig.reopenReaders is true, it calls searcher.reader.reopen to get the latest reader and wraps it in a searcher; otherwise it opens a reader directly with IndexReader.open.<br />Once the searcher is obtained, the next step (wrapping it in a RefCounted and adding it to the searchers list) is the same as during core initialization. Then, if solrConfig.useColdSearcher is true and the current searcher reference is null (a state in which requests from QueryComponent block; I have not yet found what makes the reference null), the new searcher is immediately made the core's current searcher reference, so that the blocked QueryComponent requests can take it and return. At that point, however, the new searcher has not been warmed with its predecessor's caches, and as a result it will not be warmed at all.<br />If solrConfig.useColdSearcher is false, a warming task is submitted to the executor.<br />If newSearcherListeners is not empty, these listeners are called back, again as tasks on the executor.<br />Finally, if the new searcher has not already been made the core's current searcher reference, a task that does this is submitted to the executor.<br />getSearcher then returns null, because returnSearcher=false.<br /><br />From: <a href="http://blog.sina.com.cn/s/blog_56fd58ab0100v3tp.html">http://blog.sina.com.cn/s/blog_56fd58ab0100v3tp.html</a><img src ="http://www.blogjava.net/conans/aggbug/380686.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/conans/" target="_blank">CONAN</a> 2012-06-13 14:17 <a href="http://www.blogjava.net/conans/articles/380686.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Solr Performance Tuning: NO_NORMS (Repost)</title><link>http://www.blogjava.net/conans/articles/380685.html</link><dc:creator>CONAN</dc:creator><author>CONAN</author><pubDate>Wed, 13 Jun 2012 06:16:00 GMT</pubDate><guid>http://www.blogjava.net/conans/articles/380685.html</guid><description><![CDATA[<div id="sina_keyword_ad_area2" class="articalContent  ">
<h3>indexed fields</h3>
<p>The number of indexed fields affects the following:</p>
<ul><li>memory usage during indexing</li><li>segment merge time</li><li>optimization time</li><li>index size</li></ul>
<p>Setting omitNorms="true" on fields that do not need length normalization or index-time boosts reduces the impact of a growing number of indexed fields (in Lucene versions of this era, norms cost one byte per document for every field that has them).</p>
<h3>stored fields</h3>
<p>Retrieving stored fields does have a cost, and it depends heavily on the number of bytes stored per document. The more space each document takes, the more sparsely the documents are spread on disk, so reading them requires more I/O operations. (This usually matters when storing large fields, such as the full text of an article.)</p>
<p>Consider storing large fields outside Solr. If that feels awkward, consider compressed fields instead: they add CPU work when fields are stored and read, but they reduce I/O.</p>
<p>If you do not always use the stored fields, lazy loading of stored fields (enableLazyFieldLoading in solrconfig.xml) can save a lot of work, especially with compressed fields.</p>
<h2>Configuration Considerations</h2>
<h3>mergeFactor</h3>
<p>mergeFactor is the merge factor; it roughly determines the number of segments.</p>
<p>It tells Lucene when, and how many, segments to merge into one. The merge factor works like the base of a number system.</p>
<p>For example, with mergeFactor set to 10, a new segment is created for every 1,000 documents added to the index (1,000 being the maxBufferedDocs value in this example). When the 10th segment of 1,000 documents is added, the ten are merged into a single segment of 10,000. When ten segments of 10,000 exist, they are merged into one segment of 100,000, and so on.</p>
<p>This value is set in the *mainIndex* section of solrconfig.xml (the value in indexDefaults can be ignored).</p>
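<p>The "number system" analogy can be made concrete with a small simulation. This is a toy model, not Lucene's actual merge policy: it assumes every flush writes a segment of exactly 1,000 documents (the maxBufferedDocs value in the example) and that the newest mergeFactor equal-sized segments are always merged into one. The class and method names are hypothetical.</p>
<pre>
```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the merge behaviour described above (not Lucene's real merge policy):
// every flush writes a segment of exactly bufferedDocs documents, and whenever the
// newest mergeFactor segments have the same size they are merged into one segment.
class MergeFactorSim {

    // Returns the segment sizes (in documents) after indexing totalDocs documents.
    // Documents beyond the last full buffer are treated as not yet flushed.
    static List<Integer> segmentsAfter(int totalDocs, int bufferedDocs, int mergeFactor) {
        if (mergeFactor < 2) throw new IllegalArgumentException("mergeFactor must be at least 2");
        List<Integer> segments = new ArrayList<>();
        for (int flushed = bufferedDocs; flushed <= totalDocs; flushed += bufferedDocs) {
            segments.add(bufferedDocs); // a new small segment is written
            // Cascade like a carry in base-mergeFactor arithmetic.
            while (newestAreEqual(segments, mergeFactor)) {
                int size = segments.get(segments.size() - 1);
                for (int i = 0; i < mergeFactor; i++) {
                    segments.remove(segments.size() - 1); // remove by index
                }
                segments.add(size * mergeFactor);
            }
        }
        return segments;
    }

    // True if there are at least n segments and the newest n all have the same size.
    private static boolean newestAreEqual(List<Integer> segments, int n) {
        if (segments.size() < n) return false;
        int newest = segments.get(segments.size() - 1);
        for (int i = segments.size() - n; i < segments.size(); i++) {
            if (segments.get(i) != newest) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // 25 flushes with mergeFactor 10: like the decimal digits of 25.
        System.out.println(segmentsAfter(25_000, 1_000, 10)); // [10000, 10000, 1000, 1000, 1000, 1000, 1000]
        // The same documents with mergeFactor 2: like the binary digits of 25.
        System.out.println(segmentsAfter(25_000, 1_000, 2));  // [16000, 8000, 1000]
    }
}
```
</pre>
<p>Under these assumptions a lower mergeFactor leaves fewer segments (three instead of seven here) but merges far more often along the way, which is the tradeoff discussed next.</p>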
<h3>mergeFactor Tradeoffs</h3>
<p>A higher mergeFactor:</p>
<ul><li>speeds up indexing</li><li>merges less often, which leaves more index files and slows down searching</li></ul>
<p>A lower mergeFactor:</p>
<ul><li>leaves fewer index files, which speeds up searching</li><li>merges more often, which slows down indexing</li></ul>
<h2>Cache autoWarm Count Considerations</h2>
<p>When a new searcher is opened, its caches can be prepopulated, or "autowarmed", with data from the old searcher's caches. autowarmCount is the number of objects copied from the old cache into the new one, so it determines how long autowarming takes. This is a tradeoff between searcher startup time and how warm the caches are: the warmer the caches, the longer warming takes, yet we usually do not want a long searcher startup. autowarmCount is set in solrconfig.xml.</p>
<p>See the Solr wiki for configuration details.</p>
<h2>Cache hit rate</h2>
<p>Cache statistics can be viewed in the Solr admin interface. Increasing the size of Solr's caches is often a shortcut to better performance. When you use faceted search, keep an eye on the filterCache, a cache implemented by Solr itself.</p>
<h2>Explicit Warming of Sort Fields</h2>
<p>If you sort on many fields, you can add queries that explicitly sort on those fields to the "newSearcher" and "firstSearcher" event listeners, so that the FieldCache gets populated for them.</p>
<h2>Optimization Considerations</h2>
<p>Optimizing the index is something we do often; for example, once an index has been built and will not change any more, we optimize it once.</p>
<p>If your index changes frequently, however, you need to weigh the following factors carefully:</p>
<ul><li>As more and more segments are added to the index, query performance degrades. Lucene puts an upper limit on the number of segments; once it is exceeded, segments are merged automatically.</li><li>With no caching in either case, an unoptimized index performs about 10% worse than an optimized one…</li><li>Autowarming takes longer, because it depends on search speed.</li><li>Optimizing affects index distribution.</li><li>During optimization the index can take up to twice its normal size on disk, but it eventually returns to its original size, or a little smaller.</li></ul>
<p>Optimizing merges all segments into a single segment, so it can help avoid the "too many files" error, which is raised by the file system.</p>
<h2>Updates and Commit Frequency Tradeoffs</h2>
<p>If slaves update from the master too often, slave performance suffers. To avoid this, we need to understand how a slave performs updates, so that we can tune the related parameters (commit frequency, snappullers, autowarming/autowarmCount) precisely and keep slave updates from happening too often.</p>
<ol><li>A commit makes Solr generate a new snapshot. If the postCommit parameter is set to true, an optimization also produces a snapshot.</li><li>The snappuller program on a slave usually runs from crontab. It asks the master whether a newer snapshot exists; once it finds one, the slave downloads it and then runs snapinstaller.</li><li>Each time a new searcher is opened, its caches go through warming, and only after warming is the new index put into service.</li></ol>
<p>Three related parameters are discussed here:</p>
<ul><li><strong>number/frequency of snapshots</strong>: how often snapshots are taken.</li><li><strong>snappullers</strong> run from crontab, so they can run every second, once a day, or at any other interval. Each run downloads only the newest snapshot that the slave does not already have.</li><li><strong>Cache autowarming</strong> is configured in solrconfig.xml.</li></ul>
<p>If you want the slave's index to update frequently, so that it looks closer to "real-time" indexing, take snapshots as often as possible and run snappuller frequently as well. This way we may be able to update every 5 minutes and still get good performance. The cache hit rate matters a great deal here, and cache warming time also limits how frequently you can update.</p>
<p>Caches are important for performance. On one hand, a new cache must hold enough entries for subsequent queries to benefit from it. On the other hand, warming a cache can take a long time, especially since it uses only one thread and one CPU. If snapinstaller runs too often, the Solr slave ends up in a bad state: it may still be warming one new cache when an even newer searcher is opened.</p>
<p>How can this be solved? We might cancel the first searcher and move on to the newer, second one. But the third may arrive before the second is ever used: a vicious cycle. It can also happen that a new round of warming starts right after the previous one finishes, so the caches never deliver any benefit at all. When this happens, the real fix is to take snapshots less often.</p>
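<p>The vicious cycle can be illustrated with a toy timeline. This is a simplified model, assuming (as the text says) that warming runs on a single thread, that a new snapshot is installed every intervalSec seconds, that each new searcher needs warmSec seconds of warming, and that superseded searchers are not cancelled. The class and method names are hypothetical.</p>
<pre>
```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the warming backlog: snapshots are installed every intervalSec
// seconds, each new searcher needs warmSec seconds of warming, and a single
// warming thread handles the searchers one after another (none are cancelled).
class WarmingBacklog {

    // For each of the first n installs: seconds between the snapshot being
    // installed and its searcher finishing warming (i.e. becoming usable).
    static List<Long> readyDelays(long intervalSec, long warmSec, int n) {
        List<Long> delays = new ArrayList<>();
        long warmerFreeAt = 0; // time at which the single warming thread becomes idle
        for (int k = 0; k < n; k++) {
            long installedAt = k * intervalSec;
            long start = Math.max(installedAt, warmerFreeAt); // wait for earlier warming
            long readyAt = start + warmSec;
            warmerFreeAt = readyAt;
            delays.add(readyAt - installedAt);
        }
        return delays;
    }

    public static void main(String[] args) {
        // Snapshots every 5 minutes, 60 s of warming: each searcher is usable after 60 s.
        System.out.println(readyDelays(300, 60, 4));  // [60, 60, 60, 60]
        // Snapshots every 5 minutes, 400 s of warming: each searcher waits longer than the last.
        System.out.println(readyDelays(300, 400, 4)); // [400, 500, 600, 700]
    }
}
```
</pre>
<p>In this model, once warming takes longer than the snapshot interval, the delay grows by (warmSec - intervalSec) with every snapshot and never recovers, which is why lowering the snapshot frequency is the fix.</p>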
<h2>Query Response Compression</h2>
<p>In some cases it is worth compressing Solr's XML response before it is sent. A very large response can hit the NIC's I/O limit.</p>
<p>Compression does add CPU load, and since Solr is a typically CPU-bound service, adding compression will reduce query performance. The compressed data, however, is about one sixth the size of the uncompressed data, while Solr's query performance drops by roughly 15%.</p>
<p>How to enable this depends on the server Solr runs in; consult its documentation.</p>
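<p>To see why compression helps when the NIC is the bottleneck, here is a stand-alone sketch that gzips a synthetic, repetitive XML body using only the JDK. The response format and the class name are made up, the ratio depends entirely on the data (it does not reproduce the one-sixth figure above), and in a real deployment the server, not application code, does the compressing.</p>
<pre>
```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Illustration only: gzip a synthetic body shaped loosely like a Solr XML response.
class ResponseCompressionDemo {

    static byte[] gzip(byte[] input) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray(); // valid after close(): closing finishes the gzip stream
    }

    // Hypothetical response body with `docs` documents.
    static String fakeResponse(int docs) {
        StringBuilder sb = new StringBuilder("<response><result numFound=\"").append(docs).append("\">");
        for (int i = 0; i < docs; i++) {
            sb.append("<doc><str name=\"id\">").append(i).append("</str>")
              .append("<str name=\"name\">product ").append(i).append("</str></doc>");
        }
        return sb.append("</result></response>").toString();
    }

    public static void main(String[] args) {
        byte[] raw = fakeResponse(1000).getBytes(StandardCharsets.UTF_8);
        byte[] compressed = gzip(raw);
        System.out.printf("raw=%d bytes, gzip=%d bytes, ratio=%.1f%n",
                raw.length, compressed.length, (double) raw.length / compressed.length);
    }
}
```
</pre>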
<h2>Embedded vs HTTP Post</h2>
<p>Indexing with embedded Solr is about 50% faster than indexing by posting XML.</p>
<h2>RAM Usage Considerations</h2>
<h3>OutOfMemoryErrors</h3>
<p>If your Solr instance is not given enough memory, the Java virtual machine may throw an OutOfMemoryError. This does not affect the index data, but while it is happening, no adds/deletes/commits can succeed.</p>
<h3>Memory allocated to the Java VM</h3>
<p>The simplest fix, provided the Java virtual machine is not already using all of the machine's memory, is to give the JVM running Solr more memory (a larger maximum heap, set with the -Xmx option).</p>
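<p>To confirm that a larger heap setting actually took effect, code running in the same JVM can read Runtime.maxMemory(). The sketch below is a hypothetical stand-alone check; started on its own it reports only its own JVM, so it is only meaningful when launched with the same flags as the server, or adapted into code that runs inside it.</p>
<pre>
```java
// Reports the heap limit the JVM will try to respect (affected by -Xmx) and the
// heap currently in use. Class and method names are hypothetical.
class HeapCheck {

    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
        System.out.println("max heap: " + maxHeapMb() + " MB, currently used: " + usedMb + " MB");
    }
}
```
</pre>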
<h4>Factors affecting memory usage</h4>
<p>You may also want to reduce Solr's memory usage.</p>
<p>One factor is the size of the input documents.</p>
<p>When documents are added via XML, there are two constraints:</p>
<ul><li>All fields of a document are held in memory; the maxFieldLength setting in solrconfig.xml, which caps the number of terms indexed per field, may help.</li><li>Every additional field also increases memory usage.</li></ul></div><img src ="http://www.blogjava.net/conans/aggbug/380685.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/conans/" target="_blank">CONAN</a> 2012-06-13 14:16 <a href="http://www.blogjava.net/conans/articles/380685.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Introduction to and Analysis of Solr Caches (Repost)</title><link>http://www.blogjava.net/conans/articles/380684.html</link><dc:creator>CONAN</dc:creator><author>CONAN</author><pubDate>Wed, 13 Jun 2012 06:12:00 GMT</pubDate><guid>http://www.blogjava.net/conans/articles/380684.html</guid><description><![CDATA[<p>This article introduces the caches used in Solr queries and how they are implemented. The core class for Solr queries is SolrIndexSearcher. At any given moment, each core normally has only its current SolrIndexSearcher serving the upper-level handlers (two may serve at once while SolrIndexSearchers are being switched).</p>
<p>Solr's caches are attached to a SolrIndexSearcher: they live as long as the SolrIndexSearcher does, and when it goes away, its caches are cleared and closed.</p>
<p>Solr's application caches include filterCache, queryResultCache, and documentCache. All of them are implementations of SolrCache and member variables of SolrIndexSearcher, each with its own logic and purpose; they are introduced and analyzed one by one below.</p>
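<h2>1. SolrCache Implementations</h2>
<p>Solr provides two implementations of the SolrCache interface: solr.search.LRUCache and solr.search.FastLRUCache.</p>
<p>FastLRUCache was introduced in version 1.4 and is generally faster than LRUCache.<br />The main methods of the SolrCache interface are annotated below:</p>
<div align="left"><pre>public interface SolrCache {
    public Object init(Map args, Object persistence, CacheRegenerator regenerator); // set up from solrconfig.xml args; the regenerator is used for autowarming
    public int size();                                  // number of entries currently cached
    public Object put(Object key, Object value);        // add an entry
    public Object get(Object key);                      // look up an entry (null on a miss)
    public void clear();                                // remove all entries
    void warm(SolrIndexSearcher searcher, SolrCache old) throws IOException; // autowarm this new cache from the old one
    public void close();                                // release resources when the searcher closes
}</pre>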
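<h3>1.1 solr.search.LRUCache</h3>LRUCache has the following configurable parameters:<br />
<p>1) size: the maximum number of entries the cache can hold; the default is 1024.<br />2) initialSize: the initial capacity of the cache; the default is 1024.<br />3) autowarmCount: when SolrIndexSearchers are switched, the newly created SolrIndexSearcher can be autowarmed (pre-populated). autowarmCount is the number of entries taken from the old SolrIndexSearcher's cache to be regenerated in the new one; how they are regenerated is implemented by a CacheRegenerator. In the current version, Solr 1.4, autowarmCount can only be a number of entries; the upcoming 4.0 release is expected to accept a percentage of the existing cache entries, to better balance the cost and benefit of autowarming. If this parameter is omitted, no autowarming is done.</p>
<p>In terms of implementation, LRUCache simply caches data in a LinkedHashMap, with initialSize as the map's initial capacity and size as its upper bound; eviction uses LinkedHashMap's built-in LRU ordering. Every read and write takes a global lock on the map, so concurrency is somewhat poor.</p>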
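<p>A minimal sketch of this design, using only the JDK: an access-ordered LinkedHashMap capped at a maximum size, with every operation synchronized on the cache. The warm method mimics autowarmCount by regenerating the old cache's most recently used keys through a caller-supplied function that stands in for a CacheRegenerator. The class and method names are hypothetical; this is not Solr's LRUCache source.</p>
<pre>
```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// LRU cache in the spirit of the description above: one global lock, LinkedHashMap's LRU order.
class SimpleLruCache<K, V> {
    private final LinkedHashMap<K, V> map;

    SimpleLruCache(int initialSize, int maxSize) {
        // accessOrder=true: iteration runs from least to most recently used.
        this.map = new LinkedHashMap<K, V>(initialSize, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxSize; // evict the LRU entry once the cap is exceeded
            }
        };
    }

    synchronized V get(K key) { return map.get(key); }

    synchronized V put(K key, V value) { return map.put(key, value); }

    synchronized List<K> keysLruFirst() { return new ArrayList<>(map.keySet()); }

    // Autowarm: regenerate up to autowarmCount of the old cache's most recently used keys.
    void warm(SimpleLruCache<K, V> old, int autowarmCount, Function<K, V> regenerator) {
        List<K> keys = old.keysLruFirst();
        int from = Math.max(0, keys.size() - autowarmCount);
        for (K key : keys.subList(from, keys.size())) {
            put(key, regenerator.apply(key)); // inserted LRU-first, so relative recency is kept
        }
    }

    public static void main(String[] args) {
        SimpleLruCache<String, String> old = new SimpleLruCache<>(4, 3);
        old.put("a", "old-a");
        old.put("b", "old-b");
        old.put("c", "old-c");
        old.get("a");          // "a" becomes the most recently used entry
        old.put("d", "old-d"); // over the cap: "b", the least recently used, is evicted
        System.out.println(old.keysLruFirst()); // [c, a, d]

        SimpleLruCache<String, String> fresh = new SimpleLruCache<>(4, 3);
        fresh.warm(old, 2, key -> "new-" + key); // hypothetical regenerator ("re-run" the key)
        System.out.println(fresh.keysLruFirst()); // [a, d]
        System.out.println(fresh.get("a"));       // new-a
    }
}
```
</pre>
<p>The single lock is what makes this simple and correct, and also what limits concurrency; FastLRUCache, described next, avoids it.</p>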
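<h3>1.2 solr.search.FastLRUCache</h3>In terms of configuration, FastLRUCache takes the same parameters as LRUCache and can optionally take the following ones:<br />
<p>1) minSize: when the cache reaches its maximum size, eviction shrinks it to minSize; the default is 0.9*size.<br />2) acceptableSize: eviction aims to bring the cache down to minSize, but if it cannot, it settles for acceptableSize; the default is 0.95*size.<br />3) cleanupThread: whereas LRUCache evicts synchronously inside put, FastLRUCache can hand eviction to a separate thread, which is what configuring cleanupThread does. When the cache is very large, each eviction pass can take a long time, which is a poor fit for threads serving queries, so a dedicated background thread becomes necessary.</p>
<p>Internally, FastLRUCache stores its data in a ConcurrentLRUCache, which is a ConcurrentHashMap with an LRU eviction policy added on top, so its concurrency is much better. This is a very typical implementation for Java caches.</p>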
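<p>The idea can be sketched with JDK classes only: a ConcurrentHashMap whose entries carry a "last accessed" stamp, so reads take no global lock, plus an eviction pass that, once the size exceeds the maximum, removes the least recently used entries until the size is back at minSize (0.9 * size, the default stated above). This is a hypothetical simplification, not Solr's ConcurrentLRUCache; acceptableSize and the optional cleanupThread are left out.</p>
<pre>
```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Lock-free reads with stamped entries; eviction down to minSize when over maxSize.
class SketchConcurrentLru<K, V> {
    private static final class CacheEntry<V> {
        final V value;
        volatile long lastAccessed;
        CacheEntry(V value, long stamp) { this.value = value; this.lastAccessed = stamp; }
    }

    // Snapshot of an entry's stamp, so sorting is not disturbed by concurrent get() calls.
    private static final class Candidate<K, V> {
        final K key;
        final CacheEntry<V> entry;
        final long stamp;
        Candidate(K key, CacheEntry<V> entry) { this.key = key; this.entry = entry; this.stamp = entry.lastAccessed; }
    }

    private final ConcurrentHashMap<K, CacheEntry<V>> map = new ConcurrentHashMap<>();
    private final AtomicLong clock = new AtomicLong();
    private final Object cleanupLock = new Object();
    private final int maxSize;
    private final int minSize;

    SketchConcurrentLru(int maxSize) {
        this.maxSize = maxSize;
        this.minSize = maxSize * 9 / 10; // 0.9 * size
    }

    V get(K key) {
        CacheEntry<V> e = map.get(key);
        if (e == null) return null;
        e.lastAccessed = clock.incrementAndGet(); // recency update without a global lock
        return e.value;
    }

    void put(K key, V value) {
        map.put(key, new CacheEntry<>(value, clock.incrementAndGet()));
        if (map.size() > maxSize) {
            evictToMinSize(); // run by the caller here; FastLRUCache can use a cleanupThread instead
        }
    }

    private void evictToMinSize() {
        synchronized (cleanupLock) { // only one thread evicts at a time
            if (map.size() <= maxSize) return; // another thread already cleaned up
            List<Candidate<K, V>> candidates = new ArrayList<>();
            map.forEach((k, e) -> candidates.add(new Candidate<>(k, e)));
            candidates.sort(Comparator.comparingLong(c -> c.stamp)); // oldest first
            int toRemove = map.size() - minSize;
            for (int i = 0; i < toRemove && i < candidates.size(); i++) {
                Candidate<K, V> c = candidates.get(i);
                map.remove(c.key, c.entry); // no-op if the key was re-put meanwhile
            }
        }
    }

    int size() { return map.size(); }

    boolean containsKey(K key) { return map.containsKey(key); }

    public static void main(String[] args) {
        SketchConcurrentLru<Integer, String> cache = new SketchConcurrentLru<>(10);
        for (int i = 0; i < 10; i++) cache.put(i, "v" + i);
        cache.get(0);         // key 0 is now the most recently used
        cache.put(10, "v10"); // 11 > 10 entries: evict down to minSize = 9
        System.out.println(cache.size() + " " + cache.containsKey(0) + " "
                + cache.containsKey(1) + " " + cache.containsKey(2)); // 9 true false false
    }
}
```
</pre>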
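<h2>2. filterCache</h2>filterCache stores unordered sets of Lucene document ids. It has three uses:<br />
<p>1) filterCache stores the document id sets produced by filter queries (the "fq" parameter). Solr has two kinds of query parameters, q and fq. If fq is present, Solr evaluates the fq first (there can be several fq's, and their results are intersected) and then intersects the fq result with the q result, so an fq only ever narrows the results of q. In this process, filterCache is a cache whose key is a single fq (of type Query) and whose value is the set of matching document ids (of type DocSet). filterCache is especially valuable when fq is a range query.<br />2) filterCache is also used for facet queries (http://wiki.apache.org/solr/SolrFacetingOverview): each facet count is computed by processing the set of document ids that satisfy the query (which may involve filterCache). Because computing the facet counts may touch every doc id, filterCache has to be large enough to accommodate the number of documents in the index.<br />3) If &lt;useFilterForSortedQuery/&gt; is configured in solrconfig.xml, filterCache is used when a query has a filter (here the filter is a DocSet to filter by, not an fq; I have not seen much use for it).</p>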
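<p>A hypothetical, JDK-only sketch of uses 1) and 2): each filter query maps to the unordered set of matching doc ids (a BitSet stands in for Solr's DocSet), several fq's are intersected with each other and with the main query's matches, and the same cached sets drive facet counts. Solr keys this cache by the parsed Query; a String key, a made-up six-document index, and the class name are assumptions for illustration.</p>
<pre>
```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Filter cache sketch: query string -> unordered set of matching doc ids.
class FilterCacheSketch {
    private final Map<String, BitSet> filterCache = new HashMap<>();
    private final Function<String, BitSet> executeQuery; // stands in for searching the index
    int misses = 0;

    FilterCacheSketch(Function<String, BitSet> executeQuery) { this.executeQuery = executeQuery; }

    BitSet docSet(String query) {
        return filterCache.computeIfAbsent(query, q -> { misses++; return executeQuery.apply(q); });
    }

    // q's matches narrowed by every fq: intersection, never union.
    BitSet search(BitSet queryMatches, List<String> filterQueries) {
        BitSet result = (BitSet) queryMatches.clone();
        for (String fq : filterQueries) result.and(docSet(fq));
        return result;
    }

    // For each facet value query, count how many of `matches` it also matches.
    Map<String, Integer> facetCounts(BitSet matches, List<String> valueQueries) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String vq : valueQueries) {
            BitSet hits = (BitSet) matches.clone();
            hits.and(docSet(vq));
            counts.put(vq, hits.cardinality());
        }
        return counts;
    }

    static BitSet bits(int... docIds) {
        BitSet b = new BitSet();
        for (int id : docIds) b.set(id);
        return b;
    }

    // A made-up six-document index.
    static FilterCacheSketch toyIndex() {
        Map<String, BitSet> postings = Map.of(
                "inStock:true", bits(0, 1, 2, 4),
                "price:[0 TO 100]", bits(1, 2, 3, 4),
                "manu:canon", bits(0, 1, 2),
                "manu:sony", bits(3, 4, 5));
        return new FilterCacheSketch(q -> (BitSet) postings.getOrDefault(q, new BitSet()).clone());
    }

    public static void main(String[] args) {
        FilterCacheSketch fc = toyIndex();
        BitSet all = bits(0, 1, 2, 3, 4, 5); // q=*:*
        List<String> fqs = List.of("inStock:true", "price:[0 TO 100]");
        BitSet result = fc.search(all, fqs);
        System.out.println(result); // {1, 2, 4}
        System.out.println(fc.facetCounts(result, List.of("manu:canon", "manu:sony"))); // {manu:canon=2, manu:sony=1}
        fc.search(all, fqs);                        // same fq's again: served from the cache
        System.out.println("misses: " + fc.misses); // misses: 4
    }
}
```
</pre>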
<p>Here is an example filterCache configuration:</p><pre>&lt;!-- Internal cache used by SolrIndexSearcher for filters (DocSets),
     unordered sets of *all* documents that match a query.
     When a new searcher is opened, its caches may be prepopulated
     or "autowarmed" using data from caches in the old searcher.
     autowarmCount is the number of items to prepopulate.
     For LRUCache, the prepopulated items will be the most recently accessed items. --&gt;
&lt;filterCache class="solr.LRUCache" size="16384" initialSize="4096"/&gt;
</pre><br />
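<p>Whether to use filterCache and how large to make it should be judged from the application's characteristics, statistics, observed results, and experience.</p>
<p>For applications that use fq and facets, tuning filterCache is well worth it.</p>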
<h2>3. queryResultCache</h2>As the name suggests, queryResultCache caches query results; what it stores is a set of document ids, namely the fully ordered result list for a query. Here is an example configuration:<br /><pre>&lt;!-- queryResultCache caches results of searches - ordered lists of
     document ids (DocList) based on a query, a sort, and the range
     of documents requested. --&gt;
&lt;queryResultCache class="solr.LRUCache" size="16384" initialSize="4096"/&gt;
</pre>What does the cache key look like? It is the class below (the key's hashCode is QueryResultKey's member variable hc):<br /><pre>public QueryResultKey(Query query, List&lt;Query&gt; filters, Sort sort, int nc_flags) {
    this.query = query;
    this.sort = sort;
    this.filters = filters;
    this.nc_flags = nc_flags;

    int h = query.hashCode();
    if (filters != null) h ^= filters.hashCode();

    sfields = (this.sort != null) ? this.sort.getSort() : defaultSort;
    for (SortField sf : sfields) {
        // mix the bits so that sortFields are position dependent
        // so that a,b won't hash to the same value as b,a
        h ^= (h &lt;&lt; 8) | (h &gt;&gt;&gt; 25);   // reversible hash

        if (sf.getField() != null) h += sf.getField().hashCode();
        h += sf.getType();
        if (sf.getReverse()) h = ~h;
        if (sf.getLocale() != null) h += sf.getLocale().hashCode();
        if (sf.getFactory() != null) h += sf.getFactory().hashCode();
    }

    hc = h;
}
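</pre>Because queries carry start and rows parameters, a QueryResultKey may hit the cache while the requested start/rows range lies outside the cached document id set. A larger cached id set raises the chance of a hit but wastes memory, so the queryResultWindowSize parameter sets the size of the cached document id set. Its default in Solr is 50, and it is configurable; the wiki explains it simply and clearly:<br /><pre>&lt;!-- An optimization for use with the queryResultCache. When a search
     is requested, a superset of the requested number of document ids
     are collected. For example, if a search for a particular query
     requests matching documents 10 through 19, and queryWindowSize is 50,
     then documents 0 through 50 will be collected and cached. Any further
     requests in that range can be satisfied via the cache. --&gt;
&lt;queryResultWindowSize&gt;50&lt;/queryResultWindowSize&gt;
</pre>
<p>Compared with filterCache, queryResultCache uses less memory, but how much it helps is hard to say. As for index data, we usually store only the application's primary-key id in the index and fetch the other required fields from a database or other data source. The query process then becomes: get the document id set from Solr, get the application ids from Solr, and finally fetch the complete results from the external data source. If the results do not have to be strictly up to date, the complete query results can be cached outside Solr (expiring periodically), in which case queryResultCache is not really necessary; otherwise consider using queryResultCache. And if you find that queries rarely repeat within a queryResultCache's lifetime, there is little point in enabling it.</p>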
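<p>Two details of queryResultCache can be checked with a stand-alone model: the position-dependent key hash, modelled on the QueryResultKey constructor above (plain strings replace Lucene's Query and SortField, and only the field name and reverse flag are hashed), and the window rounding, assumed here to round the requested end (start + rows) up to a multiple of queryResultWindowSize, with collection starting at 0. The class, nested type, and method names are hypothetical.</p>
<pre>
```java
import java.util.List;

// Model of QueryResultKey hashing and queryResultWindowSize rounding (not Solr's code).
class QueryResultCacheSketch {

    static final class SortSpec {
        final String field;
        final boolean reverse;
        SortSpec(String field, boolean reverse) { this.field = field; this.reverse = reverse; }
    }

    // Mixing the bits before adding each sort field makes the hash position dependent.
    static int keyHash(String query, List<String> filters, List<SortSpec> sort) {
        int h = query.hashCode();
        if (filters != null) h ^= filters.hashCode();
        for (SortSpec sf : sort) {
            h ^= (h << 8) | (h >>> 25); // reversible bit mix
            if (sf.field != null) h += sf.field.hashCode();
            if (sf.reverse) h = ~h;
        }
        return h;
    }

    // Number of top doc ids to collect and cache for a request of `rows` rows at `start`.
    static int supersetMaxDoc(int start, int rows, int windowSize) {
        int maxDocRequested = start + rows;
        return ((maxDocRequested - 1) / windowSize + 1) * windowSize;
    }

    public static void main(String[] args) {
        SortSpec a = new SortSpec("a", false);
        SortSpec b = new SortSpec("b", false);
        // Same sort fields, different order: different key hashes.
        System.out.println(keyHash("q", null, List.of(a, b)) == keyHash("q", null, List.of(b, a))); // false
        System.out.println(supersetMaxDoc(10, 10, 50)); // 50: rows 10..19 come from the first 50 ids
        System.out.println(supersetMaxDoc(40, 20, 50)); // 100
    }
}
```
</pre>
<p>Without the mixing step, the hash would simply add the field hashes, so sorting by a then b and by b then a would collide on the same key hash.</p>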
<h2>4. documentCache</h2>
<p>Again as the name suggests, documentCache stores &lt;doc_id, document&gt; pairs. If you use documentCache, make it as large as you reasonably can, at least larger than &lt;max_results&gt; * &lt;max_concurrent_queries&gt;; otherwise entries get evicted and a document may have to be fetched again within a single request. Also keep an eye on how many fields each document stores, to avoid heavy memory consumption. Here is an example documentCache configuration:</p>
<pre>&lt;!-- documentCache caches Lucene Document objects (the stored fields for each document). --&gt;
&lt;documentCache class="solr.LRUCache" size="16384" initialSize="16384"/&gt;
</pre>
<h2>5. User/Generic Caches</h2>
<p>Solr supports custom caches; if you want them autowarmed, you only need to implement your own regenerator. Here is a configuration example:</p>
<pre>&lt;!-- Example of a generic cache. These caches may be accessed by name
     through SolrIndexSearcher.getCache(), cacheLookup(), and cacheInsert().
     The purpose is to enable easy caching of user/application level data.
     The regenerator argument should be specified as an implementation
     of solr.search.CacheRegenerator if autowarming is desired. --&gt;
&lt;!--
&lt;cache name="yourCacheNameHere" class="solr.LRUCache" size="4096"
       initialSize="2048" regenerator="org.foo.bar.YourRegenerator"/&gt;
--&gt;
</pre>
<h2>6. The Lucene FieldCache</h2>
<p>Lucene has a lower-level FieldCache that Solr does not manage, so Lucene's FieldCache is still handled by Lucene's IndexSearcher.</p>
<h2>7. autowarm</h2>
<p>autowarm was mentioned above. It is triggered at two points: when the first Searcher is created (firstSearcher), and when a new Searcher (newSearcher) is created to replace the current one. Before a Searcher starts serving requests, each of its caches can be warmed; this happens in the cache's warm step (SolrCache.warm, using the regenerator supplied to init), and each cache warms in its own way.</p>
<p>1) filterCache: filterCache registers the CacheRegenerator below, which re-runs each old key against the index and puts the resulting value into the new cache:</p>
<pre>solrConfig.filterCacheConfig.setRegenerator(new CacheRegenerator() {
    public boolean regenerateItem(SolrIndexSearcher newSearcher, SolrCache newCache,
                                  SolrCache oldCache, Object oldKey, Object oldVal) throws IOException {
        // Re-execute the old filter query on the new searcher; cacheDocSet stores the new DocSet.
        newSearcher.cacheDocSet((Query) oldKey, null, false);
        return true;
    }
});
</pre>
<p>2) queryResultCache: the approach described here does not warm queryResultCache by iterating over the query keys already in it and re-running them; instead it uses the SolrEventListener method void newSearcher(SolrIndexSearcher newSearcher, SolrIndexSearcher currentSearcher), which runs specific queries from the configuration and thereby explicitly warms the Lucene FieldCache. A configuration example:</p>
<pre>&lt;listener event="newSearcher" class="solr.QuerySenderListener"&gt;
  &lt;arr name="queries"&gt;
    &lt;!-- seed common sort fields --&gt;
    &lt;lst&gt;&lt;str name="q"&gt;anything&lt;/str&gt;&lt;str name="sort"&gt;name desc, price desc, popularity desc&lt;/str&gt;&lt;/lst&gt;
  &lt;/arr&gt;
&lt;/listener&gt;
&lt;listener event="firstSearcher" class="solr.QuerySenderListener"&gt;
  &lt;arr name="queries"&gt;
    &lt;!-- seed common sort fields --&gt;
    &lt;lst&gt;&lt;str name="q"&gt;anything&lt;/str&gt;&lt;str name="sort"&gt;name desc, price desc, popularity desc&lt;/str&gt;&lt;/lst&gt;
    &lt;!-- seed common facets and filter queries --&gt;
    &lt;lst&gt;&lt;str name="q"&gt;anything&lt;/str&gt;&lt;str name="facet.field"&gt;category&lt;/str&gt;
         &lt;str name="fq"&gt;inStock:true&lt;/str&gt;&lt;str name="fq"&gt;price:[0 TO 100]&lt;/str&gt;&lt;/lst&gt;
  &lt;/arr&gt;
&lt;/listener&gt;
</pre>
<p>3) documentCache: because the mapping between document ids and indexed documents changes in the new index, documentCache has no warming step and starts out completely empty. Useful as autowarming is, keep its cost in mind: measure the warming overhead in practice, and watch how often Searchers are switched, so that warming and switching do not disrupt normal query service.</p>
<h2>8. References</h2>
<p>http://wiki.apache.org/solr/SolrCaching</p>
</div><img src ="http://www.blogjava.net/conans/aggbug/380684.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/conans/" target="_blank">CONAN</a> 2012-06-13 14:12 <a href="http://www.blogjava.net/conans/articles/380684.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Calling Solr from the Client with SolrJ: Indexing + Paged Queries</title><link>http://www.blogjava.net/conans/articles/379556.html</link><dc:creator>CONAN</dc:creator><author>CONAN</author><pubDate>Wed, 30 May 2012 07:05:00 GMT</pubDate><guid>http://www.blogjava.net/conans/articles/379556.html</guid><description><![CDATA[Summary: "Solr 3.5 Configuration and Usage (Part 1)" covered the detailed configuration of Solr 3.5; this part covers calling Solr from its client. I. Operating the Solr API with SolrJ. Operating Solr with SolrJ is simpler than using httpClient: SolrJ wraps httpClient to call Solr's API. Under the hood SolrJ still...&nbsp;&nbsp;<a href='http://www.blogjava.net/conans/articles/379556.html'>Read more</a><img src ="http://www.blogjava.net/conans/aggbug/379556.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/conans/" target="_blank">CONAN</a> 2012-05-30 15:05 <a href="http://www.blogjava.net/conans/articles/379556.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Solr Facet Queries</title><link>http://www.blogjava.net/conans/articles/379555.html</link><dc:creator>CONAN</dc:creator><author>CONAN</author><pubDate>Wed, 30 May 2012 06:52:00 GMT</pubDate><guid>http://www.blogjava.net/conans/articles/379555.html</guid><description><![CDATA[Solr calls query results organized for navigation facets. Faceting does not modify the query results; it only adds per-category counts to them, and users then refine their query based on those counts. Taobao's search listings, for example, show at the top how many results match in each category.<br /><br />When searching for digital cameras, for instance, the results are broken down along dimensions such as manufacturer and resolution; each of these dimensions is a facet.<br /><br />Under manufacturer there are brands such as Nikon, Canon, and Sony; these are called constraints.<br /><br />Finally, based on the user's selections, the current navigation path is shown; this is called a breadcrumb. 
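<p>To make the idea of facet counts concrete, here is an in-memory sketch of what they compute: a field facet counts matching documents per distinct value (like facet.field=manu, highest count first), and a query facet counts matching documents in a price range (like facet.query=price:[100 TO 200]). Solr computes these over the index, not over Java objects; the Product type, the sample data, and the class name are made up for illustration.</p>
<pre>
```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// In-memory illustration of facet counting over the documents that match a query.
class FacetSketch {
    static final class Product {
        final String manu;
        final double price;
        Product(String manu, double price) { this.manu = manu; this.price = price; }
    }

    // Field facet: matches per distinct value, highest count first, ties by name.
    static Map<String, Long> fieldFacet(List<Product> matches) {
        Map<String, Long> counts = matches.stream()
                .collect(Collectors.groupingBy(p -> p.manu, Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (x, y) -> x, LinkedHashMap::new));
    }

    // Query facet over a price range; bounds are inclusive, as in Solr's [lo TO hi]
    // syntax, and null plays the role of "*" (an open end).
    static long rangeCount(List<Product> matches, Double lo, Double hi) {
        return matches.stream()
                .filter(p -> (lo == null || p.price >= lo) && (hi == null || p.price <= hi))
                .count();
    }

    static List<Product> sample() {
        return List.of(new Product("Canon USA", 150), new Product("Canon USA", 450),
                new Product("Sony", 99), new Product("Sony", 200), new Product("Nikon", 480));
    }

    public static void main(String[] args) {
        List<Product> matches = sample();
        System.out.println(fieldFacet(matches));               // {Canon USA=2, Sony=2, Nikon=1}
        System.out.println(rangeCount(matches, null, 100.0));  // 1
        System.out.println(rangeCount(matches, 100.0, 200.0)); // 2
        System.out.println(rangeCount(matches, 200.0, 300.0)); // 1
        System.out.println(rangeCount(matches, 400.0, 500.0)); // 2
    }
}
```
</pre>
<p>Because both bounds are inclusive, the product priced exactly 200 is counted in both the 100-200 and the 200-300 ranges; adjacent range facets that share a boundary behave the same way.</p>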
<br /><br />solr有几种facet: <br />普通facet, 比如从厂商品牌的维度建立fact <br />查询facet, 比如根据价格查询时, 将根据价格, 设置多个区间, 比如0-10, 10-20, 20-30等 <br />日期facet, 也是一种特殊的范围查询, 比如按照月份进行facet. <br /><br />facet的主要好处就是可以任意对搜索条件进行组合, 避免无效搜索, 改善搜索体验. <br /><br />facet都是在查询时通过参数指定. 比如 <br />在http api中这样写: 
<div style="border-bottom: #cccccc 1px solid; border-left: #cccccc 1px solid; padding-bottom: 4px; background-color: #eeeeee; padding-left: 4px; width: 98%; padding-right: 5px; font-size: 13px; word-break: break-all; border-top: #cccccc 1px solid; border-right: #cccccc 1px solid; padding-top: 4px"><!--<br /><br />Code highlighting produced by Actipro CodeHighlighter (freeware)<br />http://www.CodeHighlighter.com/<br /><br />--><img align="top" src="http://www.blogjava.net/images/OutliningIndicators/None.gif"  alt="" /><span style="color: #000000">"</span><span style="color: #ff0000">&amp;facet</span><span style="color: #000000">=true</span><span style="color: #ff0000">&amp;facet</span><span style="color: #000000">.field=manu"&nbsp;</span></div>java代码这样写： 
<div style="border-bottom: #cccccc 1px solid; border-left: #cccccc 1px solid; padding-bottom: 4px; background-color: #eeeeee; padding-left: 4px; width: 98%; padding-right: 5px; font-size: 13px; word-break: break-all; border-top: #cccccc 1px solid; border-right: #cccccc 1px solid; padding-top: 4px"><!--<br /><br />Code highlighting produced by Actipro CodeHighlighter (freeware)<br />http://www.CodeHighlighter.com/<br /><br />--><span style="color: #0000ff">new</span><span style="color: #000000">&nbsp;SolrQuery(</span><span style="color: #000000">"</span><span style="color: #000000">*:*</span><span style="color: #000000">"</span><span style="color: #000000">).setFacet(</span><span style="color: #0000ff">true</span><span style="color: #000000">).addFacetField(</span><span style="color: #000000">"</span><span style="color: #000000">manu</span><span style="color: #000000">"</span><span style="color: #000000">);</span></div>而xml返回的结果为这样：
<div style="border-bottom: #cccccc 1px solid; border-left: #cccccc 1px solid; padding-bottom: 4px; background-color: #eeeeee; padding-left: 4px; width: 98%; padding-right: 5px; font-size: 13px; word-break: break-all; border-top: #cccccc 1px solid; border-right: #cccccc 1px solid; padding-top: 4px"><!--<br /><br />Code highlighting produced by Actipro CodeHighlighter (freeware)<br />http://www.CodeHighlighter.com/<br /><br />--><span style="color: #0000ff">&lt;</span><span style="color: #800000">lst&nbsp;</span><span style="color: #ff0000">name</span><span style="color: #0000ff">="facet_fields"</span><span style="color: #0000ff">&gt;</span><span style="color: #000000"><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">&lt;</span><span style="color: #800000">lst&nbsp;</span><span style="color: #ff0000">name</span><span style="color: #0000ff">="manu"</span><span style="color: #0000ff">&gt;</span><span style="color: #000000"><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">&lt;</span><span style="color: #800000">int&nbsp;</span><span style="color: #ff0000">name</span><span style="color: #0000ff">="Canon&nbsp;USA"</span><span style="color: #0000ff">&gt;</span><span style="color: #000000">17</span><span style="color: #0000ff">&lt;/</span><span style="color: #800000">int</span><span style="color: #0000ff">&gt;</span><span style="color: #000000"><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">&lt;</span><span style="color: #800000">int&nbsp;</span><span style="color: #ff0000">name</span><span style="color: #0000ff">="Olympus"</span><span style="color: #0000ff">&gt;</span><span style="color: #000000">12</span><span style="color: #0000ff">&lt;/</span><span style="color: #800000">int</span><span style="color: #0000ff">&gt;</span><span style="color: 
#000000"><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">&lt;</span><span style="color: #800000">int&nbsp;</span><span style="color: #ff0000">name</span><span style="color: #0000ff">="Sony"</span><span style="color: #0000ff">&gt;</span><span style="color: #000000">12</span><span style="color: #0000ff">&lt;/</span><span style="color: #800000">int</span><span style="color: #0000ff">&gt;</span><span style="color: #000000"><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">&lt;</span><span style="color: #800000">int&nbsp;</span><span style="color: #ff0000">name</span><span style="color: #0000ff">="Panasonic"</span><span style="color: #0000ff">&gt;</span><span style="color: #000000">9</span><span style="color: #0000ff">&lt;/</span><span style="color: #800000">int</span><span style="color: #0000ff">&gt;</span><span style="color: #000000"><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">&lt;</span><span style="color: #800000">int&nbsp;</span><span style="color: #ff0000">name</span><span style="color: #0000ff">="Nikon"</span><span style="color: #0000ff">&gt;</span><span style="color: #000000">4</span><span style="color: #0000ff">&lt;/</span><span style="color: #800000">int</span><span style="color: #0000ff">&gt;</span><span style="color: #000000"><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">&lt;/</span><span style="color: #800000">lst</span><span style="color: #0000ff">&gt;</span><span style="color: #000000"><br /></span><span style="color: #0000ff">&lt;/</span><span style="color: #800000">lst</span><span style="color: #0000ff">&gt;</span></div>通过java代码可以这样获取facet结果：
<div style="border-bottom: #cccccc 1px solid; border-left: #cccccc 1px solid; padding-bottom: 4px; background-color: #eeeeee; padding-left: 4px; width: 98%; padding-right: 5px; font-size: 13px; word-break: break-all; border-top: #cccccc 1px solid; border-right: #cccccc 1px solid; padding-top: 4px"><!--<br /><br />Code highlighting produced by Actipro CodeHighlighter (freeware)<br />http://www.CodeHighlighter.com/<br /><br />--><span style="color: #000000">List</span><span style="color: #000000">&lt;</span><span style="color: #000000">FacetField</span><span style="color: #000000">&gt;</span><span style="color: #000000">&nbsp;facetFields&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;queryResponse.getFacetFields();</span></div>To add a facet query on top of an existing query, write:
<div style="border-bottom: #cccccc 1px solid; border-left: #cccccc 1px solid; padding-bottom: 4px; background-color: #eeeeee; padding-left: 4px; width: 98%; padding-right: 5px; font-size: 13px; word-break: break-all; border-top: #cccccc 1px solid; border-right: #cccccc 1px solid; padding-top: 4px"><!--<br /><br />Code highlighting produced by Actipro CodeHighlighter (freeware)<br />http://www.CodeHighlighter.com/<br /><br />--><span style="color: #000000">solrQuery.addFacetQuery(</span><span style="color: #000000">"</span><span style="color: #000000">quality:[*&nbsp;TO&nbsp;10]</span><span style="color: #000000">"</span><span style="color: #000000">);</span></div>For example, to facet on price over fixed ranges, append facet parameters like these: <br /><br />&amp;facet=true&amp;facet.query=price:[* TO 100] <br />&amp;facet.query=price:[100 TO 200]&amp;facet.query=price:[200 TO 300] <br />&amp;facet.query=price:[300 TO 400]&amp;facet.query=price:[400 TO 500] <br />&amp;facet.query=price:[500 TO *]<br /><br />To search further within the products priced between 400 and 500, add a filter query (fq): <br />
<div class="quote_title">Quote</div>
<div class="quote_div">http://localhost:8983/solr/select?q=camera&amp;facet=on&amp;facet.field=manu&amp;facet.field=camera_type&amp;fq=price:[400 TO 500]</div><br /><br />Note that price is no longer one of the facet fields. <br /><br />To narrow the results further by camera type, the query becomes: <br />
<div class="quote_title">Quote</div>
<div class="quote_div">http://localhost:8983/solr/select?q=camera&amp;facet=on&amp;facet.field=manu&amp;fq=price:[400 TO 500]&amp;fq=camera_type:SLR <br /></div><br /><br />Typical uses of faceting: <br />1. Category navigation <br />2. Autocomplete, which requires a multi-valued tag field <br />3. Top-keyword rankings, which also require a tag field <br /><br /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/conans/" target="_blank">CONAN</a> 2012-05-30 14:52</div>]]></description></item><item><title>Overview of the New SolrCloud</title><link>http://www.blogjava.net/conans/articles/379553.html</link><dc:creator>CONAN</dc:creator><author>CONAN</author><pubDate>Wed, 30 May 2012 06:47:00 GMT</pubDate><guid>http://www.blogjava.net/conans/articles/379553.html</guid><description><![CDATA[SolrCloud is already usable in the Lucene/Solr SVN trunk and will be officially included in the upcoming 4.0 release. <br /><br />SolrCloud is now mature and supports both distributed indexing and distributed search. Below is the deployment architecture of one of our projects that uses the new SolrCloud: <br />
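<div style="text-align: center"><img src="http://sematext.files.wordpress.com/2012/01/distributedsolr-arch.png"  alt="SolrCloud deployment architecture" /></div><br />Looks simple, doesn't it? Now let's look at some of the implementation details. <br /><br /><strong>SolrCloud Features and Architecture</strong> <br />Here are some of SolrCloud's nice features: <br />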
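<ul><li>Centralized cluster configuration </li><li>Automatic failover </li><li>Near real-time search </li><li>Leader election </li><li>Index durability </li></ul><br />SolrCloud can also be configured with: <ul><li>a sharded index </li><li>one or more replicas per shard </li></ul><br />Multiple shards and replicas together form a Collection (in the figure, that is one SolrCloud), and multiple Collections can be deployed in one SolrCloud cluster. A single search request can query several Collections at once. The workflow is shown in the figure below. <br /><img src="http://sematext.files.wordpress.com/2012/01/distributedsolr-shardsreplicas.png"  alt="SolrCloud shards and replicas" /> <br /><br /><strong>SolrCloud Shard, Replica, Replication</strong> <br />As the figure shows, a new document can be sent to any node in a SolrCloud cluster. SolrCloud automatically determines which shard the document belongs to and, if that shard has replicas, synchronizes the document to them. Unlike the old master/slave setup, which replicated periodically in batches, this synchronization happens in real time. <br /><br /><strong>Cluster Configuration</strong> <br />All configuration for a SolrCloud cluster is stored in ZooKeeper. When a SolrCloud node starts, its configuration is sent to ZooKeeper and stored there. <br /><br />Besides serving as failover backups, shard replicas also spread query requests across nodes, raising the query capacity of the whole cluster. <br /><br /><strong>Indexing</strong> <br />Index updates propagate between shards and replicas automatically and in real time. Because there is no master server, a document can be sent to any node of a SolrCloud (that is, of a Collection), and SolrCloud does the rest. This removes the single point of failure of the old master/slave architecture. <br /><br /><strong>Search Modes</strong> <br />There are four ways to search: <ul><li>on a single Solr instance </li><li>on a single Collection (that is, across all shards of one Collection) </li><li>on specific shards </li><li>across multiple Collections, returning the merged results </li></ul><br /><strong>Administration</strong> <br />In addition to the standard core admin, new operations were added: <ul><li>create a shard in a Collection </li><li>create a new Collection </li><li>add nodes </li></ul><br /><strong>Next Steps</strong> <br />The new SolrCloud design is described at <a href="http://wiki.apache.org/solr/NewSolrCloudDesign">http://wiki.apache.org/solr/NewSolrCloudDesign</a>. 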
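<br /><br />To make the automatic shard selection more concrete, here is a minimal sketch of one way a cluster without a master can pick a shard for a new document: split the 32-bit hash space into equal, contiguous ranges (one per shard) and send the document to the range that contains the hash of its id. This is an illustration under stated assumptions, not SolrCloud's code: the class and method names are hypothetical, String.hashCode() stands in for the MurmurHash3 hash that Solr 4.x applies to the uniqueKey, and replicas and leader election are not modeled.
<pre>
```java
// Hypothetical sketch (not SolrCloud source): route a document to a shard by
// hashing its id into one of numShards equal, contiguous ranges of the int space.
// Assumption: String.hashCode() replaces the MurmurHash3 hash Solr 4.x uses.
public class ShardRouterSketch {

    // Number of distinct 32-bit int values (2^32).
    static final long HASH_SPACE = 4294967296L;

    // Map a 32-bit hash to the index of the range (shard) that contains it.
    static int shardForHash(int hash, int numShards) {
        if (numShards >= 1) {
            // Shift the signed hash into [0, 2^32) so ranges run from MIN_VALUE upward.
            long offset = (long) hash - Integer.MIN_VALUE;
            // offset * numShards fits in a long for any realistic shard count.
            return (int) (offset * numShards / HASH_SPACE);
        }
        throw new IllegalArgumentException("numShards must be positive");
    }

    // Route a document by its unique id (the uniqueKey field in Solr terms).
    static int shardForId(String id, int numShards) {
        return shardForHash(id.hashCode(), numShards);
    }

    public static void main(String[] args) {
        String[] ids = {"doc-1", "doc-2", "doc-3", "doc-4", "doc-5"};
        int numShards = 3;
        for (String id : ids) {
            // Print one line per document in the form "id -> shardN" (1-based).
            System.out.println(id + " -> shard" + (shardForId(id, numShards) + 1));
        }
    }
}
```
</pre>
In this sketch, any node that knows the number of shards can compute the target shard from the id alone, which is one reason a document can be sent to any node; in Solr 4.x the hash range owned by each shard is recorded in the cluster state kept in ZooKeeper.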
<br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/conans/" target="_blank">CONAN</a> 2012-05-30 14:47</div>]]></description></item><item><title>[Translation] Lucene &amp; Solr: 2011 in Review</title><link>http://www.blogjava.net/conans/articles/379552.html</link><dc:creator>CONAN</dc:creator><author>CONAN</author><pubDate>Wed, 30 May 2012 06:44:00 GMT</pubDate><guid>http://www.blogjava.net/conans/articles/379552.html</guid><description><![CDATA[Original article: <a href="http://java.dzone.com/articles/lucene-solr-year-2011-review" target="_blank">http://java.dzone.com/articles/lucene-solr-year-2011-review</a> <br /><br />2011 is behind us, so here is a look back at what happened in the Lucene and Solr world this year: a year-end review of both projects. <br /><br />Lucene has been an Apache Software Foundation project for more than ten years (the code itself is older still), and Solr has been an Apache project for almost six. Both projects owe a great deal to the long-term efforts of Otis (<a href="http://twitter.com/otisg" target="_blank">http://twitter.com/otisg</a> ). <br /><br />This year Solr and Lucene changed dramatically and gained a large number of new features, arguably more than in any previous year. <br /><br />The most exciting of these is near real-time search (NRT, <a href="http://search-lucene.com/?q=NRT" target="_blank">http://search-lucene.com/?q=NRT</a> ): changes to documents show up in search results almost immediately. NRT is still being improved, but many users already rely on it. <br /><br />Field collapsing (<a href="http://wiki.apache.org/solr/FieldCollapsing" target="_blank">http://wiki.apache.org/solr/FieldCollapsing</a> ), a feature the Solr community had wanted for a long time, was also implemented this year. Solr and Lucene users can now group a result set by field value or by query and control how the groups are returned. Facets can also be computed per group (previously only per document). <br /><br />Lucene also gained a faceting module (<a href="https://issues.apache.org/jira/browse/LUCENE-3079" target="_blank">https://issues.apache.org/jira/browse/LUCENE-3079</a> ) this year, so faceting is no longer exclusive to Solr: plain Lucene users can compute facets too. 
<br /><br />Starting this year, you can use the Join module (<a href="http://wiki.apache.org/solr/Join" target="_blank">http://wiki.apache.org/solr/Join</a> ) to index parent-child related documents and join them at query time. <br /><br />In 2011 Solr and Lucene also made major progress in multilingual support (<a href="http://wiki.apache.org/solr/LanguageAnalysis#Stemming" target="_blank">http://wiki.apache.org/solr/LanguageAnalysis#Stemming</a> ): the KStemFilter English stemmer (<a href="http://wiki.apache.org/solr/LanguageAnalysis#Notes_about_solr.KStemFilterFactory" target="_blank">http://wiki.apache.org/solr/LanguageAnalysis#Notes_about_solr.KStemFilterFactory</a> ) was added, Unicode 4 is now fully supported, support for Chinese and Japanese was added, a new mechanism for protecting words from stemming was introduced, and the memory use of the synonym filter was reduced. The biggest enhancement is the Hunspell integration (<a href="http://wiki.apache.org/solr/LanguageAnalysis#Notes_about_solr.HunspellStemFilterFactory" target="_blank">http://wiki.apache.org/solr/LanguageAnalysis#Notes_about_solr.HunspellStemFilterFactory</a> ), which enables stemming for the languages that OpenOffice supports. <br /><br />Lucene 3.5.0 greatly reduced the memory consumed by the terms dictionary (3 to 5 times less than before). <br /><br />Previously, paging through a large result set in Lucene all the way from the first page to the last caused problems. Lucene 3.5.0 solved this by introducing the searchAfter method. <br /><br />Lucene and Solr also gained a new, faster and more reliable highlighter based on term vectors. <br /><br />Solr integrated the Extended DisMax query parser (<a href="http://search-lucene.com/?q=Extended+Dismax" target="_blank">http://search-lucene.com/?q=Extended+Dismax</a> ), further improving the quality of search results. <br /><br />You can now sort search results by a function (<a href="http://wiki.apache.org/solr/FunctionQuery#Sort_By_Function" target="_blank">http://wiki.apache.org/solr/FunctionQuery#Sort_By_Function</a> ), for example by distance from a given point, and there is a new spatial search filter. <br /><br />Solr also gained a new suggest/autocomplete component (<a href="http://wiki.apache.org/solr/Suggester" target="_blank">http://wiki.apache.org/solr/Suggester</a> ) based on FSTs (finite state transducers), which significantly reduce memory consumption. If you are interested in this feature, also have a look at the autocomplete offered by Sematext (<a href="http://sematext.com/products/autocomplete/index.html" target="_blank">http://sematext.com/products/autocomplete/index.html</a> ). 
<br /><br />Also worth mentioning is Solr's upcoming transaction log (<a href="https://issues.apache.org/jira/browse/SOLR-2700" target="_blank">https://issues.apache.org/jira/browse/SOLR-2700</a> ), which enables real-time get (<a href="https://issues.apache.org/jira/browse/SOLR-2656" target="_blank">https://issues.apache.org/jira/browse/SOLR-2656</a> ): right after adding a document, you can retrieve it by id. The transaction log will also be used to recover nodes in a distributed SolrCloud. <br /><br />Speaking of SolrCloud (<a href="http://wiki.apache.org/solr/SolrCloud" target="_blank">http://wiki.apache.org/solr/SolrCloud</a> ), there is an introduction here (<a href="http://blog.sematext.com/2011/09/14/solr-digest-spring-summer-2011-part-2-solr-cloud-and-near-real-time-search/" target="_blank">http://blog.sematext.com/2011/09/14/solr-digest-spring-summer-2011-part-2-solr-cloud-and-near-real-time-search/</a> ). In one sentence, SolrCloud applies current design principles and leverages other components (such as ZooKeeper) to make it faster to build a more powerful distributed Solr cluster. Its core ideas are to eliminate single points of failure, to centralize cluster and configuration management, and to move beyond the old master-slave architecture toward automatic failover and dynamic rebalancing. <br /><br />Since development of the two projects was merged in 2010, both have progressed very rapidly. In 2011, with strong support from many committers, Lucene and Solr shipped five releases: 3.1 in March, 3.2 on June 4, 3.3 on July 1, 3.4 on September 14, and 3.5.0 in November. <br /><br />There were also plenty of Lucene and Solr conferences in 2011. First came Lucene Revolution in San Francisco in May, where Otis gave a talk titled "Search Analytics: What? Why? How?" (<a href="http://java.dzone.com/articles/lucene-solr-year-2011-review" target="_blank">http://java.dzone.com/articles/lucene-solr-year-2011-review</a> ); the rest of the program is here (<a href="http://lucenerevolution.com/2011/agenda" target="_blank">http://lucenerevolution.com/2011/agenda</a> ). At Berlin Buzzwords in June, Otis gave an updated version of the "Search Analytics: What? Why? How?" talk; see the official site: <a href="http://berlinbuzzwords.de/" target="_blank">http://berlinbuzzwords.de</a> . In October, Barcelona hosted Lucene Eurocon 2011, a conference dedicated to Lucene and Solr. 
There Otis gave the talk "Search Analytics: Business Value &amp; BigData NoSQL Backend" (<a href="http://www.lucidimagination.com/sites/default/files/file/Eurocon2011/otis_gospodnetic_search_analytics_lucene_eurocon_2011.ppt" target="_blank">http://www.lucidimagination.com/sites/default/files/file/Eurocon2011/otis_gospodnetic_search_analytics_lucene_eurocon_2011.ppt</a> ), and Rafał (<a href="http://twitter.com//kucrafal" target="_blank">http://twitter.com//kucrafal</a> ) presented "Explaining &amp; Visualizing Solr 'explain' information" (<a href="http://www.lucidimagination.com/sites/default/files/file/Eurocon2011/Understanding%20and%20Visualizing%20Solr%20Explain%20information%20-%20Solr.pl%20-%20version%202.pdf" target="_blank">http://www.lucidimagination.com/sites/default/files/file/Eurocon2011/Understanding%20and%20Visualizing%20Solr%20Explain%20information%20-%20Solr.pl%20-%20version%202.pdf</a> ). <br /><br />In 2011 Lucene and Solr also welcomed a group of new committers: <br />&#8226;Andi Vajda <br />&#8226;Chris Male <br />&#8226;Dawid Weiss <br />&#8226;Erick Erickson <br />&#8226;Jan H&#248;ydahl <br />&#8226;Martijn van Groningen <br />&#8226;Stanisław Osiński <br /><br />For any successful open source project, books are essential to users. There was no new edition of Lucene in Action this year, but in July Rafał Kuć published his new book "Solr 3.1 Cookbook", in which he offers solutions to many common Solr problems. In November, David Smiley and Eric Pugh released a new edition, "Apache Solr 3 Enterprise Search Server". <br /><br />As for what new surprises Lucene and Solr will bring in 2012, we will have to wait and see.  
<br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/conans/" target="_blank">CONAN</a> 2012-05-30 14:44</div>]]></description></item><item><title>Indexing with SolrJ</title><link>http://www.blogjava.net/conans/articles/379551.html</link><dc:creator>CONAN</dc:creator><author>CONAN</author><pubDate>Wed, 30 May 2012 06:43:00 GMT</pubDate><guid>http://www.blogjava.net/conans/articles/379551.html</guid><description><![CDATA[The code is simple and self-explanatory, and can serve as a starting point for real projects; the original article is <a href="http://java.dzone.com/articles/indexing-solrj" target="_blank">here</a>. The example demonstrates two ways to build a full index: <br />one builds it from a database with SQL, <br />the other by parsing files with Tika.
<div style="border-bottom: #cccccc 1px solid; border-left: #cccccc 1px solid; padding-bottom: 4px; background-color: #eeeeee; padding-left: 4px; width: 98%; padding-right: 5px; font-size: 13px; word-break: break-all; border-top: #cccccc 1px solid; border-right: #cccccc 1px solid; padding-top: 4px"><!--<br /><br />Code highlighting produced by Actipro CodeHighlighter (freeware)<br />http://www.CodeHighlighter.com/<br /><br />--><span style="color: #0000ff">package</span><span style="color: #000000">&nbsp;SolrJExample;<br /><br /></span><span style="color: #0000ff">import</span><span style="color: #000000">&nbsp;org.apache.solr.client.solrj.SolrServerException;<br /></span><span style="color: #0000ff">import</span><span style="color: #000000">&nbsp;org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;<br /></span><span style="color: #0000ff">import</span><span style="color: #000000">&nbsp;org.apache.solr.client.solrj.impl.XMLResponseParser;<br /></span><span style="color: #0000ff">import</span><span style="color: #000000">&nbsp;org.apache.solr.client.solrj.response.UpdateResponse;<br /></span><span style="color: #0000ff">import</span><span style="color: #000000">&nbsp;org.apache.solr.common.SolrInputDocument;<br /></span><span style="color: #0000ff">import</span><span style="color: #000000">&nbsp;org.apache.tika.metadata.Metadata;<br /></span><span style="color: #0000ff">import</span><span style="color: #000000">&nbsp;org.apache.tika.parser.AutoDetectParser;<br /></span><span style="color: #0000ff">import</span><span style="color: #000000">&nbsp;org.apache.tika.parser.ParseContext;<br /></span><span style="color: #0000ff">import</span><span style="color: #000000">&nbsp;org.apache.tika.sax.BodyContentHandler;<br /></span><span style="color: #0000ff">import</span><span style="color: #000000">&nbsp;org.xml.sax.ContentHandler;<br /><br /></span><span style="color: #0000ff">import</span><span style="color: #000000">&nbsp;java.io.File;<br /></span><span 
style="color: #0000ff">import</span><span style="color: #000000">&nbsp;java.io.FileInputStream;<br /></span><span style="color: #0000ff">import</span><span style="color: #000000">&nbsp;java.io.IOException;<br /></span><span style="color: #0000ff">import</span><span style="color: #000000">&nbsp;java.io.InputStream;<br /></span><span style="color: #0000ff">import</span><span style="color: #000000">&nbsp;java.sql.</span><span style="color: #000000">*</span><span style="color: #000000">;<br /></span><span style="color: #0000ff">import</span><span style="color: #000000">&nbsp;java.util.ArrayList;<br /></span><span style="color: #0000ff">import</span><span style="color: #000000">&nbsp;java.util.Collection;<br /><br /></span><span style="color: #008000">/*</span><span style="color: #008000">&nbsp;Example&nbsp;class&nbsp;showing&nbsp;the&nbsp;skeleton&nbsp;of&nbsp;using&nbsp;Tika&nbsp;and<br />&nbsp;&nbsp;&nbsp;Sql&nbsp;on&nbsp;the&nbsp;client&nbsp;to&nbsp;index&nbsp;documents&nbsp;from<br />&nbsp;&nbsp;&nbsp;both&nbsp;structured&nbsp;documents&nbsp;and&nbsp;a&nbsp;SQL&nbsp;database.<br /><br />&nbsp;&nbsp;&nbsp;NOTE:&nbsp;The&nbsp;SQL&nbsp;example&nbsp;and&nbsp;the&nbsp;Tika&nbsp;example&nbsp;are&nbsp;entirely&nbsp;orthogonal.<br />&nbsp;&nbsp;&nbsp;Both&nbsp;are&nbsp;included&nbsp;here&nbsp;to&nbsp;make&nbsp;a<br />&nbsp;&nbsp;&nbsp;more&nbsp;interesting&nbsp;example,&nbsp;but&nbsp;you&nbsp;can&nbsp;omit&nbsp;either&nbsp;of&nbsp;them.<br /><br />&nbsp;</span><span style="color: #008000">*/</span><span style="color: #000000"><br /></span><span style="color: #0000ff">public</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">class</span><span style="color: #000000">&nbsp;SqlTikaExample&nbsp;{<br />&nbsp;&nbsp;</span><span style="color: #0000ff">private</span><span style="color: #000000">&nbsp;StreamingUpdateSolrServer&nbsp;_server;<br />&nbsp;&nbsp;</span><span style="color: #0000ff">private</span><span style="color: #000000">&nbsp;</span><span 
style="color: #0000ff">long</span><span style="color: #000000">&nbsp;_start&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;System.currentTimeMillis();<br />&nbsp;&nbsp;</span><span style="color: #0000ff">private</span><span style="color: #000000">&nbsp;AutoDetectParser&nbsp;_autoParser;<br />&nbsp;&nbsp;</span><span style="color: #0000ff">private</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">int</span><span style="color: #000000">&nbsp;_totalTika&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;</span><span style="color: #000000">0</span><span style="color: #000000">;<br />&nbsp;&nbsp;</span><span style="color: #0000ff">private</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">int</span><span style="color: #000000">&nbsp;_totalSql&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;</span><span style="color: #000000">0</span><span style="color: #000000">;<br /><br />&nbsp;&nbsp;</span><span style="color: #0000ff">private</span><span style="color: #000000">&nbsp;Collection&nbsp;_docs&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">new</span><span style="color: #000000">&nbsp;ArrayList();<br /><br />&nbsp;&nbsp;</span><span style="color: #0000ff">public</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">static</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">void</span><span style="color: #000000">&nbsp;main(String[]&nbsp;args)&nbsp;{<br />&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">try</span><span style="color: #000000">&nbsp;{<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;SqlTikaExample&nbsp;idxer&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">new</span><span style="color: 
#000000">&nbsp;SqlTikaExample(</span><span style="color: #000000">"</span><span style="color: #000000">http://localhost:8983/solr</span><span style="color: #000000">"</span><span style="color: #000000">);<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;idxer.doTikaDocuments(</span><span style="color: #0000ff">new</span><span style="color: #000000">&nbsp;File(</span><span style="color: #000000">"</span><span style="color: #000000">/Users/Erick/testdocs</span><span style="color: #000000">"</span><span style="color: #000000">));<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;idxer.doSqlDocuments();<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;idxer.endIndexing();<br />&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;</span><span style="color: #0000ff">catch</span><span style="color: #000000">&nbsp;(Exception&nbsp;e)&nbsp;{<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;e.printStackTrace();<br />&nbsp;&nbsp;&nbsp;&nbsp;}<br />&nbsp;&nbsp;}<br /><br />&nbsp;&nbsp;</span><span style="color: #0000ff">private</span><span style="color: #000000">&nbsp;SqlTikaExample(String&nbsp;url)&nbsp;</span><span style="color: #0000ff">throws</span><span style="color: #000000">&nbsp;IOException,&nbsp;SolrServerException&nbsp;{<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">//</span><span style="color: #008000">&nbsp;Create&nbsp;a&nbsp;multi-threaded&nbsp;communications&nbsp;channel&nbsp;to&nbsp;the&nbsp;Solr&nbsp;server.<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">//</span><span style="color: #008000">&nbsp;Could&nbsp;be&nbsp;CommonsHttpSolrServer&nbsp;as&nbsp;well.<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">//<br /></span><span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;_server&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">new</span><span style="color: #000000">&nbsp;StreamingUpdateSolrServer(url,&nbsp;</span><span style="color: 
#000000">10</span><span style="color: #000000">,&nbsp;</span><span style="color: #000000">4</span><span style="color: #000000">);<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;_server.setSoTimeout(</span><span style="color: #000000">1000</span><span style="color: #000000">);&nbsp;&nbsp;</span><span style="color: #008000">//</span><span style="color: #008000">&nbsp;socket&nbsp;read&nbsp;timeout</span><span style="color: #008000"><br /></span><span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;_server.setConnectionTimeout(</span><span style="color: #000000">1000</span><span style="color: #000000">);<br />&nbsp;&nbsp;&nbsp;&nbsp;_server.setMaxRetries(</span><span style="color: #000000">1</span><span style="color: #000000">);&nbsp;</span><span style="color: #008000">//</span><span style="color: #008000">&nbsp;defaults&nbsp;to&nbsp;0.&nbsp;&nbsp;&gt;&nbsp;1&nbsp;not&nbsp;recommended.<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">//</span><span style="color: #008000">&nbsp;binary&nbsp;parser&nbsp;is&nbsp;used&nbsp;by&nbsp;default&nbsp;for&nbsp;responses</span><span style="color: #008000"><br /></span><span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;_server.setParser(</span><span style="color: #0000ff">new</span><span style="color: #000000">&nbsp;XMLResponseParser());&nbsp;<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">//</span><span style="color: #008000">&nbsp;One&nbsp;of&nbsp;the&nbsp;ways&nbsp;Tika&nbsp;can&nbsp;be&nbsp;used&nbsp;to&nbsp;attempt&nbsp;to&nbsp;parse&nbsp;arbitrary&nbsp;files.</span><span style="color: #008000"><br /></span><span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;_autoParser&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">new</span><span style="color: #000000">&nbsp;AutoDetectParser();<br />&nbsp;&nbsp;}<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">//</span><span 
style="color: #008000">&nbsp;Just&nbsp;a&nbsp;convenient&nbsp;place&nbsp;to&nbsp;wrap&nbsp;things&nbsp;up.</span><span style="color: #008000"><br /></span><span style="color: #000000">&nbsp;&nbsp;</span><span style="color: #0000ff">private</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">void</span><span style="color: #000000">&nbsp;endIndexing()&nbsp;</span><span style="color: #0000ff">throws</span><span style="color: #000000">&nbsp;IOException,&nbsp;SolrServerException&nbsp;{<br />&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">if</span><span style="color: #000000">&nbsp;(_docs.size()&nbsp;</span><span style="color: #000000">&gt;</span><span style="color: #000000">&nbsp;</span><span style="color: #000000">0</span><span style="color: #000000">)&nbsp;{&nbsp;</span><span style="color: #008000">//</span><span style="color: #008000">&nbsp;Are&nbsp;there&nbsp;any&nbsp;documents&nbsp;left&nbsp;over?</span><span style="color: #008000"><br /></span><span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;_server.add(_docs,&nbsp;</span><span style="color: #000000">300000</span><span style="color: #000000">);&nbsp;</span><span style="color: #008000">//</span><span style="color: #008000">&nbsp;Commit&nbsp;within&nbsp;5&nbsp;minutes</span><span style="color: #008000"><br /></span><span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;}<br />&nbsp;&nbsp;&nbsp;&nbsp;_server.commit();&nbsp;</span><span style="color: #008000">//</span><span style="color: #008000">&nbsp;Only&nbsp;needs&nbsp;to&nbsp;be&nbsp;done&nbsp;at&nbsp;the&nbsp;end,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">//</span><span style="color: #008000">&nbsp;commitWithin&nbsp;should&nbsp;do&nbsp;the&nbsp;rest.<br 
/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">//</span><span style="color: #008000">&nbsp;Could&nbsp;even&nbsp;be&nbsp;omitted<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">//</span><span style="color: #008000">&nbsp;assuming&nbsp;commitWithin&nbsp;was&nbsp;specified.</span><span style="color: #008000"><br /></span><span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">long</span><span style="color: #000000">&nbsp;endTime&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;System.currentTimeMillis();<br />&nbsp;&nbsp;&nbsp;&nbsp;log(</span><span style="color: #000000">"</span><span style="color: #000000">Total&nbsp;Time&nbsp;Taken:&nbsp;</span><span style="color: #000000">"</span><span style="color: #000000">&nbsp;</span><span style="color: #000000">+</span><span style="color: #000000">&nbsp;(endTime&nbsp;</span><span style="color: #000000">-</span><span style="color: #000000">&nbsp;_start)&nbsp;</span><span style="color: #000000">+</span><span style="color: #000000"><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #000000">"</span><span style="color: #000000">&nbsp;milliseconds&nbsp;to&nbsp;index&nbsp;</span><span style="color: #000000">"</span><span style="color: #000000">&nbsp;</span><span style="color: #000000">+</span><span style="color: #000000">&nbsp;_totalSql&nbsp;</span><span style="color: #000000">+</span><span style="color: #000000"><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #000000">"</span><span style="color: #000000">&nbsp;SQL&nbsp;rows&nbsp;and&nbsp;</span><span style="color: #000000">"</span><span style="color: #000000">&nbsp;</span><span style="color: 
#000000">+</span><span style="color: #000000">&nbsp;_totalTika&nbsp;</span><span style="color: #000000">+</span><span style="color: #000000">&nbsp;</span><span style="color: #000000">"</span><span style="color: #000000">&nbsp;documents</span><span style="color: #000000">"</span><span style="color: #000000">);<br />&nbsp;&nbsp;}<br /><br />&nbsp;&nbsp;</span><span style="color: #008000">//</span><span style="color: #008000">&nbsp;I&nbsp;hate&nbsp;writing&nbsp;System.out.println()&nbsp;everyplace,<br />&nbsp;&nbsp;</span><span style="color: #008000">//</span><span style="color: #008000">&nbsp;besides&nbsp;this&nbsp;gives&nbsp;a&nbsp;central&nbsp;place&nbsp;to&nbsp;convert&nbsp;to&nbsp;true&nbsp;logging<br />&nbsp;&nbsp;</span><span style="color: #008000">//</span><span style="color: #008000">&nbsp;in&nbsp;a&nbsp;production&nbsp;system.</span><span style="color: #008000"><br /></span><span style="color: #000000">&nbsp;&nbsp;</span><span style="color: #0000ff">private</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">static</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">void</span><span style="color: #000000">&nbsp;log(String&nbsp;msg)&nbsp;{<br />&nbsp;&nbsp;&nbsp;&nbsp;System.out.println(msg);<br />&nbsp;&nbsp;}<br /><br />&nbsp;&nbsp;</span><span style="color: #008000">/**</span><span style="color: #008000"><br />&nbsp;&nbsp;&nbsp;*&nbsp;***************************Tika&nbsp;processing&nbsp;here<br />&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">*/</span><span style="color: #000000"><br />&nbsp;&nbsp;</span><span style="color: #008000">//</span><span style="color: #008000">&nbsp;Recursively&nbsp;traverse&nbsp;the&nbsp;filesystem,&nbsp;parsing&nbsp;everything&nbsp;found.</span><span style="color: #008000"><br /></span><span style="color: #000000">&nbsp;&nbsp;</span><span style="color: #0000ff">private</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">void</span><span 
style="color: #000000">&nbsp;doTikaDocuments(File&nbsp;root)&nbsp;</span><span style="color: #0000ff">throws</span><span style="color: #000000">&nbsp;IOException,&nbsp;SolrServerException&nbsp;{<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">//</span><span style="color: #008000">&nbsp;Simple&nbsp;loop&nbsp;for&nbsp;recursively&nbsp;indexing&nbsp;all&nbsp;the&nbsp;files<br />&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">//</span><span style="color: #008000">&nbsp;in&nbsp;the&nbsp;root&nbsp;directory&nbsp;passed&nbsp;in.</span><span style="color: #008000"><br /></span><span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">for</span><span style="color: #000000">&nbsp;(File&nbsp;file&nbsp;:&nbsp;root.listFiles())&nbsp;{<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">if</span><span style="color: #000000">&nbsp;(file.isDirectory())&nbsp;{<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;doTikaDocuments(file);<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">continue</span><span style="color: #000000">;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">//</span><span style="color: #008000">&nbsp;Get&nbsp;ready&nbsp;to&nbsp;parse&nbsp;the&nbsp;file.</span><span style="color: #008000"><br /></span><span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;ContentHandler&nbsp;textHandler&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">new</span><span style="color: #000000">&nbsp;BodyContentHandler();<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Metadata&nbsp;metadata&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">new</span><span style="color: #000000">&nbsp;Metadata();<br 
/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;ParseContext&nbsp;context&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">new</span><span style="color: #000000">&nbsp;ParseContext();<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;InputStream&nbsp;input&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">new</span><span style="color: #000000">&nbsp;FileInputStream(file);<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">//</span><span style="color: #008000">&nbsp;Try&nbsp;parsing&nbsp;the&nbsp;file.&nbsp;Note&nbsp;we&nbsp;haven't&nbsp;checked&nbsp;at&nbsp;all&nbsp;to<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">//</span><span style="color: #008000">&nbsp;see&nbsp;whether&nbsp;this&nbsp;file&nbsp;is&nbsp;a&nbsp;good&nbsp;candidate.</span><span style="color: #008000"><br /></span><span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">try</span><span style="color: #000000">&nbsp;{<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;_autoParser.parse(input,&nbsp;textHandler,&nbsp;metadata,&nbsp;context);<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;</span><span style="color: #0000ff">catch</span><span style="color: #000000">&nbsp;(Exception&nbsp;e)&nbsp;{<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">//</span><span style="color: #008000">&nbsp;Needs&nbsp;better&nbsp;logging&nbsp;of&nbsp;what&nbsp;went&nbsp;wrong&nbsp;in&nbsp;order&nbsp;to<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">//</span><span style="color: #008000">&nbsp;track&nbsp;down&nbsp;"bad"&nbsp;documents.</span><span style="color: #008000"><br /></span><span style="color: 
#000000">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;log(String.format(</span><span style="color: #000000">"</span><span style="color: #000000">File&nbsp;%s&nbsp;failed</span><span style="color: #000000">"</span><span style="color: #000000">,&nbsp;file.getCanonicalPath()));<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;e.printStackTrace();<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">continue</span><span style="color: #000000">;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">//</span><span style="color: #008000">&nbsp;Just&nbsp;to&nbsp;show&nbsp;how&nbsp;much&nbsp;meta-data&nbsp;and&nbsp;what&nbsp;form&nbsp;it's&nbsp;in.</span><span style="color: #008000"><br /></span><span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;dumpMetadata(file.getCanonicalPath(),&nbsp;metadata);<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">//</span><span style="color: #008000">&nbsp;Index&nbsp;just&nbsp;a&nbsp;couple&nbsp;of&nbsp;the&nbsp;meta-data&nbsp;fields.</span><span style="color: #008000"><br /></span><span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;SolrInputDocument&nbsp;doc&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">new</span><span style="color: #000000">&nbsp;SolrInputDocument();<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;doc.addField(</span><span style="color: #000000">"</span><span style="color: #000000">id</span><span style="color: #000000">"</span><span style="color: #000000">,&nbsp;file.getCanonicalPath());<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">//</span><span style="color: #008000">&nbsp;Crude&nbsp;way&nbsp;to&nbsp;get&nbsp;known&nbsp;meta-data&nbsp;fields.<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">//</span><span 
style="color: #008000">&nbsp;Also&nbsp;possible&nbsp;to&nbsp;write&nbsp;a&nbsp;simple&nbsp;loop&nbsp;to&nbsp;examine&nbsp;all&nbsp;the<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">//</span><span style="color: #008000">&nbsp;metadata&nbsp;returned&nbsp;and&nbsp;selectively&nbsp;index&nbsp;it&nbsp;and/or<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">//</span><span style="color: #008000">&nbsp;just&nbsp;get&nbsp;a&nbsp;list&nbsp;of&nbsp;them.<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">//</span><span style="color: #008000">&nbsp;One&nbsp;can&nbsp;also&nbsp;use&nbsp;the&nbsp;LucidWorks&nbsp;field&nbsp;mapping&nbsp;to<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">//</span><span style="color: #008000">&nbsp;accomplish&nbsp;much&nbsp;the&nbsp;same&nbsp;thing.</span><span style="color: #008000"><br /></span><span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;String&nbsp;author&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;metadata.get(</span><span style="color: #000000">"</span><span style="color: #000000">Author</span><span style="color: #000000">"</span><span style="color: #000000">);<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">if</span><span style="color: #000000">&nbsp;(author&nbsp;</span><span style="color: #000000">!=</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">null</span><span style="color: #000000">)&nbsp;{<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;doc.addField(</span><span style="color: #000000">"</span><span style="color: #000000">author</span><span style="color: #000000">"</span><span style="color: #000000">,&nbsp;author);<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;doc.addField(</span><span style="color: #000000">"</span><span style="color: 
#000000">text</span><span style="color: #000000">"</span><span style="color: #000000">,&nbsp;textHandler.toString());<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;_docs.add(doc);<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #000000">++</span><span style="color: #000000">_totalTika;<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">//</span><span style="color: #008000">&nbsp;Completely&nbsp;arbitrary,&nbsp;just&nbsp;batch&nbsp;up&nbsp;more&nbsp;than&nbsp;one&nbsp;document<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">//</span><span style="color: #008000">&nbsp;for&nbsp;throughput!</span><span style="color: #008000"><br /></span><span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">if</span><span style="color: #000000">&nbsp;(_docs.size()&nbsp;</span><span style="color: #000000">&gt;=</span><span style="color: #000000">&nbsp;</span><span style="color: #000000">1000</span><span style="color: #000000">)&nbsp;{<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">//</span><span style="color: #008000">&nbsp;Commit&nbsp;within&nbsp;5&nbsp;minutes.</span><span style="color: #008000"><br /></span><span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;UpdateResponse&nbsp;resp&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;_server.add(_docs,&nbsp;</span><span style="color: #000000">300000</span><span style="color: #000000">);<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">if</span><span style="color: #000000">&nbsp;(resp.getStatus()&nbsp;</span><span style="color: #000000">!=</span><span style="color: #000000">&nbsp;</span><span style="color: #000000">0</span><span style="color: #000000">)&nbsp;{<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;log(</span><span style="color: 
#000000">"</span><span style="color: #000000">Some&nbsp;horrible&nbsp;error&nbsp;has&nbsp;occurred,&nbsp;status&nbsp;is:&nbsp;</span><span style="color: #000000">"</span><span style="color: #000000">&nbsp;</span><span style="color: #000000">+</span><span style="color: #000000"><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;resp.getStatus());<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;_docs.clear();<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br />&nbsp;&nbsp;&nbsp;&nbsp;}<br />&nbsp;&nbsp;}<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">//</span><span style="color: #008000">&nbsp;Just&nbsp;to&nbsp;show&nbsp;all&nbsp;the&nbsp;metadata&nbsp;that's&nbsp;available.</span><span style="color: #008000"><br /></span><span style="color: #000000">&nbsp;&nbsp;</span><span style="color: #0000ff">private</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">void</span><span style="color: #000000">&nbsp;dumpMetadata(String&nbsp;fileName,&nbsp;Metadata&nbsp;metadata)&nbsp;{<br />&nbsp;&nbsp;&nbsp;&nbsp;log(</span><span style="color: #000000">"</span><span style="color: #000000">Dumping&nbsp;metadata&nbsp;for&nbsp;file:&nbsp;</span><span style="color: #000000">"</span><span style="color: #000000">&nbsp;</span><span style="color: #000000">+</span><span style="color: #000000">&nbsp;fileName);<br />&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">for</span><span style="color: #000000">&nbsp;(String&nbsp;name&nbsp;:&nbsp;metadata.names())&nbsp;{<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;log(name&nbsp;</span><span style="color: #000000">+</span><span style="color: #000000">&nbsp;</span><span style="color: #000000">"</span><span style="color: #000000">:</span><span style="color: #000000">"</span><span style="color: #000000">&nbsp;</span><span style="color: #000000">+</span><span style="color: 
#000000">&nbsp;metadata.get(name));<br />&nbsp;&nbsp;&nbsp;&nbsp;}<br />&nbsp;&nbsp;&nbsp;&nbsp;log(</span><span style="color: #000000">"</span><span style="color: #000000">\n\n</span><span style="color: #000000">"</span><span style="color: #000000">);<br />&nbsp;&nbsp;}<br /><br />&nbsp;&nbsp;</span><span style="color: #008000">/**</span><span style="color: #008000"><br />&nbsp;&nbsp;&nbsp;*&nbsp;***************************SQL&nbsp;processing&nbsp;here<br />&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">*/</span><span style="color: #000000"><br />&nbsp;&nbsp;</span><span style="color: #0000ff">private</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">void</span><span style="color: #000000">&nbsp;doSqlDocuments()&nbsp;</span><span style="color: #0000ff">throws</span><span style="color: #000000">&nbsp;SQLException&nbsp;{<br />&nbsp;&nbsp;&nbsp;&nbsp;Connection&nbsp;con&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">null</span><span style="color: #000000">;<br />&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">try</span><span style="color: #000000">&nbsp;{<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Class.forName(</span><span style="color: #000000">"</span><span style="color: #000000">com.mysql.jdbc.Driver</span><span style="color: #000000">"</span><span style="color: #000000">).newInstance();<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;log(</span><span style="color: #000000">"</span><span style="color: #000000">Driver&nbsp;Loaded<img src="http://www.blogjava.net/Images/dot.gif"  alt="" /><img src="http://www.blogjava.net/Images/dot.gif"  alt="" /></span><span style="color: #000000">"</span><span style="color: #000000">);<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;con&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;DriverManager.getConnection(</span><span style="color: #000000">"</span><span style="color: 
#000000">jdbc:mysql://192.168.1.103:3306/test?</span><span style="color: #000000">"</span><span style="color: #000000"><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #000000">+</span><span style="color: #000000">&nbsp;</span><span style="color: #000000">"</span><span style="color: #000000">user=testuser&amp;password=test123</span><span style="color: #000000">"</span><span style="color: #000000">);<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Statement&nbsp;st&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;con.createStatement();<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;ResultSet&nbsp;rs&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;st.executeQuery(</span><span style="color: #000000">"</span><span style="color: #000000">select&nbsp;id,title,text&nbsp;from&nbsp;test</span><span style="color: #000000">"</span><span style="color: #000000">);<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">while</span><span style="color: #000000">&nbsp;(rs.next())&nbsp;{<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">//</span><span style="color: #008000">&nbsp;DO&nbsp;NOT&nbsp;move&nbsp;this&nbsp;outside&nbsp;the&nbsp;while&nbsp;loop:&nbsp;_docs&nbsp;holds&nbsp;a<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">//</span><span style="color: #008000">&nbsp;reference&nbsp;to&nbsp;every&nbsp;document&nbsp;until&nbsp;the&nbsp;batch&nbsp;is&nbsp;sent.</span><span style="color: #008000"><br /></span><span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;SolrInputDocument&nbsp;doc&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">new</span><span style="color: #000000">&nbsp;SolrInputDocument();<br 
/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;String&nbsp;id&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;rs.getString(</span><span style="color: #000000">"</span><span style="color: #000000">id</span><span style="color: #000000">"</span><span style="color: #000000">);<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;String&nbsp;title&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;rs.getString(</span><span style="color: #000000">"</span><span style="color: #000000">title</span><span style="color: #000000">"</span><span style="color: #000000">);<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;String&nbsp;text&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;rs.getString(</span><span style="color: #000000">"</span><span style="color: #000000">text</span><span style="color: #000000">"</span><span style="color: #000000">);<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;doc.addField(</span><span style="color: #000000">"</span><span style="color: #000000">id</span><span style="color: #000000">"</span><span style="color: #000000">,&nbsp;id);<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;doc.addField(</span><span style="color: #000000">"</span><span style="color: #000000">title</span><span style="color: #000000">"</span><span style="color: #000000">,&nbsp;title);<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;doc.addField(</span><span style="color: #000000">"</span><span style="color: #000000">text</span><span style="color: #000000">"</span><span style="color: #000000">,&nbsp;text);<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;_docs.add(doc);<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #000000">++</span><span style="color: #000000">_totalSql;<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">//</span><span style="color: 
#008000">&nbsp;Completely&nbsp;arbitrary,&nbsp;just&nbsp;batch&nbsp;up&nbsp;more&nbsp;than&nbsp;one<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">//</span><span style="color: #008000">&nbsp;document&nbsp;for&nbsp;throughput!</span><span style="color: #008000"><br /></span><span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">if</span><span style="color: #000000">&nbsp;(_docs.size()&nbsp;</span><span style="color: #000000">&gt;</span><span style="color: #000000">&nbsp;</span><span style="color: #000000">1000</span><span style="color: #000000">)&nbsp;{<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">//</span><span style="color: #008000">&nbsp;Commit&nbsp;within&nbsp;5&nbsp;minutes.</span><span style="color: #008000"><br /></span><span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;UpdateResponse&nbsp;resp&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;_server.add(_docs,&nbsp;</span><span style="color: #000000">300000</span><span style="color: #000000">);<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">if</span><span style="color: #000000">&nbsp;(resp.getStatus()&nbsp;</span><span style="color: #000000">!=</span><span style="color: #000000">&nbsp;</span><span style="color: #000000">0</span><span style="color: #000000">)&nbsp;{<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;log(</span><span style="color: #000000">"</span><span style="color: #000000">Some&nbsp;horrible&nbsp;error&nbsp;has&nbsp;occurred,&nbsp;status&nbsp;is:&nbsp;</span><span style="color: #000000">"</span><span style="color: #000000">&nbsp;</span><span style="color: #000000">+</span><span style="color: #000000"><br 
/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;resp.getStatus());<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;_docs.clear();<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br />&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;</span><span style="color: #0000ff">catch</span><span style="color: #000000">&nbsp;(Exception&nbsp;ex)&nbsp;{<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;ex.printStackTrace();<br />&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;</span><span style="color: #0000ff">finally</span><span style="color: #000000">&nbsp;{<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">if</span><span style="color: #000000">&nbsp;(con&nbsp;</span><span style="color: #000000">!=</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">null</span><span style="color: #000000">)&nbsp;{<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;con.close();<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br />&nbsp;&nbsp;&nbsp;&nbsp;}<br />&nbsp;&nbsp;}<br />}<br /></span></div><br /><br /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/conans/" target="_blank">CONAN</a> 2012-05-30 14:43</div>]]></description></item><item><title>Solr Tuning Reference</title><link>http://www.blogjava.net/conans/articles/379550.html</link><dc:creator>CONAN</dc:creator><author>CONAN</author><pubDate>Wed, 30 May 2012 06:40:00 GMT</pubDate><guid>http://www.blogjava.net/conans/articles/379550.html</guid><description><![CDATA[<div><font style="background-color: #cce8cf">Reposted from: <a href="http://rdc.taobao.com/team/jm/archives/1753">http://rdc.taobao.com/team/jm/archives/1753</a><br />The material is organized in three parts. Part 1 covers general Solr tuning and Part 2 covers targeted tuning; the former applies broadly, while the latter only fits specific situations. Always adjust parameters to the characteristics of your own application and compare the resulting performance. Part 3 is about Solr queries.
<p>&nbsp;</p>
<p>A real application has to be tuned as a whole; all of these factors act together.</p>
<p><span style="font-weight: bold">Part 1 &lt;General Solr Tuning&gt;</span><br />English original: http://wiki.apache.org/solr/SolrPerformanceFactors</p>
<h2 style="margin: 0cm 0cm 0pt"><span lang="EN-US">Schema Design Considerations</span></h2>
<h3><span lang="EN-US">indexed fields</span></h3>
<p style="margin: 0cm 0cm 0pt"><span lang="EN-US">&nbsp;&nbsp;&nbsp;indexed fields</span> 的数量将会影响以下的一些性能：</p>
<ul style="margin-top: 0cm" type="disc"><li><span style="font-family: 宋体">索引时的时候的内存使用量</span></li><li><span style="font-family: 宋体">索引段的合并时间</span></li><li><span style="font-family: 宋体">优化时间</span></li><li><span style="font-family: 宋体">索引的大小</span></li></ul>
<p><span lang="EN-US">&nbsp;&nbsp;&nbsp;&nbsp;</span>我们可以通过将<span lang="EN-US">omitNorms=&#8220;true&#8221;</span>来减少<span lang="EN-US">indexed fields</span>数量增加所带来的影响。</p>
<h3><span lang="EN-US">stored fields</span></h3>
<p style="margin: 0cm 0cm 0pt"><span lang="EN-US">&nbsp;&nbsp;&nbsp;&nbsp;Retrieving the stored fields&nbsp;</span>确实是一种开销。这个开销，受每个文档所存储的字节影响很大。<strong>每个文档的所占用的空间越大，文档就显的更稀疏</strong>，这样从硬盘中读取数据，就需要更多的<span lang="EN-US">i/o</span>操作（通常，我们在存储比较大的域的时候，就会考虑这样的事情，比如存储一篇文章的文档。）</p>
<p style="margin: 0cm 0cm 0pt"><span lang="EN-US">&nbsp;&nbsp;&nbsp;&nbsp;</span>可以考虑将比较大的域放到<span lang="EN-US">solr</span>外面来存储。如果你觉得这样做会有些别扭的话，可以考虑使用压缩的域，但是这样会加重<span lang="EN-US">cpu</span>在存储和读取域的时候的负担。不过这样却是可以较少<span lang="EN-US">i/0</span>的负担。</p>
<p style="margin: 0cm 0cm 0pt"><span lang="EN-US">&nbsp;&nbsp;&nbsp;&nbsp;</span>如果，你并不是总是使用<span lang="EN-US">stored fields</span>的话，可以使用<span lang="EN-US">stored field</span>的延迟加载，这样可<strong>以节省很多的性能</strong>，尤其是使用<span lang="EN-US">compressed field</span> 的时候。</p>
<h2><span lang="EN-US">Configuration Considerations</span></h2>
<h3><span lang="EN-US">mergeFactor</span></h3>
<p style="margin: 0cm 0cm 0pt"><span lang="EN-US">&nbsp;&nbsp;&nbsp;&nbsp;</span>这个是合并因子，这个参数<strong>大概</strong>决定了<span lang="EN-US">segment(</span>索引段<span lang="EN-US">)</span>的数量。</p>
<p style="margin: 0cm 0cm 0pt"><span lang="EN-US">&nbsp;&nbsp;&nbsp;&nbsp;</span>合并因子这个值告诉<span lang="EN-US">lucene</span>，在什么时候，要将几个<span lang="EN-US">segment</span>合并成为一个<span lang="EN-US">segment,</span> 合并因子就像是一个数字系统的<strong>基数</strong>一样。</p>
<p style="margin: 0cm 0cm 0pt"><span lang="EN-US">&nbsp;&nbsp;&nbsp;&nbsp;</span>比如说，如果你将合并因子设成<span lang="EN-US">10</span>，那么每往索引中添加<span lang="EN-US">1000</span>个文档的时候，就会创建一个新的索引段。当第<span lang="EN-US">10</span>个大小为<span lang="EN-US">1000</span>的索引段添加进来的时候，这十个索引段就会被合并成一个大小为<span lang="EN-US">10</span>，<span lang="EN-US">000</span>的索引段。当十个大小为<span lang="EN-US">10</span>，<span lang="EN-US">000</span>的索引段生成的时候，它们就会被合并成一个大小为<span lang="EN-US">100</span>，<span lang="EN-US">000</span>的索引段。如此类推下去。</p>
<p style="margin: 0cm 0cm 0pt"><span lang="EN-US">&nbsp;&nbsp;&nbsp;&nbsp;</span><br />这个值可以在<span lang="EN-US">solrconfig.xml</span> 中的<br /><span lang="EN-US">*<strong>mainIndex</strong>*</span>中设置。（不用管<span lang="EN-US">indexDefaults</span>中设置）</p>
<h3><span lang="EN-US">&nbsp;mergeFactor Tradeoffs</span></h3>
<p style="margin: 0cm 0cm 0pt"><span lang="EN-US">&nbsp;&nbsp;</span>较高的合并因子</p>
<ul style="margin-top: 0cm" type="disc"><li><span lang="EN-US">&nbsp;&nbsp;</span><span style="font-family: 宋体">会提高索引速度</span></li><li><span lang="EN-US">&nbsp;&nbsp;</span><span style="font-family: 宋体">较低频率的合并，会导致更多的索引文件，这会降低索引的搜索效率</span></li></ul>
<p style="margin: 0cm 0cm 0pt"><span lang="EN-US">&nbsp;&nbsp;</span>较低的合并因子</p>
<ul style="margin-top: 0cm" type="disc"><li><span lang="EN-US">&nbsp;&nbsp;</span><span style="font-family: 宋体">较少数量的索引文件，能加快索引的搜索速度。</span></li><li><span lang="EN-US">&nbsp;&nbsp;</span><span style="font-family: 宋体">较高频率的合并，会降低索引的速度。</span></li></ul>
<h2><span lang="EN-US">HashDocSet Max Size Considerations</span></h2>
<p style="margin: 0cm 0cm 0pt"><span lang="EN-US"><span>&nbsp;</span></span><span lang="EN-US"><span>&nbsp;</span>hashDocSet</span><span>是</span><span lang="EN-US">solrconfig.xml</span><span>中自定义优化选项</span><span lang="EN-US">,</span> <span><br />使用在</span><span lang="EN-US">filters(docSets)</span> <span><br />中，更小的</span><span lang="EN-US">sets</span><span>，表明更小的内存消耗、遍历、插入。</span></p>
<p style="margin: 0cm 0cm 0pt"><span lang="EN-US"><span>&nbsp;&nbsp;</span><br />hashDocSet</span><span>参数值最后基于索引文档总数来定，索引集合越大，</span><span lang="EN-US">hashDocSet</span><span>值也越大。</span></p>
<p style="margin: 0cm 0cm 0pt"><span lang="EN-US">Calulate 0.005 of the total number of documents that you are going to store.&nbsp; Try values on either &#8216;side&#8217; of that value to arrive at the best query times.  When query times seem to plateau, and performance doesn&#8217;t show much difference between the higher number and the lower, use the higher. </span></p>
<p style="margin: 0cm 0cm 0pt"><span lang="EN-US">Note: hashDocSet is no longer part of Solr as of version 1.4.0, see <a href="https://issues.apache.org/jira/browse/SOLR-1169"><span style="color: windowtext; text-decoration: none">SOLR-1169</span></a>.</span></p>
<h2><span lang="EN-US">Cache autoWarm Count Considerations</span></h2>
<p><span lang="EN-US">&nbsp;&nbsp;&nbsp;&nbsp;</span>当一个新的<span lang="EN-US">searcher</span> 打开的时候，它缓存可以被预热，或者说<strong>使用从旧的<span lang="EN-US">searcher</span>的缓存的数据来<span lang="EN-US">&#8220;</span>自动加热<span lang="EN-US">&#8221;</span></strong>。<span lang="EN-US">autowarmCount</span>是这样的一个参数，它表示从旧缓存中拷贝到新缓存中的对象数量。<span lang="EN-US">autowarmCount</span>这个参数将会影响<span lang="EN-US">&#8220;</span><strong>自动预热<span lang="EN-US">&#8221;</span>的时间</strong>。有些时候，我们需要一些折中的考虑，<span lang="EN-US">seacher</span>启动的时间和缓存加热的程度。当然啦，缓存加热的程度越好，使用的时间就会越长，但往往，我们并不希望过长的<span lang="EN-US">seacher</span>启动时间。这个<span lang="EN-US">autowarm</span> 参数可以在<span lang="EN-US">solrconfig.xml</span>文件中被设置。</p>
<p><span lang="EN-US">&nbsp;&nbsp;&nbsp;</span>详细的配置可以参考<span lang="EN-US">solr</span>的<span lang="EN-US">wiki</span>。</p>
<h2><span lang="EN-US">Cache hit rate</span>（缓存命中率）</h2>
<p><span lang="EN-US">&nbsp;&nbsp;&nbsp;&nbsp;</span>我们可以通过<span lang="EN-US">solr</span>的<span lang="EN-US">admin</span>界面来查看缓存的状态信息。<strong>提高<span lang="EN-US">solr</span>缓存的大小往往是提高性能的捷径</strong>。当你使用<strong>面搜索的时候</strong>，你或许可以注意一下<span lang="EN-US">filterCache,</span>这个是由<span lang="EN-US">solr</span>实现的缓存。</p>
<p><span lang="EN-US">&nbsp;&nbsp;&nbsp;</span><br />详细的内容可以参考<span lang="EN-US">solrCaching</span>这篇<span lang="EN-US">wiki</span>。</p>
<h2><span lang="EN-US">Explicit Warming of Sort Fields&nbsp;</span></h2>
<p><span lang="EN-US">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span>如果你有许多域是基于排序的，那么你可以在<span lang="EN-US">&#8220;newSearcher&#8221;</span>和<span lang="EN-US">&#8220;firstSearcher&#8221;event<br />listeners</span>中添加一些明显需要预热的查询，这样<strong><span lang="EN-US">FieldCache</span> 就会缓存这部分内容</strong>。</p>
<h2><span lang="EN-US">Optimization Considerations</span></h2>
<p><span lang="EN-US">&nbsp;&nbsp;&nbsp;&nbsp;</span>优化索引，是我们经常会做的事情，比如，当我们建立好索引，然后这个索引<strong>不会再变更的情况</strong>，我们就会做一次优化了。</p>
<p><span lang="EN-US">&nbsp;&nbsp;&nbsp;&nbsp;</span>但，如果你的索引经常会改变，那么你就需要好好的考虑下面的因素的。</p>
<ul type="disc"><li><span style="font-family: 宋体">当越来越多的索引段被加进索引，查询的性能就会降低，</span> <span lang="EN-US">lucene</span><span style="font-family: 宋体">对索引段的数量有一个上限的限制，当超过这个限制的时候，索引段可以自动合并成为一个。</span></li><li><span style="font-family: 宋体">在同样没有缓存的情况下，一个没有经过优化的索引的性能会比经过优化的索引的性能少</span><span lang="EN-US">10%&#8230;&#8230;</span></li><li><span style="font-family: 宋体">自动加热的时间将会变长，因为它依赖于搜索。</span></li><li><span lang="EN-US">&nbsp;</span><strong><span style="font-family: 宋体">优化将会对索引的分发产生影响</span></strong><span style="font-family: 宋体">。</span></li><li><span lang="EN-US">&nbsp;</span><span style="font-family: 宋体">在优化期间，文件的大小将会是<strong>索引的两倍</strong>，不过最终将会回到它原来的大小，或者会更小一点。</span></li></ul>
<p><span lang="EN-US">&nbsp;&nbsp;&nbsp;&nbsp;</span>优化，会将所有的索引段合并成为一个索引段，所以，优化这个操作其实可以帮助避免<span lang="EN-US">&#8220;too many files&#8221;</span>这个问题，这个错误是由文件系统抛出的。</p>
<h2><span lang="EN-US">Updates and Commit Frequency Tradeoffs</span></h2>
<p><span lang="EN-US">&nbsp;&nbsp;&nbsp;</span><br />如果从机 经常从 主机更新的话，从机的性能是会受到影响的。为了避免，由于这个问题而引起的性能下降，我们还必须了解从机是怎样执行更新的，这样我们才能更准确去调节一些相关的参数（<span lang="EN-US">commit</span>的频率，<span lang="EN-US">spappullers, autowarming/autocount</span>）<span lang="EN-US">,</span>这样，从机的更新才不会太频繁。</p>
<ol type="1"><li><span style="font-family: 宋体"><br />执行</span><span lang="EN-US">commit</span><span style="font-family: 宋体">操作会让</span><span lang="EN-US">solr</span><span style="font-family: 宋体">新生成一个</span><span lang="EN-US">snapshot</span><span style="font-family: 宋体">。如果将</span><span lang="EN-US">postCommit</span><span style="font-family: 宋体">参数设成</span><span lang="EN-US">true</span><span style="font-family: 宋体">的话，</span><span lang="EN-US">optimization</span><span style="font-family: 宋体">也会执行</span><span lang="EN-US">snapShot.</span></li><li><span lang="EN-US">slave</span><span style="font-family: 宋体">上的</span><span lang="EN-US">Snappuller</span><span style="font-family: 宋体">程序一般是在</span><span style="background: yellow" lang="EN-US">crontab</span><span style="font-family: 宋体">上面执行的，它会去</span><span lang="EN-US">master</span><span style="font-family: 宋体">询问，有没有新版的</span><span lang="EN-US">snapshot</span><span style="font-family: 宋体">。一旦发现新的版本，</span><span lang="EN-US">slave</span><span style="font-family: 宋体">就会把它下载下来，然后</span><span lang="EN-US">snapinstall.</span></li><li><span style="font-family: 宋体">每次当一个新的</span><span lang="EN-US">searcher</span><span style="font-family: 宋体">被</span><span lang="EN-US">open</span><span style="font-family: 宋体">的时候，会有一个缓存预热的过程，<strong>预热之后，新的索引才会交付使用。</strong></span></li></ol>
<p style="margin: 0cm 0cm 0pt"><span lang="EN-US">&nbsp;&nbsp;&nbsp;</span>这里讨论三个有关的参数：</p>
<ul style="margin-top: 0cm" type="disc"><li><span lang="EN-US">&nbsp;<strong><span>number/frequency of snapshots</span></strong>&nbsp; &#8212;-snapshot</span><span style="font-family: 宋体">的频率。</span></li><li><strong><span lang="EN-US">snappullers </span></strong><strong><span>是</span></strong><span lang="EN-US">&nbsp;</span> <span style="font-family: 宋体">在</span><span lang="EN-US">crontab</span><span style="font-family: 宋体">中的，它当然可以每秒一次、每天一次、或者其他的时间间隔一次运行。它运行的时候，只会下载</span><span lang="EN-US">slave</span><span style="font-family: 宋体">上没有的，并且最新的版本。</span></li><li><strong><span lang="EN-US">Cache autowarming</span></strong> <span style="font-family: 宋体">可以在</span><span lang="EN-US">solrconfig.xml</span><span style="font-family: 宋体">文件中配置。</span></li></ul>
<p><span lang="EN-US">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span>如果，你想要的效果是频繁的更新<span lang="EN-US">slave</span>上的索引，以便这样看起来比较像<span lang="EN-US">&#8220;</span>实时索引<span lang="EN-US">&#8221;</span>。那么，你就需要让<span lang="EN-US">snapshot</span>尽可能频繁的运行，然后也让<span lang="EN-US">snappuller</span>频繁的运行。这样，我们或许可以每<span lang="EN-US">5</span>分钟更新一次，并且还能取得不错的性能，当然啦，<span lang="EN-US">cach</span>的命中率是很重要的，恩，缓存的加热时间也将会影响到更新的频繁度。</p>
<p><span lang="EN-US">&nbsp;&nbsp;&nbsp;<strong>&nbsp;<span style="background: yellow">cache</span></strong></span><strong><span style="background: yellow">对性能是很重要的</span></strong>。一方面，新的缓存必须拥有足够的缓存量，这样接下来的的查询才能够从缓存中受益。另一方面，缓存的预热将可能占用很长一段时间，尤其是，它其实是只使用一个线程，和一个<span lang="EN-US">cpu</span>在工作。<span lang="EN-US">snapinstaller</span>太频繁的话，<span lang="EN-US">solr<br />slave</span>将会处于一个不太理想的状态，可能它还在预热一个新的缓存，然而一个更新的<span lang="EN-US">searcher</span>被<span lang="EN-US">opern</span>了。</p>
<p><span lang="EN-US">&nbsp;&nbsp;&nbsp;&nbsp;</span>怎么解决这样的一个问题呢，我们可能会取消第一个<span lang="EN-US">seacher</span>，然后去处理一个更新<span lang="EN-US">seacher</span>，也即是第二个。然而有可能第二个<span lang="EN-US">seacher</span> 还没有被使用上的时候，第三个又过来了。看吧，一个恶性的循环，不是。当然也有可能，我们刚刚预热好的时候就开始新一轮的缓存预热，其实，这样缓存的作用压根就没有能体现出来。出现这种情况的时候，降低<span lang="EN-US">snapshot</span>的频率才是硬道理。</p>
<h2><span lang="EN-US">Query Response Compression</span></h2>
<p><span lang="EN-US">&nbsp;&nbsp;&nbsp;&nbsp;</span>在有些情况下，我们可以考虑将<span lang="EN-US">solr xml response</span> <strong>压缩后才输出</strong>。如果<span lang="EN-US">response</span>非常大，就会触及<span lang="EN-US">NIc i/o</span>限制。</p>
<p><span lang="EN-US">&nbsp;&nbsp;&nbsp;&nbsp;</span>当然压缩这个操作将会增加<span lang="EN-US">cpu</span>的负担，其实，<span lang="EN-US">solr</span>一个典型的依赖于<span lang="EN-US">cpu</span>处理速度的服务，增加这个压缩的操作，将无疑会降低查询性能。但是，压缩后的数据将会是压缩前的数据的<strong><span lang="EN-US">6</span>分之一的大小</strong>。然而<span lang="EN-US">solr</span>的查询性能也会有<span lang="EN-US">15%</span>左右的消耗。</p>
<p><span lang="EN-US">&nbsp;&nbsp;</span>至于怎样配置这个功能，要看你使用的什么服务器而定，可以查阅相关的文档。</p>
<h2><span lang="EN-US">Embedded vs HTTP Post</span></h2>
<p><span lang="EN-US">&nbsp;</span>使用<span lang="EN-US">embeded</span> 来建立索引，将会比使用<span lang="EN-US">xml</span>格式来建立索引快<span lang="EN-US">50%</span>。</p>
<h2><span lang="EN-US">RAM Usage Considerations</span>（内存方面的考虑）</h2>
<h3><span lang="EN-US">&nbsp;OutOfMemoryErrors</span></h3>
<p><span lang="EN-US">&nbsp;&nbsp;&nbsp;&nbsp;</span>如果你的<span lang="EN-US">solr</span>实例没有被指定足够多的内存的话，<span lang="EN-US">java virtual machine</span>也许会抛<span lang="EN-US">outof memoryError</span>，这个<strong>并不对索引数据产生影响</strong>。但是这个时候，任何的<span lang="EN-US">adds/deletes/commits</span>操作都是不能够成功的。</p>
<h3><span lang="EN-US">&nbsp;Memory allocated to the Java VM</span></h3>
<p><span lang="EN-US">&nbsp;&nbsp;&nbsp;&nbsp;</span>最简单的解决这个方法就是，当然前提是<span lang="EN-US">java virtual machine</span>还没有使用掉你全部的内存，增加运行<span lang="EN-US">solr</span>的<span lang="EN-US">java</span>虚拟机的内存。</p>
<h4><span lang="EN-US">&nbsp;Factors affecting memory usage</span><span style="font-family: 宋体">（影响内存使用量的因素）</span></h4>
<p style="margin: 0cm 0cm 0pt">我想，你或许也会考虑怎样去减少<span lang="EN-US">solr</span>的内存使用量。其中的一个因素就是<span lang="EN-US">input document</span>的大小。当我们使用<span lang="EN-US">xml</span>执行<span lang="EN-US">add</span>操作的时候，就会有两个限制。</p>
<ul style="margin-top: 0cm" type="disc"><li><span lang="EN-US">document</span><span style="font-family: 宋体">中的</span><span lang="EN-US">field</span><span style="font-family: 宋体">都是会被存进内存的，</span><span lang="EN-US">field</span><span style="font-family: 宋体">有个属性叫</span><span lang="EN-US">maxFieldLength</span><span style="font-family: 宋体">，它或许能帮上忙。</span></li><li><span style="font-family: 宋体">每增加一个域，也是会增加内存的使用的。</span></li></ul>
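<p>maxFieldLength caps the number of terms indexed per field, and anything beyond the cap is silently dropped. The effect can be illustrated with a whitespace tokenizer; this is a hypothetical sketch of the behaviour, not Lucene's code (Lucene counts the tokens produced by the field's analyzer).</p>
<pre>
```java
import java.util.Arrays;

// Hypothetical illustration of maxFieldLength: only the first maxFieldLength
// tokens of a field are indexed; the rest is silently ignored.
final class FieldTruncation {

    static String[] indexedTokens(String fieldValue, int maxFieldLength) {
        String trimmed = fieldValue.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        String[] tokens = trimmed.split("\\s+");  // whitespace "analyzer" for the sketch
        int keep = Math.max(0, Math.min(tokens.length, maxFieldLength));
        return Arrays.copyOf(tokens, keep);
    }

    public static void main(String[] args) {
        String[] kept = indexedTokens("one two three four five", 3);
        System.out.println(String.join(" ", kept)); // one two three
    }
}
```
</pre>
<p>The same truncation that saves memory also means words past the cap can never match a query, so the limit should be chosen with your longest useful documents in mind.</p>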
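<p><span style="font-weight: bold">Part 2: Solr-specific tuning</span></p>
<p>1. Multiple cores</p>
<p>With multiple cores, swapping many cores at the same time can put excessive pressure on memory and CPU. You can extend the Solr code to cap the number of core swaps that run concurrently, which avoids the risk of high load or CPU spikes.</p>
<p>2. High availability</p>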
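<p>One way to cap concurrent core swaps, assuming you control the code path that triggers them (for example a wrapper around the CoreAdmin SWAP action), is a counting semaphore. CoreSwapLimiter and the Runnable passed to it are hypothetical; a swap that cannot get a permit is rejected so the caller can retry later instead of adding more load.</p>

```java
import java.util.concurrent.Semaphore;

public class CoreSwapLimiter {
    private final Semaphore permits;

    CoreSwapLimiter(int maxConcurrentSwaps) {
        this.permits = new Semaphore(maxConcurrentSwaps);
    }

    // Runs the swap only if a permit is free; returns false so the caller can retry later.
    boolean trySwap(Runnable swap) {
        if (!permits.tryAcquire()) {
            return false;
        }
        try {
            swap.run();
            return true;
        } finally {
            permits.release(); // always give the permit back, even if the swap throws
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }

    public static void main(String[] args) {
        CoreSwapLimiter limiter = new CoreSwapLimiter(2);
        boolean ran = limiter.trySwap(() -> System.out.println("swapping core (placeholder)"));
        System.out.println("ran=" + ran + ", free permits=" + limiter.availablePermits());
    }
}
```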
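<p>Keep at least two nodes serving, preferably on different machines.<br />When switching between offline and online, if the data volume is small, indexing and searching can run on the same node; if it is large (over 50 million documents), build the index offline or on a node other than the search nodes.</p>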
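<p>3. Cache configuration</p>
<p>If updates are very frequent, causing frequent commits and reopens, disable the caches if you can.<br />If performance depends on caching and there is no faceting requirement, it is best to disable cache warming;<br />if there is a faceting requirement that relies heavily on the fieldValueCache, enable cache warming instead.<br />With real-time updates the documentCache hit rate is usually low, so it can safely be left disabled.</p>
<p>4. Reopen and commit</p>
<p>If possible, keep the main on-disk index out of segment merging and write new index segments to a separate directory, so that the main index does not change on reopen.</p>
<p>Make commit and reopen asynchronous.</p>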
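<p>Making commit and reopen asynchronous can be sketched with a single background thread, so indexing threads only enqueue the work and return. AsyncCommitter is hypothetical, and the Runnable stands in for whatever actually issues the commit and reopens the searcher; using one thread keeps commits serialized.</p>

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class AsyncCommitter {
    // A single background thread: callers return immediately and commits run one at a time.
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final AtomicInteger completed = new AtomicInteger();

    // commitAndReopen is a placeholder for the real commit plus searcher reopen.
    Future<?> requestCommit(Runnable commitAndReopen) {
        return executor.submit(() -> {
            commitAndReopen.run();
            completed.incrementAndGet();
        });
    }

    int completedCommits() {
        return completed.get();
    }

    // Stops accepting work and waits for queued commits; true if they all finished in time.
    boolean shutdownAndWait(long timeoutMillis) {
        executor.shutdown();
        try {
            return executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        AsyncCommitter committer = new AsyncCommitter();
        committer.requestCommit(() -> System.out.println("commit + reopen (placeholder)"));
        System.out.println("indexing thread continues without waiting");
        committer.shutdownAndWait(5_000);
        System.out.println("completed commits: " + committer.completedCommits());
    }
}
```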
<p>5. For data that rarely or never changes, consider a memory cache or a local cache to balance performance against space overhead, while avoiding full GCs (FGC).</p>
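<p>For the local cache, a size-bounded LRU map from the JDK is often enough. The bound is what serves the GC goal: an unbounded cache of long-lived objects keeps growing in the old generation. This sketch relies on LinkedHashMap's access order and removeEldestEntry; it is not thread-safe, so wrap it (e.g. with Collections.synchronizedMap) if several threads share it. The class name is hypothetical.</p>

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LocalLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public LocalLruCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true: iteration goes from least to most recently used
        this.maxEntries = maxEntries;
    }

    // Called after each put; evicting here keeps the cache, and its heap footprint, bounded.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LocalLruCache<String, String> cache = new LocalLruCache<>(2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.get("a");      // touch "a" so "b" becomes the eldest entry
        cache.put("c", "3"); // evicts "b"
        System.out.println(cache.keySet()); // [a, c]
    }
}
```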
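<p>6. Compress intermediate variables and use singletons</p>
<p>During querying and indexing, create as few objects as possible: change values on existing objects through setters and use singletons to improve performance. Where possible, apply integer compression to larger intermediate variables.</p>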
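<p>As an example of integer compression, variable-length encoding stores small integers in fewer bytes, similar in spirit to Lucene's VInt format: each byte carries 7 bits of the value, and the high bit signals that more bytes follow. This is an illustrative sketch for non-negative ints, not Lucene's own code.</p>

```java
import java.io.ByteArrayOutputStream;

public class VarIntCodec {
    // Encode a non-negative int in 7-bit groups, low bits first; a set high bit means "more bytes follow".
    static byte[] encode(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("non-negative values only");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
        return out.toByteArray();
    }

    // Reassemble the 7-bit groups until a byte without the continuation bit is seen.
    static int decode(byte[] bytes) {
        int result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
            shift += 7;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(encode(300).length + " bytes for 300"); // 2 bytes for 300
    }
}
```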
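<p>7. Redefine object representations<br />For objects such as dates, regions, URLs, and byte data, consider representations such as differences from a base value, region codes, only the distinguishing part, or compression. This lowers memory overhead, indirectly improves memory utilization, and yields better performance.</p>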
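<p>The "difference" idea for dates can be as simple as storing the number of days since a fixed base date instead of a full date object. The base date below is an arbitrary assumption; with it, dates within a few decades fit comfortably in a short.</p>

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

public class DateDeltaCodec {
    // Arbitrary base date (an assumption); stored values are day offsets from it.
    static final LocalDate BASE = LocalDate.of(2000, 1, 1);

    // Full date -> small integer offset.
    static int toDayOffset(LocalDate date) {
        return Math.toIntExact(ChronoUnit.DAYS.between(BASE, date));
    }

    // Small integer offset -> full date.
    static LocalDate fromDayOffset(int offset) {
        return BASE.plusDays(offset);
    }

    public static void main(String[] args) {
        LocalDate d = LocalDate.of(2012, 6, 17);
        int offset = toDayOffset(d);
        System.out.println(offset + " -> " + fromDayOffset(offset));
    }
}
```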
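<p>8. Separate index and store<br />Let the index do what it is good at (query performance) and the store do what it is good at (storage and response).<br />In other words, do not put all content into the index; set fields to stored="false" wherever possible.</p>
<p>9. Use the latest Solr and Lucene versions</p>
<p>10. Share tokenizer instances<br />Custom tokenizers must be used as singletons; never create a new tokenizer object for each document.</p>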
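<p>A common way to share one expensive, dictionary-backed instance is the initialization-on-demand holder idiom. DictionarySegmenter is a hypothetical stand-in for a custom tokenizer's dictionary. One caveat: Lucene Tokenizer objects keep per-stream state, so what is safe to share across threads is the immutable dictionary, not a single tokenizer used by several requests at once.</p>

```java
import java.util.Set;

public final class SegmenterHolder {
    // Hypothetical segmenter whose construction (dictionary loading) is expensive.
    static final class DictionarySegmenter {
        private final Set<String> dictionary;

        DictionarySegmenter() {
            // Stand-in for loading a large dictionary file exactly once.
            this.dictionary = Set.of("solr", "lucene");
        }

        // Read-only lookup; safe from many threads because the set is immutable.
        boolean isWord(String token) {
            return dictionary.contains(token);
        }
    }

    private SegmenterHolder() { }

    // Holder idiom: the JVM initializes INSTANCE lazily, once, and thread-safely on first use.
    private static final class Holder {
        static final DictionarySegmenter INSTANCE = new DictionarySegmenter();
    }

    static DictionarySegmenter get() {
        return Holder.INSTANCE;
    }

    public static void main(String[] args) {
        System.out.println(get() == get());       // true: one shared instance
        System.out.println(get().isWord("solr")); // true
    }
}
```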
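<p><span style="font-weight: bold">Part 3: Solr queries</span></p>
<p>1. Sorting by a specific field<br />For numeric fields such as price, display only the most recent 1 or 3 months of data, to guard against manipulation.<br />When dumping data or building the index, check numeric values against upper and lower bounds, so that values that are well-formed numbers but make no sense in practice are caught early.</p>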
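<p>The bounds check can be a small validation step in the dump or indexing code. The price limits here are made-up placeholders; real limits come from the business domain.</p>

```java
public class PriceSanityCheck {
    // Placeholder bounds; real limits depend on the product domain.
    static final double MIN_PRICE = 0.01;
    static final double MAX_PRICE = 100_000.0;

    // A value can parse fine as a number yet still be implausible; flag those before indexing.
    static boolean isPlausible(double price) {
        return !Double.isNaN(price) && price >= MIN_PRICE && price <= MAX_PRICE;
    }

    public static void main(String[] args) {
        double[] prices = {19.9, 0.0, 9_999_999.0};
        for (double p : prices) {
            System.out.println(p + " plausible? " + isPlausible(p));
        }
    }
}
```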
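<p>2. Ranking variability<br />The default ranking must have its own relevance parameters and balance the various requirements.<br />The ranking should change over time, but without large swings. Its details need not be public, but its results should be clearly explainable.</p>
<p>3. Online vs. offline<br />Some score components can be computed offline and others online, depending on requirements.</p>
<p>4. Multi-field queries<br />If you query several fields by default, consider merging them into one field so that only a single field is queried.</p>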
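<p>In Solr this merge is usually declared in schema.xml with copyField; if you would rather do it in the indexing client, a sketch might look like this. The field names are hypothetical.</p>

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

public class CombinedField {
    // Join the values of the source fields (skipping missing ones) into one value for a single search field.
    static String combine(Map<String, String> doc, List<String> sourceFields) {
        return sourceFields.stream()
                .map(doc::get)
                .filter(Objects::nonNull)
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        Map<String, String> doc = Map.of("title", "red apple", "tags", "fruit");
        System.out.println(combine(doc, List.of("title", "tags", "brand"))); // red apple fruit
    }
}
```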
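<p>5. Highlighting<br />Highlighting can be done inside or outside Solr; it does not have to run in Solr.<br />Likewise, tokenization can be done offline in advance, so that the dump only needs simple whitespace tokenization.</p>
<p>6. Statistics<br />Facet counts can combine online and offline computation rather than relying entirely on real-time online counting.</p>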
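<p>Client-side highlighting can be as small as wrapping query terms with markers after the results come back. This is a simplified sketch with a hypothetical class name: it matches literal terms case-insensitively and ignores analysis (stemming, Chinese segmentation), which Solr's own highlighter would take into account.</p>

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ClientHighlighter {
    // Wraps every case-insensitive occurrence of any literal query term with pre/post markers.
    static String highlight(String text, List<String> terms, String pre, String post) {
        String alternation = terms.stream()
                .filter(t -> !t.isEmpty())   // an empty term would match everywhere
                .map(Pattern::quote)         // treat terms literally, not as regex syntax
                .collect(Collectors.joining("|"));
        if (alternation.isEmpty()) {
            return text;
        }
        Pattern pattern = Pattern.compile(alternation, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        // $0 is the matched text; the markers are quoted so "$" or "\" in them stays literal.
        String replacement = Matcher.quoteReplacement(pre) + "$0" + Matcher.quoteReplacement(post);
        return pattern.matcher(text).replaceAll(replacement);
    }

    public static void main(String[] args) {
        System.out.println(highlight("Solr and solr", List.of("solr"), "[", "]")); // [Solr] and [solr]
    }
}
```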
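<p>One way to combine the two is to precompute facet counts in an offline batch and add small online deltas for documents indexed since that batch. The class, maps, and facet values below are hypothetical; deletions would need negative deltas.</p>

```java
import java.util.HashMap;
import java.util.Map;

public class FacetCountMerger {
    // Offline counts from the last batch plus online deltas since then (negative deltas for deletions).
    static Map<String, Long> merge(Map<String, Long> offline, Map<String, Long> onlineDelta) {
        Map<String, Long> merged = new HashMap<>(offline);
        onlineDelta.forEach((value, delta) -> merged.merge(value, delta, Long::sum));
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Long> offline = Map.of("action", 120L, "puzzle", 45L);
        Map<String, Long> delta = Map.of("puzzle", 3L, "racing", 1L);
        System.out.println(merge(offline, delta));
    }
}
```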
<p>7. Active search<br />Query strings for active search must be processed strictly: remove invalid query strings, and expand query strings where appropriate.<br />Define clear query paths and what to do when a query returns hits=0.</p><br /></font></div><img src ="http://www.blogjava.net/conans/aggbug/379550.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/conans/" target="_blank">CONAN</a> 2012-05-30 14:40 <a href="http://www.blogjava.net/conans/articles/379550.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Solr study notes: configuring Solr on Linux (reposted)</title><link>http://www.blogjava.net/conans/articles/379549.html</link><dc:creator>CONAN</dc:creator><author>CONAN</author><pubDate>Wed, 30 May 2012 06:38:00 GMT</pubDate><guid>http://www.blogjava.net/conans/articles/379549.html</guid><description><![CDATA[<p><span style="font-size: small">Original article:</span></p>
<h3 style="padding-bottom: 0px; line-height: 1.5em; margin: 0px 0px 0.5em; padding-left: 0px; padding-right: 0px; color: black; font-size: 16px; padding-top: 10px"><a style="background-color: #108ac6; color: white; text-decoration: underline" href="http://zhoujianghai.iteye.com/blog/1540176"><span style="font-size: small">http://zhoujianghai.iteye.com/blog/1540176</span></a></h3>
<p style="padding-bottom: 0px; line-height: 1.5em; margin: 0px 0px 0.5em; padding-left: 0px; padding-right: 0px; color: black; font-size: 16px; padding-top: 10px">&nbsp;</p>
<p><span style="font-size: x-small"><span style="font-size: x-small"><span style="font-size: small">First, a brief introduction to Solr:</span></span></span></p>
<p><span style="font-size: x-small"><span style="font-size: x-small"><span style="font-size: small"><span style="font-size: small">Apache Solr (pronounced: SOLer) is an open-source, high-performance full-text search server written in Java and based on <a style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px" href="http://www.oschina.net/p/lucene">Lucene</a>. Documents are added to a search collection as XML over HTTP, and the collection is also queried over HTTP, which returns an XML/<a style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px" href="http://www.oschina.net/project/search?q=JSON">JSON</a> response. Solr stores resources as Documents: each Document consists of a series of Fields, and each Field represents one attribute of the resource. Every Document needs an attribute that uniquely identifies it; by default this attribute is named id and is declared in the schema file (schema.xml) with <code style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px"><strong>&lt;uniqueKey&gt;id&lt;/uniqueKey&gt;</strong></code>. Solr has two core files, solrconfig.xml and schema.xml. </span><span style="line-height: 26px; color: #333333; font-size: 14px">solrconfig.xml is Solr's base configuration file; it configures the web request handlers, response writers, logging, caches, and so on. </span><span style="line-height: 26px; color: #333333; font-size: 14px">schema.xml defines the indexing scheme for each data type, including the tokenizer configuration and the fields that indexed documents contain.</span></span></span></span></p>
<p><span style="font-size: x-small"><span style="font-size: x-small"><span style="font-size: small">In practice it is mainly used for tokenization and search. The basic workflow: a tokenizer splits the source data into terms and an index is built from the results; at query time, the tokenizer splits the query the same way, the resulting terms are matched against the index, and the matches are returned.</span></span></span></p>
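<p>The workflow just described (analyze, index, analyze the query the same way, match) can be illustrated with a toy inverted index. This is not how Solr or Lucene actually store data; it splits on whitespace as a stand-in for a real analyzer, which for Chinese text would be a segmenter such as IKAnalyzer or mmseg4j. The class name is hypothetical.</p>

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class TinyInvertedIndex {
    // term -> ids of the documents containing it
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Stand-in analyzer: lowercase and split on whitespace.
    static List<String> analyze(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("\\s+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    // Indexing: analyze the document and record its id under each term.
    void add(int docId, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Querying: analyze the query the same way and keep documents containing every term.
    Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String term : analyze(query)) {
            Set<Integer> docs = postings.getOrDefault(term, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? Collections.emptySet() : result;
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add(1, "Apache Solr search server");
        index.add(2, "Lucene search library");
        System.out.println(index.search("search"));      // [1, 2]
        System.out.println(index.search("SOLR search")); // [1]
    }
}
```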
<p><span style="font-size: x-small"><span style="font-size: x-small"><span style="font-size: small"><br /></span></span></span></p>
<p><span style="font-size: x-small"><span style="font-size: x-small"><span style="font-size: small">Enough preamble; let's start the Solr tour:</span></span></span></p>
<p><strong><span style="font-size: x-small"><span style="font-size: small">1. Install the JDK and Tomcat</span></span></strong></p>
<div><span style="font-size: x-small"><span style="font-size: x-small"><span style="font-size: small">(1) Install the JDK: download the JDK package and extract it to a jdk-1.x directory.</span></span></span></div>
<p><span style="font-size: x-small"><span style="font-size: x-small"><span style="font-size: small">(2) Install Tomcat: download the Tomcat package and extract it to an apache-tomcat directory.</span></span></span></p>
<p><span style="font-size: x-small"><span style="font-size: x-small"><span style="font-size: small">Edit server.xml in the conf directory of the Tomcat installation.</span></span></span></p>
<p><span style="font-size: x-small"><span style="font-size: x-small"><span style="font-size: small">Find &lt;Connector port="8080" .../&gt; and add <span style="line-height: 18px"><span style="color: black">URIEncoding=</span><span style="color: blue" class="string">"UTF-8"</span></span> to it so that Chinese characters in request URLs are decoded correctly.</span></span></span></p>
<p><span style="font-size: x-small"><span style="font-size: small">Set the Java and Tomcat environment variables.</span></span></p>
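<p>Why URIEncoding matters: browsers percent-encode Chinese query text as UTF-8 bytes, and if the server decodes those bytes with a different charset (Tomcat versions of this era default to ISO-8859-1), the query arrives garbled. The JDK-only sketch below reproduces the effect with URLDecoder; it simulates the connector's decoding rather than calling Tomcat, and the class name is hypothetical.</p>

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UriEncodingDemo {
    // What the browser sends: the query's UTF-8 bytes, percent-encoded.
    static String encodeLikeBrowser(String query) {
        return URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    // What the server sees after decoding with the connector's URIEncoding charset.
    static String decodeOnServer(String encoded, Charset uriEncoding) {
        return URLDecoder.decode(encoded, uriEncoding);
    }

    public static void main(String[] args) {
        String query = "\u4e2d\u6587"; // two Chinese characters meaning "Chinese"
        String encoded = encodeLikeBrowser(query);
        System.out.println(encoded); // %E4%B8%AD%E6%96%87
        System.out.println(decodeOnServer(encoded, StandardCharsets.UTF_8).equals(query));      // true
        System.out.println(decodeOnServer(encoded, StandardCharsets.ISO_8859_1).equals(query)); // false
    }
}
```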
<p><span style="font-size: x-small"><span style="font-size: small"><br /></span></span></p>
<p><span style="font-size: x-small"><span style="font-size: x-small"><span style="font-size: small">These two steps are straightforward, so they are only outlined here; look up the details online if anything is unclear.</span></span></span></p>
<p><span style="font-size: x-small"><span style="font-size: small"><br /></span></span></p>
<p><span style="font-size: x-small"><strong><span style="font-size: x-small"><span style="font-size: small">2. Install Solr</span></span></strong></span></p>
<p><span style="font-size: small"><span style="font-size: x-small"><span style="font-size: x-small">Download the Solr package: </span></span><span style="white-space: pre"><strong>http://labs.renren.com/apache-mirror/lucene/solr/3.5.0/apache-solr-3.5.0.zip</strong></span></span></p>
<p><span style="font-size: small"><br /></span></p>
<p><span style="font-size: x-small"><span style="font-size: x-small"><span style="font-size: small">Extract it to an apache-solr directory, copy apache-solr-3.5.0.war from apache-solr/dist to $TOMCAT_HOME/webapps, and rename it to solr.war.</span></span></span></p>
<p><span style="font-size: x-small"><span style="font-size: x-small"><span style="font-size: small">Copy apache-solr/example/solr to the Tomcat root directory to serve as solr/home. (If you want multiple cores, i.e. instances, copy apache-solr/example/multicore to the Tomcat root directory instead of solr.) You can add more cores to this directory later, and each core can have its own configuration files.</span></span></span></p>
<p><span style="font-size: x-small"><span style="font-size: small">Create solr.xml in apache-tomcat/conf/Catalina/localhost/ (named after the solr application under webapps), specifying the locations of solr.war and solr/home, so that Tomcat loads the application automatically at startup.</span></span></p>
<p><span style="font-size: x-small"><span style="font-size: x-small"><span style="font-size: small">The contents of solr.xml:</span></span></span></p>
<p><span style="font-size: x-small"><span style="font-size: x-small"><span style="font-size: small">&lt;?xml version="1.0" encoding="UTF-8"?&gt;</span></span></span></p>
<p><span style="font-size: x-small"><span style="font-size: x-small"><span style="font-size: small">&lt;Context docBase="/home/zhoujh/java/apache-tomcat7/webapps/solr.war" debug="0" crossContext="true" &gt;</span></span></span></p>
<p><span style="font-size: x-small"><span style="font-size: x-small"><span style="font-size: small">&nbsp; &nbsp;&lt;Environment name="solr/home" type="java.lang.String" value="/home/zhoujh/java/apache-tomcat7/solr" override="true" /&gt;</span></span></span></p>
<p><span style="font-size: x-small"><span style="font-size: x-small"><span style="font-size: small">&lt;/Context&gt;</span></span></span></p>
<p><span style="font-size: x-small"><span style="font-size: x-small"><span style="font-size: small">Then run ./startup.sh in Tomcat's bin directory to start Tomcat.</span></span></span></p>
<p><span style="font-size: x-small"><span style="font-size: x-small"><span style="font-size: small">Open http://localhost:8080/solr/ in the browser.</span></span></span></p>
<p><span style="font-size: x-small"><span style="font-size: x-small"><span style="font-size: small">You should see the Solr welcome page and a link to the admin interface.</span></span></span></p>
<p style="padding-bottom: 15px; margin: 0in; padding-left: 0px; padding-right: 0px; padding-top: 0px"><span style="font-size: x-small"><span style="font-size: small"><span style="font-size: x-small">Note: if you get the exception <span style="color: #333333; font-size: 14px">org.apache.solr.common.SolrException: Error loading class 'solr.VelocityResponseWriter'</span>, the simplest fix is to open $TOMCAT_HOME/solr/conf/solrconfig.xml and either comment out &lt;queryResponseWriter name="velocity" class="solr.VelocityResponseWriter" enable="${solr.velocity.enabled:true}"/&gt; or set its enable attribute to false.</span> If everything goes well, you can now see Solr's web admin interface. To get tokenization working, however, you need to install a Chinese tokenizer; <span>IKAnalyzer or </span><span>mmseg4j is recommended here.</span></span></span></p>
<p style="padding-bottom: 15px; margin: 0in; padding-left: 0px; padding-right: 0px; padding-top: 0px"><span style="font-size: x-small"><span style="font-size: x-small"><span style="font-size: small"><span>IKAnalyzer is an open-source, lightweight Chinese tokenization toolkit written in Java. </span><span>It uses its own "forward iterative finest-grained segmentation" algorithm and processes about 600,000 characters per second. </span><span>Its multi-sub-processor analysis mode handles English letters (IP addresses, email addresses, URLs), numbers (dates, common Chinese quantifiers, Roman numerals, scientific notation), and Chinese words (person and place names). </span><span>Its dictionary storage is optimized for a smaller memory footprint, and it supports user-defined dictionary extensions.</span></span></span></span></p>
<p style="padding-bottom: 15px; margin: 0in; padding-left: 0px; padding-right: 0px; padding-top: 0px"><span style="font-size: x-small"><span style="font-size: x-small"><span style="font-size: small"><span>mmseg4j is a Chinese tokenizer implementing Chih-Hao Tsai's MMSeg algorithm (<a style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; color: #3e62a6; padding-top: 0px" href="http://technology.chtsai.org/mmseg/">http://technology.chtsai.org/mmseg/</a>). It provides a Lucene analyzer and a Solr TokenizerFactory so that it is easy to use in both Lucene and Solr. </span><span>The MMSeg algorithm has two segmentation modes, Simple and Complex, both based on forward maximum matching; Complex adds four disambiguation rules. According to the official figures, its word-recognition accuracy reaches 98.41%. mmseg4j implements both modes.</span></span></span></span></p>
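<p>Forward maximum matching, the idea behind MMSeg's Simple mode, fits in a few lines: at each position take the longest dictionary word, falling back to a single character. This sketch is not mmseg4j's code; it leaves out Complex mode's chunk-based rules, and the dictionary is a toy one.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class ForwardMaxMatch {
    // Greedy forward maximum matching over a dictionary whose words have at most maxLen chars.
    static List<String> segment(String text, Set<String> dict, int maxLen) {
        List<String> words = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = Math.min(text.length(), i + maxLen);
            String word = text.substring(i, i + 1); // fallback: a single character
            for (int j = end; j > i + 1; j--) {      // try the longest candidate first
                String candidate = text.substring(i, j);
                if (dict.contains(candidate)) {
                    word = candidate;
                    break;
                }
            }
            words.add(word);
            i += word.length();
        }
        return words;
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("ab", "abc", "cd", "ef");
        System.out.println(segment("abcdef", dict, 3)); // [abc, d, ef]
    }
}
```

<p>With a dictionary containing 研究, 研究生, 生命 and 起源 and maxLen 3, the text 研究生命起源 segments as 研究生 / 命 / 起源, because the greedy longest match takes 研究生 first. Complex mode's rules (such as preferring chunks with smaller word-length variance) pick 研究 / 生命 / 起源 instead.</p>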
<p style="padding-bottom: 15px; margin: 0in; padding-left: 0px; padding-right: 0px; padding-top: 0px"><strong><span style="font-size: x-small"><span style="font-size: small">3. Configure a Chinese tokenizer</span></span></strong></p>
<p style="padding-bottom: 15px; margin: 0in; padding-left: 0px; padding-right: 0px; padding-top: 0px"><span style="font-size: x-small"><span style="font-size: x-small"><span style="font-size: small">Both tokenizers are installed below; installing only one of them is also fine.</span></span></span></p>
<p style="padding-bottom: 15px; margin: 0in; padding-left: 0px; padding-right: 0px; padding-top: 0px"><span style="font-size: x-small"><span style="font-size: small">(1) <span style="color: #ff0000">Install <span>IKAnalyzer</span></span></span></span></p>
<p style="padding-bottom: 15px; margin: 0in; padding-left: 0px; padding-right: 0px; padding-top: 0px"><span style="font-size: x-small"><span style="font-size: x-small"><span style="font-size: small">Download: <span style="font-size: 13px"><a style="border-bottom: #bbbbbb 1px dotted; background-color: #eeeeee; color: #555555; text-decoration: none" class="ext-link" href="http://code.google.com/p/ik-analyzer/downloads/list"><span style="padding-left: 12px; background-position: 50% 50%" class="icon">&nbsp;</span>http://code.google.com/p/ik-analyzer/downloads/list</a></span></span></span></span></p>
<p style="padding-bottom: 15px; margin: 0in; padding-left: 0px; padding-right: 0px; padding-top: 0px"><span style="font-size: x-small"><span style="font-size: x-small"><span style="font-size: small">Create an IKAnalyzer directory under the current directory and extract the archive into it: unzip IKAnalyzer2012_u5.zip -d ./IKAnalyzer</span></span></span></p>
<p style="padding-bottom: 15px; margin: 0in; padding-left: 0px; padding-right: 0px; padding-top: 0px"><span style="font-size: x-small"><span style="font-size: small">Copy IKAnalyzer2012.jar from the IKAnalyzer directory to $TOMCAT_HOME/webapps/solr/WEB-INF/lib/.</span></span></p>
<p style="padding-bottom: 15px; margin: 0in; padding-left: 0px; padding-right: 0px; padding-top: 0px"><span style="font-size: x-small"><span style="font-size: small">Configure schema.xml: edit $TOMCAT_HOME/solr/conf/schema.xml and add the following fieldType:</span></span></p>
<div style="border-bottom: #cccccc 1px solid; border-left: #cccccc 1px solid; padding-bottom: 4px; background-color: #eeeeee; padding-left: 4px; width: 98%; padding-right: 5px; font-size: 13px; word-break: break-all; border-top: #cccccc 1px solid; border-right: #cccccc 1px solid; padding-top: 4px">&lt;fieldType name="text" class="solr.TextField" positionIncrementGap="100"&gt;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&lt;analyzer type="index"&gt;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory" isMaxWordLength="false" /&gt;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" /&gt;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" /&gt;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;filter class="solr.LowerCaseFilterFactory" /&gt;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt" /&gt;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;filter class="solr.RemoveDuplicatesTokenFilterFactory" /&gt;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&lt;/analyzer&gt;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&lt;analyzer type="query"&gt;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory" isMaxWordLength="true" /&gt;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" /&gt;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" /&gt;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" /&gt;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;filter class="solr.LowerCaseFilterFactory" /&gt;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt" /&gt;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;filter class="solr.RemoveDuplicatesTokenFilterFactory" /&gt;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&lt;/analyzer&gt;<br />
&lt;/fieldType&gt;</div>
<p style="padding-bottom: 15px; margin: 0in; padding-left: 0px; padding-right: 0px; padding-top: 0px"><span style="font-size: x-small"><span style="font-size: x-small"><span style="font-size: small">Add an index field that uses the fieldType configured above:</span></span></span></p>
<p style="padding-bottom: 0px; line-height: 1.5em; margin: 0px 0px 0.5em; padding-left: 0px; padding-right: 0px; color: black; font-size: 16px; padding-top: 10px"></p>
<div style="border-bottom: #cccccc 1px solid; border-left: #cccccc 1px solid; padding-bottom: 4px; background-color: #eeeeee; padding-left: 4px; width: 98%; padding-right: 5px; font-size: 13px; word-break: break-all; border-top: #cccccc 1px solid; border-right: #cccccc 1px solid; padding-top: 4px"><!--<br /><br />Code highlighting produced by Actipro CodeHighlighter (freeware)<br />http://www.CodeHighlighter.com/<br /><br />--><img align="top" src="http://www.blogjava.net/images/OutliningIndicators/None.gif"  alt="" /><span style="color: #0000ff">&lt;</span><span style="color: #800000">field&nbsp;</span><span style="color: #ff0000">name</span><span style="color: #0000ff">="game_name"</span><span style="color: #ff0000">&nbsp;type</span><span style="color: #0000ff">="text"</span><span style="color: #ff0000">&nbsp;indexed</span><span style="color: #0000ff">="true"</span><span style="color: #ff0000">&nbsp;stored</span><span style="color: #0000ff">="true"</span><span style="color: #ff0000">&nbsp;required</span><span style="color: #0000ff">="true"</span><span style="color: #ff0000">&nbsp;</span><span style="color: #0000ff">/&gt;</span></div>
<p style="padding-bottom: 0px; line-height: 1.5em; margin: 0px 0px 0.5em; padding-left: 0px; padding-right: 0px; color: black; font-size: 16px; padding-top: 10px">&nbsp;</p>
<p style="padding-bottom: 15px; margin: 0in; padding-left: 0px; padding-right: 0px; padding-top: 0px"><span style="font-size: x-small"><span style="font-size: x-small"><span style="font-size: small">Then find &lt;defaultSearchField&gt;text&lt;/defaultSearchField&gt; and change it to &lt;defaultSearchField&gt;game_name&lt;/defaultSearchField&gt;.</span></span></span></p>
<p style="padding-bottom: 15px; margin: 0in; padding-left: 0px; padding-right: 0px; padding-top: 0px"><span style="font-size: x-small"><span style="font-size: x-small"><span style="font-size: small">Open <a href="http://localhost:8080/solr/admin/analysis.jsp">http://localhost:8080/solr/admin/analysis.jsp</a> in the browser to try out tokenization.</span></span></span></p>
<p style="padding-bottom: 15px; margin: 0in; padding-left: 0px; padding-right: 0px; padding-top: 0px"><span style="font-size: x-small"><span style="font-size: x-small"><span style="font-size: small">Adding a custom dictionary to IKAnalyzer: <span style="font-size: 13px">dictionary files must be UTF-8 text without a BOM, with any file extension, and several dictionaries can be added at once, separated by ";".</span> Copy IKAnalyzer.cfg.xml and stopword.dic from the IKAnalyzer directory to $TOMCAT_HOME/webapps/solr/WEB-INF/classes, then create your own dictionary file (for example mydic.dic) and register it in IKAnalyzer.cfg.xml.<br /></p>
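<p>A BOM at the start of a dictionary file is easy to miss in an editor, so a small check before deploying can help. This is a sketch with a hypothetical class name; the only assumption is the standard UTF-8 BOM byte sequence EF BB BF.</p>

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

public class DictionaryBomCheck {
    // The UTF-8 byte order mark: EF BB BF.
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    static boolean hasBom(byte[] content) {
        return content.length >= 3
                && content[0] == UTF8_BOM[0]
                && content[1] == UTF8_BOM[1]
                && content[2] == UTF8_BOM[2];
    }

    static byte[] stripBom(byte[] content) {
        return hasBom(content) ? Arrays.copyOfRange(content, 3, content.length) : content;
    }

    // Rewrites the file without its BOM; returns true if a BOM was removed.
    static boolean stripBomInPlace(Path dictFile) throws IOException {
        byte[] content = Files.readAllBytes(dictFile);
        if (!hasBom(content)) {
            return false;
        }
        Files.write(dictFile, stripBom(content));
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java DictionaryBomCheck path/to/mydic.dic
        Path file = Paths.get(args[0]);
        System.out.println(stripBomInPlace(file) ? "BOM removed" : "no BOM found");
    }
}
```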
<p style="padding-bottom: 15px; margin: 0in; padding-left: 0px; padding-right: 0px; padding-top: 0px"><span style="font-size: x-small"><span style="font-size: x-small"><span style="font-size: small">(2) <span style="color: #ff0000">Install mmseg4j</span></span></span></span></p>
<p style="padding-bottom: 15px; margin: 0in; padding-left: 0px; padding-right: 0px; padding-top: 0px"><span style="font-size: 13px"><a style="border-bottom: #bbbbbb 1px dotted; color: #bb0000; text-decoration: none" class="ext-link" href="http://code.google.com/p/mmseg4j/downloads/list"><span style="font-size: x-small"><span style="font-size: x-small"><span style="font-size: small"><span style="padding-left: 12px; background-position: 50% 50%" class="icon"><span style="color: #bb0000">&nbsp;</span><span style="color: #000000">Download: </span></span><span style="color: #000000">http://code.google.com/p/mmseg4j/downloads/list</span></span></span></span></a></span></p>
<p style="padding-bottom: 15px; margin: 0in; padding-left: 0px; padding-right: 0px; padding-top: 0px"><span style="font-size: x-small"><span style="font-size: small">Create an mmseg4j directory under the current directory and extract the archive into it: unzip mmseg4j-1.8.5.zip -d ./mmseg4j</span></span></p>
<p style="margin: 0px"><span style="font-size: x-small"><span style="font-size: small">Copy mmseg4j-all-1.8.5.jar from the mmseg4j directory to $TOMCAT_HOME/webapps/solr/WEB-INF/lib/.</span></span></p>
<p style="margin: 0px"><span style="font-size: x-small"><span style="font-size: x-small"><span style="font-size: small"><br /></span></span></span></p>
<p style="margin: 0px"><span style="font-size: x-small"><span style="font-size: small">Configure schema.xml: edit $TOMCAT_HOME/solr/conf/schema.xml and add the following fieldTypes:</span></span></p>
<p style="padding-bottom: 15px; margin: 0in; padding-left: 0px; padding-right: 0px; padding-top: 0px"><br /></span></span></span></p>
<p style="padding-bottom: 0px; line-height: 1.5em; margin: 0px 0px 0.5em; padding-left: 0px; padding-right: 0px; color: black; font-size: 16px; padding-top: 10px"></p>
<div style="border-bottom: #cccccc 1px solid; border-left: #cccccc 1px solid; padding-bottom: 4px; background-color: #eeeeee; padding-left: 4px; width: 98%; padding-right: 5px; font-size: 13px; word-break: break-all; border-top: #cccccc 1px solid; border-right: #cccccc 1px solid; padding-top: 4px"><!--<br /><br />Code highlighting produced by Actipro CodeHighlighter (freeware)<br />http://www.CodeHighlighter.com/<br /><br />--><img align="top" src="http://www.blogjava.net/images/OutliningIndicators/None.gif"  alt="" /><span style="color: #0000ff">&lt;</span><span style="color: #800000">fieldtype&nbsp;</span><span style="color: #ff0000">name</span><span style="color: #0000ff">="textComplex"</span><span style="color: #ff0000">&nbsp;class</span><span style="color: #0000ff">="solr.TextField"</span><span style="color: #ff0000">&nbsp;positionIncrementGap</span><span style="color: #0000ff">="100"</span><span style="color: #0000ff">&gt;</span><span style="color: #000000"><br /><img align="top" src="http://www.blogjava.net/images/OutliningIndicators/None.gif"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">&lt;</span><span style="color: #800000">analyzer</span><span style="color: #0000ff">&gt;</span><span style="color: #000000"><br /><img align="top" src="http://www.blogjava.net/images/OutliningIndicators/None.gif"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">&lt;</span><span style="color: #800000">tokenizer&nbsp;</span><span style="color: #ff0000">class</span><span style="color: #0000ff">="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory"</span><span style="color: #ff0000">&nbsp;mode</span><span style="color: #0000ff">="complex"</span><span style="color: #ff0000">&nbsp;dicPath</span><span style="color: #0000ff">="/home/zhoujh/java/apache-tomcat7/solr/dict"</span><span style="color: #0000ff">&gt;</span><span style="color: #000000"><br /><img align="top" 
src="http://www.blogjava.net/images/OutliningIndicators/None.gif"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">&lt;/</span><span style="color: #800000">tokenizer</span><span style="color: #0000ff">&gt;</span><span style="color: #000000"><br /><img align="top" src="http://www.blogjava.net/images/OutliningIndicators/None.gif"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">&lt;/</span><span style="color: #800000">analyzer</span><span style="color: #0000ff">&gt;</span><span style="color: #000000"><br /><img align="top" src="http://www.blogjava.net/images/OutliningIndicators/None.gif"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">&lt;/</span><span style="color: #800000">fieldtype</span><span style="color: #0000ff">&gt;</span><span style="color: #000000"><br /><img align="top" src="http://www.blogjava.net/images/OutliningIndicators/None.gif"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">&lt;</span><span style="color: #800000">fieldtype&nbsp;</span><span style="color: #ff0000">name</span><span style="color: #0000ff">="textMaxWord"</span><span style="color: #ff0000">&nbsp;class</span><span style="color: #0000ff">="solr.TextField"</span><span style="color: #ff0000">&nbsp;positionIncrementGap</span><span style="color: #0000ff">="100"</span><span style="color: #0000ff">&gt;</span><span style="color: #000000"><br /><img align="top" src="http://www.blogjava.net/images/OutliningIndicators/None.gif"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">&lt;</span><span style="color: #800000">analyzer</span><span style="color: #0000ff">&gt;</span><span style="color: #000000"><br /><img align="top" src="http://www.blogjava.net/images/OutliningIndicators/None.gif"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: 
#0000ff">&lt;</span><span style="color: #800000">tokenizer&nbsp;</span><span style="color: #ff0000">class</span><span style="color: #0000ff">="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory"</span><span style="color: #ff0000">&nbsp;mode</span><span style="color: #0000ff">="max-word"</span><span style="color: #ff0000">&nbsp;dicPath</span><span style="color: #0000ff">="/home/zhoujh/java/apache-tomcat7/solr/dict"</span><span style="color: #0000ff">&gt;</span><span style="color: #000000"><br /><img align="top" src="http://www.blogjava.net/images/OutliningIndicators/None.gif"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">&lt;/</span><span style="color: #800000">tokenizer</span><span style="color: #0000ff">&gt;</span><span style="color: #000000"><br /><img align="top" src="http://www.blogjava.net/images/OutliningIndicators/None.gif"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">&lt;/</span><span style="color: #800000">analyzer</span><span style="color: #0000ff">&gt;</span><span style="color: #000000"><br /><img align="top" src="http://www.blogjava.net/images/OutliningIndicators/None.gif"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">&lt;/</span><span style="color: #800000">fieldtype</span><span style="color: #0000ff">&gt;</span><span style="color: #000000"><br /><img align="top" src="http://www.blogjava.net/images/OutliningIndicators/None.gif"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">&lt;</span><span style="color: #800000">fieldtype&nbsp;</span><span style="color: #ff0000">name</span><span style="color: #0000ff">="textSimple"</span><span style="color: #ff0000">&nbsp;class</span><span style="color: #0000ff">="solr.TextField"</span><span style="color: #ff0000">&nbsp;positionIncrementGap</span><span style="color: #0000ff">="100"</span><span style="color: #0000ff">&gt;</span><span style="color: 
#000000"><br /><img align="top" src="http://www.blogjava.net/images/OutliningIndicators/None.gif"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">&lt;</span><span style="color: #800000">analyzer</span><span style="color: #0000ff">&gt;</span><span style="color: #000000"><br /><img align="top" src="http://www.blogjava.net/images/OutliningIndicators/None.gif"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">&lt;</span><span style="color: #800000">tokenizer&nbsp;</span><span style="color: #ff0000">class</span><span style="color: #0000ff">="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory"</span><span style="color: #ff0000">&nbsp;mode</span><span style="color: #0000ff">="simple"</span><span style="color: #ff0000">&nbsp;dicPath</span><span style="color: #0000ff">="/home/zhoujh/java/apache-tomcat7/solr/dict"</span><span style="color: #0000ff">&gt;</span><span style="color: #000000"><br /><img align="top" src="http://www.blogjava.net/images/OutliningIndicators/None.gif"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">&lt;/</span><span style="color: #800000">tokenizer</span><span style="color: #0000ff">&gt;</span><span style="color: #000000"><br /><img align="top" src="http://www.blogjava.net/images/OutliningIndicators/None.gif"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">&lt;/</span><span style="color: #800000">analyzer</span><span style="color: #0000ff">&gt;</span><span style="color: #000000"><br /><img align="top" src="http://www.blogjava.net/images/OutliningIndicators/None.gif"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">&lt;/</span><span style="color: #800000">fieldtype</span><span style="color: #0000ff">&gt;</span></div>
<p style="padding-bottom: 15px; margin: 0in; padding-left: 0px; padding-right: 0px; padding-top: 0px"><span style="font-size: x-small"><span style="font-size: x-small"><span style="font-size: small">Note: <span style="white-space: pre">change the value of dicPath to the corresponding directory on your own machine.</span></span></span></span></p>
<p style="padding-bottom: 15px; margin: 0in; padding-left: 0px; padding-right: 0px; padding-top: 0px"><span style="font-size: x-small"><span style="font-size: x-small"><span style="font-size: small">Then modify the field added earlier so that it uses the mmseg4j tokenizer:</span></span></span></p>
<p style="padding-bottom: 0px; line-height: 1.5em; margin: 0px 0px 0.5em; padding-left: 0px; padding-right: 0px; color: black; font-size: 16px; padding-top: 10px"></p>
<div style="border-bottom: #cccccc 1px solid; border-left: #cccccc 1px solid; padding-bottom: 4px; background-color: #eeeeee; padding-left: 4px; width: 98%; padding-right: 5px; font-size: 13px; word-break: break-all; border-top: #cccccc 1px solid; border-right: #cccccc 1px solid; padding-top: 4px"><!--<br /><br />Code highlighting produced by Actipro CodeHighlighter (freeware)<br />http://www.CodeHighlighter.com/<br /><br />--><img align="top" src="http://www.blogjava.net/images/OutliningIndicators/None.gif"  alt="" /><span style="color: #0000ff">&lt;</span><span style="color: #800000">field&nbsp;</span><span style="color: #ff0000">name</span><span style="color: #0000ff">="game_name"</span><span style="color: #ff0000">&nbsp;type</span><span style="color: #0000ff">="textComplex"</span><span style="color: #ff0000">&nbsp;indexed</span><span style="color: #0000ff">="true"</span><span style="color: #ff0000">&nbsp;stored</span><span style="color: #0000ff">="true"</span><span style="color: #ff0000">&nbsp;required</span><span style="color: #0000ff">="true"</span><span style="color: #ff0000">&nbsp;</span><span style="color: #0000ff">/&gt;</span></div>
<p style="padding-bottom: 0px; line-height: 1.5em; margin: 0px 0px 0.5em; padding-left: 0px; padding-right: 0px; color: black; font-size: 16px; padding-top: 10px"><br /></p>
<p style="padding-bottom: 15px; margin: 0in; padding-left: 0px; padding-right: 0px; padding-top: 0px"><span style="font-size: x-small"><span style="font-size: x-small"><span style="font-size: small">Configure the mmseg4j dictionaries: <span style="font-size: 13px">mmseg4j dictionaries can be loaded dynamically, </span><span style="font-size: 13px">and they must be encoded in UTF-8. </span><span>By default mmseg4j reads its dictionary files from the data directory under the current directory, but another directory can be specified; I put them in a custom dict directory</span><span>. </span><span>Custom dictionary file names must begin with "words" and end with ".dic", </span><span>for example /data/words-my.dic.</span></span></span></span></p>
<p style="padding-bottom: 15px; margin: 0in; padding-left: 0px; padding-right: 0px; padding-top: 0px"><span style="font-size: x-small"><span style="font-size: x-small"><span style="font-size: small">Here I simply copy all the .dic files from the mmseg4j/data directory into $TOMCAT_HOME/solr/dict. There are four of them: chars.dic, units.dic, words.dic and words-my.dic. Their roles are briefly explained below.</span></span></span></p>
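<p style="padding-bottom: 15px; margin: 0in; padding-left: 0px; padding-right: 0px; padding-top: 0px">The two rules stated above for custom dictionaries (a "words*.dic" file name and UTF-8 encoding) can be illustrated with a small standalone helper. This is a minimal sketch, not part of mmseg4j: the class DictFiles and its methods are hypothetical, and the sample words are arbitrary.</p>
<pre>
```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper illustrating the custom-dictionary rules described above.
// It is not part of mmseg4j's API.
class DictFiles {

    // Custom dictionaries must be named "words*.dic", e.g. words-my.dic.
    static boolean isCustomDictName(String fileName) {
        return fileName.startsWith("words") && fileName.endsWith(".dic");
    }

    // Writes one word per line, explicitly encoded as UTF-8.
    static Path writeDict(Path dir, String fileName, String... words) throws IOException {
        if (!isCustomDictName(fileName)) {
            throw new IllegalArgumentException("expected a words*.dic file name: " + fileName);
        }
        Path file = dir.resolve(fileName);
        Files.write(file, Arrays.asList(words), StandardCharsets.UTF_8);
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("dict");
        // Two sample Chinese words, written as Unicode escapes so the source stays ASCII.
        Path file = writeDict(dir, "words-my.dic", "\u5206\u8bcd", "\u68c0\u7d22");
        // Reading back with the same charset shows the words survive the round trip.
        System.out.println(Files.readAllLines(file, StandardCharsets.UTF_8));
    }
}
```
</pre>
<p style="padding-bottom: 15px; margin: 0in; padding-left: 0px; padding-right: 0px; padding-top: 0px">Passing the charset explicitly avoids depending on the JVM's platform default encoding, which is a common reason a dictionary written on one machine fails to load correctly on another.</p>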
<p style="padding-bottom: 15px; margin: 0in; padding-left: 0px; padding-right: 0px; padding-top: 0px"><span style="font-size: x-small"><span style="font-size: x-small"><span style="font-size: small">1. chars.dic lists single characters with their frequencies, one pair per line: the character first, then the frequency, separated by a space. This file is used by the complex mode; the frequency information is used in the last filtering rule.</span></span></span></p>
<p style="padding-bottom: 15px; margin: 0in; padding-left: 0px; padding-right: 0px; padding-top: 0px"><span style="font-size: x-small"><span style="font-size: x-small"><span style="font-size: small">2. units.dic lists unit characters, such as 分, 秒 and 年 (minute, second, year).</span></span></span></p>
<p style="padding-bottom: 15px; margin: 0in; padding-left: 0px; padding-right: 0px; padding-top: 0px"><span style="font-size: x-small"><span style="font-size: x-small"><span style="font-size: small">3. words.dic is the core dictionary file: one word per line, with no other data (such as word length) required.</span></span></span></p>
<p style="padding-bottom: 15px; margin: 0in; padding-left: 0px; padding-right: 0px; padding-top: 0px"><span style="font-size: x-small"><span style="font-size: x-small"><span style="font-size: small">4. words-my.dic is the custom dictionary file.</span></span></span></p>
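<p style="padding-bottom: 15px; margin: 0in; padding-left: 0px; padding-right: 0px; padding-top: 0px">To make the chars.dic line format from item 1 concrete, here is a hypothetical parser for a single line. It is only a sketch: mmseg4j loads chars.dic with its own code, the CharFreq class is invented for this example, and the sample frequencies are made up.</p>
<pre>
```java
// Hypothetical parser for one chars.dic-style line: a single character, a space,
// then its frequency. mmseg4j loads chars.dic with its own code; this class only
// illustrates the line format described above.
class CharFreq {
    final String ch;
    final long freq;

    CharFreq(String ch, long freq) {
        this.ch = ch;
        this.freq = freq;
    }

    static CharFreq parse(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected 'character frequency': " + line);
        }
        return new CharFreq(parts[0], Long.parseLong(parts[1]));
    }

    public static void main(String[] args) {
        // Made-up sample lines (Unicode escapes for two common characters);
        // real frequencies come from the chars.dic shipped with mmseg4j.
        String[] sample = {"\u7684 100", "\u662f 50"};
        for (String line : sample) {
            CharFreq entry = parse(line);
            System.out.println(entry.ch + " -> " + entry.freq);
        }
    }
}
```
</pre>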
<p style="padding-bottom: 15px; margin: 0in; padding-left: 0px; padding-right: 0px; padding-top: 0px"><span style="font-size: x-small"><span style="font-size: x-small"><span style="font-size: small">Open <a href="http://localhost:8080/solr/admin/analysis.jsp">http://localhost:8080/solr/admin/analysis.jsp</a> in a browser to see the segmentation results.</span></span></span></p>
<p style="padding-bottom: 15px; margin: 0in; padding-left: 0px; padding-right: 0px; padding-top: 0px"><span style="font-size: x-small"><span style="font-size: small">Both segmentation methods are now configured. To use one of them, set the type of the field you query to the corresponding field type.</span></span></p>
<p style="padding-bottom: 0px; line-height: 1.5em; margin: 0px 0px 0.5em; padding-left: 0px; padding-right: 0px; color: black; font-size: 16px; padding-top: 10px"><br /></p>
<p style="padding-bottom: 0px; line-height: 1.5em; margin: 0px 0px 0.5em; padding-left: 0px; padding-right: 0px; color: black; font-size: 16px; padding-top: 10px"></p><img src ="http://www.blogjava.net/conans/aggbug/379549.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/conans/" target="_blank">CONAN</a> 2012-05-30 14:38 <a href="http://www.blogjava.net/conans/articles/379549.html#Feedback" target="_blank" style="text-decoration:none;">Comment</a></div>]]></description></item><item><title>Solr: Creating an Index From a Database</title><link>http://www.blogjava.net/conans/articles/379547.html</link><dc:creator>CONAN</dc:creator><author>CONAN</author><pubDate>Wed, 30 May 2012 06:33:00 GMT</pubDate><guid>http://www.blogjava.net/conans/articles/379547.html</guid><description><![CDATA[&nbsp;&nbsp;&nbsp;&nbsp; Summary: The Data Import Handler Framework. Solr includes a very popular contrib module for importing data known as the DataImportHandler (DIH in short). 
It's a data processing pipeline built specifically for S...&nbsp;&nbsp;<a href='http://www.blogjava.net/conans/articles/379547.html'>Read more</a><img src ="http://www.blogjava.net/conans/aggbug/379547.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/conans/" target="_blank">CONAN</a> 2012-05-30 14:33 <a href="http://www.blogjava.net/conans/articles/379547.html#Feedback" target="_blank" style="text-decoration:none;">Comment</a></div>]]></description></item><item><title>Indexing a Database with Apache Solr (Including Handling CLOB and BLOB)</title><link>http://www.blogjava.net/conans/articles/379546.html</link><dc:creator>CONAN</dc:creator><author>CONAN</author><pubDate>Wed, 30 May 2012 06:23:00 GMT</pubDate><guid>http://www.blogjava.net/conans/articles/379546.html</guid><description><![CDATA[&nbsp;&nbsp;&nbsp;&nbsp; Summary: The following material was collected from the web; I felt it was worth merging it into one place for easier reference. It is divided into three parts: the first explains the configuration of db-data-config.xml (advanced material), the second covers DataImportHandler (the basics), and the third covers advanced db-data-config.xml usage (probably not yet written up in Chinese: I found nothing on Google or Baidu and ended up digging through the code and Solr's English documentation). Part one is...&nbsp;&nbsp;<a href='http://www.blogjava.net/conans/articles/379546.html'>Read more</a><img src ="http://www.blogjava.net/conans/aggbug/379546.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/conans/" target="_blank">CONAN</a> 2012-05-30 14:23 <a href="http://www.blogjava.net/conans/articles/379546.html#Feedback" target="_blank" style="text-decoration:none;">Comment</a></div>]]></description></item><item><title>An Explanation of Solr's schema.xml and solrconfig.xml</title><link>http://www.blogjava.net/conans/articles/379545.html</link><dc:creator>CONAN</dc:creator><author>CONAN</author><pubDate>Wed, 30 May 2012 06:18:00 GMT</pubDate><guid>http://www.blogjava.net/conans/articles/379545.html</guid><description><![CDATA[<div id="blog_content" class="blog_content">
<p><strong><span style="font-size: medium">I. Field Configuration (schema)</span> </strong></p>
<p>&nbsp;</p>
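<p>schema.xml is located in the solr/conf/ directory and plays a role similar to a database table definition:</p>
<p>it defines the data types of the data added to the index, mainly through types, fields and some other default settings.</p>
<p>&nbsp;</p>
<p>1. First, the types node. It contains fieldType child nodes, which take parameters such as name, class and positionIncrementGap.</p>
<ul><li>name: the name of this FieldType.</li><li>class: the Java class that defines this type's behavior. The "solr." prefix is shorthand for Solr's own packages; for example, solr.StrField refers to org.apache.solr.schema.StrField.</li></ul>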
<div>
<ol><li><span>&lt;</span> <span>schema</span> <span>&nbsp;</span> <span>name</span> <span>=</span> <span>"example"</span> <span>&nbsp;</span> <span>version</span> <span>=</span> <span>"1.2"</span> <span>&gt;</span> <span>&nbsp;&nbsp;</span></li><li><span>&nbsp;&nbsp;<span>&lt;</span> <span>types</span> <span>&gt;</span> <span>&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;<span>&lt;</span> <span>fieldType</span> <span>&nbsp;</span> <span>name</span> <span>=</span> <span>"string"</span> <span>&nbsp;</span> <span>class</span> <span>=</span> <span>"solr.StrField"</span> <span>&nbsp;</span> <span>sortMissingLast</span> <span>=</span> <span>"true"</span> <span>&nbsp;</span> <span>omitNorms</span> <span>=</span> <span>"true"</span> <span>/&gt;</span> <span>&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;<span>&lt;</span> <span>fieldType</span> <span>&nbsp;</span> <span>name</span> <span>=</span> <span>"boolean"</span> <span>&nbsp;</span> <span>class</span> <span>=</span> <span>"solr.BoolField"</span> <span>&nbsp;</span> <span>sortMissingLast</span> <span>=</span> <span>"true"</span> <span>&nbsp;</span> <span>omitNorms</span> <span>=</span> <span>"true"</span> <span>/&gt;</span> <span>&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;<span>&lt;</span> <span>fieldtype</span> <span>&nbsp;</span> <span>name</span> <span>=</span> <span>"binary"</span> <span>&nbsp;</span> <span>class</span> <span>=</span> <span>"solr.BinaryField"</span> <span>/&gt;</span> <span>&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;<span>&lt;</span> <span>fieldType</span> <span>&nbsp;</span> <span>name</span> <span>=</span> <span>"int"</span> <span>&nbsp;</span> <span>class</span> <span>=</span> <span>"solr.TrieIntField"</span> <span>&nbsp;</span> <span>precisionStep</span> <span>=</span> <span>"0"</span> <span>&nbsp;</span> <span>omitNorms</span> <span>=</span> <span>"true"</span> <span>&nbsp;&nbsp;&nbsp;</span> 
</span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span>positionIncrementGap</span> <span>=</span> <span>"0"</span> <span>/&gt;</span> <span>&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;<span>&lt;</span> <span>fieldType</span> <span>&nbsp;</span> <span>name</span> <span>=</span> <span>"float"</span> <span>&nbsp;</span> <span>class</span> <span>=</span> <span>"solr.TrieFloatField"</span> <span>&nbsp;</span> <span>precisionStep</span> <span>=</span> <span>"0"</span> <span>&nbsp;</span> <span>omitNorms</span> <span>=</span> <span>"true"</span> <span>&nbsp;&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span>positionIncrementGap</span> <span>=</span> <span>"0"</span> <span>/&gt;</span> <span>&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;<span>&lt;</span> <span>fieldType</span> <span>&nbsp;</span> <span>name</span> <span>=</span> <span>"long"</span> <span>&nbsp;</span> <span>class</span> <span>=</span> <span>"solr.TrieLongField"</span> <span>&nbsp;</span> <span>precisionStep</span> <span>=</span> <span>"0"</span> <span>&nbsp;</span> <span>omitNorms</span> <span>=</span> <span>"true"</span> <span>&nbsp;&nbsp;&nbsp;</span> 
</span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span>positionIncrementGap</span> <span>=</span> <span>"0"</span> <span>/&gt;</span> <span>&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;<span>&lt;</span> <span>fieldType</span> <span>&nbsp;</span> <span>name</span> <span>=</span> <span>"double"</span> <span>&nbsp;</span> <span>class</span> <span>=</span> <span>"solr.TrieDoubleField"</span> <span>&nbsp;</span> <span>precisionStep</span> <span>=</span> <span>"0"</span> <span>&nbsp;</span> <span>omitNorms</span> <span>=</span> <span>"true"</span> <span>&nbsp;&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span>positionIncrementGap</span> <span>=</span> <span>"0"</span> <span>/&gt;</span> <span>&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;...&nbsp;&nbsp;</span></li><li><span>&nbsp;&nbsp;<span>&lt;/</span> <span>types</span> <span>&gt;</span> <span>&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;...&nbsp;&nbsp;</span></li><li><span>&lt;/</span> <span>schema</span> <span>&gt;</span> <span>&nbsp;&nbsp;</span> </li></ol></div>
<p>&nbsp;</p>
<p>When necessary, a fieldType also defines the analyzer used for this type's data at indexing time and at query time, covering tokenization and filtering, for example:</p>
<div>
<ol><li><span>&lt;</span> <span>fieldType</span> <span>&nbsp;</span> <span>name</span> <span>=</span> <span>"text_ws"</span> <span>&nbsp;</span> <span>class</span> <span>=</span> <span>"solr.TextField"</span> <span>&nbsp;</span> <span>positionIncrementGap</span> <span>=</span> <span>"100"</span> <span>&gt;</span> <span>&nbsp;&nbsp;</span></li><li><span>&nbsp;&nbsp;<span>&lt;</span> <span>analyzer</span> <span>&gt;</span> <span>&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;<span>&lt;</span> <span>tokenizer</span> <span>&nbsp;</span> <span>class</span> <span>=</span> <span>"solr.WhitespaceTokenizerFactory"</span> <span>/&gt;</span> <span>&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;<span>&lt;/</span> <span>analyzer</span> <span>&gt;</span> <span>&nbsp;&nbsp;</span> </span></li><li><span>&lt;/</span> <span>fieldType</span> <span>&gt;</span> <span>&nbsp;&nbsp;</span></li><li><span>&lt;</span> <span>fieldType</span> <span>&nbsp;</span> <span>name</span> <span>=</span> <span>"text"</span> <span>&nbsp;</span> <span>class</span> <span>=</span> <span>"solr.TextField"</span> <span>&nbsp;</span> <span>positionIncrementGap</span> <span>=</span> <span>"100"</span> <span>&gt;</span> <span>&nbsp;&nbsp;</span></li><li><span>&nbsp;&nbsp;<span>&lt;</span> <span>analyzer</span> <span>&nbsp;</span> <span>type</span> <span>=</span> <span>"index"</span> <span>&gt;</span> <span>&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;&lt;!--This tokenizer splits on whitespace: when text-type content is added to the index, Solr first splits it on whitespace,&nbsp;&nbsp;</span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;then runs the tokens through the specified filters in order; only the tokens that remain are added to the index for searching.&nbsp;&nbsp;</span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Note: Solr's analysis package does not include a Chinese word segmenter; you need to add one yourself (search Google for options).&nbsp;&nbsp;&nbsp;&nbsp;</span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;--<span>&gt;</span> <span>&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;<span>&lt;</span> <span>tokenizer</span> 
<span>&nbsp;</span> <span>class</span> <span>=</span> <span>"solr.WhitespaceTokenizerFactory"</span> <span>/&gt;</span> <span>&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;!--&nbsp;in&nbsp;this&nbsp;example,&nbsp;we&nbsp;will&nbsp;only&nbsp;use&nbsp;synonyms&nbsp;at&nbsp;query&nbsp;time&nbsp;&nbsp;</span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span>&lt;</span> <span>filter</span> <span>&nbsp;</span> <span>class</span> <span>=</span> <span>"solr.SynonymFilterFactory"</span> <span>&nbsp;</span> <span>synonyms</span> <span>=</span> <span>"index_synonyms.txt"</span> <span>&nbsp;&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span>ignoreCase</span> <span>=</span> <span>"true"</span> <span>&nbsp;</span> <span>expand</span> <span>=</span> <span>"false"</span> <span>/&gt;</span> <span>&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;--<span>&gt;</span> <span>&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;!--&nbsp;Case&nbsp;insensitive&nbsp;stop&nbsp;word&nbsp;removal.&nbsp;&nbsp;</span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;add&nbsp;<span>enablePositionIncrements</span> <span>=</span> <span>true</span> <span>&nbsp;in&nbsp;both&nbsp;the&nbsp;index&nbsp;and&nbsp;query&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;analyzers&nbsp;to&nbsp;leave&nbsp;a&nbsp;'gap'&nbsp;for&nbsp;more&nbsp;accurate&nbsp;phrase&nbsp;queries.&nbsp;&nbsp;</span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;--<span>&gt;</span> 
<span>&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span>&lt;</span> <span>filter</span> <span>&nbsp;</span> <span>class</span> <span>=</span> <span>"solr.StopFilterFactory"</span> <span>&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span>ignoreCase</span> <span>=</span> <span>"true"</span> <span>&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span>words</span> <span>=</span> <span>"stopwords.txt"</span> <span>&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span>enablePositionIncrements</span> <span>=</span> <span>"true"</span> <span>&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span>/&gt;</span> <span>&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span>&lt;</span> <span>filter</span> <span>&nbsp;</span> <span>class</span> <span>=</span> <span>"solr.WordDelimiterFilterFactory"</span> <span>&nbsp;</span> <span>generateWordParts</span> <span>=</span> <span>"1"</span> <span>&nbsp;&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span>generateNumberParts</span> <span>=</span> <span>"1"</span> <span>&nbsp;</span> <span>catenateWords</span> <span>=</span> <span>"1"</span> <span>&nbsp;</span> <span>catenateNumbers</span> <span>=</span> <span>"1"</span> <span>&nbsp;&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span>catenateAll</span> <span>=</span> <span>"0"</span> <span>&nbsp;</span> <span>splitOnCaseChange</span> <span>=</span> <span>"1"</span> <span>/&gt;</span> <span>&nbsp;&nbsp;</span> 
</span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span>&lt;</span> <span>filter</span> <span>&nbsp;</span> <span>class</span> <span>=</span> <span>"solr.LowerCaseFilterFactory"</span> <span>/&gt;</span> <span>&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span>&lt;</span> <span>filter</span> <span>&nbsp;</span> <span>class</span> <span>=</span> <span>"solr.SnowballPorterFilterFactory"</span> <span>&nbsp;</span> <span>language</span> <span>=</span> <span>"English"</span> <span>&nbsp;&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span>protected</span> <span>=</span> <span>"protwords.txt"</span> <span>/&gt;</span> <span>&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;<span>&lt;/</span> <span>analyzer</span> <span>&gt;</span> <span>&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;<span>&lt;</span> <span>analyzer</span> <span>&nbsp;</span> <span>type</span> <span>=</span> <span>"query"</span> <span>&gt;</span> <span>&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span>&lt;</span> <span>tokenizer</span> <span>&nbsp;</span> <span>class</span> <span>=</span> <span>"solr.WhitespaceTokenizerFactory"</span> <span>/&gt;</span> <span>&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span>&lt;</span> <span>filter</span> <span>&nbsp;</span> <span>class</span> <span>=</span> <span>"solr.SynonymFilterFactory"</span> <span>&nbsp;</span> <span>synonyms</span> <span>=</span> <span>"synonyms.txt"</span> <span>&nbsp;</span> <span>ignoreCase</span> <span>=</span> <span>"true"</span> <span>&nbsp;&nbsp;&nbsp;</span> 
</span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span>expand</span> <span>=</span> <span>"true"</span> <span>/&gt;</span> <span>&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span>&lt;</span> <span>filter</span> <span>&nbsp;</span> <span>class</span> <span>=</span> <span>"solr.StopFilterFactory"</span> <span>&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span>ignoreCase</span> <span>=</span> <span>"true"</span> <span>&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span>words</span> <span>=</span> <span>"stopwords.txt"</span> <span>&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span>enablePositionIncrements</span> <span>=</span> <span>"true"</span> <span>&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span>/&gt;</span> <span>&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span>&lt;</span> <span>filter</span> <span>&nbsp;</span> <span>class</span> <span>=</span> <span>"solr.WordDelimiterFilterFactory"</span> <span>&nbsp;</span> <span>generateWordParts</span> <span>=</span> <span>"1"</span> <span>&nbsp;&nbsp;&nbsp;</span> 
</span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span>generateNumberParts</span> <span>=</span> <span>"1"</span> <span>&nbsp;</span> <span>catenateWords</span> <span>=</span> <span>"0"</span> <span>&nbsp;</span> <span>catenateNumbers</span> <span>=</span> <span>"0"</span> <span>&nbsp;&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span>catenateAll</span> <span>=</span> <span>"0"</span> <span>&nbsp;</span> <span>splitOnCaseChange</span> <span>=</span> <span>"1"</span> <span>/&gt;</span> <span>&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span>&lt;</span> <span>filter</span> <span>&nbsp;</span> <span>class</span> <span>=</span> <span>"solr.LowerCaseFilterFactory"</span> <span>/&gt;</span> <span>&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span>&lt;</span> <span>filter</span> <span>&nbsp;</span> <span>class</span> <span>=</span> <span>"solr.SnowballPorterFilterFactory"</span> <span>&nbsp;</span> <span>language</span> <span>=</span> <span>"English"</span> <span>&nbsp;&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span>protected</span> <span>=</span> <span>"protwords.txt"</span> <span>/&gt;</span> <span>&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span>&lt;/</span> <span>analyzer</span> <span>&gt;</span> 
<span>&nbsp;&nbsp;</span> </span></li><li><span>&lt;/</span> <span>fieldType</span> <span>&gt;</span> <span>&nbsp;&nbsp;</span> </li></ol></div>
<p>&nbsp;</p>
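<p>2. Next, the fields element defines the concrete fields (similar to the columns of a database table). Each field has the following attributes:</p>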
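<ul><li>name: the field name</li><li>type: one of the fieldTypes defined earlier</li><li>indexed: whether the field is indexed</li><li>stored: whether the field value is stored (if you do not need the stored value, set this to false wherever possible)</li><li>multiValued: whether the field can hold several values (set this to true for any field that may have multiple values, otherwise indexing such a document throws an error)</li></ul>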
<div>
<ol><li><span>&lt;</span> <span>fields</span> <span>&gt;</span> <span>&nbsp;&nbsp;</span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;<span>&lt;</span> <span>field</span> <span>&nbsp;</span> <span>name</span> <span>=</span> <span>"id"</span> <span>&nbsp;</span> <span>type</span> <span>=</span> <span>"integer"</span> <span>&nbsp;</span> <span>indexed</span> <span>=</span> <span>"true"</span> <span>&nbsp;</span> <span>stored</span> <span>=</span> <span>"true"</span> <span>&nbsp;</span> <span>required</span> <span>=</span> <span>"true"</span> <span>&nbsp;</span> <span>/&gt;</span> <span>&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;<span>&lt;</span> <span>field</span> <span>&nbsp;</span> <span>name</span> <span>=</span> <span>"name"</span> <span>&nbsp;</span> <span>type</span> <span>=</span> <span>"text"</span> <span>&nbsp;</span> <span>indexed</span> <span>=</span> <span>"true"</span> <span>&nbsp;</span> <span>stored</span> <span>=</span> <span>"true"</span> <span>&nbsp;</span> <span>/&gt;</span> <span>&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;<span>&lt;</span> <span>field</span> <span>&nbsp;</span> <span>name</span> <span>=</span> <span>"summary"</span> <span>&nbsp;</span> <span>type</span> <span>=</span> <span>"text"</span> <span>&nbsp;</span> <span>indexed</span> <span>=</span> <span>"true"</span> <span>&nbsp;</span> <span>stored</span> <span>=</span> <span>"true"</span> <span>&nbsp;</span> <span>/&gt;</span> <span>&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;<span>&lt;</span> <span>field</span> <span>&nbsp;</span> <span>name</span> <span>=</span> <span>"author"</span> <span>&nbsp;</span> <span>type</span> <span>=</span> <span>"string"</span> <span>&nbsp;</span> <span>indexed</span> <span>=</span> <span>"true"</span> <span>&nbsp;</span> <span>stored</span> <span>=</span> <span>"true"</span> <span>&nbsp;</span> <span>/&gt;</span> <span>&nbsp;&nbsp;</span> 
</span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;<span>&lt;</span> <span>field</span> <span>&nbsp;</span> <span>name</span> <span>=</span> <span>"date"</span> <span>&nbsp;</span> <span>type</span> <span>=</span> <span>"date"</span> <span>&nbsp;</span> <span>indexed</span> <span>=</span> <span>"false"</span> <span>&nbsp;</span> <span>stored</span> <span>=</span> <span>"true"</span> <span>&nbsp;</span> <span>/&gt;</span> <span>&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;<span>&lt;</span> <span>field</span> <span>&nbsp;</span> <span>name</span> <span>=</span> <span>"content"</span> <span>&nbsp;</span> <span>type</span> <span>=</span> <span>"text"</span> <span>&nbsp;</span> <span>indexed</span> <span>=</span> <span>"true"</span> <span>&nbsp;</span> <span>stored</span> <span>=</span> <span>"false"</span> <span>&nbsp;</span> <span>/&gt;</span> <span>&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;<span>&lt;</span> <span>field</span> <span>&nbsp;</span> <span>name</span> <span>=</span> <span>"keywords"</span> <span>&nbsp;</span> <span>type</span> <span>=</span> <span>"keyword_text"</span> <span>&nbsp;</span> <span>indexed</span> <span>=</span> <span>"true"</span> <span>&nbsp;</span> <span>stored</span> <span>=</span> <span>"false"</span> <span>&nbsp;</span> <span>multiValued</span> <span>=</span> <span>"true"</span> <span>&nbsp;</span> <span>/&gt;</span> <span>&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;<span>&lt;!--copy field--&gt;</span> <span>&nbsp;&nbsp;</span> </span></li><li><span>&nbsp;&nbsp;&nbsp;&nbsp;<span>&lt;</span> <span>field</span> <span>&nbsp;</span> <span>name</span> <span>=</span> <span>"all"</span> <span>&nbsp;</span> <span>type</span> <span>=</span> <span>"text"</span> <span>&nbsp;</span> <span>indexed</span> <span>=</span> <span>"true"</span> <span>&nbsp;</span> <span>stored</span> <span>=</span> <span>"false"</span> <span>&nbsp;</span> <span>multiValued</span> <span>=</span> <span>"true"</span> 
<span>/&gt;</span> <span>&nbsp;&nbsp;</span> </span></li><li><span>&lt;/</span> <span>fields</span> <span>&gt;</span> <span>&nbsp;&nbsp;</span> </li></ol></div>
<p>&nbsp;</p>
<p>3. It is advisable to create a copy field that collects all full-text fields into a single field, so that they can be searched together:</p>
<p>The copy settings are as follows:</p>
<div>
<ol><li><span>&lt;</span> <span>copyField</span> <span>&nbsp;</span> <span>source</span> <span>=</span> <span>"name"</span> <span>&nbsp;</span> <span>dest</span> <span>=</span> <span>"all"</span> <span>/&gt;</span> <span>&nbsp;&nbsp;</span></li><li><span>&lt;</span> <span>copyField</span> <span>&nbsp;</span> <span>source</span> <span>=</span> <span>"summary"</span> <span>&nbsp;</span> <span>dest</span> <span>=</span> <span>"all"</span> <span>/&gt;</span> <span>&nbsp;&nbsp;</span> </li></ol></div>
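<p>The copyField rules above act at index time: the values of each source field are appended to the destination field, which is why the destination (all) is declared multiValued. Below is a minimal Java sketch of that behaviour. It assumes a document is modelled as a map from field name to a list of values. The class and method names are hypothetical and are not Solr's API.</p>

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: copyField modelled as "append source values to dest" at index time.
class CopyFieldSketch {
    // Each rule is {source, dest}, mirroring <copyField source="..." dest="..."/>.
    static Map<String, List<String>> applyCopyFields(Map<String, List<String>> doc,
                                                     List<String[]> rules) {
        // Work on a copy so the caller's document is left untouched.
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : doc.entrySet()) {
            out.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        for (String[] rule : rules) {
            // Values are read from the original document, so copies do not chain.
            List<String> values = doc.get(rule[0]);
            if (values == null) {
                continue; // source field absent in this document: nothing to copy
            }
            // The destination accumulates values from every source, hence multiValued="true".
            out.computeIfAbsent(rule[1], k -> new ArrayList<>()).addAll(values);
        }
        return out;
    }
}
```

<p>With the two rules from the example (name and summary copied into all), searching the all field finds a document by words from either source field.</p>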
<p>&nbsp;</p>
<p>4. Dynamic fields: fields without a fixed name are declared with dynamicField.</p>
<p>For example, if name is *_i and its type is int, then any field whose name ends in _i, such as name_i or school_i, is treated as matching this definition.</p>
<div>
<ol><li><span>&lt;</span> <span>dynamicField</span> <span>&nbsp;</span> <span>name</span> <span>=</span> <span>"*_i"</span> <span>&nbsp;&nbsp;</span> <span>type</span> <span>=</span> <span>"int"</span> <span>&nbsp;&nbsp;&nbsp;&nbsp;</span> <span>indexed</span> <span>=</span> <span>"true"</span> <span>&nbsp;&nbsp;</span> <span>stored</span> <span>=</span> <span>"true"</span> <span>/&gt;</span> <span>&nbsp;&nbsp;</span></li><li><span>&lt;</span> <span>dynamicField</span> <span>&nbsp;</span> <span>name</span> <span>=</span> <span>"*_s"</span> <span>&nbsp;&nbsp;</span> <span>type</span> <span>=</span> <span>"string"</span> <span>&nbsp;&nbsp;</span> <span>indexed</span> <span>=</span> <span>"true"</span> <span>&nbsp;&nbsp;</span> <span>stored</span> <span>=</span> <span>"true"</span> <span>/&gt;</span> <span>&nbsp;&nbsp;</span></li><li><span>&lt;</span> <span>dynamicField</span> <span>&nbsp;</span> <span>name</span> <span>=</span> <span>"*_l"</span> <span>&nbsp;&nbsp;</span> <span>type</span> <span>=</span> <span>"long"</span> <span>&nbsp;&nbsp;&nbsp;</span> <span>indexed</span> <span>=</span> <span>"true"</span> <span>&nbsp;&nbsp;</span> <span>stored</span> <span>=</span> <span>"true"</span> <span>/&gt;</span> <span>&nbsp;&nbsp;</span></li><li><span>&lt;</span> <span>dynamicField</span> <span>&nbsp;</span> <span>name</span> <span>=</span> <span>"*_t"</span> <span>&nbsp;&nbsp;</span> <span>type</span> <span>=</span> <span>"text"</span> <span>&nbsp;&nbsp;&nbsp;&nbsp;</span> <span>indexed</span> <span>=</span> <span>"true"</span> <span>&nbsp;&nbsp;</span> <span>stored</span> <span>=</span> <span>"true"</span> <span>/&gt;</span> <span>&nbsp;&nbsp;</span></li><li><span>&lt;</span> <span>dynamicField</span> <span>&nbsp;</span> <span>name</span> <span>=</span> <span>"*_b"</span> <span>&nbsp;&nbsp;</span> <span>type</span> <span>=</span> <span>"boolean"</span> <span>&nbsp;</span> <span>indexed</span> <span>=</span> <span>"true"</span> <span>&nbsp;&nbsp;</span> 
<span>stored</span> <span>=</span> <span>"true"</span> <span>/&gt;</span> <span>&nbsp;&nbsp;</span></li><li><span>&lt;</span> <span>dynamicField</span> <span>&nbsp;</span> <span>name</span> <span>=</span> <span>"*_f"</span> <span>&nbsp;&nbsp;</span> <span>type</span> <span>=</span> <span>"float"</span> <span>&nbsp;&nbsp;</span> <span>indexed</span> <span>=</span> <span>"true"</span> <span>&nbsp;&nbsp;</span> <span>stored</span> <span>=</span> <span>"true"</span> <span>/&gt;</span> <span>&nbsp;&nbsp;</span></li><li><span>&lt;</span> <span>dynamicField</span> <span>&nbsp;</span> <span>name</span> <span>=</span> <span>"*_d"</span> <span>&nbsp;&nbsp;</span> <span>type</span> <span>=</span> <span>"double"</span> <span>&nbsp;</span> <span>indexed</span> <span>=</span> <span>"true"</span> <span>&nbsp;&nbsp;</span> <span>stored</span> <span>=</span> <span>"true"</span> <span>/&gt;</span> <span>&nbsp;&nbsp;</span></li><li><span>&lt;</span> <span>dynamicField</span> <span>&nbsp;</span> <span>name</span> <span>=</span> <span>"*_dt"</span> <span>&nbsp;</span> <span>type</span> <span>=</span> <span>"date"</span> <span>&nbsp;&nbsp;&nbsp;&nbsp;</span> <span>indexed</span> <span>=</span> <span>"true"</span> <span>&nbsp;&nbsp;</span> <span>stored</span> <span>=</span> <span>"true"</span> <span>/&gt;</span> <span>&nbsp;&nbsp;</span> </li></ol></div>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p><strong><span style="font-size: medium">Notes from the comments in schema.xml:</span> </strong></p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>1. To improve performance, you can take the following measures:</p>
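<ul><li>Set stored to false for every field (especially large fields) that is only searched on and never returned as a result.</li><li>Set indexed to false for every field that is never searched on and is only returned as part of the results.</li><li>Remove all unnecessary copyField declarations.</li><li>For the smallest index and the best search performance, set indexed to false for all general text fields, use copyField to copy them into one catch-all text field, and search on that field.</li><li>For maximum indexing performance, talk to Solr through the Java client that streams updates (StreamingUpdateSolrServer).</li><li>Run the JVM in server mode, and use a higher logging level so that not every request is logged.</li></ul>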
<p>2、<span style="color: #0000ff">&lt;</span> <span style="color: #990000"><span>schema</span> <span>name</span> </span><span style="color: #0000ff">="</span> <strong>example</strong> <span style="color: #0000ff">"</span> <span style="color: #990000">version</span> <span style="color: #0000ff">="</span> <strong>1.2</strong> <span style="color: #0000ff"><span>"</span> <span>&gt;</span> </span></p>
<ul><li>name: identifies this schema</li><li>version: the schema version, 1.2 at the time of writing</li></ul>
<p>3. fieldType</p>
<p>&nbsp;</p>
<p><span style="color: #0000ff">&lt;</span> <span style="color: #990000">fieldType</span> <span style="color: #990000">name</span> <span style="color: #0000ff">="</span> <strong>string</strong> <span style="color: #0000ff">"</span> <span style="color: #990000">class</span> <span style="color: #0000ff">="</span> <strong>solr.StrField</strong> <span style="color: #0000ff">"</span> <span style="color: #990000">sortMissingLast</span> <span style="color: #0000ff">="</span> <strong>true</strong> <span style="color: #0000ff">"</span> <span style="color: #990000">omitNorms</span> <span style="color: #0000ff">="</span> <strong>true</strong> <span style="color: #0000ff"><span>"</span> <span>/&gt;</span> </span></p>
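<ul><li>name: just a label.</li><li>class and the other attributes determine the actual behaviour of the fieldType. (Class names starting with solr. are shorthand for Solr's own Java packages, such as org.apache.solr.schema and org.apache.solr.analysis.)</li></ul>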
<p>Optional attributes:</p>
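<ul><li>sortMissingLast and sortMissingFirst apply to types that sort internally as strings (string, boolean, sint, slong, sfloat, sdouble, pdate).</li><li>sortMissingLast="true": documents without a value in the field sort after documents that have one, regardless of the sort order requested.</li><li>sortMissingFirst="true": the reverse, so documents without a value sort first.</li><li>Both attributes default to false.</li></ul>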
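<p>The effect of sortMissingLast="true" can be pictured as a comparator in which documents lacking the field always sort after those that have it, whichever direction is requested. Below is a hedged Java sketch. It assumes a missing value is represented by null, and the class is illustrative rather than Solr's sorting code.</p>

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of sortMissingLast semantics for string values; null means "field missing".
class SortMissingLastSketch {
    static List<String> sort(List<String> values, boolean descending) {
        // Order for documents that do have a value: ascending or descending as requested.
        Comparator<String> present = descending
                ? Comparator.<String>naturalOrder().reversed()
                : Comparator.<String>naturalOrder();
        // nullsLast keeps missing values at the end in both directions,
        // which is what sortMissingLast="true" promises.
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.nullsLast(present));
        return sorted;
    }
}
```

<p>Swapping Comparator.nullsLast for Comparator.nullsFirst models sortMissingFirst="true".</p>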
<p>&nbsp;</p>
<p>A StrField is not analyzed; its value is indexed and stored verbatim.</p>
<p>StrField and TextField also accept an optional compressThreshold attribute: when compression is enabled, only values longer than this size (in characters) are compressed.</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p><span style="color: #0000ff">&lt;</span> <span style="color: #990000"><span>fieldType</span> <span>name</span> </span><span style="color: #0000ff">="</span> <strong>text</strong> <span style="color: #0000ff">"</span> <span style="color: #990000">class</span> <span style="color: #0000ff">="</span> <strong>solr.TextField</strong> <span style="color: #0000ff">"</span> <span style="color: #990000">positionIncrementGap</span> <span style="color: #0000ff">="</span> <strong>100</strong> <span style="color: #0000ff"><span>"</span> <span>&gt;</span> </span></p>
<p>&nbsp;</p>
<p>solr.TextField lets you customize indexing and querying through analyzers; an analyzer consists of one tokenizer and any number of filters.</p>
<p>&nbsp;</p>
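<ul><li>positionIncrementGap: optional. For multiValued fields, this is the number of empty positions inserted between consecutive values of the field in the same document. It prevents phrase queries from matching across two separate values.</li></ul>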
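<p>name:&nbsp;&nbsp;&nbsp; name of the field type&nbsp; <br />class:&nbsp;&nbsp;&nbsp; Java class name&nbsp; <br />indexed:&nbsp;&nbsp;&nbsp; defaults to true. The data can be searched and sorted on. If the data is not indexed, stored should be true.&nbsp; <br />stored:&nbsp;&nbsp;&nbsp; defaults to true. The field is suitable for inclusion in search results. If the data is not stored, indexed should be true.&nbsp; <br />sortMissingLast:&nbsp;&nbsp;&nbsp; documents without a value in the field sort after documents that have one&nbsp; <br />sortMissingFirst:&nbsp;&nbsp;&nbsp; documents without a value in the field sort before documents that have one&nbsp; <br />omitNorms:&nbsp;&nbsp;&nbsp; set to true when the field length should not affect scoring and no index-time boost is applied. General text fields normally leave it unset.&nbsp; <br />termVectors:&nbsp;&nbsp;&nbsp; set to true if the field is used for the MoreLikeThis and highlighting features.&nbsp; <br />compressed:&nbsp;&nbsp;&nbsp; the field is stored compressed. This can slow down indexing and searching but saves storage space. Only StrField and TextField can be compressed, and compression usually suits fields longer than 200 characters.&nbsp; <br />multiValued:&nbsp;&nbsp;&nbsp; set to true when the field can hold more than one value.&nbsp; <br />positionIncrementGap:&nbsp;&nbsp;&nbsp; used together with multiValued; sets the number of virtual blank positions between values. <br /></p>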
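<p>Here is a concrete version of positionIncrementGap. Assume each value of a multiValued field is tokenized into consecutive positions, and the next value starts positionIncrementGap positions further on. A phrase built from the last word of one value and the first word of the next then cannot match, because the two words are no longer adjacent. The simplified Java sketch below shows that position arithmetic. The class and methods are hypothetical and are not Lucene code.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: assigns token positions across the values of a multiValued field.
class PositionGapSketch {
    // Returns the position of each token, in order, for all values of one field.
    static List<Integer> positions(List<String[]> values, int positionIncrementGap) {
        List<Integer> result = new ArrayList<>();
        int pos = -1; // so that the first token of the first value lands on position 0
        for (int v = 0; v < values.size(); v++) {
            if (v > 0) {
                pos += positionIncrementGap; // virtual blank space between two values
            }
            for (String ignored : values.get(v)) {
                pos += 1; // consecutive tokens within one value are adjacent
                result.add(pos);
            }
        }
        return result;
    }

    // A two-word phrase matches only if the words sit on adjacent positions.
    static boolean adjacent(int first, int second) {
        return second - first == 1;
    }
}
```

<p>With the gap of 100 used in the example text fieldType, a document whose values are "big apple" and "red car" does not match the phrase "apple red". With a gap of 0 it would.</p>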
<p>&nbsp;</p>
<p><span style="color: #0000ff">&lt;</span> <span style="color: #990000">tokenizer</span> <span style="color: #990000">class</span> <span style="color: #0000ff">="</span> <strong>solr.WhitespaceTokenizerFactory</strong> <span style="color: #0000ff"><span>"</span> <span>/&gt;</span> </span></p>
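<p>Splits text on whitespace only and keeps each token exactly as written (no lowercasing; punctuation stays attached), so matching is exact.</p>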
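<p>Below is a rough Java equivalent of whitespace tokenization, for illustration only. It assumes Java's regex \s as the definition of whitespace, which is close to, but not identical to, Lucene's own definition. Case and punctuation survive tokenization, which is why the filters that follow in the chain are needed.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of whitespace tokenization: split on whitespace, keep everything else verbatim.
class WhitespaceTokenizerSketch {
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) { // an all-blank input yields a single empty string; skip it
                tokens.add(t); // punctuation and case are left untouched
            }
        }
        return tokens;
    }
}
```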
<p>&nbsp;</p>
<p>&nbsp;</p>
<p><span style="color: #0000ff">&lt;</span> <span style="color: #990000">filter</span> <span style="color: #990000">class</span> <span style="color: #0000ff">="</span> <strong>solr.WordDelimiterFilterFactory</strong> <span style="color: #0000ff">"</span> <span style="color: #990000">generateWordParts</span> <span style="color: #0000ff">="</span> <strong>1</strong> <span style="color: #0000ff">"</span> <span style="color: #990000">generateNumberParts</span> <span style="color: #0000ff">="</span> <strong>1</strong> <span style="color: #0000ff">"</span> <span style="color: #990000">catenateWords</span> <span style="color: #0000ff">="</span> <strong>1</strong> <span style="color: #0000ff">"</span> <span style="color: #990000">catenateNumbers</span> <span style="color: #0000ff">="</span> <strong>1</strong> <span style="color: #0000ff">"</span> <span style="color: #990000">catenateAll</span> <span style="color: #0000ff">="</span> <strong>0</strong> <span style="color: #0000ff">"</span> <span style="color: #990000">splitOnCaseChange</span> <span style="color: #0000ff">="</span> <strong>1</strong> <span style="color: #0000ff"><span>"</span> <span>/&gt;</span> </span></p>
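<p>During tokenization and matching, this filter takes the "-" hyphen, letter/digit boundaries and other non-alphanumeric characters into account, so that "wifi" and "wi fi" both match "Wi-Fi".</p>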
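<p>The filter's main options work as follows. With generateWordParts="1" and catenateWords="1", a token such as Wi-Fi is split into its word parts and is also emitted in concatenated form. splitOnCaseChange="1" additionally splits at lowercase-to-uppercase transitions. Below is a deliberately simplified sketch of just those three options; the real filter also handles numbers, possessives and token positions. The names are hypothetical.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of WordDelimiterFilter with
// generateWordParts=1, catenateWords=1, splitOnCaseChange=1.
class WordDelimiterSketch {
    static List<String> split(String token) {
        List<String> parts = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (!Character.isLetterOrDigit(c)) {
                flush(cur, parts); // a delimiter such as '-' ends the current part
                continue;
            }
            boolean caseChange = cur.length() > 0
                    && Character.isLowerCase(cur.charAt(cur.length() - 1))
                    && Character.isUpperCase(c);
            if (caseChange) {
                flush(cur, parts); // splitOnCaseChange: "WiFi" becomes "Wi" + "Fi"
            }
            cur.append(c);
        }
        flush(cur, parts);
        List<String> out = new ArrayList<>(parts); // generateWordParts: emit each part
        if (parts.size() > 1) {
            out.add(String.join("", parts)); // catenateWords: also emit "WiFi"
        }
        return out;
    }

    private static void flush(StringBuilder cur, List<String> parts) {
        if (cur.length() > 0) {
            parts.add(cur.toString());
            cur.setLength(0);
        }
    }
}
```

<p>After the LowerCaseFilter further down the chain, "WiFi" becomes "wifi" and the parts become "wi" and "fi", which is how both query forms can match.</p>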
<p>&nbsp;</p>
<p><span style="color: #0000ff">&lt;</span> <span style="color: #990000">filter</span> <span style="color: #990000">class</span> <span style="color: #0000ff">="</span> <strong>solr.SynonymFilterFactory</strong> <span style="color: #0000ff">"</span> <span style="color: #990000">synonyms</span> <span style="color: #0000ff">="</span> <strong>synonyms.txt</strong> <span style="color: #0000ff">"</span> <span style="color: #990000">ignoreCase</span> <span style="color: #0000ff">="</span> <strong>true</strong> <span style="color: #0000ff">"</span> <span style="color: #990000">expand</span> <span style="color: #0000ff">="</span> <strong>true</strong> <span style="color: #0000ff"><span>"</span> <span>/&gt;</span> </span></p>
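<p><span style="color: #000000">Synonyms, defined in synonyms.txt. With expand="true", a term is expanded into all of its equivalent synonyms.</span> </p>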
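<p>For equivalent synonyms (a comma-separated line in synonyms.txt), expand="true" replaces a term with every member of its group. With expand="false", the term would be reduced to the first member of the group. The minimal sketch below works under those assumptions. It treats each synonym as a single word, ignores the "=&gt;" mapping syntax and does not model token positions (the real filter stacks the synonyms on the same position). The names are hypothetical.</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of equivalent-synonym handling (one word per synonym, no "=>" rules).
class SynonymSketch {
    private final Map<String, List<String>> groups = new HashMap<>();
    private final boolean expand;

    SynonymSketch(List<String> lines, boolean expand) {
        this.expand = expand;
        for (String line : lines) {
            List<String> group = new ArrayList<>();
            for (String w : line.split(",")) {
                group.add(w.trim().toLowerCase(Locale.ROOT)); // ignoreCase="true"
            }
            for (String w : group) {
                groups.put(w, group); // every member points at the whole group
            }
        }
    }

    List<String> filter(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            List<String> group = groups.get(t.toLowerCase(Locale.ROOT));
            if (group == null) {
                out.add(t); // no synonym: pass the token through unchanged
            } else if (expand) {
                out.addAll(group); // expand="true": emit every member of the group
            } else {
                out.add(group.get(0)); // expand="false": collapse to the first member
            }
        }
        return out;
    }
}
```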
<p>&nbsp;</p>
<p><span style="color: #0000ff">&lt;</span> <span style="color: #990000">filter</span> <span style="color: #990000">class</span> <span style="color: #0000ff">="</span> <strong>solr.StopFilterFactory</strong> <span style="color: #0000ff">"</span> <span style="color: #990000">ignoreCase</span> <span style="color: #0000ff">="</span> <strong>true</strong> <span style="color: #0000ff">"</span> <span style="color: #990000">words</span> <span style="color: #0000ff">="</span> <strong>stopwords.txt</strong> <span style="color: #0000ff">"</span> <span style="color: #990000">enablePositionIncrements</span> <span style="color: #0000ff">="</span> <strong>true</strong> <span style="color: #0000ff"><span>"</span> <span>/&gt;</span> </span></p>
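<p>With enablePositionIncrements="true", a gap is left in the token positions wherever a stopword was removed, so phrase queries still respect the original distances between words.</p>
<p>Stopwords are common words, such as "is" and "this", that are ignored during both indexing and searching. They are maintained in conf/stopwords.txt.</p>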
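<p>enablePositionIncrements="true" means that when a stopword is dropped, the next kept token's position still advances past it. Assuming "the" is a stopword, a document containing "sign the contract" is then indexed as sign at position 0 and contract at position 2. The phrase query "sign contract" therefore does not match it falsely. Below is a sketch under those assumptions, with already-tokenized input and a hypothetical Token class.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of StopFilter behaviour with and without enablePositionIncrements.
class StopFilterSketch {
    static final class Token {
        final String term;
        final int position;

        Token(String term, int position) {
            this.term = term;
            this.position = position;
        }

        @Override
        public String toString() {
            return term + "@" + position;
        }
    }

    static List<Token> filter(List<String> tokens, Set<String> stopwords,
                              boolean enablePositionIncrements) {
        List<Token> out = new ArrayList<>();
        int pos = -1;
        for (String t : tokens) {
            boolean isStop = stopwords.contains(t.toLowerCase(Locale.ROOT)); // ignoreCase="true"
            if (isStop) {
                if (enablePositionIncrements) {
                    pos++; // leave a hole where the stopword was
                }
                continue; // the stopword itself is not emitted
            }
            pos++;
            out.add(new Token(t, pos));
        }
        return out;
    }
}
```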
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>4. fields</p>
<p>&nbsp;</p>
<p><span style="color: #0000ff">&lt;</span> <span style="color: #990000">field</span> <span style="color: #990000">name</span> <span style="color: #0000ff">="</span> <strong>id</strong> <span style="color: #0000ff">"</span> <span style="color: #990000">type</span> <span style="color: #0000ff">="</span> <strong>string</strong> <span style="color: #0000ff">"</span> <span style="color: #990000">indexed</span> <span style="color: #0000ff">="</span> <strong>true</strong> <span style="color: #0000ff">"</span> <span style="color: #990000">stored</span> <span style="color: #0000ff">="</span> <strong>true</strong> <span style="color: #0000ff">"</span> <span style="color: #990000">required</span> <span style="color: #0000ff">="</span> <strong>true</strong> <span style="color: #0000ff"><span>"</span> <span>/&gt;</span> </span></p>
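<ul><li>name: just a label.</li><li>type: one of the types defined earlier.</li><li>indexed: whether the field is indexed (needed for searching and sorting)</li><li>stored: whether the value is stored</li><li>compressed: [false] whether to compress the value with gzip (only TextField and StrField can be compressed)</li><li>multiValued: whether the field may contain multiple values</li><li>omitNorms: whether to omit norms for the field. This disables length normalization and index-time boosting for the field and saves some memory. Only full-text fields and fields that need an index-time boost need norms.</li><li>termVectors: [false] when true, term vectors are stored. Fields used by MoreLikeThis to find similar documents should store them.</li><li>termPositions: also store position information in the term vector; this increases storage cost.</li><li>termOffsets: also store offset information in the term vector; this increases storage cost.</li><li>default: a value to use when a document is added without a value for this field.</li></ul>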
<p>&nbsp;</p>
<p><span style="color: #0000ff">&lt;</span> <span style="color: #990000">field</span> <span style="color: #990000">name</span> <span style="color: #0000ff">="</span> <strong>text</strong> <span style="color: #0000ff">"</span> <span style="color: #990000">type</span> <span style="color: #0000ff">="</span> <strong>text</strong> <span style="color: #0000ff">"</span> <span style="color: #990000">indexed</span> <span style="color: #0000ff">="</span> <strong>true</strong> <span style="color: #0000ff">"</span> <span style="color: #990000">stored</span> <span style="color: #0000ff">="</span> <strong>false</strong> <span style="color: #0000ff">"</span> <span style="color: #990000">multiValued</span> <span style="color: #0000ff">="</span> <strong>true</strong> <span style="color: #0000ff"><span>"</span> <span>/&gt;</span> </span></p>
<p>A catch-all field (the name is a slight exaggeration) that gathers all searchable text fields through copyField.</p>
<p>&nbsp;</p>
<p><span style="color: #0000ff">&lt;</span> <span style="color: #990000">copyField</span> <span style="color: #990000">source</span> <span style="color: #0000ff">="</span> <strong>cat</strong> <span style="color: #0000ff">"</span> <span style="color: #990000">dest</span> <span style="color: #0000ff">="</span> <strong>text</strong> <span style="color: #0000ff"><span>"</span> <span>/&gt;</span> </span></p>
<div>
<div><span><strong><span style="color: #ff0000">&nbsp;</span> </strong></span><span style="color: #0000ff">&lt;</span> <span style="color: #990000">copyField</span> <span style="color: #990000">source</span> <span style="color: #0000ff">="</span> <strong>name</strong> <span style="color: #0000ff">"</span> <span style="color: #990000">dest</span> <span style="color: #0000ff">="</span> <strong>text</strong> <span style="color: #0000ff"><span>"</span> <span>/&gt;</span> </span></div></div>
<div>
<div><span><strong><span style="color: #ff0000">&nbsp;</span> </strong></span><span style="color: #0000ff">&lt;</span> <span style="color: #990000">copyField</span> <span style="color: #990000">source</span> <span style="color: #0000ff">="</span> <strong>manu</strong> <span style="color: #0000ff">"</span> <span style="color: #990000">dest</span> <span style="color: #0000ff">="</span> <strong>text</strong> <span style="color: #0000ff"><span>"</span> <span>/&gt;</span> </span></div></div>
<div>
<div><span><strong><span style="color: #ff0000">&nbsp;</span> </strong></span><span style="color: #0000ff">&lt;</span> <span style="color: #990000">copyField</span> <span style="color: #990000">source</span> <span style="color: #0000ff">="</span> <strong>features</strong> <span style="color: #0000ff">"</span> <span style="color: #990000">dest</span> <span style="color: #0000ff">="</span> <strong>text</strong> <span style="color: #0000ff"><span>"</span> <span>/&gt;</span> </span></div></div>
<div>
<div><span><strong><span style="color: #ff0000">&nbsp;</span> </strong></span><span style="color: #0000ff">&lt;</span> <span style="color: #990000">copyField</span> <span style="color: #990000">source</span> <span style="color: #0000ff">="</span> <strong>includes</strong> <span style="color: #0000ff">"</span> <span style="color: #990000">dest</span> <span style="color: #0000ff">="</span> <strong>text</strong> <span style="color: #0000ff"><span>"</span> <span>/&gt;</span> </span></div></div>
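<p>When a document is added to the index, the data in every source field (such as cat) is copied into the text field.</p>
<p>Purpose:</p>
<ul><li>Gather the data of several fields into one so they can be searched at once, which is easier and faster</li><li>Copy the data of one field into another so that the same content can be indexed in two different ways.</li></ul>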
<p>&nbsp;</p>
<p>&nbsp;</p>
<p><span style="color: #0000ff">&lt;</span> <span style="color: #990000">dynamicField</span> <span style="color: #990000">name</span> <span style="color: #0000ff">="</span> <strong>*_i</strong> <span style="color: #0000ff">"</span> <span style="color: #990000">type</span> <span style="color: #0000ff">="</span> <strong>int</strong> <span style="color: #0000ff">"</span> <span style="color: #990000">indexed</span> <span style="color: #0000ff">="</span> <strong>true</strong> <span style="color: #0000ff">"</span> <span style="color: #990000">stored</span> <span style="color: #0000ff">="</span> <strong>true</strong> <span style="color: #0000ff"><span>"</span> <span>/&gt;</span> </span></p>
<p>&nbsp;</p>
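<p>If a field name does not match any explicitly defined field, Solr tries the dynamic field patterns.</p>
<ul><li>"*" may only appear at the start or the end of a pattern</li><li>Longer patterns are tried first</li><li>If two patterns of the same length both match, the one defined first in the schema wins</li></ul>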
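<p>The matching rules just listed can be sketched in Java as follows: a single "*" at the start or end of the pattern, longer patterns tried first, and the earlier declaration winning between equally long patterns. The sketch assumes explicitly declared fields have already been checked. The class is hypothetical and is not Solr's IndexSchema code.</p>

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of dynamicField pattern resolution.
class DynamicFieldSketch {
    private final List<String> patterns = new ArrayList<>(); // in declaration order

    void declare(String pattern) {
        boolean starAtEdge = pattern.startsWith("*") || pattern.endsWith("*");
        int stars = pattern.length() - pattern.replace("*", "").length();
        if (!starAtEdge || stars != 1) {
            throw new IllegalArgumentException("'*' must appear once, at the start or end: " + pattern);
        }
        patterns.add(pattern);
    }

    // Returns the matching pattern, or null if none matches.
    String resolve(String fieldName) {
        List<String> ordered = new ArrayList<>(patterns);
        // Longest first; List.sort is stable, so equally long patterns keep declaration order.
        ordered.sort(Comparator.comparingInt(String::length).reversed());
        for (String p : ordered) {
            boolean matches = p.startsWith("*")
                    ? fieldName.endsWith(p.substring(1))                    // "*_i" style suffix
                    : fieldName.startsWith(p.substring(0, p.length() - 1)); // "attr_*" style prefix
            if (matches) {
                return p;
            }
        }
        return null;
    }
}
```

<p>A bare "*" has the shortest possible pattern, so it is only reached when nothing else matches. That is how the catch-all definition shown next behaves.</p>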
<p>&nbsp;</p>
<p><span style="color: #0000ff">&lt;</span> <span style="color: #990000">dynamicField</span> <span style="color: #990000">name</span> <span style="color: #0000ff">="</span> <strong>*</strong> <span style="color: #0000ff">"</span> <span style="color: #990000">type</span> <span style="color: #0000ff">="</span> <strong>ignored</strong> <span style="color: #0000ff">"</span> <span style="color: #990000">multiValued<span style="color: #0000ff">="</span> <span style="color: #000000"><strong>true</strong> </span><span style="color: #0000ff">"</span> </span><span style="color: #0000ff"><span>/&gt;</span> </span></p>
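<p><span style="color: #0000ff"><span>This catch-all pattern applies to any field that matches neither a defined field nor another dynamic field. With type="ignored", such unknown fields are silently ignored; change the type (for example to text) if you want unknown fields indexed and/or stored by default. (This rarely comes up.)</span> </span></p>
<p><span style="color: #0000ff"><span>If it is not defined, a field that matches nothing is reported as an error.</span> </span></p>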
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>5. Other tags</p>
<p>&nbsp;</p>
<p><span style="color: #0000ff">&lt;</span> <span style="color: #990000">uniqueKey</span> <span style="color: #0000ff">&gt;</span> <span><strong>id</strong> </span><span style="color: #0000ff">&lt;/</span> <span style="color: #990000">uniqueKey</span> <span style="color: #0000ff">&gt;</span> </p>
<p>The unique identifier of a document. This field must be supplied (unless it is marked required="false"); otherwise Solr reports an error when indexing.</p>
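<p>Because uniqueKey identifies a document, adding a second document with the same id replaces the first rather than creating a duplicate. A document without the key is rejected (assuming the field keeps its default required status). Below is a minimal in-memory sketch of those two rules. It assumes documents are maps from field to value, and the class is hypothetical, not Solr's update handler.</p>

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory model of uniqueKey handling (key field assumed required).
class UniqueKeyIndexSketch {
    private final String uniqueKey;
    private final Map<String, Map<String, String>> docs = new LinkedHashMap<>();

    UniqueKeyIndexSketch(String uniqueKey) {
        this.uniqueKey = uniqueKey;
    }

    void add(Map<String, String> doc) {
        String id = doc.get(uniqueKey);
        if (id == null) {
            // Mirrors Solr reporting an error for a document missing its uniqueKey field.
            throw new IllegalArgumentException("Document is missing uniqueKey field " + uniqueKey);
        }
        docs.put(id, doc); // same id: the new document overwrites the old one
    }

    int size() {
        return docs.size();
    }

    Map<String, String> get(String id) {
        return docs.get(id);
    }
}
```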
<p><span style="color: #0000ff">&lt;</span> <span style="color: #990000">defaultSearchField</span> <span style="color: #0000ff">&gt;</span> <span><strong>text</strong> </span><span style="color: #0000ff">&lt;/</span> <span style="color: #990000">defaultSearchField</span> <span style="color: #0000ff">&gt;</span> </p>
<p>The default field to search when the query does not name one.</p>
<p><span style="color: #0000ff">&lt;</span> <span style="color: #990000">solrQueryParser</span> <span style="color: #990000">defaultOperator</span> <span style="color: #0000ff">="</span> <strong>OR</strong> <span style="color: #0000ff"><span>"</span> <span>/&gt;</span> </span></p>
<p>Sets the default logical operator between the terms of a query; it can be "AND" or "OR".</p>
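<p>defaultOperator decides how bare terms in a query such as solr lucene are combined. OR returns documents matching any term, and AND returns only those matching all of them. Below is a sketch over a toy inverted index (term to document ids); the index contents and names are hypothetical.</p>

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: combine the posting sets of bare query terms with AND or OR.
class DefaultOperatorSketch {
    static Set<Integer> search(Map<String, Set<Integer>> index, List<String> terms,
                               String defaultOperator) {
        Set<Integer> result = null;
        for (String term : terms) {
            Set<Integer> postings = index.getOrDefault(term, new HashSet<>());
            if (result == null) {
                result = new TreeSet<>(postings); // first term seeds the result
            } else if ("AND".equals(defaultOperator)) {
                result.retainAll(postings); // AND: keep documents containing every term
            } else {
                result.addAll(postings); // OR: keep documents containing any term
            }
        }
        return result == null ? new TreeSet<>() : result;
    }
}
```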
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p><strong><span style="font-size: medium">II. solrconfig.xml</span> </strong></p>
<p>&nbsp;</p>
<p>1. Index configuration</p>
<p>&nbsp;</p>
<p>The mainIndex section defines several factors that control how Solr processes the index.</p>
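<ul><li>useCompoundFile: reduces the number of files in use by combining many Lucene internal files into a single file. This helps reduce the number of file handles Solr uses, at some cost in performance. Unless the application runs out of file handles, the default of <code>false</code> should be sufficient.</li><li>mergeFactor: determines how often Lucene segments are merged. Smaller values (the minimum is 2) use less memory but make indexing slower. Larger values make indexing faster at the cost of more memory. (A typical time/space trade-off.)</li><li>maxBufferedDocs: the minimum number of documents that must be indexed before the in-memory documents are merged and a new segment is created. A segment is the Lucene file that stores index information. Larger values make indexing faster at the cost of more memory.</li><li>maxMergeDocs: the maximum number of documents Solr will merge into a single segment. Smaller values (&lt;10,000) are best for applications with a large number of updates.</li><li>maxFieldLength: for a given document, the maximum number of terms that can be added to a field; beyond that the document is truncated. If documents can be very large, increase this value. However, setting it too high can cause out-of-memory errors.</li><li>unlockOnStartup: tells Solr to ignore the locking mechanism that protects the index in a multi-threaded environment. In some cases the index may stay locked after an improper shutdown or another error, which prevents adds and updates. Setting this to true releases the lock on startup, so adds and updates are possible again. (Locking mechanism.)</li></ul>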
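<p>The time/space trade-off of mergeFactor can be illustrated with a simplified model of logarithmic merging. Each flush of buffered documents creates a new segment. Whenever mergeFactor segments of the same size level accumulate, they are merged into one segment of the next level. This is only an approximation of Lucene's log-style merge policies, not its actual code. In the model, a smaller mergeFactor keeps fewer segments alive but merges more often, while a larger one merges less often and leaves more segments.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of log-style segment merging driven by mergeFactor.
class MergeFactorSketch {
    // Returns the number of segments left after `flushes` flushes of buffered documents.
    static int segmentsAfter(int flushes, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be at least 2");
        }
        List<Integer> countPerLevel = new ArrayList<>(); // index = level, value = segments at that level
        for (int f = 0; f < flushes; f++) {
            int level = 0;
            addSegment(countPerLevel, level); // every flush writes one small segment
            // Cascade: mergeFactor segments at one level become one segment at the next.
            while (countPerLevel.get(level) == mergeFactor) {
                countPerLevel.set(level, 0);
                level++;
                addSegment(countPerLevel, level);
            }
        }
        int total = 0;
        for (int c : countPerLevel) {
            total += c;
        }
        return total;
    }

    private static void addSegment(List<Integer> countPerLevel, int level) {
        while (countPerLevel.size() <= level) {
            countPerLevel.add(0);
        }
        countPerLevel.set(level, countPerLevel.get(level) + 1);
    }
}
```

<p>For example, with mergeFactor=10 the model leaves 7 segments after 25 flushes (two merged segments plus five fresh ones), and a single segment after 100 flushes.</p>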
<p>&nbsp;</p>
<p>2. Query processing configuration</p>
<p>&nbsp;</p>
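<p>The query section has the following features that are unrelated to caching:</p>
<ul><li>maxBooleanClauses: the upper limit on the number of clauses that can be combined to form one query. 1024 is normally enough. If the application makes heavy use of wildcard or range queries, raising this limit avoids the TooManyClauses exception that is thrown when the limit is exceeded.</li><li>enableLazyFieldLoading: if the application retrieves only a few of the fields of each document, this can be set to true. A common case for lazy loading is an application that returns a list of search results, after which the user clicks one of them to view the original document stored in the index. The initial display usually needs only a short snippet of information, so when documents are large, loading the whole document should be avoided unless necessary.</li></ul>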
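<p>maxBooleanClauses matters when a query is rewritten into one clause per matching term. The sketch below assumes a simple prefix wildcard over a list of indexed terms and shows how such an expansion can exceed the limit. The exception type here is a stand-in; Lucene itself throws BooleanQuery.TooManyClauses.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: expand a prefix wildcard like "app*" into one clause per matching term.
class MaxClauseSketch {
    static List<String> expandPrefix(List<String> terms, String prefix, int maxBooleanClauses) {
        List<String> clauses = new ArrayList<>();
        for (String term : terms) {
            if (!term.startsWith(prefix)) {
                continue; // term does not match the wildcard
            }
            if (clauses.size() == maxBooleanClauses) {
                // Stand-in for Lucene's BooleanQuery.TooManyClauses.
                throw new IllegalStateException("maxClauseCount is set to " + maxBooleanClauses);
            }
            clauses.add(term); // one clause per matching term
        }
        return clauses;
    }
}
```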
<p>&nbsp;</p>
<p>The query section also defines several options related to events that occur in Solr:</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
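<p>Concept: Solr (in fact, Lucene) uses a Java class called Searcher to process Query instances. A Searcher loads data related to the index contents into memory. Depending on the size of the index, the CPU and the available memory, this can take quite a while. To improve this design and significantly increase performance, Solr introduces a "warming" strategy: new Searchers are "warmed up" before they are brought online to serve queries for live users.</p>
<ul><li>newSearcher and firstSearcher events: use these to specify which queries should run when a new Searcher or the first Searcher is instantiated. If the application expects certain queries to be requested, uncomment these sections and add appropriate queries so that they run when a new Searcher or the first Searcher is created.</li></ul>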
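<p>The warming strategy can be sketched in three steps. First, open the new searcher. Second, run the configured newSearcher (or firstSearcher) queries against it so its caches fill up. Third, publish it to live requests, which keep using the old searcher in the meantime. In this hypothetical Java sketch, the Searcher class and its string cache stand in for Solr's real searcher and caches.</p>

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: warm a new searcher before making it visible to requests.
class WarmingSketch {
    static final class Searcher {
        final int generation;
        final Map<String, String> cache = new ConcurrentHashMap<>();

        Searcher(int generation) {
            this.generation = generation;
        }

        String search(String query) {
            // Stand-in for real query execution; results are cached per searcher.
            return cache.computeIfAbsent(query, q -> "results of " + q + " @gen" + generation);
        }
    }

    private final AtomicReference<Searcher> current = new AtomicReference<>();

    // Called once at startup (firstSearcher) and again after each commit (newSearcher).
    void openSearcher(int generation, List<String> warmingQueries) {
        Searcher fresh = new Searcher(generation);
        for (String q : warmingQueries) {
            fresh.search(q); // warm-up: fill the new searcher's cache before anyone uses it
        }
        current.set(fresh); // publish: requests from now on see the warmed searcher
    }

    Searcher current() {
        return current.get();
    }
}
```

<p>Until current.set runs, searches keep using the previous searcher, so users never hit a cold cache. Solr additionally reference-counts searchers and closes the old one only after in-flight requests release it; that part is left out here.</p>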
<p>&nbsp;</p>
<p>Smart caching in the query section:</p>
<p>&nbsp;</p>
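<ul><li>filterCache: filters store the unordered set of document ids that match a given query, which lets Solr improve query performance. Caching these filters means repeated calls to Solr can look up result sets quickly. A more common scenario is to cache a filter and then issue follow-up refining queries that use the filter to restrict the number of documents to search.</li><li>queryResultCache: caches ordered sets of document ids for a query, its sort criteria, and the number of documents requested.</li><li>documentCache: caches Lucene Documents, keyed by internal Lucene document id (not to be confused with the Solr unique id). Because Lucene's internal document ids can change with index operations, this cache cannot be autowarmed.</li><li>Named caches: user-defined caches that custom Solr plugins can use.</li></ul>
<p>Of these, filterCache, queryResultCache and named caches (if they implement org.apache.solr.search.CacheRegenerator) can be autowarmed.</p>
<p>Each cache declaration accepts up to four attributes:</p>
<ul><li>class: the Java name of the cache implementation</li><li>size: the maximum number of entries</li><li>initialSize: the initial size of the cache</li><li>autowarmCount: the number of entries taken from the old cache to warm the new one. More entries mean more cache hits, at the cost of a longer warm-up time.</li></ul>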
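<p>The size and autowarmCount attributes can be illustrated with an LRU cache built on LinkedHashMap in access order. size caps the number of entries. When a new searcher is opened, the autowarmCount most recently used keys of the old cache are recomputed for the new searcher. This is roughly the idea behind Solr's LRUCache, not its code. The regenerate function is a hypothetical stand-in for a CacheRegenerator.</p>

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of an LRU cache with a size limit and autowarming.
class AutowarmLruCacheSketch<K, V> {
    private final int maxSize;
    private final LinkedHashMap<K, V> map;

    AutowarmLruCacheSketch(int initialSize, int maxSize) {
        this.maxSize = maxSize;
        // accessOrder=true: iteration runs from least to most recently used.
        this.map = new LinkedHashMap<K, V>(initialSize, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > AutowarmLruCacheSketch.this.maxSize; // evict beyond "size"
            }
        };
    }

    V get(K key) {
        return map.get(key); // a hit also marks the entry as most recently used
    }

    void put(K key, V value) {
        map.put(key, value);
    }

    int entries() {
        return map.size();
    }

    // Builds the cache for a new searcher, regenerating the autowarmCount most recently used keys.
    AutowarmLruCacheSketch<K, V> warmNew(int autowarmCount, Function<K, V> regenerate) {
        AutowarmLruCacheSketch<K, V> fresh = new AutowarmLruCacheSketch<>(16, maxSize);
        List<K> keys = new ArrayList<>(map.keySet()); // least to most recently used
        int from = Math.max(0, keys.size() - autowarmCount);
        for (K key : keys.subList(from, keys.size())) {
            fresh.put(key, regenerate.apply(key)); // recompute against the new index
        }
        return fresh;
    }
}
```

<p>A larger autowarmCount carries more hot entries into the new cache but makes each warm-up, and therefore each commit, take longer.</p>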
<p>For every caching scheme, setting the cache parameters means balancing memory, CPU and disk access. The statistics page (Statistics in the admin interface) is very useful for analysing each cache's hit-to-miss ratio and for fine-tuning cache sizes. Moreover, not every application benefits from caching. Some applications are actually hurt by the extra step of storing entries in the cache that will never be used.</p></div><img src ="http://www.blogjava.net/conans/aggbug/379545.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/conans/" target="_blank">CONAN</a> 2012-05-30 14:18 <a href="http://www.blogjava.net/conans/articles/379545.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>DataImportHandler--remove data from index</title><link>http://www.blogjava.net/conans/articles/379544.html</link><dc:creator>CONAN</dc:creator><author>CONAN</author><pubDate>Wed, 30 May 2012 06:11:00 GMT</pubDate><guid>http://www.blogjava.net/conans/articles/379544.html</guid><description><![CDATA[<p>On the Solr wiki, deleting data from an index with DIH incremental indexing gets only marginal treatment, as something that works much like updating records. I took the same shortcut in a previous article, all the more so because my example there indexed Wikipedia data, which did not require deleting anything.</p>
<p>Since I have sample data about albums and performers at hand, I decided to show how I handle such cases. For simplicity and clarity, I assume that after the first import the data can only shrink.</p>
<p><span id="more-711"></span></p>
<h2>Test data</h2>
<p>My test data is stored in a PostgreSQL table defined as follows:</p><pre>                        Table "public.albums"
 Column |  Type   |                      Modifiers
--------+---------+-----------------------------------------------------
 id     | integer | not null default nextval('albums_id_seq'::regclass)
 name   | text    | not null
 author | text    | not null
Indexes:
    "albums_pk" PRIMARY KEY, btree (id)</pre>
<p>The table has 825,661 records.</p>
<h2>Test installation</h2>
<p>For testing I used a Solr instance with the following configuration:</p>
<p>Definition in schema.xml:</p>
<div style="border-bottom: #cccccc 1px solid; border-left: #cccccc 1px solid; padding-bottom: 4px; background-color: #eeeeee; padding-left: 4px; width: 98%; padding-right: 5px; font-size: 13px; word-break: break-all; border-top: #cccccc 1px solid; border-right: #cccccc 1px solid; padding-top: 4px"><span style="color: #0000ff">&lt;</span><span style="color: #800000">fields</span><span style="color: #0000ff">&gt;</span><span style="color: #000000"><br />&nbsp;</span><span style="color: #0000ff">&lt;</span><span style="color: #800000">field&nbsp;</span><span style="color: #ff0000">name</span><span style="color: #0000ff">="id"</span><span style="color: #ff0000">&nbsp;type</span><span style="color: #0000ff">="string"</span><span style="color: #ff0000">&nbsp;indexed</span><span style="color: #0000ff">="true"</span><span style="color: #ff0000">&nbsp;stored</span><span style="color: #0000ff">="true"</span><span style="color: #ff0000">&nbsp;required</span><span style="color: #0000ff">="true"</span><span style="color: #ff0000">&nbsp;</span><span style="color: #0000ff">/&gt;</span><span style="color: #000000"><br />&nbsp;</span><span style="color: #0000ff">&lt;</span><span style="color: #800000">field&nbsp;</span><span style="color: #ff0000">name</span><span style="color: #0000ff">="album"</span><span style="color: #ff0000">&nbsp;type</span><span style="color: #0000ff">="text"</span><span style="color: #ff0000">&nbsp;indexed</span><span style="color: #0000ff">="true"</span><span style="color: #ff0000">&nbsp;stored</span><span style="color: #0000ff">="true"</span><span style="color: #ff0000">&nbsp;multiValued</span><span style="color: #0000ff">="true"</span><span style="color: #0000ff">/&gt;</span><span style="color: #000000"><br />&nbsp;</span><span style="color: #0000ff">&lt;</span><span style="color: #800000">field&nbsp;</span><span 
style="color: #ff0000">name</span><span style="color: #0000ff">="author"</span><span style="color: #ff0000">&nbsp;type</span><span style="color: #0000ff">="text"</span><span style="color: #ff0000">&nbsp;indexed</span><span style="color: #0000ff">="true"</span><span style="color: #ff0000">&nbsp;stored</span><span style="color: #0000ff">="true"</span><span style="color: #ff0000">&nbsp;multiValued</span><span style="color: #0000ff">="true"</span><span style="color: #0000ff">/&gt;</span><span style="color: #000000"><br /></span><span style="color: #0000ff">&lt;/</span><span style="color: #800000">fields</span><span style="color: #0000ff">&gt;</span><span style="color: #000000"><br /></span><span style="color: #0000ff">&lt;</span><span style="color: #800000">uniqueKey</span><span style="color: #0000ff">&gt;</span><span style="color: #000000">id</span><span style="color: #0000ff">&lt;/</span><span style="color: #800000">uniqueKey</span><span style="color: #0000ff">&gt;</span><span style="color: #000000"><br /></span><span style="color: #0000ff">&lt;</span><span style="color: #800000">defaultSearchField</span><span style="color: #0000ff">&gt;</span><span style="color: #000000">album</span><span style="color: #0000ff">&lt;/</span><span style="color: #800000">defaultSearchField</span><span style="color: #0000ff">&gt;</span></div>
<p>Definition of DIH in solrconfig.xml:</p>
<div style="border-bottom: #cccccc 1px solid; border-left: #cccccc 1px solid; padding-bottom: 4px; background-color: #eeeeee; padding-left: 4px; width: 98%; padding-right: 5px; font-size: 13px; word-break: break-all; border-top: #cccccc 1px solid; border-right: #cccccc 1px solid; padding-top: 4px"><span style="color: #0000ff">&lt;</span><span style="color: #800000">requestHandler&nbsp;</span><span style="color: #ff0000">name</span><span style="color: #0000ff">="/dataimport"</span><span style="color: #ff0000">&nbsp;class</span><span style="color: #0000ff">="org.apache.solr.handler.dataimport.DataImportHandler"</span><span style="color: #0000ff">&gt;</span><span style="color: #000000"><br />&nbsp;</span><span style="color: #0000ff">&lt;</span><span style="color: #800000">lst&nbsp;</span><span style="color: #ff0000">name</span><span style="color: #0000ff">="defaults"</span><span style="color: #0000ff">&gt;</span><span style="color: #000000"><br />&nbsp;&nbsp;</span><span style="color: #0000ff">&lt;</span><span style="color: #800000">str&nbsp;</span><span style="color: #ff0000">name</span><span style="color: #0000ff">="config"</span><span style="color: #0000ff">&gt;</span><span style="color: #000000">db-data-config.xml</span><span style="color: #0000ff">&lt;/</span><span style="color: #800000">str</span><span style="color: #0000ff">&gt;</span><span style="color: #000000"><br />&nbsp;</span><span style="color: #0000ff">&lt;/</span><span style="color: #800000">lst</span><span style="color: #0000ff">&gt;</span><span style="color: #000000"><br /></span><span style="color: #0000ff">&lt;/</span><span style="color: #800000">requestHandler</span><span style="color: #0000ff">&gt;</span></div><p>And the DIH configuration file, db-data-config.xml:</p>
<div style="border-bottom: #cccccc 1px solid; border-left: #cccccc 1px solid; padding-bottom: 4px; background-color: #eeeeee; padding-left: 4px; width: 98%; padding-right: 5px; font-size: 13px; word-break: break-all; border-top: #cccccc 1px solid; border-right: #cccccc 1px solid; padding-top: 4px"><span style="color: #0000ff">&lt;</span><span style="color: #800000">dataConfig</span><span style="color: #0000ff">&gt;</span><span style="color: #000000"><br />&nbsp;</span><span style="color: #0000ff">&lt;</span><span style="color: #800000">dataSource&nbsp;</span><span style="color: #ff0000">driver</span><span style="color: #0000ff">="org.postgresql.Driver"</span><span style="color: #ff0000">&nbsp;url</span><span style="color: #0000ff">="jdbc:postgresql://localhost:5432/shardtest"</span><span style="color: #ff0000">&nbsp;user</span><span style="color: #0000ff">="solr"</span><span style="color: #ff0000">&nbsp;password</span><span style="color: #0000ff">="secret"</span><span style="color: #ff0000">&nbsp;</span><span style="color: #0000ff">/&gt;</span><span style="color: #000000"><br />&nbsp;</span><span style="color: #0000ff">&lt;</span><span style="color: #800000">document</span><span style="color: #0000ff">&gt;</span><span style="color: #000000"><br />&nbsp;&nbsp;</span><span style="color: #0000ff">&lt;</span><span style="color: #800000">entity&nbsp;</span><span style="color: #ff0000">name</span><span style="color: #0000ff">="album"</span><span style="color: #ff0000">&nbsp;query</span><span style="color: #0000ff">="SELECT&nbsp;*&nbsp;from&nbsp;albums"</span><span style="color: #0000ff">&gt;</span><span style="color: #000000"><br />&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">&lt;</span><span style="color: #800000">field&nbsp;</span><span style="color: #ff0000">column</span><span style="color: #0000ff">="id"</span><span
style="color: #ff0000">&nbsp;name</span><span style="color: #0000ff">="id"</span><span style="color: #ff0000">&nbsp;</span><span style="color: #0000ff">/&gt;</span><span style="color: #000000"><br />&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">&lt;</span><span style="color: #800000">field&nbsp;</span><span style="color: #ff0000">column</span><span style="color: #0000ff">="name"</span><span style="color: #ff0000">&nbsp;name</span><span style="color: #0000ff">="album"</span><span style="color: #ff0000">&nbsp;</span><span style="color: #0000ff">/&gt;</span><span style="color: #000000"><br />&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">&lt;</span><span style="color: #800000">field&nbsp;</span><span style="color: #ff0000">column</span><span style="color: #0000ff">="author"</span><span style="color: #ff0000">&nbsp;name</span><span style="color: #0000ff">="author"</span><span style="color: #ff0000">&nbsp;</span><span style="color: #0000ff">/&gt;</span><span style="color: #000000"><br />&nbsp;&nbsp;</span><span style="color: #0000ff">&lt;/</span><span style="color: #800000">entity</span><span style="color: #0000ff">&gt;</span><span style="color: #000000"><br />&nbsp;</span><span style="color: #0000ff">&lt;/</span><span style="color: #800000">document</span><span style="color: #0000ff">&gt;</span><span style="color: #000000"><br /></span><span style="color: #0000ff">&lt;/</span><span style="color: #800000">dataConfig</span><span style="color: #0000ff">&gt;</span></div><br /><br />
<h2>Deleting Data</h2>
<p>Looking at the table, we can see that a removed record is deleted without leaving a trace. The only way to update our index would then be to compare the document identifiers in the index with the identifiers in the database and delete those that no longer exist in the database, which is slow and cumbersome. Another way is to add a <em>deleted_at</em> column: instead of physically deleting the record, we only write <a title="information" href="http://solr.pl/en/informations/">information</a> about the deletion into this column. DIH can then retrieve all records whose deletion date is later than the last crawl. The disadvantage of this solution is that the application may have to be modified to take this information into account.</p>
<p>I use a different solution, one that is transparent to applications. Let&#8217;s create a new table:</p>
<div id="highlighter_497115" class="syntaxhighlighter  ">
<div class="lines">
<pre>CREATE TABLE deletes
(
  id serial NOT NULL,
  deleted_id bigint,
  deleted_at timestamp without time zone NOT NULL,
  CONSTRAINT deletes_pk PRIMARY KEY (id)
);</pre></div></div>
<p>This table will automagically receive the identifier of every record removed from the <em>albums</em> table, together with the time of its removal.</p>
<p>Now we add the function:</p>
<div id="highlighter_389800" class="syntaxhighlighter  ">
<div class="lines">
<pre>CREATE OR REPLACE FUNCTION insert_after_delete()
RETURNS trigger AS
$BODY$
BEGIN
  IF tg_op = 'DELETE' THEN
    -- record which row was removed and when
    INSERT INTO deletes(deleted_id, deleted_at)
    VALUES (old.id, now());
    -- in a BEFORE trigger, returning OLD lets the DELETE go ahead
    RETURN old;
  END IF;
END
$BODY$
LANGUAGE plpgsql VOLATILE;</pre></div></div>
<p>and a trigger:</p>
<div id="highlighter_512012" class="syntaxhighlighter  ">
<div class="lines">
<pre>CREATE TRIGGER deleted_trg
BEFORE DELETE
ON albums
FOR EACH ROW
EXECUTE PROCEDURE insert_after_delete();</pre></div></div>
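<p>To experiment with this mechanism without a PostgreSQL server, here is a minimal sketch in Python using the standard-library sqlite3 module. It is an approximation under stated assumptions, not the author's setup. SQLite has no plpgsql, so the INSERT from insert_after_delete() is written inline in the trigger. AUTOINCREMENT stands in for serial, and the album rows are made-up sample data.</p>

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL. SQLite triggers cannot call a
# plpgsql function, so the INSERT from insert_after_delete() is inlined below.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE albums (id INTEGER PRIMARY KEY, name TEXT, author TEXT);

CREATE TABLE deletes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,  -- plays the role of serial
    deleted_id INTEGER,
    deleted_at TEXT NOT NULL               -- 'YYYY-MM-DD HH:MM:SS' (UTC)
);

-- Fires once per removed row, before the row disappears from albums
CREATE TRIGGER deleted_trg BEFORE DELETE ON albums
FOR EACH ROW
BEGIN
    INSERT INTO deletes (deleted_id, deleted_at)
    VALUES (OLD.id, datetime('now'));
END;
""")

# Hypothetical sample data
conn.executemany(
    "INSERT INTO albums (id, name, author) VALUES (?, ?, ?)",
    [(35, "Album A", "Author A"),
     (36, "Album B", "Author B"),
     (37, "Album C", "Author C")],
)
conn.execute("DELETE FROM albums WHERE id IN (35, 36)")
conn.commit()

remaining = [r[0] for r in conn.execute("SELECT id FROM albums")]
logged = [r[0] for r in conn.execute(
    "SELECT deleted_id FROM deletes ORDER BY deleted_id")]
print(remaining)  # [37]
print(logged)     # [35, 36]
```

<p>The application itself only issues a plain DELETE. The bookkeeping happens entirely inside the database, which is what makes this approach transparent to applications.</p>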
<h2>How it works</h2>
<p>Each record deleted from the <em>albums</em> table should now produce a new row in the <em>deletes</em> table. Let&#8217;s check it by removing a few records:</p>
<div id="highlighter_605248" class="syntaxhighlighter  ">
<div class="lines">
<pre>=&gt; DELETE FROM albums where id &lt; 37;
DELETE 2
=&gt; SELECT * from deletes;
 id | deleted_id |         deleted_at
----+------------+----------------------------
 26 |         35 | 2010-12-23 13:53:18.034612
 27 |         36 | 2010-12-23 13:53:18.034612
(2 rows)</pre></div></div>
<p>So the database part works.</p>
<p>Next, we extend the DIH configuration file so that the <em>entity</em> is defined as follows:</p>
<div id="highlighter_108857" class="syntaxhighlighter  ">
<div class="lines">
<pre>&lt;entity name="album" query="SELECT * from albums"
  deletedPkQuery="SELECT deleted_id as id FROM deletes WHERE deleted_at &gt; '${dataimporter.last_index_time}'"&gt;
  &lt;field column="id" name="id" /&gt;
  &lt;field column="name" name="album" /&gt;
  &lt;field column="author" name="author" /&gt;
&lt;/entity&gt;</pre></div></div>
<p>This lets DIH, during an incremental (delta) import, use the <em>deletedPkQuery</em> attribute to fetch the identifiers of the documents that should be removed from the index.</p>
<p>A clever reader may begin to wonder whether we really need the column holding the deletion date. We could simply remove from the index every document whose identifier appears in the <em>deletes</em> table and then empty that table. In theory this is true. However, if something goes wrong with the Solr indexing server, we can easily replace it with another one. How closely the replacement is synchronized with the database does not matter much, because the next incremental imports will bring it back in sync. If we deleted the contents of the <em>deletes</em> table, that possibility would no longer exist.</p>
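<p>The argument for keeping <em>deleted_at</em> can be checked with a small simulation, again in Python with the standard-library sqlite3 module. This is a sketch under assumptions. The helpers render_query() and ids_to_delete() are hypothetical. They fill <code>${dataimporter.last_index_time}</code> into the deletedPkQuery template by plain text substitution, using the yyyy-MM-dd HH:mm:ss format that DIH normally uses for that value; DIH's real variable resolution is more involved. The deletion log rows are sample data. Servers whose last import ran at different times each get exactly the deletions they have not applied yet.</p>

```python
import sqlite3
from datetime import datetime

# deletedPkQuery as configured in the entity above
DELETED_PK_QUERY = ("SELECT deleted_id AS id FROM deletes "
                    "WHERE deleted_at > '${dataimporter.last_index_time}'")


def render_query(template: str, last_index_time: datetime) -> str:
    """Hypothetical stand-in for DIH's variable resolution: plain text replace."""
    stamp = last_index_time.strftime("%Y-%m-%d %H:%M:%S")
    return template.replace("${dataimporter.last_index_time}", stamp)


def ids_to_delete(conn: sqlite3.Connection, last_index_time: datetime) -> list:
    """Ids that a server whose last import ran at last_index_time must remove."""
    sql = render_query(DELETED_PK_QUERY, last_index_time)
    return sorted(row[0] for row in conn.execute(sql))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE deletes (id INTEGER PRIMARY KEY, "
             "deleted_id INTEGER, deleted_at TEXT NOT NULL)")
# Hypothetical deletion log, as the trigger would have filled it
conn.executemany(
    "INSERT INTO deletes (deleted_id, deleted_at) VALUES (?, ?)",
    [(12, "2010-12-20 09:00:00"),
     (35, "2010-12-23 13:53:18"),
     (36, "2010-12-23 13:53:18")],
)

# Server that last imported just before the latest deletions
print(ids_to_delete(conn, datetime(2010, 12, 23, 13, 50)))  # [35, 36]
# Replacement server restored from an older snapshot: catches up fully
print(ids_to_delete(conn, datetime(2010, 12, 19)))          # [12, 35, 36]
# Server that already imported after the deletions: nothing to do
print(ids_to_delete(conn, datetime(2010, 12, 24)))          # []
```

<p>Because the query filters on a timestamp instead of consuming rows, the same deletion log can be replayed for any server, whatever its last import time.</p>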
<p>We can now run the incremental import by calling the following address: <em>/solr/dataimport?command=delta-import</em><br />In the logs you should see a line similar to this:<br /><em>INFO: {delete=[35, 36],optimize=} 0 2</em><br />This means that DIH has correctly removed from the index the documents that were previously deleted from the database.</p><img src ="http://www.blogjava.net/conans/aggbug/379544.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/conans/" target="_blank">CONAN</a> 2012-05-30 14:11 <a href="http://www.blogjava.net/conans/articles/379544.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Using Log4j with Solr</title><link>http://www.blogjava.net/conans/articles/379541.html</link><dc:creator>CONAN</dc:creator><author>CONAN</author><pubDate>Wed, 30 May 2012 06:01:00 GMT</pubDate><guid>http://www.blogjava.net/conans/articles/379541.html</guid><description><![CDATA[<h2>
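<p>When you unpack the Solr web application (apache-solr-3.2.0.war), you will find two jars in its WEB-INF/lib directory: slf4j-api-1.5.5.jar and slf4j-jdk14-1.5.5.jar. This shows that by default Solr logs through JDK logging, and its log output goes to Tomcat's logs. Solr reaches JDK logging through SLF4J, which is not a concrete logging system but a facade over one, so the actual logging system can easily be changed when the final application is deployed. Solr can therefore use Log4j just as well, with Log4j replacing JDK logging. The steps are as follows:<br />1. Delete slf4j-api-1.5.5.jar and slf4j-jdk14-1.5.5.jar from solr/WEB-INF/lib and add log4j-1.2.15.jar, slf4j-api-1.5.0.jar and slf4j-log4j12-1.5.0.jar (or the corresponding jars);<br />2. Create a classes directory under solr/WEB-INF/, because the default package does not contain one (everything in it is handled by JSPs);<br />3. Put your log4j.properties into solr/WEB-INF/classes, with content like the following:</p>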
<p>log4j.rootLogger=INFO<br />log4j.logger.org.apache.solr=INFO,ROLLING_FILE</p>
<p>log4j.appender.ROLLING_FILE=org.apache.log4j.RollingFileAppender<br />log4j.appender.ROLLING_FILE.Append=false<br />log4j.appender.ROLLING_FILE.File=/var/log/solr.log<br />log4j.appender.ROLLING_FILE.MaxBackupIndex=50<br />log4j.appender.ROLLING_FILE.MaxFileSize=200MB<br />log4j.appender.ROLLING_FILE.Threshold=INFO<br />log4j.appender.ROLLING_FILE.layout=org.apache.log4j.PatternLayout<br />log4j.appender.ROLLING_FILE.layout.ConversionPattern=%d{yyyy-MM-dd HH\:mm\:ss} %p [%c]\:%L Line - %m%n</p>
<p>4. Restart Tomcat.<br />PS: If Solr is deployed via JNDI, it is best to repackage the modified files above into a new war and then replace the old one with it.</p></h2> <img src ="http://www.blogjava.net/conans/aggbug/379541.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/conans/" target="_blank">CONAN</a> 2012-05-30 14:01 <a href="http://www.blogjava.net/conans/articles/379541.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item></channel></rss>