﻿<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/"><channel><title>BlogJava-梦幻之旅-随笔分类-Regular Exp</title><link>http://www.blogjava.net/hwpok/category/31113.html</link><description>DEBUG - 天道酬勤</description><language>zh-cn</language><lastBuildDate>Mon, 10 Sep 2012 21:09:33 GMT</lastBuildDate><pubDate>Mon, 10 Sep 2012 21:09:33 GMT</pubDate><ttl>60</ttl><item><title>Java Regular Expressions</title><link>http://www.blogjava.net/hwpok/archive/2012/09/10/387426.html</link><dc:creator>惠万鹏</dc:creator><author>惠万鹏</author><pubDate>Mon, 10 Sep 2012 15:15:00 GMT</pubDate><guid>http://www.blogjava.net/hwpok/archive/2012/09/10/387426.html</guid><wfw:comment>http://www.blogjava.net/hwpok/comments/387426.html</wfw:comment><comments>http://www.blogjava.net/hwpok/archive/2012/09/10/387426.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/hwpok/comments/commentRss/387426.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/hwpok/services/trackbacks/387426.html</trackback:ping><description><![CDATA[<h3>Contents</h3>Introduction to Regular Expressions<a href="http://www.blogjava.net/zhugf000/archive/2005/12/12/23414.html#man_match_mode"><br /><font color="#1d58d1">Match Modes</font></a><br /><a href="http://www.blogjava.net/zhugf000/archive/2005/12/12/23414.html#man_char_class"><font color="#1d58d1">Character Classes</font></a><br /><a href="http://www.blogjava.net/zhugf000/archive/2005/12/12/23414.html#man_line_terminator"><font color="#1d58d1">Line Terminators</font></a><br /><a href="http://www.blogjava.net/zhugf000/archive/2005/12/12/23414.html#man_group"><font color="#1d58d1">Groups and References</font></a><br /><a href="http://www.blogjava.net/zhugf000/archive/2005/12/12/23414.html#man_unicode"><font color="#1d58d1">Unicode Support</font></a><br /><br /><br /><a 
href="http://www.blogjava.net/zhugf000/archive/2005/12/12/23414.html#reference"><font color="#1d58d1">Regular Expression Syntax Reference</font></a><br />
<ol><li><a href="http://www.blogjava.net/zhugf000/archive/2005/12/12/23414.html#ref_chars"><font color="#1d58d1">Characters</font></a></li><li><a href="http://www.blogjava.net/zhugf000/archive/2005/12/12/23414.html#ref_logicopr"><font color="#1d58d1">Logical Operators</font></a></li><li><a href="http://www.blogjava.net/zhugf000/archive/2005/12/12/23414.html#ref_backref"><font color="#1d58d1">Back References</font></a></li><li><a href="http://www.blogjava.net/zhugf000/archive/2005/12/12/23414.html#ref_boundmeta"><font color="#1d58d1">Boundary Matchers</font></a></li><li><a href="http://www.blogjava.net/zhugf000/archive/2005/12/12/23414.html#ref_repeatindicator"><font color="#1d58d1">Quantifiers</font></a></li><li><a href="http://www.blogjava.net/zhugf000/archive/2005/12/12/23414.html#ref_char_class"><font color="#1d58d1">Character Classes</font></a></li><li><a href="http://www.blogjava.net/zhugf000/archive/2005/12/12/23414.html#ref_predef_meta"><font color="#1d58d1">Predefined Character Classes (Metacharacters)</font></a></li><li>Extended Classes (Metacharacters)</li><li><a href="http://www.blogjava.net/zhugf000/archive/2005/12/12/23414.html#ref_chinese_meta"><font color="#1d58d1">Extended Chinese Classes (Metacharacters)</font></a></li><li><a href="http://www.blogjava.net/zhugf000/archive/2005/12/12/23414.html#ref_posix_subset"><font color="#1d58d1">POSIX Character Classes (ASCII Only)</font></a></li><li><a href="http://www.blogjava.net/zhugf000/archive/2005/12/12/23414.html#ref_unicode_category"><font color="#1d58d1">Unicode Blocks and Categories</font></a><br /></li></ol><br />Replacement Expressions<br /><a href="http://www.blogjava.net/zhugf000/archive/2005/12/12/23414.html#man_subexp"><font color="#1d58d1">Replacement Expressions<br /></font></a>
<ol><li><a href="http://www.blogjava.net/zhugf000/archive/2005/12/12/23414.html#subexp_chars"><font color="#1d58d1">Special Characters</font></a></li><li><a href="http://www.blogjava.net/zhugf000/archive/2005/12/12/23414.html#subexp_custtbl"><font color="#1d58d1">Custom Replacement Tables</font></a><br /></li></ol><font color="#1d58d1">
<hr size="2" width="100%" />
</font>
<h3><a name="man_match_mode"></a>Match Modes</h3>A match mode determines how the regular expression engine matches input strings.<br />
<table border="1" cellspacing="2" cellpadding="2" width="100%">
<tbody>
<tr>
<td valign="top">Mode<br /></td>
<td valign="top">Enable / disable<br /></td>
<td valign="top">Enabled by default<br /></td>
<td valign="top">Description<br /></td></tr>
<tr>
<td valign="top">UNIX_LINES<br /></td>
<td valign="top">(?d) enables, (?-d) disables<br /></td>
<td valign="top">Yes<br /></td>
<td valign="top">Enables Unix lines mode.<br />In this mode only <tt>'\n'</tt> is recognized as a line terminator. This affects the behavior of <tt>.</tt>, <tt>^</tt>, and <tt>$</tt>.<br /><br /></td></tr>
<tr>
<td valign="top">CASE_INSENSITIVE<br /></td>
<td valign="top">(?i) enables, (?-i) disables<br /></td>
<td valign="top">No<br /></td>
<td valign="top">Enables case-insensitive matching.<br />By default, case-insensitive matching only affects ASCII characters. Case-insensitive matching across the Unicode range requires the UNICODE_CASE flag in combination with this flag.<br />Enabling this mode may affect matching performance.<br /><br /></td></tr>
<tr>
<td valign="top">COMMENTS<br /></td>
<td valign="top">(?x) enables, (?-x) disables<br /></td>
<td valign="top">No<br /></td>
<td valign="top">Permits whitespace and comments in the regular expression.<br />In this mode, whitespace is ignored, and single-line comments starting with # are ignored. <br /></td></tr>
<tr>
<td valign="top">MULTILINE<br /></td>
<td valign="top">(?m) enables, (?-m) disables<br /></td>
<td valign="top">Yes<br /></td>
<td valign="top">Enables multiline mode.<br />In multiline mode the expressions <tt>^</tt> and <tt>$</tt> match just after or just before, respectively, a line terminator or the end of the input sequence. By default these expressions only match at the beginning and the end of the entire input sequence. <br /><br /></td></tr>
<tr>
<td valign="top">DOTALL<br /></td>
<td valign="top">(?s) enables, (?-s) disables<br /></td>
<td valign="top">No<br /></td>
<td valign="top">Lets <tt>.</tt> match line terminators.<br />In this mode the metacharacter <tt>.</tt> matches any character, including a line terminator. By default it does not match line terminators.<br /><br /><br /></td></tr>
<tr>
<td valign="top">UNICODE_CASE<br /></td>
<td valign="top">(?u) enables, (?-u) disables<br /></td>
<td valign="top">No<br /></td>
<td valign="top">Enables Unicode-aware case folding.<br />When this flag is specified then case-insensitive matching, when enabled by the <code>CASE_INSENSITIVE</code> flag, is done in a manner consistent with the Unicode Standard. By default, case-insensitive matching assumes that only characters in the US-ASCII charset are being matched. Enabling this mode may affect performance.<br /><br /></td></tr>
<tr>
<td valign="top">CANON_EQ<br /></td>
<td valign="top">(?c) enables, (?-c) disables<br /></td>
<td valign="top">No<br /></td>
<td valign="top">Enables canonical equivalence.<br />When this flag is specified then two characters will be considered to match if, and only if, their full canonical decompositions match. The expression <tt>"a\u030A"</tt>, for example, will match the string <tt>"\u00E5"</tt> when this flag is specified. By default, matching does not take canonical equivalence into account. <br />Enabling this mode may affect performance.<br /><br /></td></tr></tbody></table><br /><br />
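The modes in the table above can be tried out with a minimal sketch like the one below. It assumes the standard java.util.regex package, where every flag is off unless requested, so the defaults listed in the table (which describe this engine) may differ. The class name MatchModeDemo and its helper methods are invented for illustration.<br />

```java
import java.util.regex.Pattern;

class MatchModeDemo {
    // Flag passed to Pattern.compile; the embedded form (?i) has the same effect.
    static boolean ignoreCase(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input).matches();
    }

    // DOTALL lets '.' match line terminators as well.
    static boolean dotAll(String regex, String input) {
        return Pattern.compile(regex, Pattern.DOTALL).matcher(input).matches();
    }

    // MULTILINE lets ^ and $ match at line boundaries inside the input.
    static boolean multilineFind(String regex, String input) {
        return Pattern.compile(regex, Pattern.MULTILINE).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(ignoreCase("java", "JAVA"));
        System.out.println(Pattern.matches("(?i)java", "JAVA"));
        System.out.println(Pattern.matches("a.b", "a\nb"));
        System.out.println(dotAll("a.b", "a\nb"));
        System.out.println(multilineFind("^second$", "first\nsecond\nthird"));
    }
}
```

Flags can be combined with the bitwise OR operator, for example Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE.<br /><br />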
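<h3><a name="man_char_class"></a>Character Classes</h3>Character classes may contain other character classes and may be combined with the union operator (implicit) and the intersection operator (&amp;&amp;). A union matches any character matched by at least one of its operand classes. An intersection matches only the characters matched by all of its operand classes.<br />The operators available in character classes have the following precedence, from highest to lowest:<br />
<ol><li>Literal escape \x</li><li>Grouping [...]</li><li>Range a-z</li><li>Union [a-e][i-u]</li><li>Intersection [a-z&amp;&amp;[aeiou]]</li></ol>Note: the syntax inside a character class [] differs fundamentally from the syntax in the rest of a regular expression. For example, inside a character class the expression . loses its special meaning and simply matches a literal period. <br /><br />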
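As a rough illustration of union, intersection, and subtraction, the sketch below tests single characters against a few classes. It assumes the standard java.util.regex package; the class CharClassDemo is a made-up name.<br />

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // True when the single-character string ch is matched by the given character class.
    static boolean matches(String charClass, String ch) {
        return Pattern.matches(charClass, ch);
    }

    public static void main(String[] args) {
        System.out.println(matches("[a-d[m-p]]", "n"));      // union
        System.out.println(matches("[a-z&&[aeiou]]", "e"));  // intersection: vowel
        System.out.println(matches("[a-z&&[aeiou]]", "b"));  // intersection: not a vowel
        System.out.println(matches("[a-z&&[^bc]]", "b"));    // subtraction
        System.out.println(matches("[.]", "x"));             // '.' is literal inside []
        System.out.println(matches("[.]", "."));
    }
}
```
<br />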
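<h3><a name="man_line_terminator"></a>Line Terminators</h3>A line terminator is a one- or two-character sequence that marks the end of a line in the input character sequence. The following are recognized as line terminators:<br />
<ul><li>A newline (line feed) character ('\n').</li><li>A carriage return followed immediately by a newline ("\r\n").</li><li>A standalone carriage return ('\r').</li><li>A next-line character ('\u0085').</li><li>A line-separator character ('\u2028'), defined by Unicode.</li><li>A paragraph-separator character ('\u2029'), defined by Unicode.</li></ul>If UNIX_LINES mode is enabled, only the newline character is recognized as a line terminator.<br />If MULTILINE mode is enabled, ^ and $ match just after and just before a line terminator, respectively, in addition to the beginning and end of the input.<br /><br />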
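The effect of UNIX_LINES on line terminators can be observed with a small sketch such as the following, which counts where ^ matches. It assumes the standard java.util.regex package with flags passed explicitly; LineTerminatorDemo and countLineStarts are hypothetical names.<br />

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineTerminatorDemo {
    // Counts the positions at which ^ matches under the given flags.
    static int countLineStarts(String input, int flags) {
        Matcher m = Pattern.compile("^", flags).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "a\rb\nc";
        // Carriage return and newline both end a line here.
        System.out.println(countLineStarts(text, Pattern.MULTILINE));
        // With UNIX_LINES only the newline ends a line.
        System.out.println(countLineStarts(text, Pattern.MULTILINE | Pattern.UNIX_LINES));
        // '.' skips a carriage return unless UNIX_LINES is set.
        System.out.println(Pattern.matches(".", "\r"));
        System.out.println(Pattern.compile(".", Pattern.UNIX_LINES).matcher("\r").matches());
    }
}
```
<br />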
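<h3><a name="man_group"></a>Groups and References</h3>Capturing groups are numbered by counting their opening parentheses from left to right. In the expression ((A)(B(C))), for example, there are four groups:<br />
<ol><li>((A)(B(C)))</li><li>(A)</li><li>(B(C))</li><li>(C)</li></ol>Group 0 always stands for the entire expression.<br />Capturing groups are so named because, during a match, each subsequence of the input that matches a group is saved. A captured subsequence may be used later in the expression through a back reference, or retrieved by the program once the match is complete.<br />The captured input associated with a group is always the subsequence that the group most recently matched. If a later attempt to match the group fails, the previously captured subsequence is retained. For example, matching the input "aba" against the expression (a(b)?)+ leaves group 2 set to "b".<br />Groups beginning with (? are not counted toward the group total and cannot be referenced later in the match.<br /><br />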
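The group numbering and the "last match wins" rule described above can be checked with a short sketch like this one. It assumes the standard java.util.regex package; GroupDemo and groups are illustrative names only.<br />

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Returns groups 0 through groupCount() after a full match, or null when there is no match.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return null;
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i != result.length; i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        // Groups are numbered by their opening parentheses.
        System.out.println(String.join(", ", groups("((A)(B(C)))", "ABC")));
        // Group 2 keeps the value from the last iteration in which it matched.
        System.out.println(groups("(a(b)?)+", "aba")[2]);
        // A (?: ...) group is not counted.
        System.out.println(groups("(?:a)(b)", "ab").length - 1);
    }
}
```
<br />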
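<h3><a name="man_unicode"></a>Unicode Support</h3>This regular expression engine follows the <a href="http://www.unicode.org/unicode/reports/tr18/"><font color="#1d58d1">Unicode Technical Report: Unicode Regular Expression Guidelines</font></a> and implements the features required by its second level of support, with some minor changes to the concrete syntax.<br />Unicode blocks and categories are written with the \p and \P constructs. \p{<strong><em>prop</em></strong>} matches input that has the property <strong><em>prop</em></strong>, while \P{<strong><em>prop</em></strong>} matches input that does not have the property <strong><em>prop</em></strong>. Blocks are specified with the prefix <tt>In</tt>, as in \p{InMongolian}. Categories may be specified with the optional prefix Is, so both \p{L} and \p{IsL} denote the category of Unicode letters. Blocks and categories can be used both inside and outside a character class.<br />The supported blocks and categories are those of <a href="http://www.unicode.org/unicode/standard/standard.html"><font color="#1d58d1">The Unicode Standard, Version 3.0</font></a>. The block names are those defined in Chapter 14 and in the file <a href="http://www.unicode.org/Public/3.0-Update/Blocks-3.txt"><font color="#1d58d1">Blocks-3.txt</font></a> of the <a href="http://www.unicode.org/Public/3.0-Update/UnicodeCharacterDatabase-3.0.0.html"><font color="#1d58d1">Unicode Character Database</font></a>, except that the spaces are removed. "Basic Latin", for example, becomes "BasicLatin". The category names, both normative and informative, are those defined in Table 4-5 on page 88 of the Standard.<br /><br />
<h3><a name="man_compare_perl5"></a>Comparison with Perl 5 Regular Expression Syntax </h3>[TBD]<br /><br />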
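A minimal sketch of \p and \P follows. It assumes the standard java.util.regex package, whose supported Unicode version depends on the Java release; UnicodePropertyDemo and its two helpers are hypothetical names.<br />

```java
import java.util.regex.Pattern;

class UnicodePropertyDemo {
    // True when the single-character string ch has the given block or category.
    static boolean is(String property, String ch) {
        return Pattern.matches("\\p{" + property + "}", ch);
    }

    // True when the single-character string ch does NOT have the given block or category.
    static boolean isNot(String property, String ch) {
        return Pattern.matches("\\P{" + property + "}", ch);
    }

    public static void main(String[] args) {
        String alpha = "\u03B1";          // Greek small letter alpha
        String mongolianA = "\u1820";     // Mongolian letter A
        System.out.println(is("L", alpha));              // category: letter
        System.out.println(is("IsL", "z"));              // same category, optional Is prefix
        System.out.println(is("InGreek", alpha));        // block, In prefix
        System.out.println(isNot("InGreek", "a"));       // negated block
        System.out.println(is("InMongolian", mongolianA));
        System.out.println(is("Lu", "a"));               // uppercase letter category
    }
}
```
<br />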
<hr size="2" width="100%" />

<h2><a name="reference"></a>Regular Expression Reference</h2><br />
<hr size="2" width="100%" />

<h3><a name="ref_chars"></a>Characters</h3>
<table border="1" cellspacing="2" cellpadding="2" width="100%">
<tbody>
<tr>
<td valign="top" width="30%">Construct<br /></td>
<td valign="top">Matches<br /></td></tr>
<tr>
<td valign="top" width="30%">X<br /></td>
<td valign="top">The character X, including CJK ExtB ideographs<br /></td></tr>
<tr>
<td valign="top" width="30%">\\<br /></td>
<td valign="top">The backslash character \<br /></td></tr>
<tr>
<td valign="top" width="30%">\0<strong><em>n</em></strong></td>
<td valign="top">The character with octal value 0<strong><em>n</em></strong> (0&lt;=n&lt;=7)<br /></td></tr>
<tr>
<td valign="top" width="30%">\0<strong><em>nn</em></strong><br /></td>
<td valign="top">The character with octal value 0<strong><em>nn</em></strong> (0&lt;=n&lt;=7)<br /></td></tr>
<tr>
<td valign="top" width="30%">\0<strong><em>mnn</em></strong><br /></td>
<td valign="top">The character with octal value 0<strong><em>mnn</em></strong> (0&lt;=m&lt;=3, 0&lt;=n&lt;=7)<br /></td></tr>
<tr>
<td valign="top" width="30%">\x<strong><em>hh</em></strong><br /></td>
<td valign="top">The character with hexadecimal value 0x<strong><em>hh</em></strong><br /></td></tr>
<tr>
<td valign="top" width="30%">\u<strong><em>hhhh</em></strong><br /></td>
<td valign="top">The character with hexadecimal value 0x<strong><em>hhhh</em></strong>. <font color="#ff0000">Note</font>: CJK ExtB ideographs are not yet supported in this form.<br /></td></tr>
<tr>
<td valign="top" width="30%">\t<br /></td>
<td valign="top">The tab character ('\u0009')<br /></td></tr>
<tr>
<td valign="top">\n<br /></td>
<td valign="top">The newline (line feed) character ('\u000A')<br /></td></tr>
<tr>
<td valign="top">\r<br /></td>
<td valign="top">The carriage-return character ('\u000D')<br /></td></tr>
<tr>
<td valign="top">\a<br /></td>
<td valign="top">The alert (bell) character ('\u0007')<br /></td></tr>
<tr>
<td valign="top">\e<br /></td>
<td valign="top">The escape character ('\u001B')<br /></td></tr>
<tr>
<td valign="top">\c<em><strong>x</strong></em><br /></td>
<td valign="top">The control character corresponding to <em><strong>x</strong></em><br /></td></tr></tbody></table><br />
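The escape forms in the table can be exercised with a sketch along these lines. It assumes the standard java.util.regex package; EscapeDemo is an invented class name. Each backslash is doubled because the patterns are written as Java string literals.<br />

```java
import java.util.regex.Pattern;

class EscapeDemo {
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("\\x41", "A"));     // two-digit hexadecimal escape
        System.out.println(matches("\\u0041", "A"));   // four-digit hexadecimal escape
        System.out.println(matches("\\0101", "A"));    // octal 101 is decimal 65
        System.out.println(matches("\\t", "\t"));      // tab
        System.out.println(matches("\\cA", String.valueOf((char) 1)));    // control-A
        System.out.println(matches("\\e", String.valueOf((char) 0x1B)));  // escape
        System.out.println(matches("\\\\", "\\"));     // a single backslash
    }
}
```
<br />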
<h3><a name="ref_logicopr"></a>Logical Operators</h3>
<table border="1" cellspacing="2" cellpadding="2" width="100%">
<tbody>
<tr>
<td valign="top" width="30%">Construct<br /></td>
<td valign="top">Matches<br /></td></tr>
<tr>
<td valign="top" width="30%"><em>XY</em><br /></td>
<td valign="top"><em>X</em> followed by <em>Y</em><br /></td></tr>
<tr>
<td valign="top" width="30%"><em>X</em>|<em>Y</em><br /></td>
<td valign="top">Either <em>X</em> or <em>Y</em><br /></td></tr>
<tr>
<td valign="top" width="30%">(<em>X</em>)<br /></td>
<td valign="top"><em>X</em>, as a capturing group<br /></td></tr></tbody></table><br />
<h3><a name="ref_backref"></a>Back References</h3>
<table border="1" cellspacing="2" cellpadding="2" width="100%">
<tbody>
<tr>
<td valign="top" width="30%">Construct<br /></td>
<td valign="top">Matches<br /></td></tr>
<tr>
<td valign="top" width="30%">\<strong>n</strong><br /></td>
<td valign="top">Whatever the <strong>n</strong>th capturing group matched<br /></td></tr></tbody></table><br />
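A common use of a back reference is spotting a word typed twice in a row. The sketch below assumes the standard java.util.regex package; BackReferenceDemo and firstRepeatedWord are hypothetical names.<br />

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackReferenceDemo {
    // Returns the first immediately repeated word (such as "the the"), or null if there is none.
    static String firstRepeatedWord(String text) {
        // \1 must match exactly the text captured by group 1.
        Matcher m = Pattern.compile("\\b(\\w+) \\1\\b").matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstRepeatedWord("Paris in the the spring"));
        System.out.println(firstRepeatedWord("no repeats here"));
    }
}
```
<br />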
<h3><a name="ref_boundmeta"></a>Boundary Matchers</h3>
<table border="1" cellspacing="2" cellpadding="2" width="100%">
<tbody>
<tr>
<td valign="top" width="30%">Boundary construct<br /></td>
<td valign="top">Matches<br /></td></tr>
<tr>
<td valign="top" width="30%">^<br /></td>
<td valign="top">The beginning of a line<br /></td></tr>
<tr>
<td valign="top" width="30%">$<br /></td>
<td valign="top">The end of a line<br /></td></tr>
<tr>
<td valign="top" width="30%">\b </td>
<td valign="top">A word boundary<br /></td></tr>
<tr>
<td valign="top" width="30%">\B<br /></td>
<td valign="top">A non-word boundary<br /></td></tr>
<tr>
<td valign="top" width="30%">\A<br /></td>
<td valign="top">The beginning of the input<br /></td></tr>
<tr>
<td valign="top" width="30%">\G<br /></td>
<td valign="top">The end of the previous match<br /></td></tr>
<tr>
<td valign="top" width="30%">\Z<br /></td>
<td valign="top">The end of the input, or the position just before a final line terminator; see <a href="http://www.blogjava.net/zhugf000/archive/2005/12/12/23414.html#man_line_terminator"><font color="#1d58d1">Line Terminators</font></a>.<br /></td></tr>
<tr>
<td valign="top" width="30%">\z<br /></td>
<td valign="top">The end of the input<br /></td></tr></tbody></table><br />
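The difference between several boundary matchers is easiest to see side by side. The following sketch assumes the standard java.util.regex package; BoundaryDemo and its methods are made-up names.<br />

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Index of the first whole-word occurrence of word in text, or -1 if there is none.
    static int indexOfWord(String word, String text) {
        Matcher m = Pattern.compile("\\b" + Pattern.quote(word) + "\\b").matcher(text);
        return m.find() ? m.start() : -1;
    }

    // Counts digits at the start of text; \G forces each match to begin where the last one ended.
    static int countLeadingDigits(String text) {
        Matcher m = Pattern.compile("\\G\\d").matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // True when regex is found somewhere in text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(indexOfWord("cat", "concat cat"));
        System.out.println(countLeadingDigits("123a45"));
        System.out.println(found("end\\Z", "the end\n"));  // \Z tolerates a final line terminator
        System.out.println(found("end\\z", "the end\n"));  // \z does not
    }
}
```
<br />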
<h3><a name="ref_repeatindicator"></a>Quantifiers</h3>
<table border="1" cellspacing="2" cellpadding="2" width="100%">
<tbody>
<tr>
<td valign="top" width="30%">Construct<br /></td>
<td valign="top">Matches<br /></td></tr>
<tr>
<td valign="top" width="30%"><strong>X</strong>?<br /></td>
<td valign="top"><strong>X</strong>, once or not at all<br /></td></tr>
<tr>
<td valign="top" width="30%"><strong>X</strong>*<br /></td>
<td valign="top"><strong>X</strong>, zero or more times<br /></td></tr>
<tr>
<td valign="top" width="30%"><strong>X</strong>+ <br /></td>
<td valign="top"><strong>X</strong>, one or more times<br /></td></tr>
<tr>
<td valign="top" width="30%"><strong>X</strong>{n}<br /></td>
<td valign="top"><strong>X</strong>, exactly n times<br /></td></tr>
<tr>
<td valign="top" width="30%"><strong>X</strong>{n,}<br /></td>
<td valign="top"><strong>X</strong>, at least n times<br /></td></tr>
<tr>
<td valign="top" width="30%"><strong>X</strong>{n,m}<br /></td>
<td valign="top"><strong>X</strong>, at least n but not more than m times<br /></td></tr></tbody></table>Note: the <strong>X</strong>{n,m}, ?, *, and + forms can be used in combination.<br /><br />
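Each quantifier in the table can be checked against a few inputs, as in this sketch. It assumes the standard java.util.regex package; QuantifierDemo is an illustrative class name.<br />

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("colou?r", "color"));   // ? : once or not at all
        System.out.println(matches("a*", ""));             // * : zero or more
        System.out.println(matches("a+", ""));             // + : needs at least one
        System.out.println(matches("(ab){2}", "abab"));    // {n} : exactly n times
        System.out.println(matches("a{2,}", "a"));         // {n,} : at least n times
        System.out.println(matches("a{2,3}", "aaaa"));     // {n,m} : no more than m times
    }
}
```
<br />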
<h3><a name="ref_char_class"></a>Character Classes</h3>
<table border="1" cellspacing="2" cellpadding="2" width="100%">
<tbody>
<tr>
<td valign="top" width="30%">Character class<br /></td>
<td valign="top" width="50%">Matches<br /></td>
<td valign="top" width="20%" align="left">Kind<br /></td></tr>
<tr>
<td valign="top" width="30%">[abc]<br /></td>
<td valign="top" width="60%">a, b, or c, including CJK ExtB ideographs<br /></td>
<td valign="top" width="20%" align="left">Simple class<br /></td></tr>
<tr>
<td valign="top" width="30%">[^abc]<br /></td>
<td valign="top" width="60%">Any character except a, b, or c.<br /></td>
<td valign="top" width="20%" align="left">Negation<br /></td></tr>
<tr>
<td valign="top" width="30%">[a-zA-Z] </td>
<td valign="top" width="60%">a through z, or A through Z, inclusive.<br /></td>
<td valign="top" width="20%" align="left">Range<br /></td></tr>
<tr>
<td valign="top" width="30%">[a-d[m-p]]<br /></td>
<td valign="top" width="60%">a through d, or m through p; same as [a-dm-p].<br /></td>
<td valign="top" width="20%" align="left">Union<br /></td></tr>
<tr>
<td valign="top" width="30%">[a-z&amp;&amp;[def]]<br /></td>
<td valign="top" width="60%">d, e, or f.<br /></td>
<td valign="top" width="20%" align="left">Intersection<br /></td></tr>
<tr>
<td valign="top">[a-z&amp;&amp;[^bc]]<br /></td>
<td valign="top" width="60%">a through z, except for b and c; same as [ad-z]<br /></td>
<td valign="top" width="20%" align="left">Subtraction<br /></td></tr>
<tr>
<td valign="top">[a-z&amp;&amp;[^m-p]]<br /></td>
<td valign="top">a through z, but not m through p; same as [a-lq-z]<br /></td>
<td valign="top">Subtraction<br /></td></tr></tbody></table><br />
<h3><a name="ref_predef_meta"></a>Predefined Character Classes (Metacharacters)</h3>
<table border="1" cellspacing="2" cellpadding="2" width="100%">
<tbody>
<tr>
<td valign="top" width="30%">Construct<br /></td>
<td valign="top">Matches<br /></td></tr>
<tr>
<td valign="top" width="30%">.<br /></td>
<td valign="top">Any character; whether it matches line terminators depends on the DOTALL mode.<br /></td></tr>
<tr>
<td valign="top" width="30%">\d<br /></td>
<td valign="top">A digit [0-9]<br /></td></tr>
<tr>
<td valign="top" width="30%">\D </td>
<td valign="top">A non-digit [^0-9]<br /></td></tr>
<tr>
<td valign="top" width="30%">\s<br /></td>
<td valign="top">A whitespace character [ \t\n\x0B\f\r]<br /></td></tr>
<tr>
<td valign="top" width="30%">\S<br /></td>
<td valign="top">A non-whitespace character [^\s]<br /></td></tr>
<tr>
<td valign="top" width="30%">\w<br /></td>
<td valign="top">A word character: a letter, a digit, or an underscore [a-zA-Z_0-9]<br /></td></tr>
<tr>
<td valign="top" width="30%">\W<br /></td>
<td valign="top">A non-word character [^\w]<br /></td></tr></tbody></table><br />
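The predefined classes are often used through the String convenience methods, as in the sketch below. It assumes the standard Java String API (split and replaceAll take regular expressions); PredefinedClassDemo and its methods are hypothetical names.<br />

```java
class PredefinedClassDemo {
    // Splits text on runs of whitespace (\s+), ignoring leading and trailing whitespace.
    static String[] words(String text) {
        return text.trim().split("\\s+");
    }

    // Removes every non-digit (\D).
    static String digitsOnly(String text) {
        return text.replaceAll("\\D", "");
    }

    // Removes every non-word character (\W); letters, digits, and underscores remain.
    static String wordCharsOnly(String text) {
        return text.replaceAll("\\W", "");
    }

    public static void main(String[] args) {
        System.out.println(String.join("|", words("  a  b\tc\n")));
        System.out.println(digitsOnly("+86 (10) 1234-5678"));
        System.out.println(wordCharsOnly("a-b_c!"));
    }
}
```
<br />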
<h3>Extended Classes (Metacharacters)</h3>
<table border="1" cellspacing="2" cellpadding="2" width="100%">
<tbody>
<tr>
<td valign="top" width="30%">Construct<br /></td>
<td valign="top">Matches<br /></td></tr>
<tr>
<td valign="top" width="30%"><br /></td>
<td valign="top"><br /></td></tr></tbody></table><br /><br />
<h3><a name="ref_chinese_meta"></a>Extended Chinese Classes (Metacharacters)</h3>
<table border="1" cellspacing="2" cellpadding="2" width="100%">
<tbody>
<tr>
<td valign="top">Name<br /></td>
<td valign="top">Block name (\p{InXXX})<br /></td>
<td valign="top">Shorthand<br /></td>
<td valign="top">Standard Unicode block, or list of characters<br /></td></tr>
<tr>
<td valign="top">Any double-byte character (Chinese character or full-width symbol)<br /></td>
<td valign="top">\p{InQuanJiao}<br /></td>
<td valign="top">\E<br /></td>
<td valign="top">Any Chinese character representable in GBK, excluding the GB18030 extension<br />and CJK ExtB ideographs.<br /></td></tr>
<tr>
<td valign="top">Any single-byte character<br /></td>
<td valign="top">\p{InFQuanJiao}<br /></td>
<td valign="top">\~E<br /></td>
<td valign="top">Any single-byte character.<br /></td></tr>
<tr>
<td valign="top">Any full-width ASCII character<br /></td>
<td valign="top">\p{InQJAscii}<br /></td>
<td valign="top">\H<br /></td>
<td valign="top">The standard HalfwidthandFullwidthForms block<br /></td></tr>
<tr>
<td valign="top">Any double-byte character included in the Big5 character set<br /></td>
<td valign="top">\p{InBig5}<br /></td>
<td valign="top">\I<br /></td>
<td valign="top">Double-byte characters encodable in Big5<br /></td></tr>
<tr>
<td valign="top">Any double-byte character not included in the Big5 character set</td>
<td valign="top">\p{InFBig5}<br /></td>
<td valign="top">\~I<br /></td>
<td valign="top">Double-byte characters not encodable in Big5<br /></td></tr>
<tr>
<td valign="top">Any Chinese character (symbols excluded)<br /></td>
<td valign="top">\p{InHanziOrCJKExtB}<br /></td>
<td valign="top">\X<br /></td>
<td valign="top">Any Chinese character, including the GB18030 extension.<br /></td></tr>
<tr>
<td valign="top">Any Chinese character (symbols excluded)<br /></td>
<td valign="top">\p{InHanzi}<br /></td>
<td valign="top">\M<br /></td>
<td valign="top">Any Chinese character, excluding the GB18030 extension.<br /></td></tr>
<tr>
<td valign="top">Any double-byte character that is not a Chinese character<br /></td>
<td valign="top">\p{InFHanzi}<br /></td>
<td valign="top">\~M<br /></td>
<td valign="top">Any double-byte character that is not a Chinese character,<br />including the GB18030 extension.<br /></td></tr>
<tr>
<td valign="top">Earthly Branches characters<br /></td>
<td valign="top">\p{InDiZhi}<br /></td>
<td valign="top">\U<br /></td>
<td valign="top">子丑寅卯辰巳午未申酉戌亥<br /></td></tr>
<tr>
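<td valign="top">Any double-byte character included in the GB character set<br /></td>
<td valign="top">\p{InGB}<br /></td>
<td valign="top">\g<br /></td>
<td valign="top">Double-byte characters included in the GB character set,<br />excluding the GB18030 extension.<br /></td></tr>
<tr>
<td valign="top">Any double-byte character not included in the GB character set<br /></td>
<td valign="top">\p{InFGB}<br /></td>
<td valign="top">\~g<br /></td>
<td valign="top">Double-byte characters not included in the GB character set,<br />excluding the GB18030 extension.<br /></td></tr>
<tr>
<td valign="top">Any double-byte character included in the GBK character set<br /></td>
<td valign="top">\p{InGBK}<br /></td>
<td valign="top">\h<br /></td>
<td valign="top">Double-byte characters included in the GBK character set,<br />excluding the GB18030 extension.<br /></td></tr>
<tr>
<td valign="top">Any double-byte character not included in the GBK character set<br /></td>
<td valign="top">\p{InFGBK}<br /></td>
<td valign="top">\~h<br /></td>
<td valign="top">Double-byte characters not included in the GBK character set,<br />excluding the GB18030 extension.<br /></td></tr>
<tr>
<td valign="top">Uppercase Greek letters<br /></td>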
<td valign="top">\p{InDaXila}<br /></td>
<td valign="top">\K<br /></td>
<td valign="top">&#913;&#914;&#915;&#916;&#917;&#918;&#919;&#920;&#921;&#922;&#923;&#924;&#925;<br />&#926;&#927;&#928;&#929;&#931;&#932;&#933;&#934;&#935;&#936;&#937;<br /></td></tr>
<tr>
<td valign="top">Japanese katakana<br /></td>
<td valign="top">\p{InPianJia}<br /></td>
<td valign="top">\j<br /></td>
<td valign="top">The standard Katakana block<br /></td></tr>
<tr>
<td valign="top">Japanese hiragana<br /></td>
<td valign="top">\p{InPingJia}<br /></td>
<td valign="top">\J<br /></td>
<td valign="top">The standard Hiragana block<br /></td></tr>
<tr>
<td valign="top">Lowercase Greek letters<br /></td>
<td valign="top">\p{InXiaoXila}<br /></td>
<td valign="top">\k<br /></td>
<td valign="top">&#945;&#946;&#947;&#948;&#949;&#950;&#951;&#952;&#953;&#954;&#955;&#956;&#957;<br />&#958;&#959;&#960;&#961;&#963;&#964;&#965;&#966;&#967;&#968;&#969;<br /></td></tr>
<tr>
<td valign="top">Mathematical symbols<br /></td>
<td valign="top">\p{InMathe}<br /></td>
<td valign="top">\m<br /></td>
<td valign="top">&#177;&#215;&#247;&#8758;&#8743;&#8744;&#8721;&#8719;&#8746;&#8745;&#8712;&#8759;&#8730;&#8869;&#8741;&#8736;&#8978;&#8857;<br />&#8747;&#8750;&#8801;&#8780;&#8776;&#8765;&#8733;&#8800;&#8814;&#8815;&#8804;&#8805;&#8734;&#8757;&#8756;<br /></td></tr>
<tr>
<td valign="top">Chinese numerals<br /></td>
<td valign="top">\p{InCnDigit}<br /></td>
<td valign="top">\i<br /></td>
<td valign="top">〇一二三四五六七八九十百千万亿兆吉京<br /></td></tr>
<tr>
<td valign="top">Formal (uppercase) Chinese numerals<br /></td>
<td valign="top">\p{InDaCnDigit}<br /></td>
<td valign="top">\N<br /></td>
<td valign="top">零壹贰叁肆伍陆柒捌玖拾佰仟萬亿兆吉京<br /></td></tr>
<tr>
<td valign="top">Full-width punctuation<br /></td>
<td valign="top">\p{InQJBiaoDian}<br /></td>
<td valign="top">\o<br /></td>
<td valign="top">、。&#183;ˉˇ&#168;〃々&#8212;～&#8214;&#8230;&#8216;&#8217;&#8220;&#8221;〔〕<br />〈〉《》「」『』〖〗【】！＂＇（），<br />－．：；＜＝＞？［］｛｜｝｀﹉﹊﹋﹌﹍﹎﹏﹐﹑﹒﹔﹕﹖﹗﹙﹚<br />﹛﹜﹝﹞︵︶︹︺︿﹀︽︾﹁﹂﹃﹄<br />︻︼︷︸︱︳︴<br /></td></tr>
<tr>
<td valign="top">Lowercase Russian letters<br /></td>
<td valign="top">\p{InXiaoEWen}<br /></td>
<td valign="top">\l<br /></td>
<td valign="top">абвгдеёжзийклмн<br />опрстуфхцчшщъыьэюя<br /></td></tr>
<tr>
<td valign="top">Uppercase Russian letters<br /></td>
<td valign="top">\p{InDaEWen}<br /></td>
<td valign="top">\R<br /></td>
<td valign="top">АБВГДЕЁЖЗИЙКЛМНО<br />ПРСТУФХЦЧШЩЪЫЬЭЮЯ<br /></td></tr>
<tr>
<td valign="top">Sequence-number characters<br /></td>
<td valign="top">\p{InCnSN}<br /></td>
<td valign="top">\q<br /></td>
<td valign="top">&#8544;&#8545;&#8546;&#8547;&#8548;&#8549;&#8550;&#8551;&#8552;&#8553;&#8554;&#8555;<br />&#8560;&#8561;&#8562;&#8563;&#8564;&#8565;&#8566;&#8567;&#8568;&#8569;<br />plus the standard Unicode EnclosedAlphanumerics block<br /></td></tr>
<tr>
<td valign="top">Heavenly Stems characters<br /></td>
<td valign="top">\p{InTianGan}<br /></td>
<td valign="top">\T<br /></td>
<td valign="top">甲乙丙丁戊己庚辛壬癸<br /></td></tr>
<tr>
<td valign="top">Vertical-layout punctuation<br /></td>
<td valign="top">\p{InSPBiaoDian}<br /></td>
<td valign="top">\V<br /></td>
<td valign="top">︵︶︹︺︿﹀︽︾﹁﹂﹃﹄︻︼︷︸︱︳︴<br /></td></tr>
<tr>
<td valign="top">Pinyin characters<br /></td>
<td valign="top">\p{InPinyin}<br /></td>
<td valign="top">\y<br /></td>
<td valign="top">ā&#225;ǎ&#224;ē&#233;ě&#232;ī&#237;ǐ&#236;ō&#243;ǒ&#242;ū&#250;ǔ&#249;ǖǘǚǜ&#252;&#234;ɑńňɡ<br />GBK -&gt; 0xA8A1-0xA8C0<br />Only a part of the standard Unicode LatinExtended-A block.<br /></td></tr>
<tr>
<td valign="top">Zhuyin (Bopomofo) characters<br /></td>
<td valign="top">\p{InZhuyin}<br /></td>
<td valign="top">\Y<br /></td>
<td valign="top">The standard Bopomofo block<br /></td></tr>
<tr>
<td valign="top">Box-drawing characters<br /></td>
<td valign="top">\p{InZhiBiao}<br /></td>
<td valign="top">\C<br /></td>
<td valign="top">The standard BoxDrawing block.<br />On inspection, the textpro algorithm also includes some characters that are not<br />standard Unicode box-drawing characters: &#8220;&#8735;&#8739;&#8786;&#8806;&#8807;&#8895;&#9552;&#8221;.<br /></td></tr></tbody></table><br />
<h3><a name="ref_posix_subset"></a>POSIX Character Classes (ASCII Only)</h3>
<table border="1" cellspacing="2" cellpadding="2" width="100%">
<tbody>
<tr>
<td valign="top" width="30%">Construct<br /></td>
<td valign="top">Matches<br /></td></tr>
<tr>
<td valign="top" width="30%">\p{Lower}<br /></td>
<td valign="top">A lowercase letter [a-z]<br /></td></tr>
<tr>
<td valign="top" width="30%">\p{Upper}<br /></td>
<td valign="top">An uppercase letter [A-Z]<br /></td></tr>
<tr>
<td valign="top" width="30%">\p{ASCII}<br /></td>
<td valign="top">Any ASCII character [\x00-\x7F]<br /></td></tr>
<tr>
<td valign="top" width="30%">\p{Alpha}<br /></td>
<td valign="top">A letter of either case [\p{Lower}\p{Upper}]<br /></td></tr>
<tr>
<td valign="top" width="30%">\p{Digit}<br /></td>
<td valign="top">A decimal digit [0-9]<br /></td></tr>
<tr>
<td valign="top" width="30%">\p{Alnum}<br /></td>
<td valign="top">An alphanumeric character: a letter of either case or a digit [\p{Alpha}\p{Digit}]<br /></td></tr>
<tr>
<td valign="top" width="30%">\p{Punct}<br /></td>
<td valign="top">Punctuation: one of !"#$%&amp;'()*+,-./:;&lt;=&gt;?@[\]^_`{|}~<br /></td></tr>
<tr>
<td valign="top" width="30%">\p{Graph}<br /></td>
<td valign="top">A visible character [\p{Alnum}\p{Punct}]<br /></td></tr>
<tr>
<td valign="top">\p{Print}<br /></td>
<td valign="top">A printable character [\p{Graph}]<br /></td></tr>
<tr>
<td valign="top">\p{Blank}<br /></td>
<td valign="top">A space or a tab [ \t]<br /></td></tr>
<tr>
<td valign="top">\p{Cntrl}<br /></td>
<td valign="top">A control character [\x00-\x1F\x7F]<br /></td></tr>
<tr>
<td valign="top">\p{XDigit}<br /></td>
<td valign="top">A hexadecimal digit [0-9a-fA-F]<br /></td></tr>
<tr>
<td valign="top">\p{Space}<br /></td>
<td valign="top">A whitespace character [ \t\n\x0B\f\r]<br /></td></tr></tbody></table><br />
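The ASCII-only nature of the POSIX classes shows up as soon as a non-ASCII letter is tested. This sketch assumes the standard java.util.regex package without the UNICODE_CHARACTER_CLASS flag; PosixClassDemo is an invented name.<br />

```java
import java.util.regex.Pattern;

class PosixClassDemo {
    // True when every character of input belongs to the named POSIX class.
    static boolean allIn(String posixClass, String input) {
        return Pattern.matches("\\p{" + posixClass + "}+", input);
    }

    public static void main(String[] args) {
        System.out.println(allIn("XDigit", "CAFE42"));
        System.out.println(allIn("Punct", "!?;"));
        System.out.println(allIn("Lower", "abc"));
        System.out.println(allIn("Alpha", "caf\u00E9"));  // e with acute accent is not ASCII
        System.out.println(allIn("Blank", " \t"));
    }
}
```
<br />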
<h3><a name="ref_unicode_category"></a>Unicode Blocks and Categories</h3>
<table border="1" cellspacing="2" cellpadding="2" width="100%">
<tbody>
<tr>
<td valign="top">Block<br /></td>
<td valign="top">Chinese name (from Word XP)<br /></td>
<td valign="top">Code range<br /></td></tr>
<tr>
<td valign="top">BasicLatin<br /></td>
<td valign="top">基本拉丁语<br /></td>
<td valign="top">\u0000-\u007F<br /></td></tr>
<tr>
<td valign="top">Latin-1Supplement<br /></td>
<td valign="top">拉丁语-1<br /></td>
<td valign="top">\u0080-\u00FF<br /></td></tr>
<tr>
<td valign="top">LatinExtended-A<br /></td>
<td valign="top">拉丁语扩充-A<br /></td>
<td valign="top">\u0100-\u017F<br /></td></tr>
<tr>
<td valign="top">LatinExtended-B<br /></td>
<td valign="top">拉丁语扩充-B<br /></td>
<td valign="top">\u0180-\u024F<br /></td></tr>
<tr>
<td valign="top">IPAExtensions<br /></td>
<td valign="top">国际音标扩充<br /></td>
<td valign="top">\u0250-\u02AF<br /></td></tr>
<tr>
<td valign="top">SpacingModifierLetters<br /></td>
<td valign="top">进格的修饰字符<br /></td>
<td valign="top">\u02B0-\u02FF<br /></td></tr>
<tr>
<td valign="top">CombiningDiacriticalMarks<br /></td>
<td valign="top">组合用发音符<br /></td>
<td valign="top">\u0300-\u036F<br /></td></tr>
<tr>
<td valign="top">Greek<br /></td>
<td valign="top">基本希腊语<br /></td>
<td valign="top">\u0370-\u03FF<br /></td></tr>
<tr>
<td valign="top">Cyrillic<br /></td>
<td valign="top">西里尔语<br /></td>
<td valign="top">\u0400-\u04FF<br /></td></tr>
<tr>
<td valign="top">Armenian<br /></td>
<td valign="top"><br /></td>
<td valign="top">\u0530-\u058F<br /></td></tr>
<tr>
<td valign="top">Hebrew<br /></td>
<td valign="top"><br /></td>
<td valign="top">\u0590-\u05FF<br /></td></tr>
<tr>
<td valign="top">Arabic<br /></td>
<td valign="top"><br /></td>
<td valign="top">\u0600-\u06FF<br /></td></tr>
<tr>
<td valign="top">Syriac<br /></td>
<td valign="top"><br /></td>
<td valign="top">\u0700-\u074F<br /></td></tr>
<tr>
<td valign="top">Thaana<br /></td>
<td valign="top"><br /></td>
<td valign="top">\u0780-\u07BF<br /></td></tr>
<tr>
<td valign="top">Devanagari<br /></td>
<td valign="top"><br /></td>
<td valign="top">\u0900-\u097F<br /></td></tr>
<tr>
<td valign="top">Bengali<br /></td>
<td valign="top"><br /></td>
<td valign="top">\u0980-\u09FF<br /></td></tr>
<tr>
<td valign="top">Gurmukhi<br /></td>
<td valign="top"><br /></td>
<td valign="top">\u0A00-\u0A7F<br /></td></tr>
<tr>
<td valign="top">Gujarati<br /></td>
<td valign="top"><br /></td>
<td valign="top">\u0A80-\u0AFF<br /></td></tr>
<tr>
<td valign="top">Oriya<br /></td>
<td valign="top"><br /></td>
<td valign="top">\u0B00-\u0B7F<br /></td></tr>
<tr>
<td valign="top">Tamil<br /></td>
<td valign="top"><br /></td>
<td valign="top">\u0B80-\u0BFF<br /></td></tr>
<tr>
<td valign="top">Telugu<br /></td>
<td valign="top"><br /></td>
<td valign="top">\u0C00-\u0C7F<br /></td></tr>
<tr>
<td valign="top">Kannada<br /></td>
<td valign="top"><br /></td>
<td valign="top">\u0C80-\u0CFF<br /></td></tr>
<tr>
<td valign="top">Malayalam<br /></td>
<td valign="top"><br /></td>
<td valign="top">\u0D00-\u0D7F<br /></td></tr>
<tr>
<td valign="top">Sinhala<br /></td>
<td valign="top"><br /></td>
<td valign="top">\u0D80-\u0DFF<br /></td></tr>
<tr>
<td valign="top">Thai<br /></td>
<td valign="top"><br /></td>
<td valign="top">\u0E00-\u0E7F<br /></td></tr>
<tr>
<td valign="top">Lao<br /></td>
<td valign="top"><br /></td>
<td valign="top">\u0E80-\u0EFF<br /></td></tr>
<tr>
<td valign="top">Tibetan<br /></td>
<td valign="top">藏语<br /></td>
<td valign="top">\u0F00-\u0FFF<br /></td></tr>
<tr>
<td valign="top">Myanmar<br /></td>
<td valign="top"><br /></td>
<td valign="top">\u1000-\u109F<br /></td></tr>
<tr>
<td valign="top">Georgian<br /></td>
<td valign="top"><br /></td>
<td valign="top">\u10A0-\u10FF<br /></td></tr>
<tr>
<td valign="top">HangulJamo<br /></td>
<td valign="top"><br /></td>
<td valign="top">\u1100-\u11FF<br /></td></tr>
<tr>
<td valign="top">Ethiopic<br /></td>
<td valign="top"><br /></td>
<td valign="top">\u1200-\u137F<br /></td></tr>
<tr>
<td valign="top">Cherokee<br /></td>
<td valign="top"><br /></td>
<td valign="top">\u13A0-\u13FF<br /></td></tr>
<tr>
<td valign="top">UnifiedCanadianAboriginalSyllabics<br /></td>
<td valign="top"><br /></td>
<td valign="top">\u1400-\u167F<br /></td></tr>
<tr>
<td valign="top">Ogham<br /></td>
<td valign="top"><br /></td>
<td valign="top">\u1680-\u169F<br /></td></tr>
<tr>
<td valign="top">Runic<br /></td>
<td valign="top"><br /></td>
<td valign="top">\u16A0-\u16FF<br /></td></tr>
<tr>
<td valign="top">Khmer<br /></td>
<td valign="top"><br /></td>
<td valign="top">\u1780-\u17FF<br /></td></tr>
<tr>
<td valign="top">Mongolian<br /></td>
<td valign="top">蒙古语<br /></td>
<td valign="top">\u1800-\u18AF<br /></td></tr>
<tr>
<td valign="top">LatinExtendedAdditional<br /></td>
<td valign="top"><br /></td>
<td valign="top">\u1E00-\u1EFF<br /></td></tr>
<tr>
<td valign="top">GreekExtended<br /></td>
<td valign="top"><br /></td>
<td valign="top">\u1F00-\u1FFF<br /></td></tr>
<tr>
<td valign="top">GeneralPunctuation<br /></td>
<td valign="top">广义标点<br /></td>
<td valign="top">\u2000-\u206F<br /></td></tr>
<tr>
<td valign="top">SuperscriptsandSubscripts<br /></td>
<td valign="top"><br /></td>
<td valign="top">\u2070-\u209F<br /></td></tr>
<tr>
<td valign="top">CurrencySymbols<br /></td>
<td valign="top">货币符号<br /></td>
<td valign="top">\u20A0-\u20CF<br /></td></tr>
<tr>
<td valign="top">CombiningMarksforSymbols<br /></td>
<td valign="top"><br /></td>
<td valign="top">\u20D0-\u20FF<br /></td></tr>
<tr>
<td valign="top">LetterlikeSymbols<br /></td>
<td valign="top">类似字母的符号<br /></td>
<td valign="top">\u2100-\u214F<br /></td></tr>
<tr>
<td valign="top">NumberForms<br /></td>
<td valign="top">数字形式<br /></td>
<td valign="top">\u2150-\u218F<br /></td></tr>
<tr>
<td valign="top">Arrows<br /></td>
<td valign="top">箭头<br /></td>
<td valign="top">\u2190-\u21FF<br /></td></tr>
<tr>
<td valign="top">MathematicalOperators<br /></td>
<td valign="top">数学运算符<br /></td>
<td valign="top">\u2200-\u22FF<br /></td></tr>
<tr>
<td valign="top">MiscellaneousTechnical<br /></td>
<td valign="top">零杂技术用符号<br /></td>
<td valign="top">\u2300-\u23FF<br /></td></tr>
<tr>
<td valign="top">ControlPictures<br /></td>
<td valign="top"><br /></td>
<td valign="top">\u2400-\u243F<br /></td></tr>
<tr>
<td valign="top">OpticalCharacterRecognition<br /></td>
<td valign="top"><br /></td>
<td valign="top">\u2440-\u245F<br /></td></tr>
<tr>
<td valign="top">EnclosedAlphanumerics<br /></td>
<td valign="top">Enclosed alphanumerics<br /></td>
<td valign="top">\u2460-\u24FF<br /></td></tr>
<tr>
<td valign="top">BoxDrawing<br /></td>
<td valign="top">Box-drawing characters<br /></td>
<td valign="top">\u2500-\u257F<br /></td></tr>
<tr>
<td valign="top">BlockElements<br /></td>
<td valign="top">Block elements<br /></td>
<td valign="top">\u2580-\u259F<br /></td></tr>
<tr>
<td valign="top">GeometricShapes<br /></td>
<td valign="top">Geometric shapes<br /></td>
<td valign="top">\u25A0-\u25FF<br /></td></tr>
<tr>
<td valign="top">MiscellaneousSymbols<br /></td>
<td valign="top">Miscellaneous symbols (pictographic signs, etc.)<br /></td>
<td valign="top">\u2600-\u26FF<br /></td></tr>
<tr>
<td valign="top">Dingbats<br /></td>
<td valign="top"><br /></td>
<td valign="top">\u2700-\u27BF<br /></td></tr>
<tr>
<td valign="top">BraillePatterns<br /></td>
<td valign="top"><br /></td>
<td valign="top">\u2800-\u28FF<br /></td></tr>
<tr>
<td valign="top">CJKRadicalsSupplement<br /></td>
<td valign="top"><br /></td>
<td valign="top">\u2E80-\u2EFF<br /></td></tr>
<tr>
<td valign="top">KangxiRadicals<br /></td>
<td valign="top"><br /></td>
<td valign="top">\u2F00-\u2FDF<br /></td></tr>
<tr>
<td valign="top">IdeographicDescriptionCharacters<br /></td>
<td valign="top"><br /></td>
<td valign="top">\u2FF0-\u2FFF<br /></td></tr>
<tr>
<td valign="top">CJKSymbolsandPunctuation<br /></td>
<td valign="top">CJK symbols and punctuation<br /></td>
<td valign="top">\u3000-\u303F<br /></td></tr>
<tr>
<td valign="top">Hiragana<br /></td>
<td valign="top">Hiragana<br /></td>
<td valign="top">\u3040-\u309F<br /></td></tr>
<tr>
<td valign="top">Katakana<br /></td>
<td valign="top">Katakana<br /></td>
<td valign="top">\u30A0-\u30FF<br /></td></tr>
<tr>
<td valign="top">Bopomofo<br /></td>
<td valign="top">Bopomofo (Zhuyin)<br /></td>
<td valign="top">\u3100-\u312F<br /></td></tr>
<tr>
<td valign="top">HangulCompatibilityJamo<br /></td>
<td valign="top"><br /></td>
<td valign="top">\u3130-\u318F<br /></td></tr>
<tr>
<td valign="top">Kanbun<br /></td>
<td valign="top"><br /></td>
<td valign="top">\u3190-\u319F<br /></td></tr>
<tr>
<td valign="top">BopomofoExtended<br /></td>
<td valign="top">Bopomofo Extended<br /></td>
<td valign="top">\u31A0-\u31BF<br /></td></tr>
<tr>
<td valign="top">EnclosedCJKLettersandMonths<br /></td>
<td valign="top">Enclosed CJK letters and months<br /></td>
<td valign="top">\u3200-\u32FF<br /></td></tr>
<tr>
<td valign="top">CJKCompatibility<br /></td>
<td valign="top">CJK compatibility characters<br /></td>
<td valign="top">\u3300-\u33FF<br /></td></tr>
<tr>
<td valign="top">CJKUnifiedIdeographsExtensionA<br /></td>
<td valign="top">CJK Unified Ideographs Extension A<br /></td>
<td valign="top">\u3400-\u4DBF<br /></td></tr>
<tr>
<td valign="top">CJKUnifiedIdeographs<br /></td>
<td valign="top">CJK Unified Ideographs<br /></td>
<td valign="top">\u4E00-\u9FFF<br /></td></tr>
<tr>
<td valign="top">YiSyllables<br /></td>
<td valign="top"><br /></td>
<td valign="top">\uA000-\uA48F<br /></td></tr>
<tr>
<td valign="top">YiRadicals<br /></td>
<td valign="top"><br /></td>
<td valign="top">\uA490-\uA4CF<br /></td></tr>
<tr>
<td valign="top">HangulSyllables<br /></td>
<td valign="top"><br /></td>
<td valign="top">\uAC00-\uD7A3<br /></td></tr>
<tr>
<td valign="top">HighSurrogates<br /></td>
<td valign="top"><br /></td>
<td valign="top">\uD800-\uDB7F<br /></td></tr>
<tr>
<td valign="top">HighPrivateUseSurrogates<br /></td>
<td valign="top"><br /></td>
<td valign="top">\uDB80-\uDBFF<br /></td></tr>
<tr>
<td valign="top">LowSurrogates<br /></td>
<td valign="top"><br /></td>
<td valign="top">\uDC00-\uDFFF<br /></td></tr>
<tr>
<td valign="top">PrivateUse<br /></td>
<td valign="top">Private use area<br /></td>
<td valign="top">\uE000-\uF8FF<br /></td></tr>
<tr>
<td valign="top">CJKCompatibilityIdeographs<br /></td>
<td valign="top">CJK compatibility ideographs<br /></td>
<td valign="top">\uF900-\uFAFF<br /></td></tr>
<tr>
<td valign="top">AlphabeticPresentationForms<br /></td>
<td valign="top"><br /></td>
<td valign="top">\uFB00-\uFB4F<br /></td></tr>
<tr>
<td valign="top">ArabicPresentationForms-A<br /></td>
<td valign="top"><br /></td>
<td valign="top">\uFB50-\uFDFF<br /></td></tr>
<tr>
<td valign="top">CombiningHalfMarks<br /></td>
<td valign="top"><br /></td>
<td valign="top">\uFE20-\uFE2F<br /></td></tr>
<tr>
<td valign="top">CJKCompatibilityForms<br /></td>
<td valign="top">CJK compatibility forms<br /></td>
<td valign="top">\uFE30-\uFE4F<br /></td></tr>
<tr>
<td valign="top">SmallFormVariants<br /></td>
<td valign="top">Small form variants<br /></td>
<td valign="top">\uFE50-\uFE6F<br /></td></tr>
<tr>
<td valign="top">ArabicPresentationForms-B<br /></td>
<td valign="top"><br /></td>
<td valign="top">\uFE70-\uFEFF<br /></td></tr>
<tr>
<td valign="top">Specials<br /></td>
<td valign="top"><br /></td>
<td valign="top">\uFFF0-\uFFFF<br /></td></tr>
<tr>
<td valign="top">HalfwidthandFullwidthForms<br /></td>
<td valign="top">Halfwidth and fullwidth forms<br /></td>
<td valign="top">\uFF00-\uFFEF<br /></td></tr></tbody></table><br /><br />
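As a rough illustration of how the block names in the table above can be used from Java: java.util.regex.Pattern accepts a block name prefixed with "In" inside \p{...}. The sketch below is only an example under that assumption; the class and method names (UnicodeBlockDemo, countCjk, firstHiragana) are hypothetical and are not part of the original article.<br />

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: shows \p{InBlockName} matching with java.util.regex.
final class UnicodeBlockDemo {
    // Matches a single character in the CJKUnifiedIdeographs block.
    private static final Pattern CJK =
            Pattern.compile("\\p{InCJKUnifiedIdeographs}");
    // Matches a run of one or more characters in the Hiragana block.
    private static final Pattern HIRAGANA_RUN =
            Pattern.compile("\\p{InHiragana}+");

    /** Counts the characters of s that fall in the CJK Unified Ideographs block. */
    static int countCjk(String s) {
        Matcher m = CJK.matcher(s);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    /** Returns the first run of Hiragana in s, or null if there is none. */
    static String firstHiragana(String s) {
        Matcher m = HIRAGANA_RUN.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // "regex " followed by five CJK ideographs (written as \\u escapes)
        System.out.println(countCjk("regex \u6B63\u5219\u8868\u8FBE\u5F0F")); // prints 5
        // two Kanji followed by two Hiragana characters
        System.out.println("\u3068\u3072".equals(
                firstHiragana("\u6F22\u5B57\u3068\u3072"))); // prints true
    }
}
```

The test strings are written with \u escapes so that the example does not depend on the source-file encoding.<br />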
<table border="1" cellspacing="2" cellpadding="2" width="100%">
<tbody>
<tr>
<td valign="top">Category<br /></td>
<td valign="top">Full name<br /></td>
<td valign="top">Description<br /></td></tr>
<tr>
<td valign="top">Cn<br /></td>
<td valign="top"><br /></td>
<td valign="top">UNASSIGNED<br /></td></tr>
<tr>
<td valign="top">Lu<br /></td>
<td valign="top"><br /></td>
<td valign="top">UPPERCASE_LETTER<br /></td></tr>
<tr>
<td valign="top">Ll<br /></td>
<td valign="top"><br /></td>
<td valign="top">LOWERCASE_LETTER<br /></td></tr>
<tr>
<td valign="top">Lt<br /></td>
<td valign="top"><br /></td>
<td valign="top">TITLECASE_LETTER<br /></td></tr>
<tr>
<td valign="top">Lm<br /></td>
<td valign="top"><br /></td>
<td valign="top">MODIFIER_LETTER<br /></td></tr>
<tr>
<td valign="top">Lo<br /></td>
<td valign="top"><br /></td>
<td valign="top">OTHER_LETTER<br /></td></tr>
<tr>
<td valign="top">Mn<br /></td>
<td valign="top"><br /></td>
<td valign="top">NON_SPACING_MARK<br /></td></tr>
<tr>
<td valign="top">Me<br /></td>
<td valign="top"><br /></td>
<td valign="top">ENCLOSING_MARK<br /></td></tr>
<tr>
<td valign="top">Mc<br /></td>
<td valign="top"><br /></td>
<td valign="top">COMBINING_SPACING_MARK<br /></td></tr>
<tr>
<td valign="top">Nd<br /></td>
<td valign="top"><br /></td>
<td valign="top">DECIMAL_DIGIT_NUMBER<br /></td></tr>
<tr>
<td valign="top">Nl<br /></td>
<td valign="top"><br /></td>
<td valign="top">LETTER_NUMBER<br /></td></tr>
<tr>
<td valign="top">No<br /></td>
<td valign="top"><br /></td>
<td valign="top">OTHER_NUMBER<br /></td></tr>
<tr>
<td valign="top">Zs<br /></td>
<td valign="top"><br /></td>
<td valign="top">SPACE_SEPARATOR<br /></td></tr>
<tr>
<td valign="top">Zl<br /></td>
<td valign="top"><br /></td>
<td valign="top">LINE_SEPARATOR<br /></td></tr>
<tr>
<td valign="top">Zp<br /></td>
<td valign="top"><br /></td>
<td valign="top">PARAGRAPH_SEPARATOR<br /></td></tr>
<tr>
<td valign="top">Cc<br /></td>
<td valign="top"><br /></td>
<td valign="top">CONTROL<br /></td></tr>
<tr>
<td valign="top">Cf<br /></td>
<td valign="top"><br /></td>
<td valign="top">FORMAT<br /></td></tr>
<tr>
<td valign="top">Co<br /></td>
<td valign="top"><br /></td>
<td valign="top">PRIVATE_USE<br /></td></tr>
<tr>
<td valign="top">Cs<br /></td>
<td valign="top"><br /></td>
<td valign="top">SURROGATE<br /></td></tr>
<tr>
<td valign="top">Pd<br /></td>
<td valign="top"><br /></td>
<td valign="top">DASH_PUNCTUATION<br /></td></tr>
<tr>
<td valign="top">Ps<br /></td>
<td valign="top"><br /></td>
<td valign="top">START_PUNCTUATION<br /></td></tr>
<tr>
<td valign="top">Pe<br /></td>
<td valign="top"><br /></td>
<td valign="top">END_PUNCTUATION<br /></td></tr>
<tr>
<td valign="top">Pc<br /></td>
<td valign="top"><br /></td>
<td valign="top">CONNECTOR_PUNCTUATION<br /></td></tr>
<tr>
<td valign="top">Po<br /></td>
<td valign="top"><br /></td>
<td valign="top">OTHER_PUNCTUATION<br /></td></tr>
<tr>
<td valign="top">Sm<br /></td>
<td valign="top"><br /></td>
<td valign="top">MATH_SYMBOL<br /></td></tr>
<tr>
<td valign="top">Sc<br /></td>
<td valign="top"><br /></td>
<td valign="top">CURRENCY_SYMBOL<br /></td></tr>
<tr>
<td valign="top">Sk<br /></td>
<td valign="top"><br /></td>
<td valign="top">MODIFIER_SYMBOL<br /></td></tr>
<tr>
<td valign="top">So<br /></td>
<td valign="top"><br /></td>
<td valign="top">OTHER_SYMBOL<br /></td></tr>
<tr>
<td valign="top">L<br /></td>
<td valign="top"><br /></td>
<td valign="top">LETTER<br /></td></tr>
<tr>
<td valign="top">M<br /></td>
<td valign="top"><br /></td>
<td valign="top">MARK<br /></td></tr>
<tr>
<td valign="top">N<br /></td>
<td valign="top"><br /></td>
<td valign="top">NUMBER<br /></td></tr>
<tr>
<td valign="top">Z<br /></td>
<td valign="top"><br /></td>
<td valign="top">SEPARATOR<br /></td></tr>
<tr>
<td valign="top">C<br /></td>
<td valign="top"><br /></td>
<td valign="top">OTHER (control, format, private use, surrogate)<br /></td></tr>
<tr>
<td valign="top">P<br /></td>
<td valign="top"><br /></td>
<td valign="top">PUNCTUATION<br /></td></tr>
<tr>
<td valign="top">S<br /></td>
<td valign="top"><br /></td>
<td valign="top">SYMBOL</td></tr>
<tr>
<td valign="top">LD<br /></td>
<td valign="top"><br /></td>
<td valign="top">LETTER_OR_DIGIT<br /></td></tr>
<tr>
<td valign="top">L1<br /></td>
<td valign="top"><br /></td>
<td valign="top">Latin-1<br /></td></tr>
<tr>
<td valign="top">all<br /></td>
<td valign="top"><br /></td>
<td valign="top">ALL<br /></td></tr>
<tr>
<td valign="top">ASCII<br /></td>
<td valign="top"><br /></td>
<td valign="top">ASCII<br /></td></tr>
<tr>
<td valign="top">Alnum<br /></td>
<td valign="top"><br /></td>
<td valign="top">Alphanumeric characters (0-9, a-z, A-Z)<br /></td></tr>
<tr>
<td valign="top">Alpha<br /></td>
<td valign="top"><br /></td>
<td valign="top">Letters (a-z, A-Z)<br /></td></tr>
<tr>
<td valign="top">Blank<br /></td>
<td valign="top"><br /></td>
<td valign="top">Space or tab (' ' or \t)<br /></td></tr>
<tr>
<td valign="top">Cntrl<br /></td>
<td valign="top"><br /></td>
<td valign="top">Control characters (non-printable)<br /></td></tr>
<tr>
<td valign="top">Digit<br /></td>
<td valign="top"><br /></td>
<td valign="top">Digits (0-9)<br /></td></tr>
<tr>
<td valign="top">Graph<br /></td>
<td valign="top"><br /></td>
<td valign="top">Printable and visible characters (for example, a space ' ' is printable but not visible, whereas 'a' is both).<br /></td></tr>
<tr>
<td valign="top">Lower<br /></td>
<td valign="top"><br /></td>
<td valign="top">Lowercase letters<br /></td></tr>
<tr>
<td valign="top">Print<br /></td>
<td valign="top"><br /></td>
<td valign="top">Printable characters (non-control characters)<br /></td></tr>
<tr>
<td valign="top">Punct<br /></td>
<td valign="top"><br /></td>
<td valign="top">Punctuation (characters other than letters, digits, control characters, and whitespace), such as !@#$%}{&lt;&gt;,./?[] and so on.<br /></td></tr>
<tr>
<td valign="top">Space<br /></td>
<td valign="top"><br /></td>
<td valign="top">Whitespace (0x09 tab, 0x0A, 0x0B, 0x0C, 0x0D, 0x20 space)<br /></td></tr>
<tr>
<td valign="top">Upper<br /></td>
<td valign="top"><br /></td>
<td valign="top">Uppercase letters<br /></td></tr>
<tr>
<td valign="top">XDigit<br /></td>
<td valign="top"><br /></td>
<td valign="top">Hexadecimal digits (0-9, a-f, A-F)<br /></td></tr></tbody></table><br />
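A minimal sketch of how a few of the class names in the table above can be used with java.util.regex in Java: \p{Lu} and \p{Ll} are Unicode general categories, while \p{Punct} and \p{XDigit} are the POSIX-style classes, which by default cover ASCII only. The class and method names (CharClassDemo, isCapitalized, stripPunct, isHex) are hypothetical and chosen only for illustration.<br />

```java
import java.util.regex.Pattern;

// Hypothetical demo class: uses general categories and POSIX-style classes.
final class CharClassDemo {
    /** True if s is one uppercase letter followed only by lowercase letters. */
    static boolean isCapitalized(String s) {
        return Pattern.matches("\\p{Lu}\\p{Ll}*", s);
    }

    /** Removes every ASCII punctuation character from s. */
    static String stripPunct(String s) {
        return s.replaceAll("\\p{Punct}", "");
    }

    /** True if s is a non-empty string of hexadecimal digits. */
    static boolean isHex(String s) {
        return Pattern.matches("\\p{XDigit}+", s);
    }

    public static void main(String[] args) {
        System.out.println(isCapitalized("Regex")); // prints true
        System.out.println(isCapitalized("regex")); // prints false
        System.out.println(stripPunct("a,b.c!"));   // prints abc
        System.out.println(isHex("1aF"));           // prints true
    }
}
```

Pattern.matches requires the whole input to match, which is why no anchors are needed in these expressions.<br />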
<hr size="2" width="100%" />
<br /><br /><br />
<h3><a name="man_subexp"></a>Replacement expressions</h3><a name="subexp_chars"></a>Special characters<br />
<table border="1" cellspacing="2" cellpadding="2" width="100%">
<tbody>
<tr>
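<td valign="top" width="30%">Special character<br /></td>
<td valign="top">Description<br /></td></tr>
<tr>
<td valign="top" width="30%">\n<br /></td>
<td valign="top">Inserts a line break.<br /></td></tr>
<tr>
<td valign="top" width="30%">\b<br /></td>
<td valign="top">Deletes one character backward. When this escape appears at the start of the replacement string, the character immediately before the match is deleted. If the match is at the beginning of a line, that line is joined with the previous one.<br /></td></tr>
<tr>
<td valign="top" width="30%">\d<br /></td>
<td valign="top">Deletes one character forward. When this escape appears at the end of the replacement string, the character immediately after the match is deleted. If the match is at the end of a line, that line is joined with the next one.<br /></td></tr>
<tr>
<td valign="top" width="30%">\e<br /></td>
<td valign="top">Inserts an ESC character.<br /></td></tr>
<tr>
<td valign="top">\t<br /></td>
<td valign="top">Inserts a TAB character.<br /></td></tr>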
<tr>
<td valign="top">\<em><strong>n</strong></em><br /></td>
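<td valign="top">n is the number of a subexpression (group) in the search regular expression: \1 is the first subexpression, \2 the second, and so on. \0 is the entire matched string.<br /></td></tr>
<tr>
<td valign="top">\v<br /></td>
<td valign="top">Uppercases the next letter.<br /></td></tr>
<tr>
<td valign="top">\U<br /></td>
<td valign="top">Uppercases all following letters until another case directive is encountered.<br /></td></tr>
<tr>
<td valign="top">\l<br /></td>
<td valign="top">Lowercases the next letter.<br /></td></tr>
<tr>
<td valign="top">\L<br /></td>
<td valign="top">Lowercases all following letters until another case directive is encountered.<br /></td></tr>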
<tr>
<td valign="top">\E<br /></td>
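<td valign="top">Cancels all letter-case directives.<br /></td></tr></tbody></table><br /><br /><a name="subexp_custtbl"></a>Custom replacement tables<br /><br />Using custom replacement tables in find/replace<br />Sometimes the simple custom replacement feature described above is not enough. For example, a user may want to replace source strings with target strings only when they appear inside brackets. This kind of text processing can be handled by using a custom replacement table in find/replace.<br /><br />The replacement function that applies a custom replacement table in find/replace is \Tn, where n is a digit from 0 to 9; note that n = 0 denotes the tenth table. If n is omitted, the effect is the same as \T1, that is, the first replacement table is used. For example, to replace every Chinese character enclosed in square brackets with its pinyin, search for &#8220;\[(\E)\]&#8221; and replace with &#8220;\T{\1}&#8221;. This converts the text matched by the first subexpression according to the custom replacement table. Note that if the argument of \T is not among the source strings of the table, the result of \T is the argument itself, that is, no conversion takes place.<br /><br />In some cases the user may want to use only part of what the replacement table contains. Taking pinyin as the example again, the replacement table given earlier includes the tone of each syllable; if these tone digits should not be added during replacement, the &#8220;filter&#8221; feature can be used. Filtering means parsing the target string of the replacement table with a regular expression and extracting one of its subexpressions.<br /><br />To use a filter, click the &#8220;Filter&#8221; button in the &#8220;Set custom replacement table&#8221; dialog and enter a regular expression in the dialog that pops up. For pinyin, the expression can be written as &#8220;(\p{Alpha}+)(\d)&#8221;, where the first pair of parentheses captures the pinyin without its tone and the second pair captures the tone. When \T is called, JTextPro searches the target string for this regular expression. To extract one of the subexpressions, \T takes an optional subscript: the value of the n-th subexpression is written \T{...}[n]. So, to replace Chinese characters enclosed in square brackets with toneless pinyin, search for &#8220;\[(\E)\]&#8221; and replace with &#8220;\T{\1}[1]&#8221;.<br /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/hwpok/" target="_blank">惠万鹏</a> 2012-09-10 23:15 <a href="http://www.blogjava.net/hwpok/archive/2012/09/10/387426.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Crawling web pages</title><link>http://www.blogjava.net/hwpok/archive/2008/07/14/214839.html</link><dc:creator>惠万鹏</dc:creator><author>惠万鹏</author><pubDate>Mon, 14 Jul 2008 15:24:00 
GMT</pubDate><guid>http://www.blogjava.net/hwpok/archive/2008/07/14/214839.html</guid><wfw:comment>http://www.blogjava.net/hwpok/comments/214839.html</wfw:comment><comments>http://www.blogjava.net/hwpok/archive/2008/07/14/214839.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/hwpok/comments/commentRss/214839.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/hwpok/services/trackbacks/214839.html</trackback:ping><description><![CDATA[Tonight I helped a senior classmate solve a problem.<br />
The task: crawl all the pages of a website and collect every URL that appears on those pages.<br />
The code is as follows:<br />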
<div style="border-right: #cccccc 1px solid; padding-right: 5px; border-top: #cccccc 1px solid; padding-left: 4px; font-size: 13px; padding-bottom: 4px; border-left: #cccccc 1px solid; width: 98%; word-break: break-all; padding-top: 4px; border-bottom: #cccccc 1px solid; background-color: #eeeeee"><img src="http://www.blogjava.net/images/OutliningIndicators/None.gif" align="top"  alt="" /><span style="color: #0000ff">package</span><span style="color: #000000">&nbsp;com.hwp.test;<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/None.gif" align="top"  alt="" /><br />
<img src="http://www.blogjava.net/images/OutliningIndicators/None.gif" align="top"  alt="" /></span><span style="color: #0000ff">import</span><span style="color: #000000">&nbsp;java.io.InputStream;<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/None.gif" align="top"  alt="" /></span><span style="color: #0000ff">import</span><span style="color: #000000">&nbsp;java.net.HttpURLConnection;<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/None.gif" align="top"  alt="" /></span><span style="color: #0000ff">import</span><span style="color: #000000">&nbsp;java.net.URL;<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/None.gif" align="top"  alt="" /></span><span style="color: #0000ff">import</span><span style="color: #000000">&nbsp;java.util.ArrayList;<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/None.gif" align="top"  alt="" /></span><span style="color: #0000ff">import</span><span style="color: #000000">&nbsp;java.util.HashMap;<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/None.gif" align="top"  alt="" /></span><span style="color: #0000ff">import</span><span style="color: #000000">&nbsp;java.util.List;<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/None.gif" align="top"  alt="" /></span><span style="color: #0000ff">import</span><span style="color: #000000">&nbsp;java.util.Map;<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/None.gif" align="top"  alt="" /></span><span style="color: #0000ff">import</span><span style="color: #000000">&nbsp;java.util.Set;<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/None.gif" align="top"  alt="" /></span><span style="color: #0000ff">import</span><span style="color: #000000">&nbsp;java.util.regex.Matcher;<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/None.gif" align="top"  alt="" /></span><span style="color: #0000ff">import</span><span style="color: #000000">&nbsp;java.util.regex.Pattern;<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/None.gif" align="top"  alt="" /><br />
<img src="http://www.blogjava.net/images/OutliningIndicators/None.gif" align="top"  alt="" /></span><span style="color: #0000ff">public</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">class</span><span style="color: #000000">&nbsp;SearchEngine<br />
<img id="Codehighlighter1_319_3200_Open_Image" onclick="this.style.display='none'; Codehighlighter1_319_3200_Open_Text.style.display='none'; Codehighlighter1_319_3200_Closed_Image.style.display='inline'; Codehighlighter1_319_3200_Closed_Text.style.display='inline';" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedBlockStart.gif" align="top"  alt="" /><img id="Codehighlighter1_319_3200_Closed_Image" style="display: none" onclick="this.style.display='none'; Codehighlighter1_319_3200_Closed_Text.style.display='none'; Codehighlighter1_319_3200_Open_Image.style.display='inline'; Codehighlighter1_319_3200_Open_Text.style.display='inline';" src="http://www.blogjava.net/images/OutliningIndicators/ContractedBlock.gif" align="top"  alt="" /></span><span id="Codehighlighter1_319_3200_Closed_Text" style="border-right: #808080 1px solid; border-top: #808080 1px solid; display: none; border-left: #808080 1px solid; border-bottom: #808080 1px solid; background-color: #ffffff"><img src="http://www.blogjava.net/Images/dot.gif"  alt="" /></span><span id="Codehighlighter1_319_3200_Open_Text"><span style="color: #000000">{<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">private</span><span style="color: #000000">&nbsp;Map</span><span style="color: #000000">&lt;</span><span style="color: #000000">String,&nbsp;List</span><span style="color: #000000">&lt;</span><span style="color: #000000">String</span><span style="color: #000000">&gt;&gt;</span><span style="color: #000000">&nbsp;pageNameUrls;<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">public</span><span style="color: #000000">&nbsp;SearchEngine()<br />
<img id="Codehighlighter1_408_474_Open_Image" onclick="this.style.display='none'; Codehighlighter1_408_474_Open_Text.style.display='none'; Codehighlighter1_408_474_Closed_Image.style.display='inline'; Codehighlighter1_408_474_Closed_Text.style.display='inline';" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockStart.gif" align="top"  alt="" /><img id="Codehighlighter1_408_474_Closed_Image" style="display: none" onclick="this.style.display='none'; Codehighlighter1_408_474_Closed_Text.style.display='none'; Codehighlighter1_408_474_Open_Image.style.display='inline'; Codehighlighter1_408_474_Open_Text.style.display='inline';" src="http://www.blogjava.net/images/OutliningIndicators/ContractedSubBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;</span><span id="Codehighlighter1_408_474_Closed_Text" style="border-right: #808080 1px solid; border-top: #808080 1px solid; display: none; border-left: #808080 1px solid; border-bottom: #808080 1px solid; background-color: #ffffff"><img src="http://www.blogjava.net/Images/dot.gif"  alt="" /></span><span id="Codehighlighter1_408_474_Open_Text"><span style="color: #000000">{<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;pageNameUrls&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">new</span><span style="color: #000000">&nbsp;HashMap</span><span style="color: #000000">&lt;</span><span style="color: #000000">String,&nbsp;List</span><span style="color: #000000">&lt;</span><span style="color: #000000">String</span><span style="color: #000000">&gt;&gt;</span><span style="color: #000000">();<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockEnd.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;}</span></span><span style="color: #000000"><br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">private</span><span style="color: #000000">&nbsp;String&nbsp;getContent(String&nbsp;httpUrl)<br />
<img id="Codehighlighter1_531_1358_Open_Image" onclick="this.style.display='none'; Codehighlighter1_531_1358_Open_Text.style.display='none'; Codehighlighter1_531_1358_Closed_Image.style.display='inline'; Codehighlighter1_531_1358_Closed_Text.style.display='inline';" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockStart.gif" align="top"  alt="" /><img id="Codehighlighter1_531_1358_Closed_Image" style="display: none" onclick="this.style.display='none'; Codehighlighter1_531_1358_Closed_Text.style.display='none'; Codehighlighter1_531_1358_Open_Image.style.display='inline'; Codehighlighter1_531_1358_Open_Text.style.display='inline';" src="http://www.blogjava.net/images/OutliningIndicators/ContractedSubBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;</span><span id="Codehighlighter1_531_1358_Closed_Text" style="border-right: #808080 1px solid; border-top: #808080 1px solid; display: none; border-left: #808080 1px solid; border-bottom: #808080 1px solid; background-color: #ffffff"><img src="http://www.blogjava.net/Images/dot.gif"  alt="" /></span><span id="Codehighlighter1_531_1358_Open_Text"><span style="color: #000000">{<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;String&nbsp;htmlCode&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;</span><span style="color: #000000">""</span><span style="color: #000000">;<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">try</span><span style="color: #000000"><br />
<img id="Codehighlighter1_583_1215_Open_Image" onclick="this.style.display='none'; Codehighlighter1_583_1215_Open_Text.style.display='none'; Codehighlighter1_583_1215_Closed_Image.style.display='inline'; Codehighlighter1_583_1215_Closed_Text.style.display='inline';" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockStart.gif" align="top"  alt="" /><img id="Codehighlighter1_583_1215_Closed_Image" style="display: none" onclick="this.style.display='none'; Codehighlighter1_583_1215_Closed_Text.style.display='none'; Codehighlighter1_583_1215_Open_Image.style.display='inline'; Codehighlighter1_583_1215_Open_Text.style.display='inline';" src="http://www.blogjava.net/images/OutliningIndicators/ContractedSubBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span id="Codehighlighter1_583_1215_Closed_Text" style="border-right: #808080 1px solid; border-top: #808080 1px solid; display: none; border-left: #808080 1px solid; border-bottom: #808080 1px solid; background-color: #ffffff"><img src="http://www.blogjava.net/Images/dot.gif"  alt="" /></span><span id="Codehighlighter1_583_1215_Open_Text"><span style="color: #000000">{<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;InputStream&nbsp;in;<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;URL&nbsp;url&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">new</span><span style="color: #000000">&nbsp;java.net.URL(httpUrl);<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;HttpURLConnection&nbsp;connection&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;(HttpURLConnection)&nbsp;url<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.openConnection();<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;connection.setRequestProperty(</span><span style="color: #000000">"</span><span style="color: #000000">User-Agent</span><span style="color: #000000">"</span><span style="color: #000000">,&nbsp;</span><span style="color: #000000">"</span><span style="color: #000000">Mozilla/4.0</span><span style="color: #000000">"</span><span style="color: #000000">);<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;connection.connect();<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;in&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;connection.getInputStream();<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">byte</span><span style="color: #000000">[]&nbsp;buffer&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">new</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">byte</span><span style="color: #000000">[</span><span style="color: #000000">512</span><span style="color: #000000">];<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">int</span><span style="color: #000000">&nbsp;length&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;</span><span style="color: #000000">-</span><span style="color: #000000">1</span><span style="color: #000000">;<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">while</span><span style="color: #000000">&nbsp;((length&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;in.read(buffer,&nbsp;</span><span style="color: #000000">0</span><span style="color: #000000">,&nbsp;</span><span style="color: #000000">512</span><span style="color: #000000">))&nbsp;</span><span style="color: #000000">!=</span><span style="color: #000000">&nbsp;</span><span style="color: #000000">-</span><span style="color: #000000">1</span><span style="color: #000000">)<br />
<img id="Codehighlighter1_1132_1205_Open_Image" onclick="this.style.display='none'; Codehighlighter1_1132_1205_Open_Text.style.display='none'; Codehighlighter1_1132_1205_Closed_Image.style.display='inline'; Codehighlighter1_1132_1205_Closed_Text.style.display='inline';" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockStart.gif" align="top"  alt="" /><img id="Codehighlighter1_1132_1205_Closed_Image" style="display: none" onclick="this.style.display='none'; Codehighlighter1_1132_1205_Closed_Text.style.display='none'; Codehighlighter1_1132_1205_Open_Image.style.display='inline'; Codehighlighter1_1132_1205_Open_Text.style.display='inline';" src="http://www.blogjava.net/images/OutliningIndicators/ContractedSubBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span id="Codehighlighter1_1132_1205_Closed_Text" style="border-right: #808080 1px solid; border-top: #808080 1px solid; display: none; border-left: #808080 1px solid; border-bottom: #808080 1px solid; background-color: #ffffff"><img src="http://www.blogjava.net/Images/dot.gif"  alt="" /></span><span id="Codehighlighter1_1132_1205_Open_Text"><span style="color: #000000">{<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;htmlCode&nbsp;</span><span style="color: #000000">+=</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">new</span><span style="color: #000000">&nbsp;String(buffer,&nbsp;</span><span style="color: #000000">0</span><span style="color: #000000">,&nbsp;length);<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockEnd.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}</span></span><span style="color: #000000"><br />
<img src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockEnd.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}</span></span><span style="color: #000000"><br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">catch</span><span style="color: #000000">&nbsp;(Exception&nbsp;e)<br />
<img id="Codehighlighter1_1253_1254_Open_Image" onclick="this.style.display='none'; Codehighlighter1_1253_1254_Open_Text.style.display='none'; Codehighlighter1_1253_1254_Closed_Image.style.display='inline'; Codehighlighter1_1253_1254_Closed_Text.style.display='inline';" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockStart.gif" align="top"  alt="" /><img id="Codehighlighter1_1253_1254_Closed_Image" style="display: none" onclick="this.style.display='none'; Codehighlighter1_1253_1254_Closed_Text.style.display='none'; Codehighlighter1_1253_1254_Open_Image.style.display='inline'; Codehighlighter1_1253_1254_Open_Text.style.display='inline';" src="http://www.blogjava.net/images/OutliningIndicators/ContractedSubBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span id="Codehighlighter1_1253_1254_Closed_Text" style="border-right: #808080 1px solid; border-top: #808080 1px solid; display: none; border-left: #808080 1px solid; border-bottom: #808080 1px solid; background-color: #ffffff"><img src="http://www.blogjava.net/Images/dot.gif"  alt="" /></span><span id="Codehighlighter1_1253_1254_Open_Text"><span style="color: #000000">{}</span></span><span style="color: #000000"><br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">if</span><span style="color: #000000">&nbsp;(htmlCode&nbsp;</span><span style="color: #000000">==</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">null</span><span style="color: #000000">)<br />
<img id="Codehighlighter1_1294_1327_Open_Image" onclick="this.style.display='none'; Codehighlighter1_1294_1327_Open_Text.style.display='none'; Codehighlighter1_1294_1327_Closed_Image.style.display='inline'; Codehighlighter1_1294_1327_Closed_Text.style.display='inline';" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockStart.gif" align="top"  alt="" /><img id="Codehighlighter1_1294_1327_Closed_Image" style="display: none" onclick="this.style.display='none'; Codehighlighter1_1294_1327_Closed_Text.style.display='none'; Codehighlighter1_1294_1327_Open_Image.style.display='inline'; Codehighlighter1_1294_1327_Open_Text.style.display='inline';" src="http://www.blogjava.net/images/OutliningIndicators/ContractedSubBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span id="Codehighlighter1_1294_1327_Closed_Text" style="border-right: #808080 1px solid; border-top: #808080 1px solid; display: none; border-left: #808080 1px solid; border-bottom: #808080 1px solid; background-color: #ffffff"><img src="http://www.blogjava.net/Images/dot.gif"  alt="" /></span><span id="Codehighlighter1_1294_1327_Open_Text"><span style="color: #000000">{<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">return</span><span style="color: #000000">&nbsp;</span><span style="color: #000000">""</span><span style="color: #000000">;<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockEnd.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}</span></span><span style="color: #000000"><br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">return</span><span style="color: #000000">&nbsp;htmlCode;<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockEnd.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;}</span></span><span style="color: #000000"><br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">private</span><span style="color: #000000">&nbsp;List</span><span style="color: #000000">&lt;</span><span style="color: #000000">String</span><span style="color: #000000">&gt;</span><span style="color: #000000">&nbsp;getPageUrls(String&nbsp;page)<br />
<img id="Codehighlighter1_1419_1951_Open_Image" onclick="this.style.display='none'; Codehighlighter1_1419_1951_Open_Text.style.display='none'; Codehighlighter1_1419_1951_Closed_Image.style.display='inline'; Codehighlighter1_1419_1951_Closed_Text.style.display='inline';" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockStart.gif" align="top"  alt="" /><img id="Codehighlighter1_1419_1951_Closed_Image" style="display: none" onclick="this.style.display='none'; Codehighlighter1_1419_1951_Closed_Text.style.display='none'; Codehighlighter1_1419_1951_Open_Image.style.display='inline'; Codehighlighter1_1419_1951_Open_Text.style.display='inline';" src="http://www.blogjava.net/images/OutliningIndicators/ContractedSubBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;</span><span id="Codehighlighter1_1419_1951_Closed_Text" style="border-right: #808080 1px solid; border-top: #808080 1px solid; display: none; border-left: #808080 1px solid; border-bottom: #808080 1px solid; background-color: #ffffff"><img src="http://www.blogjava.net/Images/dot.gif"  alt="" /></span><span id="Codehighlighter1_1419_1951_Open_Text"><span style="color: #000000">{<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;List</span><span style="color: #000000">&lt;</span><span style="color: #000000">String</span><span style="color: #000000">&gt;</span><span style="color: #000000">&nbsp;urls&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">new</span><span style="color: #000000">&nbsp;ArrayList</span><span style="color: #000000">&lt;</span><span style="color: #000000">String</span><span style="color: #000000">&gt;</span><span style="color: #000000">();<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;String&nbsp;content&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">this</span><span style="color: #000000">.getContent(page);<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;String&nbsp;reg&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;</span><span style="color: #000000">"</span><span style="color: #000000">http://([\\w-]+\\.)+[\\w-]+(/[\\w-&nbsp;./?%&amp;=]*)?</span><span style="color: #000000">"</span><span style="color: #000000">;<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Pattern&nbsp;pattern&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;Pattern.compile(reg);<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Matcher&nbsp;matcher&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;pattern.matcher(content);<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;String&nbsp;url&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;</span><span style="color: #000000">""</span><span style="color: #000000">;<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">while</span><span style="color: #000000">&nbsp;(matcher.find())<br />
<img id="Codehighlighter1_1783_1924_Open_Image" onclick="this.style.display='none'; Codehighlighter1_1783_1924_Open_Text.style.display='none'; Codehighlighter1_1783_1924_Closed_Image.style.display='inline'; Codehighlighter1_1783_1924_Closed_Text.style.display='inline';" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockStart.gif" align="top"  alt="" /><img id="Codehighlighter1_1783_1924_Closed_Image" style="display: none" onclick="this.style.display='none'; Codehighlighter1_1783_1924_Closed_Text.style.display='none'; Codehighlighter1_1783_1924_Open_Image.style.display='inline'; Codehighlighter1_1783_1924_Open_Text.style.display='inline';" src="http://www.blogjava.net/images/OutliningIndicators/ContractedSubBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span id="Codehighlighter1_1783_1924_Closed_Text" style="border-right: #808080 1px solid; border-top: #808080 1px solid; display: none; border-left: #808080 1px solid; border-bottom: #808080 1px solid; background-color: #ffffff"><img src="http://www.blogjava.net/Images/dot.gif"  alt="" /></span><span id="Codehighlighter1_1783_1924_Open_Text"><span style="color: #000000">{<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;url&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;matcher.group();<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">if</span><span style="color: #000000">&nbsp;(</span><span style="color: #000000">!</span><span style="color: #000000">urls.contains(url))<br />
<img id="Codehighlighter1_1869_1914_Open_Image" onclick="this.style.display='none'; Codehighlighter1_1869_1914_Open_Text.style.display='none'; Codehighlighter1_1869_1914_Closed_Image.style.display='inline'; Codehighlighter1_1869_1914_Closed_Text.style.display='inline';" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockStart.gif" align="top"  alt="" /><img id="Codehighlighter1_1869_1914_Closed_Image" style="display: none" onclick="this.style.display='none'; Codehighlighter1_1869_1914_Closed_Text.style.display='none'; Codehighlighter1_1869_1914_Open_Image.style.display='inline'; Codehighlighter1_1869_1914_Open_Text.style.display='inline';" src="http://www.blogjava.net/images/OutliningIndicators/ContractedSubBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span id="Codehighlighter1_1869_1914_Closed_Text" style="border-right: #808080 1px solid; border-top: #808080 1px solid; display: none; border-left: #808080 1px solid; border-bottom: #808080 1px solid; background-color: #ffffff"><img src="http://www.blogjava.net/Images/dot.gif"  alt="" /></span><span id="Codehighlighter1_1869_1914_Open_Text"><span style="color: #000000">{<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;urls.add(url);<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockEnd.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}</span></span><span style="color: #000000"><br />
<img src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockEnd.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}</span></span><span style="color: #000000"><br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">return</span><span style="color: #000000">&nbsp;urls;<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockEnd.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;}</span></span><span style="color: #000000"><br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">public</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">void</span><span style="color: #000000">&nbsp;test(String&nbsp;url,&nbsp;String&nbsp;baseUrl)<br />
<img id="Codehighlighter1_2011_2664_Open_Image" onclick="this.style.display='none'; Codehighlighter1_2011_2664_Open_Text.style.display='none'; Codehighlighter1_2011_2664_Closed_Image.style.display='inline'; Codehighlighter1_2011_2664_Closed_Text.style.display='inline';" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockStart.gif" align="top"  alt="" /><img id="Codehighlighter1_2011_2664_Closed_Image" style="display: none" onclick="this.style.display='none'; Codehighlighter1_2011_2664_Closed_Text.style.display='none'; Codehighlighter1_2011_2664_Open_Image.style.display='inline'; Codehighlighter1_2011_2664_Open_Text.style.display='inline';" src="http://www.blogjava.net/images/OutliningIndicators/ContractedSubBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;</span><span id="Codehighlighter1_2011_2664_Closed_Text" style="border-right: #808080 1px solid; border-top: #808080 1px solid; display: none; border-left: #808080 1px solid; border-bottom: #808080 1px solid; background-color: #ffffff"><img src="http://www.blogjava.net/Images/dot.gif"  alt="" /></span><span id="Codehighlighter1_2011_2664_Open_Text"><span style="color: #000000">{<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;String&nbsp;content&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">this</span><span style="color: #000000">.getContent(url);<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">//</span><span style="color: #008000">&nbsp;System.out.println(content);</span><span style="color: #008000"><br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" /></span><span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;String&nbsp;reg&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;</span><span style="color: #000000">"</span><span style="color: #000000">(</span><span style="color: #000000">"</span><span style="color: #000000">&nbsp;</span><span style="color: #000000">+</span><span style="color: #000000">&nbsp;baseUrl<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #000000">+</span><span style="color: #000000">&nbsp;</span><span style="color: #000000">"</span><span style="color: #000000">(/[\\w-]+)*(/[\\w-]+\\.(htm|html|xhtml|jsp|asp|php)))</span><span style="color: #000000">"</span><span style="color: #000000">;<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Pattern&nbsp;pattern&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;Pattern.compile(reg);<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Matcher&nbsp;matcher&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;pattern.matcher(content);<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">while</span><span style="color: #000000">&nbsp;(matcher.find())<br />
<img id="Codehighlighter1_2355_2658_Open_Image" onclick="this.style.display='none'; Codehighlighter1_2355_2658_Open_Text.style.display='none'; Codehighlighter1_2355_2658_Closed_Image.style.display='inline'; Codehighlighter1_2355_2658_Closed_Text.style.display='inline';" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockStart.gif" align="top"  alt="" /><img id="Codehighlighter1_2355_2658_Closed_Image" style="display: none" onclick="this.style.display='none'; Codehighlighter1_2355_2658_Closed_Text.style.display='none'; Codehighlighter1_2355_2658_Open_Image.style.display='inline'; Codehighlighter1_2355_2658_Open_Text.style.display='inline';" src="http://www.blogjava.net/images/OutliningIndicators/ContractedSubBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span id="Codehighlighter1_2355_2658_Closed_Text" style="border-right: #808080 1px solid; border-top: #808080 1px solid; display: none; border-left: #808080 1px solid; border-bottom: #808080 1px solid; background-color: #ffffff"><img src="http://www.blogjava.net/Images/dot.gif"  alt="" /></span><span id="Codehighlighter1_2355_2658_Open_Text"><span style="color: #000000">{<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;String&nbsp;tempUrl&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;matcher.group();<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">if</span><span style="color: #000000">&nbsp;(</span><span style="color: #000000">!</span><span style="color: #0000ff">this</span><span style="color: #000000">.pageNameUrls.containsKey(tempUrl))<br />
<img id="Codehighlighter1_2472_2648_Open_Image" onclick="this.style.display='none'; Codehighlighter1_2472_2648_Open_Text.style.display='none'; Codehighlighter1_2472_2648_Closed_Image.style.display='inline'; Codehighlighter1_2472_2648_Closed_Text.style.display='inline';" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockStart.gif" align="top"  alt="" /><img id="Codehighlighter1_2472_2648_Closed_Image" style="display: none" onclick="this.style.display='none'; Codehighlighter1_2472_2648_Closed_Text.style.display='none'; Codehighlighter1_2472_2648_Open_Image.style.display='inline'; Codehighlighter1_2472_2648_Open_Text.style.display='inline';" src="http://www.blogjava.net/images/OutliningIndicators/ContractedSubBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span id="Codehighlighter1_2472_2648_Closed_Text" style="border-right: #808080 1px solid; border-top: #808080 1px solid; display: none; border-left: #808080 1px solid; border-bottom: #808080 1px solid; background-color: #ffffff"><img src="http://www.blogjava.net/Images/dot.gif"  alt="" /></span><span id="Codehighlighter1_2472_2648_Open_Text"><span style="color: #000000">{<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">//</span><span style="color: #008000">System.out.println(tempUrl);</span><span style="color: #008000"><br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" /></span><span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">this</span><span style="color: #000000">.pageNameUrls.put(tempUrl,&nbsp;</span><span style="color: #0000ff">this</span><span style="color: #000000">.getPageUrls(tempUrl));<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;test(tempUrl,&nbsp;baseUrl);<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockEnd.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}</span></span><span style="color: #000000"><br />
<img src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockEnd.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}</span></span><span style="color: #000000"><br />
<img src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockEnd.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;}</span></span><span style="color: #000000"><br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">public</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">static</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">void</span><span style="color: #000000">&nbsp;main(String[]&nbsp;args)<br />
<img id="Codehighlighter1_2718_3198_Open_Image" onclick="this.style.display='none'; Codehighlighter1_2718_3198_Open_Text.style.display='none'; Codehighlighter1_2718_3198_Closed_Image.style.display='inline'; Codehighlighter1_2718_3198_Closed_Text.style.display='inline';" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockStart.gif" align="top"  alt="" /><img id="Codehighlighter1_2718_3198_Closed_Image" style="display: none" onclick="this.style.display='none'; Codehighlighter1_2718_3198_Closed_Text.style.display='none'; Codehighlighter1_2718_3198_Open_Image.style.display='inline'; Codehighlighter1_2718_3198_Open_Text.style.display='inline';" src="http://www.blogjava.net/images/OutliningIndicators/ContractedSubBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;</span><span id="Codehighlighter1_2718_3198_Closed_Text" style="border-right: #808080 1px solid; border-top: #808080 1px solid; display: none; border-left: #808080 1px solid; border-bottom: #808080 1px solid; background-color: #ffffff"><img src="http://www.blogjava.net/Images/dot.gif"  alt="" /></span><span id="Codehighlighter1_2718_3198_Open_Text"><span style="color: #000000">{<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;String&nbsp;url&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;</span><span style="color: #000000">"</span><span style="color: #000000">http://www.blogjava.net</span><span style="color: #000000">"</span><span style="color: #000000">;<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;String&nbsp;baseUrl&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;</span><span style="color: #000000">"</span><span style="color: #000000">http://www.blogjava.net</span><span style="color: #000000">"</span><span style="color: #000000">;<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;SearchEngine&nbsp;se&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">new</span><span style="color: #000000">&nbsp;SearchEngine();<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;se.test(url,&nbsp;baseUrl);<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Map</span><span style="color: #000000">&lt;</span><span style="color: #000000">String,&nbsp;List</span><span style="color: #000000">&lt;</span><span style="color: #000000">String</span><span style="color: #000000">&gt;&gt;</span><span style="color: #000000">&nbsp;map</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;se.pageNameUrls;<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Set</span><span style="color: #000000">&lt;</span><span style="color: #000000">Map.Entry</span><span style="color: #000000">&lt;</span><span style="color: #000000">String,&nbsp;List</span><span style="color: #000000">&lt;</span><span style="color: #000000">String</span><span style="color: #000000">&gt;&gt;&gt;</span><span style="color: #000000">&nbsp;set&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;map.entrySet();<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">for</span><span style="color: #000000">(Map.Entry</span><span style="color: #000000">&lt;</span><span style="color: #000000">String,&nbsp;List</span><span style="color: #000000">&lt;</span><span style="color: #000000">String</span><span style="color: #000000">&gt;&gt;</span><span style="color: #000000">&nbsp;entry:&nbsp;set)<br />
<img id="Codehighlighter1_3084_3192_Open_Image" onclick="this.style.display='none'; Codehighlighter1_3084_3192_Open_Text.style.display='none'; Codehighlighter1_3084_3192_Closed_Image.style.display='inline'; Codehighlighter1_3084_3192_Closed_Text.style.display='inline';" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockStart.gif" align="top"  alt="" /><img id="Codehighlighter1_3084_3192_Closed_Image" style="display: none" onclick="this.style.display='none'; Codehighlighter1_3084_3192_Closed_Text.style.display='none'; Codehighlighter1_3084_3192_Open_Image.style.display='inline'; Codehighlighter1_3084_3192_Open_Text.style.display='inline';" src="http://www.blogjava.net/images/OutliningIndicators/ContractedSubBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span id="Codehighlighter1_3084_3192_Closed_Text" style="border-right: #808080 1px solid; border-top: #808080 1px solid; display: none; border-left: #808080 1px solid; border-bottom: #808080 1px solid; background-color: #ffffff"><img src="http://www.blogjava.net/Images/dot.gif"  alt="" /></span><span id="Codehighlighter1_3084_3192_Open_Text"><span style="color: #000000">{<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;System.out.println(entry.getKey());<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;System.out.println(entry.getValue());<br />
<img src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockEnd.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}</span></span><span style="color: #000000"><br />
<img src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockEnd.gif" align="top"  alt="" />&nbsp;&nbsp;&nbsp;&nbsp;}</span></span><span style="color: #000000"><br />
<img src="http://www.blogjava.net/images/OutliningIndicators/ExpandedBlockEnd.gif" align="top"  alt="" />}</span></span><span style="color: #000000"><br />
<img src="http://www.blogjava.net/images/OutliningIndicators/None.gif" align="top"  alt="" /></span></div>
<img src ="http://www.blogjava.net/hwpok/aggbug/214839.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/hwpok/" target="_blank">惠万鹏</a> 2008-07-14 23:24 <a href="http://www.blogjava.net/hwpok/archive/2008/07/14/214839.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Regular Expressions: Fetching All Images on a Web Page</title><link>http://www.blogjava.net/hwpok/archive/2008/04/30/197464.html</link><dc:creator>惠万鹏</dc:creator><author>惠万鹏</author><pubDate>Wed, 30 Apr 2008 02:58:00 GMT</pubDate><guid>http://www.blogjava.net/hwpok/archive/2008/04/30/197464.html</guid><wfw:comment>http://www.blogjava.net/hwpok/comments/197464.html</wfw:comment><comments>http://www.blogjava.net/hwpok/archive/2008/04/30/197464.html#Feedback</comments><slash:comments>2</slash:comments><wfw:commentRss>http://www.blogjava.net/hwpok/comments/commentRss/197464.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/hwpok/services/trackbacks/197464.html</trackback:ping><description><![CDATA[<div style="border: 1px solid #cccccc; padding: 4px 5px 4px 4px; background-color: #eeeeee; font-size: 13px; width: 98%;"><!--<br />
<br />
Code highlighting produced by Actipro CodeHighlighter (freeware)<br />
http://www.CodeHighlighter.com/<br />
<br />
--><span style="color: #0000ff;">package</span><span style="color: #000000;">&nbsp;com.roadway.test;<br />
<br />
</span><span style="color: #0000ff;">import</span><span style="color: #000000;">&nbsp;java.io.InputStream;<br />
</span><span style="color: #0000ff;">import</span><span style="color: #000000;">&nbsp;java.net.HttpURLConnection;<br />
</span><span style="color: #0000ff;">import</span><span style="color: #000000;">&nbsp;java.net.URL;<br />
</span><span style="color: #0000ff;">import</span><span style="color: #000000;">&nbsp;java.util.regex.Matcher;<br />
</span><span style="color: #0000ff;">import</span><span style="color: #000000;">&nbsp;java.util.regex.Pattern;<br />
<br />
</span><span style="color: #0000ff;">public</span><span style="color: #000000;">&nbsp;</span><span style="color: #0000ff;">class</span><span style="color: #000000;">&nbsp;TeskSRC&nbsp;{<br />
&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff;">public</span><span style="color: #000000;">&nbsp;String&nbsp;getHtmlCode(String&nbsp;httpUrl)&nbsp;{<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;String&nbsp;htmlCode&nbsp;</span><span style="color: #000000;">=</span><span style="color: #000000;">&nbsp;</span><span style="color: #000000;">""</span><span style="color: #000000;">;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff;">try</span><span style="color: #000000;">&nbsp;{<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;InputStream&nbsp;in;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;URL&nbsp;url&nbsp;</span><span style="color: #000000;">=</span><span style="color: #000000;">&nbsp;</span><span style="color: #0000ff;">new</span><span style="color: #000000;">&nbsp;java.net.URL(httpUrl);<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;HttpURLConnection&nbsp;connection&nbsp;</span><span style="color: #000000;">=</span><span style="color: #000000;">&nbsp;(HttpURLConnection)&nbsp;url<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.openConnection();<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;connection.setRequestProperty(</span><span style="color: #000000;">"</span><span style="color: #000000;">User-Agent</span><span style="color: #000000;">"</span><span style="color: #000000;">,&nbsp;</span><span style="color: #000000;">"</span><span style="color: #000000;">Mozilla/4.0</span><span style="color: #000000;">"</span><span style="color: #000000;">);<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;connection.connect();<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;in&nbsp;</span><span style="color: #000000;">=</span><span style="color: #000000;">&nbsp;connection.getInputStream();<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff;">byte</span><span style="color: #000000;">[]&nbsp;buffer&nbsp;</span><span style="color: #000000;">=</span><span style="color: #000000;">&nbsp;</span><span style="color: #0000ff;">new</span><span style="color: #000000;">&nbsp;</span><span style="color: #0000ff;">byte</span><span style="color: #000000;">[</span><span style="color: #000000;">512</span><span style="color: #000000;">];<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff;">int</span><span style="color: #000000;">&nbsp;length&nbsp;</span><span style="color: #000000;">=</span><span style="color: #000000;">&nbsp;</span><span style="color: #000000;">-</span><span style="color: #000000;">1</span><span style="color: #000000;">;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff;">while</span><span style="color: #000000;">((length&nbsp;</span><span style="color: #000000;">=</span><span style="color: #000000;">&nbsp;in.read(buffer,</span><span style="color: #000000;">0</span><span style="color: #000000;">,</span><span style="color: #000000;">512</span><span style="color: #000000;">))&nbsp;</span><span style="color: #000000;">!=</span><span style="color: #000000;">&nbsp;</span><span style="color: #000000;">-</span><span style="color: #000000;">1</span><span style="color: #000000;">){<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;htmlCode&nbsp;</span><span style="color: #000000;">+=</span><span style="color: #000000;">&nbsp;</span><span style="color: #0000ff;">new</span><span style="color: #000000;">&nbsp;String(buffer,</span><span style="color: #000000;">0</span><span style="color: #000000;">,length);<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;</span><span style="color: #0000ff;">catch</span><span style="color: #000000;">&nbsp;(Exception&nbsp;e)&nbsp;{<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;e.printStackTrace();<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff;">if</span><span style="color: #000000;">(htmlCode&nbsp;</span><span style="color: #000000;">==</span><span style="color: #000000;">&nbsp;</span><span style="color: #0000ff;">null</span><span style="color: #000000;">){<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff;">return</span><span style="color: #000000;">&nbsp;</span><span style="color: #000000;">""</span><span style="color: #000000;">;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff;">return</span><span style="color: #000000;">&nbsp;htmlCode;<br />
&nbsp;&nbsp;&nbsp;&nbsp;}<br />
&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff;">public</span><span style="color: #000000;">&nbsp;</span><span style="color: #0000ff;">static</span><span style="color: #000000;">&nbsp;</span><span style="color: #0000ff;">void</span><span style="color: #000000;">&nbsp;main(String[]&nbsp;args){<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;TeskSRC&nbsp;ts&nbsp;</span><span style="color: #000000;">=</span><span style="color: #000000;">&nbsp;</span><span style="color: #0000ff;">new</span><span style="color: #000000;">&nbsp;TeskSRC();<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;String&nbsp;searchImgReg&nbsp;</span><span style="color: #000000;">=</span><span style="color: #000000;">&nbsp;</span><span style="color: #000000;">"(?x)(src|SRC|background|BACKGROUND)=('|\")(http://([\\w-]+\\.)+[\\w-]+(:[0-9]+)*(/[\\w-]+)*(/[\\w-]+\\.(jpg|JPG|png|PNG|gif|GIF)))('|\")"</span><span style="color: #000000;">;</span><span style="color: #008000;"><br />
</span><span style="color: #000000;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;String&nbsp;content&nbsp;</span><span style="color: #000000;">=</span><span style="color: #000000;">&nbsp;ts.getHtmlCode(</span><span style="color: #000000;">"</span><span style="color: #000000;">http://www.163.com</span><span style="color: #000000;">"</span><span style="color: #000000;">);<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Pattern&nbsp;pattern&nbsp;</span><span style="color: #000000;">=</span><span style="color: #000000;">&nbsp;Pattern.compile(searchImgReg);<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Matcher&nbsp;matcher&nbsp;</span><span style="color: #000000;">=</span><span style="color: #000000;">&nbsp;pattern.matcher(content);<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff;">while</span><span style="color: #000000;">(matcher.find()){<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;System.out.println(matcher.group(</span><span style="color: #000000;">3</span><span style="color: #000000;">));<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000;">//</span><span style="color: #008000;">searchImgReg&nbsp;&nbsp;=&nbsp;"(?x)(src|SRC|background|BACKGROUND)=('|\")/?(([\\w-]+/)*([\\w-]+\\.(jpg|JPG|png|PNG|gif|GIF)))('|\")";</span><span style="color: #008000;"><br />
</span><span style="color: #000000;">&nbsp;&nbsp;&nbsp;&nbsp;}<br />
}<br />
</span></div>
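<br />The pattern in <code>main</code> above can be tried without a network connection. The following is a minimal sketch, not the author's code: the class name <code>ImgSrcDemo</code>, the helper <code>extractImageUrls</code>, and the sample input are hypothetical and exist only for illustration. It applies the same expression to an in-memory string and collects capture group 3, which holds the absolute image URL.<br />

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImgSrcDemo {
    // Same expression as searchImgReg in the listing above.
    // Group 1 is the attribute name, group 2 the opening quote,
    // group 3 the absolute image URL.
    static final Pattern IMG_URL = Pattern.compile(
        "(?x)(src|SRC|background|BACKGROUND)=('|\")"
        + "(http://([\\w-]+\\.)+[\\w-]+(:[0-9]+)*(/[\\w-]+)*"
        + "(/[\\w-]+\\.(jpg|JPG|png|PNG|gif|GIF)))('|\")");

    // Hypothetical helper: returns every matched URL, one per line.
    static String extractImageUrls(String html) {
        StringBuilder out = new StringBuilder();
        Matcher matcher = IMG_URL.matcher(html);
        while (matcher.find()) {
            if (out.length() != 0) {
                out.append('\n');
            }
            out.append(matcher.group(3));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up sample input; the relative path is skipped because
        // the pattern requires an http:// prefix.
        String html = "img src='http://img.example.com/pics/logo.png' "
            + "td background=\"http://example.com/bg.gif\" "
            + "img src='/relative/skip.jpg'";
        System.out.println(extractImageUrls(html));
    }
}
```

Note that the pattern does not check that the closing quote is the same character as the opening one, and it only accepts path segments made of word characters and hyphens, so URLs with query strings or dots in directory names are not matched.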
<br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/hwpok/" target="_blank">惠万鹏</a> 2008-04-30 10:58 <a href="http://www.blogjava.net/hwpok/archive/2008/04/30/197464.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item></channel></rss>