Non-greedy matching in grep - zhyiwww - 语源科技BlogJava

Recently, in a project, I wanted to use grep to extract all the hyperlinks from an HTML page,
for example from the following snippet:

<tr class=rb><td class=pl><a href=mail.htm>邮箱</a></td><td><a href=http://mail.163.com/>163邮箱</a>
<a href="http://cn.mail.yahoo.com/?id=40014" class="greenfont">雅虎邮箱</a><a href=http://www.126.com/>126邮箱</a>
　　<a href=http://mail.sina.com.cn/>新浪邮箱</a> 　　<a href=http://mail.qq.com/>QQ邮箱</a> 　　<a href=http://www.hotmail.com/>Hotmail</a></td><td><a href=mail.htm>更多 »</a></td></tr>

<tr class=ry><td class=pl><a href=wangmei.htm>视频</a></td><td><a href=http://www.youku.com/>优酷网</a>　　<a href="http://www.tudou.com/">土豆网</a>　　<a href="http://www.ku6.com/">酷6网</a>　　<a href=http://6.cn/>六间房</a>　　<a href=http://www.openv.com/>OpenV天线</a>　　<a href=http://www.joy.cn/>激动网</a></td><td><a href=wangmei.htm>更多 »</a></td></tr>

I want to retrieve every <a href=http:.*/> match in one pass. The command I used is:
C:\tmp>grep -ior "href=.*\/>" a.txt (press Enter)
The result is:
<tr class=rb><td class=pl><a href=mail.htm>邮箱</a></td><td><a href=http://mail.163.com/>163邮箱</a> 　　<a href="http://cn.mail.yahoo.com/?id=40014" class="greenfont">雅虎邮箱</a> 　　<a href=http://www.126.com/>126邮箱</a> 　　<a href=http://mail.sina.com.cn/>新浪邮箱</a> 　　<a href=http://mail.qq.com/>QQ邮箱</a> 　　<a href=http://www.hotmail.com/>Hotmail</a></td><td><a href=mail.htm>更多 »</a></td></tr>

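This happens because the pattern matches greedily. I wanted to match non-greedily instead, and tried to do so by appending \? after the * quantifier, as follows: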

C:\tmp>grep -ior "href=.*\?\/>" a.txt
The result is:
href=mail.htm>邮箱</a></td><td><a href=http://mail.163.com/>163邮箱</a> 　　<a href="http://cn.mail.yahoo.com/?id=40014
" class="greenfont">雅虎邮箱</a> 　　<a href=http://www.126.com/>126邮箱</a> 　　<a href=http://mail.sina.com.cn/>新浪邮
箱</a> 　　<a href=http://mail.qq.com/>QQ邮箱</a> 　　<a href=http://www.hotmail.com/>
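
A likely explanation for this output, offered as a sketch rather than a definitive diagnosis: in GNU grep's default basic regular expression syntax, `\?` is the "zero or one" operator, not a lazy modifier, so `.*\?` still matches greedily. The sample line and the function names below are hypothetical, and GNU grep is assumed.

```shell
# Hypothetical one-line sample, not the author's a.txt
line='<a href=one.htm>One</a><a href=http://example.com/>Two</a>'

# In GNU grep's basic regex syntax, \? means "zero or one of the preceding item",
# so 'ab\?c' matches both "ac" and "abc"
optional_demo() {
  printf 'ac\nabc\n' | grep -o 'ab\?c'
}

# .*\? is therefore still greedy: it yields a single match that starts at the
# first "href=" and runs to the last "/>" on the line
greedy_demo() {
  printf '%s\n' "$line" | grep -o 'href=.*\?/>'
}

optional_demo
greedy_demo
```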

The result I expected is:

href=mail.htm
href=http://mail.163.com/
href=
href=http://www.126.com/
href=http://mail.sina.com.cn/
href=http://mail.qq.com/
href=http://www.hotmail.com/
href=mail.htm
I don't know how to achieve this. If you have a solution, please advise. Thanks in advance.
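
One possible approach, as a minimal sketch assuming GNU grep: replace `.*` with a negated character class, so that a match cannot run past the end of a single attribute. The sample string and the `extract_hrefs` helper are hypothetical stand-ins for a.txt, and quoted values keep their quotes, so the output differs slightly from the expected list above.

```shell
# Hypothetical ASCII sample modeled on the rows above, not the author's a.txt
sample='<td><a href=mail.htm>Mail</a></td><td><a href=http://mail.163.com/>163</a> <a href="http://cn.mail.yahoo.com/?id=40014" class="greenfont">Yahoo</a></td>'

# [^ >]* cannot cross a space or a ">", so each match ends with its own attribute;
# -o prints every match on its own line, -i ignores case (HREF= also matches)
extract_hrefs() {
  grep -io 'href=[^ >]*'
}

printf '%s\n' "$sample" | extract_hrefs
# prints:
#   href=mail.htm
#   href=http://mail.163.com/
#   href="http://cn.mail.yahoo.com/?id=40014"
```

GNU grep builds with PCRE support also accept `-P`, which enables Perl-style lazy quantifiers such as `.*?`. A lazy `href=.*?/>` would still start at the first `href=` and run to the first `/>`, though, so the character-class form is probably the closer fit here.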

|----------------------------------------------------------------------------------------|
Copyright notice: all rights reserved @zhyiwww
When quoting, please cite the source: http://www.blogjava.net/zhyiwww
|----------------------------------------------------------------------------------------|

posted on 2008-09-26 13:25 zhyiwww, category: linux
