﻿<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/"><channel><title>语源科技BlogJava-Python, Java, Life, etc</title><link>http://www.blogjava.net/pyguru/</link><description>A blog of technology and life.</description><language>zh-cn</language><lastBuildDate>Wed, 15 Apr 2026 11:55:57 GMT</lastBuildDate><pubDate>Wed, 15 Apr 2026 11:55:57 GMT</pubDate><ttl>60</ttl><item><title>Parsing MIME &amp; HTML</title><link>http://www.blogjava.net/pyguru/archive/2005/02/19/1312.html</link><dc:creator>pyguru</dc:creator><author>pyguru</author><pubDate>Fri, 18 Feb 2005 16:33:00 GMT</pubDate><guid>http://www.blogjava.net/pyguru/archive/2005/02/19/1312.html</guid><wfw:comment>http://www.blogjava.net/pyguru/comments/1312.html</wfw:comment><comments>http://www.blogjava.net/pyguru/archive/2005/02/19/1312.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/pyguru/comments/commentRss/1312.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/pyguru/services/trackbacks/1312.html</trackback:ping><description><![CDATA[


<div class="header">
<a class="toplink" href="http://mipagina.cantv.net/lem/perl/index-en.html">
<img class="logo" alt="Logo" src="http://mipagina.cantv.net/lem/images/perl.gif"></a><span class="title">Parsing MIME &amp; HTML</span><br>

<span class="navbar"><a href="http://mipagina.cantv.net/lem/index-en.html"></a></span></div>
<div class="main"><p>Understanding an email message encoded with
MIME can be very, very, very difficult. It can get frustrating due to
the number of options and different ways to do the actual
encoding. Now add to that the sometimes too-liberal interpretations of
the relevant RFCs by the email client designers and you will begin to
get the idea. This article will show you how this task can be
laughably simple thanks to Perl's extensive bag of tricks, <a href="http://www.cpan.org/">CPAN<img alt="no alt defined" src="http://mipagina.cantv.net/lem/images/icon_external_link.gif" height="13" width="14"></a>.
</p>

<p>
I started out with a simple and straightforward mission: fetch an
email from a POP mailbox and display it on a 7-bit, text-only
device. This article walks through the stages of a simple tool
that accomplishes this task, written in Perl with a lot of help from
CPAN modules. I hope it will be useful to other Perl folks who
have a similar mission. Let's discuss each part of the task
in turn, as we read through <tt>mfetch</tt>, the script I prepared as
an example. Keep in mind that TIMTOWTDI (there is more than one way to do it).</p>


<h2>Setting up the script</h2>


<p>
The first step is loading all of the modules the script
will use. I'm sure you already know <tt>strict</tt> and
<tt>warnings</tt>. We'll see how the rest of the modules are used a
bit later.</p>
<pre class="listing">    1: #!/usr/bin/perl<br>    2: <br>    3: # This script is (c) 2002 Luis E. Muñoz, All Rights Reserved<br>    4: # This code can be used under the same terms as Perl itself. It comes<br>    5: # with absolutely NO WARRANTY. Use at your own risk.<br>    6: <br>    7: use strict;<br>    8: use warnings;<br>    9: use IO::File;<br>   10: use Net::POP3;<br>   11: use NetAddr::IP;<br>   12: use Getopt::Std;<br>   13: use MIME::Parser;<br>   14: use HTML::Parser;<br>   15: use Unicode::Map8;<br>   16: use MIME::WordDecoder;<br>   17: <br>   18: use vars qw($opt_s $opt_u $opt_p $opt_m $wd $e $map);<br>   19: <br>   20: getopts('s:u:p:m:');<br>   21: <br>   22: usage_die("-s server is required\n") unless $opt_s;<br>   23: usage_die("-u username is required\n") unless $opt_u;<br>   24: usage_die("-p password is required\n") unless $opt_p;<br>   25: usage_die("-m message is required\n") unless $opt_m;<br>   26: <br>   27: $opt_s = NetAddr::IP-&gt;new($opt_s)<br>   28:     or die "Cannot make sense of given server\n";<br></pre>


<p>
Note lines 27 and 28, where I use <a href="http://search.cpan.org/search?query=NetAddr%3A%3AIP&amp;mode=module">NetAddr::IP<img alt="no alt defined" src="http://mipagina.cantv.net/lem/images/icon_external_link.gif" height="13" width="14"></a> to convert whatever the user gave us through the
<tt>-s</tt> option into an IP address. This is a very common use of
this module, as its <tt>new()</tt> method will convert many common IP
notations into an object I can later extract an IP address from. It
will even perform a name resolution for us if required. So far,
everything should look familiar, as a lot of scripts start like this
one.
</p>
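<p>
To get a feel for what <tt>new()</tt> accepts, here is a minimal
sketch, assuming <tt>NetAddr::IP</tt> is installed. The addresses are
made-up examples and <tt>server_address()</tt> is a hypothetical helper
of mine that is not part of <tt>mfetch</tt>.</p>
<pre class="listing">use strict;
use warnings;
use NetAddr::IP;

# Hypothetical helper: turn whatever the user typed into a bare IP
# address, or return nothing if NetAddr::IP cannot make sense of it.
sub server_address
{
    my $spec = shift;
    my $ip = NetAddr::IP-&gt;new($spec) or return;
    return $ip-&gt;addr;
}

# A plain address, CIDR notation and an explicit netmask are all accepted.
for my $spec ('10.0.0.1', '192.168.1.10/24', '192.168.1.10/255.255.255.0')
{
    printf "%-28s =&gt; %s\n", $spec, server_address($spec);
}
</pre>
<p>
In every case <tt>addr()</tt> hands back only the address part, which
is what <tt>Net::POP3</tt> needs later on.</p>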

<p>
It is worth noting that the error handling in lines 22-25 is not a
brilliant example of good coding or documentation. It is much better
practice to write your scripts' documentation in POD, and use a module
such as <a href="http://search.cpan.org/search?query=Pod%3A%3AUsage&amp;mode=module">Pod::Usage<img alt="no alt defined" src="http://mipagina.cantv.net/lem/images/icon_external_link.gif" height="13" width="14"></a> to provide useful error messages to the user. At
the very least, try to provide an informative usage message. You can
see the <tt>usage_die()</tt> function if you <a href="http://mipagina.cantv.net/lem/perl/mfetch">download the complete script</a>.</p>
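<p>
As a rough idea of what the POD-based approach could look like, here
is a minimal sketch using the core <tt>Getopt::Std</tt> and
<tt>Pod::Usage</tt> modules. The <tt>parse_options()</tt> helper, the
sample command line and the synopsis text are assumptions of mine and
do not come from <tt>mfetch</tt>.</p>
<pre class="listing">use strict;
use warnings;
use Getopt::Std;
use Pod::Usage;

=head1 SYNOPSIS

  mfetch -s server -u username -p password -m message

=cut

# Parse a command line given as a list. Returns (\%options, undef) on
# success or (undef, $error_message) when something is wrong.
sub parse_options
{
    local @ARGV = @_;              # getopts() always works on @ARGV
    my %opt;
    getopts('s:u:p:m:', \%opt)
        or return (undef, 'Unknown option');
    my @missing = grep { !defined $opt{$_} } qw(s u p m);
    return (undef, 'Missing required option(s): '
                   . join(', ', map { "-$_" } @missing))
        if @missing;
    return (\%opt, undef);
}

# Sample command line without -p; a real script would pass @ARGV.
my ($opt, $error) = parse_options(qw(-s pop.foo.bar -u myself -m 5));

# 'NOEXIT' keeps this demonstration running. A real script would use
# something like -exitval =&gt; 2 so that it stops right here.
pod2usage(-message =&gt; $error, -verbose =&gt; 0, -exitval =&gt; 'NOEXIT')
    if $error;
</pre>
<p>
The usage text now lives in the POD, so the documentation and the
error message cannot drift apart.</p>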


<h2>Fetching a message via POP3</h2>


<p>
 On to deeper waters. The first step in parsing a message is getting
at the message itself. For this, I'll use <a href="http://search.cpan.org/search?query=Net%3A%3APOP3&amp;mode=module">Net::POP3<img alt="no alt defined" src="http://mipagina.cantv.net/lem/images/icon_external_link.gif" height="13" width="14"></a>, which implements the POP3 protocol described in
<a href="http://www.rfc-editor.org/rfc/rfc1939.txt">RFC-1939<img alt="no alt defined" src="http://mipagina.cantv.net/lem/images/icon_external_link.gif" height="13" width="14"></a>. This is all done in the code below.</p>
<pre class="listing">   30: my $pops = Net::POP3-&gt;new($opt_s-&gt;addr)<br>   31:     or die "Failed to connect to POP3 server: $!\n";<br>   32: <br>   33: $pops-&gt;login($opt_u, $opt_p)<br>   34:     or die "Authentication failed\n";<br>   35: <br>   36: my $fh = new_tmpfile IO::File<br>   37:     or die "Cannot create temporary file: $!\n";<br>   38: <br>   39: $pops-&gt;get($opt_m, $fh)<br>   40:     or die "No such message $opt_m\n";<br>   41: <br>   42: $pops-&gt;quit();<br>   43: $pops = undef;<br>   44: <br>   45: $fh-&gt;seek(0, SEEK_SET);<br></pre>


<p>
At line 30, a connection to the POP server is attempted. This is a
TCP connection, in this case to port 110. If this connection succeeds,
the <tt>USER</tt> and <tt>PASS</tt> commands are issued at line 33,
which are the simplest form of authentication supported by the POP
protocol. Your username and password are being sent here through the
network without the protection of cryptography, so a bit of caution is
in order.
</p>
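<p>
If the server offers it, APOP avoids sending the password itself. The
following is only a sketch: <tt>secure_login()</tt> and the
<tt>MFETCH_*</tt> environment variables are names I made up, and the
code assumes the server advertises APOP and that <tt>Digest::MD5</tt>
is available. APOP relies on MD5, so where the server supports it, an
SSL/TLS connection is the stronger choice.</p>
<pre class="listing">use strict;
use warnings;
use Net::POP3;

# Log in with APOP, which sends an MD5 digest instead of the password
# itself. Returns the connected object or dies with a reason.
sub secure_login
{
    my ($host, $user, $pass) = @_;
    my $pops = Net::POP3-&gt;new($host, Timeout =&gt; 30)
        or die "Failed to connect to POP3 server: $!\n";
    defined $pops-&gt;apop($user, $pass)
        or die "APOP authentication failed or not offered\n";
    return $pops;
}

# Only talk to a server when one is named in the environment.
if ($ENV{MFETCH_HOST})
{
    my $pops = secure_login(@ENV{qw(MFETCH_HOST MFETCH_USER MFETCH_PASS)});
    print "Logged in to $ENV{MFETCH_HOST}\n";
    $pops-&gt;quit;
}
</pre>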

<p>
<a href="http://search.cpan.org/search?query=Net%3A%3APOP3&amp;mode=module">Net::POP3<img alt="no alt defined" src="http://mipagina.cantv.net/lem/images/icon_external_link.gif" height="13" width="14"></a> supports many operations defined in the POP
protocol that allow for more complex actions, such as fetching the
list of messages, unseen messages, etc. It can also fetch messages for
us in a variety of ways. Since I want this script to be as lightweight
as possible (i.e., to burn as little memory as possible), I want to
fetch the message to a temporary on-disk file. The temporary file is
nicely provided by the <tt>new_tmpfile</tt> method of <a href="http://search.cpan.org/search?query=IO%3A%3AFile&amp;mode=module">IO::File<img alt="no alt defined" src="http://mipagina.cantv.net/lem/images/icon_external_link.gif" height="13" width="14"></a> in line 36, which returns a file handle to a
deleted file. I can work on this file, which will magically disappear
when the script is finished.
</p>

<p>
Later, I instruct the <tt>Net::POP3</tt> object to fetch the required
message from the mail server and write it to the supplied filehandle
using the <tt>get</tt> method, on line 39. After this, the connection
is terminated gracefully by invoking <tt>quit</tt> and destroying the
object. Destroying the object ensures that the TCP connection with the
server is terminated, freeing the resources being held in the POP
server as soon as possible. This is a good programming practice for
network clients.
</p>

<p>
The interaction required by <tt>mfetch</tt> with the POP server is
really simple, so I'm not doing justice to <tt>Net::POP3</tt>. It
provides a very complete implementation of the protocol, allowing for
much more sophisticated applications.
</p>

<p>
Note that in line 45, I <em>rewind</em> the file so that the fetched
message can be read back by the code that follows.
</p>
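<p>
The temporary file and the rewind can be tried on their own, without
a POP server. In this small sketch a <tt>print</tt> stands in for what
<tt>get</tt> would write; the message text is invented.</p>
<pre class="listing">use strict;
use warnings;
use IO::File;

# The file is already unlinked, so nothing is left behind on exit.
my $fh = IO::File-&gt;new_tmpfile
    or die "Cannot create temporary file: $!\n";

# Stand-in for what $pops-&gt;get() would write.
print $fh "Subject: rewind test\n\nHello from the temporary file.\n";

# The file position is now at the end, so reading would return nothing.
# seek() takes us back to the start.
$fh-&gt;seek(0, SEEK_SET);

my @lines = &lt;$fh&gt;;
print "Read back ", scalar(@lines), " lines\n";
</pre>
<p>
Without the call to <tt>seek</tt>, the read would start at the end of
the file and <tt>@lines</tt> would be empty.</p>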

<p>
For this particular example, we could also have used
<tt>Mail::POP3Client</tt>, which provides a somewhat similar
interface. The code would have looked more or less like the following
fragment.</p>
<pre class="listing">use IO::File;<br>use Mail::POP3Client;<br><br># Connect and log in with a single constructor call<br>my $pops = new Mail::POP3Client(USER =&gt; $opt_u,<br>                                PASSWORD =&gt; $opt_p,<br>                                HOST =&gt; $opt_s-&gt;addr)<br>    or die "Error connecting or logging in: $!\n";<br><br>my $fh = new_tmpfile IO::File<br>    or die "Cannot create temporary file: $!\n";<br><br># Write the headers and body of message $opt_m to the temporary file<br>$pops-&gt;HeadAndBodyToFile($fh, $opt_m)<br>    or die "Cannot fetch message: $!\n";<br><br>$pops-&gt;Close();<br></pre>

<h2>Parsing the MIME structure</h2>


<p>
Just as email travels inside a sort of envelope (the headers), complex
messages, such as those that carry attachments or HTML, travel
within a collection of MIME <em>entities</em>. You can think of these
entities as containers that can carry any kind of binary
information through the email infrastructure, which in general does
not know how to deal with 8-bit data. The code reproduced below takes
care of parsing this MIME structure.</p>
<pre class="listing">   47: my $mp = new MIME::Parser;<br>   48: $mp-&gt;ignore_errors(1);<br>   49: $mp-&gt;extract_uuencode(1);<br>   50: <br>   51: eval { $e = $mp-&gt;parse($fh); };<br>   52: my $error = ($@ || $mp-&gt;last_error);<br>   53: <br>   54: if ($error)<br>   55: {<br>   56:     $mp-&gt;filer-&gt;purge;                # Get rid of the temp files<br>   57:     die "Error parsing the message: $error\n";<br>   58: }<br></pre>


<p>
Perl has a wonderful class that understands
this MIME encapsulation. You access these facilities through the <a href="http://search.cpan.org/search?query=MIME%3A%3AParser&amp;mode=module">MIME::Parser<img alt="no alt defined" src="http://mipagina.cantv.net/lem/images/icon_external_link.gif" height="13" width="14"></a> class, part of the <a href="http://search.cpan.org/search?query=MIME-Tools&amp;mode=dist">MIME-Tools<img alt="no alt defined" src="http://mipagina.cantv.net/lem/images/icon_external_link.gif" height="13" width="14"></a> bundle. <tt>MIME::Parser</tt> returns a hierarchy
of <tt>MIME::Entity</tt> objects representing your message. The parser
is smart enough that if you pass it a non-MIME email, it hands it back
to you as a single <tt>text/plain</tt> entity.
</p>

<p>
<tt>MIME::Parser</tt> can be tweaked in many ways, as its
documentation will show you. One place where this tuning
matters is the decoding process. Remember that I need to
be as light in memory usage as possible. By default,
<tt>MIME::Parser</tt> writes the decoded parts of the message to
temporary files. These temporary files can be avoided, and core memory
used instead, by invoking <tt>output_to_core()</tt>. Before doing this,
note all the caveats cited in the module's documentation. The most
important one is that if a 100 MB file ends up in your inbox, the
whole thing has to be slurped into RAM.
</p>

<p>
In line 47 I create the parser object. The call to
<tt>ignore_errors()</tt> in line 48 is an attempt to make this parser
as tolerant as possible. The call to <tt>extract_uuencode()</tt> in line 49
makes the parser automatically decode any uuencoded pieces of the email,
translating them back into a more readable form. The actual request
to parse the message, which is read from the <tt>$fh</tt>
filehandle, is in line 51. Note that it is enclosed in an
<tt>eval</tt> block. I have to do this because the parser might throw an
exception if certain errors are encountered. The <tt>eval</tt> allows
me to catch this exception and react in a way that is sensible for this
application. In this case, I want to be sure that any temporary file
created by the parsing process is removed by the call to
<tt>purge()</tt> in line 56, before the script dies in line 57.</p>
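<p>
To see what the parser hands back without involving a mailbox, here
is a minimal sketch that parses a hand-written message held in a
string. It assumes MIME-Tools is installed; the message itself is
invented, and <tt>output_to_core()</tt> is used only because the
message is tiny.</p>
<pre class="listing">use strict;
use warnings;
use MIME::Parser;

# A tiny hand-written message, standing in for one fetched via POP3.
my $message = join "\n",
    'From: root@foo.bar',
    'Subject: Two renditions of the same text',
    'MIME-Version: 1.0',
    'Content-Type: multipart/alternative; boundary="XX"',
    '',
    '--XX',
    'Content-Type: text/plain; charset="ISO-8859-1"',
    '',
    'Plain rendition',
    '--XX',
    'Content-Type: text/html; charset="ISO-8859-1"',
    '',
    '&lt;p&gt;HTML rendition&lt;/p&gt;',
    '--XX--',
    '';

my $mp = MIME::Parser-&gt;new;
$mp-&gt;output_to_core(1);            # keep decoded bodies in memory
my $entity = $mp-&gt;parse_data($message);

print "Top level: ", $entity-&gt;head-&gt;mime_type, "\n";
print "     Part: ", $_-&gt;head-&gt;mime_type, "\n" for $entity-&gt;parts;
</pre>
<p>
The top-level entity is the multipart container, and each rendition
shows up as one of its parts.</p>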


<h2>Setting up the HTML parser</h2>


<p>
Parsing HTML can be a tricky and tedious task. Thankfully, Perl has a
number of nice ways to help you do this job. Excellent
books such as <em>The Perl Cookbook</em> (<a href="http://www.oreilly.com/catalog/cookbook/">from O'Reilly &amp;
Associates<img alt="no alt defined" src="http://mipagina.cantv.net/lem/images/icon_external_link.gif" height="13" width="14"></a>) have recipes that come very close to what I
needed, especially recipe 20.5, <em>"Converting HTML to ASCII"</em>,
which I reproduce below.</p>
<pre class="listing">use HTML::TreeBuilder;<br>use HTML::FormatText;<br><br>$html = HTML::TreeBuilder-&gt;new();<br>$html-&gt;parse($document);<br><br>$formatter = HTML::FormatText-&gt;new(leftmargin =&gt; 0, rightmargin =&gt; 50);<br><br>$ascii = $formatter-&gt;format($html);<br></pre>


<p>
I did not want to use this recipe for two reasons: I needed
fine-grained control over the HTML to ASCII conversion, and I wanted to
use as few resources as possible. I did a small <a href="http://mipagina.cantv.net/lem/perl/mbench">benchmark</a> that shows the performance
difference between the two options while parsing a copy of one of my web
articles. The result below shows that the custom parser explained
later runs about 45 times faster than the Cookbook's recipe. This does not mean that
the recipe or the modules it uses are bad. It simply means
that the recipe is doing a lot of additional work, which just
happens not to be all that useful for this particular task.</p>
<pre class="console">bash-2.05a$ ./mbench<br>Benchmark: timing 100 iterations of Cookbook's, Custom...<br>Cookbook's: 73 wallclock secs (52.82 usr +  0.00 sys = 52.82 CPU) @  1.89/s (n=100)<br>    Custom:  1 wallclock secs ( 1.17 usr +  0.00 sys =  1.17 CPU) @ 85.47/s (n=100)<br>             Rate Cookbook's     Custom<br>Cookbook's 1.89/s         --       -98%<br>Custom     85.5/s      4415%         --</pre>


<p>
<a href="http://search.cpan.org/search?query=HTML%3A%3AFormatText&amp;mode=module">HTML::FormatText<img alt="no alt defined" src="http://mipagina.cantv.net/lem/images/icon_external_link.gif" height="13" width="14"></a> does an awesome job of converting the HTML
to plain text. Unfortunately I have a set of guidelines that I need to
follow in the conversion and that are not compatible with the output
of this module. Additionally, <a href="http://search.cpan.org/search?query=HTML%3A%3ATreeBuilder&amp;mode=module">HTML::TreeBuilder<img alt="no alt defined" src="http://mipagina.cantv.net/lem/images/icon_external_link.gif" height="13" width="14"></a> does an excellent job of parsing an HTML
document, but produces an intermediate structure - the parse tree -
that, in my case, wastes resources.
</p>

<p>
However, Perl has an excellent HTML parser in the <a href="http://search.cpan.org/search?query=HTML%3A%3AParser&amp;mode=module">HTML::Parser<img alt="no alt defined" src="http://mipagina.cantv.net/lem/images/icon_external_link.gif" height="13" width="14"></a> module. In this case, I chose to use this class
to implement an event-driven parser, where tokens (syntactic elements)
in the source document cause the parser to call functions I
provide. This allowed me complete control over the translation while
sparing the intermediate data structure.
</p>

<p>
Converting HTML to text is a lossy transformation. This means that
what comes out of the transformation is not exactly equivalent to what
went in. Pictures, text layout, style and a few
other pieces of information are lost. For my needs, I had to note
the existence of images and produce a reasonably accurate rendition of
the page's text, but nothing else. Remember that the target device can
only display 7-bit text, on a very small and limited
display. This piece of code sets up the parser to do what I need.</p>
<pre class="listing">   62: my $parser = HTML::Parser-&gt;new<br>   63: (<br>   64:  api_version =&gt; 3,<br>   65:  default_h =&gt; [ "" ],<br>   66:  start_h   =&gt; [ sub { print "[IMG ", <br>   67:                       d($_[1]-&gt;{alt}) || $_[1]-&gt;{src},"]\n" <br>   68:                           if $_[0] eq 'img';<br>   69:                      }, "tagname, attr" ],<br>   70:  text_h    =&gt; [ sub { print d(shift); }, "dtext" ],<br>   71: ) or die "Cannot create HTML parser\n";<br>   72: <br>   73: $parser-&gt;ignore_elements(qw(script style));<br>   74: $parser-&gt;strict_comment(1);<br></pre>


<p>
Starting on line 62, I set up the <tt>HTML::Parser</tt>
object that will help me do this. First, I tell it I want to use the
latest (as of this writing) interface style, which provides more
flexibility than earlier interfaces. On line 65, I tell the object
that by default, no parse events should do anything. There are other
ways to say this, but the one shown is the most efficient.
</p>

<p>
Lines 66 through 69 define a handler for the <em>start</em>
events. This handler will be called each time an opening tag such as
<tt>&lt;a&gt;</tt> or <tt>&lt;img&gt;</tt> is recognized in the source
being parsed. Handlers are specified as a reference to an array whose
first element tells the parser what to do and whose second element
tells the parser what information to pass to the code. In this
example, I supply a function that, for any <tt>img</tt> tag, will
output a hopefully descriptive text built from either the
<tt>alt</tt> or the <tt>src</tt> attribute. I request this handler to
be called with the name of the tag as the first argument and a
reference to a hash of its attributes as the second, through the string <tt>"tagname,
attr"</tt> found in line 69. The <tt>d()</tt> function will be
explained a bit later, but it has to do with decoding its
argument.
</p>

<p>
The <em>text</em> event will be triggered by any text found outside of tags
in the input. I've set up a simpler handler for this event that
merely prints out whatever is recognized. I also request that HTML
entities such as <tt>&amp;euro;</tt> or <tt>&amp;ntilde;</tt> be
decoded for me through the string <tt>"dtext"</tt> on line 70. HTML
entities are used to represent special characters outside the
traditional ASCII range. In the interest of document accuracy, you
should always use entities instead of directly placing 8-bit
characters in the text.
</p>

<p>
Some syntactic elements are used to enclose information that is not
important for this application, such as
<tt>&lt;style&gt;...&lt;/style&gt;</tt> and
<tt>&lt;script&gt;...&lt;/script&gt;</tt>. I ask the parser to ignore
those elements with the call to <tt>ignore_elements()</tt> at line
73. I also request the parser to follow strict comment syntax through
the call to <tt>strict_comment()</tt> on line 74.</p>
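<p>
The same event-driven idea can be tried in isolation. This is a
minimal sketch, assuming <tt>HTML::Parser</tt> is installed. Instead of
printing, the hypothetical <tt>html_to_text()</tt> function collects
the output in a string, and it leaves out the <tt>d()</tt> recoding
step. The HTML fragment is made up.</p>
<pre class="listing">use strict;
use warnings;
use HTML::Parser;

binmode STDOUT, ':encoding(UTF-8)';

# Convert an HTML fragment to text, noting images as [IMG ...] markers.
sub html_to_text
{
    my $html = shift;
    my $out  = '';

    my $p = HTML::Parser-&gt;new
    (
     api_version =&gt; 3,
     start_h =&gt; [ sub {
                      my ($tag, $attr) = @_;
                      return unless $tag eq 'img';
                      $out .= '[IMG '
                            . ($attr-&gt;{alt} || $attr-&gt;{src} || '?') . ']';
                  }, "tagname, attr" ],
     text_h  =&gt; [ sub { $out .= shift }, "dtext" ],
    );
    $p-&gt;ignore_elements(qw(script style));
    $p-&gt;parse($html);
    $p-&gt;eof;                          # flush any buffered text
    return $out;
}

print html_to_text('&lt;p&gt;Mu&amp;ntilde;oz &lt;img src="logo.gif" alt="Logo"&gt;'
                   . '&lt;script&gt;alert("hi")&lt;/script&gt; says hi&lt;/p&gt;'), "\n";
</pre>
<p>
The <tt>&lt;p&gt;</tt> tags fire start events too, but the handler
ignores everything that is not an <tt>img</tt>, and the contents of
the <tt>script</tt> element never reach the text handler.</p>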


<h2>Setting up the Unicode mappings</h2>


<p>
MIME defines various ways to encode binary data depending on the
frequency of octets greater than 127. With relatively few high-bit
octets, <em>Quoted-Printable</em> encoding is used. When many high-bit
octets are present, <em>Base-64</em> encoding is used instead. The
reason is that Quoted-Printable leaves plain ASCII text readable but
spends three characters on every high-bit octet, while Base-64 is
completely unreadable by standard humans but adds a constant overhead
of about one third, whatever the data. Often, message headers such as the sender's name are encoded
using a variant of Quoted-Printable when they contain characters such as
'ñ'. These headers look like <tt>From:
=?ISO-8859-1?Q?Luis_Mu=F1oz?= &lt;some@body.org&gt;</tt> and should be
converted to <tt>From: Luis Muñoz &lt;some@body.org&gt;</tt>. In plain
English, Quoted-Printable encoding is being used to make the extended
ISO-8859-1 characters acceptable for any 7-bit transport such as
email. Many contemporary mail transport agents can properly handle
message bodies that contain high-bit octets but will choke on headers
with binary data, in case you were wondering what all the fuss is about.
</p>
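<p>
The trade-off between the two encodings is easy to check with
modules that ship with Perl. In this sketch the header is decoded
with the core <tt>Encode</tt> module rather than with
<tt>MIME::WordDecoder</tt>, which <tt>mfetch</tt> uses; that swap is
mine, made only to keep the example free of extra dependencies.</p>
<pre class="listing">use strict;
use warnings;
use MIME::QuotedPrint qw(encode_qp);
use MIME::Base64 qw(encode_base64);
use Encode qw(decode);

binmode STDOUT, ':encoding(UTF-8)';

# Mostly ASCII: Quoted-Printable stays readable and short.
# The empty second argument suppresses line breaks in the output.
my $name = "Luis Mu\xF1oz";
printf "QP:      %s\n", encode_qp($name, '');
printf "Base-64: %s\n", encode_base64($name, '');

# Nothing but high-bit octets: Quoted-Printable triples the size.
my $binary = "\xF1" x 30;
printf "QP needs %d characters, Base-64 needs %d\n",
    length(encode_qp($binary, '')), length(encode_base64($binary, ''));

# Decoding an encoded header with the core Encode module.
my $raw   = 'From: =?ISO-8859-1?Q?Luis_Mu=F1oz?= &lt;some@body.org&gt;';
my $clean = decode('MIME-Header', $raw);
print "$clean\n";
</pre>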

<p>
Lines 92 through 101 define <tt>setup_decoder()</tt>, which uses
the headers contained in a <a href="http://search.cpan.org/search?query=MIME%3A%3AHead&amp;mode=module">MIME::Head<img alt="no alt defined" src="http://mipagina.cantv.net/lem/images/icon_external_link.gif" height="13" width="14"></a> object to set up a suitable decoder based on the
<a href="http://search.cpan.org/search?query=MIME%3A%3AWordDecoder&amp;mode=module">MIME::WordDecoder<img alt="no alt defined" src="http://mipagina.cantv.net/lem/images/icon_external_link.gif" height="13" width="14"></a> class. This decoder translates instances of
Quoted-Printable text into their high-bit equivalents. Note that I
selected ISO-8859-1 as the default in case no proper character set can
be identified. This was a sensible choice for me, as ISO-8859-1
covers the characters used in Spanish, which happens to be my native language.</p>
<pre class="listing">   92: sub setup_decoder<br>   93: {<br>   94:     my $head = shift;<br>   95:     if ($head-&gt;get('Content-Type')<br>   96:         and $head-&gt;get('Content-Type') =~ m!charset="([^\"]+)"!)<br>   97:     {<br>   98:         $wd = supported MIME::WordDecoder uc $1;<br>   99:     }<br>  100:     $wd = supported MIME::WordDecoder "ISO-8859-1" unless $wd;<br>  101: }<br></pre>
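<p>
One thing to keep in mind is that the pattern on line 96 only matches
a charset written inside double quotes, while many mailers send it
unquoted. A more tolerant match might look like the sketch below;
<tt>charset_of()</tt> is a hypothetical helper, not part of
<tt>mfetch</tt>. <tt>MIME::Head</tt> can also do this work for you
through <tt>mime_attr('content-type.charset')</tt>.</p>
<pre class="listing">use strict;
use warnings;

# Hypothetical helper: pull the charset out of a Content-Type value,
# quoted or not, and fall back to ISO-8859-1 like setup_decoder() does.
sub charset_of
{
    my $content_type = shift || '';
    return uc $1
        if $content_type =~ m!charset\s*=\s*"?([^\s";]+)"?!i;
    return 'ISO-8859-1';
}

print charset_of($_), "\n"
    for 'text/plain; charset="iso-8859-15"',
        'text/html; charset=utf-8',
        'text/plain';
</pre>
<p>
The returned name can then be handed to <tt>supported
MIME::WordDecoder</tt>, exactly as on line 98.</p>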

<p>
But getting at the original
high-bit characters is not enough. I must recode these characters
into something usable by the 7-bit display device. So in line 76 I set
up a mapping based on <a href="http://search.cpan.org/search?query=Unicode%3A%3AMap8&amp;mode=module">Unicode::Map8<img alt="no alt defined" src="http://mipagina.cantv.net/lem/images/icon_external_link.gif" height="13" width="14"></a>. This module can convert characters in 8-bit character sets such
as ISO-8859-1 into wider characters (<a href="http://www.unicode.org/">Unicode<img alt="no alt defined" src="http://mipagina.cantv.net/lem/images/icon_external_link.gif" height="13" width="14"></a>) and then back into
our chosen representation, ASCII, which only defines 7-bit
characters. This means that any character that cannot be properly
represented will be lost, which is acceptable for our application.</p>
<pre class="listing">   76: $map = Unicode::Map8-&gt;new('ASCII')<br>   77:     or die "Cannot create character map\n";<br></pre>


<p>
The decoding and character mapping are then brought together at line
90, where I define the <tt>d()</tt> function. It simply invokes the
appropriate MIME decoding method, transforms the resulting string into
Unicode via the <tt>to16()</tt> method and then transforms it back
into <tt>ASCII</tt> using <tt>to8()</tt> to ensure printable results
on our device. Since I am allergic to warnings related to
<tt>undef</tt> values, I make sure that <tt>decode()</tt> always gets a
defined string to work with.</p>
<pre class="listing">   90: sub d { $map-&gt;to8($map-&gt;to16($wd-&gt;decode(shift||''))); }<br></pre>


<p>
As you might notice if you try this code, the conversion is again
lossy because there are characters that do not exist in
<tt>ASCII</tt>. You can experiment with the <tt>addpair()</tt> method
of <tt>Unicode::Map8</tt> in order to add custom character
transformations (e.g., '€' might become 'E'). Another way to achieve
this is to derive a class from <tt>Unicode::Map8</tt> and
implement the <tt>unmapped_to8</tt> method to supply your own
interpretation of the missing characters. Take a look at the module's
documentation for more information.</p>
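<p>
If <tt>Unicode::Map8</tt> is not at hand, the same idea of a lossy
recoding with a few custom replacements can be sketched with the core
<tt>Encode</tt> module. This is a different technique from the one
used in <tt>mfetch</tt>. The <tt>to_ascii()</tt> function and the
replacement table are my own illustrations.</p>
<pre class="listing">use strict;
use warnings;
use Encode qw(encode);

# Replacements for a few characters that ASCII lacks. Extend to taste.
my %replacement = (
    0x20AC =&gt; 'E',      # euro sign
    0x00F1 =&gt; 'n',      # n with tilde
    0x00D1 =&gt; 'N',      # N with tilde
);

# Recode a string into 7-bit ASCII. Characters without a replacement
# are dropped, which mimics the lossy behavior described above.
sub to_ascii
{
    my $text = shift;
    $text = '' unless defined $text;
    return encode('ascii', $text,
                  sub { exists $replacement{$_[0]} ? $replacement{$_[0]} : '' });
}

print to_ascii("Luis Mu\x{F1}oz owes me \x{20AC}5 for caf\x{E9}"), "\n";
</pre>
<p>
<tt>Encode</tt> calls the anonymous function once for each character
that has no ASCII equivalent, passing its code point, and uses
whatever the function returns in its place.</p>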


<h2>Starting the decode process</h2>


<p>
With all the pieces in place, all that's left is to traverse the
hierarchy of entities that <tt>MIME::Parser</tt> provides after
parsing a message. I implemented a very simple recursive function
<tt>decode_entities</tt> starting at line 103. This is a recursive
function because recursion comes naturally as a way to handle trees
such as those produced by <tt>MIME::Parser</tt>. At
least to me.</p>
<pre class="listing">  103: sub decode_entities<br>  104: {<br>  105:     my $ent = shift;<br>  106: <br>  107:     if (my @parts = $ent-&gt;parts)<br>  108:     {<br>  109:         decode_entities($_) for @parts;<br>  110:     }<br>  111:     elsif (my $body = $ent-&gt;bodyhandle)<br>  112:     {<br>  113:         my $type = $ent-&gt;head-&gt;mime_type;<br>  114: <br>  115:         setup_decoder($ent-&gt;head);<br>  116: <br>  117:         if ($type eq 'text/plain')<br>  118:         { print d($body-&gt;as_string); }<br>  119:         elsif ($type eq 'text/html')<br>  120:         { $parser-&gt;parse($body-&gt;as_string); }<br>  121:         else<br>  122:         { print "[Unhandled part of type $type]"; }<br>  123:     }<br>  124: }<br></pre>


<p>
The condition at line 107 asks if this part or entity contains
other parts. If it does, it extracts them and invokes itself
recursively to process each sub-part at line 109.
</p>

<p>
If this part is a leaf, its body is processed. Line 111 gets it as a
<a href="http://search.cpan.org/search?query=MIME%3A%3ABody&amp;mode=module">MIME::Body<img alt="no alt defined" src="http://mipagina.cantv.net/lem/images/icon_external_link.gif" height="13" width="14"></a> object. On line 115 I set up a decoder for this
part's encoding and, based on the type of this part, taken at line 113,
the code on lines 117 to 122 calls the proper handler.
</p>
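<p>
To watch the recursion at work, the following sketch builds a small
nested message by hand with <tt>MIME::Entity</tt> and walks it in the
same way, returning an indented outline instead of decoding the
bodies. It assumes MIME-Tools is installed; the message contents and
the <tt>outline()</tt> function are invented for the illustration.</p>
<pre class="listing">use strict;
use warnings;
use MIME::Entity;

# Build a small message by hand: text in two renditions plus an image.
my $alt = MIME::Entity-&gt;build(Type =&gt; 'multipart/alternative');
$alt-&gt;attach(Type =&gt; 'text/plain', Data =&gt; "Hello\n");
$alt-&gt;attach(Type =&gt; 'text/html',  Data =&gt; "&lt;p&gt;Hello&lt;/p&gt;\n");

my $top = MIME::Entity-&gt;build(Type    =&gt; 'multipart/mixed',
                              From    =&gt; 'root@foo.bar',
                              Subject =&gt; 'Nested parts');
$top-&gt;add_part($alt);
$top-&gt;attach(Type =&gt; 'image/gif', Encoding =&gt; 'base64', Data =&gt; 'GIF89a');

# Same recursion as decode_entities(), but returning an indented outline.
sub outline
{
    my ($ent, $depth) = @_;
    $depth ||= 0;
    my $indent = '  ' x $depth;
    my @lines  = ($indent . $ent-&gt;head-&gt;mime_type);
    push @lines, outline($_, $depth + 1) for $ent-&gt;parts;
    return @lines;
}

print "$_\n" for outline($top);
</pre>
<p>
Each level of nesting adds one level of indentation, which makes the
tree that <tt>decode_entities()</tt> has to traverse visible.</p>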

<p>
In order to fire the decoding process, I call
<tt>decode_entities()</tt> with the result of the MIME decoding of the
message on line 86. This will invoke the HTML parser when needed and
in general, produce the output I look for in this example. After this
processing is done, I make sure to wipe temporary files created by
<tt>MIME::Parser</tt> on line 88. Note that if the message is not
actually encoded with MIME, <tt>MIME::Parser</tt> will arrange for you
to receive a single part of type <tt>text/plain</tt> that contains the
whole message text, which is perfect for our application.</p>
<pre class="listing">   86: decode_entities($e);<br>   87: <br>   88: $mp-&gt;filer-&gt;purge;<br></pre>

<h2>And that's about it</h2>


<p>
With fewer than 130 lines of code, I can
easily fetch and decode a message, as in the following
example:</p>
<pre class="console">bash-2.05a$ <strong>./mfetch -s pop.foo.bar -u myself \<br>            -p very_secure_password -m 5</strong><br>Date: Sat, 28 Dec 2002 20:14:37 -0400<br>From: root &lt;root@foo.bar&gt;<br>To: myself@foo.bar<br>Subject: This is the plain subject<br><br>This is a boring and plain message.</pre>


<p>
More complex MIME messages can also be decoded. Look at this
example where I dissect a dreaded piece of junk mail, but don't
worry. I used <tt>head</tt> to spare you pages and pages of worthless
image links:</p>
<pre class="console">bash-2.05a$ <strong>./mfetch -s pop.foo.bar -u myself \<br>            -p very_secure_password -m 2 | head -20</strong><br><br>Date: Sun, 22 Dec 2002 23:22:25 -0400<br>From: Luis Muoz &lt;lem@foo.bar&gt;<br>To: Myself &lt;myself@foo.bar&gt;<br>Subject: Fwd: Get $860 Free - Come, Play, Have Fun!<br><br><br><br>Begin forwarded message:<br><br>&gt; From: Cosmic Offers &lt;munged@migada.com.INVALID&gt;;<br>&gt; Date: Sun Dec 22, 2002  20:59:43 America/Caracas<br>&gt; To: spam@victim.net<br>&gt; Subject: Get $860 Free - Come, Play, Have Fun!<br>&gt;<br><br>&gt;<br>[IMG http://www.migada.com/email/Flc_600_550_liberty_mailer_.gif]<br>[IMG http://www.migada.com/email/Flc_600_550_liberty_mail-02.gif]<br>[IMG http://www.migada.com/email/Flc_600_550_liberty_mail-03.gif]<br>[IMG http://www.migada.com/email/Flc_600_550_liberty_mail-04.gif]</pre>


<p>
If you're curious, please <a href="http://mipagina.cantv.net/lem/perl/mfetch">download the complete
script</a> and play with it a bit. I hope this tutorial and its
related script will be as helpful to you as they have been to me.</p></div>
<img src ="http://www.blogjava.net/pyguru/aggbug/1312.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/pyguru/" target="_blank">pyguru</a> 2005-02-19 00:33 <a href="http://www.blogjava.net/pyguru/archive/2005/02/19/1312.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Professional E-books</title><link>http://www.blogjava.net/pyguru/archive/2005/02/18/1297.html</link><dc:creator>pyguru</dc:creator><author>pyguru</author><pubDate>Thu, 17 Feb 2005 22:33:00 GMT</pubDate><guid>http://www.blogjava.net/pyguru/archive/2005/02/18/1297.html</guid><wfw:comment>http://www.blogjava.net/pyguru/comments/1297.html</wfw:comment><comments>http://www.blogjava.net/pyguru/archive/2005/02/18/1297.html#Feedback</comments><slash:comments>2</slash:comments><wfw:commentRss>http://www.blogjava.net/pyguru/comments/commentRss/1297.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/pyguru/services/trackbacks/1297.html</trackback:ping><description><![CDATA[&nbsp;&nbsp;&nbsp;&nbsp; Summary: Professional E-books. How to Search Chemical Abstracts: covers issue indexes, the content format and history of the issues, issue abstracts, issue indexes, volume (abstract) indexes, volume (auxiliary) indexes, index guides, cumulative indexes, source indexes, CA errata, the principles for choosing index headings in the chemical substance index, CA index search principles and a table of index relationships, and a discussion of worked CA search examples with appendices. 194 pages in total. Compiled by Peng Haiqing in 1978; it is dated, but still of reference value. Because it is in SuperStar (Chaoxing) format, the file is fairly large, ...&nbsp;&nbsp;<a href='http://www.blogjava.net/pyguru/archive/2005/02/18/1297.html'>Read more</a><img src ="http://www.blogjava.net/pyguru/aggbug/1297.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/pyguru/" target="_blank">pyguru</a> 2005-02-18 06:33 <a href="http://www.blogjava.net/pyguru/archive/2005/02/18/1297.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>BioJava In 
Anger</title><link>http://www.blogjava.net/pyguru/archive/2005/02/18/1296.html</link><dc:creator>pyguru</dc:creator><author>pyguru</author><pubDate>Thu, 17 Feb 2005 22:32:00 GMT</pubDate><guid>http://www.blogjava.net/pyguru/archive/2005/02/18/1296.html</guid><wfw:comment>http://www.blogjava.net/pyguru/comments/1296.html</wfw:comment><comments>http://www.blogjava.net/pyguru/archive/2005/02/18/1296.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/pyguru/comments/commentRss/1296.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/pyguru/services/trackbacks/1296.html</trackback:ping><description><![CDATA[

<div class="Section1">

<p class="MsoNormal"><span style="display: none;" lang="EN-US"><o:p>&nbsp;</o:p></span></p>

<table class="MsoNormalTable" style="width: 100%;" border="0" cellpadding="0" cellspacing="0" width="100%">
 <tbody><tr style="">
  <td style="padding: 0cm; width: 100%;" valign="top" width="100%">
  <h1 style="text-align: center;" align="center"><span class="SpellE"><span style="font-size: 36pt;" lang="EN-US">BioJava</span></span><span style="font-size: 36pt;" lang="EN-US"> <span class="GramE">In</span> Anger</span></h1>
  <h5 style="text-align: center;" align="center"><span style="font-size: 24pt;">Quick Guide</span></h5>
  <div class="MsoNormal" style="text-align: center;" align="center"><span lang="EN-US">
  <hr align="center" size="2" width="100%">
  </span></div>
  <h4><span style="font-size: 18pt;">Introduction:</span></h4>
  <p><span class="SpellE"><span style="font-size: 13.5pt;" lang="EN-US">BioJava</span></span><span style="font-size: 13.5pt;" lang="EN-US"> </span><span style="font-size: 13.5pt;">covers many areas of bioinformatics, which makes it large and complex, at times even intimidating. For bioinformaticians who want to get to grips with this powerful toolkit quickly, the sheer number of interfaces can be a real headache. This guide helps you write <span lang="EN-US">99</span>% of common programs with <span class="SpellE"><span lang="EN-US">BioJava</span></span> without having to master <span lang="EN-US">99</span>% of the <span class="SpellE"><span lang="EN-US">BioJava</span></span> interfaces.</span></p>
  <p><span style="font-size: 13.5pt;">This guide follows the format of most programming quick guides, using “How do I<span lang="EN-US">.....</span>?” topics to help you use <span class="SpellE"><span lang="EN-US">BioJava</span></span>. Each topic provides source code that you are likely to want and to use often. Most of this code can be compiled and run on your machine as is. I have tried to comment the code thoroughly so that any obscure parts are easier to understand.</span></p>
  <p><span style="font-size: 13.5pt;">“<span class="SpellE"><span lang="EN-US">BioJava</span></span><span lang="EN-US"> In Anger</span>” is maintained by <span lang="EN-US">Mark Schreiber</span>. Please send any suggestions and questions to the <span lang="EN-US"><a href="mailto:biojava-l@biojava.org">biojava mailing list</a></span>. Click <span lang="EN-US"><a href="http://biojava.org/mailman/listinfo/biojava-l"><span lang="EN-US"><span lang="EN-US">here</span></span></a></span> to subscribe to the mailing list.</span></p>
  <p><span style="font-size: 13.5pt;">The examples in this guide were tested with <span lang="EN-US">BioJava 1.3 and Java 1.4</span>.</span></p>
  <p><span style="font-size: 13.5pt;">The Chinese edition of this guide was translated by<span lang="EN-US"> Wu <span class="SpellE">Xin</span> (<a href="http://www.cbi.pku.edu.cn/">Center of Bioinformatics, Peking University</a>)</span>. For any translation problems, please get in touch through the mailing list or log in to the <span lang="EN-US"><a href="http://bbs.cbi.pku.edu.cn/">BBS</a></span>.</span></p>
  <div class="MsoNormal" style="text-align: center;" align="center"><span lang="EN-US">
  <hr align="center" size="2" width="100%">
  </span></div>
  <h2 style="text-align: center;" align="center">How do I<span lang="EN-US">.......?</span></h2>
  <h2><span style="font-size: 13.5pt;">Installation</span></h2>
  <p><span style="font-size: 13.5pt;" lang="EN-US">&gt; <a href="http://java.sun.com/downloads/"><span lang="EN-US"><span lang="EN-US">Install Java</span></span></a></span></p>
  <p><span style="font-size: 13.5pt;" lang="EN-US">&gt; <a href="http://www.biojava.org/docs/started.html"><span lang="EN-US"><span lang="EN-US">Install BioJava</span></span></a></span></p>
  <h4><span style="font-size: 13.5pt;"><span lang="EN-US">Alphabets</span> and <span lang="EN-US">Symbols</span></span></h4>
  <p><span style="font-size: 13.5pt;" lang="EN-US">&gt;<a href="http://www.digitalgene.net/archives/Cookbook/biojava/1.html"><span lang="EN-US"><span lang="EN-US">How do I get a DNA, RNA or protein alphabet?</span></span></a></span></p>
  <p><span style="font-size: 13.5pt;" lang="EN-US">&gt;<a href="http://www.digitalgene.net/archives/Cookbook/biojava/2.html"><span lang="EN-US"><span lang="EN-US">How do I build a custom alphabet from custom symbols?</span></span></a></span></p>
  <p><span style="font-size: 13.5pt;" lang="EN-US">&gt;<a href="http://www.digitalgene.net/archives/Cookbook/biojava/3.html"><span lang="EN-US"><span lang="EN-US">How do I build a cross product alphabet, such as a codon alphabet?</span></span></a></span></p>
  <p><span style="font-size: 13.5pt;" lang="EN-US">&gt;<a href="http://www.digitalgene.net/archives/Cookbook/biojava/4.html"><span lang="EN-US"><span lang="EN-US">How do I break symbols from a cross product alphabet into their component symbols?</span></span></a></span></p>
  <p><span style="font-size: 13.5pt;" lang="EN-US"><a href="http://www.digitalgene.net/archives/Cookbook/biojava/5.html">&gt;<span lang="EN-US"><span lang="EN-US">How do I tell whether two alphabets or two symbols are equal?</span></span></a></span></p>
  <p><span style="font-size: 13.5pt;" lang="EN-US"><a href="http://www.digitalgene.net/archives/Cookbook/biojava/6.html">&gt;<span lang="EN-US"><span lang="EN-US">How do I make an ambiguous symbol, such as Y or R?</span></span></a></span></p>
  <h4><span style="font-size: 13.5pt;">Basic Sequence Operations</span></h4>
  <p><span style="font-size: 13.5pt;" lang="EN-US"><a href="http://www.digitalgene.net/archives/Cookbook/biojava/7.html">&gt;<span lang="EN-US"><span lang="EN-US">How do I create a sequence object from a string, and write it back to a string?</span></span></a></span></p>
  <p><span style="font-size: 13.5pt;" lang="EN-US"><a href="http://www.digitalgene.net/archives/Cookbook/biojava/8.html">&gt;<span lang="EN-US"><span lang="EN-US">How do I get a subsequence from a sequence?</span></span></a></span></p>
  <p><span style="font-size: 13.5pt;" lang="EN-US"><a href="http://www.digitalgene.net/archives/Cookbook/biojava/9.html">&gt;<span lang="EN-US"><span lang="EN-US">How do I transcribe a DNA sequence to an RNA sequence?</span></span></a></span></p>
  <p><span style="font-size: 13.5pt;" lang="EN-US"><a href="http://www.digitalgene.net/archives/Cookbook/biojava/10.html">&gt;<span lang="EN-US"><span lang="EN-US">How do I get the complementary strand of a DNA or RNA sequence?</span></span></a></span></p>
  <p><span style="font-size: 13.5pt;" lang="EN-US"><a href="http://www.digitalgene.net/archives/Cookbook/biojava/11.html">&gt;<span lang="EN-US"><span lang="EN-US">Sequences are immutable, so how do I change a sequence's name?</span></span></a></span></p>
  <p><span lang="EN-US"><a href="http://www.digitalgene.net/archives/Cookbook/biojava/12.html">&gt;<span lang="EN-US"><span lang="EN-US">How do I edit a sequence or a SymbolList?</span></span></a></span></p>
  <h4><span style="font-size: 13.5pt;">Translation</span></h4>
  <p><span style="font-size: 13.5pt;" lang="EN-US"><a href="http://www.digitalgene.net/archives/Cookbook/biojava/13.html">&gt;<span lang="EN-US"><span lang="EN-US">How do I translate a DNA or RNA sequence, or a SymbolList, into protein?</span></span></a></span></p>
  <p><span style="font-size: 13.5pt;" lang="EN-US"><a href="http://www.digitalgene.net/archives/Cookbook/biojava/14.html">&gt;<span lang="EN-US"><span lang="EN-US">How do I translate a single codon into a single amino acid?</span></span></a></span></p>
  <p><span style="font-size: 13.5pt;" lang="EN-US"><a href="http://www.digitalgene.net/archives/Cookbook/biojava/15.html">&gt;<span lang="EN-US"><span lang="EN-US">How do I use a non-standard translation table?</span></span></a></span></p>
  <h4><span style="font-size: 13.5pt;">Sequence I/O</span></h4>
  <p><span style="font-size: 13.5pt;" lang="EN-US"><a href="http://www.digitalgene.net/archives/Cookbook/biojava/16.html">&gt;<span lang="EN-US"><span lang="EN-US">How do I write sequences in FASTA format?</span></span></a></span></p>
  <p><span style="font-size: 13.5pt;" lang="EN-US"><a href="http://www.digitalgene.net/archives/Cookbook/biojava/17.html">&gt;<span lang="EN-US"><span lang="EN-US">How do I read a FASTA file?</span></span></a></span></p>
  <p><span style="font-size: 13.5pt;" lang="EN-US"><a href="http://www.digitalgene.net/archives/Cookbook/biojava/18.html">&gt;<span lang="EN-US"><span lang="EN-US">How do I read a GenBank/EMBL/SwissProt file?</span></span></a></span></p>
  <p><span style="font-size: 13.5pt;" lang="EN-US"><a href="http://www.digitalgene.net/archives/Cookbook/biojava/19.html">&gt;<span lang="EN-US"><span lang="EN-US">How do I extract sequences from GenBank/EMBL/SwissProt files and write them out as FASTA?</span></span></a></span></p>
  <p><span style="font-size: 13.5pt;" lang="EN-US"><a href="http://www.digitalgene.net/archives/Cookbook/biojava/20.html">&gt;<span lang="EN-US"><span lang="EN-US">How do I turn an ABI sequence into a BioJava sequence?</span></span></a></span></p>
  <h4><span style="font-size: 13.5pt;">Annotation</span></h4>
  <p><span style="font-size: 13.5pt;" lang="EN-US"><a href="http://www.digitalgene.net/archives/Cookbook/biojava/21.html">&gt;<span lang="EN-US"><span lang="EN-US">How do I list the annotations of a sequence?</span></span></a></span></p>
  <p><span style="font-size: 13.5pt;" lang="EN-US"><a href="http://www.digitalgene.net/archives/Cookbook/biojava/22.html">&gt;<span lang="EN-US"><span lang="EN-US">How do I filter sequences by species (or by another annotation property)?</span></span></a></span></p>
  <h4><span style="font-size: 13.5pt;">Locations and Features</span></h4>
  <p><span lang="EN-US"><a href="http://www.digitalgene.net/archives/Cookbook/biojava/23.html">&gt;<span style="font-size: 13.5pt;" lang="EN-US"><span lang="EN-US">How do I specify a point location?</span></span></a></span></p>
  <p><span style="font-size: 13.5pt;" lang="EN-US"><a href="http://www.digitalgene.net/archives/Cookbook/biojava/24.html">&gt;<span lang="EN-US"><span lang="EN-US">How do I specify a range location?</span></span></a></span></p>
  <p><span style="font-size: 13.5pt;" lang="EN-US"><a href="http://www.digitalgene.net/archives/Cookbook/biojava/25.html">&gt;<span lang="EN-US"><span lang="EN-US">How do I use a circular location?</span></span></a></span></p>
  <p><span style="font-size: 13.5pt;" lang="EN-US"><a href="http://www.digitalgene.net/archives/Cookbook/biojava/26.html">&gt;<span lang="EN-US"><span lang="EN-US">How do I create a feature?</span></span></a></span></p>
  <p><span style="font-size: 13.5pt;" lang="EN-US"><a href="http://www.digitalgene.net/archives/Cookbook/biojava/27.html">&gt;<span lang="EN-US"><span lang="EN-US">How do I filter features by type?</span></span></a></span></p>
  <p><span style="font-size: 13.5pt;" lang="EN-US"><a href="http://www.digitalgene.net/archives/Cookbook/biojava/28.html">&gt;<span lang="EN-US"><span lang="EN-US">How do I remove features?</span></span></a></span></p>
  <h4><span style="font-size: 13.5pt;" lang="EN-US">BLAST</span><span style="font-size: 13.5pt;"> and <span lang="EN-US">FASTA</span></span></h4>
  <p><span style="font-size: 13.5pt;" lang="EN-US"><a href="http://www.digitalgene.net/archives/Cookbook/biojava/29.html">&gt;<span lang="EN-US"><span lang="EN-US">How do I create a BLAST parser?</span></span></a></span></p>
  <p><span style="font-size: 13.5pt;" lang="EN-US"><a href="http://www.digitalgene.net/archives/Cookbook/biojava/30.html">&gt;<span lang="EN-US"><span lang="EN-US">How do I create a FASTA parser?</span></span></a></span></p>
  <p><span style="font-size: 13.5pt;" lang="EN-US"><a href="http://www.digitalgene.net/archives/Cookbook/biojava/31.html">&gt;<span lang="EN-US"><span lang="EN-US">How do I extract information from parsed results?</span></span></a></span></p>
  <h4><span style="font-size: 13.5pt;">Counts and Distributions</span></h4>
  <p><span style="font-size: 13.5pt;" lang="EN-US"><a href="http://www.digitalgene.net/archives/Cookbook/biojava/32.html">&gt;<span lang="EN-US"><span lang="EN-US">How do I count the residues in a sequence?</span></span></a></span></p>
  <p><span style="font-size: 13.5pt;" lang="EN-US"><a href="http://www.digitalgene.net/archives/Cookbook/biojava/index-cn.html">&gt;<span lang="EN-US"><span lang="EN-US">How do I calculate the frequency of a symbol in a sequence?</span></span></a></span></p>
  <p><span style="font-size: 13.5pt;" lang="EN-US"><a href="http://www.digitalgene.net/archives/Cookbook/biojava/34.html">&gt;<span lang="EN-US"><span lang="EN-US">How do I turn a count into a distribution?</span></span></a></span></p>
  <p><span style="font-size: 13.5pt;" lang="EN-US"><a href="http://www.digitalgene.net/archives/Cookbook/biojava/35.html">&gt;<span lang="EN-US"><span lang="EN-US">How do I generate a random sequence from a distribution?</span></span></a></span></p>
  <p><span style="font-size: 13.5pt;" lang="EN-US"><a href="http://www.digitalgene.net/archives/Cookbook/biojava/36.html">&gt;<span lang="EN-US"><span lang="EN-US">How do I calculate the entropy of a distribution?</span></span></a></span></p>
  <p><span style="font-size: 13.5pt;" lang="EN-US"><a href="http://www.digitalgene.net/archives/Cookbook/biojava/37.html">&gt;<span lang="EN-US"><span lang="EN-US">Is there an easy way to tell whether two distributions have equal weights?</span></span></a></span></p>
  <p><span style="font-size: 13.5pt;" lang="EN-US"><a href="http://www.digitalgene.net/archives/Cookbook/biojava/38.html">&gt;<span lang="EN-US"><span lang="EN-US">How do I create an order N distribution over a custom alphabet?</span></span></a></span></p>
  <p><span style="font-size: 13.5pt;" lang="EN-US"><a href="http://www.digitalgene.net/archives/Cookbook/biojava/39.html">&gt;<span lang="EN-US"><span lang="EN-US">How do I write a distribution out as XML?</span></span></a></span></p>
  <h4><span style="font-size: 13.5pt;">Weight Matrices and Dynamic Programming</span></h4>
  <p><span style="font-size: 13.5pt;" lang="EN-US"><a href="http://www.digitalgene.net/archives/Cookbook/biojava/40.html">&gt;<span lang="EN-US"><span lang="EN-US">How do I use a weight matrix to find a motif?</span></span></a></span></p>
  <p><span style="font-size: 13.5pt;" lang="EN-US"><a href="http://www.digitalgene.net/archives/Cookbook/biojava/41.html">&gt;<span lang="EN-US"><span lang="EN-US">How do I create a profile HMM?</span></span></a></span></p>
  <p><span style="font-size: 13.5pt;" lang="EN-US"><a href="http://www.digitalgene.net/archives/Cookbook/biojava/42.html">&gt;<span lang="EN-US"><span lang="EN-US">How do I build a custom hidden Markov model (HMM)?</span></span></a></span></p>
  <h4><span style="font-size: 13.5pt;">User Interfaces</span></h4>
  <p><span style="font-size: 13.5pt;" lang="EN-US"><a href="http://www.digitalgene.net/archives/Cookbook/biojava/43.html">&gt;<span lang="EN-US"><span lang="EN-US">How do I display annotations and features as a tree?</span></span></a></span></p>
  <p><span style="font-size: 13.5pt;" lang="EN-US"><a href="http://www.digitalgene.net/archives/Cookbook/biojava/44.html">&gt;<span lang="EN-US"><span lang="EN-US">How do I display a sequence in a GUI?</span></span></a></span></p>
  <p><span style="font-size: 13.5pt;" lang="EN-US"><a href="http://www.digitalgene.net/archives/Cookbook/biojava/45.html">&gt;<span lang="EN-US"><span lang="EN-US">How do I display a sequence ruler?</span></span></a></span></p>
  <p><span style="font-size: 13.5pt;" lang="EN-US"><a href="http://www.digitalgene.net/archives/Cookbook/biojava/46.html">&gt;<span lang="EN-US"><span lang="EN-US">How do I display features?</span></span></a></span></p>
  <h4><span style="font-size: 13.5pt;" lang="EN-US">OBDA</span></h4>
  <p><span style="font-size: 13.5pt;" lang="EN-US"><a href="http://www.digitalgene.net/archives/Cookbook/biojava/47.html">&gt;<span lang="EN-US"><span lang="EN-US">How do I set up BioSQL?</span></span></a></span></p>
  <div class="MsoNormal" style="text-align: center;" align="center"><span lang="EN-US">
  <hr align="center" size="2" width="100%">
  </span></div>
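  <h4><span style="font-size: 13.5pt;">Disclaimer<span lang="EN-US">:</span></span></h4>
  <p><span style="font-size: 13.5pt;">This source code was contributed by various authors. Although we have tested it, it may still contain errors. All of the code is free to use, but we neither guarantee nor accept responsibility for its correctness. Please test it yourself before use.</span></p>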
  <div class="MsoNormal" style="text-align: center;" align="center"><span lang="EN-US">
  <hr align="center" size="2" width="100%">
  </span></div>
  <h4><span style="font-size: 13.5pt;">Copyright:</span></h4>
  <p><span style="font-size: 13.5pt;">The documentation on this site belongs to its contributors. If you wish to use it in a publication, please enquire first on the <span lang="EN-US"><a href="mailto:biojava-l@biojava.org">biojava mailing list</a></span>. The source code is <span lang="EN-US"><a href="http://www.opensource.org/docs/definition_plain.php"><span lang="EN-US"><span lang="EN-US">open source</span></span></a></span>, and you may use it freely if you agree to its terms.</span></p>
  <h5 style="text-align: center;" align="center"><span lang="EN-US"><o:p>&nbsp;</o:p></span></h5>
  </td>
 </tr>
 <tr style="height: 15pt;">
  <td style="padding: 0cm; background: red none repeat scroll 0% 50%; -moz-background-clip: initial; -moz-background-origin: initial; -moz-background-inline-policy: initial; height: 15pt;">
  <p style="text-align: center;" align="center"><span lang="EN-US">&nbsp;</span><span class="SpellE"><b><span style="color: white;" lang="EN-US">Maintained</span></b></span><b><span style="color: white;" lang="EN-US"> by Wu <span class="SpellE">Xin</span>, CBI,
  Peking University, China, 2003</span></b></p>
  </td>
 </tr>
</tbody></table>

<p><span lang="EN-US">&nbsp;</span></p>

</div>


<!-- #EndTemplate --><img src ="http://www.blogjava.net/pyguru/aggbug/1296.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/pyguru/" target="_blank">pyguru</a> 2005-02-18 06:32 <a href="http://www.blogjava.net/pyguru/archive/2005/02/18/1296.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Introduction to Bioperl</title><link>http://www.blogjava.net/pyguru/archive/2005/02/18/1295.html</link><dc:creator>pyguru</dc:creator><author>pyguru</author><pubDate>Thu, 17 Feb 2005 22:29:00 GMT</pubDate><guid>http://www.blogjava.net/pyguru/archive/2005/02/18/1295.html</guid><wfw:comment>http://www.blogjava.net/pyguru/comments/1295.html</wfw:comment><comments>http://www.blogjava.net/pyguru/archive/2005/02/18/1295.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/pyguru/comments/commentRss/1295.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/pyguru/services/trackbacks/1295.html</trackback:ping><description><![CDATA[<span id="ArticleContent1_ArticleContent1_lblContent">
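<p>Bioperl recently reached version 1.0. First, a word about bioperl.org: the organization was formally founded in 1995, having existed as an informal group for many years before that. It has since grown into an international association of developers who build open-source Perl tools for bioinformatics, genomics, and life-science research.</p>
<p>The organization is supported and promoted by the Open Bioinformatics Foundation. Its partner projects include biojava.org,
biopython.org, DAS, bioruby.org, biocorba.org, ENSEMBL and EMBOSS. </p>
<p>The Bioperl server provides Perl-based modules, scripts, and web-connected software for the life sciences. </p>
<p>Bioperl has grown into a remarkable international free-software project, and its use has accelerated research in bioinformatics, genomics, and other life sciences. Bioperl 1.0 was recently released after seven years of work, an impressive achievement. It comprises 832 files and 93 scripts, is rich in features, and is entirely open source. It is a powerful tool for bioinformatics research. For details, visit <a href="http://www.bioperl.org/">www.bioperl.org</a>.</p>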
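<p>As a set of tools and functions that extends Perl specifically for bioinformatics, Bioperl naturally inherits Perl's many strengths.</p>
<p>First, Perl's powerful regular expression matching and string handling make this work simple to a degree that no other language matches. Perl is very good at slicing, twisting, wringing, flattening, summarizing, and otherwise manipulating text files. Most biological data is text: species names, taxonomic relationships, gene or sequence annotations, comments, and bibliographic references. Even DNA sequences are text-like. Exchanging bioinformatics data that is stored as text but in mutually incompatible formats is a real headache, and Perl's strengths here solve many of these problems.</p>
<p>Second, Perl is forgiving. Biological data is often incomplete, and errors may be present from the moment the data is generated. A field may be missing or empty, a field may be expected to occur several times (an experiment may have been repeated, for example), or the data may have been entered by hand and contain mistakes. Perl does not mind if a value is empty or contains odd characters. Regular expressions can be written to pick up and correct a variety of common errors. Of course, this flexibility can also be a drawback. </p>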
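<p>The remark above about regular expressions picking up and correcting common errors can be illustrated with a short sketch. The input string and the rule of keeping only A, C, G, T and N are assumptions made for this example; they are not taken from Bioperl.</p>
<pre>
use strict;
use warnings;

# Hypothetical hand-entered record: mixed case, stray spaces,
# a gap character and trailing digits.
my $raw = " acgtn-ACGT 12\n";

# Upper-case the text, then delete everything that is not A, C, G, T or N.
(my $sequence = uc $raw) =~ s/[^ACGTN]//g;

print "$sequence\n";   # prints ACGTNACGT
</pre>
<p>Silently deleting characters is itself an instance of the flexibility the text warns about; a stricter script might count what it removed and report it.</p>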
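<p>In addition, Perl is component-oriented. Perl encourages people to write their software as small modules, whether as Perl library modules or in the classic Unix tool-oriented style. External programs can easily be integrated into a Perl program through a pipe, a system call, or a socket. The dynamic loader introduced with Perl5 lets people use C functions, or make entire compiled libraries available inside the Perl interpreter. A recent effort is collecting the work of people around the world into a set of modules called "bioPerl" (see the Perl Journal).<br>
Perl is easy to write and fast to develop in. The interpreter does not require you to declare all of your function prototypes and data types, an error is raised only when an undefined function is actually called, and the debugger works well with Emacs, which allows a comfortable interactive style of development.<br>
Perl is a good prototyping language. Because it is quick and dirty, it often makes more sense to prototype a new algorithm in Perl than to write it directly in a fast compiled language. Sometimes Perl turns out to be fast enough that the program never needs to be ported. More often, one can write a small core in C, compile it as a dynamically loaded module or an external executable, and do the rest in Perl. For an example, see <a href="http://waldo.wi.mit.edu/ftp/distribution/software/rhmapper/">http://waldo.wi.mit.edu/ftp/distribution/software/rhmapper/</a>. </p>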
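<p>As a minimal sketch of integrating an external program through a pipe, the snippet below starts a child process and reads its output line by line. The child here is simply a second copy of perl (<code>$^X</code>) printing one line; it stands in for whatever external tool would be used in practice and is an assumption of this example.</p>
<pre>
use strict;
use warnings;

# Start an external command and read what it prints through a pipe.
# The list form of open passes the arguments without going through a shell.
open(my $pipe, '-|', $^X, '-e', 'print "ACGT\n"')
    or die "Cannot start external program: $!\n";

while (my $line = &lt;$pipe&gt;) {
    chomp $line;
    print "external program said: $line\n";   # prints: external program said: ACGT
}

# close() returns false if the child exited with a non-zero status.
close($pipe) or warn "external program exited with status $?\n";
</pre>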
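<p>One point worth stressing is that Perl is very good for writing web CGI scripts, and this matters more and more as laboratories publish their data on the web. My experience of using Perl in a genome center has been positive from start to finish. However, I find that Perl has its problems too. Its relaxed programming style leads to many errors that stricter languages would catch. For example, Perl lets you use a variable before it has been assigned a value, which is a useful feature when you need it but a disaster when you have simply mistyped an identifier. Similarly, it is easy to forget to declare a local variable inside a function and so modify a global variable by accident.<br>Finally, Perl's weak spot is building graphical user interfaces. Although Unix diehards hold that everything can be done from the command line, most end users do not agree. Windows, menus, and bouncing icons have become a required fashion.</p>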
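<p>The two pitfalls described above (a mistyped variable name, and a forgotten local declaration that changes a global) are usually guarded against with the <code>strict</code> and <code>warnings</code> pragmas and <code>my</code> declarations. The following is a minimal sketch; the subroutine and variable names are invented for illustration.</p>
<pre>
use strict;     # undeclared variables become compile-time errors
use warnings;   # use of uninitialized values is reported at run time

my $total = 0;  # file-level variable

sub count_bases {
    my ($sequence) = @_;
    # "my" makes this $total local to the subroutine, so the
    # file-level $total above is left untouched.
    my $total = length $sequence;
    return $total;
}

print count_bases("ACGT"), "\n";   # prints 4
print "$total\n";                  # prints 0

# With "use strict", a mistyped name such as $totl would stop the
# program at compile time instead of silently creating a new variable.
</pre>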
<p>Until recently, GUI development in Perl was immature. However, Nick Ing-Simmons' work on integrating perlTK (pTK) has made Perl-driven user interfaces possible under X-window. My colleagues and I have written several pTK-based applications at the MIT genome center for Internet users, and it has been a satisfying experience from start to finish. Other genome centers use pTK on a much larger scale, and in some places it has become a mainstay of production.</p></span><img src ="http://www.blogjava.net/pyguru/aggbug/1295.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/pyguru/" target="_blank">pyguru</a> 2005-02-18 06:29 <a href="http://www.blogjava.net/pyguru/archive/2005/02/18/1295.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Bioinformatics -- Huang Yingwu (黄英武, PLA 306 Hospital), Guo Tao (过涛, Institute of Bioinformatics, Tsinghua University)</title><link>http://www.blogjava.net/pyguru/archive/2005/02/18/1294.html</link><dc:creator>pyguru</dc:creator><author>pyguru</author><pubDate>Thu, 17 Feb 2005 22:27:00 GMT</pubDate><guid>http://www.blogjava.net/pyguru/archive/2005/02/18/1294.html</guid><wfw:comment>http://www.blogjava.net/pyguru/comments/1294.html</wfw:comment><comments>http://www.blogjava.net/pyguru/archive/2005/02/18/1294.html#Feedback</comments><slash:comments>3</slash:comments><wfw:commentRss>http://www.blogjava.net/pyguru/comments/commentRss/1294.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/pyguru/services/trackbacks/1294.html</trackback:ping><description><![CDATA[<b>
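<h1>Bioinformatics</h1></b>
<p align="justify"><b><font face="宋体" lang="ZH-CN">Authors: Huang Yingwu (黄英武, PLA 306 Hospital) and Guo Tao (过涛, Institute of Bioinformatics, Tsinghua University)</font></b></p>

<p align="justify"><b><font face="宋体" lang="ZH-CN">Reviewer: Sun Zhirong (孙之荣, Institute of Bioinformatics, Tsinghua University)</font></b></p>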

<dir>
<p><b>1 <a href="http://www.digitalgene.net/archives/bi_hyw/1.htm">Overview</a></b></p>
<p><font face="宋体" lang="ZH-CN"><b>2 <a href="http://www.digitalgene.net/archives/bi_hyw/2.htm">Bioinformatics Databases and Queries </a></b></font></p>
<blockquote>
  <p><a href="http://www.digitalgene.net/archives/bi_hyw/2.htm"><font face="宋体" lang="ZH-CN"><b>2.1 Gene and Genome Databases</b> </font></a></p></blockquote>
<blockquote>
  <p><a href="http://www.digitalgene.net/archives/bi_hyw/2.htm"><font face="宋体" lang="ZH-CN"><b>2.2 Protein Databases</b> </font></a></p></blockquote>
<blockquote>
  <p><a href="http://www.digitalgene.net/archives/bi_hyw/2.htm"><font face="宋体" lang="ZH-CN"><b>2.3 Functional Databases</b> </font></a></p></blockquote>
<blockquote>
  <p><a href="http://www.digitalgene.net/archives/bi_hyw/2.htm"><font face="宋体" lang="ZH-CN"><b>2.4 Other Database Resources</b> </font></a></p></blockquote>
<p><font face="宋体" lang="ZH-CN"><b>3 <a href="http://www.digitalgene.net/archives/bi_hyw/3.htm">Sequence Alignment and Database Searching</a></b></font><a href="http://www.digitalgene.net/archives/bi_hyw/3.htm"><font face="宋体" lang="ZH-CN"> </font></a></p>
<blockquote>
  <p><a href="http://www.digitalgene.net/archives/bi_hyw/3.htm"><font face="宋体" lang="ZH-CN"><b>3.1 Pairwise Sequence Alignment</b> </font></a></p></blockquote>
<blockquote>
  <p><a href="http://www.digitalgene.net/archives/bi_hyw/3.htm"><font face="宋体" lang="ZH-CN"><b>3.2 Multiple Sequence Alignment</b></font></a><font face="宋体" lang="ZH-CN"> 
  </font></p></blockquote>
<p><font face="宋体" lang="ZH-CN"><b>4 <a href="http://www.digitalgene.net/archives/bi_hyw/4.htm">Prediction and Analysis of Nucleic Acid and Protein Structure and Function</a></b></font><a href="http://www.digitalgene.net/archives/bi_hyw/4.htm"><font face="宋体" lang="ZH-CN"> </font></a></p>
<blockquote>
  <p><a href="http://www.digitalgene.net/archives/bi_hyw/4.htm"><font face="宋体" lang="ZH-CN"><b>4.1 Prediction Methods for Nucleic Acid Sequences</b> </font></a></p>
  <p><a href="http://www.digitalgene.net/archives/bi_hyw/4.htm"><font face="宋体" lang="ZH-CN"><b>4.2 Prediction Methods for Proteins</b></font></a><font face="宋体" lang="ZH-CN"> 
  </font></p></blockquote><font face="宋体" lang="ZH-CN">
<p><b>5 <a href="http://www.digitalgene.net/archives/bi_hyw/5.htm">Molecular Evolution</a></b> 
</p>
<p><font face="宋体" lang="ZH-CN"><b>6 <a href="http://www.digitalgene.net/archives/bi_hyw/6.htm">Genome Sequence Analysis</a></b></font><a href="http://www.digitalgene.net/archives/bi_hyw/6.htm"><font face="宋体" lang="ZH-CN"> </font></a></p></font>
<blockquote>
  <p><a href="http://www.digitalgene.net/archives/bi_hyw/6.htm"><font face="宋体" lang="ZH-CN"><b>6.1 Genome Sequence Analysis Tools</b> </font></a></p>
  <p><a href="http://www.digitalgene.net/archives/bi_hyw/6.htm"><font face="宋体" lang="ZH-CN"><b>6.2 Using the Public Human and Mouse Physical Maps</b> </font></a></p>
  <p><a href="http://www.digitalgene.net/archives/bi_hyw/6.htm"><font face="宋体" lang="ZH-CN"><b>6.3 SNP Identification</b> </font></a></p>
  <p><a href="http://www.digitalgene.net/archives/bi_hyw/6.htm"><font face="宋体" lang="ZH-CN"><b>6.4 Whole-Genome Comparison</b> </font></a></p>
  <p><a href="http://www.digitalgene.net/archives/bi_hyw/6.htm"><font face="宋体" lang="ZH-CN"><b>6.5 Applications of EST Sequences</b> </font></a></p></blockquote><font face="宋体" lang="ZH-CN">
<p><b>7 <a href="http://www.digitalgene.net/archives/bi_hyw/7.htm">Functional Genomics Data Analysis</a></b><a href="http://www.digitalgene.net/archives/bi_hyw/7.htm"> </a></p></font>
<blockquote>
  <p><a href="http://www.digitalgene.net/archives/bi_hyw/7.htm"><font face="宋体" lang="ZH-CN"><b>7.1 Large-Scale Gene Expression Profiling</b> </font></a></p>
  <p><a href="http://www.digitalgene.net/archives/bi_hyw/7.htm"><font face="宋体" lang="ZH-CN"><b>7.2 Genome-Scale Integrated Prediction of Protein Function</b></font></a><font face="宋体" lang="ZH-CN"> </font></p></blockquote><font face="宋体" lang="ZH-CN">
<p><b><a href="http://www.digitalgene.net/archives/bi_hyw/8.htm">References</a></b><a href="http://www.digitalgene.net/archives/bi_hyw/8.htm"> </a></p></font><font face="宋体" lang="ZH-CN" size="3">

<p align="justify">　</p></font></dir>
<img src ="http://www.blogjava.net/pyguru/aggbug/1294.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/pyguru/" target="_blank">pyguru</a> 2005-02-18 06:27 <a href="http://www.blogjava.net/pyguru/archive/2005/02/18/1294.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>CGI::Carp - CGI routines for writing to the HTTPD (or other) error log</title><link>http://www.blogjava.net/pyguru/archive/2005/02/18/1293.html</link><dc:creator>pyguru</dc:creator><author>pyguru</author><pubDate>Thu, 17 Feb 2005 21:27:00 GMT</pubDate><guid>http://www.blogjava.net/pyguru/archive/2005/02/18/1293.html</guid><wfw:comment>http://www.blogjava.net/pyguru/comments/1293.html</wfw:comment><comments>http://www.blogjava.net/pyguru/archive/2005/02/18/1293.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/pyguru/comments/commentRss/1293.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/pyguru/services/trackbacks/1293.html</trackback:ping><description><![CDATA[

<!-- INDEX BEGIN -->

<ul>
<li><a href="http://www.perl.com/doc/manual/html/lib/CGI/Carp.html#NAME">NAME</a>
	</li><li><a href="http://www.perl.com/doc/manual/html/lib/CGI/Carp.html#SYNOPSIS">SYNOPSIS</a>
	</li><li><a href="http://www.perl.com/doc/manual/html/lib/CGI/Carp.html#DESCRIPTION">DESCRIPTION</a>
	</li><li><a href="http://www.perl.com/doc/manual/html/lib/CGI/Carp.html#REDIRECTING_ERROR_MESSAGES">REDIRECTING ERROR MESSAGES</a>
	</li><li><a href="http://www.perl.com/doc/manual/html/lib/CGI/Carp.html#MAKING_PERL_ERRORS_APPEAR_IN_THE">MAKING PERL ERRORS APPEAR IN THE BROWSER WINDOW</a>
	<ul><li><a href="http://www.perl.com/doc/manual/html/lib/CGI/Carp.html#Changing_the_default_message">Changing the default message</a>
	</li></ul>

	</li><li><a href="http://www.perl.com/doc/manual/html/lib/CGI/Carp.html#CHANGE_LOG">CHANGE LOG</a>
	</li><li><a href="http://www.perl.com/doc/manual/html/lib/CGI/Carp.html#AUTHORS">AUTHORS</a>
	</li><li><a href="http://www.perl.com/doc/manual/html/lib/CGI/Carp.html#SEE_ALSO">SEE ALSO</a>
</li>
</ul>

<!-- INDEX END -->

<hr>
<p>
</p>
<h1><a name="NAME">NAME</a></h1>

<p>
<strong>CGI::Carp</strong> - 
<font size="-1">CGI</font> routines for writing to the 
<font size="-1">HTTPD</font> (or other) error log

</p>
<p>
</p>
<hr>
<h1><a name="SYNOPSIS">SYNOPSIS</a></h1>

<p>
</p>
<pre>    use CGI::Carp;<br></pre>

<p>
</p>
<pre>    croak "We're outta here!";<br>    confess "It was my fault: $!";<br>    carp "It was your fault!";   <br>    warn "I'm confused";<br>    die  "I'm dying.\n";<br></pre>

<p>
</p>
<hr>
<h1><a name="DESCRIPTION">DESCRIPTION</a></h1>

<p>

<font size="-1">CGI</font> scripts have a nasty habit of leaving warning
messages in the error logs that are neither time stamped nor fully
identified. Tracking down the script that caused the error is a pain. This
fixes that. Replace the usual

</p>
<p>
</p>
<pre>    use Carp;<br></pre>

<p>
with

</p>
<p>
</p>
<pre>    use CGI::Carp;<br></pre>

<p>
And the standard 
<code>warn()</code>, 
<code>die()</code>, 
<code>croak()</code>, 
<code>confess()</code> and 
<code>carp()</code> calls will automagically be replaced with functions that write out nicely time-stamped messages to the 
<font size="-1">HTTP</font> server error log.

</p>
<p>
For example:

</p>
<p>
</p>
<pre>   [Fri Nov 17 21:40:43 1995] test.pl: I'm confused at test.pl line 3.<br>   [Fri Nov 17 21:40:43 1995] test.pl: Got an error message: Permission denied.<br>   [Fri Nov 17 21:40:43 1995] test.pl: I'm dying.<br></pre>

<p>
</p>
<hr>
<h1><a name="REDIRECTING_ERROR_MESSAGES">REDIRECTING ERROR MESSAGES</a></h1>

<p>
By default, error messages are sent to 
<font size="-1">STDERR.</font> Most 
<font size="-1">HTTPD</font> servers direct 
<font size="-1">STDERR</font> to the server's error log. Some
applications may wish to keep private error logs, distinct from the
server's error log, or they may wish to direct error messages to <font size="-1">STDOUT</font> so that the browser will receive them.

</p>
<p>
The <code>carpout()</code> function is provided for this purpose. Since 
<code>carpout()</code> is not exported by
default, you must import it explicitly by saying

</p>
<p>
</p>
<pre>   use CGI::Carp qw(carpout);<br></pre>

<p>
The 
<code>carpout()</code> function requires
one argument, which should be a reference to an open filehandle for writing
errors. It should be called in a <code>BEGIN</code> block at the top of the 
<font size="-1">CGI</font> application so that compiler errors will be
caught. Example:

</p>
<p>
</p>
<pre>   BEGIN {<br>     use CGI::Carp qw(carpout);<br>     open(LOG, "&gt;&gt;/usr/local/cgi-logs/mycgi-log") or<br>       die("Unable to open mycgi-log: $!\n");<br>     carpout(LOG);<br>   }<br></pre>

<p>

<code>carpout()</code> does not handle
file locking on the log for you at this point.

</p>
<p>
The real 
<font size="-1">STDERR</font> is not closed -- it is moved to 
<font size="-1">SAVEERR.</font> Some servers, when dealing with 
<font size="-1">CGI</font> scripts, close their connection to the browser when the script closes 
<font size="-1">STDOUT</font> and 
<font size="-1">STDERR.</font> 
<font size="-1">SAVEERR</font> is used to prevent this from happening prematurely.

</p>
<p>
You can pass filehandles to 
<code>carpout()</code> in a variety of ways. The ``correct'' way according to Tom Christiansen is to pass a reference to a filehandle 
<font size="-1">GLOB:</font>

</p>
<p>
</p>
<pre>    carpout(\*LOG);<br></pre>

<p>
This looks weird to mere mortals however, so the following syntaxes are
accepted as well:

</p>
<p>
</p>
<pre>    carpout(LOG);<br>    carpout(main::LOG);<br>    carpout(main'LOG);<br>    carpout(\LOG);<br>    carpout(\'main::LOG');<br></pre>

<p>
</p>
<pre>    ... and so on<br></pre>

<p>
FileHandle and other objects work as well.

</p>
<p>
Use of 
<code>carpout()</code> is not great for performance, so it is recommended for debugging purposes or for moderate-use applications. 
A future version of this module may delay redirecting 
<font size="-1">STDERR</font> until one of the CGI::Carp methods is called to prevent the performance hit.

</p>
<p>
</p>
<hr>
<h1><a name="MAKING_PERL_ERRORS_APPEAR_IN_THE">MAKING PERL ERRORS APPEAR IN THE BROWSER WINDOW</a></h1>

<p>
If you want to send fatal (die, confess) errors to the browser, ask to
import the special ``fatalsToBrowser'' subroutine:

</p>
<p>
</p>
<pre>    use CGI::Carp qw(fatalsToBrowser);<br>    die "Bad error here";<br></pre>

<p>
Fatal errors will now be echoed to the browser as well as to the log. CGI::Carp arranges to send a minimal 
<font size="-1">HTTP</font> header to the browser so that even errors
that occur in the early compile phase will be seen. Nonfatal errors
will still be directed to the log file only (unless redirected with
carpout).
</p>
<p>
</p>
<hr>
<!-- <A NAME="Changing_the_default_message">Changing the default message</A> -->
<p>By default, the software error message is followed by a note to
contact the Webmaster by e-mail with the time and date of the error. If
this message is not to your liking, you can change it using the <code>set_message()</code> routine. This is not imported by default; you should import it on the 
<code>use()</code> line:

</p>
<p>
</p>
<pre>    use CGI::Carp qw(fatalsToBrowser set_message);<br>    set_message("It's not a bug, it's a feature!");<br></pre>

<p>
You may also pass in a code reference in order to create a custom error
message. At run time, your code will be called with the text of the error
message that caused the script to die. Example:

</p>
<p>
</p>
<pre>    use CGI::Carp qw(fatalsToBrowser set_message);<br>    BEGIN {<br>       sub handle_errors {<br>          my $msg = shift;<br>          print "&lt;h1&gt;Oh gosh&lt;/h1&gt;";<br>          print "Got an error: $msg";<br>       }<br>       set_message(\&amp;handle_errors);<br>    }<br></pre>

<p>
In order to correctly intercept compile-time errors, you should call 
<code>set_message()</code> from within a 
<font size="-1">BEGIN{}</font> block.

</p>
<p>
</p>
<hr>
<h1><a name="CHANGE_LOG">CHANGE LOG</a></h1>

<p>
1.05 
<code>carpout()</code> added and minor
corrections by Marc Hedlund &lt;<a href="mailto:hedlund@best.com">hedlund@best.com</a>&gt; on 11/26/95.

</p>
<p>
1.06 
<code>fatalsToBrowser()</code> no longer aborts for fatal errors within 
<code>eval()</code> statements.

</p>
<p>
1.08 
<code>set_message()</code> added and 
<code>carpout()</code> expanded to allow for FileHandle objects.

</p>
<p>
1.09 
<code>set_message()</code> now allows users to pass a code 
<font size="-1">REFERENCE</font> for really custom error messages. croak and carp are now exported by default. Thanks to Gunther Birznieks for the patches.

</p>
<p>
1.10 Patch from Chris Dean (<a href="mailto:ctdean@cogit.com">ctdean@cogit.com</a>) to allow module to run
correctly under mod_perl.

</p>
<p>
</p>
<hr>
<h1><a name="AUTHORS">AUTHORS</a></h1>

<p>
Lincoln 
<font size="-1">D.</font> Stein &lt;<a href="mailto:lstein@genome.wi.mit.edu">lstein@genome.wi.mit.edu</a>&gt;
Feel free to redistribute this under the Perl Artistic License.

</p>
<p>
</p>
<hr>
<h1><a name="SEE_ALSO">SEE ALSO</a></h1>

<p>
Carp, CGI::Base, CGI::BasePlus, CGI::Request, CGI::MiniSvr, CGI::Form,
CGI::Response

</p>
<hr>
<h1>DISCLAIMER</h1>
 
We are painfully aware that these documents may contain incorrect links and
misformatted HTML.  Such bugs lie in the automatic translation process
that automatically created the hundreds and hundreds of separate documents that you find here.  Please <b>do
not report</b> link or formatting bugs, because we cannot fix
per-document problems.  The only bug reports that will help us are those
that supply working patches to the <i>installhtml</i> or <i>pod2html</i>
programs, or to the <tt>Pod::HTML</tt> module itself, for which I and the entire
Perl community will shower you with thanks and praises.  
<p>
If rather than formatting bugs, you encounter substantive content errors in these documents, such as mistakes in
the explanations or code, please use the <i>perlbug</i> utility included
with the Perl distribution.
</p>
<p>
</p>
<dl>
<dd>--Tom Christiansen, Perl Documentation Compiler and Editor</dd>
</dl>
 
<p>
</p>
<img src ="http://www.blogjava.net/pyguru/aggbug/1293.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/pyguru/" target="_blank">pyguru</a> 2005-02-18 05:27 <a href="http://www.blogjava.net/pyguru/archive/2005/02/18/1293.html#Feedback" target="_blank" style="text-decoration:none;">发表评论</a></div>]]></description></item><item><title>Perl 5 by Example</title><link>http://www.blogjava.net/pyguru/archive/2005/02/18/1292.html</link><dc:creator>pyguru</dc:creator><author>pyguru</author><pubDate>Thu, 17 Feb 2005 19:56:00 GMT</pubDate><guid>http://www.blogjava.net/pyguru/archive/2005/02/18/1292.html</guid><wfw:comment>http://www.blogjava.net/pyguru/comments/1292.html</wfw:comment><comments>http://www.blogjava.net/pyguru/archive/2005/02/18/1292.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/pyguru/comments/commentRss/1292.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/pyguru/services/trackbacks/1292.html</trackback:ping><description><![CDATA[&nbsp;&nbsp;&nbsp;&nbsp; 摘要: Perl 5by Exampleby David MedinetsC&nbsp;&nbsp;O&nbsp;&nbsp;N&nbsp;&nbsp;T&nbsp;&nbsp;E&nbsp;&nbsp;N&nbsp;&nbsp;T&nbsp;&nbsp;SChapter 1...&nbsp;&nbsp;<a href='http://www.blogjava.net/pyguru/archive/2005/02/18/1292.html'>阅读全文</a><img src ="http://www.blogjava.net/pyguru/aggbug/1292.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/pyguru/" target="_blank">pyguru</a> 2005-02-18 03:56 <a href="http://www.blogjava.net/pyguru/archive/2005/02/18/1292.html#Feedback" target="_blank" style="text-decoration:none;">发表评论</a></div>]]></description></item><item><title>Perl: The Carp Module</title><link>http://www.blogjava.net/pyguru/archive/2005/02/18/1291.html</link><dc:creator>pyguru</dc:creator><author>pyguru</author><pubDate>Thu, 17 Feb 2005 19:49:00 
GMT</pubDate><guid>http://www.blogjava.net/pyguru/archive/2005/02/18/1291.html</guid><wfw:comment>http://www.blogjava.net/pyguru/comments/1291.html</wfw:comment><comments>http://www.blogjava.net/pyguru/archive/2005/02/18/1291.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/pyguru/comments/commentRss/1291.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/pyguru/services/trackbacks/1291.html</trackback:ping><description><![CDATA[<h3><a name="ExampleTheTTFONTSIZEFACECourierCarpFONTTTFONTSIZEModuleFONT">

Example: The <tt>Carp</tt>

Module</a></h3>


<p>

This useful little module lets you do a better job of analyzing
runtime errors, such as when your script can't open a file or when
an unexpected input value is found. It defines the <tt>carp()</tt>,
<tt>croak()</tt>, and <tt>confess()</tt>
functions. These are similar to <tt>warn()</tt>
and <tt>die()</tt>. However, instead
of reporting the exact script line where the error occurred,
the functions in this module display the line number of the code that
called the function that generated the error. Confused? So was
I, until I did some experimenting. The results of that experimenting
can be found in Listing 15.6.

</p>
<p>



<img src="http://octopus.cdut.edu.cn/%7Eyf17/perl5ex/pseudo.gif" tppabs="http://202.113.16.101/%7eeb%7e/Perl%205%20By%20Example/pseudo.gif" align="right" border="1"></p>
<p>

</p>
<blockquote>

<i>Load the Carp module.<br>
Invoke the strict pragma.<br>
Start the Foo namespace.<br>
Define the </i><tt><i>foo()</i></tt><i>
function.<br>
Call the </i><tt><i>carp()</i></tt><i>
function.<br>
Call the </i><tt><i>croak()</i></tt><i>
function.<br>
Switch to the main namespace.<br>
Call the </i><tt><i>foo()</i></tt><i>
function.</i>

</blockquote>


<hr>

<blockquote>

<b>Listing 15.6&nbsp;&nbsp;15LST06.PL-Using the </b><tt><b><font face="Courier">carp()</font></b></tt><b>

and </b><tt><b><font face="Courier">croak()</font></b></tt><b>

from the </b><tt><b><font face="Courier">Carp Module<br>

</font></b></tt>

</blockquote>


<blockquote>

<pre>use Carp;<br>use strict;<br><br>package Foo;<br>    sub foo {<br>        main::carp("carp called at line " . __LINE__ .<br>            ",\n    but foo() was called");<br><br>        main::croak("croak called at line " . __LINE__ .<br>            ",\n    but foo() was called");<br>    }<br><br>package main;<br>    Foo::foo();<br><br></pre>

</blockquote>


<hr>

<p>

This program displays:

</p>
<blockquote>

<pre>carp called at line 9,<br>    but foo() was called at e.pl line 18<br>croak called at line 10,<br>    but foo() was called at e.pl line 18<br></pre>

</blockquote>


<p>

This example uses a compiler symbol, __LINE__, to incorporate
the current line number in the string passed to both <tt>carp()</tt>
and <tt>croak()</tt>. This technique
enables you to see both the line number where <tt>carp()</tt>
and <tt>croak()</tt> were called <i>and</i>
the line number where <tt>foo()</tt>
was called.

</p>
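<p>To see the difference from <tt>warn()</tt> side by side, here is a minimal sketch that is not from the book. The <tt>Checker</tt> package and its two function names are made up for illustration, and the messages are collected through a <tt>$SIG{__WARN__}</tt> handler so they can be printed together.</p>

```perl
use strict;
use warnings;
use Carp;

# Collect warnings instead of letting them go straight to STDERR.
my @messages;
local $SIG{__WARN__} = sub { push @messages, $_[0] };

package Checker;    # hypothetical package, for illustration only

# warn() reports the line inside this function.
sub check_with_warn { warn "bad value" }

# carp() reports the line of whoever called this function.
sub check_with_carp { Carp::carp("bad value") }

package main;

my $call_line = __LINE__ + 1;
Checker::check_with_carp();
Checker::check_with_warn();

print "the calls were made around line $call_line\n";
print @messages;
```

<p>The first message should name the line in <tt>main</tt> that made the call, while the second names the line inside <tt>Checker</tt> where <tt>warn()</tt> sits.</p>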
<p>

The <tt>Carp</tt> module also defines
a <tt>confess()</tt> function which
is similar to <tt>croak()</tt> except
that a function call history will also be displayed. Listing 15.7
shows how this function can be used. The function declarations
were placed after the <tt>foo()</tt>
function call so that the program flow reads from top to bottom
with no jumping around.

</p>
<p>

<img src="http://octopus.cdut.edu.cn/%7Eyf17/perl5ex/pseudo.gif" tppabs="http://202.113.16.101/%7eeb%7e/Perl%205%20By%20Example/pseudo.gif" align="right" border="1"></p>
<p>

</p>
<blockquote>

<i>Load the Carp module.<br>

Invoke the strict pragma.<br>

Call </i><tt><i>foo()</i></tt><i>.

<br>

Define </i><tt><i>foo()</i></tt><i>.

<br>

Call </i><tt><i>bar()</i></tt><i>.

<br>

Define </i><tt><i>bar()</i></tt><i>.

<br>

Call </i><tt><i>baz()</i></tt><i>.

<br>

Define </i><tt><i>baz()</i></tt><i>.

<br>

Call </i><tt><i>confess()</i></tt><i>.</i>

</blockquote>


<hr>

<blockquote>

<b>Listing 15.7&nbsp;&nbsp;15LST07.PL-Using </b><tt><b><font face="Courier">confess()</font></b></tt><b>

from the </b><tt><b><font face="Courier">Carp</font></b></tt><b>

Module<br>

</b>

</blockquote>


<blockquote>

<pre>use Carp;<br>use strict;<br><br>foo();<br><br>sub foo {<br>    bar();<br>}<br><br>sub bar {<br>    baz();<br>}<br><br>sub baz {<br>    confess("I give up!");<br>}<br></pre>

</blockquote>


<hr>

<p>

This program displays:

</p>
<blockquote>

<pre>I give up! at e.pl line 16<br>        main::baz called at e.pl line 12<br>        main::bar called at e.pl line 8<br>        main::foo called at e.pl line 5<br></pre>

</blockquote>


<p>

This daisy-chain of function calls was done to show you how the
function call history looks when displayed. The function call
history is also called a <i>stack trace</i>. As each function
is called, the address from which it is called gets placed on
a stack. When the <tt>confess()</tt>
function is called, the stack is read back, one caller at a time. This lets Perl
print the function call history.

</p>
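<p>If you would rather inspect the stack trace than let the script die, the <tt>confess()</tt> call can be wrapped in <tt>eval</tt>. The following is a small sketch under that assumption; the function names <tt>outer</tt>, <tt>middle</tt> and <tt>inner</tt> are placeholders and do not come from the listing above.</p>

```perl
use strict;
use warnings;
use Carp;

# Three nested calls, so the trace has something to show.
sub outer  { middle() }
sub middle { inner() }
sub inner  { confess("giving up") }

# confess() dies with the message plus the call history;
# eval catches the exception and leaves the text in $@.
my $trace = '';
eval { outer(); 1 } or $trace = $@;

print "Caught:\n$trace";
```

<p>The trace lists the innermost call first, so <tt>inner</tt> appears above <tt>middle</tt>, which appears above <tt>outer</tt>.</p>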
<img src ="http://www.blogjava.net/pyguru/aggbug/1291.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/pyguru/" target="_blank">pyguru</a> 2005-02-18 03:49 <a href="http://www.blogjava.net/pyguru/archive/2005/02/18/1291.html#Feedback" target="_blank" style="text-decoration:none;">发表评论</a></div>]]></description></item><item><title>Add RSS feeds to your Web site with Perl XML::RSS</title><link>http://www.blogjava.net/pyguru/archive/2005/02/17/1268.html</link><dc:creator>pyguru</dc:creator><author>pyguru</author><pubDate>Wed, 16 Feb 2005 19:04:00 GMT</pubDate><guid>http://www.blogjava.net/pyguru/archive/2005/02/17/1268.html</guid><wfw:comment>http://www.blogjava.net/pyguru/comments/1268.html</wfw:comment><comments>http://www.blogjava.net/pyguru/archive/2005/02/17/1268.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/pyguru/comments/commentRss/1268.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/pyguru/services/trackbacks/1268.html</trackback:ping><description><![CDATA[<span class="mdeck">
Guest Contributor, TechRepublic<br>
December  22, 2004<br>
URL: <a href="http://www.builderau.com.au/architect/webservices/0,39024590,39171461,00.htm">http://www.builderau.com.au/architect/webservices/0,39024590,39171461,00.htm</a><p>
</p></span>
<!-- Story Body BEGIN -->
<br>

<span class="mdeck">
<span class="mdeck"><img src="http://www.builderau.com.au/resources/images/TechRepublic150x36.gif" alt="TechRepublic" height="36" width="150"><br><br><span class="smdeck">Take advantage of the XML::RSS CPAN package, which is specifically designed to read and parse RSS feeds.</span>  
<p>
You've probably already heard of RSS, the XML-based format which allows
Web sites to publish and syndicate the latest content on their site to
all interested parties. RSS is a boon to the lazy Webmaster, because
(s)he no longer has to manually update his or her Web site with new
content.
</p><p>Instead, all a Webmaster has to do is plug in an RSS client,
point it to the appropriate Web sites, and sit back and let the site
"update itself" with news, weather forecasts, stock market data, and
software alerts. You've already seen, in <a href="http://www.builderau.com.au/architect/webservices/0,39024590,39131860,00.htm" target="_blank">previous articles</a>,
how you can use the ASP.NET platform to manually parse an RSS feed and
extract information from it by searching for the appropriate elements.
But I'm a UNIX guy, and I have something that's even better than
ASP.NET. It's called Perl.
</p><p>
<span class="subhead1">Installing XML::RSS</span>
<br>RSS parsing in Perl is usually handled by the XML::RSS CPAN
package. Unlike ASP.NET, which comes with a generic XML parser and
expects you to manually write RSS-parsing code, the XML::RSS package is
specifically designed to read and parse RSS feeds. When you give
XML::RSS an RSS feed, it converts the various &lt;item&gt;s in the feed
into array elements, and exposes numerous methods and properties to
access the data in the feed. XML::RSS currently supports versions 0.9,
0.91, and 1.0 of RSS.
</p><p>
Written entirely in Perl, XML::RSS isn't included with Perl by default, and you must install it from <a href="http://search.cpan.org/%7Ekellan/XML-RSS-1.05/lib/RSS.pm" target="_blank">CPAN</a>.
Detailed installation instructions are provided in the download
archive, but by far the simplest way to install it is to use the CPAN
shell, as follows:
</p><p>
<span class="code">
shell&gt; perl -MCPAN -e shell<br>
cpan&gt; install XML::RSS
</span>
</p><p>If you use the CPAN shell, dependencies will be automatically
downloaded for you (unless you told the shell not to download dependent
modules). If you manually download and install the module, you may need
to download and install the XML::Parser module before XML::RSS can be
installed. The examples in this tutorial also need the LWP::Simple
package, so you should download and install that one too if you don't
already have it.
</p><p>
<span class="subhead1">Basic usage</span>
<br>For our example, we'll assume that you're interested in displaying
the latest geek news from Slashdot on your site. The URL for Slashdot's
RSS feed is located <a href="http://slashdot.org/index.rss" target="_blank">here</a>. The script in <b>Listing A</b> retrieves this feed, parses it, and turns it into a human-readable HTML page using XML::RSS:
</p><p>
<b>Listing A</b>
</p><p>
</p><pre>#!/usr/bin/perl

# import packages
use strict;
use warnings;
use XML::RSS;
use LWP::Simple;

# initialize object
my $rss = XML::RSS-&gt;new;

# get RSS data
my $raw = get('http://www.slashdot.org/index.rss');
defined $raw or die "Could not retrieve the RSS feed\n";

# parse RSS feed
$rss-&gt;parse($raw);

# print HTML header and page
print "Content-Type: text/html\n\n";
print "&lt;html&gt;&lt;head&gt;&lt;basefont face=\"Arial\" size=\"8\"&gt;&lt;/head&gt;&lt;body&gt;";
print "&lt;table border=\"1\" cellpadding=\"5\" cellspacing=\"0\" width=\"300\"&gt;";
print "&lt;tr&gt;&lt;td align=\"center\" bgcolor=\"Silver\"&gt;" . $rss-&gt;channel('title') . "&lt;/td&gt;&lt;/tr&gt;";
print "&lt;tr&gt;&lt;td&gt;";

# print titles and URLs of news items
foreach my $item (@{$rss-&gt;{'items'}})
{
        my $title = $item-&gt;{'title'};
        my $url = $item-&gt;{'link'};
        print "&lt;a href=\"$url\"&gt;$title&lt;/a&gt;&lt;p&gt;";
}

# print footers
print "&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;";
print "&lt;/body&gt;&lt;/html&gt;";
</pre>
<p>
Place the script in your Web server's cgi-bin/ directory. Remember to
make it executable, and then browse to it using your Web browser. After
a short wait for the RSS file to download, you should see something
like <b>Figure A</b>.
</p><p>
</p><center>
<b>Figure A</b>
<p>

<img src="http://www.builderau.com.au/resources/images/rssfeedsa.gif"><br>
Slashdot RSS feed</p></center>
<p>
 
</p></span>
<span class="mdeck">How does the script in <b>Listing A</b> work? Well,
the first task is to get the RSS feed from the remote system to the
local one. This is accomplished with the LWP::Simple package, which
simulates an HTTP client and opens up a network connection to the
remote site to retrieve the RSS data. An XML::RSS object is created,
and this raw data is then passed to it for processing.
<p>
The various elements of the RSS feed are converted into Perl structures, and a <i>foreach()</i>
loop is used to iterate over the array of items. Each item contains
properties representing the item name, URL and description; these
properties are used to dynamically build a readable list of news items.
Each time Slashdot updates its RSS feed, the list of items displayed by
the script above will change automatically, with no manual intervention
required.
</p><p>
The script in <b>Listing A</b> will work with other RSS feeds as well: simply alter the URL passed to LWP::Simple's <i>get()</i> function, and watch as the list of items displayed by the script changes.
</p><p>
</p><hr width="100%">

<b>Here are some RSS feeds to get you started</b>
<p>
</p><ul><li><a href="http://www.builderau.com.au/feeds.htm" target="_blank">Builder AU</a>
</li><li><a href="http://www.thinkgeek.com/thinkgeek.rdf" target="_blank">Thinkgeek</a> 
</li><li><a href="http://www.cnet.com/4520-6022-5115113.html" target="_blank">CNET</a> 
</li><li><a href="http://www.syndic8.com/" target="_blank">Syndic8</a> 
</li><li><a href="http://www.weatherclicks.com/cgi-bin/weather/hw3.cgi?config=&amp;forecast=pass&amp;pass=tafINT" target="_blank">Local weather forecasts</a>
</li></ul>
<p>

<b>Tip:</b> Notice that the RSS channel name (and description) can be obtained with the object's <i>channel()</i> method, which accepts any one of three arguments (title, description or link) and returns the corresponding channel value.
 </p><hr width="100%">
<p>
<span class="subhead1">Adding multiple sources and optimising performance</span>
<br>
So that takes care of adding a feed to your Web site. But hey, why limit yourself to one when you can have many? <b>Listing B</b>, a revision of <b>Listing A</b>,
sets up an array containing the names of many different RSS feeds, and
iterates over the array to produce a page containing multiple channels
of information.
</p><p>
<b>Listing B</b>
</p><p>
</p><pre>#!/usr/bin/perl

# import packages
use strict;
use warnings;
use XML::RSS;

# local copies of the RSS feeds to display
# (file names are examples; download the files separately, e.g. with wget)
my @feeds = ('slashdot.rdf', 'freshmeat.rdf');

# print HTML header and page
print "Content-Type: text/html\n\n";
print "&lt;html&gt;&lt;head&gt;&lt;basefont face=\"Arial\" size=\"8\"&gt;&lt;/head&gt;&lt;body&gt;";

# produce one table per feed
foreach my $file (@feeds)
{
        # parse the local RSS file
        my $rss = XML::RSS-&gt;new;
        $rss-&gt;parsefile($file);

        print "&lt;table border=\"1\" cellpadding=\"5\" cellspacing=\"0\" width=\"300\"&gt;";
        print "&lt;tr&gt;&lt;td align=\"center\" bgcolor=\"Silver\"&gt;" . $rss-&gt;channel('title') . "&lt;/td&gt;&lt;/tr&gt;";
        print "&lt;tr&gt;&lt;td&gt;";

        # print titles and URLs of news items
        foreach my $item (@{$rss-&gt;{'items'}})
        {
                my $title = $item-&gt;{'title'};
                my $url = $item-&gt;{'link'};
                print "&lt;a href=\"$url\"&gt;$title&lt;/a&gt;&lt;p&gt;";
        }

        print "&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;p&gt;";
}

# print footers
print "&lt;/body&gt;&lt;/html&gt;";
</pre>
<p>
<b>Figure B</b> shows you what it looks like.
</p><p>
</p><center>
<b>Figure B</b>
<p>

<img src="http://www.builderau.com.au/resources/images/rssfeedsb.gif"><br>
Several RSS feeds 
</p></center>
<p>
You'll notice, if you're sharp-eyed, that <b>Listing B</b> uses the <i>parsefile()</i>
method to read a local version of the RSS file, instead of using LWP to
retrieve it from the remote site. This revision results in improved
performance, because it does away with the need to generate an internal
request for the RSS data source every time the script is executed.
Fetching the RSS file on each script run not only causes things to go
slow (because of the time taken to fetch the RSS file), but it's also
inefficient; it's unlikely that the source RSS file will change on a
minute-by-minute basis, and by fetching the same data over and over
again, you're simply wasting bandwidth. A better solution is to
retrieve the RSS data source once, save it to a local file, and use
that local file to generate your page.
</p><p>Depending on how often the source file gets updated, you can
write a simple shell script to download a fresh copy of the file on a
regular basis.
</p><p>
Here's an example of such a script:
</p><p>
<span class="code">
#!/bin/bash<br>

/bin/wget http://www.freshmeat.net/backend/fm.rdf -O freshmeat.rdf
</span>
</p><p> 
This script uses the <i>wget</i> utility (included with most Linux distributions) to download and save the RSS file to disk. Add this to your system <i>crontab</i>, and set it to run on an hourly or daily basis.
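</p>
<p>The same refresh could also be driven from Perl rather than wget. What follows is a minimal sketch and not part of the original article: the helper names <tt>is_stale</tt> and <tt>refresh_feed</tt>, the <tt>.tmp</tt> suffix and the idea of passing a maximum age in seconds are assumptions made for illustration. <tt>getstore()</tt> is the LWP::Simple function that saves a URL to a file and returns the HTTP status code.</p>

```perl
use strict;
use warnings;

# True if $file is missing or was last written more than $max_age seconds ago.
sub is_stale {
    my ($file, $max_age) = @_;
    return 1 unless -e $file;
    my $mtime = (stat $file)[9];
    return (time - $mtime) > $max_age ? 1 : 0;
}

# Download $url to $file only when the local copy is stale.
# Returns 1 if a new copy was saved, 0 otherwise.
sub refresh_feed {
    my ($url, $file, $max_age) = @_;
    return 0 unless is_stale($file, $max_age);

    require LWP::Simple;    # loaded only when a download is needed
    my $tmp = "$file.tmp";
    my $status = LWP::Simple::getstore($url, $tmp);
    if ($status == 200) {
        # replace the old copy only after a complete download
        rename $tmp, $file or die "Cannot rename $tmp: $!";
        return 1;
    }
    unlink $tmp;
    return 0;
}
```

<p>A CGI script could call <tt>refresh_feed('http://www.freshmeat.net/backend/fm.rdf', 'freshmeat.rdf', 3600)</tt> just before <tt>parsefile()</tt>, although running the download from cron keeps it out of the request path entirely.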
</p><p>If you find performance unacceptably low even after using local
copies of RSS files, you can take things a step further, by generating
a static HTML snapshot from the script above, and sending that to
clients instead. To do this, comment out the line printing the
"Content-Type" header in the script above and then run the script from
the console, redirecting the output to an HTML file. Here's how:
</p><p>
<span class="code">
$ ./rss.cgi &gt; static.html</span>
</p><p>Now, simply serve this HTML file to your users. Since the file
is a static file and not a script, no server-side processing takes
place before the server transmits it to the client. You can run the
command-line above from your <i>crontab</i>
to regenerate the HTML file on a regular basis. Performance with a
static file should be noticeably better than with a Perl script.
</p><p>
Looks easy? What are you waiting for—get out there and start hooking your site up to your favorite RSS news feeds.
</p></span></span><img src ="http://www.blogjava.net/pyguru/aggbug/1268.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/pyguru/" target="_blank">pyguru</a> 2005-02-17 03:04 <a href="http://www.blogjava.net/pyguru/archive/2005/02/17/1268.html#Feedback" target="_blank" style="text-decoration:none;">发表评论</a></div>]]></description></item><item><title>Lilina: Building a Personal Portal with an RSS Aggregator (Write once, publish anywhere)</title><link>http://www.blogjava.net/pyguru/archive/2005/02/17/1267.html</link><dc:creator>pyguru</dc:creator><author>pyguru</author><pubDate>Wed, 16 Feb 2005 19:00:00 GMT</pubDate><guid>http://www.blogjava.net/pyguru/archive/2005/02/17/1267.html</guid><wfw:comment>http://www.blogjava.net/pyguru/comments/1267.html</wfw:comment><comments>http://www.blogjava.net/pyguru/archive/2005/02/17/1267.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/pyguru/comments/commentRss/1267.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/pyguru/services/trackbacks/1267.html</trackback:ping><description><![CDATA[<h3>Lilina: Building a Personal Portal with an RSS Aggregator (Write once, publish anywhere)</h3>


<p>While collecting RSS parsing tools recently, I came across <a href="http://magpie.sourceforge.net/">MagPieRSS</a> and <a href="http://lilina.sourceforge.net/">Lilina</a>, which is built on top of it. Lilina's main features:</p>


<p>1. Web-based RSS management: adding and deleting feeds, OPML export, a background RSS caching mechanism (to avoid putting too much load on the source servers), and a scriptlet, a JavaScript bookmarklet for instant subscription similar to the one from del.icio.us;</p>


<p>2. Front-end publishing: I switched my home page over to Lilina, which publishes the weblogs of a few friends I read regularly and saves me a lot of work updating my own pages. It requires <strong>php 4.3 + mbstring iconv</strong><br>
<img alt="lilina.png" src="http://www.chedong.com/blog/archives/lilina.png" height="441" width="394"><br>
Open-source software keeps getting better at i18n: PHP 4.3.x built with '--enable-mbstring' '--with-iconv' copes reasonably well with RSS published in UTF-8 alongside RSS in other Chinese character sets.<br>
Thanks go to Steve for <a href="http://minutillo.com/steve/weblog/2004/6/17/php-xml-and-character-encodings-a-tale-of-sadness-rage-and-data-loss">his work on character-set conversion in PHP</a> and his XML hacking on <a href="http://magpierss.sourceforge.net/">MagPieRSS</a>. For now, at least, <a href="http://weblog.chedong.com/archives/000496.html">Add to My Yahoo! still does not handle RSS feeds in UTF-8 very well</a>.</p>


<p>I recall that early this year <a href="http://blog.timetide.net/">Wen Xin</a> introduced the concept of the <a href="http://www.wen-xin.net/document/blog-socialnetwork-personal-portal-wenxin.ppt">personal portal</a> at the CNBlog seminar. As RSS matures within CMS technology, more and more services let individual users build portals around their own needs, which fits the Internet's trend toward <a href="http://www.google.com/search?q=define%3Adecentralization">decentralization</a>. With the Add to My Yahoo! feature, for example, users can easily subscribe to news from many more sources. Imagine aggregating and publishing your own del.icio.us bookmarks, flickr photos and Yahoo! news through an RSS aggregator like this one, and how quickly that content would spread.</p>


<p>Just as software development achieved "write once, run anywhere" through an intermediate platform or virtual machine, information publishing now achieves "write once, publish anywhere" through RSS/XML as the intermediate layer.</p>


<div id="a000027more"><div id="more">
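<p>Installing Lilina requires PHP 4.3 or later with support for the iconv and mbstring functions. Please check your <a href="http://www.chedong.com/phpMan.php/phpinfo">PHP module support</a> for '--enable-mbstring' '--with-iconv'.</p>

<p>The other requirement is that the server must be able to send RPC requests to external servers, which 51.NET does not support. <a href="http://signup.powweb.com/powweb-bin/referer.cgi?account_id=100811">PowWeb's service</a> seems quite good, with many packages installed by default:</p>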

<p>iconv<br>
iconv support  enabled<br>
iconv implementation  unknown<br>
iconv library version  unknown</p>

<p>Directive Local Value Master Value<br>
iconv.input_encoding ISO-8859-1 ISO-8859-1<br>
iconv.internal_encoding ISO-8859-1 ISO-8859-1<br>
iconv.output_encoding ISO-8859-1 ISO-8859-1</p>

<p>mbstring<br>
Multibyte Support  enabled<br>
Japanese support  enabled<br>
Simplified chinese support  enabled<br>
Traditional chinese support  enabled<br>
Korean support  enabled<br>
Russian support  enabled<br>
Multibyte (japanese) regex support  enabled</p>

<p>Unpack the installation package (the downloaded file has a .gz extension but is really a .tgz, so rename it first) and upload it to the appropriate directory on the server. Note that the cache directory and the current directory must be writable. Then set the parameters in conf.php and you are ready to go.</p>

<p>Suggestions He Dong gave me:<br>
1) In the right-hand column, the first item, sources, should get an image, like hobby and the friend links have.<br>
2) The pile of search boxes looks cluttered; keep just one and move the others to a secondary page.<br>
3) Turn the contact information and the cc notice each into a single line or an image in the right-hand column; the details can go on a secondary page, since I suspect few people read that text closely.<br>
4) If possible, translate Lilina's header links into Chinese?</p>

<p>Some planned improvements:<br>
1. Trim overly long summaries, which can be done by looking for the second "&lt;/p&gt;&lt;p&gt;";<br>
2. Grouping: output RSS feeds in groups.</p>

<p>Changing the default display: Lilina shows articles published in the last day by default. To use a different time window, find this line and change it:<br>
$TIMERANGE = ( $_REQUEST['hours'] ? $_REQUEST['hours']*3600 : 3600*24 ) ;</p>

<p>RSS is a lightweight protocol that can aggregate all of your own resources: wiki, blog, mail. From now on, wherever you write, as long as there is an RSS interface the content can be gathered and republished in some way, which greatly improves the efficiency of personal knowledge management and of publishing and distribution.</p>

<p>My understanding of RSS used to be very shallow: isn't it just a DTD? Only when I really dug into parsers did I realize how important namespaces are. A good protocol should be like this: not that there is nothing left to add, but that there is certainly nothing left to take away. Actually achieving that is very, very hard.</p>

<p>I will also try the Java parsers and extend them into the <a href="http://sourceforge.net/projects/weblucene/">WebLucene</a> project. See also <a href="http://java-source.net/open-source/rss-rdf-tools">more open-source Java RSS parser resources</a>.</p>

<p>I also found two Perl packages for parsing RSS:<br>
<a href="http://search.cpan.org/%7Eebosrup/RSS-Parser-Lite/">XML::RSS::Parser::Lite</a> and <a href="http://search.cpan.org/%7Etima/XML-RSS-Parser-2.15/">XML::RSS::Parser</a></p>

<p>Sample code for XML::RSS::Parser::Lite:</p>

<p>#!/usr/bin/perl -w<br>
# $Id$<br>
# XML::RSS::Parser::Lite sample</p>

<p>use strict;<br>
use XML::RSS::Parser::Lite;<br>
use LWP::Simple;</p>

<p><br>
my $xml = get("http://www.klogs.org/index.xml");<br>
my $rp = new XML::RSS::Parser::Lite;<br>
$rp-&gt;parse($xml);</p>

<p># print blog header<br>
print "&lt;a href=\"".$rp-&gt;get('url')."\"&gt;" . $rp-&gt;get('title') . " - " . $rp-&gt;get('description') . "&lt;/a&gt;\n";</p>

<p># convert item to &lt;li&gt;<br>
print "&lt;ul&gt;";<br>
for (my $i = 0; $i &lt; $rp-&gt;count(); $i++) {<br>
        my $it = $rp-&gt;get($i);<br>
        print "&lt;li&gt;&lt;a href=\"" . $it-&gt;get('url') . "\"&gt;" . $it-&gt;get('title') . "&lt;/a&gt;&lt;/li&gt;\n";<br>
}<br>
print "&lt;/ul&gt;";</p>

<p>Installation:<br>
    Requires SOAP-Lite.</p>

<p>Pros:<br>
    Simple methods; works with remotely fetched feeds (the sample uses LWP::Simple for the download).</p>

<p>Cons:<br>
    Only the three fields title, url and description are supported; there is no support for date fields.</p>

<p>Planned use: the design of a simple RSS fetch-and-sync service, where everyone can publish the RSS feeds they subscribe to.</p>
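
<p>The missing date field and the namespace point made earlier can be illustrated with a short sketch. This is not how either Perl module works internally; it is a minimal Python example, assuming the standard library's xml.etree.ElementTree and a made-up sample feed, that reads <code>pubDate</code> and falls back to the namespaced <code>dc:date</code>. The names <code>SAMPLE_RSS</code> and <code>item_dates</code> are hypothetical.</p>

```python
import xml.etree.ElementTree as ET

# Standard Dublin Core element namespace, commonly bound to the dc: prefix.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical sample feed used only for illustration.
SAMPLE_RSS = """<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Example blog</title>
    <link>http://example.org/</link>
    <description>Sample channel</description>
    <item>
      <title>First post</title>
      <link>http://example.org/1</link>
      <pubDate>Sat, 11 Dec 2004 00:34:00 GMT</pubDate>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.org/2</link>
      <dc:date>2004-12-19T16:40:00Z</dc:date>
    </item>
  </channel>
</rss>"""


def item_dates(xml_text):
    """Return (title, date string) pairs, preferring pubDate over dc:date."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    results = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        date = item.findtext("pubDate")
        if date is None:
            # Namespaced lookup: ElementTree expects the {uri}localname form.
            date = item.findtext("{%s}date" % DC_NS)
        results.append((title, date))
    return results


if __name__ == "__main__":
    for title, date in item_dates(SAMPLE_RSS):
        print(title, "|", date)
```

<p>The dates are returned as raw strings; turning them into comparable timestamps is left out because the two elements use different date formats.</p>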

<p>Sample code for XML::RSS::Parser:</p>

<p>#!/usr/bin/perl -w<br>
# $Id$<br>
# XML::RSS::Parser sample with Iconv charset conversion</p>

<p>use strict;<br>
use XML::RSS::Parser;<br>
use Text::Iconv;<br>
# convert the UTF-8 feed text to GBK for output<br>
my $converter = Text::Iconv-&gt;new("utf-8", "gbk");</p>

<p>my $p = XML::RSS::Parser-&gt;new;<br>
my $feed = $p-&gt;parsefile('index.xml');</p>

<p># output some values<br>
my $title = XML::RSS::Parser-&gt;ns_qualify('title',$feed-&gt;rss_namespace_uri);<br>
# this line may raise an error: print $feed-&gt;channel-&gt;children($title)-&gt;value."\n";<br>
print "item count: ".$feed-&gt;item_count()."\n\n";<br>
foreach my $i ( $feed-&gt;items ) {<br>
   foreach my $child ( $i-&gt;children ) {<br>
      print $child-&gt;name.": ".$converter-&gt;convert($child-&gt;value)."\n";<br>
   }<br>
   print "\n";<br>
}</p>

<p>Pros:<br>
    Outputs the data field by field and offers a lower-level interface.</p>

<p>Cons:<br>
    Cannot parse a remote RSS feed directly; the feed has to be downloaded first and then parsed.</p>

<p>2004-12-14: <br>
From a Trackback on cnblog I learned about the <a href="http://planetplanet.org/">Planet RSS aggregator</a>.</p>

<p>Installing Planet: after unpacking, run python planet.py examples/config.ini directly in that directory. The output for the default sample feeds, index.html, then appears in the output directory, along with opml.xml, rss.xml and other outputs (a nice touch).</p>

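<p>I tried it with several RSS feeds. The UTF-8 ones were fine, but every GBK feed came out garbled. The only code in planetlib.py that deals with the XML character set is the following. It appears that everything that is not UTF-8 is treated as iso8859_1: since any byte sequence decodes as ISO-8859-1 without error, the second attempt always succeeds and the final ASCII fallback is never reached:<br>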
        try:<br>
            data = unicode(data, "utf8").encode("utf8")<br>
            logging.debug("Encoding: UTF-8")<br>
        except UnicodeError:<br>
            try:<br>
                data = unicode(data, "iso8859_1").encode("utf8")<br>
                logging.debug("Encoding: ISO-8859-1")<br>
            except UnicodeError:<br>
                data = unicode(data, "ascii", "replace").encode("utf8")<br>
                logging.warn("Feed wasn't in UTF-8 or ISO-8859-1, replaced " +<br>
                             "all non-ASCII characters.")</p>

<p>I have been looking into Python's unicode handling recently. It feels like a very concise language, with good try ... except handling and logging.</p>
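
<p>One possible way around the garbled GBK feeds is to try GBK before ISO-8859-1 in the fallback chain. The following is a minimal sketch in Python 3, not a patch to planetlib.py; the function name <code>to_utf8</code> and the candidate list are assumptions made for illustration.</p>

```python
# Candidate encodings in the order they are tried. GBK comes before
# ISO-8859-1 because ISO-8859-1 accepts any byte sequence and would
# otherwise mask it. The list itself is an assumption for illustration.
CANDIDATE_ENCODINGS = ("utf-8", "gbk", "iso8859_1")


def to_utf8(data, encodings=CANDIDATE_ENCODINGS):
    """Decode raw feed bytes with the first encoding that fits.

    Returns a tuple (utf-8 encoded bytes, name of the encoding used).
    """
    for name in encodings:
        try:
            return data.decode(name).encode("utf-8"), name
        except UnicodeDecodeError:
            continue
    # Last resort, mirroring the final branch of the quoted code:
    # replace whatever cannot be read as ASCII.
    return data.decode("ascii", "replace").encode("utf-8"), "ascii"


if __name__ == "__main__":
    converted, used = to_utf8("RSS聚合器".encode("gbk"))
    print(used, converted.decode("utf-8"))
```

<p>Guessing by trial decoding is only a heuristic: some ISO-8859-1 text also happens to be valid GBK, so honoring the encoding declared in the XML prolog or the HTTP headers is likely to be more reliable where it is available.</p>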

<p>Concerns about MagpieRSS performance:<br>
The main performance difference between Planet and MagpieRSS lies in their caching mechanisms. On using caching to speed up web services, see <a href="http://www.chedong.com/tech/cache.html">Cacheable CMS design</a>.</p>

<p>As can be seen, Lilina's caching works by walking through the RSS files in the cache directory on every request, and when a cache file has expired it also has to request the RSS source dynamically. It therefore cannot support a large number of RSS subscriptions on the back end together with many concurrent visitors on the front end (this causes a lot of I/O).</p>
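
<p>The per-request cost described above can be sketched as follows. This is not Lilina's PHP code, only a minimal Python illustration of the same idea: every page view scans the cache directory and checks each file's age. The expiry value and the function name <code>stale_cache_files</code> are assumptions.</p>

```python
import os
import time

# Assumed expiry; the post does not state the value Lilina actually uses.
CACHE_TTL_SECONDS = 3600


def stale_cache_files(cache_dir, ttl=CACHE_TTL_SECONDS, now=None):
    """Return the cached feed files older than ttl seconds, sorted by name."""
    now = time.time() if now is None else now
    stale = []
    for name in sorted(os.listdir(cache_dir)):
        path = os.path.join(cache_dir, name)
        if os.path.isfile(path) and now - os.path.getmtime(path) > ttl:
            # Each stale file would trigger a remote fetch during the request.
            stale.append(path)
    return stale
```

<p>With N subscriptions, every request pays for N file-system checks plus a remote fetch for each expired entry, which is where the I/O load comes from.</p>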

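<p>Planet is a back-end script: it periodically aggregates the subscribed RSS feeds and writes the result out as a static file.</p>

<p>In fact, it is enough to put a wget script in front of MagpieRSS that periodically dumps the output of index.php into index.html, and to have every visit hit the cached index.html first. Isn't that the same as Planet generating a static index.html cache every hour?</p>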
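
<p>The periodic snapshot idea can be sketched in Python as an alternative to a wget script. This is a minimal example under stated assumptions: the URL and the target path in the <code>__main__</code> block are placeholders, and the helper names <code>fetch_page</code> and <code>write_snapshot</code> are hypothetical. The file is written to a temporary name first and then renamed, so visitors should never see a half-written page.</p>

```python
import os
import tempfile
import urllib.request


def fetch_page(url, timeout=30):
    """Download the dynamically generated page (the job wget would do)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def write_snapshot(content, target_path):
    """Write content (bytes) to target_path via a temporary file and rename."""
    directory = os.path.dirname(os.path.abspath(target_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(content)
        # mkstemp creates the file owner-only; make it readable by the web server.
        os.chmod(tmp_path, 0o644)
        # Rename over the old snapshot; atomic when both are on one filesystem.
        os.replace(tmp_path, target_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


if __name__ == "__main__":
    # Both values are placeholders, not taken from the post.
    write_snapshot(fetch_page("http://localhost/lilina/index.php"), "index.html")
```

<p>Run from a scheduler such as cron once an hour, this would give roughly the behavior described for Planet: the dynamic page is rendered once per interval and all visitors are served the static copy.</p>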

<p>So on virtual hosts that do not let you set up your own server-side scripts, Planet cannot run at all.</p>

<p>For more on parsing GBK XML in PHP, see:<br>
<a href="http://weblog.chedong.com/archives/000598.html">Analysis of UTF-8 and GBK RSS parsing in MagpieRSS</a></p>

<p>2004-12-19 <br>
As Isaac Mao put it at the SocialBrain 2005 forum: <strong>Blog is a 'Window', also could be a 'Bridge'</strong>. A blog is the "window" through which a person or an organization faces the outside world, while RSS makes it easier to combine these windows and become the "bridge" between them. With such an intermediate publishing layer, blogs move from single-point publishing to P2P self-service distribution, and the importance of RSS for spreading content over the network becomes ever clearer.</p>
</div></div>


<p class="posted">Posted by chedong at December 11, 2004 12:34 AM
<br>
Last Modified at December 19, 2004 04:40 PM
</p>


<ul>

Related articles:
<li><a href="http://www.chedong.com/blog/archives/000048.html">50 Ways to Change Your Life in 2005</a> 2005-01-31</li><li><a href="http://www.chedong.com/blog/archives/000047.html">A Trip to Seoul</a> 2005-01-25</li><li><a href="http://www.chedong.com/blog/archives/000045.html">+1 rel="nofollow" = the condom the Internet puts on hyperlinks?! ;-)</a> 2005-01-21</li><li><a href="http://www.chedong.com/blog/archives/000044.html">Readability and updatability: an Atom-style rework of the RSS template</a> 2005-01-20</li><li><a href="http://www.chedong.com/blog/archives/000043.html">Let the search engine spider tell you: when, from where, and under what identity it crawled your site</a> 2005-01-17</li>
</ul>



<h2 id="trackbacks">Trackback Pings</h2>

<p class="techstuff">TrackBack URL for this entry:<br>
http://www.chedong.com/cgi-bin/mt3/mt-tb.cgi/27</p>


<p>Listed below are links to weblogs that reference <a href="http://www.chedong.com/blog/archives/000027.html">Lilina: Building a Personal Portal with an RSS Aggregator (Write once, publish anywhere)</a>:</p>


<p>
» <a href="http://weblog.chedong.com/archives/000598.html" rel="nofollow">Analysis of UTF-8 and GBK RSS parsing in MagpieRSS (with a detailed look at character-oriented programming in PHP)</a> from 车东BLOG<br>
My first attempt at MagpieRSS failed because iconv and mbstring were not installed. Today I installed iconv and mbstring support on the server and took a close look at how rss_fetch is used in lilina: the most important thing is to specify the RSS output format as 'MAGPIE_OU... <a href="http://weblog.chedong.com/archives/000598.html" rel="nofollow">[Read More]</a>
</p>

<p>Tracked on December 19, 2004 12:37 AM</p>

<p>
» <a href="http://weblog.realmarsnova.net/index.php?op=ViewArticle&amp;articleId=69&amp;blogId=2" rel="nofollow">Reading blogs with lilina and blogline</a> from Philharmania's Weblog<br>
After reading an <a href="http://www.chedong.com/blog/archives/000027.html" rel="nofollow">article introducing lilina</a>, I <a href="http://realmarsnova.net/lilina/" rel="nofollow">installed one</a> myself to try it out. <a href="http://lilina.sourceforge.net/" rel="nofollow">lilina</a> is a PHP ... <a href="http://weblog.realmarsnova.net/index.php?op=ViewArticle&amp;articleId=69&amp;blogId=2" rel="nofollow">[Read More]</a>
</p>

<p>Tracked on December 26, 2004 01:57 PM</p>

<p>
» <a href="http://blog.cnblog.org/archives/2004/12/cnblogosserssoo.html" rel="nofollow">Call for RSS feeds from the CNBlog author group</a> from CNBlog: Blog on Blog<br>
I have set up a <a href="http://blog.cnblog.org/lilina/" rel="nofollow">Lilina RSS aggregator</a> on CNBLOG. Volunteers, please send me the RSS feeds of your own weblogs or of columns related to cnblog; simply reply in the comments.

The main purpose of promoting RSS aggregation tools   . <a href="http://blog.cnblog.org/archives/2004/12/cnblogosserssoo.html" rel="nofollow">[Read More]</a>
</p>

<p>Tracked on December 26, 2004 07:42 PM</p>

<p>
» <a href="http://weblog.dalouis.com/archives/2005/01/ae_lilina_cecae.html" rel="nofollow">Some settings for speeding up lilina's display</a> from Kreny's Blog<br>
After I added several friends' blogs and some news feeds to my lilina, I found that it opened extremely slowly, so I asked Che Dong for advice and the problem was solved.
The key to the solution:</p>
<blockquote>Simply add the following statement to the top of index.php; in LILINA you   .</blockquote>
 <a href="http://weblog.dalouis.com/archives/2005/01/ae_lilina_cecae.html" rel="nofollow">[Read More]</a>

<p>Tracked on January 14, 2005 06:14 PM</p>

<p>
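» <a href="http://weblog.chedong.com/archives/000009.html" rel="nofollow">MT template changes and interface skin settings</a> from 车东BLOG<br>
Category index: by default the home page has a monthly archive index but no category index. The manual gave no specific parameter definitions either, so I had to read the source directly: I tried changing Monthly to Category, and it actually worked :-) I also visited Movable Style, the MT style site, ... <a href="http://weblog.chedong.com/archives/000009.html" rel="nofollow">[Read More]</a>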
</p>

<p>Tracked on January 17, 2005 01:25 PM</p>







<h2 id="comments">Comments</h2>


<div id="c100">
<p>How can I change the default of showing 7 days of news? Thanks.</p>
</div>

<p class="posted">Posted by: <a href="mailto:honren@tom.com" rel="nofollow">honren</a>  at December 12, 2004 10:20 PM</p>

<div id="c102">
<p>I have been using lilina for a while now.<br>
<a href="http://news.yanfeng.org/" rel="nofollow">http://news.yanfeng.org</a><br>
I changed the UI a little.<br>
It would be great if you could improve it.</p>
</div>

<p class="posted">Posted by: <a href="http://yanfeng.org/blog" rel="nofollow">mulberry</a>  at December 13, 2004 09:24 AM</p>

<div id="c138">
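<p>Comrade Che, haven't you noticed that your home page has been extremely slow ever since you started using lilina? Give it up, or at least stop using it as your front page; lilina is not technically mature yet. </p>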
</div>

<p class="posted">Posted by: <a href="http://www.oaspro.com/" rel="nofollow">kalen</a>  at December 16, 2004 10:33 AM</p>

<div id="c156">
<p>You might consider using Drupal.</p>
</div>

<p class="posted">Posted by: <a href="http://shunz.8866.org/" rel="nofollow">shunz</a>  at December 28, 2004 06:46 PM</p>

<div id="c185">
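<p>You can try the one I built: <a href="http://blog.terac.com/" rel="nofollow">http://blog.terac.com</a></p>

<p>It fetches the blogs every 3 hours, picks the 5 newest entries from each, sorts and aggregates them, generates static XML, and formats it for display with XSL...</p>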
</div>

<p class="posted">Posted by: <a href="http://blog.terac.com/go/andy" rel="nofollow">andy</a>  at January  6, 2005 12:53 PM</p>

<div id="c253">
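<p>Comrade Che Dong, this is not a good idea :P<br>
RSS is already out there on the web. Aggregating it on your own page not only hurts the quality of your home page but also confuses search engines, producing the very effect you denounced: "portal sites hurting the enthusiasm to create". Better not to aggregate at all!</p>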
</div>
<br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/pyguru/" target="_blank">pyguru</a> 2005-02-17 03:00 <a href="http://www.blogjava.net/pyguru/archive/2005/02/17/1267.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item></channel></rss>