﻿<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/"><channel><title>BlogJava-HiKer`s Blog-文章分类-其他</title><link>http://www.blogjava.net/cslee/category/13626.html</link><description /><language>zh-cn</language><lastBuildDate>Wed, 28 Feb 2007 07:48:36 GMT</lastBuildDate><pubDate>Wed, 28 Feb 2007 07:48:36 GMT</pubDate><ttl>60</ttl><item><title>test</title><link>http://www.blogjava.net/cslee/articles/80542.html</link><dc:creator>清风逐月</dc:creator><author>清风逐月</author><pubDate>Sat, 11 Nov 2006 01:50:00 GMT</pubDate><guid>http://www.blogjava.net/cslee/articles/80542.html</guid><description><![CDATA[&nbsp;&nbsp;&nbsp;&nbsp; 摘要: DB2 UDB Test Samples 701														1.          								To allow incoming connections via TCP/IP to an instance, which of the following must be set on the server?								A.        
...&nbsp;&nbsp;<a href='http://www.blogjava.net/cslee/articles/80542.html'>Read more</a><img src ="http://www.blogjava.net/cslee/aggbug/80542.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/cslee/" target="_blank">清风逐月</a> 2006-11-11 09:50 <a href="http://www.blogjava.net/cslee/articles/80542.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Unicode Encoding and an Explanation of UCS, UTF, BMP, BOM and Related Terms</title><link>http://www.blogjava.net/cslee/articles/60994.html</link><dc:creator>清风逐月</dc:creator><author>清风逐月</author><pubDate>Mon, 31 Jul 2006 05:12:00 GMT</pubDate><guid>http://www.blogjava.net/cslee/articles/60994.html</guid><wfw:comment>http://www.blogjava.net/cslee/comments/60994.html</wfw:comment><comments>http://www.blogjava.net/cslee/articles/60994.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/cslee/comments/commentRss/60994.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/cslee/services/trackbacks/60994.html</trackback:ping><description><![CDATA[
		<h3 style="margin: auto 0cm;">
				<font face="宋体">
						0. Big endian and little endian</font>
		</h3>
		<p>
				<font face="宋体">
						Big endian and little endian are the two ways a CPU can lay out a multi-byte number. For example, the Unicode code point of the character “汉” is 6C49. When it is written to a file, should 6C come first, or 49? Writing 6C first is big endian; writing 49 first is little endian.</font>
		</p>
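		<p>The two layouts can be seen directly in a few lines of Python. This is only an illustrative sketch; the variable names are not from the original article.</p>

```python
code_point = 0x6C49  # Unicode code point of "汉", as given in the text

big_endian = code_point.to_bytes(2, "big")        # high byte 6C is written first
little_endian = code_point.to_bytes(2, "little")  # low byte 49 is written first

print(big_endian.hex(" "))     # 6c 49
print(little_endian.hex(" "))  # 49 6c

# Python's UTF-16 codecs produce the same two layouts.
assert "汉".encode("utf-16-be") == big_endian
assert "汉".encode("utf-16-le") == little_endian
```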
		<p>
				<font face="宋体">
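						The word “endian” comes from Gulliver's Travels. The civil war in Lilliput broke out over whether an egg should be cracked open at the big end (Big-Endian) or the little end (Little-Endian). The dispute led to six rebellions, in which one emperor lost his life and another lost his crown.</font>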
		</p>
		<p>
				<font face="宋体">In Chinese, “endian” is usually translated as 字节序 (“byte order”), and big endian and little endian are called 大尾 and 小尾.</font>
		</p>
		<h3 style="margin: auto 0cm;">
				<font face="宋体">
						1. Character encodings and internal codes, with a brief look at Chinese character encodings</font>
		</h3>
		<p>
				<font face="宋体">Characters must be encoded before a computer can process them. The default encoding a computer uses is its internal code. Early computers used 7-bit ASCII. To handle Chinese characters, programmers designed GB2312 for Simplified Chinese and Big5 for Traditional Chinese.</font>
		</p>
		<p>
				<font face="宋体">
						GB2312 (1980) contains 7445 characters in total: 6763 Chinese characters and 682 other symbols. In the Chinese character area, the high byte of the internal code runs from B0 to F7 and the low byte from A1 to FE, which gives 72*94=6768 code positions. Five of them, D7FA-D7FE, are unassigned.</font>
		</p>
		<p>
				<font face="宋体">
						GB2312 covers too few Chinese characters. The 1995 extension specification GBK 1.0 contains 21886 symbols, divided into a Chinese character area and a graphic symbol area. The Chinese character area holds 21003 characters.</font>
		</p>
		<p>
				<font face="宋体">From ASCII through GB2312 to GBK, these encodings are backward compatible: a given character always has the same code in each of them, and the later standards simply support more characters. English and Chinese text can be handled uniformly in these encodings. A Chinese code is recognized by the most significant bit of its high byte being set. In programmers' terms, GB2312 and GBK are both double-byte character sets (DBCS).</font>
		</p>
		<p>
				<font face="宋体">
						GB18030 (2000) is the official national standard that supersedes GBK 1.0. Besides Chinese characters, it also covers Tibetan, Mongolian, Uyghur and the other major minority scripts. In terms of its Chinese character repertoire, GB18030 takes the 20902 characters of GB13000.1 and adds the 6582 characters of CJK Extension A (Unicode 0x3400-0x4db5), for a total of 27484 Chinese characters.</font>
		</p>
		<p>
				<font face="宋体">
						CJK stands for Chinese, Japanese and Korean. To save code positions, Unicode encodes the ideographs shared by the three languages only once. GB13000.1 is the Chinese edition of ISO/IEC 10646-1 and is equivalent to Unicode 1.1.</font>
		</p>
		<p>
				<font face="宋体">
						GB18030 uses single-byte, double-byte and four-byte codes. The single-byte and double-byte parts are fully compatible with GBK. The four-byte codes are where the characters of CJK Extension A that GBK lacks are placed. For example, UCS 0x3400 is encoded in GB18030 as 8139EE39, and UCS 0x3401 as 8139EF30.</font>
		</p>
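		<p>As a rough check of the three code lengths, the sketch below encodes one ASCII letter, one GBK character and the first CJK Extension A code point with Python's built-in <code>gb18030</code> codec. The sample list is an assumption chosen for illustration.</p>

```python
samples = ["A", "汉", "\u3400"]  # ASCII, a GBK character, first CJK Extension A code point

for ch in samples:
    encoded = ch.encode("gb18030")
    print(f"U+{ord(ch):04X} -> {encoded.hex(' ').upper()} ({len(encoded)} bytes)")

# Byte counts per sample: single-byte, double-byte, four-byte.
lengths = [len(ch.encode("gb18030")) for ch in samples]

# The double-byte part is shared with GBK, so both codecs agree on "汉".
assert "汉".encode("gb18030") == "汉".encode("gbk")
```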
		<p>
				<font face="宋体">Microsoft has released a GB18030 upgrade pack, but it only supplies a new font that supports the 6582 characters of CJK Extension A, 新宋体-18030 (NSimSun-18030), and does not change the internal code. The internal code of Windows is still GBK.</font>
		</p>
		<p>
				<font face="宋体">A few more details:</font>
		</p>
		<p style="margin-left: 36pt; text-indent: -18pt;">
				<span style="font-size: 10pt; font-family: Symbol;" lang="EN-US">
						<span style="">·<span style="font-family: 'Times New Roman'; font-style: normal; font-variant: normal; font-weight: normal; font-size: 7pt; line-height: normal; font-size-adjust: none; font-stretch: normal;">         </span></span>
				</span>
				<font face="宋体">
						The original GB2312 text specifies qu-wei (row-cell) codes. To get from a qu-wei code to the internal code, add A0 to both the high byte and the low byte.</font>
		</p>
		<p style="margin-left: 36pt; text-indent: -18pt;">
				<span style="font-size: 10pt; font-family: Symbol;" lang="EN-US">
						<span style="">·<span style="font-family: 'Times New Roman'; font-style: normal; font-variant: normal; font-weight: normal; font-size: 7pt; line-height: normal; font-size-adjust: none; font-stretch: normal;">         </span></span>
				</span>
				<font face="宋体">For any character encoding, the order of the code units is fixed by the encoding scheme and has nothing to do with endianness. For example, the code unit of GBK is the byte, and two bytes represent one Chinese character. The order of those two bytes is fixed and is not affected by the CPU's byte order. The code unit of UTF-16 is the word (two bytes). The order of the words is fixed by the encoding scheme; only the arrangement of the bytes inside each word depends on endianness. UTF-16 is discussed further below.</font>
		</p>
		<p style="margin-left: 36pt; text-indent: -18pt;">
				<span style="font-size: 10pt; font-family: Symbol;" lang="EN-US">
						<span style="">·<span style="font-family: 'Times New Roman'; font-style: normal; font-variant: normal; font-weight: normal; font-size: 7pt; line-height: normal; font-size-adjust: none; font-stretch: normal;">         </span></span>
				</span>
				<font face="宋体">
						In GB2312 the most significant bit of both bytes is 1, but only 128*128=16384 code positions satisfy that condition. In GBK and GB18030, therefore, the most significant bit of the low byte may be 0. This does not affect the parsing of a DBCS stream: whenever the reader meets a byte whose high bit is 1, it can take that byte and the one after it as a double-byte code, whatever the high bit of the low byte is. (The four-byte codes of GB18030 additionally require a check on the second byte.)</font>
		</p>
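		<p>The parsing rule in the last point can be sketched as follows. The function name <code>split_dbcs</code> is hypothetical, and the sketch deliberately covers only single-byte and double-byte codes (GB2312 and GBK), not the four-byte codes of GB18030.</p>

```python
def split_dbcs(data: bytes) -> list[bytes]:
    """Split a GB2312/GBK byte stream into single-byte and double-byte codes."""
    units = []
    i = 0
    while i < len(data):
        if data[i] >= 0x80:
            # High bit set: this is the lead byte of a double-byte code.
            # The trail byte is consumed without looking at its high bit.
            units.append(data[i:i + 2])
            i += 2
        else:
            # High bit clear: a single ASCII byte.
            units.append(data[i:i + 1])
            i += 1
    return units


print(split_dbcs("a汉b".encode("gbk")))
# "丂" is a GBK character whose trail byte has its high bit clear.
print([unit.decode("gbk") for unit in split_dbcs("丂x".encode("gbk"))])
```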
		<h3 style="margin: auto 0cm;">
				<font face="宋体">
						2. Unicode, UCS and UTF</font>
		</h3>
		<p>
				<font face="宋体">As mentioned above, the encodings from ASCII through GB2312 and GBK to GB18030 are backward compatible. Unicode, by contrast, is compatible only with ASCII (more precisely, with ISO-8859-1) and not with the GB codes. For example, the Unicode code point of “汉” is 6C49, while its GB code is BABA.</font>
		</p>
		<p>
				<font face="宋体">
						Unicode is also a character encoding method, but it was designed by international organizations as a scheme that can hold the scripts of every language in the world. The formal name of the corresponding ISO standard is "Universal Multiple-Octet Coded Character Set", abbreviated UCS. UCS can also be read as short for "Unicode Character Set".</font>
		</p>
		<p>
				<font face="宋体">According to Wikipedia (http://zh.wikipedia.org/wiki/), two organizations historically tried to design a universal character set independently: the International Organization for Standardization (ISO) and a consortium of software manufacturers (unicode.org). ISO developed the ISO 10646 project, and the Unicode Consortium developed the Unicode project.</font>
		</p>
		<p>
				<font face="宋体">Around 1991 both sides realized that the world does not need two incompatible character sets. They began to merge their work and to cooperate on a single code table. Starting with Unicode 2.0, the Unicode project has used the same repertoire and the same code points as ISO 10646-1.</font>
		</p>
		<p>
				<font face="宋体">Both projects still exist and publish their standards independently. At the time of writing, the latest version from the Unicode Consortium is Unicode 4.1.0 (2005), and the latest ISO standard is ISO/IEC 10646:2003.</font>
		</p>
		<p>
				<font face="宋体">
						UCS only specifies how characters are coded. It does not specify how the codes are transmitted or stored. For example, the UCS code of “汉” is 6C49. I could transmit and store it as four ASCII hexadecimal digits, or as UTF-8, which represents it with the three consecutive bytes E6 B1 89. What matters is that both parties agree on the convention. UTF-8, UTF-7 and UTF-16 are all widely accepted schemes. A particular advantage of UTF-8 is that it is fully compatible with ASCII: every ASCII text is already valid UTF-8. UTF stands for “UCS Transformation Format”.</font>
		</p>
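		<p>The point that one code can travel in several agreed forms can be illustrated with a short sketch. The three representations below are just the ones named in the text; the variable names are illustrative.</p>

```python
ch = "汉"

as_hex_digits = format(ord(ch), "04X").encode("ascii")  # four ASCII characters: 6C49
as_utf8 = ch.encode("utf-8")                            # three bytes: E6 B1 89
as_utf16_be = ch.encode("utf-16-be")                    # two bytes: 6C 49

# The receiver recovers the character only if it applies the sender's convention.
assert chr(int(as_hex_digits.decode("ascii"), 16)) == ch
assert as_utf8.decode("utf-8") == ch
assert as_utf16_be.decode("utf-16-be") == ch
```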
		<p>
				<font face="宋体">
						The IETF's RFC2781 and RFC3629 describe the UTF-16 and UTF-8 encoding methods in the usual RFC style: clear, brisk and still rigorous. I can never remember that IETF stands for Internet Engineering Task Force, but the RFCs that the IETF maintains are the foundation of every specification on the Internet.</font>
		</p>
		<h4 style="margin: 14pt 0cm 14.5pt;">
				<font size="5">
						<span lang="EN-US">
								<font face="Arial">2.1</font>
						</span>
						<span style="font-family: 黑体;"> Internal codes and</span>
						<span lang="EN-US">
								<font face="Arial">code pages</font>
						</span>
				</font>
		</h4>
		<p>
				<font face="宋体">The Windows kernel now supports the Unicode character set, so at the kernel level it can support every script in the world. However, a large body of existing programs and documents uses language-specific encodings such as GBK, so Windows cannot simply drop those encodings and switch to Unicode everywhere.</font>
		</p>
		<p>
				<font face="宋体">
						Windows uses code pages to adapt to different countries and regions. A code page can be understood as the internal code mentioned earlier. The code page for GBK is CP936.</font>
		</p>
		<p>
				<font face="宋体">Microsoft has also defined a code page for GB18030: CP54936. However, GB18030 includes four-byte codes, while a Windows system code page supports only single-byte and double-byte encodings, so this code page cannot really be used as the default code page.</font>
		</p>
		<h3 style="margin: auto 0cm;">
				<font face="宋体">
						3. UCS-2, UCS-4 and BMP</font>
		</h3>
		<p>
				<font face="宋体">
						UCS comes in two forms: UCS-2 and UCS-4. As the names suggest, UCS-2 encodes with two bytes and UCS-4 with four bytes (in fact only 31 bits are used; the most significant bit must be 0). Some simple arithmetic follows:</font>
		</p>
		<p>
				<font face="宋体">
						UCS-2 has 2^16=65536 code positions, and UCS-4 has 2^31=2147483648 code positions.</font>
		</p>
		<p>
				<font face="宋体">
						UCS-4 is divided into 2^7=128 groups by its most significant byte, whose top bit is 0. Each group is divided into 256 planes by the second byte. Each plane is divided into 256 rows by the third byte, and each row contains 256 cells. The cells of one row differ only in the last byte.</font>
		</p>
		<p>
				<font face="宋体">
						Plane 0 of group 0 is called the Basic Multilingual Plane, or BMP. Put differently, the UCS-4 code positions whose upper two bytes are 0 make up the BMP.</font>
		</p>
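		<p>The group, plane, row and cell of a code position are just its four bytes, which a few shifts and masks make explicit. The helper names <code>ucs4_position</code> and <code>in_bmp</code> are hypothetical and used only for this sketch.</p>

```python
def ucs4_position(code_point: int) -> tuple[int, int, int, int]:
    """Return (group, plane, row, cell) for a 31-bit UCS-4 code position."""
    if not 0 <= code_point < 2**31:
        raise ValueError("UCS-4 uses 31 bits; the top bit must be 0")
    group = (code_point >> 24) & 0x7F  # most significant byte, 128 possible values
    plane = (code_point >> 16) & 0xFF  # second byte
    row = (code_point >> 8) & 0xFF     # third byte
    cell = code_point & 0xFF           # last byte
    return group, plane, row, cell


def in_bmp(code_point: int) -> bool:
    """A code position is in the BMP when both group and plane are 0."""
    group, plane, _, _ = ucs4_position(code_point)
    return group == 0 and plane == 0


print(ucs4_position(0x6C49))  # "汉": group 0, plane 0, row 0x6C, cell 0x49
print(in_bmp(0x20000))        # a plane-2 code position is outside the BMP
```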
		<p>
				<font face="宋体">Dropping the two leading zero bytes from the BMP of UCS-4 gives UCS-2. Prepending two zero bytes to the two bytes of UCS-2 gives the BMP of UCS-4. Characters have also been assigned outside the BMP (Unicode 3.1 in 2001 placed CJK Extension B in plane 2, for example), but most characters in everyday use lie within the BMP.</font>
		</p>
		<h3 style="margin: auto 0cm;">
				<font face="宋体">
						4. UTF encodings</font>
		</h3>
		<p>
				<font face="宋体">
						UTF-8 encodes UCS in units of 8 bits. The mapping from UCS-2 to UTF-8 is as follows:</font>
		</p>
		<table class="MsoNormalTable" style="width: 75%;" border="1" cellpadding="0" width="75%">
				<tbody>
						<tr style="">
								<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
										<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
												<span lang="EN-US">UCS-2 code (hexadecimal)</span>
												<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">
														<o:p>
														</o:p>
												</span>
										</p>
								</td>
								<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
										<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
												<span lang="EN-US">UTF-8 byte stream (binary)</span>
												<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">
														<o:p>
														</o:p>
												</span>
										</p>
								</td>
						</tr>
						<tr style="">
								<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
										<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
												<span lang="EN-US">0000 - 007F</span>
												<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">
														<o:p>
														</o:p>
												</span>
										</p>
								</td>
								<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
										<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
												<span lang="EN-US">0xxxxxxx</span>
												<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">
														<o:p>
														</o:p>
												</span>
										</p>
								</td>
						</tr>
						<tr style="">
								<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
										<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
												<span lang="EN-US">0080 - 07FF</span>
												<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">
														<o:p>
														</o:p>
												</span>
										</p>
								</td>
								<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
										<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
												<span lang="EN-US">110xxxxx 10xxxxxx</span>
												<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">
														<o:p>
														</o:p>
												</span>
										</p>
								</td>
						</tr>
						<tr style="">
								<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
										<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
												<span lang="EN-US">0800 - FFFF</span>
												<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">
														<o:p>
														</o:p>
												</span>
										</p>
								</td>
								<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
										<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
												<span lang="EN-US">1110xxxx 10xxxxxx 10xxxxxx</span>
												<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">
														<o:p>
														</o:p>
												</span>
										</p>
								</td>
						</tr>
				</tbody>
		</table>
		<p>
				<font face="宋体">For example, the Unicode code point of “汉” is 6C49. 6C49 lies between 0800 and FFFF, so the three-byte template applies: <span style="color: blue;" lang="EN-US">1110</span><span lang="EN-US">xxxx <span style="color: blue;">10</span>xxxxxx <span style="color: blue;">10</span>xxxxxx</span>. In binary, 6C49 is 0110 110001 001001. Substituting these bits for the x's in the template, in order, gives <span style="color: blue;" lang="EN-US">1110</span><span lang="EN-US">0110 <span style="color: blue;">10</span>110001 <span style="color: blue;">10</span>001001</span>, that is, E6 B1 89.</font>
		</p>
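		<p>The three templates in the table translate almost line for line into code. The sketch below is illustrative only: <code>utf8_from_ucs2</code> is a hypothetical name, and unlike Python's own codec it performs no check for surrogate values.</p>

```python
def utf8_from_ucs2(code: int) -> bytes:
    """Encode one UCS-2 code (0x0000-0xFFFF) using the three templates of the table."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError("not a UCS-2 code")
    if code <= 0x007F:
        # 0xxxxxxx
        return bytes([code])
    if code <= 0x07FF:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code >> 6),
                      0x80 | (code & 0x3F)])
    # 1110xxxx 10xxxxxx 10xxxxxx
    return bytes([0xE0 | (code >> 12),
                  0x80 | ((code >> 6) & 0x3F),
                  0x80 | (code & 0x3F)])


print(utf8_from_ucs2(0x6C49).hex(" ").upper())  # E6 B1 89
```

		<p>For codes outside the surrogate range the result can be compared with Python's own codec, for instance <code>"汉".encode("utf-8")</code>.</p>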
		<p>
				<font face="宋体">You can check the result with Notepad. Be aware that UltraEdit automatically converts a UTF-8 text file to UTF-16 when opening it, which can be confusing; this option can be turned off in the settings. A better tool is Hex Workshop.</font>
		</p>
		<p>
				<font face="宋体">
						UTF-16 encodes UCS in units of 16 bits. For a UCS code below 0x10000, the UTF-16 encoding is simply the 16-bit unsigned integer equal to that code. For a UCS code of 0x10000 or above, an algorithm is defined that splits the code into a pair of 16-bit units (a surrogate pair). Because UCS-2, that is, the BMP of UCS-4, always lies below 0x10000, UTF-16 and UCS-2 can be treated as essentially the same for BMP text. However, UCS-2 is only a coding scheme, whereas UTF-16 is used for actual transmission, so byte order has to be taken into account.</font>
		</p>
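		<p>The algorithm for codes of 0x10000 and above is the surrogate-pair construction described in RFC2781. A minimal sketch follows; the function name <code>utf16_units</code> is hypothetical, and the sketch does not validate its input.</p>

```python
def utf16_units(code_point: int) -> list[int]:
    """Return the UTF-16 code units (16-bit integers) for one code point."""
    if code_point < 0x10000:
        return [code_point]           # BMP: the code unit equals the code point
    offset = code_point - 0x10000     # at most 20 bits remain
    high = 0xD800 | (offset >> 10)    # upper 10 bits go into the high surrogate
    low = 0xDC00 | (offset & 0x3FF)   # lower 10 bits go into the low surrogate
    return [high, low]


print([format(unit, "04X") for unit in utf16_units(0x6C49)])   # one unit
print([format(unit, "04X") for unit in utf16_units(0x20000)])  # a surrogate pair
```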
		<h3 style="margin: auto 0cm;">
				<font face="宋体">
						5. UTF byte order and the BOM</font>
		</h3>
		<p>
				<font face="宋体">
						UTF-8 uses the byte as its code unit, so it has no byte-order problem. UTF-16 uses two bytes as its code unit, so before a UTF-16 text can be interpreted, the byte order of each code unit must be known. For example, the Unicode code point of “奎” is 594E and that of “乙” is 4E59. If we receive the UTF-16 byte stream “594E”, is it “奎” or “乙”?</font>
		</p>
		<p>
				<font face="宋体">
						The method the Unicode specification recommends for marking byte order is the BOM. This BOM is not the “Bill Of Material” kind but the Byte Order Mark. It is a rather clever idea:</font>
		</p>
		<p>
				<font face="宋体">UCS contains a character called "ZERO WIDTH NO-BREAK SPACE" whose code is FEFF. FFFE, on the other hand, is not a character in UCS, so it should never appear in real data. The UCS specification suggests transmitting the character "ZERO WIDTH NO-BREAK SPACE" before the byte stream itself.</font>
		</p>
		<p>
				<font face="宋体">If the receiver then sees the bytes FE FF, the stream is Big-Endian; if it sees FF FE, the stream is Little-Endian. This is why the character "ZERO WIDTH NO-BREAK SPACE" is also called the BOM.</font>
		</p>
		<p>
				<font face="宋体">
						UTF-8 does not need a BOM to indicate byte order, but a BOM can be used to indicate the encoding. The UTF-8 encoding of "ZERO WIDTH NO-BREAK SPACE" is EF BB BF (readers can verify this with the encoding method described above). So a receiver that sees a byte stream starting with EF BB BF knows that it is UTF-8.</font>
		</p>
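		<p>A receiver's side of this convention might look like the sketch below. It relies on the BOM constants in Python's standard <code>codecs</code> module; the function names <code>sniff_bom</code> and <code>decode_with_bom</code> are hypothetical, and UTF-32 is left out for brevity.</p>

```python
import codecs


def sniff_bom(data: bytes):
    """Return (encoding, bom_length) if data starts with a known BOM, else (None, 0)."""
    if data.startswith(codecs.BOM_UTF8):      # EF BB BF
        return "utf-8", 3
    if data.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return "utf-16-be", 2
    if data.startswith(codecs.BOM_UTF16_LE):  # FF FE
        return "utf-16-le", 2
    return None, 0


def decode_with_bom(data: bytes) -> str:
    """Decode data according to its BOM and drop the BOM itself."""
    encoding, skip = sniff_bom(data)
    if encoding is None:
        raise ValueError("no BOM found; the byte order is ambiguous")
    return data[skip:].decode(encoding)


# The same payload bytes 59 4E mean different characters under each byte order.
print(decode_with_bom(b"\xfe\xff\x59\x4e"))
print(decode_with_bom(b"\xff\xfe\x59\x4e"))
```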
		<p>
				<font face="宋体">
						Windows uses exactly this BOM to mark the encoding of text files.</font>
		</p>
		<h3 style="margin: auto 0cm;">
				<font face="宋体">Appendix 1: Qu-wei codes, GB2312, internal codes and code pages</font>
		</h3>
		<p>
				<font face="宋体">Some readers still had questions about this sentence in the article:</font>
				<span lang="EN-US">
						<br />
						<font face="宋体">“The original GB2312 text specifies qu-wei (row-cell) codes. To get from a qu-wei code to the internal code, add A0 to both the high byte and the low byte.”</font>
				</span>
		</p>
		<p>
				<font face="宋体">Let me explain in more detail:</font>
		</p>
		<p>
				<font face="宋体">
						“The original GB2312 text” refers to the 1980 national standard 《中华人民共和国国家标准 信息交换用汉字编码字符集 基本集 GB 2312-80》 (Code of Chinese Graphic Character Set for Information Interchange, Primary Set). This standard encodes Chinese characters and Chinese symbols with two numbers. The first is called the “区” (qu, row) and the second the “位” (wei, cell), hence the name qu-wei code. Rows 1-9 hold Chinese symbols, rows 16-55 hold level-1 Chinese characters and rows 56-87 hold level-2 Chinese characters. Windows still ships a qu-wei input method; entering 1601, for example, produces “啊”. (This input method recognizes both hexadecimal GB2312 codes and decimal qu-wei codes, so entering B0A1 also produces “啊”.)</font>
		</p>
		<p>
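				<font face="宋体">The internal code is the character encoding used inside the operating system. The internal codes of early operating systems were language-specific. Today's Windows supports Unicode internally and uses code pages to adapt to individual languages, so the notion of an “internal code” has become rather blurred. Microsoft generally calls the encoding specified by the default code page the internal code.</font>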
		</p>
		<p>
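				<font face="宋体">The term “internal code” has no official definition, and “code page” is just what Microsoft as a company calls it. As programmers we only need to know what these things are; there is no need to research the terminology in depth.</font>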
		</p>
		<p>
				<font face="宋体">A code page is the character encoding for one particular language or script. For example, the code page for GBK is CP936, the code page for BIG5 is CP950 and the code page for GB2312 is CP20936.</font>
		</p>
		<p>
				<font face="宋体">
						Windows has the notion of a default code page, that is, the encoding used by default to interpret characters. Suppose Windows Notepad opens a text file whose content is the byte stream BA, BA, D7, D6. How should Windows interpret it?</font>
		</p>
		<p>
				<font face="宋体">Should it be interpreted as Unicode, as GBK, as BIG5 or as ISO8859-1? Interpreted as GBK, the bytes yield the two characters “汉字”. Under other encodings there may be no matching character at all, or the wrong character may be found. “Wrong” here means different from what the author of the text intended, and the result is mojibake.</font>
		</p>
		<p>
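				<font face="宋体">The answer is that Windows interprets the byte stream of a text file according to the current default code page. The default code page can be set through the regional options in the Control Panel. The ANSI entry in Notepad's Save As dialog simply means saving with the encoding of the default code page.</font>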
		</p>
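		<p>The effect of choosing one code page or another can be reproduced with Python's codecs. The sketch below decodes the same four bytes under several encodings; the list of codecs is an assumption chosen to mirror the ones named in the text.</p>

```python
raw = bytes([0xBA, 0xBA, 0xD7, 0xD6])  # the byte stream from the text

for codec in ("gbk", "big5", "latin-1", "utf-8"):
    try:
        print(codec, "->", raw.decode(codec))
    except UnicodeDecodeError as exc:
        # Some encodings have no character for these bytes at all.
        print(codec, "-> cannot decode:", exc.reason)
```

		<p>Only the GBK interpretation matches what the author of the file meant; the ISO-8859-1 (latin-1) interpretation succeeds but produces unrelated characters.</p>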
		<p>
				<font face="宋体">
						The internal code of Windows is Unicode, so technically it can support several code pages at the same time. As long as a file can state which encoding it uses and the user has the corresponding code page installed, Windows can display it correctly. An HTML file, for example, can declare its charset.</font>
		</p>
		<p>
				<font face="宋体">Some HTML authors, English-speaking ones in particular, assume that everyone uses English and do not declare a charset. If such a file contains bytes in the range 0x80-0xff and a Chinese Windows interprets it with the default GBK, the result is mojibake. The fix is to add a charset declaration to the HTML file, for example:</font>
				<span lang="EN-US">
						<br />
						<font face="宋体">&lt;meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"&gt;<br /></font>
				</span>
				<font face="宋体">If the code page the author used is compatible with ISO-8859-1, the mojibake disappears.</font>
		</p>
		<p>
				<font face="宋体">Back to qu-wei codes. The qu-wei code of “啊” is 1601, which in hexadecimal is 0x10, 0x01. That collides with the ASCII encoding that computers use everywhere. To stay compatible with the ASCII range 00-7f, we add A0 to both the high byte and the low byte of the qu-wei code, so the code of “啊” becomes B0A1. The code with A0 added twice is what we also call the GB2312 encoding, even though the original GB2312 text never mentions it.</font>
		</p>
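		<p>The conversion from a qu-wei code to the internal code is a single addition per byte. A minimal sketch, with the hypothetical helper name <code>quwei_to_gb2312</code>:</p>

```python
def quwei_to_gb2312(quwei: str) -> bytes:
    """Convert a four-digit decimal qu-wei code such as "1601" to GB2312 bytes."""
    qu, wei = int(quwei[:2]), int(quwei[2:])
    if not (1 <= qu <= 94 and 1 <= wei <= 94):
        raise ValueError("qu and wei must each be in 1..94")
    # Adding A0 to each number moves both bytes out of the ASCII range 00-7F.
    return bytes([qu + 0xA0, wei + 0xA0])


code = quwei_to_gb2312("1601")
print(code.hex().upper(), code.decode("gb2312"))  # B0A1 and the character it encodes
```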
<img src ="http://www.blogjava.net/cslee/aggbug/60994.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/cslee/" target="_blank">清风逐月</a> 2006-07-31 13:12 <a href="http://www.blogjava.net/cslee/articles/60994.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Unicode: Encoding and Implementation</title><link>http://www.blogjava.net/cslee/articles/60992.html</link><dc:creator>清风逐月</dc:creator><author>清风逐月</author><pubDate>Mon, 31 Jul 2006 05:04:00 GMT</pubDate><guid>http://www.blogjava.net/cslee/articles/60992.html</guid><wfw:comment>http://www.blogjava.net/cslee/comments/60992.html</wfw:comment><comments>http://www.blogjava.net/cslee/articles/60992.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/cslee/comments/commentRss/60992.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/cslee/services/trackbacks/60992.html</trackback:ping><description><![CDATA[
		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">Broadly speaking, the Unicode encoding system can be divided into two levels: the coding scheme and its implementation.</span>
				<span lang="EN-US">
						<o:p>
						</o:p>
				</span>
		</p>
		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">
						<o:p> </o:p>
				</span>
		</p>
		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span style="font-family: 宋体;">Coding scheme</span>
				<span lang="EN-US">
						<o:p>
						</o:p>
				</span>
		</p>
		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">
						<o:p> </o:p>
				</span>
		</p>
		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">Unicode's coding scheme corresponds to the Universal Character Set (UCS) of ISO 10646. The Unicode version in practical use today corresponds to UCS-2 and uses a 16-bit code space, so each character occupies 2 bytes. In theory this allows at most 2<sup>16</sup> = 65536 characters, which basically meets the needs of all languages. In practice the current version of Unicode has not filled this 16-bit space; large areas are reserved for special use or for future extension.</span>
				<span lang="EN-US">
						<o:p>
						</o:p>
				</span>
		</p>
		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">
						<o:p> </o:p>
				</span>
		</p>
		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">These 16-bit Unicode characters make up the Basic Multilingual Plane (BMP). The latest Unicode version (not yet widely deployed) defines 16 supplementary planes. Together with the BMP they need a code space of at least 21 bits, slightly less than 3 bytes. In practice, however, supplementary-plane characters still occupy a 4-byte code space, consistent with UCS-4. Future versions are to be extended to ISO 10646-1 implementation level 3, that is, to cover all UCS-4 characters. UCS-4 is a much larger, still sparsely populated 31-bit character set; with a leading bit that is always 0 it occupies 32 bits, i.e. 4 bytes. In theory it can represent up to 2<sup>31</sup> characters, enough to cover the symbols of every language.</span>
		</p>
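		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">The sizes quoted in the paragraph above can be checked with plain arithmetic. The following is a minimal sketch; the constant names are chosen here for illustration and do not come from any standard.</span>
		</p>

```python
# Sizes of the code spaces discussed above (plain arithmetic, no libraries).
BMP_SIZE = 2 ** 16                 # one 16-bit plane
PLANE_COUNT = 1 + 16               # the BMP plus 16 supplementary planes
UNICODE_SIZE = PLANE_COUNT * BMP_SIZE
UCS4_SIZE = 2 ** 31                # 31 usable bits in the UCS-4 design

# Highest code point of the 17 planes, and the bits needed to store it.
MAX_CODE_POINT = UNICODE_SIZE - 1
BITS_NEEDED = MAX_CODE_POINT.bit_length()

print(BMP_SIZE)             # 65536
print(UNICODE_SIZE)         # 1114112
print(hex(MAX_CODE_POINT))  # 0x10ffff
print(BITS_NEEDED)          # 21
print(UCS4_SIZE)            # 2147483648
```

		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">Twenty-one bits do not fill three whole bytes, which is why the text describes the combined space as slightly less than 3 bytes.</span>
		</p>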
		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">
						<o:p> </o:p>
				</span>
		</p>
		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">The Unicode code point of a BMP character is written U+hhhh, where each h is one hexadecimal digit. This is identical to its UCS-2 code. In the corresponding 4-byte UCS-4 code the last two bytes are the same, and every bit of the first two bytes is 0.</span>
		</p>
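		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">The relationship between the 2-byte and 4-byte forms of a BMP character can be seen with Python's built-in codecs. This sketch assumes that big-endian UTF-16 and UTF-32 stand in for UCS-2 and UCS-4, which holds for BMP characters; the sample character U+594E is an arbitrary choice.</span>
		</p>

```python
ch = "\u594e"                      # a BMP character, U+594E

# For BMP characters, big-endian UTF-16 and UTF-32 give the same bytes
# as the 2-byte and 4-byte UCS forms described above.
two_byte = ch.encode("utf-16-be")
four_byte = ch.encode("utf-32-be")

print("U+{:04X}".format(ord(ch)))  # U+594E
print(two_byte.hex())              # 594e
print(four_byte.hex())             # 0000594e
```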
		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">
						<o:p> </o:p>
				</span>
		</p>
		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">
						<o:p> </o:p>
				</span>
		</p>
		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">Implementation</span>
		</p>
		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">
						<o:p> </o:p>
				</span>
		</p>
		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">Unicode's implementation is distinct from its encoding scheme. The Unicode code point of a character is fixed, but in actual transmission the byte-level representation varies, both because platforms differ in design and because saving space matters. These implementations of Unicode are called Unicode Transformation Formats (UTF).</span>
		</p>
		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">
						<o:p> </o:p>
				</span>
		</p>
		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">For example, if a Unicode file containing only basic 7-bit ASCII characters is transmitted with every character in its raw 2-byte Unicode code, all 8 bits of each character's first byte are always 0, which wastes a good deal of space. UTF-8 addresses this case. It is a variable-length encoding that still represents basic 7-bit ASCII characters by their 7-bit codes, each in a single byte whose high bit is 0. Other Unicode characters are converted by a fixed algorithm: each BMP character takes 1 to 3 bytes (supplementary-plane characters take 4), and the leading bits of every byte show whether it is a single-byte character, a lead byte, or a continuation byte. This greatly shortens Western-language documents that consist mainly of 7-bit ASCII characters. Similarly, supplementary-plane characters need 4 bytes, so UTF-16, whose code units are 2 bytes, must also convert them by an algorithm (surrogate pairs).</span>
		</p>
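		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">The byte counts discussed above can be observed directly. The following sketch uses Python's standard codecs; the sample characters and the helper name encoded_lengths are illustrative choices, not part of the original text.</span>
		</p>

```python
# Compare how many bytes UTF-8 and UTF-16 need for characters from different ranges.
SAMPLES = {
    "ASCII letter A (U+0041)": "\u0041",
    "Latin-1 sharp s (U+00DF)": "\u00df",
    "CJK ideograph (U+53F6)": "\u53f6",
    "supplementary-plane ideograph (U+20000)": "\U00020000",
}

def encoded_lengths(ch):
    """Return (UTF-8 byte count, UTF-16 byte count) for one character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))

for label, ch in SAMPLES.items():
    utf8_len, utf16_len = encoded_lengths(ch)
    print(label, "UTF-8:", utf8_len, "UTF-16:", utf16_len)

# An ASCII-only text is half the size in UTF-8.
text = "Hello, Unicode"
print(len(text.encode("utf-8")), len(text.encode("utf-16-be")))  # 14 28
```

		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">The last sample needs 4 bytes in both encodings: UTF-8 uses a 4-byte sequence, and UTF-16 uses a surrogate pair of two 2-byte code units.</span>
		</p>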
		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">
						<o:p> </o:p>
				</span>
		</p>
		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">As another example, consider using UTF-16 directly, which matches the Unicode code point for BMP characters. Each character occupies two bytes, and Macintosh machines and PCs interpret byte order differently, so the same byte stream can be read as different content: the character 奎 (U+594E) can be confused with 乙 (U+4E59). UTF-16 therefore introduces the notions of big-endian and little-endian byte order, together with the BOM (Byte Order Mark) to tell them apart. (See UTF-16 for details.)</span>
		</p>
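		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">The confusion between U+594E and U+4E59 can be reproduced in a few lines. This is a minimal sketch using Python's codecs module; the variable names are illustrative.</span>
		</p>

```python
import codecs

raw = bytes([0x59, 0x4E])              # two bytes with no byte-order information

as_big = raw.decode("utf-16-be")       # read as 0x594E
as_little = raw.decode("utf-16-le")    # read as 0x4E59

print("U+{:04X}".format(ord(as_big)))     # U+594E
print("U+{:04X}".format(ord(as_little)))  # U+4E59

# With a BOM in front, the generic "utf-16" codec picks the byte order itself
# and removes the BOM from the decoded text.
with_bom = codecs.BOM_UTF16_LE + "\u594e".encode("utf-16-le")
decoded = with_bom.decode("utf-16")
print(decoded == "\u594e")             # True
```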
		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">
						<o:p> </o:p>
				</span>
		</p>
		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">Other implementations of Unicode include UTF-7, Punycode, CESU-8, SCSU, and UTF-32. Some of these are used only in particular countries and regions; others are planned for future use. The implementations in general use today are UTF-16 little-endian (with BOM), UTF-16 big-endian (with BOM), and UTF-8. In the Notepad application shipped with Microsoft Windows XP, the Save As dialog offers four encodings. Apart from ANSI, which is not a Unicode encoding, the other three ("Unicode", "Unicode big endian", and "UTF-8") correspond to these three implementations respectively.</span>
		</p>
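		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">A file saved in one of these three forms can usually be told apart by its first bytes. The sketch below assumes the file starts with a BOM; the function name sniff_bom and its labels are hypothetical, and only the BOM constants come from Python's codecs module.</span>
		</p>

```python
import codecs

# Byte order marks for the three Unicode forms named above.
# The labels on the right are this sketch's own names.
BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16 little-endian"),
    (codecs.BOM_UTF16_BE, "UTF-16 big-endian"),
]

def sniff_bom(data):
    """Return a label for the BOM at the start of data, or None if there is none."""
    for bom, label in BOMS:
        if data.startswith(bom):
            return label
    return None

print(sniff_bom("\ufeffA".encode("utf-16-le")))  # UTF-16 little-endian
print(sniff_bom("\ufeffA".encode("utf-16-be")))  # UTF-16 big-endian
print(sniff_bom("A".encode("utf-8-sig")))        # UTF-8
print(sniff_bom(b"plain ASCII"))                 # None
```

		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">The sketch deliberately ignores UTF-32, whose little-endian BOM begins with the same two bytes as the UTF-16 little-endian one, so it should not be used where UTF-32 input is possible.</span>
		</p>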
		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">
						<o:p> </o:p>
				</span>
		</p>
		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">Work on the supplementary planes currently concentrates on the CJK Unified Ideographs in Planes 2 and 3, so compatibility between Unicode and the various encodings for Simplified Chinese, Traditional Chinese, Japanese, Korean, and Vietnamese Chữ Nôm (GBK, GB18030, Big5, and others) receives particular attention. Since Unicode ultimately aims to cover all characters, these encodings can in a sense be regarded as de facto implementations of Unicode that predate it, much like ASCII and its extension Latin-1. For characters in the latter two, the first byte of the code in the 16-bit Unicode code space is all zeros and the second byte is identical to the original code. The correspondence between the East Asian encodings above and Unicode is far more complicated.</span>
		</p>
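		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">The contrast between Latin-1 and the East Asian encodings can be checked as follows. This is a sketch using Python's built-in latin-1, gbk, and big5 codecs; the helper name and the sample character U+4E2D are illustrative choices.</span>
		</p>

```python
def latin1_matches_unicode():
    """Check that every Latin-1 byte value equals its Unicode code point."""
    return all(ord(bytes([b]).decode("latin-1")) == b for b in range(256))

print(latin1_matches_unicode())            # True

# A Latin-1 byte becomes a 16-bit code with a zero first byte.
sharp_s = bytes([0xDF]).decode("latin-1")
print(sharp_s.encode("utf-16-be").hex())   # 00df

# For an East Asian encoding there is no such simple relationship:
# the legacy bytes and the Unicode code differ and need a mapping table.
ideograph = "\u4e2d"
unicode_bytes = ideograph.encode("utf-16-be")
gbk_bytes = ideograph.encode("gbk")
big5_bytes = ideograph.encode("big5")
print(unicode_bytes.hex(), gbk_bytes.hex(), big5_bytes.hex())
```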
		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">
						<o:p> </o:p>
				</span>
		</p>
		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">Non-Unicode environments</span>
		</p>
		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">
						<o:p> </o:p>
				</span>
		</p>
		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">In a non-Unicode environment, different countries and regions use different character sets, so it is quite possible that not all characters display correctly. Microsoft uses code page conversion tables as a transitional, partial solution: a designated table converts each non-Unicode character code to the Unicode code that the system uses internally for the same character. A code page can be selected under "Regional and Language Options" as the default for non-Unicode text, for example 936 for Simplified Chinese GBK or 950 for Traditional Chinese Big5 (both as used on PCs). With such a setting, software and documents written in non-English European languages are likely to appear garbled, while switching the code page to suit that language breaks Chinese text processing instead. This cannot be avoided. Fundamentally, the real solution is to adopt one unified encoding everywhere, but that is not yet achievable.</span>
		</p>
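		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">The garbling described above can be simulated by decoding the same bytes with different code pages. This sketch uses Python's cp936, cp950, and cp1252 codecs as stand-ins for the Windows conversion tables; the sample character and variable names are illustrative.</span>
		</p>

```python
original = "\u53f6"                  # a Chinese character
stored = original.encode("cp936")    # bytes as written under code page 936 (GBK)

# Decoding with the matching code page recovers the character.
read_as_936 = stored.decode("cp936")

# Decoding with another code page produces different text (mojibake).
read_as_950 = stored.decode("cp950", errors="replace")
read_as_1252 = stored.decode("cp1252", errors="replace")

print(read_as_936 == original)       # True
print(read_as_1252 == original)      # False
print(repr(read_as_950), repr(read_as_1252))
```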
		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">
						<o:p> </o:p>
				</span>
		</p>
		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">Code pages are now widely used across platforms. The code page for UTF-7 is 65000, and the code page for UTF-8 is 65001.</span>
		</p>
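		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">The code page numbers mentioned in this section can be tied to decoders with a small lookup table. The table and the function decode_with_code_page below are hypothetical helpers written for this illustration; only the codec names are real Python codecs.</span>
		</p>

```python
# Hypothetical lookup from Windows code page numbers to Python codec names.
CODE_PAGE_TO_CODEC = {
    936: "cp936",      # Simplified Chinese GBK
    950: "cp950",      # Traditional Chinese Big5
    65000: "utf-7",
    65001: "utf-8",
}

def decode_with_code_page(data, code_page):
    """Decode bytes using the codec registered for the given code page number."""
    return data.decode(CODE_PAGE_TO_CODEC[code_page])

sample = "\u4e2d"
for code_page, codec in CODE_PAGE_TO_CODEC.items():
    data = sample.encode(codec)
    # The byte sequences differ, but each decodes back to the same character.
    print(code_page, data, decode_with_code_page(data, code_page) == sample)
```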
		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">
						<o:p> </o:p>
				</span>
		</p>
		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">XML and Unicode</span>
		</p>
		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">
						<o:p> </o:p>
				</span>
		</p>
		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">XML and HTML adopt Unicode as their document character set, with UTF-8 as a standard encoding. In theory, a web page in any script can be displayed in any browser that supports these standards, provided a suitable font is installed on the computer. A specific character can be displayed with a numeric character reference of the form &amp;#nnn;, where nnn is the character's decimal Unicode code point. To use the hexadecimal code point instead, put the letter x before the digits (&amp;#xhhhh;). Some older browsers, however, may not recognize the hexadecimal form.</span>
		</p>
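		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">Numeric character references in both forms can be generated and decoded programmatically. The sketch below uses html.unescape from Python's standard library; the helper names to_decimal_ref and to_hex_ref are made up for this illustration.</span>
		</p>

```python
import html

def to_decimal_ref(ch):
    """Build the decimal numeric character reference for one character."""
    return "&#{};".format(ord(ch))

def to_hex_ref(ch):
    """Build the hexadecimal numeric character reference for one character."""
    return "&#x{:X};".format(ord(ch))

ch = "\u53f6"
decimal_ref = to_decimal_ref(ch)
hex_ref = to_hex_ref(ch)

print(decimal_ref, hex_ref)

# Both forms decode back to the same character.
print(html.unescape(decimal_ref) == ch)   # True
print(html.unescape(hex_ref) == ch)       # True
```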
		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">
						<o:p> </o:p>
				</span>
		</p>
		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">
						<o:p> </o:p>
				</span>
		</p>
		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">However, partly because of how Unicode versions have evolved, many browsers can display only a small subset of the full UCS-2 repertoire, that is, of the Unicode version in use today. The table below lets you check how your browser displays a variety of Unicode code points:</span>
		</p>
		<div align="center">
				<table class="MsoNormalTable" style="" border="1" cellpadding="0">
						<tbody>
								<tr style="">
										<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
												<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center">
														<b>
																<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">Code</span>
														</b>
												</p>
										</td>
										<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
												<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center">
														<b>
																<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">Standard character name (English)</span>
														</b>
												</p>
										</td>
										<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
												<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center">
														<b>
																<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">Rendering in the browser</span>
														</b>
												</p>
										</td>
								</tr>
								<tr style="">
										<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
												<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: right;" align="right">
														<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">&amp;#65;<o:p></o:p></span>
												</p>
										</td>
										<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
												<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left">
														<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">Latin capital letter "A"</span>
												</p>
										</td>
										<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
												<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center">
														<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">A<o:p></o:p></span>
												</p>
										</td>
								</tr>
								<tr style="">
										<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
												<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: right;" align="right">
														<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">&amp;#223;<o:p></o:p></span>
												</p>
										</td>
										<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
												<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left">
														<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">Latin small letter "Sharp S"</span>
												</p>
										</td>
										<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
												<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center">
														<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">ß<o:p></o:p></span>
												</p>
										</td>
								</tr>
								<tr style="">
										<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
												<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: right;" align="right">
														<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">&amp;#254;<o:p></o:p></span>
												</p>
										</td>
										<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
												<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left">
														<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">Latin small letter "Thorn"</span>
												</p>
										</td>
										<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
												<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center">
														<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">þ<o:p></o:p></span>
												</p>
										</td>
								</tr>
								<tr style="">
										<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
												<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: right;" align="right">
														<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">&amp;#916;<o:p></o:p></span>
												</p>
										</td>
										<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
												<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left">
														<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">Greek capital letter "Delta"</span>
												</p>
										</td>
										<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
												<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center">
														<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">Δ<o:p></o:p></span>
												</p>
										</td>
								</tr>
								<tr style="">
										<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
												<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: right;" align="right">
														<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">&amp;#1049;<o:p></o:p></span>
												</p>
										</td>
										<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
												<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left">
														<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">Cyrillic capital letter "Short I"</span>
												</p>
										</td>
										<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
												<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center">
														<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">Й<o:p></o:p></span>
												</p>
										</td>
								</tr>
								<tr style="">
										<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
												<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: right;" align="right">
														<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">&amp;#1511;<o:p></o:p></span>
												</p>
										</td>
										<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
												<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left">
														<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">Hebrew letter "Qof"</span>
												</p>
										</td>
										<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
												<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center">
														<span style="font-size: 12pt;" lang="EN-US">ק</span>
														<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">
																<o:p>
																</o:p>
														</span>
												</p>
										</td>
								</tr>
								<tr style="">
										<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
												<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: right;" align="right">
														<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">&amp;#1605;<o:p></o:p></span>
												</p>
										</td>
										<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
												<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left">
														<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">Arabic letter "Meem"</span>
												</p>
										</td>
										<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
												<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center">
														<span style="font-size: 12pt; font-family: 宋体-18030;" lang="EN-US">م</span>
														<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">
																<o:p>
																</o:p>
														</span>
												</p>
										</td>
								</tr>
								<tr style="">
										<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
												<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: right;" align="right">
														<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">&amp;#3671;<o:p></o:p></span>
												</p>
										</td>
										<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
												<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left">
														<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">Thai digit 7</span>
												</p>
										</td>
										<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
												<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center">
														<span style="font-size: 12pt; font-family: Tahoma;" lang="EN-US">๗</span>
														<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">
																<o:p>
																</o:p>
														</span>
												</p>
										</td>
								</tr>
								<tr style="">
										<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
												<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: right;" align="right">
														<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">&amp;#4688;<o:p></o:p></span>
												</p>
										</td>
										<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
												<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left">
														<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">Ethiopic syllable "Qha"</span>
												</p>
										</td>
										<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
												<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center">
														<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">ቐ<o:p></o:p></span>
												</p>
										</td>
								</tr>
								<tr style="">
										<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
												<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: right;" align="right">
														<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">&amp;#12354;<o:p></o:p></span>
												</p>
										</td>
										<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
												<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left">
														<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">Japanese hiragana "A"</span>
												</p>
										</td>
										<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
												<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center">
														<span style="font-size: 12pt; font-family: 宋体;">あ<span lang="EN-US"><o:p></o:p></span></span>
												</p>
										</td>
								</tr>
								<tr style="">
										<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
												<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: right;" align="right">
														<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">&amp;#12450;<o:p></o:p></span>
												</p>
										</td>
										<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
												<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left">
														<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">Japanese katakana "A"</span>
												</p>
										</td>
										<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
												<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center">
														<span style="font-size: 12pt; font-family: 宋体;">ア<span lang="EN-US"><o:p></o:p></span></span>
												</p>
										</td>
								</tr>
								<tr style="">
										<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
												<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: right;" align="right">
														<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">&amp;#21494;<o:p></o:p></span>
												</p>
										</td>
										<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
												<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left">
														<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">Simplified Chinese character "叶"</span>
												</p>
										</td>
										<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
												<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center">
														<span style="font-size: 12pt; font-family: 宋体;">叶<span lang="EN-US"><o:p></o:p></span></span>
												</p>
										</td>
								</tr>
								<tr style="">
										<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
												<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: right;" align="right">
														<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">&amp;#33865;<o:p></o:p></span>
												</p>
										</td>
										<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
												<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left">
														<span style="font-size: 12pt; font-family: 宋体;">Traditional Chinese character<span lang="EN-US"> "</span>葉<span lang="EN-US">"<o:p></o:p></span></span>
												</p>
										</td>
										<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
												<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center">
														<span style="font-size: 12pt; font-family: 宋体;">葉<span lang="EN-US"><o:p></o:p></span></span>
												</p>
										</td>
								</tr>
								<tr style="">
										<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
												<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: right;" align="right">
														<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">&amp;#50685;<o:p></o:p></span>
												</p>
										</td>
										<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
												<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left">
														<span style="font-size: 12pt; font-family: 宋体;">Korean Hangul syllable<span lang="EN-US"> "Yeob"<o:p></o:p></span></span>
												</p>
										</td>
										<td style="border: medium none rgb(236, 233, 216); padding: 0.75pt; background-color: transparent;">
												<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center">
														<span style="font-size: 12pt; font-family: Batang;">엽</span>
														<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">
																<o:p>
																</o:p>
														</span>
												</p>
										</td>
								</tr>
						</tbody>
				</table>
		</div>
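		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">The decimal references in the table can be reproduced with a few lines of code. The following is a minimal Python sketch, not part of the original article; the function name <code>to_decimal_reference</code> is hypothetical, while <code>ord</code> and <code>html.unescape</code> come from the Python standard library.</span>
		</p>

```python
import html


def to_decimal_reference(ch: str) -> str:
    """Return the decimal numeric character reference for one character."""
    # ord() gives the Unicode code point as an integer.
    return f"&#{ord(ch)};"


# The four example characters from the table above.
rows = ["ア", "叶", "葉", "엽"]
references = [to_decimal_reference(ch) for ch in rows]
print(references)

# html.unescape decodes numeric character references back to characters.
decoded = [html.unescape(ref) for ref in references]
print(decoded)
```

		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">Each reference is simply the character's code point written in decimal between <code>&amp;#</code> and <code>;</code>, so the round trip should return the original characters.</span>
		</p>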
		<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left">
				<span style="font-size: 12pt; font-family: 宋体;" lang="EN-US">
						<o:p> </o:p>
				</span>
		</p>
		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">Entering Unicode<o:p></o:p></span>
		</p>
		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">
						<o:p> </o:p>
				</span>
		</p>
		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">Besides input method editors, operating systems provide several other ways to enter Unicode characters. Windows 2000 and later versions of Windows, for example, include a clickable character table (Character Map). In Microsoft Word, hold down the Alt key, type 0 followed by the character's decimal Unicode code point on the numeric keypad, then release Alt to insert the character; for example, Alt + 033865 produces the Unicode character 葉 (U+8449). In addition, pressing Alt + X in MS Word converts the character before the cursor to its four-digit hexadecimal Unicode code point, and converts such a code back to the character.</span>
		</p>
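		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">The arithmetic behind these two shortcuts can be checked in code. The sketch below is an illustration only: it does not drive Word or the keyboard, and the function names <code>alt_code_to_char</code>, <code>alt_x_hex</code> and <code>alt_x_char</code> are hypothetical. It assumes the typed digits are a decimal code point and that the Alt + X code is the same code point in hexadecimal.</span>
		</p>

```python
def alt_code_to_char(digits: str) -> str:
    """Interpret a decimal Alt code such as "033865" as a Unicode code point."""
    # The leading 0 does not change the value when parsed as base 10.
    return chr(int(digits, 10))


def alt_x_hex(ch: str) -> str:
    """Return the code point of ch as at least four uppercase hex digits."""
    return f"{ord(ch):04X}"


def alt_x_char(hex_digits: str) -> str:
    """Convert a hexadecimal code point string back to its character."""
    return chr(int(hex_digits, 16))


print(alt_code_to_char("033865"))  # decimal 33865 is U+8449
print(alt_x_hex("葉"))
print(alt_x_char("53F6"))
```

		<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">
				<span lang="EN-US">Decimal 33865 and hexadecimal 8449 name the same code point, which is why both shortcuts reach the same character.</span>
		</p>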
<br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/cslee/" target="_blank">清风逐月</a> 2006-07-31 13:04</div>]]></description></item></channel></rss>