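Material I found online about converting between UTF and GBK

Chinese character encoding is a perennial headache in programming, and while working overtime I ran into a case where UTF-8 had to be converted to GBK:

1. Question: why doesn't new String(str.getBytes("UTF-8"),"GBK") convert UTF-8 to GBK???

2. I came up with a rather twisted way to transcode :)

URLDecoder.decode(URLEncoder.encode(str,"gbk"),"gbk"),

where str is a UTF-8 String and the result comes out as GBK. Heh, quite interesting.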
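
On question 1, a short sketch may help. A Java String has no byte encoding of its own (it is UTF-16 in memory), so "converting to GBK" can only mean producing GBK bytes. new String(str.getBytes("UTF-8"),"GBK") takes UTF-8 bytes and misreads them as GBK, which garbles the text rather than converting it. By the same reasoning, the URLEncoder/URLDecoder round trip in item 2 should simply hand back a string equal to str for any text GBK can represent. The class and method names below are made up for illustration, and the sketch assumes the JDK in use ships the GBK charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class Utf8ToGbkDemo {
    // Assumption: the running JDK provides the GBK charset
    private static final Charset GBK = Charset.forName("GBK");

    /** Re-encodes UTF-8 bytes as GBK bytes by going through a String. */
    static byte[] utf8BytesToGbkBytes(byte[] utf8Bytes) {
        // Decode with the charset the bytes really use...
        String text = new String(utf8Bytes, StandardCharsets.UTF_8);
        // ...then encode with the target charset
        return text.getBytes(GBK);
    }

    /** The pattern from question 1: UTF-8 bytes misread as GBK. */
    static String misdecode(String str) {
        return new String(str.getBytes(StandardCharsets.UTF_8), GBK);
    }

    public static void main(String[] args) {
        String str = "\u4e2d\u6587"; // two Chinese characters, written as escapes
        byte[] gbk = utf8BytesToGbkBytes(str.getBytes(StandardCharsets.UTF_8));
        System.out.println(gbk.length);                       // 4: two bytes per character
        System.out.println(str.equals(new String(gbk, GBK))); // true: the text survives
        System.out.println(str.equals(misdecode(str)));       // false: the text is garbled
    }
}
```

The point of the design is that bytes carry an encoding and Strings do not, so every conversion is "decode with the source charset, encode with the target charset".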

**************************************************************************

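In a URL, multi-byte characters are converted to the application/x-www-form-urlencoded MIME format, so writing your own conversion routine will not help. Use the URLDecoder class to decode that format as UTF-8 first, and then the text can be converted to GBK:

System.out.println("反对"+new String(URLDecoder.decode(s, "utf-8").getBytes("GBK"),"GBK"));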
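
A minimal sketch of that two-step idea, assuming s holds percent-encoded UTF-8 such as %E4%B8%AD%E5%9B%BD: decode the form encoding into a String first, then ask the String for GBK bytes. The class and method names are hypothetical, and the GBK charset is assumed to be available in the JDK.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class UrlParamToGbk {
    /** Decodes percent-encoded UTF-8 text and returns it as GBK bytes. */
    static byte[] decodeToGbk(String encoded) throws UnsupportedEncodingException {
        // Step 1: undo application/x-www-form-urlencoded, reading the bytes as UTF-8
        String text = URLDecoder.decode(encoded, "UTF-8");
        // Step 2: encode the resulting String with the target charset
        return text.getBytes("GBK");
    }

    public static void main(String[] args) throws Exception {
        byte[] gbk = decodeToGbk("%E4%B8%AD%E5%9B%BD");
        StringBuilder hex = new StringBuilder();
        for (byte b : gbk) {
            hex.append(String.format("%02X", b));
        }
        System.out.println(hex); // D6D0B9FA
    }
}
```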

**************************************************************************************************

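Converting a UTF-8 version to a GBK version

Audience: people who migrated from a UTF-8 version such as MOLYX, or who installed the UTF-8 version and regret it
Difficulty: easy
Tutorial by: 梦迟教程 (jiaocheng.org)

The method is as follows:

When I built the forum for 客服之家 (kefu.net.cn) I started with molyx2.5. After a while I found I was not comfortable with it and decided to switch software. I found a converter to dz4 on the official DZ forum, and the conversion went very smoothly. However, the conversion requires the UTF-8 version of the program, while many plugins and styles are in GBK and therefore could not be used. I remembered that when writing the pw4-to-molyx2.5 conversion, the database first had to be converted from GBK to UTF-8, which made me wonder whether the UTF-8 data could now be converted back to GBK and imported into a GBK forum. I tried it and that went smoothly too. Enough talk, let's get to work.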

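Step 1. Export the DZ forum's UTF-8 database from the admin panel, download it to your computer, and save it.

Step 2. Download the convertz program. It is available from this site:

http://www.jiaocheng.org/soft/convertz802.zip

Step 3. Convert the format with convertz

1. Unzip convertz and run the ConvertZ.exe inside.

2. Click the File button and follow the animated demonstration in the original tutorial.

Step 4. Do a fresh install of the GBK version of DZ and upload the converted file to the backup directory, e.g. the forumdata folder.

Step 5. Log in to the new system, import the converted database into the new forum from the database section of the admin panel, then refresh the cache and so on. Done.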

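Demo:

Original UTF-8 version

http://msvip.com.cn

After conversion to GBK

http://kefu.net.cn

After the conversion there were basically no errors. Only some text came out garbled, and nothing serious. If anything is unclear, feel free to ask me. Thanks for your support.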
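
The leftover garbled text may come from characters that exist in the UTF-8 data but have no GBK mapping. That is only a guess, since the post does not say which characters broke. Under that assumption, a small check like the following could list the offending characters before converting. The class and method names are hypothetical, and the GBK charset is assumed to be available in the JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.ArrayList;
import java.util.List;

final class GbkCoverageCheck {
    /** Returns the code points in text that the GBK encoder cannot encode. */
    static List<Integer> unmappable(String text) {
        CharsetEncoder encoder = Charset.forName("GBK").newEncoder();
        List<Integer> missing = new ArrayList<>();
        // Walk code points rather than chars so surrogate pairs stay together
        text.codePoints().forEach(cp -> {
            if (!encoder.canEncode(new String(Character.toChars(cp)))) {
                missing.add(cp);
            }
        });
        return missing;
    }

    public static void main(String[] args) {
        // One Chinese character followed by an emoji (U+1F600), written as escapes
        for (int cp : unmappable("\u4e2d\uD83D\uDE00")) {
            System.out.printf("U+%04X has no GBK mapping%n", cp);
        }
    }
}
```

Running such a check over the exported dump would show whether the damage is limited to a handful of characters that can be fixed by hand.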

Original work by 梦迟教程 (jiaocheng.org). Please credit the source when reposting.

***********************************************************************************************

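Converting GBK Chinese characters to UTF-8 Chinese characters

Source: CSDN  Posted by: 新书城 (collected and compiled)  Date: 2006-8-8

I recently wrote a program that needed to convert text in various encodings into one unified encoding (for example, converting GBK-encoded Chinese text to UTF-8-encoded Chinese text). Most articles online about handling text in different encodings are about fixing garbled Chinese. What I needed was more like the conversion feature in UltraEdit.

At first I tried things such as
    new String(str.getBytes("GBK"), "UTF-8");
For encoding conversion, none of these approaches is correct. They are useful for repairing garbled display of Chinese characters, but they do not correctly map a GBK character to the UTF-8 character with the same meaning.

As we all know, inside the JVM every string is converted to Unicode for processing. If we read content from a GBK-encoded text file and write it to another, UTF-8-encoded text file, no garbling occurs. This suggests that the streams in Java IO can handle encoding conversion well. For convenience, the utilities provided by the Apache Commons-IO project can be used to write the code.
    /* gbkString is a string read from GBK-encoded text */
    String utf8String = IOUtils.toString(IOUtils.toInputStream(gbkString, "UTF-8"), "UTF-8");
After this, every character in utf8String has been encoded to UTF-8 bytes and decoded back.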
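
The observation about streams can be sketched with the standard library alone, without Commons-IO: read the source through a GBK decoder and write through a UTF-8 encoder. This is an illustration under stated assumptions, not the author's code. The class name, method name, and file paths are hypothetical, and the GBK charset is assumed to be available in the JDK.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class GbkFileToUtf8 {
    /** Copies a GBK-encoded text file to a new UTF-8-encoded text file. */
    static void transcode(Path gbkSource, Path utf8Target) throws IOException {
        try (Reader in = Files.newBufferedReader(gbkSource, Charset.forName("GBK"));
             Writer out = Files.newBufferedWriter(utf8Target, StandardCharsets.UTF_8)) {
            char[] buffer = new char[4096];
            int n;
            // The Reader turns GBK bytes into chars; the Writer turns chars into UTF-8 bytes
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
    }
}
```

A call such as GbkFileToUtf8.transcode(Paths.get("in-gbk.txt"), Paths.get("out-utf8.txt")) would perform the conversion. The reader returned by Files.newBufferedReader throws on bytes that are not valid in the stated charset, so a wrongly labelled input tends to fail loudly instead of producing garbage.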

Appendix: the relevant code in org.apache.commons.io.IOUtils is as follows:
    /**
     * Convert the specified string to an input stream, encoded as bytes
     * using the specified character encoding.
     * <p>
     * Character encoding names can be found at
     * <a href="http://www.iana.org/assignments/character-sets">IANA</a>.
     *
     * @param input the string to convert
     * @param encoding the encoding to use, null means platform default
     * @throws IOException if the encoding is invalid
     * @return an input stream
     * @since Commons IO 1.1
     */
    public static InputStream toInputStream(String input, String encoding) throws IOException {
        byte[] bytes = encoding != null ? input.getBytes(encoding) : input.getBytes();
        return new ByteArrayInputStream(bytes);
    }

**************************************************************************************************************

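Solving the Chinese character problem in Tomcat

In tomcat5 I found that the approach I had used for tomcat4 no longer worked for requests submitted directly through the URL. After searching online I finally found the most complete solution: there is no need to convert in every place, and both GET and POST work. I wrote it up and am posting it here so that people with the same problem do not have to suffer the way I did :-)

Problem description:

1. For data submitted by a form, the string returned by request.getParameter("xxx") is garbled or shows ??
2. For a GET request made directly through a URL such as http://localhost/a.jsp?name=中国, request.getParameter("name") on the server returns garbled text. Setting up a Filter the tomcat4 way does not help, and neither does request.setCharacterEncoding("GBK").

Cause:

1. Tomcat's J2EE implementation processes the parameters of form submissions (POST) with the default ISO-8859-1.
2. For GET requests, Tomcat processes the query string differently from POST (and differently from tomcat4), so calling setCharacterEncoding("gbk") has no effect.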
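
Cause 1 can be reproduced outside Tomcat. The sketch below only simulates what a container does when it decodes GBK request bytes as ISO-8859-1, and shows the classic per-parameter workaround that the Filter approach makes unnecessary. The class and method names are hypothetical, and the GBK charset is assumed to be available in the JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class Latin1Recovery {
    private static final Charset GBK = Charset.forName("GBK");

    /** Simulates a container decoding raw request bytes with ISO-8859-1. */
    static String decodedByContainer(byte[] requestBytes) {
        return new String(requestBytes, StandardCharsets.ISO_8859_1);
    }

    /** Undoes the ISO-8859-1 step and decodes the original bytes as GBK. */
    static String recover(String garbled) {
        // ISO-8859-1 maps every byte to exactly one char, so this step loses nothing
        byte[] original = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, GBK);
    }

    public static void main(String[] args) {
        // GBK bytes for the two characters used in the ?name= example above
        byte[] sent = {(byte) 0xD6, (byte) 0xD0, (byte) 0xB9, (byte) 0xFA};
        String garbled = decodedByContainer(sent);
        System.out.println(garbled.length());                         // 4: one char per byte
        System.out.println(recover(garbled).equals("\u4e2d\u56fd")); // true
    }
}
```

Doing this for every parameter is tedious and breaks as soon as the container is configured correctly, which is why setting the encoding once, in a Filter and on the Connector, is the cleaner fix.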

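Solution:

First, add a page directive declaring the charset to every JSP file (the test page at the end of this section begins with one).

1. Implement a Filter that sets the character set to GBK. (Tomcat's webapps/servlets-examples directory contains a complete example. See its web.xml and the SetCharacterEncodingFilter configuration.)

1) Simply copy %TOMCAT install directory%/webapps\servlets-examples\WEB-INF\classes\filters\SetCharacterEncodingFilter.class into the /filters directory of your webapp. If there is no filters directory, create one.
2) Add the following lines to your web.xml: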

    <filter>
        <filter-name>Set Character Encoding</filter-name>
        <filter-class>filters.SetCharacterEncodingFilter</filter-class>
        <init-param>
            <param-name>encoding</param-name>
            <param-value>GBK</param-value>
        </init-param>
    </filter>
    <filter-mapping>
        <filter-name>Set Character Encoding</filter-name>
        <url-pattern>/*</url-pattern>
    </filter-mapping>

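3) Done.

2. Solution for GET requests

1) Open Tomcat's server.xml, find the <Connector> block, and add the following attribute: URIEncoding="GBK"

The complete element should look like this: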

<Connector port="80" maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
               enableLookups="false" redirectPort="8443" acceptCount="100"
               debug="0" connectionTimeout="20000"
               disableUploadTimeout="true"
               URIEncoding="GBK"/>

2) Restart Tomcat and everything works.

Run the following JSP page to test whether it worked:

<%@ page contentType="text/html;charset=gb2312"%>
<%@ page import="java.util.*"%>
<%
String q=request.getParameter("q");
q = q == null? "no value" : q;
%>
<HTML>
<HEAD>
<TITLE>News list</TITLE>
<META http-equiv=Content-Type content="text/html; charset=gb2312">
<META http-equiv=pragma content=no-cache>
</HEAD>
<BODY>You submitted: <%=q%><br>
<form action="tcnchar.jsp" method="post">
Enter Chinese text: <input type="text" name="q"><input type="submit" value="OK"> <br>
<a href="tcnchar.jsp?q=中国">Submit via GET</a>
</form>
</BODY>
</HTML>

*******************************************************************************************

http://jakarta.apache.org/commons/httpclient/methods/post.html

*********************************************************************************************

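A question about converting UTF-8 to GBK, thanks

Goal: download content from a web server, where it is in UTF-8, and output it as GBK.

Problem: only some Chinese characters convert to GBK, and the ones that fail are output as "?". For example, "我" converts fine, but "道" does not. Could anyone help me work out what the problem is?

How I approached it:

1. First I studied how UTF-8 encoding works, converted the string read from the buffer into bytes, and printed them in hexadecimal. The characters that failed to convert had abnormal encodings. For example, the correct UTF-8 encoding of "载" is E8BDBD, but the output was E8BD3F. So I suspected the data was already wrong when it left the web server, but that made no sense, because browsers displayed it correctly.
2. I captured the packets with a sniffer, and the encoding inside the packets was fine too... Could the problem be in the way Java reads the data stream?
3. I changed the way the stream is read, from reading Strings to reading bytes (much more trouble than reading Strings), and the output... was all correct.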
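
The original String-based read loop is not shown, so the following is only a guess at the failure mode: if each chunk of bytes is decoded on its own, a three-byte character such as "载" (E8 BD BD) can be cut in half at a chunk boundary and both halves become replacement characters. The sketch contrasts that with a Reader, which carries partial characters over to the next read. All names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class SplitCharacterDemo {
    /** Decodes each chunk separately, as a naive read loop might. */
    static String decodePerChunk(byte[] data, int chunkSize) {
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < data.length; off += chunkSize) {
            int end = Math.min(off + chunkSize, data.length);
            // A character split across two chunks cannot be decoded in either one
            sb.append(new String(Arrays.copyOfRange(data, off, end), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    /** Decodes through a Reader, which keeps incomplete characters until the rest arrives. */
    static String decodeWithReader(byte[] data, int chunkSize) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            char[] buf = new char[chunkSize];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }
}
```

Either remedy works: collect all bytes first and decode once, as the code in this post does, or wrap the InputStream in an InputStreamReader with an explicit charset.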

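[code]
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;

public class FetchUtf8 {
    public static void main(String[] args) {
        String urlstring = "http://**.com";
        try {
            URL url = new URL(urlstring);
            URLConnection conn = url.openConnection();
            byte[] tempbuff = new byte[100]; // temporary buffer for each read
            // Grows as needed, so the page is not limited to a fixed-size array
            ByteArrayOutputStream buff = new ByteArrayOutputStream();
            int rbyte; // number of bytes returned by each read
            try (InputStream in = conn.getInputStream()) {
                while ((rbyte = in.read(tempbuff)) != -1) {
                    buff.write(tempbuff, 0, rbyte);
                }
            }
            // Decode only after all bytes have arrived, so that no
            // multi-byte character is split across two reads
            String output = new String(buff.toByteArray(), "UTF-8");
            System.out.println(output);
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
[/code]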

**********************************************************************************

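A method that builds a Unicode string from a UTF-8 byte array.

The code below converts UTF-8 to Unicode.

/**
     * Builds a Unicode string from the given UTF-8 byte array
     * @param data  the UTF-8 byte array
     * @return      the Unicode string
     * @exception IOException            if an I/O error occurs
     * @exception UTFDataFormatException if the byte array is not valid UTF-8
     */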
    public final static String readUTF(byte[] data) throws IOException {
        int utflen = data.length;
        StringBuffer str = new StringBuffer(utflen);
        byte bytearr[] = data;
        int c, char2, char3;
        int count = 0;

        while (count < utflen) {
            c = (int) bytearr[count] & 0xff;
            switch (c >> 4) {
                case 0:
                case 1:
                case 2:
                case 3:
                case 4:
                case 5:
                case 6:
                case 7:
                    /* 0xxxxxxx*/
                    count++;
                    str.append( (char) c);
                    break;
                case 12:
                case 13:

                    /* 110x xxxx   10xx xxxx*/
                    count += 2;
                    if (count > utflen) {
                        throw new UTFDataFormatException(
                            "UTF Data Format Exception");
                    }
                    char2 = (int) bytearr[count - 1];
                    if ( (char2 & 0xC0) != 0x80) {
                        throw new UTFDataFormatException();
                    }
                    str.append( (char) ( ( (c & 0x1F) << 6) | (char2 & 0x3F)));
                    break;
                case 14:

                    /* 1110 xxxx 10xx xxxx 10xx xxxx */
                    count += 3;
                    if (count > utflen) {
                        throw new UTFDataFormatException(
                            "UTF Data Format Exception");
                    }
                    char2 = (int) bytearr[count - 2];
                    char3 = (int) bytearr[count - 1];
                    if ( ( (char2 & 0xC0) != 0x80) || ( (char3 & 0xC0) != 0x80)) {
                        throw new UTFDataFormatException();
                    }
                    str.append( (char) ( ( (c & 0x0F) << 12)
                        | ( (char2 & 0x3F) << 6) | ( (char3 & 0x3F) << 0)));
                    break;
                default:

                    /* 10xx xxxx, 1111 xxxx */
                    throw new UTFDataFormatException(
                        "UTF Data Format Exception");
            }
        }
        // The number of chars produced may be less than utflen
        return new String(str);
    }
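
On a current JDK the same job can probably be left to the built-in UTF-8 decoder instead of hand-written bit handling. The sketch below is an alternative, not the method above: it asks the decoder to report malformed input rather than silently substituting a replacement character, and unlike readUTF it also accepts four-byte sequences. The class name is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class StrictUtf8 {
    /** Decodes UTF-8 bytes, throwing if the input is not well-formed UTF-8. */
    static String decode(byte[] data) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)      // fail instead of inserting U+FFFD
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(data))
                .toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] ok = {(byte) 0xE8, (byte) 0xBD, (byte) 0xBD};
        System.out.println(decode(ok).length()); // 1: three bytes form one character
        byte[] truncated = {(byte) 0xE8, (byte) 0xBD};
        try {
            decode(truncated);
        } catch (CharacterCodingException e) {
            System.out.println("rejected truncated input");
        }
    }
}
```

Once the text is a String, GBK output is a matter of calling getBytes("GBK") on it.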

************************************************************************************

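GB/BIG5/UTF-8 batch file encoding converter, September 12th, 2006

Yesterday I needed to change a GB-encoded web application to UTF-8. The whole application involved more than 300 ASP and HTML files... So I searched online for software that could batch-convert GB files to UTF-8. All I could find were VBS, JS, or PHP scripts that convert encodings on the fly inside web pages, and no tool for converting a large number of files.

Being pressed for time, I fell back on the most primitive method: opening the ASP files one by one in Windows Notepad and using "Save As..." to turn them into UTF-8. It was maddening... In the end I got so frustrated that I went looking for software again, determined to find something!!!

I finally found this great GB/BIG5/UTF-8 batch file encoding converter. It works really well in practice, so I recommend it!

The program is tiny, only 25KB. I hope it helps website developers and other web page editors.

Download: http://beebee.com.cn/jinnylife/wp-content/rar/gb2utf8.rar
Unzip password: http://beebee.com.cn/jinnylife/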
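
For readers who would rather script it, here is a rough Java sketch of the same kind of batch job. It is not how the tool above works internally, which the post does not describe. It assumes every matching file really is GBK-encoded, rewrites files in place, and uses hypothetical class and method names. Running it twice on the same tree would corrupt the files, so it should only be tried on a backup copy.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class BatchGbkToUtf8 {
    /** Rewrites as UTF-8 every regular file under root whose name ends with a given suffix. */
    static int convertTree(Path root, String... suffixes) throws IOException {
        List<Path> targets;
        // Collect the file list first so the walk is closed before any file is rewritten
        try (Stream<Path> paths = Files.walk(root)) {
            targets = paths.filter(Files::isRegularFile)
                    .filter(p -> hasSuffix(p, suffixes))
                    .collect(Collectors.toList());
        }
        Charset gbk = Charset.forName("GBK"); // assumption: the JDK ships this charset
        for (Path file : targets) {
            String text = new String(Files.readAllBytes(file), gbk);
            Files.write(file, text.getBytes(StandardCharsets.UTF_8));
        }
        return targets.size();
    }

    private static boolean hasSuffix(Path p, String[] suffixes) {
        String name = p.getFileName().toString().toLowerCase();
        for (String s : suffixes) {
            if (name.endsWith(s)) {
                return true;
            }
        }
        return false;
    }
}
```

For the situation described above, a call such as BatchGbkToUtf8.convertTree(Paths.get("site-copy"), ".asp", ".html") would cover the ASP and HTML files. Any charset declarations inside the pages would still need to be updated separately.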

posted on 2006-09-27 09:30 SIMONE
