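Many articles online cover connecting to an HTTP server from Java, and quite a few posts show how to log in to an HTTP server and then fetch pages from behind the login. Very few, however, explain how to log in as a user on an HTTPS server. My project needed exactly that, and after a full day of wrestling with it I finally got it working, so here is a write-up. In short, HTTPS is HTTP plus an encryption layer, so an HTTPS connection adds only one step to an HTTP connection: verifying the server's certificate. Once that succeeds, everything else works as it does over HTTP. In other words, after the certificate has been accepted, logging in to an HTTPS server works the same way as logging in to an HTTP server. The whole process is as follows: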
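1. Open the first HttpsURLConnection (its URL is the login page) and pass the HTTPS certificate verification between server and client.
2. POST the userID and password to the login page (the actual parameter names must match the field
    names in the login form). After the POST succeeds, read the Cookie of that connection and split the SessionID out of it.
3. Open the second HttpsURLConnection (its URL is the page to fetch), again passing the certificate verification.
4. Call URLConnection.setRequestProperty("Cookie", SessionID) to set the cookie of the second connection,
    so that both connections belong to the same logged-in session.
5. Read the target page's content with connection.getInputStream().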
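The five steps can be put together in one small class. This is a minimal sketch under assumptions, not the tested code below: `HttpsLoginSketch` and its methods are hypothetical names, and the form field names and the cookie name `JSESSIONID` must match your own login form and server. Unlike the test code, it keeps the JVM's default certificate and host-name checks; for a self-signed server certificate, you can import it into a truststore with `keytool` and point the `javax.net.ssl.trustStore` system property at that file. It needs Java 10 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpCookie;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;
import javax.net.ssl.HttpsURLConnection;

public class HttpsLoginSketch {

    /** Step 2: encodes the form fields as application/x-www-form-urlencoded. */
    static String formBody(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    /** Step 2: finds "name=value" for the named cookie among all Set-Cookie header values. */
    static String sessionCookie(List<String> setCookieHeaders, String name) {
        for (String header : setCookieHeaders) {
            for (HttpCookie c : HttpCookie.parse(header)) {
                if (c.getName().equalsIgnoreCase(name)) {
                    return c.getName() + "=" + c.getValue();
                }
            }
        }
        return null;
    }

    /** Steps 1-5: log in with a POST, then fetch targetUrl in the same session. */
    static String fetchAfterLogin(URL loginUrl, Map<String, String> form, URL targetUrl)
            throws IOException {
        // Steps 1-2: the TLS handshake (certificate check) happens when the connection opens.
        HttpsURLConnection login = (HttpsURLConnection) loginUrl.openConnection();
        login.setRequestMethod("POST");
        login.setDoOutput(true);
        login.setInstanceFollowRedirects(false); // keep a redirect's Set-Cookie headers visible
        login.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = login.getOutputStream()) {
            out.write(formBody(form).getBytes(StandardCharsets.UTF_8));
        }
        login.getResponseCode(); // sends the request and reads the response headers
        List<String> setCookies = new ArrayList<>();
        for (Map.Entry<String, List<String>> h : login.getHeaderFields().entrySet()) {
            if ("Set-Cookie".equalsIgnoreCase(h.getKey())) { // the key is null for the status line
                setCookies.addAll(h.getValue());
            }
        }
        login.disconnect();
        String session = sessionCookie(setCookies, "JSESSIONID");
        if (session == null) {
            throw new IOException("login response set no JSESSIONID cookie");
        }

        // Steps 3-4: second connection, sending the session cookie obtained in step 2.
        HttpsURLConnection page = (HttpsURLConnection) targetUrl.openConnection();
        page.setRequestProperty("Cookie", session);

        // Step 5: read the page; UTF-8 is an assumption, use the page's real charset.
        try (InputStream in = page.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            page.disconnect();
        }
    }
}
```

Redirects are switched off for the login POST because, after following a redirect, `HttpURLConnection` only exposes the final response's headers, and a login servlet may set the session cookie on a 302 response. `HttpCookie.parse` also copes with a `Set-Cookie` value in which `JSESSIONID` is not the first item.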

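Below is the test code I used. It consists of several independent fragments and has been tested; with a few changes you should be able to use it yourself: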

// Imports used by these fragments (at the top of the source file):
//   java.io.*, java.net.*, javax.net.ssl.*, javax.xml.transform.stream.StreamSource,
//   java.security.cert.CertificateException, java.security.cert.X509Certificate
// protocol, iSourceURL, stream, modelSource and TMCodePlugin belong to the surrounding plugin code.
try {
    if (protocol.equals("http")) {
        final HttpURLConnection connection = (HttpURLConnection) iSourceURL.openConnection();
        connection.connect();
        stream = connection.getInputStream();
        // printIoStream(stream); // debugging only: it consumes the stream
        modelSource = new StreamSource(stream);
    } else if (protocol.equals("https")) {
        try {
            // TLS context whose trust manager (iTrustManager, below) accepts any certificate
            SSLContext sc = SSLContext.getInstance("TLS");
            sc.init(null, new TrustManager[] { new iTrustManager() }, new java.security.SecureRandom());

            // Steps 1-2: POST the login form to the logon servlet on the same host and port,
            // e.g. https://9.186.10.56:8443/LogonServlet
            URL url = new URL(iSourceURL, "/LogonServlet");
            String strPost = "intranetID=*****&password=******"; // names from the login form
            HttpsURLConnection conn = (HttpsURLConnection) url.openConnection();
            conn.setSSLSocketFactory(sc.getSocketFactory());
            conn.setHostnameVerifier(new TrustAnyHostnameVerifier());
            addProperty(conn);
            // If the logon servlet sets the cookie on a redirect, use false here:
            // after following a redirect only the final response's headers are visible.
            conn.setInstanceFollowRedirects(true);
            conn.setDoOutput(true);   // we write the form data to the server
            conn.setDoInput(true);    // and read its response
            conn.setUseCaches(false); // always get the newest content from the server
            conn.setAllowUserInteraction(false);
            conn.setRequestMethod("POST");

            // getOutputStream() opens the connection, so no separate connect() call is needed
            OutputStream out = conn.getOutputStream();
            out.write(strPost.getBytes("UTF-8"));
            out.close();

            // getHeaderField returns only the last Set-Cookie header of the login response
            String cookie = conn.getHeaderField("Set-Cookie");
            String SessionID = getSessionIdFromCookie(cookie); // "JSESSIONID=..."

            conn.getInputStream().close(); // the login page itself is not needed
            conn.disconnect();

            // Steps 3-5: open the target page and send the same session cookie
            final HttpsURLConnection connection = (HttpsURLConnection) iSourceURL.openConnection();
            connection.setSSLSocketFactory(sc.getSocketFactory());
            connection.setHostnameVerifier(new TrustAnyHostnameVerifier());
            connection.setRequestProperty("Cookie", SessionID);
            connection.connect();

            stream = connection.getInputStream();
            modelSource = new StreamSource(stream);
        } catch (Exception e) {
            TMCodePlugin.getInstance().writeToLog(Status.ERROR,
                    "Could not read data via URL(https):" + iSourceURL, null);
            e.printStackTrace();
        }
    } else {
        TMCodePlugin.getInstance().writeToLog(Status.ERROR, "Protocol illegal: " + iSourceURL, null);
    }
} catch (IOException e) {
    TMCodePlugin.getInstance().writeToLog(Status.ERROR,
            "Could not read data via URL:" + iSourceURL, null);
} catch (IllegalArgumentException e) {
    TMCodePlugin.getInstance().writeToLog(Status.ERROR,
            "Could not read data via URL - illegal argument in URL:" + iSourceURL, null);
}
} // end of the enclosing method

    /**
     * Trust manager used for the SSL handshake when visiting the HTTPS server.
     * It accepts every certificate chain, so the server's identity is not verified.
     *
     * @author chaixzh
     */
    class iTrustManager implements X509TrustManager {
        iTrustManager() {
        }

        // check client trust status: every client certificate is accepted
        public void checkClientTrusted(X509Certificate[] chain, String authType)
                throws CertificateException {
            System.out.println("check client trust status");
        }

        // check server trust status: every server certificate is accepted
        public void checkServerTrusted(X509Certificate[] chain, String authType)
                throws CertificateException {
            System.out.println("check Server trust status");
        }

        // accepted issuers: none listed (the interface expects an array, not null)
        public X509Certificate[] getAcceptedIssuers() {
            return new X509Certificate[0];
        }
    }

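    // Accepts any host name, e.g. when the certificate was issued for a name other than
    // the IP address used in the URL
    private static class TrustAnyHostnameVerifier implements HostnameVerifier {
        public boolean verify(String hostname, SSLSession session) {
            return true;
        }
    }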

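    /**
     * Splits the SessionID out of a Set-Cookie header value.
     *
     * @param cookie the header value, e.g. "JSESSIONID=ABC123; Path=/"
     * @return "JSESSIONID=..." ready to send as a Cookie header, or null if it is absent
     */
    private String getSessionIdFromCookie(String cookie) {
        if (cookie == null) {
            return null;
        }
        int index_1 = cookie.indexOf("JSESSIONID=");
        if (index_1 < 0) {
            return null;
        }
        // search for the ';' that ends this cookie, not the first ';' in the header
        int index_2 = cookie.indexOf(";", index_1);
        return index_2 < 0 ? cookie.substring(index_1) : cookie.substring(index_1, index_2);
    }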

    /**
     * Prints a response body, just for the sake of debugging. It consumes the stream.
     *
     * @param stream the response body
     * @throws Exception if reading fails
     */
    private void printIoStream(InputStream stream) throws Exception {
        // decodes the page as GBK; change the charset to match the server's pages
        BufferedReader br = new BufferedReader(new InputStreamReader(stream, "gbk"));
        StringBuilder strHtml = new StringBuilder();
        String strLine;
        while ((strLine = br.readLine()) != null) {
            strHtml.append(strLine).append("\r\n");
        }
        System.out.print(strHtml.toString());
    }

    // Browser-like request headers (copied from IE 6) for the login request
    private void addProperty(URLConnection connection) {
        connection.addRequestProperty("Accept", "image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/x-shockwave-flash, application/msword, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/x-silverlight, */*");
        connection.setRequestProperty("Referer", "https://9.186.10.56:8443/index.jsp");
        connection.setRequestProperty("Accept-Language", "zh-cn");
        connection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        // "Accept-Encoding: gzip, deflate" is left out: HttpURLConnection does not decompress
        // the body, so a gzip response could not be parsed as HTML/XML.
        connection.setRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Foxy/1; .NET CLR 2.0.50727;MEGAUPLOAD 1.0)");
        // "Connection" is a restricted header that HttpURLConnection ignores by default;
        // keep-alive is its default behaviour anyway
        connection.setRequestProperty("Connection", "Keep-Alive");
        connection.setRequestProperty("Cache-Control", "no-cache");
    }
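`iTrustManager` accepts every certificate, so the connection is encrypted but the server's identity is never checked. If the server uses a self-signed certificate, a stricter option is to trust only that certificate. A minimal sketch, assuming the server certificate has been exported to a file (PEM or DER); `PinnedTrust` and its methods are hypothetical names:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.security.cert.CertificateFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

public class PinnedTrust {

    /** Builds an in-memory KeyStore holding just the given X.509 certificate file. */
    static KeyStore trustStoreWith(Path certFile) throws IOException, GeneralSecurityException {
        KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
        ks.load(null, null); // start from an empty store
        try (InputStream in = Files.newInputStream(certFile)) {
            Certificate cert = CertificateFactory.getInstance("X.509").generateCertificate(in);
            ks.setCertificateEntry("server", cert);
        }
        return ks;
    }

    /** Builds an SSLContext whose trust manager accepts only chains anchored in trustStore. */
    static SSLContext contextTrusting(KeyStore trustStore) throws GeneralSecurityException {
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustStore);
        SSLContext sc = SSLContext.getInstance("TLS");
        sc.init(null, tmf.getTrustManagers(), null); // null: default key managers and randomness
        return sc;
    }
}
```

The resulting context can take the place of `sc` in `conn.setSSLSocketFactory(sc.getSocketFactory())`. Host-name checking is a separate step: the certificate must also match the host in the URL unless a `HostnameVerifier` such as `TrustAnyHostnameVerifier` overrides it.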


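Besides this approach, you can also connect through a raw socket or with Apache HttpClient, among other options. They differ only in the details: the common idea is to stay in the same session after authenticating and then fetch the target page's content.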
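For `HttpURLConnection`, the JDK can also keep the session by itself, which replaces the manual cookie handling of steps 2 and 4. This is a different technique from the one used above: once a `java.net.CookieManager` is installed with `CookieHandler.setDefault`, every `HttpURLConnection` stores the `Set-Cookie` headers it receives (including those on redirects it follows) and sends matching cookies on later requests to the same host. A sketch, where `SessionCookies` is a hypothetical helper and `example.com` and `ABC123` below are made-up values:

```java
import java.io.IOException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class SessionCookies {

    /** Installs a JVM-wide cookie store that every HttpURLConnection consults. */
    static CookieManager installDefault() {
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ORIGINAL_SERVER);
        CookieHandler.setDefault(manager);
        return manager;
    }

    /** Returns the Cookie header values the manager would send with a request to uri. */
    static List<String> cookiesFor(CookieManager manager, URI uri) throws IOException {
        Map<String, List<String>> headers = manager.get(uri, Collections.emptyMap());
        return headers.getOrDefault("Cookie", Collections.emptyList());
    }
}
```

For instance, after `manager.put(URI.create("https://example.com/LogonServlet"), Map.of("Set-Cookie", List.of("JSESSIONID=ABC123; Path=/")))`, calling `cookiesFor(manager, URI.create("https://example.com/report.jsp"))` yields `[JSESSIONID=ABC123]`, which is what step 4 sets by hand.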

cxzforever