Re: [Tools] Some dumb notes on HtmlParser
: However, for some pages, plain HttpURLConnection.getInputStream() does not work
: The most brutal example is the google search result
: http://www.google.com/search?hl=zh-TW&q=htmlparser
You need to set the User-Agent request header.
You can pick whichever one you like from here:
http://en.wikipedia.org/wiki/User_agent
import java.io.BufferedInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class SimpleUrlConnection {
    public static void main(String[] args) throws IOException {
        URL u = new URL("http://www.google.com/search?hl=zh-TW&q=htmlparser");
        HttpURLConnection urlConnection = (HttpURLConnection) u.openConnection();
        // Send a browser-like User-Agent instead of the JDK default.
        urlConnection.setRequestProperty("User-Agent",
                "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; "
                + "SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)");
        // try-with-resources closes both streams even if the copy fails.
        try (InputStream in = new BufferedInputStream(urlConnection.getInputStream());
             OutputStream out = new FileOutputStream("result.html")) {
            byte[] block = new byte[1024];
            int size;
            // read() returns -1 at end of stream.
            while ((size = in.read(block)) != -1) {
                out.write(block, 0, size);
            }
        }
    }
}
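
If you want to see the effect without hitting a real site, here is a rough sketch. It assumes a local stand-in server (hypothetical; the class UserAgentCheck and its "reject anything containing Java/" rule are made up for illustration and are not how Google actually decides) and compares the status code with and without a custom User-Agent. It uses the JDK's built-in com.sun.net.httpserver.HttpServer.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class UserAgentCheck {

    // Hypothetical picky server: answers 403 when the User-Agent is missing
    // or contains the JDK default "Java/<version>", otherwise 200.
    static HttpServer startPickyServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            String agent = exchange.getRequestHeaders().getFirst("User-Agent");
            boolean rejected = agent == null || agent.contains("Java/");
            // Echo back whatever agent the server saw.
            byte[] body = String.valueOf(agent).getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(rejected ? 403 : 200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }

    // Returns the HTTP status of a GET request.
    // A null userAgent leaves the JDK default header in place.
    static int statusFor(String url, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        if (userAgent != null) {
            conn.setRequestProperty("User-Agent", userAgent);
        }
        try {
            // Unlike getInputStream(), this does not throw on a 4xx status.
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    // Runs both requests against the local server: {default agent, browser-like agent}.
    static int[] demo() throws IOException {
        HttpServer server = startPickyServer();
        try {
            String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/";
            int plain = statusFor(url, null);
            int browser = statusFor(url, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");
            return new int[] { plain, browser };
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws IOException {
        int[] codes = demo();
        System.out.println("default agent: " + codes[0]);
        System.out.println("browser agent: " + codes[1]);
    }
}
```

Calling getResponseCode() before getInputStream() is handy when debugging this kind of problem, since getInputStream() just throws an IOException on a 4xx response and hides the status from you.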
--
Pics or it didn't happen: the new Eclipse has the right feel
http://farm2.static.flickr.com/1202/1083119916_5c469f95c3_o.png

--
※ Origin: 批踢踢實業坊 (ptt.cc)
◆ From: 218.161.122.204
※ Edited: qrtt1, from 218.161.122.204 (08/11 23:01)