Re: [問題] 網路爬蟲遇到javascript
感謝 uziel和caty1010大大的幫忙
下面是我參考網路上的資料所寫的,不過他好像沒有把轉向後的網頁抓出來
請問哪裡有問題???
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.commons.httpclient.Header;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.NameValuePair;
import org.apache.commons.httpclient.methods.PostMethod;
public class test
{
public static void main(String args[])
{
String url = "http://khh.travel/tw/spots/RecSpotList.aspx";
PostMethod postMethod = new PostMethod(url);
// 填入各個表單域的值
NameValuePair[] data =
{
new NameValuePair("__EVENTTARGET", "Next"),
new NameValuePair("__EVENTARGUMEN", ""),
new NameValuePair("__VIEWSTAT","hidden")
};
// 將表單的值放入postMethod中
postMethod.setRequestBody(data);
// 執行postMethod
HttpClient httpClient=new HttpClient();
int statusCode = 0;
try {
statusCode = httpClient.executeMethod(postMethod);
} catch (HttpException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
// HttpClient對于要求接受后繼服務的請求,象POST和PUT等不能自動處理轉發
// 301或者302
if (statusCode == HttpStatus.SC_MOVED_PERMANENTLY ||
statusCode == HttpStatus.SC_MOVED_TEMPORARILY) {
// 從頭中取出轉向的位址
Header locationHeader = postMethod.getResponseHeader("location");
String location = null;
if (locationHeader != null) {
location = locationHeader.getValue();
System.out.println("The page was redirected to:" + location);
} else {
System.err.println("Location field value is null.");
}
}
}
}
ps.我看有一些爬蟲類的書好像有介紹到Rhino這個工具,來處理javascript,
有其他大大要介紹一下嘛?雖然用上面方法好像就ok了
--
※ 發信站: 批踢踢實業坊(ptt.cc)
◆ From: 140.116.130.25
討論串 (同標題文章)
完整討論串 (本文為第 3 之 3 篇):