Re: [Question] Web crawler running into JavaScript

Board: java  Author: (菠蘿麵包)  Time: 2011/06/06 17:52  Push/boo: 0 (000)
0 comments, 0 participants, thread 3/3
Thanks to uziel and caty1010 for the help.

Below is what I wrote based on material I found online, but it does not seem to fetch the page after the redirect. What is wrong with it?

import java.io.IOException;

import org.apache.commons.httpclient.Header;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.NameValuePair;
import org.apache.commons.httpclient.methods.PostMethod;

public class test {
    public static void main(String[] args) {
        String url = "http://khh.travel/tw/spots/RecSpotList.aspx";
        PostMethod postMethod = new PostMethod(url);

        // Fill in the value of each form field
        NameValuePair[] data = {
            new NameValuePair("__EVENTTARGET", "Next"),
            new NameValuePair("__EVENTARGUMENT", ""),
            new NameValuePair("__VIEWSTATE", "hidden")
        };
        // Put the form values into postMethod
        postMethod.setRequestBody(data);

        HttpClient httpClient = new HttpClient();
        try {
            // Execute postMethod
            int statusCode = httpClient.executeMethod(postMethod);

            // HttpClient does not automatically follow redirects for requests
            // such as POST and PUT that require further action.
            // 301 or 302
            if (statusCode == HttpStatus.SC_MOVED_PERMANENTLY
                    || statusCode == HttpStatus.SC_MOVED_TEMPORARILY) {
                // Read the redirect address from the response header
                Header locationHeader = postMethod.getResponseHeader("location");
                if (locationHeader != null) {
                    String location = locationHeader.getValue();
                    System.out.println("The page was redirected to:" + location);
                } else {
                    System.err.println("Location field value is null.");
                }
            }
        } catch (HttpException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // Return the connection to HttpClient when done
            postMethod.releaseConnection();
        }
    }
}

P.S. I have seen some books on web crawlers introduce a tool called Rhino for handling JavaScript. Could anyone say a bit about it? Although the approach above seems to be enough.

--
※ Origin: PTT (ptt.cc)
◆ From: 140.116.130.25
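On the redirect question: the posted code only prints the Location header and never requests that address, so the redirected page is never downloaded. A minimal sketch of following the redirect by hand is below. To stay self-contained it uses the JDK's HttpURLConnection rather than Commons HttpClient. The class name RedirectAfterPost and its helper methods are made up for illustration, and the sketch does not carry cookies or a real __VIEWSTATE value, which an ASP.NET page may well require.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class (not from the original post): POST a form,
// then issue a GET for the Location header if the server redirects.
class RedirectAfterPost {

    /** True for 301 and 302 (the codes the original post checks) and 303. */
    public static boolean isRedirect(int status) {
        return status == HttpURLConnection.HTTP_MOVED_PERM
                || status == HttpURLConnection.HTTP_MOVED_TEMP
                || status == HttpURLConnection.HTTP_SEE_OTHER;
    }

    /** The Location value may be relative, so resolve it against the request URL. */
    public static String resolveLocation(String requestUrl, String location) {
        return URI.create(requestUrl).resolve(location).toString();
    }

    /** Reads the whole response body as UTF-8 text (assumed encoding). */
    static String readBody(HttpURLConnection conn) throws IOException {
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    /** Sends an already URL-encoded form body and follows one redirect, if any. */
    public static String postAndFollow(String url, String formBody) throws IOException {
        HttpURLConnection post = (HttpURLConnection) URI.create(url).toURL().openConnection();
        post.setInstanceFollowRedirects(false); // handle the redirect ourselves
        post.setRequestMethod("POST");
        post.setDoOutput(true);
        post.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = post.getOutputStream()) {
            out.write(formBody.getBytes(StandardCharsets.UTF_8));
        }

        int status = post.getResponseCode();
        if (!isRedirect(status)) {
            return readBody(post); // no redirect: the POST response is the page
        }

        String location = post.getHeaderField("Location");
        post.disconnect();
        if (location == null) {
            throw new IOException("Redirect without a Location header");
        }

        // This second request is the step missing from the posted code.
        HttpURLConnection get =
                (HttpURLConnection) URI.create(resolveLocation(url, location)).toURL().openConnection();
        get.setRequestMethod("GET");
        return readBody(get);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: RedirectAfterPost <url> <form-body>");
            return;
        }
        System.out.println(postAndFollow(args[0], args[1]));
    }
}
```

With Commons HttpClient 3.x the same idea would presumably be to create a GetMethod for the Location value and execute it on the same HttpClient instance, then read the body from that method's response.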
Article ID (AID): #1DxADaYW (java)