Re: [Question] How to scrape only specific data with a crawler

Board: Python | Author: sky800507 (B翰) | Time: 2017/02/18 09:05 | Votes: 0 (0 push, 0 boo, 2 →)

2 comments, 1 participant, thread 3/3

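※ Quoting a856479 (thebelief):
: Hi everyone. I have no programming background,
: and I ran into a problem while trying to scrape a web page that I don't know how to solve.
: I'd appreciate any help. Thanks!
: Target URL: https://goo.gl/02M292
: Target data: only the stock names that appear under "今日十全戰法偏多"
: Problem: I can't find a class for "今日十全戰法偏多", or any other way to identify it,
: so I can only grab every stock name on the page...
: ---------- Here is what I have so far ----------
: import requests
: from bs4 import BeautifulSoup
: res = requests.get("http://www.sohowgood.com/TwStock/PowerKLine.aspx")
: soup = BeautifulSoup(res.text, "lxml")
: stocks = soup.find_all('li')
: for stock in stocks:
:     meta = stock.find('a')
:     stockid = meta.getText().strip()
:     print(stockid)
: How should I change this to get only the part I need? Thanks, everyone.

import requests
from bs4 import BeautifulSoup

res = requests.get('http://www.sohowgood.com/TwStock/PowerKLine.aspx')
soup = BeautifulSoup(res.text, 'lxml')
# The target list is the sixth <ul> on the page (index 5).
for stock in soup.select('ul')[5].select('li a'):
    print(stock['title'])

If you're worried that the target <ul> won't always be at index 5, you can write it like this instead:

import requests
from bs4 import BeautifulSoup

res = requests.get('http://www.sohowgood.com/TwStock/PowerKLine.aspx')
soup = BeautifulSoup(res.text, 'lxml')
# Find the <h2> whose text is the section title, then walk to the list after it.
for table in soup.select('h2'):
    if table.text == '今日十全戰法偏多':
        # The first next_sibling is presumably the whitespace text node after
        # the <h2>; the second is the element that holds the <li> items.
        for stock in table.next_sibling.next_sibling.select('li a'):
            print(stock['title'])

--
※ Origin: PTT (ptt.cc), From: 118.160.52.219
※ Article URL: https://www.ptt.cc/bbs/Python/M.1487408702.A.95F.html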
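
A possible variation on the heading-based approach above, in case the exact sibling layout or the whitespace around the heading text changes: match the <h2> on its stripped text and use find_next('ul') instead of counting siblings. This is only a sketch. SAMPLE_HTML, its "Stock A/B/C" titles, the "Other list" heading, and the helper name titles_under_heading are all made up for illustration, and the real page's markup may differ. It uses the built-in 'html.parser' so it runs without lxml and without a network request.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page; the actual markup may differ.
SAMPLE_HTML = """
<h2>Other list</h2>
<ul><li><a title="Stock A">A</a></li></ul>
<h2> 今日十全戰法偏多 </h2>
<p>some note between the heading and the list</p>
<ul>
  <li><a title="Stock B">B</a></li>
  <li><a title="Stock C">C</a></li>
</ul>
"""


def titles_under_heading(html, heading_text):
    """Return the title of each link in the first <ul> after the matching <h2>."""
    soup = BeautifulSoup(html, "html.parser")
    for h2 in soup.find_all("h2"):
        # Compare the stripped text so stray whitespace does not break the match.
        if h2.get_text(strip=True) == heading_text:
            # find_next searches forward in document order, so text nodes or
            # other tags between the heading and the list are skipped.
            ul = h2.find_next("ul")
            if ul is None:
                return []
            return [a["title"] for a in ul.select("li a") if a.has_attr("title")]
    return []


print(titles_under_heading(SAMPLE_HTML, "今日十全戰法偏多"))
```

On the sample above this prints ['Stock B', 'Stock C']. To try it on the live page, pass res.text from the requests.get call in place of SAMPLE_HTML.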

→ a856479  02/18 22:36, 1F

→ a856479  02/18 22:37, 2F


Article ID (AID): #1Og0u-bV (Python)