Re: [Question] Getting an image served from a relative path with a crawler

Author: Hsins (翔) 2021-12-14 16:57:11

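※ Quoting sky094315 (monkeyo):
: I'd like to ask everyone a question.
: I'm writing a crawler for a site that uses an image captcha, and the captcha changes on every page refresh
: (the server generates a random value and renders an image from it), and each image can only be fetched once.
: Is there a way to grab the image file without using selenium to open a browser?
: Or are there any keywords I should search for?
: Thanks
: Reference: https://weirenxue.github.io/2021/07/04/python_selenium_captcha/
: The site in question: https://aaav2.hinet.net/A1/AuthScreen.jsp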
The site you linked can't be opened without cookies,
so I'll demonstrate with a couple of other pages instead:
https://aaaservice.hinet.net/User/unipresidentConsole.jsp
https://aaacp.hinet.net/CP/index.html
Both pages have a Captcha, which you can inspect with the Chrome/Edge developer tools:
https://i.imgur.com/V5x4d9u.png
The Captcha image on each page is fetched with a GET request to one of these two URIs:
https://aaaservice.hinet.net/User/Captcha?rdn=1639470286847
https://aaacp.hinet.net/CP/Captcha?rdn=1639469984177
The rdn parameter at the end looks a lot like a timestamp.
Pasting it into https://www.epochconverter.com/ confirms it is a Unix timestamp that includes milliseconds.
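The same check can be done locally instead of on a converter site. This is a minimal sketch using only the standard library; `rdn_to_datetime` is a name I made up for illustration, and it assumes rdn really is Unix time in milliseconds:

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlparse


def rdn_to_datetime(captcha_url: str) -> datetime:
    """Read the rdn query parameter and interpret it as Unix milliseconds (UTC)."""
    rdn = int(parse_qs(urlparse(captcha_url).query)["rdn"][0])
    seconds, millis = divmod(rdn, 1000)
    # Integer arithmetic avoids floating-point rounding on the millisecond part.
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(
        microsecond=millis * 1000
    )


url = "https://aaaservice.hinet.net/User/Captcha?rdn=1639470286847"
print(rdn_to_datetime(url).isoformat())
# 2021-12-14T08:24:46.847000+00:00
```

That is 16:24:46 in Taiwan time (UTC+8), about half an hour before this post went up, so the millisecond reading looks right. Read as seconds, the same number would land far outside any plausible date.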
So this becomes very simple:
1. Send the request
2. Save the image
```python
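from datetime import datetime

import requests

# Download 10 captcha images; each request returns a newly generated one.
for _ in range(10):
    # rdn is the current Unix timestamp in milliseconds
    current_timestamp = round(datetime.now().timestamp() * 1000)
    image_url = f"https://aaacp.hinet.net/CP/Captcha?rdn={current_timestamp}"
    response = requests.get(image_url, timeout=10)
    response.raise_for_status()  # stop on HTTP errors instead of saving an error page
    # Save the raw bytes, named after the timestamp used in the request
    with open(f"./{current_timestamp}.jpg", "wb") as handler:
        handler.write(response.content)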
```
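Note that the snippet above saves everything as .jpg without checking what the server actually returned. If that matters, one option is to pick the extension from the leading bytes of the response. This is only a sketch: `guess_image_extension` is a hypothetical helper, not something from requests, and it only knows the three signatures listed.

```python
def guess_image_extension(data: bytes) -> str:
    """Guess a file extension from the leading magic bytes of image data."""
    if data.startswith(b"\xff\xd8\xff"):
        return ".jpg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return ".png"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return ".gif"
    return ".bin"  # unknown format: keep the bytes, but don't label them as an image


print(guess_image_extension(b"\xff\xd8\xff\xe0" + b"\x00" * 16))  # .jpg
print(guess_image_extension(b"<html>error</html>"))  # .bin
```

In the download loop, this would replace the hard-coded `.jpg` in the filename. A `.bin` result is a hint that the server sent something other than an image.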

Author: sky094315 2021-12-14 18:29:00

Sorry, I hadn't noticed that it needs cookies. Thanks for the reply, now I know which direction to take.

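Author: Hsins (翔) 2021-12-14 19:36:00

If you're only downloading the images as training data, it doesn't matter. If you're downloading them to recognize the code and then log in, you'll need to handle cookies.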
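A rough sketch of what handling cookies could look like, assuming a `requests.Session` is used so the captcha request and the later login share one cookie jar. Whether the site ties the captcha to a cookie is an assumption on my part. The login URL, the form field names, and `solve` are all hypothetical; the real ones have to be read from the site's login form.

```python
import time

CAPTCHA_URL = "https://aaacp.hinet.net/CP/Captcha"  # endpoint from the post above


def fetch_captcha(session) -> bytes:
    """Fetch one captcha image through `session` (e.g. a requests.Session).

    Any cookies the server sets on this response stay in the session's
    cookie jar and are sent back automatically on later requests.
    """
    params = {"rdn": round(time.time() * 1000)}  # millisecond timestamp
    response = session.get(CAPTCHA_URL, params=params, timeout=10)
    response.raise_for_status()
    return response.content


def submit_login(session, login_url: str, form: dict):
    """POST the form with the same session, so the captcha's cookies go along."""
    return session.post(login_url, data=form, timeout=10)


# Usage (hits the network, so it is left commented out):
#   import requests
#   with requests.Session() as session:
#       image = fetch_captcha(session)
#       answer = solve(image)  # hypothetical: your recognizer or manual input
#       # hypothetical URL and field names
#       submit_login(session, "https://example.invalid/login", {"captcha": answer})
```

The point is just that both calls go through the same session object. Calling `requests.get` and `requests.post` separately would start each request with an empty cookie jar.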

Author: sky094315 2021-12-14 20:04:00

Got it, thanks for the reply.
