Re: [Question] POST problem when scraping

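Author: celestialgod (天)   2021-04-07 13:44:04
※ Quoting ppp1987 (ppp):
: [Question type]:
: Programming help (I want to do something in R but don't know how to write it)
: [Software familiarity]:
: Beginner (I have written other languages but am unfamiliar with the syntax)
: [Problem description]:
: I want to scrape data from a website. It works in Python but fails in R.
: I couldn't find a solution on Google.
: Hoping someone on the board can help.
: Thanks.
: [Code example]:
: <python> runs without problems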
: import requests
: import pandas as pd
: import json
: url = "https://securev.jihsun.com.tw/JssFHCTradeNet/JSStockCR/StockRatingCR_P.aspx/GetData"
: headers = {'Content-Type': 'application/json; charset=UTF-8'}
: data = {'stockNo': '2330'}
: response = requests.post(url = url, data=json.dumps(data), headers=headers)
: <R>
: url = "https://securev.jihsun.com.tw/JssFHCTradeNet/JSStockCR/StockRatingCR_P.aspx/GetData"
: headers = c('Content-Type' = 'application/json; charset=UTF-8')
: data = '{"stockNo": "2330"}'
: get_data <- httr::POST(url = url,
: httr::add_headers(.headers=headers),
: body = data)
: # This throws the following error:
: # Error in curl::curl_fetch_memory(url, handle = handle) :
: # Maximum (10) redirects followed
: [Environment]
: R version 4.0.4
: curl 4.3
: httr 1.4.2
: MacBook M1
: [Keywords]:
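Start by turning on verbose output:
library(httr)  # POST(), content_type(), verbose() come from httr
get_data <- POST(
  url = url,
  content_type("application/json"),
  body = data,
  verbose()
)
You will see the following output: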
-> POST /JssFHCTradeNet/JSStockCR/StockRatingCR_P.aspx/GetData HTTP/1.1
-> Host: securev.jihsun.com.tw
-> User-Agent: libcurl/7.59.0 r-curl/3.3 httr/1.4.1
-> Accept-Encoding: gzip, deflate
-> Accept: application/json, text/xml, application/xml, */*
-> Content-Type: application/json
-> Content-Length: 16
->
>> {"stockNo":2330}
<- HTTP/1.1 302 Found
<- Connection: close
<- Content-Length: 0
<- Content-Type: text/html; charset=utf-8
<- Location: http://jsmarket.jihsun.com.tw/Marketnet/Error/Error.aspx?sys=08&support_id=41102190011827406
<-
-> GET /Marketnet/Error/Error.aspx?sys=08&support_id=41102190011827406 HTTP/1.1
-> Host: jsmarket.jihsun.com.tw
-> User-Agent: libcurl/7.59.0 r-curl/3.3 httr/1.4.1
-> Accept-Encoding: gzip, deflate
-> Accept: application/json, text/xml, application/xml, */*
-> Content-Type: application/json
->
<- HTTP/1.1 302 Found
<- Connection: close
<- Content-Length: 0
<- Content-Type: text/html; charset=utf-8
<- Location: http://jsmarket.jihsun.com.tw/Marketnet/Error/Error.aspx?sys=09&support_id=41102190016500008
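If you open the Error URL above, it is just an error page.
So redirection itself can be ruled out as the cause: the server is rejecting the request and bouncing it to its error page.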
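To inspect each hop from Python (the language of the quoted working example) rather than reading the verbose log, here is a minimal sketch using only the standard library. `trace_redirects` and `_NoRedirect` are names made up for illustration, not part of httr or requests, and the optional `opener` argument is only there so the loop can be exercised without network access.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Tell urllib not to follow redirects, so each hop can be inspected."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError for the 3xx response


def trace_redirects(url, data=None, headers=None, max_hops=10, opener=None):
    """Return a list of (status, url) pairs, one per hop, without auto-following."""
    if opener is None:
        opener = urllib.request.build_opener(_NoRedirect)
    hops = []
    for _ in range(max_hops):
        req = urllib.request.Request(url, data=data, headers=headers or {})
        try:
            with opener.open(req, timeout=10) as resp:
                hops.append((resp.status, url))
                break  # not a redirect: the chain ends here
        except urllib.error.HTTPError as err:
            hops.append((err.code, url))
            location = err.headers.get("Location")
            if err.code not in (301, 302, 303, 307, 308) or not location:
                break
            url = urljoin(url, location)  # Location may be relative
            data = None  # as in the log above, hops after the first are plain GETs
    return hops
```

Calling `trace_redirects(url, data=b'{"stockNo": "2330"}', headers={"Content-Type": "application/json"})` would list the status code and URL of every hop, which makes a bounce to an error page easy to spot.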
As a side note, if the redirect limit really were the problem, the fix would be:
get_data <- POST(
  url = url,
  content_type("application/json"),
  body = data,
  config(maxredirs = -1)  # -1 removes libcurl's redirect limit
)
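For comparison, a rough Python counterpart of loosening the redirect limit, assuming the standard library's urllib is used instead of httr. `PatientRedirectHandler` and `build_patient_opener` are hypothetical names and 50 is an arbitrary value. `max_redirections` is a class attribute in CPython's implementation (default 10). Unlike `maxredirs=-1`, this raises the cap rather than removing it.

```python
import urllib.request


class PatientRedirectHandler(urllib.request.HTTPRedirectHandler):
    # CPython's default is 10; 50 is an arbitrary illustrative value
    max_redirections = 50


def build_patient_opener():
    """Return an opener whose redirect handler tolerates longer redirect chains."""
    # Passing a subclass makes build_opener use it in place of the default handler.
    return urllib.request.build_opener(PatientRedirectHandler)
```

As with the R version, this only helps when the chain is long but finite. It does nothing for a server that keeps bouncing the request to an error page.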
So a simple guess: the server may not like the user agent.
get_data <- POST(
  url = url,
  content_type("application/json"),
  body = data,
  user_agent("Chrome/89.0.4389.114"),
  verbose()
)
That gets through:
-> POST /JssFHCTradeNet/JSStockCR/StockRatingCR_P.aspx/GetData HTTP/1.1
-> Host: securev.jihsun.com.tw
-> User-Agent: Chrome/89.0.4389.114
-> Accept-Encoding: gzip, deflate
-> Cookie: ASP.NET_SessionId=1wl2d0fpigsiiwxvoudb0jlw; TS014ea3cc=01b12d6ecc001a4641027d81bf890dc86511b24c71a79e3cb594413c98562558ff8e91057a93a67298bc1020dfa0f573cd9c0bd7cd
-> Accept: application/json, text/xml, application/xml, */*
-> Content-Type: application/json
-> Content-Length: 16
->
>> {"stockNo":2330}
<- HTTP/1.1 200 OK
<- Cache-Control: private, max-age=0
<- Content-Type: application/json; charset=utf-8
<- X-AspNet-Version: 4.0.30319
<- X-Powered-By: ASP.NET
<- Date: Wed, 07 Apr 2021 05:43:44 GMT
<- Content-Length: 13571
<-
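The same idea, an explicit User-Agent on a JSON POST, can be sketched in Python with the standard library. This is only an illustration under assumptions: `build_request` and `post_json` are made-up helper names, and nothing is assumed about the response beyond its status code and raw body.

```python
import json
import urllib.request


def build_request(url, payload, user_agent):
    """Build a JSON POST request that carries an explicit User-Agent."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json; charset=UTF-8",
            "User-Agent": user_agent,
        },
        method="POST",
    )


def post_json(url, payload, user_agent):
    """Send the request and return (status, raw body bytes). Needs network access."""
    req = build_request(url, payload, user_agent)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status, resp.read()
```

With the `url` from the question, `post_json(url, {"stockNo": "2330"}, "Chrome/89.0.4389.114")` would mirror the R call above.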
That's all.
Author: ppp1987 (ppp)   2021-04-07 16:47:00
It worked. Thank you very much!
