[Question] Scraping a web page, david31408, PTT (批踢踢實業坊)

Author: david31408 (Hope) 2016-08-12 18:05:15

[Software proficiency]:
Beginner (I have written programs in other languages but am not familiar with the syntax)
[Problem description]:
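Hi everyone, I'm new to R and have been practicing lately.
I'd like to try using the XML package to scrape data from Baseball-Reference.
Since I'm still a beginner, I started by trying things at random; the code and messages are below.
Could it be that not every web page can be scraped with XML?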
> library("XML", lib.loc="~/R/win-library/3.2")
> url <- "http://www.baseball-reference.com/leaders/H_career.shtml"
> Hits <- readHTMLTable(url)
Error in UseMethod("xpathApply") :
no applicable method for 'xpathApply' applied to an object of class "NULL"
In the case above, I don't know why this error message appears,
but my guess is that the page itself is not a table.
I then tried a second approach:
> url <- "http://www.baseball-reference.com/leaders/H_career.shtml"
> x <- xmlParse(url)
The error messages are as follows:
Specification mandate value for attribute itemscope
attributes construct error
Couldn't find end of Start Tag html line
Extra content at the end of the document
Error: 1: Specification mandate value for attribute itemscope
2: attributes construct error
3: Couldn't find end of Start Tag html line 1
4: Extra content at the end of the document
Maybe Baseball-Reference blocks this kind of access?
Thanks in advance for any guidance :)
[Keywords]:
MLB, XML
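
The second error can be reproduced without touching the site. A minimal sketch, assuming the page is HTML5 whose `<html>` tag carries a valueless `itemscope` attribute (which is what the first message suggests): `xmlParse` expects well-formed XML and rejects such an attribute, whereas `htmlParse` from the same XML package is lenient, and `readHTMLTable` also accepts an already-parsed document. The HTML string, the table, and the player names below are made up for illustration. This does not explain the first `readHTMLTable(url)` error, whose cause the post does not establish.

```r
library(XML)

# Hypothetical stand-in for the real page: an HTML5 document whose <html> tag
# has the valueless attribute `itemscope`, plus one small table.
html_text <- paste0(
  '<!DOCTYPE html>',
  '<html itemscope itemtype="http://schema.org/WebPage">',
  '<body><table id="demo">',
  '<tr><th>Player</th><th>H</th></tr>',
  '<tr><td>Player A</td><td>10</td></tr>',
  '<tr><td>Player B</td><td>7</td></tr>',
  '</table></body></html>'
)

# Strict XML parsing: the valueless attribute is not well-formed XML,
# so the parser signals an error, which is caught here.
xml_result <- tryCatch(
  xmlParse(html_text, asText = TRUE),
  error = function(e) "xmlParse failed"
)

# Lenient HTML parsing of the very same text.
doc <- htmlParse(html_text, asText = TRUE)

# Pull the tables out of the parsed document (a list of data frames).
tables <- readHTMLTable(doc, stringsAsFactors = FALSE)
hits <- tables[[1]]
print(hits)
```

For the real page, the same two calls could be tried on the downloaded HTML text instead of `html_text`; whether the site serves that text to a script is a separate question that this sketch does not test.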

Author: andrew43 (討厭有好心推文後刪文者) 2016-08-12 20:26:00

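Search the board's earlier posts first. Also, "trying things at random" like this is not a good way to learn. Read the documentation and other people's examples more.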

Author: david31408 (Hope) 2016-08-12 20:33:00

Thanks. Does this count as web crawling?

Author: celestialgod (天) 2016-08-12 22:20:00

Yes, it is web crawling.

Author: david31408 (Hope) 2016-08-12 23:43:00

Got it!! Thanks :)
