[Question] Scraping PTT web pages — where is the error in this code?

Author: mikemlbb (M)   2016-10-18 15:23:22
[Question type]:
Programming advice (I want to do something with R, but I don't know how to write it in R)
[R proficiency]:
Experienced user (I have already built quite a few things with R)
[Problem description]:
I typed in the code below from a book,
trying to scrape the text content of the articles on the StupidClown board,
but running it produced:
Error in regexpr("www", line) :
argument "line" is missing, with no default
After reading up on how regexpr works,
I found that the way "line" is used in this program does not match that function's usage.
In this example, how should the code be modified
so that it actually fetches the StupidClown articles?
Thanks in advance for the help.
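For reference, `regexpr(pattern, text)` needs both arguments: it returns the character position of the first match of `pattern` in `text`, or -1 if there is no match. The reported error is raised whenever the `text` argument (named `line` in the function below) is missing. A quick illustration on one of the index URLs the loop builds:

```r
# Self-contained demo of the regexpr/substr logic used by getdoc().
line <- "https://www.ptt.cc/bbs/StupidClown/index1058.html"
regexpr("www", line)[1]    # 9: "www" starts at character 9
regexpr("html", line)[1]   # 46: "html" starts at character 46
substr(line, 9, 46 + 3)    # "www.ptt.cc/bbs/StupidClown/index1058.html"
```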
[Code example]:
install.packages("XML")
install.packages("RCurl")
library(XML)
library(RCurl)
data <- list()
for (i in 1058:1118) {
  tmp <- paste(i, '.html', sep = '')
  url <- paste('https://www.ptt.cc/bbs/StupidClown/index', tmp, sep = '')
  get_url <- getURL(url, ssl.verifypeer = FALSE)
  html <- htmlParse(get_url)
  url.list <- xpathSApply(html, "//div[@class='title']/a[@href]", xmlAttrs)
  data <- rbind(data, paste('https://www.ptt.cc', url.list, sep = ''))
}
data <- unlist(data)
getdoc <- function(line) {
  start <- regexpr('www', line)[1]
  end <- regexpr('html', line)[1]
  if (start != -1 & end != -1) {
    url <- substr(line, start, end + 3)
    html <- htmlParse(getURL(url, ssl.verifypeer = FALSE), encoding = 'UTF-8')
    doc <- xpathSApply(html, "//div[@id='main-container']", xmlValue)
    name <- strsplit(url, '/')[[1]][4]
    write(doc, gsub('html', 'txt', name))
  }
}
getdoc()
sapply(data, getdoc)
setwd("C://Documents and Settings//12345//桌面//R_textmining")
write.table(getdoc,file = "getdoc.txt",row.names = F,quote = F)
[Environment]:
R version 3.3.1 (2016-06-21)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows XP (build 2600) Service Pack 3
[Keywords]:
regexpr, xpathSApply, PTT crawler
Author: clansoda (小笨)   2016-10-19 09:41:00
Hi, I am trying to solve your problem. Would you tell me what your expected output is? The "data" data frame contains 1220 URL characters.
Author: mikemlbb (M)   2016-10-21 02:23:00
I'm trying to crawl the content of the StupidClown board, including article titles and content, from no. 1058 to 11. But the code seems to be wrong. When I run "getdoc()", the error emerges, saying "line" is not defined.
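For anyone landing on this thread with the same error: the message comes from the bare `getdoc()` call near the end of the posted script, which invokes the function without its required `line` argument (the function defines no default for it). The `sapply(data, getdoc)` call alone is enough, since sapply passes each URL in `data` as `line`. A minimal self-contained sketch of the calling pattern, with the file-writing body replaced by a stub that just returns the extracted URL:

```r
# Stub of getdoc(): same signature and regexpr/substr logic as the posted
# function, but it returns the extracted URL instead of fetching and
# writing files, so it runs without network access.
getdoc <- function(line) {
  start <- regexpr("www", line)[1]
  end <- regexpr("html", line)[1]
  if (start != -1 & end != -1) substr(line, start, end + 3) else NA_character_
}

data <- c("https://www.ptt.cc/bbs/StupidClown/index1058.html")
sapply(data, getdoc)   # each element is passed as `line` -- works
# getdoc()             # <- the bare call reproduces the reported error:
#                      # argument "line" is missing, with no default
```

Two smaller issues in the posted script: `setwd()` should run before `sapply(data, getdoc)` so that the .txt files produced by `write()` land in the target folder, and `write.table(getdoc, ...)` passes the function object itself rather than any scraped text, so it will not produce the intended output — `getdoc()` already writes each article to its own file.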
