[Question] Crawler runs into an error | proud | PTT (批踢踢實業坊)

Author: proud (hc) 2016-05-24 12:51:40

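Assuming the code itself is fine,
I get HTTP Error 500: server error.
What kind of problem causes this?
The page I am crawling opens normally in a browser.
Crawling from my local IP gives the same error,
so I have ruled out an IP problem.
Is there any way to solve it?
The code fragment is below; it fetches Yahoo! Kimo (奇摩) stock news.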
import time
import urllib2

import feedparser

# test1.txt holds one stock number per line
stockList = [line.rstrip() for line in open('test1.txt')]
for count in range(100000000):
    t1 = time.time()
    timeCount = 0
    for stockNum in stockList:
        d = feedparser.parse('http://tw.stock.yahoo.com/rss/s/%s' % stockNum)
        lens = len(d.entries)
        print lens
        for newsNum in range(lens):
            print d.feed.title
            title = d.entries[newsNum].title.encode('utf-8')
            print title
            # '/' cannot appear in a file name, so swap it for '.'
            rTitle = title.replace('/', '.')
            link = d.entries[newsNum].link
            req = urllib2.Request(link)
            print req.__doc__
            # as posted: skip this entry when the Request has no docstring
            if not req.__doc__:
                continue
            content = urllib2.urlopen(req).read()
            save = open('./database/%s/%s.news' % (stockNum, rTitle), 'w')
            save.write(content)
            save.close()

Author: secondDim (　祈求備取會上) 2016-05-24 13:16:00

Google "HTTP 500".

Author: uranusjr (←這人是超級笨蛋) 2016-05-24 16:59:00

500 means the server itself broke; there is nothing you can fix on your side.
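
As a rough illustration of the reply above, the sketch below catches the error and reports whether the status falls in the 5xx (server side) range. It is written for Python 3, where urllib2 became urllib.request and urllib.error. The helper names describe_http_error and fetch_or_report are hypothetical and do not come from the thread.

```python
import urllib.error
import urllib.request


def describe_http_error(err):
    """Summarize an HTTPError as 'code reason (kind)'."""
    if 500 <= err.code < 600:
        kind = "server side"
    elif 400 <= err.code < 500:
        kind = "client side"
    else:
        kind = "other"
    return "%d %s (%s)" % (err.code, err.reason, kind)


def fetch_or_report(url, timeout=10):
    """Return the response body, or None after printing a summary of the HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        # HTTPError carries the numeric status in .code and the server's text in .reason
        print(describe_http_error(err))
        return None
```

Calling fetch_or_report(link) in place of a bare urlopen(link).read() keeps the crawl loop running and shows which links the server rejected. A 5xx status says the server reported the failure, not necessarily that the request was blameless.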

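Author: aweimeow (喵喵喵喵ヽ( ・∀・)ノ) 2016-05-24 17:09:00

Try sending the UA (User-Agent) that your browser uses when it accesses the page.
I have run into a server before that deliberately returned 500 to me because the UA was wrong.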
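
A minimal sketch of that suggestion, assuming Python 3 and the standard library only. The BROWSER_UA value is an illustrative browser-like string, not one taken from the thread, and build_request and fetch are hypothetical helper names.

```python
import urllib.request

# Assumption: any common desktop-browser string; the exact value is illustrative.
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def build_request(url, user_agent=BROWSER_UA):
    """Build a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, timeout=10):
    """Fetch url with the browser-like User-Agent and return the raw body."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read()
```

Without an explicit header, urllib identifies itself as Python-urllib/x.y, which some servers reject. For the feed itself, feedparser.parse also accepts an agent argument that serves the same purpose.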

Author: daniel1205 (??!!) 2016-05-24 20:04:00

Take a look at the headers and cookies.

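Author: aweimeow (喵喵喵喵ヽ( ・∀・)ノ) 2016-05-25 13:18:00

By the way, how is anyone supposed to debug this if you don't post the code?
I just ran an experiment, and it is a UA problem.
Have a look at this snippet; you should be able to work out where the problem is:
http://pastebin.com/q4ff1tDJ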

Author: kanggy ((我還在，只是熱情不再)) 2016-05-26 08:54:00

Thanks for sharing, aweimeow. Learned something :P

Author: billy0131 (Pluto) 2016-05-27 11:37:00

What is this kind of anti-crawler measure even good for....

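Author: s860134 (s860134) 2016-05-27 22:13:00

It keeps honest people out, not determined ones.
Worst case, you imitate a browser and you can still crawl.
The simplest things to fake are user-agent and host; cookies are a bit more involved.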
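
The pieces named in that reply can be sketched as follows, assuming Python 3 and the standard library. The function name build_browser_like_opener and the Accept value are illustrative assumptions, not something the thread specifies.

```python
import http.cookiejar
import urllib.request


def build_browser_like_opener(user_agent):
    """Return (opener, jar): an opener that sends browser-like headers and keeps cookies."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Replaces urllib's default Python-urllib User-Agent on every request.
    # The Host header is filled in by urllib from the URL, so it is not set here.
    opener.addheaders = [
        ("User-Agent", user_agent),
        ("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"),
    ]
    return opener, jar
```

Requests made with opener.open(url) store any Set-Cookie values in jar and send them back on later requests to the same site, which is roughly what a browser session does.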
