Re: [Question] Beginner web scraping question, starlichin, PTT (批踢踢實業坊)

Author: starlichin (白星羽) 2018-11-04 20:04:27

I have an XML page (the URL is http://py4e-data.dr-chuck.net/comments_42.xml),
and I want to scrape the attribute under its count tag.
The page source looks roughly like this:
<comments>
<comment>
<name>Romina</name>
<count>97</count>
</comment>
<comment>
<name>Laurie</name>
<count>97</count>
</comment>
<comment>
<name>Bayli</name>
<count>90</count>
</comment>
<comment>
<name>Siyona</name>
<count>90</count>
</comment>
<comment>
<name>Taisha</name>
<count>88</count>
</comment>
My code is below, but it can't get the attribute (it prints None). Could someone explain why?
import urllib.request, urllib.parse, urllib.error
import xml.etree.ElementTree as ET
import ssl
# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
url = 'http://py4e-data.dr-chuck.net/comments_42.xml'
html = urllib.request.urlopen(url, context=ctx).read().decode('utf-8')
tree = ET.fromstring(html)
counts = tree.findall('.//count')
print('counts:', len(counts))
for item in counts:
    print('Attribute:', item.get("count"))

Author: InfinityGate (小鳥) 2018-11-04 23:15:00

Because it simply has no attribute. If you want that number, it's the element's text.
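
To make the attribute/text distinction concrete, here is a minimal sketch on inline XML strings. Note that the `unit` attribute is made up purely for illustration (the page in question has no attributes at all); it is only there to show what `get()` actually reads.

```python
import xml.etree.ElementTree as ET

# Hypothetical element: 'unit' is an invented attribute, for illustration only.
with_attr = ET.fromstring('<count unit="posts">97</count>')
# Element shaped like the ones on the page in question: no attributes.
plain = ET.fromstring('<count>97</count>')

attr_value = with_attr.get('unit')   # attribute lookup inside the opening tag
missing = plain.get('count')         # no such attribute, so get() returns None
text_value = plain.text              # content between the tags, as a string

print(attr_value, missing, text_value)
```

So `item.get("count")` is asking for an attribute named count on the `<count>` element, which doesn't exist, and that is where the None comes from.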

Author: rexyeah (ccccccc) 2018-11-05 08:31:00

https://goo.gl/fVm5fd You could look this up first. Just use .text directly and it works.
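
Putting the replies together, a corrected loop might look like the sketch below. To stay runnable offline it parses an inline copy of the excerpt from the question instead of fetching the URL; the closing `</comments>` tag is an assumption added so the excerpt is well-formed, and the `values` list and `total` sum are only there as an illustration of using the numbers.

```python
import xml.etree.ElementTree as ET

# Inline copy of the excerpt from the question. The closing </comments> tag is
# an assumption added here so the snippet parses on its own.
data = """<comments>
<comment><name>Romina</name><count>97</count></comment>
<comment><name>Laurie</name><count>97</count></comment>
<comment><name>Bayli</name><count>90</count></comment>
<comment><name>Siyona</name><count>90</count></comment>
<comment><name>Taisha</name><count>88</count></comment>
</comments>"""

tree = ET.fromstring(data)
counts = tree.findall('.//count')
print('counts:', len(counts))

values = []
for item in counts:
    # .text is the string between <count> and </count>; convert it to int
    # before doing any arithmetic with it.
    values.append(int(item.text))
    print('Count:', item.text)

total = sum(values)
print('total:', total)
```

With the real page, the only change should be to go back to reading the XML from `urllib.request.urlopen(url)` as in the original post; the `findall` and `.text` parts stay the same.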
