[Question] How do I remove tags when scraping with bs4? nini200, PTT (批踢踢實業坊)


Author: nini200 (200妮妮) 2018-12-25 23:29:20

import requests
from bs4 import BeautifulSoup
import re

url = 'https://tw.appledaily.com/new/realtime'
res = requests.get(url)
soup = BeautifulSoup(res.text, 'lxml')
# The <ul> that holds the real-time news list
tags = soup.find('ul', attrs={'class': 'rtddd slvl'})
titles = tags.find_all('h1')
for title in titles:
    # .text joins the headline text with the number inside <span>
    print(title.text)
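I'm scraping Apple Daily headlines.
title.text merges the headline text with the <span>number</span> that follows it.
I only want the text part, not the number.
How can I extract just the text?
Thanks.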

Author: leawei (新手上路) 2018-12-26 09:26:00

Try .string, maybe?
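A small sketch of how .string behaves, since that decides whether this suggestion works on the <h1> itself. The markup below is hypothetical (the thread never shows the real page source), and html.parser is used here only to avoid the lxml dependency. In BeautifulSoup, .string is None when a tag has more than one child, so it would only help on a tag that wraps the text alone.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the actual Apple Daily structure is not shown in the thread.
plain = BeautifulSoup('<h1>Plain headline</h1>', 'html.parser').h1
mixed = BeautifulSoup('<h1>Headline<span>(1234)</span></h1>', 'html.parser').h1

plain_string = plain.string  # one child (the text), so .string returns it
mixed_string = mixed.string  # two children (text + <span>), so .string is None
mixed_text = mixed.text      # .text concatenates every descendant string

print(plain_string)
print(mixed_string)
print(mixed_text)
```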

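Author: s860134 (s860134) 2018-12-26 23:32:00

I looked at the structure. There are two routes. The first is to use lxml, which leaves a left parenthesis behind. The second is to brute-force it with a regex (re.sub), since the number pattern only appears at the end. The second route is probably closer to what you want. Ah, the first route actually leaves nothing behind; that parenthesis was just the headline being truncated.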
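A rough sketch of the regex route. It uses re.sub because the re module has no strip function, so that is a guess at what was meant. The sample markup, the helper name strip_trailing_count, and the assumption that the number may be wrapped in parentheses are all made up for illustration.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup; the real page structure is not shown in the thread.
html = '<h1><font>Top 10 things</font><span>(1234)</span></h1>'
soup = BeautifulSoup(html, 'html.parser')

def strip_trailing_count(text):
    """Remove a run of digits at the very end, optionally wrapped in parentheses."""
    return re.sub(r'\s*\(?\d+\)?\s*$', '', text)

raw = soup.h1.text                   # headline and number joined together
cleaned = strip_trailing_count(raw)  # only the trailing number is removed
print(cleaned)
```

One caveat with this route: a headline that genuinely ends in digits would lose them too, so it is only safe if the trailing number is always the counter.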

Author: cody880528 (Summon) 2018-12-26 23:39:00

Add title.span.decompose() right before print(title.text).
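Roughly what that looks like in the loop, as a sketch on made-up markup (the real page source is not in the thread). The None check is an addition: title.span is None when an <h1> has no <span>, and calling decompose() on None would raise AttributeError.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
html = '''
<ul class="rtddd slvl">
  <li><h1><font>First headline</font><span>(1234)</span></h1></li>
  <li><h1><font>Second headline</font></h1></li>
</ul>
'''
soup = BeautifulSoup(html, 'html.parser')

headlines = []
for title in soup.find_all('h1'):
    # Guard: not every <h1> is guaranteed to contain a <span>.
    if title.span is not None:
        title.span.decompose()  # removes the <span> and its contents from the tree
    headlines.append(title.text.strip())

print(headlines)
```

Note that decompose() modifies the parsed tree in place, so the number is gone for any later code that reads the same soup.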

Author: s860134 (s860134) 2018-12-26 23:44:00

title.font.string actually does the job XD
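A sketch of this approach, assuming (as the reply implies) that each headline sits inside a <font> tag directly under the <h1>. The markup is invented for illustration, and the None checks are added so that an <h1> without a <font> is skipped instead of raising.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the <font> wrapper is assumed from the reply above.
html = '''
<ul class="rtddd slvl">
  <li><h1><font>First headline</font><span>(1234)</span></h1></li>
  <li><h1><font>Second headline</font><span>(56)</span></h1></li>
  <li><h1>No font tag here</h1></li>
</ul>
'''
soup = BeautifulSoup(html, 'html.parser')

headlines = [
    str(h1.font.string)  # text of the <font> child only, without the <span>
    for h1 in soup.find_all('h1')
    if h1.font is not None and h1.font.string is not None
]

print(headlines)
```

Unlike decompose(), this leaves the tree untouched, but it depends on the <font> wrapper being present.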
