[Question] Extracting Discovery News headlines

Author: wtuxxj (尋找Miss Right)   2012-07-29 20:53:57
Hi all,
I need to extract the news headlines from Discovery News:
http://news.discovery.com/earth/
Here is part of the page source:
=====================begin=====================
<dl class="asset-items clear clearfix">
<dd class="details">
<h2 class="title"><a
href="http://news.discovery.com/earth/us-fire-map-hot-spots-120727.html"
onclick="componentClickTracking.build(this, {title:'topic
landing',name:'channel module',location:'1',position:'1'}); return
false;">Hottest Spots in America: Big Pic </a> </h2>
<p class="source">Posted Fri Jul 27, 2012 05:57 AM ET
&#160;&#160;|&#160;&#160; <span class="js-kit-comments-count comment-bubble"
id="count-fe5a23fb-12e7-4008-acf3-9cd954fe4ade"
uniq="/fe5a23fb-12e7-4008-acf3-9cd954fe4ade" exclude-sources="Digg,
FriendFeed, Twitter">0</span></p>
<p class="description">Armed with NASA satellite data, a
clever data visualization expert has produced a
hotspots map of all major fires in the contiguous US from 2001 through early
July 2012.
<a rel="nofollow"
href="http://news.discovery.com/earth/us-fire-map-hot-spots-120727.html"
onclick="componentClickTracking.build(this, {title:'topic
landing',name:'channel module',location:'1',position:'3'}); return
false;"><strong> Read&#160;more </strong></a></p>
</dd>
<dd class="thumbnail"><a
href="http://news.discovery.com/earth/us-fire-map-hot-spots-120727.html"
onclick="componentClickTracking.build(this, {title:'topic
landing',name:'channel module',location:'1',position:'4'}); return
false;"><img src="/earth/2012/07/27/firemap-278.jpg" title="fire map"
alt="fire map" class="" /></a></dd>
</dl>
<dl class="asset-items clear clearfix">
<dd class="details">
<h2 class="title"><a
href="http://news.discovery.com/earth/dead-lawn-paint-it-green-dnews-nugget-.html"
onclick="componentClickTracking.build(this, {title:'topic
landing',name:'channel module',location:'2',position:'1'}); return
false;">Dead Lawn? Paint it Green: DNews Nugget</a> </h2>
<p class="source">Posted by &#160;<a
href="/contributors/christina-reed/"
onclick="componentClickTracking.build(this, {title:'topic
landing',name:'channel module',location:'2',position:'2'}); return
false;">Christina Reed</a>&#160; Fri Jul 27, 2012 04:38 AM ET
&#160;&#160;|&#160;&#160; <span class="js-kit-comments-count comment-bubble"
id="count-2d57762b-b52b-4656-ae24-4f853dd4429d"
uniq="/2d57762b-b52b-4656-ae24-4f853dd4429d" exclude-sources="Digg,
FriendFeed, Twitter">0</span></p>
<p class="description">Residents around the country this
summer are calling their local turf and lawn painters for touch-ups to the
front yard or getting into the business themselves. <a rel="nofollow"
href="http://news.discovery.com/earth/dead-lawn-paint-it-green-dnews-nugget-.html"
onclick="componentClickTracking.build(this, {title:'topic
landing',name:'channel module',location:'2',position:'3'}); return
false;"><strong> Read&#160;more </strong></a></p>
</dd>
<dd class="thumbnail"><a
href="http://news.discovery.com/earth/dead-lawn-paint-it-green-dnews-nugget-.html"
onclick="componentClickTracking.build(this, {title:'topic
landing',name:'channel module',location:'2',position:'4'}); return
false;"><img
src="http://blogs.discovery.com/.a/6a00d8341bf67c53ef0168ebeb2dc3970c-800wi"
title="Dead Lawn? Paint it Green: DNews Nugget" alt="Dead Lawn? Paint it
Green: DNews Nugget" class="" /></a></dd>
</dl>
=====================end=======================
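The headlines I need are:
Hottest Spots in America: Big Pic
Dead Lawn? Paint it Green: DNews Nugget
The pattern I noticed is that each headline is followed by </a> </h2>,
with two spaces in between, and is preceded by location and position.
So I used the following expression:
location([\s\S]+)position([\s\S]+)return\sfalse([\S\s]+)</a>\s\s</h2>
But it matches everything at once.
How should I change it?
Thanks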
Author: wtuxxj (尋找Miss Right)   2012-07-29 21:54:00
Answering my own question: position:'[0-9]{1}'}\); return false;\">([^>]+)</a>\s\s</h2>
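
For anyone reading later, here is a rough Python sketch of why the first expression grabs everything and one way to avoid it. It is only an illustration, not the method used above: SAMPLE is a shortened copy of the source quoted in the question, the names GREEDY, TITLE and extract_titles are made up for this example, and \s+ is used instead of \s\s because the snippet as quoted shows a single space between </a> and </h2>. The idea is that [\s\S]+ is greedy and can run across both entries, while [^>]* and [^<]+ cannot cross a tag boundary.

```python
import re

# Shortened copy of the two headline blocks quoted in the question.
SAMPLE = """<h2 class="title"><a
href="http://news.discovery.com/earth/us-fire-map-hot-spots-120727.html"
onclick="componentClickTracking.build(this, {title:'topic
landing',name:'channel module',location:'1',position:'1'}); return
false;">Hottest Spots in America: Big Pic </a> </h2>
<p class="source">Posted Fri Jul 27, 2012 05:57 AM ET</p>
<h2 class="title"><a
href="http://news.discovery.com/earth/dead-lawn-paint-it-green-dnews-nugget-.html"
onclick="componentClickTracking.build(this, {title:'topic
landing',name:'channel module',location:'2',position:'1'}); return
false;">Dead Lawn? Paint it Green: DNews Nugget</a> </h2>
"""

# The expression from the question, with \s+ in place of \s\s (assumption:
# the quoted snippet has one space there). Each [\s\S]+ is greedy, so the
# match starts at the first "location" and runs to the last </h2>.
GREEDY = re.compile(
    r"location([\s\S]+)position([\s\S]+)return\sfalse([\S\s]+)</a>\s+</h2>"
)

# Alternative: anchor on the <h2 class="title"> wrapper and use negated
# character classes, which stop at the next ">" or "<".
TITLE = re.compile(r'<h2 class="title">\s*<a\b[^>]*>([^<]+)</a>\s*</h2>')


def extract_titles(html):
    """Return the headline text of every <h2 class="title"> link."""
    return [title.strip() for title in TITLE.findall(html)]


greedy_hits = GREEDY.findall(SAMPLE)
titles = extract_titles(SAMPLE)

print(len(greedy_hits))  # a single match spanning both entries
for title in titles:
    print(title)
```

For a real page, an HTML parser is usually less fragile than a regex, since the markup can change at any time.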