Using PHP 5's SimpleXML to parse various feeds

Author：Eve Cole Update Time：2009-06-06 18:16:04

Use simplexml to process atom data
Many blogs publish their data as Atom feeds. Atom declares a namespace, so you must supply the namespace URI (uniform resource identifier) together with the local name when requesting elements. In addition, SimpleXML's xpath() method cannot query such a document directly until a prefix has been registered for that namespace.

Starting with PHP 5.1, SimpleXML can run XPath queries directly on documents that use namespaces. As usual, XPath location paths must use a namespace prefix, even if the document being searched uses a default namespace. The registerXPathNamespace() method binds a prefix to the namespace URI for use in subsequent queries.

The following example uses XPath to query the title elements of an Atom document:

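PHP:
$atom = simplexml_load_file('http://www.ooso.net/index.php/feed/atom');
// Bind the prefix "atom" to the Atom namespace URI for later XPath queries
$atom->registerXPathNamespace('atom', 'http://www.w3.org/2005/Atom');
// Select every title element in the Atom namespace (feed title and entry titles)
$titles = $atom->xpath('//atom:title');
foreach ($titles as $title) {
    echo "<h2>" . $title . "</h2>";
}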
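
The point about default namespaces can be checked without network access. The following is a minimal sketch, assuming a small inline Atom document (the sample data and the prefix "a" are made up for illustration): the unprefixed query finds nothing, while the prefixed one finds every title.

```php
<?php
// Hypothetical inline Atom sample so the sketch runs offline.
$xmlString = <<<XML
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry><title>First post</title></entry>
  <entry><title>Second post</title></entry>
</feed>
XML;

$atom = simplexml_load_string($xmlString);

// An unprefixed path looks for <title> in no namespace, so nothing matches.
$unprefixed = $atom->xpath('//title');

// The prefix is arbitrary; only the URI has to match the one in the document.
$atom->registerXPathNamespace('a', 'http://www.w3.org/2005/Atom');
$prefixed = $atom->xpath('//a:title');

echo count($unprefixed), " unprefixed matches\n";
echo count($prefixed), " prefixed matches\n";
foreach ($prefixed as $title) {
    echo $title, "\n";
}
```

The prefix used in the query does not have to be the one the feed itself uses (here the feed uses none at all); it only has to be bound to the same URI.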
Use simplexml to process rss data
WordPress can output RSS 2.0 feeds, which also use several additional namespaces, such as dc. An example of parsing RSS 2.0 with SimpleXML:

PHP:
$ns = array(
    'content' => 'http://purl.org/rss/1.0/modules/content/',
    'wfw'     => 'http://wellformedweb.org/CommentAPI/',
    'dc'      => 'http://purl.org/dc/elements/1.1/'
);

$articles = array();

// step 1: Get feed (simplexml_load_file() accepts URLs as well as local paths)
$blogUrl = 'http://www.ooso.net/index.php/feed/rss2';
$xml = simplexml_load_file($blogUrl);

// step 2: Get channel metadata (these elements all sit under <channel>)
$channel = array();
$channel['title']       = (string) $xml->channel->title;
$channel['link']        = (string) $xml->channel->link;
$channel['description'] = (string) $xml->channel->description;
$channel['pubDate']     = (string) $xml->channel->pubDate;
$channel['timestamp']   = strtotime($channel['pubDate']);
$channel['generator']   = (string) $xml->channel->generator;
$channel['language']    = (string) $xml->channel->language;

// step 3: Get articles
foreach ($xml->channel->item as $item) {
    $article = array();
    // assumption: each article carries the channel metadata from step 2
    $article['channel']     = $channel;
    $article['title']       = (string) $item->title;
    $article['link']        = (string) $item->link;
    $article['comments']    = (string) $item->comments;
    $article['pubDate']     = (string) $item->pubDate;
    $article['timestamp']   = strtotime($article['pubDate']);
    $article['description'] = trim((string) $item->description);
    $article['isPermaLink'] = (string) $item->guid['isPermaLink'];

    // get data held in namespaces
    $content = $item->children($ns['content']);
    $dc      = $item->children($ns['dc']);
    $wfw     = $item->children($ns['wfw']);

    $article['creator'] = (string) $dc->creator;
    $article['subject'] = array();
    foreach ($dc->subject as $subject) {
        $article['subject'][] = (string) $subject;
    }

    $article['content']    = trim((string) $content->encoded);
    $article['commentRss'] = (string) $wfw->commentRss;

    // add this article to the list, keyed by its publication timestamp
    $articles[$article['timestamp']] = $article;
}
This example uses the children() method to access the elements that belong to a given namespace:

PHP:
$dc = $item->children($ns['dc']);
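
To see what children() changes, here is a minimal sketch that assumes a small inline RSS item (the sample data is invented for illustration). It shows that plain property access does not reach dc:creator, and that children() can select the namespace either by its URI or, with the second argument set to true, by the prefix used in the document.

```php
<?php
// Hypothetical inline RSS sample so the sketch runs offline.
$rss = <<<XML
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <item>
      <title>Hello</title>
      <dc:creator>Alice</dc:creator>
      <dc:subject>php</dc:subject>
      <dc:subject>xml</dc:subject>
    </item>
  </channel>
</rss>
XML;

$xml  = simplexml_load_string($rss);
$item = $xml->channel->item[0];

// Plain property access does not see the prefixed dc:creator element.
$visibleWithoutNamespace = isset($item->creator);

// Select the namespace by URI ...
$dcByUri = $item->children('http://purl.org/dc/elements/1.1/');
// ... or by the prefix declared in the document (second argument true).
$dcByPrefix = $item->children('dc', true);

$creator  = (string) $dcByUri->creator;
$subjects = array();
foreach ($dcByPrefix->subject as $subject) {
    $subjects[] = (string) $subject;
}

var_dump($visibleWithoutNamespace);
echo $creator, ': ', implode(', ', $subjects), "\n";
```

Selecting by URI is usually the safer choice for feeds you do not control, since a publisher may bind a different prefix to the same namespace.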