Using this model, the parser loads the
whole XML document into memory and presents it to the client as a tree.
You can navigate the nodes of this tree and extract relevant
information.
Listing 1 shows Objective-C code that first fetches the RSS XML document from a URL, puts it into a string that the libxml2 library can work with, and then uses libxml2's functions to navigate the parsed tree and extract the relevant information.
Example 1. DOM XML Parsing.
 1 #include <libxml/xmlmemory.h>
 2 #include <libxml/parser.h>
 3
 4 -(void)fetchAbsconders{
 5   NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];
 6   NSError *err = nil;
 7   NSURL * url = [NSURL URLWithString:feedURL];
 8   NSString *URLContents = [NSString stringWithContentsOfURL:url
 9                            encoding:NSUTF8StringEncoding error:&err];
10   if(!URLContents)
11     return;
12   const char *XMLChars =
13     [URLContents cStringUsingEncoding:NSUTF8StringEncoding];
14
15   if(parser == XML_PARSER_DOM){
16     xmlDocPtr doc = xmlParseMemory(XMLChars, strlen(XMLChars));
17     xmlNodePtr cur;
18     if (doc == NULL ) {
19       return;
20     }
21     cur = xmlDocGetRootElement(doc);
22     cur = findNextItem(cur);
23     while (cur){
24       XOAbsconder *absconder = getitem(doc, cur);
25       if(absconder){
26         [absconders addObject:absconder];
27       }
28       cur = findNextItem(cur->next);
29     }
30     xmlFreeDoc(doc);
31   }
On line 7, we create an NSURL object from feedURL, the string holding the address of the RSS feed. The statement on lines 8 and 9 uses the NSString class method stringWithContentsOfURL:encoding:error: to create a string containing the contents of the URL. The method fetches the RSS feed file from the server and stores it in the NSString instance URLContents.
On line 10, we check whether the string was successfully created. If it was not, the fetchAbsconders method returns without changing the absconders array. In production code, you would use the error object to propagate the error to the client.
Once we have an NSString object with the contents of the RSS feed file, we need to convert it to a C-string (char*), the format that libxml2 works with. The statement on lines 12 and 13 does just that. We use the NSString instance method cStringUsingEncoding: with the encoding NSUTF8StringEncoding.
The fetchAbsconders method demonstrates the use of two XML parsing schemes. Listing 1 shows the first half of this method, which covers DOM parsing.
To work with any XML document using DOM, you first
need to load it into memory in the form of a tree. The function to
achieve that is xmlParseMemory(). The function is declared in parser.h as:
xmlDocPtr xmlParseMemory (const char * buffer, int size)
It takes the XML document, represented by a
C-string, and the size of this string as input. It returns a pointer to
the tree representation of the parsed document in the form of xmlDocPtr (a pointer to xmlDoc).
The xmlDoc is a structure defined in tree.h. The following shows the first few lines of this structure.
struct _xmlDoc {
    void *_private;            /* application data */
    xmlElementType type;       /* XML_DOCUMENT_NODE */
    char *name;                /* name/filename/URI of the document */
    struct _xmlNode *children; /* the document tree */
    struct _xmlNode *last;     /* last child link */
    struct _xmlNode *parent;   /* child->parent link */
    ...
};
Now that we have a tree representation of the XML
document, we can start traversing it. Line 21
obtains the root node using the function xmlDocGetRootElement(). The function returns an xmlNodePtr, a pointer to the xmlNode that represents the root element.
Every node is represented by the xmlNode structure defined in tree.h as follows:
typedef struct _xmlNode xmlNode;
typedef xmlNode *xmlNodePtr;
struct _xmlNode {
    void *_private;              /* application data */
    xmlElementType type;         /* type number */
    const xmlChar *name;         /* name of the node, or entity */
    struct _xmlNode *children;   /* parent->children link */
    struct _xmlNode *last;       /* last child link */
    struct _xmlNode *parent;     /* child->parent link */
    struct _xmlNode *next;       /* next sibling link */
    struct _xmlNode *prev;       /* previous sibling link */
    struct _xmlDoc *doc;         /* the containing document */
    /* End of common part */
    xmlNs *ns;                   /* pointer to the associated namespace */
    xmlChar *content;            /* the content */
    struct _xmlAttr *properties; /* properties list */
    xmlNs *nsDef;                /* namespace definitions on this node */
    void *psvi;                  /* for type/PSVI informations */
    unsigned short line;         /* line number */
    unsigned short extra;        /* extra data for XPath/XSLT */
};
Most of these fields are self-explanatory. You will
be dealing mostly with the fields that link to other nodes. From a
given node, you can reach its parent through the parent field and its first child through the children field. To visit its siblings (i.e., the nodes that share its parent), follow the next field forward or the prev field backward.
Figure 3 shows a graphical representation of the navigational links available for various nodes in the document tree.
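The navigational links can also be tried out in isolation. The following is a minimal sketch in plain C that does not use libxml2 at all: the Node structure is a hypothetical stand-in that copies only the link fields of xmlNode, and append_child(), count_children(), and depth() are illustrative helpers, not libxml2 functions.

```c
#include <stddef.h>

/* Hypothetical stand-in for the link fields of libxml2's xmlNode. */
typedef struct Node {
    const char *name;
    struct Node *parent;   /* child->parent link */
    struct Node *children; /* first child */
    struct Node *last;     /* last child */
    struct Node *next;     /* next sibling */
    struct Node *prev;     /* previous sibling */
} Node;

/* Append child as the last child of parent, updating every link. */
void append_child(Node *parent, Node *child) {
    child->parent = parent;
    child->next = NULL;
    child->prev = parent->last;
    if (parent->last != NULL)
        parent->last->next = child;
    else
        parent->children = child;
    parent->last = child;
}

/* Count direct children: start at children, then follow next. */
int count_children(const Node *node) {
    int n = 0;
    for (const Node *c = node->children; c != NULL; c = c->next)
        n++;
    return n;
}

/* Count the parent links between a node and the root. */
int depth(const Node *node) {
    int d = 0;
    for (const Node *p = node->parent; p != NULL; p = p->parent)
        d++;
    return d;
}
```

With an rss node that contains a channel node, which in turn contains title and item nodes, count_children() on the channel yields 2 and depth() on the item yields 2. The same loops apply to a real xmlNodePtr, since the field names match.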
Now that we have a pointer to the root of the document, we search for the first item in the RSS feed. This is shown in the statement on line 22: cur = findNextItem(cur). The function findNextItem() is defined in Listing 2.
Example 2. Searching for an item element in the RSS feed.
xmlNodePtr findNextItem(xmlNodePtr curr){
  if(!curr)
    return curr;
  if ((!xmlStrcmp(curr->name, (const xmlChar *)"item")) &&
      (curr->type == XML_ELEMENT_NODE)) {
    return curr;
  }
  if(curr->type == XML_TEXT_NODE){
    return findNextItem(curr->next);
  }
  if(curr->type == XML_ELEMENT_NODE){
    if ((!xmlStrcmp(curr->name, (const xmlChar *)"channel")) ||
        (!xmlStrcmp(curr->name, (const xmlChar *)"rss"))){
      return findNextItem(curr->xmlChildrenNode);
    }
  }
  if(curr->type == XML_ELEMENT_NODE){
    if((!xmlStrcmp(curr->name, (const xmlChar *)"title")) ||
       (!xmlStrcmp(curr->name, (const xmlChar *)"link")) ||
       (!xmlStrcmp(curr->name, (const xmlChar *)"description")) ||
       (!xmlStrcmp(curr->name, (const xmlChar *)"image"))){
      return findNextItem(curr->next);
    }
  }
  return NULL;
}
The function makes recursive calls to itself as long as the item tag has not been found. At the beginning, we check for the termination condition. We use the xmlStrcmp() function to see if the node's name is "item".
If yes, we return the pointer to that node. The rest of the code has
similar logic. The only difference is that, when we are interested in a
given subtree, we use the xmlChildrenNode link to traverse
that subtree. If we are not interested in the node, we skip the subtree
altogether and go to the next sibling using the next link.
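The descend-or-skip idea can be sketched without libxml2. The version below is an iterative variant written against a hypothetical stand-in Node structure (the NodeType enum, find_next_item(), and count_items() are all illustrative names). It differs from findNextItem() in one deliberate way: it skips any element it does not recognize instead of giving up, which is an assumption about how a feed with extra channel elements might be handled, not the behavior of the listing.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the xmlNode fields the search needs. */
typedef enum { ELEMENT_NODE, TEXT_NODE } NodeType;

typedef struct Node {
    NodeType type;
    const char *name;
    struct Node *children; /* first child */
    struct Node *next;     /* next sibling */
} Node;

/* Return the first "item" element reachable from cur, or NULL. */
const Node *find_next_item(const Node *cur) {
    while (cur != NULL) {
        if (cur->type == ELEMENT_NODE) {
            if (strcmp(cur->name, "item") == 0)
                return cur;               /* termination condition */
            if (strcmp(cur->name, "rss") == 0 ||
                strcmp(cur->name, "channel") == 0) {
                cur = cur->children;      /* interesting subtree: descend */
                continue;
            }
        }
        cur = cur->next;                  /* anything else: skip to sibling */
    }
    return NULL;
}

/* Same loop shape as the DOM branch of fetchAbsconders. */
int count_items(const Node *root) {
    int n = 0;
    for (const Node *cur = find_next_item(root); cur != NULL;
         cur = find_next_item(cur->next))
        n++;
    return n;
}
```

Because the search resumes from cur->next after each hit, every item under the channel is visited exactly once, in document order.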
Now that we have a pointer to an item element node,
we retrieve the three element children and build an Objective-C object
from the data. The function getitem() is where such logic is found. The function is called as follows:
XOAbsconder *absconder = getitem(doc, cur);
getitem() takes the document and node pointers and returns either the XOAbsconder object or nil. Listing 3 presents the implementation of the getitem() function.
Example 3. Building an XOAbsconder object from an item element.
XOAbsconder* getitem (xmlDocPtr doc, xmlNodePtr curr){
  /* Start at NULL so that every error path frees only what was allocated. */
  xmlChar *name = NULL, *link = NULL, *description = NULL;
  curr = curr->xmlChildrenNode;
  if(!curr)
    return nil;
  while (curr && (curr->type == XML_TEXT_NODE))
    curr = curr->next;
  if(!curr)
    return nil;
  if ((!xmlStrcmp(curr->name, (const xmlChar *)"title")) &&
      (curr->type == XML_ELEMENT_NODE)) {
    name = xmlNodeListGetString(doc, curr->xmlChildrenNode, 1);
    curr = curr->next;
    while (curr && (curr->type == XML_TEXT_NODE))
      curr = curr->next;
    if(!curr){
      xmlFree(name);
      return nil;
    }
  }
  else
    return nil;
  if ((!xmlStrcmp(curr->name, (const xmlChar *)"link")) &&
      (curr->type == XML_ELEMENT_NODE)) {
    link = xmlNodeListGetString(doc, curr->xmlChildrenNode, 1);
    curr = curr->next;
    while (curr && (curr->type == XML_TEXT_NODE))
      curr = curr->next;
    if(!curr){
      xmlFree(name);
      xmlFree(link);
      return nil;
    }
  }
  else{
    xmlFree(name);  /* name was already allocated above */
    return nil;
  }
  if ((!xmlStrcmp(curr->name, (const xmlChar *)"description")) &&
      (curr->type == XML_ELEMENT_NODE)) {
    description = xmlNodeListGetString(doc, curr->xmlChildrenNode, 1);
  }
  else{
    xmlFree(name);
    xmlFree(link);  /* description has not been allocated yet */
    return nil;
  }
  /* An empty element yields NULL, which NSString cannot accept. */
  if(!name || !link || !description){
    xmlFree(name);
    xmlFree(link);
    xmlFree(description);
    return nil;
  }
  /* libxml2 hands back UTF-8, so build the strings as UTF-8. */
  XOAbsconder *absconder = [[XOAbsconder alloc]
      initWithName:[NSString stringWithUTF8String:(const char *)name]
            andURL:[NSString stringWithUTF8String:(const char *)link]
    andDescription:[NSString stringWithUTF8String:(const char *)description]];
  [absconder autorelease];
  xmlFree(name);
  xmlFree(link);
  xmlFree(description);
  return absconder;
}
We traverse the children of the item node. Because the
parser keeps the whitespace between elements as text nodes, which appear as ordinary children, we skip those at
the beginning:
while (curr && (curr->type == XML_TEXT_NODE))
curr = curr->next;
Once we have skipped the text nodes, we check for the three elements: title, link, and description. The function requires that they appear in that order.
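The skip-then-check pattern repeats three times in getitem(), so it can help to see it factored out. The following is a sketch only, written in plain C against a hypothetical stand-in Node structure instead of xmlNode; skip_text() and has_item_fields() are illustrative names that do not exist in libxml2 or in the sample application.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the xmlNode fields the check needs. */
typedef enum { ELEMENT_NODE, TEXT_NODE } NodeType;

typedef struct Node {
    NodeType type;
    const char *name;
    struct Node *next; /* next sibling */
} Node;

/* Advance past text nodes; return the first non-text node or NULL. */
const Node *skip_text(const Node *cur) {
    while (cur != NULL && cur->type == TEXT_NODE)
        cur = cur->next;
    return cur;
}

/* Return 1 if the sibling list that starts at first contains the
   elements title, link, and description in that order, with any
   number of text nodes around them; return 0 otherwise. */
int has_item_fields(const Node *first) {
    static const char *const expected[] = {"title", "link", "description"};
    const Node *cur = first;
    for (size_t i = 0; i < 3; i++) {
        cur = skip_text(cur);
        if (cur == NULL || cur->type != ELEMENT_NODE ||
            strcmp(cur->name, expected[i]) != 0)
            return 0;
        cur = cur->next;
    }
    return 1;
}
```

A table of expected names keeps the ordering requirement in one place. Relaxing it, for example to accept the three elements in any order, would be a change to this loop only.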
To retrieve the text value for each of these three elements, we can use the xmlNodeListGetString() function. The function is declared in tree.h as:
xmlChar *xmlNodeListGetString (xmlDocPtr doc, xmlNodePtr list, int inLine)
It builds a string from the text and entity-reference nodes in the node list. If inLine
is 1, entity references are replaced by their contents. The function returns a newly
allocated string (or NULL when the list is empty), and the caller is responsible for
freeing it using the xmlFree() function.
After retrieving the text of the three elements, we create the XOAbsconder, autorelease it, free the memory of the three strings, and return the XOAbsconder object.
Back in Listing 1, the fetchAbsconders method keeps calling the getitem() function and adding the resulting objects to the absconders array in the statement:
[absconders addObject:absconder];
When the fetchAbsconders method is finished, the absconders array contains the absconder objects created and populated from the RSS feed document.