iPhone SDK 3 Programming : XML Processing - Document Object Model (DOM)

11/13/2013 6:42:31 PM

Using this model, the parser loads the whole XML document into memory and presents it to the client as a tree. You can navigate the nodes of this tree and extract the relevant information.

Listing 1 shows Objective-C code that first fetches the RSS XML document from a URL, puts it into a string that the libxml2 library can work with, and then uses libxml2's functions to navigate the parsed tree and extract the relevant information.

Figure 1. Adding HEADER_SEARCH_PATHS = /usr/include/libxml2 to the project.

Figure 2. Adding libxml2 library to the target.

Listing 1. DOM XML Parsing.

1 #include <libxml/xmlmemory.h>
2 #include <libxml/parser.h>
3
4 -(void)fetchAbsconders{
5   NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];
6   NSError *err = nil;
7   NSURL * url = [NSURL URLWithString:feedURL];
8   NSString *URLContents = [NSString stringWithContentsOfURL:url
9                               encoding:NSUTF8StringEncoding error:&err];
10   if(!URLContents)
11     return;
12   const char *XMLChars =
13   [URLContents cStringUsingEncoding:NSUTF8StringEncoding];
14
15   if(parser == XML_PARSER_DOM){
16     xmlDocPtr doc = xmlParseMemory(XMLChars, strlen(XMLChars));
17     xmlNodePtr cur;
18     if (doc == NULL ) {
19       return;
20     }
21     cur = xmlDocGetRootElement(doc);
22     cur = findNextItem(cur);
23     while (cur){
24       XOAbsconder *absconder =  getitem(doc, cur);
25       if(absconder){
26         [absconders addObject:absconder];
27       }
28       cur = findNextItem(cur->next);
29     }
30     xmlFreeDoc(doc);
31   }

On line 7, we create an NSURL object from feedURL, the string that holds the address of the RSS feed. The statement on lines 8 and 9 uses the NSString class method stringWithContentsOfURL:encoding:error: to fetch the RSS feed file from the server and store its contents in the NSString instance URLContents.

On line 10, we check whether the string was successfully created. If it was not, the fetchAbsconders method returns without changing the absconders array. In production code, you would use the error object to propagate the error to the client.

Once we have an NSString object with the contents of the RSS feed file, we need to convert it to a C-string (char*), the format that libxml2 works with. The statement on lines 12 and 13 does just that. We use the NSString instance method cStringUsingEncoding: with the encoding NSUTF8StringEncoding.

The fetchAbsconders method demonstrates the use of two XML parsing schemes. Listing 1 shows the first half of this method, which covers DOM parsing.

To work with any XML document using DOM, you first need to load it into memory in the form of a tree. The function to achieve that is xmlParseMemory(). The function is declared in parser.h as:

xmlDocPtr xmlParseMemory(const char *buffer, int size);

It takes the XML document, represented by a C-string, and the size of this string as input. It returns a pointer to the tree representation of the parsed document in the form of xmlDocPtr (a pointer to xmlDoc).
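The size argument is a count of bytes in the buffer, which is why Listing 1 can pass strlen(XMLChars) even when the feed contains non-ASCII text. The following is a minimal sketch in plain C, with no libxml2 calls. Both xml_buffer_size() and utf8_char_count() are hypothetical helpers introduced here only to contrast bytes with characters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of libxml2): the value one would pass
   as the size argument, i.e. the number of bytes before the NUL. */
int xml_buffer_size(const char *xml) {
    assert(xml != NULL);
    return (int)strlen(xml);
}

/* Hypothetical helper: counts UTF-8 code points by skipping
   continuation bytes, which always have the bit pattern 10xxxxxx. */
size_t utf8_char_count(const char *s) {
    size_t count = 0;
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For a title such as "café" encoded in UTF-8, the byte count is one larger than the character count because "é" occupies two bytes. Passing the character count instead would truncate the document by one byte.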

xmlDoc is a structure defined in tree.h. The following shows its first few fields.

struct  _xmlDoc {
    void            *_private; /* application data */ 
    xmlElementType  type;       /* XML_DOCUMENT_NODE */ 
    char   *name;  /* name/filename/URI of the document */ 
    struct  _xmlNode *children;  /* the document tree */ 
    struct  _xmlNode *last;  /* last child link */ 
    struct  _xmlNode *parent;  /* child->parent link */ 
...
};

Now that we have a tree representation of the XML document, we can start traversing it. Line 21 obtains the root node using the function xmlDocGetRootElement(), which returns an xmlNodePtr, a pointer to the root xmlNode.

Every node is represented by the xmlNode structure defined in tree.h as follows:

typedef struct  _xmlNode xmlNode;
typedef  xmlNode *xmlNodePtr;

struct  _xmlNode {
    void            *_private; /* application data */ 
    xmlElementType   type;/* type number*/ 
    const  xmlChar   *name; /* name of the node, or entity */ 
    struct  _xmlNode *children;/* parent->children link */ 
    struct  _xmlNode *last;  /* last child link */ 
    struct  _xmlNode *parent;/* child->parent link */ 
    struct  _xmlNode *next;/* next sibling link  */ 
    struct  _xmlNode *prev;  /* previous sibling link  */ 
    struct  _xmlDoc  *doc; /* the containing document */ 

    /* End of common part */ 
    xmlNs   *ns; /* pointer to the associated namespace */ 
    xmlChar  *content;   /* the content */ 
    struct  _xmlAttr *properties;/* properties list */ 
    xmlNs   *nsDef; /* namespace definitions on this node */ 
    void    *psvi; /* for type/PSVI informations */
    unsigned short    line;  /* line number */ 
    unsigned short    extra; /* extra data for XPath/XSLT */ 
};

Most of these fields are self-explanatory. You will be dealing mostly with the fields that link to other nodes. From a given node, you reach its parent through the parent field and its first child through children. To visit its siblings (i.e., the nodes that share its parent), follow the next field forward or the prev field backward.
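To make the link fields concrete, here is a minimal sketch in plain C. The Node type is a simplified stand-in that borrows the field names of xmlNode; it is not the libxml2 structure. The functions node_append_child(), node_child_count(), and node_depth() are hypothetical helpers written for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for xmlNode: only the link fields, same names. */
typedef struct Node {
    const char  *name;
    struct Node *children;  /* first child */
    struct Node *last;      /* last child */
    struct Node *parent;    /* child->parent link */
    struct Node *next;      /* next sibling */
    struct Node *prev;      /* previous sibling */
} Node;

/* Hypothetical helper: appends child to parent, keeping links consistent. */
void node_append_child(Node *parent, Node *child) {
    assert(parent != NULL && child != NULL);
    child->parent = parent;
    child->next = NULL;
    child->prev = parent->last;
    if (parent->last != NULL)
        parent->last->next = child;
    else
        parent->children = child;
    parent->last = child;
}

/* Counts direct children: start at children, then follow next. */
int node_child_count(const Node *node) {
    const Node *c;
    int count = 0;
    assert(node != NULL);
    for (c = node->children; c != NULL; c = c->next)
        count++;
    return count;
}

/* Depth of a node: the number of parent links up to the root. */
int node_depth(const Node *node) {
    const Node *p;
    int depth = 0;
    assert(node != NULL);
    for (p = node->parent; p != NULL; p = p->parent)
        depth++;
    return depth;
}
```

The same two loops, one over next and one over parent, are the basic moves for walking a libxml2 tree, where the real nodes also carry a type and content.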

Figure 3 shows a graphical representation of the navigational links available for various nodes in the document tree.

Figure 3. Representation of the navigational links available for various nodes in the document tree.

Now that we have a pointer to the root of the document, we search for the first item in the RSS feed. This is shown in the statement on line 22: cur = findNextItem(cur). The function findNextItem() is defined in Listing 2.

Listing 2. Searching for an item element in the RSS feed.

xmlNodePtr findNextItem(xmlNodePtr curr){
  if(!curr)
    return curr;
  if ((!xmlStrcmp(curr->name, (const xmlChar *)"item")) &&
         (curr->type == XML_ELEMENT_NODE)) {
    return curr;
  }
  if(curr->type == XML_TEXT_NODE){
    return findNextItem(curr->next);
  }
  if(curr->type == XML_ELEMENT_NODE){
    if ((!xmlStrcmp(curr->name, (const xmlChar *)"channel"))
       || (!xmlStrcmp(curr->name, (const xmlChar *)"rss"))){
          return  findNextItem(curr->xmlChildrenNode);
    }
  }
  if(curr->type == XML_ELEMENT_NODE){
    if((!xmlStrcmp(curr->name, (const xmlChar *)"title"))
      || (!xmlStrcmp(curr->name, (const xmlChar *)"link"))
      || (!xmlStrcmp(curr->name,(const xmlChar *)"description"))
      || (!xmlStrcmp(curr->name, (const xmlChar *)"image"))){
         return  findNextItem(curr->next);
    }
  }
  return NULL;
}

The function calls itself recursively until an item element is found. At the beginning, we check for the termination condition: we use the xmlStrcmp() function to see whether the node is an element named "item" and, if so, return the pointer to that node. The rest of the code has similar logic. When we are interested in a given subtree (rss or channel), we descend into it using the xmlChildrenNode link, which tree.h defines as an alias for the children field. When we are not interested in a node (a text node, or a title, link, description, or image element), we skip its subtree altogether and go to the next sibling using the next link. Any other node ends the search, and the function returns NULL.
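The descend-or-skip decision can also be written as a loop. The sketch below is an iterative variant in plain C, not the listing's code: Node and NodeType are simplified stand-ins for the libxml2 types, and is_element_named() and find_next_item() are hypothetical names. Unlike Listing 2, it assumes that every node other than rss, channel, and item should be skipped, so an unrecognized element does not end the search.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { NODE_ELEMENT, NODE_TEXT } NodeType;

/* Simplified stand-in for xmlNode with only the fields the search needs. */
typedef struct Node {
    NodeType     type;
    const char  *name;      /* may be NULL for text nodes */
    struct Node *children;  /* first child */
    struct Node *next;      /* next sibling */
} Node;

/* Nonzero when node is an element with the given name. */
int is_element_named(const Node *node, const char *name) {
    assert(node != NULL && name != NULL);
    return node->type == NODE_ELEMENT && strcmp(node->name, name) == 0;
}

/* Iterative search: descend into rss and channel, move past every
   other node, and stop at the first item element. */
Node *find_next_item(Node *curr) {
    while (curr != NULL) {
        if (is_element_named(curr, "item"))
            return curr;
        if (is_element_named(curr, "rss") ||
            is_element_named(curr, "channel"))
            curr = curr->children;  /* interesting subtree: go down */
        else
            curr = curr->next;      /* skip this node and its subtree */
    }
    return NULL;
}
```

As in Listing 1, the caller resumes the search after a match by passing the matched node's next sibling back to the function.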

Now that we have a pointer to an item element node, we retrieve the three element children and build an Objective-C object from the data. The function getitem() is where such logic is found. The function is called as follows:

XOAbsconder *absconder = getitem(doc, cur);

getitem() takes the document and node pointers and returns either the XOAbsconder object or nil. Listing 3 presents the implementation of the getitem() function.

Listing 3. Building an XOAbsconder object from an item element.

XOAbsconder* getitem (xmlDocPtr doc, xmlNodePtr curr){
  /* Start at NULL so every exit path frees only what was allocated. */
  xmlChar *name = NULL, *link = NULL, *description = NULL;
  curr = curr->xmlChildrenNode;
  if(!curr)
    return nil;
  /* Skip whitespace text nodes before the first child element. */
  while (curr && (curr->type == XML_TEXT_NODE))
    curr = curr->next;
  if(!curr)
    return nil;
  if ((!xmlStrcmp(curr->name,(const xmlChar *)"title")) &&
       (curr->type == XML_ELEMENT_NODE)) {
    name = xmlNodeListGetString(doc, curr->xmlChildrenNode, 1);
    curr = curr->next;
    while (curr && (curr->type == XML_TEXT_NODE))
      curr = curr->next;
    if(!curr){
      xmlFree(name);
      return nil;
    }
  }
  else
    return nil;
  if ((!xmlStrcmp(curr->name, (const xmlChar *)"link")) &&
       (curr->type == XML_ELEMENT_NODE)) {
    link = xmlNodeListGetString(doc, curr->xmlChildrenNode, 1);
    curr = curr->next;
    while (curr && (curr->type == XML_TEXT_NODE))
      curr = curr->next;
    if(!curr){
      xmlFree(name);
      xmlFree(link);
      return nil;
    }
  }
  else{
    /* The title string is already allocated; release it before returning. */
    xmlFree(name);
    return nil;
  }
  if ((!xmlStrcmp(curr->name, (const xmlChar *)"description")) &&
       (curr->type == XML_ELEMENT_NODE)) {
    description = xmlNodeListGetString(doc, curr->xmlChildrenNode, 1);
  }
  else{
    /* description was never assigned on this path. */
    xmlFree(name);
    xmlFree(link);
    return nil;
  }
  /* An empty element yields NULL, which stringWithUTF8String: must not receive. */
  if(!name || !link || !description){
    xmlFree(name);
    xmlFree(link);
    xmlFree(description);
    return nil;
  }
  /* libxml2 returns UTF-8 text, so build the NSString objects from UTF-8. */
  XOAbsconder *absconder = [[XOAbsconder alloc]
    initWithName:[NSString stringWithUTF8String:(const char *)name]
    andURL:[NSString stringWithUTF8String:(const char *)link]
    andDescription:[NSString stringWithUTF8String:(const char *)description]];
  [absconder autorelease];
  xmlFree(name);
  xmlFree(link);
  xmlFree(description);
  return absconder;
}

We traverse the children of the item node. Because libxml2 represents the whitespace between elements as text nodes, we skip those at the beginning:

while (curr && (curr->type == XML_TEXT_NODE))
    curr = curr->next;

Once we have skipped the text nodes, we check for the three elements: title, link, and description. The function requires that they appear in that order.

To retrieve the text value for each of these three elements, we can use the xmlNodeListGetString() function. The function is declared in tree.h as:

xmlChar *xmlNodeListGetString(xmlDocPtr doc, xmlNodePtr list, int inLine);

It constructs a string from the node list. If inLine is 1, the entity contents are replaced. The function returns the string and the caller is responsible for freeing the memory of the string using the xmlFree() function.

After retrieving the text of the three elements, we create the XOAbsconder, autorelease it, free the memory of the three strings, and return the XOAbsconder object.

Back in Listing 1, the fetchAbsconders method keeps calling the getitem() function and adding the resulting objects to the absconders array in the statement:

[absconders addObject:absconder];

When the fetchAbsconders method is finished, the absconders array contains the absconder objects created and populated from the RSS feed document.
