In some applications, an XML document is too large to load
in its entirety into the device's limited memory.
The Simple API for XML (SAX) is another XML parsing model that is
different from DOM. In SAX, you configure the parser with callback
functions. The SAX parser will use these function pointers to call your
functions, informing you of important events. For example, if you are
interested in the event Start of Document, you set up a function for this event and give the parser a pointer to it.
Listing 1 shows the remainder of the fetchAbsconders method pertaining to SAX parsing.
Example 1. SAX XML Parsing. Remainder of fetchAbsconders method.
  else if(parser == XML_PARSER_SAX){
    xmlParserCtxtPtr ctxt = xmlCreateDocParserCtxt(XMLChars);
    int parseResult = xmlSAXUserParseMemory(&rssSAXHandler, self, XMLChars, strlen(XMLChars));
    xmlFreeParserCtxt(ctxt);
    xmlCleanupParser();
  }
  [pool release];
}
To use SAX in libxml2, you first set up a parser context using the function xmlCreateDocParserCtxt(),
which takes a single parameter: the XML document represented as a
C-string. After that, you start the SAX parser by calling the xmlSAXUserParseMemory() function. The function is declared in parser.h as:
int xmlSAXUserParseMemory (xmlSAXHandlerPtr sax, void * user_data,
const char * buffer, int size)
This function parses an in-memory buffer and calls
your registered functions as necessary. The first parameter to this
function is a pointer to the SAX handler. The SAX handler is a
structure holding the pointers to your callback functions. The second
parameter is an optional pointer that is application-specific. The
value specified will be used as the context when the SAX parser calls
your callback functions. The third and fourth parameters are used for
the C-string XML document in memory and its length, respectively.
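To make the role of the second parameter concrete, here is a minimal sketch in plain C. It does not use libxml2: ToyHandler, FeedState, toy_parse(), and count_item_elements() are hypothetical names invented for this illustration, and toy_parse() only recognizes bare opening tags. What it tries to show is the pattern itself: the parser receives an opaque pointer and hands it back, unchanged, to every callback.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a SAX handler: one callback per event. */
typedef struct {
    void (*start_element)(void *ctx, const char *name, size_t name_len);
} ToyHandler;

/* Application state; the parser never looks inside it. */
typedef struct {
    int item_count;
} FeedState;

/* Callback: recover our own state from the opaque context pointer. */
static void count_items(void *ctx, const char *name, size_t name_len) {
    FeedState *state = (FeedState *)ctx;
    if (name_len == 4 && memcmp(name, "item", 4) == 0) {
        state->item_count++;
    }
}

/* Toy parser with the same parameter order as xmlSAXUserParseMemory():
 * handler, user data, buffer, size. It only recognizes opening tags of
 * the form <name> and is not a real XML parser. Returns 0 on success. */
static int toy_parse(const ToyHandler *handler, void *user_data,
                     const char *buffer, int size) {
    assert(handler != NULL && buffer != NULL && size >= 0);
    for (int i = 0; i < size; i++) {
        if (buffer[i] != '<' || i + 1 >= size || buffer[i + 1] == '/') {
            continue; /* not the start of an opening tag */
        }
        int start = i + 1;
        int end = start;
        while (end < size && buffer[end] != '>') {
            end++;
        }
        if (end == size) {
            return -1; /* unterminated tag */
        }
        if (handler->start_element != NULL) {
            handler->start_element(user_data, buffer + start,
                                   (size_t)(end - start));
        }
        i = end;
    }
    return 0;
}

/* Count <item> elements in a NUL-terminated document, or -1 on error. */
static int count_item_elements(const char *xml) {
    ToyHandler handler = { count_items };
    FeedState state = { 0 };
    if (toy_parse(&handler, &state, xml, (int)strlen(xml)) != 0) {
        return -1;
    }
    return state.item_count;
}
```

Because the state travels through the context pointer, the callback needs no global variables, which is the same reason our real callbacks can reach the XORSSFeedNebraska instance.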
The SAX handler is where you store the pointers to
your callback functions. If you are not interested in an event type,
just store a NULL value in its field. The following is the definition of the structure in parser.h:
struct _xmlSAXHandler {
internalSubsetSAXFunc internalSubset;
isStandaloneSAXFunc isStandalone;
hasInternalSubsetSAXFunc hasInternalSubset;
hasExternalSubsetSAXFunc hasExternalSubset;
resolveEntitySAXFunc resolveEntity;
getEntitySAXFunc getEntity;
entityDeclSAXFunc entityDecl;
notationDeclSAXFunc notationDecl;
attributeDeclSAXFunc attributeDecl;
elementDeclSAXFunc elementDecl;
unparsedEntityDeclSAXFunc unparsedEntityDecl;
setDocumentLocatorSAXFunc setDocumentLocator;
startDocumentSAXFunc startDocument;
endDocumentSAXFunc endDocument;
startElementSAXFunc startElement;
endElementSAXFunc endElement;
referenceSAXFunc reference;
charactersSAXFunc characters;
ignorableWhitespaceSAXFunc ignorableWhitespace;
processingInstructionSAXFunc processingInstruction;
commentSAXFunc comment;
warningSAXFunc warning;
errorSAXFunc error;
fatalErrorSAXFunc fatalError;
getParameterEntitySAXFunc getParameterEntity;
cdataBlockSAXFunc cdataBlock;
externalSubsetSAXFunc externalSubset;
unsigned int initialized;
// The following fields are extensions
void * _private;
startElementNsSAX2Func startElementNs;
endElementNsSAX2Func endElementNs;
xmlStructuredErrorFunc serror;
};
Listing 2 shows our SAX handler.
Example 2. Our SAX handler.
static xmlSAXHandler rssSAXHandler ={
    NULL, /* internalSubset */
    NULL, /* isStandalone */
    NULL, /* hasInternalSubset */
    NULL, /* hasExternalSubset */
    NULL, /* resolveEntity */
    NULL, /* getEntity */
    NULL, /* entityDecl */
    NULL, /* notationDecl */
    NULL, /* attributeDecl */
    NULL, /* elementDecl */
    NULL, /* unparsedEntityDecl */
    NULL, /* setDocumentLocator */
    NULL, /* startDocument */
    NULL, /* endDocument */
    NULL, /* startElement */
    NULL, /* endElement */
    NULL, /* reference */
    charactersFoundSAX, /* characters */
    NULL, /* ignorableWhitespace */
    NULL, /* processingInstruction */
    NULL, /* comment */
    NULL, /* warning */
    errorEncounteredSAX, /* error */
    fatalErrorEncounteredSAX, /* fatalError */
    NULL, /* getParameterEntity */
    NULL, /* cdataBlock */
    NULL, /* externalSubset */
    XML_SAX2_MAGIC, /* initialized */
    NULL, /* _private */
    startElementSAX, /* startElementNs */
    endElementSAX, /* endElementNs */
    NULL, /* serror */
};
Aside from the function pointers, the initialized field should be set to the value XML_SAX2_MAGIC in order to indicate that the handler is used for a SAX2 parser. Once you call xmlSAXUserParseMemory(), the SAX parser starts parsing the document and calling your registered callback functions.
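A positional initializer like the one in Listing 2 depends on listing every field in exactly the declared order. If your compiler supports C99 designated initializers, naming the fields is a possible alternative; fields that are not named are zero-initialized, which means NULL for pointers. The sketch below illustrates the idea on a hypothetical cut-down struct. ToyHandler, TOY_MAGIC, and the helper functions are invented here and are not part of libxml2.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical marker value for this sketch only. */
#define TOY_MAGIC 0x5A5A2u

/* Cut-down stand-in for a handler structure. */
typedef struct {
    void (*start_document)(void *ctx);
    void (*end_document)(void *ctx);
    void (*characters)(void *ctx, const char *ch, int len);
    unsigned int initialized;
} ToyHandler;

/* Callback: add the length of each text chunk to an int counter. */
static void add_length(void *ctx, const char *ch, int len) {
    (void)ch; /* the bytes themselves are not needed here */
    *(int *)ctx += len;
}

/* Only the fields we care about are named; start_document and
 * end_document are implicitly NULL. */
static const ToyHandler toy_handler = {
    .characters  = add_length,
    .initialized = TOY_MAGIC,
};

/* Deliver one text chunk, skipping every event whose callback is NULL. */
static int deliver_text(const ToyHandler *h, void *ctx,
                        const char *text, int len) {
    assert(h != NULL);
    if (h->initialized != TOY_MAGIC) {
        return -1; /* handler was not set up as expected */
    }
    if (h->start_document != NULL) h->start_document(ctx);
    if (h->characters != NULL)     h->characters(ctx, text, len);
    if (h->end_document != NULL)   h->end_document(ctx);
    return 0;
}

/* Total number of bytes reported through the characters callback. */
static int total_text_length(const char *text, int len) {
    int total = 0;
    if (deliver_text(&toy_handler, &total, text, len) != 0) {
        return -1;
    }
    return total;
}
```

The trade-off is mostly one of readability: the named form is shorter and survives reordering of fields, while the positional form makes every unused event visible.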
We are mainly interested in three functions: startElementNsSAX2Func(), endElementNsSAX2Func(), and charactersSAXFunc().
startElementNsSAX2Func() is called when the parser encounters the start of a new element. startElementNsSAX2Func() is declared in parser.h as:
void startElementNsSAX2Func (void * ctx, const xmlChar * localname,
const xmlChar * prefix, const xmlChar *URI,
int nb_namespaces,
const xmlChar ** namespaces,
int nb_attributes, int nb_defaulted,
const xmlChar ** attributes)
ctx is the user data: the second argument you passed to xmlSAXUserParseMemory(). In our case, it is a pointer to an instance of the class XORSSFeedNebraska. localname is the local name of the element. prefix is the element's namespace prefix (if available). URI is the element's namespace name (if available). nb_namespaces is the number of namespace definitions on that node. namespaces is a pointer to the array of prefix/URI pairs of namespace definitions. nb_attributes is the number of attributes on that node. nb_defaulted is the number of defaulted attributes; the defaulted ones are at the end of the array. attributes is a pointer to the array of attribute values, with five entries per attribute: localname, prefix, URI, value, and end.
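Our callback ignores the attributes, but the five-entry layout is easy to misread, so a small sketch may help. It is written in plain C without libxml2: toyChar stands in for xmlChar, and copy_attribute_value() and demo_lookup() are hypothetical helpers. The sketch assumes that each value is delimited by its value and end pointers rather than by a NUL terminator, so its length is end minus value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned char toyChar; /* stand-in for xmlChar in this sketch */

/* Copy the value of the attribute whose local name is `wanted` into
 * `out` as a NUL-terminated string. Returns the value length, or -1 if
 * the attribute is missing or `out` is too small. */
static int copy_attribute_value(const toyChar **attributes, int nb_attributes,
                                const char *wanted,
                                char *out, size_t out_size) {
    assert(out != NULL && out_size > 0);
    for (int i = 0; i < nb_attributes; i++) {
        const toyChar **attr = attributes + 5 * i; /* five entries each */
        const toyChar *localname = attr[0];
        const toyChar *value = attr[3];
        const toyChar *end = attr[4];
        if (strcmp((const char *)localname, wanted) != 0) {
            continue;
        }
        size_t len = (size_t)(end - value); /* value is not NUL-terminated */
        if (len >= out_size) {
            return -1;
        }
        memcpy(out, value, len);
        out[len] = '\0';
        return (int)len;
    }
    return -1;
}

/* Build a two-attribute array by hand (href and id) and look one up. */
static int demo_lookup(const char *wanted, char *out, size_t out_size) {
    /* Both values are slices of one larger buffer. */
    static const toyChar text[] = "x.html|7";
    const toyChar *attributes[10] = {
        (const toyChar *)"href", NULL, NULL, text,     text + 6,
        (const toyChar *)"id",   NULL, NULL, text + 7, text + 8,
    };
    return copy_attribute_value(attributes, 2, wanted, out, out_size);
}
```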
Listing 3 shows the definition of our startElementNsSAX2Func() function.
Example 3. The startElementSAX() callback function.
static void startElementSAX(void *ctx,
                            const xmlChar *localname,
                            const xmlChar *prefix,
                            const xmlChar *URI,
                            int nb_namespaces,
                            const xmlChar **namespaces,
                            int nb_attributes,
                            int nb_defaulted,
                            const xmlChar **attributes)
{
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    XORSSFeedNebraska *feedNebraska = (XORSSFeedNebraska*) ctx;
    if (feedNebraska.currentElementContent) {
        [feedNebraska.currentElementContent release];
        feedNebraska.currentElementContent = nil;
    }
    if ((!xmlStrcmp(localname, (const xmlChar *)"item"))) {
        feedNebraska.currAbsconder = [[XOAbsconder alloc] init];
    }
    [pool release];
}
It's good practice to have an autorelease pool per function. We start by casting ctx to a pointer to our class XORSSFeedNebraska. The class and its parent are declared in Listings 4 and 5.
Example 4. The XORSSFeedNebraska class declaration.
#import "XORSSFeed.h"

@interface XORSSFeedNebraska : XORSSFeed {
}
@end
Example 5. The XORSSFeed class declaration.
@class XOAbsconder;

typedef enum {
    XML_PARSER_DOM,
    XML_PARSER_SAX
} XMLParser;

@interface XORSSFeed : NSObject {
    NSString *feedURL;
    NSMutableArray *absconders;
    XMLParser parser;
    NSMutableString *currentElementContent;
    XOAbsconder *currAbsconder;
}
@property(nonatomic, copy) NSString *feedURL;
@property(nonatomic, assign) XMLParser parser;
@property(nonatomic, assign) NSMutableString *currentElementContent;
@property(nonatomic, assign) XOAbsconder *currAbsconder;
-(id)init;
-(id)initWithURL:(NSString*) feedURL;
-(void)fetchAbsconders;
-(NSUInteger)numberOfAbsconders;
-(XOAbsconder*)absconderAtIndex:(NSUInteger) index;
-(void)addAbsconder:(XOAbsconder*)absconder;
@end
The XORSSFeedNebraska object has an instance variable of type NSMutableString called currentElementContent. This variable holds the text value inside an element. It is constructed in our charactersFoundSAX() function and used in the endElementSAX() function. The function startElementSAX() always releases this string and sets the instance variable to nil (if it is not already nil). This ensures that we start with an empty string for holding the text. If the element name is item, we create a new object of the XOAbsconder class. This is a simple class holding the three pieces of information about an individual absconder. Listing 6 shows the declaration of the XOAbsconder class and Listing 7 shows its definition.
Example 6. The XOAbsconder class declaration.
#import <UIKit/UIKit.h>

@interface XOAbsconder : NSObject {
    NSString *name;
    NSString *furtherInfoURL;
    NSString *desc;
}
@property(copy) NSString *name;
@property(copy) NSString *furtherInfoURL;
@property(copy) NSString *desc;
-(id)init;
-(id)initWithName:(NSString*)name andURL:(NSString*)url andDescription:(NSString*)desc;
-(NSString*)description;
@end
Example 7. The XOAbsconder class definition.
#import "XOAbsconder.h"

@implementation XOAbsconder
@synthesize name;
@synthesize furtherInfoURL;
@synthesize desc;

-(id)initWithName:(NSString*)name andURL:(NSString*)url andDescription:(NSString*)description{
    self = [super init];
    if(self){
        self.name = name;
        self.furtherInfoURL = url;
        self.desc = description;
    }
    return self;
}

-(id)init{
    return [self initWithName:@"" andURL:@"" andDescription:@""];
}

-(NSString*)description{
    return [NSString stringWithString:name];
}

-(void)dealloc{
    [name release];
    [furtherInfoURL release];
    [desc release];
    [super dealloc];
}
@end
Our endElementNsSAX2Func() function is called endElementSAX() and is shown in Listing 8.
Example 8. The endElementSAX() function definition.
static void endElementSAX (void *ctx,
                           const xmlChar *localname,
                           const xmlChar *prefix,
                           const xmlChar *URI)
{
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    XORSSFeedNebraska *feedNebraska = (XORSSFeedNebraska*) ctx;
    if ((!xmlStrcmp(localname, (const xmlChar *)"item"))) {
        if(feedNebraska.currAbsconder){
            [feedNebraska addAbsconder:feedNebraska.currAbsconder];
        }
        [feedNebraska.currAbsconder release];
        feedNebraska.currAbsconder = nil;
    }
    else if ((!xmlStrcmp(localname,(const xmlChar *)"title"))) {
        if(feedNebraska.currAbsconder){
            feedNebraska.currAbsconder.name = feedNebraska.currentElementContent;
        }
    }
    else if ((!xmlStrcmp(localname, (const xmlChar *)"link"))) {
        if(feedNebraska.currAbsconder){
            feedNebraska.currAbsconder.furtherInfoURL = feedNebraska.currentElementContent;
        }
    }
    else if ((!xmlStrcmp(localname,(const xmlChar *)"description"))) {
        if(feedNebraska.currAbsconder){
            feedNebraska.currAbsconder.desc = feedNebraska.currentElementContent;
        }
    }
    if (feedNebraska.currentElementContent) {
        [feedNebraska.currentElementContent release];
        feedNebraska.currentElementContent = nil;
    }
    [pool release];
}
The function first checks to see if the element's name is item. If it is, then we add the XOAbsconder object which was constructed by the other callback functions. Otherwise, we check for the three element names: title, link, and description. For each of these elements, we set its respective text value gathered by the charactersSAXFunc() function. For example, the following sets the desc instance variable with the current text value.
feedNebraska.currAbsconder.desc = feedNebraska.currentElementContent;
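These assignments are safe because the properties of XOAbsconder are declared with copy: each field takes its own copy of the text, so the shared buffer can be released and reused for the next element. A rough plain-C analogue of this dispatch, assuming a hypothetical ToyAbsconder record with fixed-size fields (none of these names come from the application or from libxml2), might look like this:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record with one field per element of interest. */
typedef struct {
    char name[64];
    char url[128];
    char desc[256];
} ToyAbsconder;

/* Copy src into dst, truncating if needed; dst is always NUL-terminated. */
static void copy_text(char *dst, size_t dst_size, const char *src) {
    assert(dst != NULL && dst_size > 0);
    size_t n = (src != NULL) ? strlen(src) : 0;
    if (n >= dst_size) {
        n = dst_size - 1;
    }
    if (n > 0) {
        memcpy(dst, src, n);
    }
    dst[n] = '\0';
}

/* Called at the end of an element: store the gathered text in the field
 * that matches the element name. Returns 1 if a field was set, else 0. */
static int store_element_text(ToyAbsconder *rec, const char *element,
                              const char *text) {
    if (rec == NULL) {
        return 0; /* text outside an item is ignored */
    }
    if (strcmp(element, "title") == 0) {
        copy_text(rec->name, sizeof rec->name, text);
        return 1;
    }
    if (strcmp(element, "link") == 0) {
        copy_text(rec->url, sizeof rec->url, text);
        return 1;
    }
    if (strcmp(element, "description") == 0) {
        copy_text(rec->desc, sizeof rec->desc, text);
        return 1;
    }
    return 0; /* an element we do not track */
}
```

The NULL check on rec plays the role of the if(feedNebraska.currAbsconder) tests in Listing 8: title, link, and description elements that appear outside an item are skipped.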
The text of the element is collected by our charactersSAXFunc() callback. The callback type is declared in parser.h as:
void charactersSAXFunc (void * ctx, const xmlChar * ch, int len)
This function is called by the parser to inform you
of newly found characters. In addition to the context, you receive the
string of characters and its length. Between the start of an element
and the end of that element, this function might be called several
times. Your function should take this into account and append the new
text to the current string.
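The append-on-every-call requirement can be sketched in plain C as follows. TextBuffer, text_append(), and text_reset() are hypothetical names used only for this illustration; the sketch assumes, as the declaration above suggests, that each chunk arrives as a pointer plus a length and is not NUL-terminated.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Growable text buffer; data is NUL-terminated whenever it is non-NULL. */
typedef struct {
    char *data;
    size_t length;
    size_t capacity;
} TextBuffer;

/* Append len bytes from ch. Returns 0 on success, -1 if memory runs out. */
static int text_append(TextBuffer *buf, const char *ch, int len) {
    assert(buf != NULL && len >= 0);
    size_t needed = buf->length + (size_t)len + 1; /* +1 for the NUL */
    if (needed > buf->capacity) {
        size_t new_cap = buf->capacity ? buf->capacity : 32;
        while (new_cap < needed) {
            new_cap *= 2;
        }
        char *grown = realloc(buf->data, new_cap);
        if (grown == NULL) {
            return -1; /* the old contents are still intact */
        }
        buf->data = grown;
        buf->capacity = new_cap;
    }
    if (len > 0) {
        memcpy(buf->data + buf->length, ch, (size_t)len);
    }
    buf->length += (size_t)len;
    buf->data[buf->length] = '\0';
    return 0;
}

/* Discard the text, for example when a new element starts. */
static void text_reset(TextBuffer *buf) {
    assert(buf != NULL);
    free(buf->data);
    buf->data = NULL;
    buf->length = 0;
    buf->capacity = 0;
}
```

Two calls such as text_append(&buf, "John ", 5) followed by text_append(&buf, "Doe", 3) leave the buffer holding the whole text of the element, no matter how the parser happened to split it.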
Our charactersFoundSAX() function is shown in Listing 9.
Example 9. The charactersFoundSAX() function definition.
static void charactersFoundSAX(void * ctx, const xmlChar * ch, int len){
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    XORSSFeedNebraska *feedNebraska =(XORSSFeedNebraska*) ctx;
    CFStringRef str = CFStringCreateWithBytes(kCFAllocatorSystemDefault, ch, len, kCFStringEncodingUTF8, false);
    if (str) { /* NULL if the bytes could not be converted */
        if (!feedNebraska.currentElementContent) {
            feedNebraska.currentElementContent = [[NSMutableString alloc] init];
        }
        [feedNebraska.currentElementContent appendString:(NSString *)str];
        CFRelease(str);
    }
    [pool release];
}
The function starts by casting the ctx into a XORSSFeedNebraska
instance. Using this pointer, we can call our Objective-C class. After
that, we create a string from received characters by using the function
CFStringCreateWithBytes(), which is declared as follows:
CFStringRef CFStringCreateWithBytes (
CFAllocatorRef alloc,
const UInt8 *bytes,
CFIndex numBytes,
CFStringEncoding encoding,
Boolean isExternalRepresentation
);
The first parameter is used to specify the memory allocator. kCFAllocatorDefault
selects the current default allocator; our code passes kCFAllocatorSystemDefault,
the default system allocator. The second parameter is the
buffer which contains the characters. The third parameter specifies the
number of bytes. The fourth parameter is the encoding. We use kCFStringEncodingUTF8
for UTF-8 encoding. The fifth parameter is used to specify if the
characters in the buffer are in an external representation format.
Since they are not, we use false.
Once we have the string representation of the characters, we check to see if this is the first time charactersFoundSAX
has been called for the current element. Recall that the parser can
call this function multiple times, supplying the content of a single
element. If it is the first time, we allocate our mutable string. After
that, we append the string that we created from the character buffer to
the mutable string. When the endElementSAX() function is called, we retrieve this string to build our Objective-C object, currAbsconder. When we are finished with the string str, we use the CFRelease() function to release it.
Finally, the error handling functions are shown in Listings 10 and 11. As in all other event functions, what you do for error-handling depends on your application. In our example, we release the currAbsconder object that we are constructing and log the problem.
Example 10. The errorEncounteredSAX() function definition.
static void errorEncounteredSAX (void * ctx, const char * msg, ...){
    XORSSFeedNebraska *feedNebraska = (XORSSFeedNebraska*) ctx;
    if(feedNebraska.currAbsconder){
        [feedNebraska.currAbsconder release];
        feedNebraska.currAbsconder = nil;
    }
    NSLog(@"errorEncountered: %s", msg);
}
Example 11. The fatalErrorEncounteredSAX() function definition.
static void fatalErrorEncounteredSAX (void * ctx, const char * msg, ...){
    XORSSFeedNebraska *feedNebraska = (XORSSFeedNebraska*) ctx;
    if(feedNebraska.currAbsconder){
        [feedNebraska.currAbsconder release];
        feedNebraska.currAbsconder = nil;
    }
    NSLog(@"fatalErrorEncountered: %s", msg);
}
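Both error callbacks receive msg as a printf-style format string followed by its arguments, so logging msg on its own prints the unexpanded format. If you want the full message, one option is to expand it with vsnprintf(). The following is a sketch in plain C under that assumption; ParseState and record_error() are hypothetical names, not part of the application above, and the trailing-newline handling is only a convenience for log output.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-parse state that keeps the most recent error text. */
typedef struct {
    char last_error[256];
    int error_count;
} ParseState;

/* Same shape as the error callbacks above: context, format, varargs. */
static void record_error(void *ctx, const char *msg, ...) {
    ParseState *state = (ParseState *)ctx;
    assert(state != NULL && msg != NULL);

    /* Expand the format string with its arguments; vsnprintf truncates
     * safely if the message is longer than the buffer. */
    va_list args;
    va_start(args, msg);
    vsnprintf(state->last_error, sizeof state->last_error, msg, args);
    va_end(args);

    /* Drop one trailing newline, if present, to keep log lines tidy. */
    size_t n = strlen(state->last_error);
    if (n > 0 && state->last_error[n - 1] == '\n') {
        state->last_error[n - 1] = '\0';
    }
    state->error_count++;
}
```

A call such as record_error(&state, "Opening and ending tag mismatch: %s line %d\n", "item", 12) stores the expanded text in state.last_error, which can then be logged or shown to the user.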