In this section we discuss XML and then talk about RSS as one of its applications.
1. XML
Extensible Markup Language (XML) is a meta-language
specification for exchanging information over the Internet. As a
meta-language, it can be used to define application-specific languages
which are then used to instantiate XML documents that adhere to the
semantics of these languages.
XML owes its power to two properties:
its extensibility, which allows anyone to define new XML elements
its text-based format, which makes your application data usable on any computing system.
To create or use an XML language, you need to identify the elements used in that language. An XML element consists of:
begin-tag
text content
end-tag.
For example, the element person can appear in an XML document as:
<person>
content ...
</person>
Here, <person> is the begin-tag and </person> is the end-tag. The content can itself be composed of text and other elements. For example:
<person>
<name>content of name...</name>
<address>content of address...</address>
</person>
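As a rough illustration of this nesting, the sketch below builds the same person element programmatically and serializes it. It uses Python's standard xml.etree.ElementTree module rather than libxml2, purely to keep the example short; the variable names are chosen for this example only.

```python
import xml.etree.ElementTree as ET

# Build the parent element, then attach two child elements to it.
person = ET.Element("person")
name = ET.SubElement(person, "name")
name.text = "content of name..."
address = ET.SubElement(person, "address")
address.text = "content of address..."

# Serialize the element tree back to XML text.
serialized = ET.tostring(person, encoding="unicode")
print(serialized)
```

The serialized output contains no line breaks because none were added as text content when building the tree.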
If the text content of a given element contains
characters that have a special meaning in XML markup (e.g., "<", ">",
"&", etc.), entities can be used for their representation. For
example, "<" can be represented by the entity reference &lt;.
The fact that an XML document must have exactly one
root element, and that any given element may contain other elements,
allows us to naturally represent the XML document as a tree. For
example, the following XML document can be represented as a tree (see Figure 1).
To work with an XML document, you need to be able to
parse it (e.g., construct a tree representation of the document in
memory as shown in Figure 1). There are several techniques for parsing and we will cover those shortly. libxml2
is an XML parser written in C that is available on, and recommended for
use with, the iPhone OS. As we will see shortly, working with this
library is very easy. You will be able to use a few function calls to
the library in order to construct a tree similar to the one shown in Figure 1.
Remember that whitespace is not ignored in XML. In Figure 1, whitespace appears as TEXT nodes. In libxml2, text nodes are of type XML_TEXT_NODE, while element nodes are of type XML_ELEMENT_NODE.
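To see the effect of whitespace on the tree, the sketch below parses the person document shown earlier and lists the children of the root. It uses Python's standard xml.dom.minidom module, not libxml2; its TEXT_NODE and ELEMENT_NODE constants play a role analogous to XML_TEXT_NODE and XML_ELEMENT_NODE. The describe helper is hypothetical and exists only for this example.

```python
from xml.dom import minidom

DOCUMENT = """<person>
<name>content of name...</name>
<address>content of address...</address>
</person>"""


def describe(node):
    """Label a DOM node as an element or a text node."""
    if node.nodeType == node.ELEMENT_NODE:
        return "ELEMENT " + node.tagName
    if node.nodeType == node.TEXT_NODE:
        return "TEXT " + repr(node.data)
    return "OTHER"


# Parse the document into an in-memory tree and take its root element.
root = minidom.parseString(DOCUMENT).documentElement

# The line breaks between the tags show up as text nodes of their own.
children = [describe(child) for child in root.childNodes]
for line in children:
    print(line)
```

The root has five children here, not two: the line breaks before, between, and after the name and address elements each become a text node.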
Now that we have an understanding of what XML is, let's look at one of its applications: RSS.
2. RSS
Really Simple Syndication (RSS) is an XML language
used for sharing web content. As a content publisher, RSS gives you the
power to inform your readers about new content on your information
channel. RSS allows you, as a content consumer, to target your web
activities towards information that you are actually interested in. For
example, if you are mainly interested in health news, you do not want
to spend a lot of time on cnn.com or msnbc.com looking for health articles. What you want is a way for cnn.com or msnbc.com
to tell you when new health articles become available on their
websites. The news channel can set up an XML instance file, based on
the RSS language, advertising the newest health articles on its
website. You use RSS reader software to subscribe to this XML file. The
reader can refresh the copy of this XML file and present it to you.
This scheme provides you with efficiency and also privacy as the
website does not have to know your email address in order to inform you
of new content. RSS can be thought of as both a push and a pull technology. The producer pushes filtered content that the consumer pulls.
Websites advertise the existence of specific channels using several icons. Figure 2 shows some of these icons. The universal feed icon (bottom icon) is gaining wide acceptance.
Let's illustrate the basics of an RSS feed through
an example. The Nebraska State Patrol provides an RSS feed about
absconded offenders. Individuals can subscribe to this channel to stay
informed. Like everything else on the Internet, an RSS feed has a URL.
For example, the URL for absconded offenders in the state of Nebraska
is: http://www.nsp.state.ne.us/SOR/Abscondedrss.xml. Example 1 shows a sample XML document of this feed.
Example 1. An example of an RSS document.
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Nebraska State Patrol | Absconded Offenders</title>
    <link>http://www.nsp.state.ne.us/sor/</link>
    <description>
      The Nebraska State Patrol is currently seeking information on the
      location of the following individuals to determine if they are in
      compliance with the Nebraska Sex Offender Registration Act. This site
      is intended to generate information on these individuals and should
      not be used solely for the purpose of arrest. Anyone with information
      please call 402-471-8647.
    </description>
    <image>
      <title>Nebraska State Patrol | SOR</title>
      <url>http://www.nsp.state.ne.us/sor/rsslogo.jpg</url>
      <link>http://www.nsp.state.ne.us/sor/</link>
    </image>
    <item>
      <title>Austen, Kate</title>
      <link>http://www.nsp.state.ne.us/sor/200403KA2</link>
      <description>Absconded - Jefe de una loca mujer</description>
    </item>
  </channel>
</rss>
Every RSS feed document starts with the following line:
<?xml version="1.0" encoding="UTF-8"?>
This line indicates that this is an XML document. The version attribute is mandatory while the encoding attribute is not.
The root of the RSS feed is the rss element. This root element has only one child: the channel element. The channel element has three mandatory child elements: title, link, and description. In addition, it can hold several optional child elements such as: webMaster, image, copyright.
The mandatory elements are required for an RSS feed
to be valid. However, valid does not necessarily mean useful. To be
useful, the channel element should have one or more item child elements. Each story in an RSS feed file is represented by an item element. An item element typically contains three child elements: (1) a title element, (2) a link element, and (3) an optional description
element. The reader presents to you the title of the story, its link,
and, optionally, its description. If you are interested in the story,
you click the link (a URL) to visit the web page of that story.
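The structure just described can be walked with very little code. The sketch below is an illustration only: it uses Python's standard xml.etree.ElementTree module instead of libxml2, the read_items function is a hypothetical helper, and the feed text is a shortened copy of the sample document with a placeholder channel description.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample feed; the channel description is a placeholder.
FEED = """<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Nebraska State Patrol | Absconded Offenders</title>
    <link>http://www.nsp.state.ne.us/sor/</link>
    <description>Shortened for this sketch.</description>
    <item>
      <title>Austen, Kate</title>
      <link>http://www.nsp.state.ne.us/sor/200403KA2</link>
      <description>Absconded - Jefe de una loca mujer</description>
    </item>
  </channel>
</rss>"""


def read_items(feed_bytes):
    """Return the channel title and a list of stories found in the feed."""
    root = ET.fromstring(feed_bytes)      # the rss root element
    channel = root.find("channel")        # its single channel child
    stories = []
    for item in channel.findall("item"):  # one item element per story
        description = item.findtext("description")  # None when absent
        stories.append({
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            "description": description.strip() if description is not None else None,
        })
    return channel.findtext("title", default="").strip(), stories


channel_title, stories = read_items(FEED.encode("utf-8"))
print(channel_title)
for story in stories:
    print(story["title"], story["link"], story["description"])
```

Treating the description as possibly absent mirrors the optional nature of that element; a reader would simply omit it from the display.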
Now that we know the structure of an RSS feed document, let's use the libxml2 library to extract information from an RSS feed. First, we present a reader using DOM, then another one using SAX.
But before getting into the use of the libxml2 library, we need to configure the Xcode project.
3. Configuring the Xcode project
Follow these steps to configure your project to use the libxml2 library:
Add to Other Linker Flags of the project. Double-click on your project node. Select the Build tab and search for "other". Double-click on Other Linker Flags, and enter -lxml2. See Figure 3.
Add to Other Linker Flags of the target.
You also need to repeat the previous step, but instead of adding the
flag to the project, you need to add it to the target. Choose Project
> Edit Active Target from the menu. Select the Build tab and search
for "other". Double-click on Other Linker Flags and enter -lxml2. See Figure 4.
Add to the Header Search Path. You need to add the following line to the project:
HEADER_SEARCH_PATHS = /usr/include/libxml2
Double-click on the project node, and select the Build tab. Search for "header" and enter the value as shown in Figure 5.
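The settings entered in the steps above can also be written down as a build configuration fragment. This is only a sketch: it assumes you keep build settings in an .xcconfig file (the file name in the comment is hypothetical), and it relies on OTHER_LDFLAGS being the build setting behind Other Linker Flags.

```
// Hypothetical file: Settings.xcconfig
// Link against libxml2 (Other Linker Flags).
OTHER_LDFLAGS = -lxml2
// Let the compiler find the libxml2 headers.
HEADER_SEARCH_PATHS = /usr/include/libxml2
```

Either route should produce the same result; the dialog-based steps are the ones shown in Figures 3 to 5.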