In a text file exported from a 3D modeling
application, the data is usually presented in a human-readable form. In
this type of file there can be line breaks, spaces, tabs, words,
numbers, curly braces and other symbols, and so on. Regardless of what
information is in the file, it is useful to be able to effectively read
the information you care about. In most binary file formats the data is
tightly packed, with only the data necessary to represent its purpose.
In a text file there can exist comments, new lines, spaces, and so on
that are at times more for the benefit of someone reading the text
file’s contents than for the application loading it.
In this section we will cover the creation of a class that will take a block of text and split it up into separate pieces. In other words, we will create a list of all the words, numbers, and symbols that appear in the text file, without any delimiters. Delimiters are characters such as new lines, spaces, tabs, and any other characters that can appear in the file to mark the end of a piece of text (such as a word). Consider, for example, the following.
"She sells sea shells by the sea shore"
This text is made up of eight individual pieces of
text. These pieces are known as tokens: pieces of text (letters, numbers, and symbols) that are separated by delimiters. A delimiter can be anything you define it to be, but common choices are the examples mentioned earlier: spaces, end-of-file markers, new lines, and so on. In the example above the only delimiters are the white spaces between each word.
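The splitting just described can be sketched with a pair of indexes: one that skips delimiters to find the start of a token, and one that advances to the token's end. The following is a minimal standalone sketch of the idea (not the class built later in this section); for simplicity it treats any character at or below the ASCII space as a delimiter:

```cpp
#include <string>
#include <vector>

// Split text into tokens, treating spaces, tabs, new lines, and any
// other character at or below ASCII 32 as a delimiter.
std::vector<std::string> SplitIntoTokens(const std::string &text)
{
    std::vector<std::string> tokens;
    std::size_t start = 0;

    while(start < text.length())
    {
        // Skip delimiters to find the start of the next token.
        while(start < text.length() && (unsigned char)text[start] <= ' ')
            start++;

        // Advance the end index until the next delimiter.
        std::size_t end = start;
        while(end < text.length() && (unsigned char)text[end] > ' ')
            end++;

        if(end > start)
            tokens.push_back(text.substr(start, end - start));

        start = end;
    }

    return tokens;
}
```

Called on the sentence above, this produces the eight tokens She, sells, sea, shells, by, the, sea, and shore.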
For the class we will create, we want the ability
to extract each token between delimiters. Using the example from
earlier, we want a class with a function we can call to get each word from the text, one at a time. To do this we will define a function that can test each character of the complete text to see if it is a delimiter and not part of a token. While reading, we read each character until we come to such a delimiter, and then we return the token that was read. The class function that does this will be called GetNextToken(). Every time GetNextToken() is called, a new token from the file will be returned. Consider the following example.

VertexPos 100 50 30

If we called GetNextToken() for the first time on the above example, the token VertexPos would be returned. If we were looking for the next vertex position, we would know that the next three calls to GetNextToken()
would return the X, Y, and Z values, which would be 100, 50, and 30. If
we defined a 3D model this way, we could have each vertex position of
each triangle on a line in the text file, and we simply would call one
function, GetNextToken(), to extract each piece of information, one at a time.
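That read-one-token-at-a-time pattern for a vertex line can be illustrated with a standalone sketch that uses std::istringstream in place of the class (the function name ParseVertexPos is ours, invented for this example):

```cpp
#include <sstream>
#include <string>

// Read the keyword token first, then the three position values that
// follow it, from a line such as "VertexPos 100 50 30".
bool ParseVertexPos(const std::string &line, float pos[3])
{
    std::istringstream stream(line);
    std::string keyword;

    // The first token on the line must be the VertexPos keyword.
    if(!(stream >> keyword) || keyword != "VertexPos")
        return false;

    // The next three tokens are the X, Y, and Z values.
    if(!(stream >> pos[0] >> pos[1] >> pos[2]))
        return false;

    return true;
}
```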
This class will be called TokenStream, and it will have functions to load a file’s data into memory or to set the data from an array, to get the next token, and to move to the next line in the file. A file will be loaded by calling LoadTokenStream(), which only needs to take as a parameter the name of the file being loaded. Another way to set the data stream is to set it manually by calling SetTokenStream(), which takes a character pointer to an array of text. That way you can set the data stream from a file or from an array of characters already in the application.
There will also be two GetNextToken()
functions, where the first will return the next token that appears in
the file while the overloaded version will search for a specific token
and return the token that immediately follows. The MoveToNextLine()
function for moving to a new line in the text data will read characters
until a new-line character is found and will return the entire line to
the caller. This can be useful if you have data specified strictly line
by line such as the “VertexPos 100 50 30” example from earlier. If you read the entire line, you could use another TokenStream object to break that line down into individual tokens for further processing.
The class also has a function to reset the token stream, which means moving the reading indexes that are used to read tokens back to the beginning of the data (i.e., setting them to 0). There is also a function pointer that allows the programmer to supply a function used to test whether a character is a valid token character or a delimiter. Since a delimiter can be anything you define it to be, this is useful for reading different types of files, where what you consider a delimiter might change depending on the file being read.
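As a sketch of what such a replacement test might look like, the hypothetical function below (the name is ours) also treats commas as delimiters, which would let the same class tokenize comma-separated data:

```cpp
// Hypothetical character test for comma-separated data: a character
// is part of a token only if it is printable ASCII and not a comma.
// With this test, "10,20,30" would split into three tokens.
bool CSVIsValidIdentifier(char c)
{
    if(c == ',')
        return false;

    // Printable ASCII from ! (33) to ~ (126).
    return c > 32 && c < 127;
}
```

An instance created with this function would then treat both white space and commas as token boundaries.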
There is also a function, called DefaultIsValidIdentifier(), that is not part of the class. This function is set to the class’s function pointer by default and essentially treats white spaces, new lines, end-of-file markers, tabs, and any other character that is not a printable letter, number, or symbol as a delimiter. That way anyone using this class can use the default function instead of always having to write their own to do the same thing.
The class declaration for TokenStream is shown in Listing 1.
The class has member variables for the start and ending indexes that
are used internally for the reading of tokens (more on this coming up)
and a string that holds the entire text data that was set to the token
stream.
Listing 1. The TokenStream Class Declaration
/*
Token Stream
Ultimate Game Programming with DirectX 2nd Edition
Created by Allen Sherrod
*/
#ifndef _TOKEN_STREAM_H_
#define _TOKEN_STREAM_H_

#include <string>

bool DefaultIsValidIdentifier(char c);

class TokenStream
{
    public:
        TokenStream(bool (*IdentifierFuncPtr)(char c));
        ~TokenStream();

        void ResetStream();

        bool LoadTokenStream(char *fileName);
        void SetTokenStream(char *data);

        bool GetNextToken(std::string *buffer);
        bool GetNextToken(std::string *token, std::string *buffer);
        bool MoveToNextLine(std::string *buffer);

    private:
        int m_startIndex, m_endIndex;
        std::string m_data;
        bool (*isValidIdentifier)(char c);
};

#endif
In the class the constructor sets the read indexes to 0 and sets the function pointer. If NULL is passed to the constructor, DefaultIsValidIdentifier(), which is the default function, is used. DefaultIsValidIdentifier() is a simple function that considers everything in the ASCII range from 33 (the ! symbol) through 126 (the ~ symbol) a valid part of a token. This means anything outside that range, such as white space, is considered a delimiter. Therefore, if the character passed as the parameter to this function is a valid token character, the function returns true; otherwise, it returns false because it considers the character a delimiter. The DefaultIsValidIdentifier() function, class constructor, and class destructor are shown in Listing 2, along with the ResetStream() function, which simply sets the two indexes to 0.
Listing 2. The Class Constructor, Destructor, Stream Reset, and Valid Token Check
/*
Token Stream
Ultimate Game Programming with DirectX 2nd Edition
Created by Allen Sherrod
*/
#include <string>
#include <fstream>
#include <cstring>   // For memset(), used by LoadTokenStream().
#include "TokenStream.h"

using namespace std;

bool DefaultIsValidIdentifier(char c)
{
    // ASCII from ! (33) to ~ (126).
    if((int)c > 32 && (int)c < 127)
        return true;

    return false;
}

TokenStream::TokenStream(bool (*IdentifierFuncPtr)(char c))
{
    ResetStream();

    if(IdentifierFuncPtr == NULL)
        isValidIdentifier = DefaultIsValidIdentifier;
    else
        isValidIdentifier = IdentifierFuncPtr;
}

TokenStream::~TokenStream()
{
}

void TokenStream::ResetStream()
{
    m_startIndex = m_endIndex = 0;
}
The next functions we will be looking at, which are shown in Listing 3, are SetTokenStream() and LoadTokenStream(). The SetTokenStream() function resets the stream indexes and sets the text data to the function’s parameter. The LoadTokenStream()
function opens a file, reads its contents, sets the file’s contents to
the class’s data string, deletes the temporary allocated memory that was
used to read from the file, closes the file, and returns. This code is
essentially the same as that in the Files demo but is now being used as
the TokenStream class’s loading function.
Listing 3. The SetTokenStream() and LoadTokenStream() Functions
void TokenStream::SetTokenStream(char *data)
{
    ResetStream();
    m_data = data;
}

bool TokenStream::LoadTokenStream(char *fileName)
{
    ResetStream();

    ifstream fileStream;
    int fileSize = 0;

    fileStream.open(fileName, ifstream::in);

    if(fileStream.is_open() == false)
        return false;

    fileStream.seekg(0, ios::end);
    fileSize = (int)fileStream.tellg();
    fileStream.seekg(0, ios::beg);

    if(fileSize <= 0)
    {
        fileStream.close();
        return false;
    }

    // Allocate one extra byte for the null terminator so the last
    // character of the file is not lost.
    char *buffer = new char[fileSize + 1];
    memset(buffer, 0, fileSize + 1);

    fileStream.read(buffer, fileSize);
    buffer[fileSize] = '\0';
    fileStream.close();

    m_data = buffer;
    delete[] buffer;

    return true;
}
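For comparison, the same read-the-whole-file step can be written without manual buffer management by streaming the file into a std::ostringstream. This is an alternative sketch (the function name is ours), not the version used by the class:

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Read an entire text file into a string. Returns false if the file
// could not be opened or contained no data.
bool ReadEntireFile(const char *fileName, std::string &out)
{
    std::ifstream fileStream(fileName);

    if(!fileStream.is_open())
        return false;

    // rdbuf() streams the file's whole contents into the string stream.
    std::ostringstream contents;
    contents << fileStream.rdbuf();
    out = contents.str();

    return !out.empty();
}
```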
The GetNextToken()
functions are not difficult, but they are where all the work occurs
when you use the class. The first of these two functions starts off by
setting the starting index to the last position of the ending index, and
it goes on to test that we have not reached the end of the text data.
When this function is first called, both the start and end are 0, but as
reading occurs, the starting index is set to wherever the function last
left off, which is at the ending index position.
Assuming there is information to parse, the
function then reads all characters until it reaches a valid token
character. For every character that is a delimiter, the start index is
moved forward. This allows the code to skip all delimiters until it
reaches the start of the next token. Therefore, if the text had a bunch
of white spaces before the data begins, let’s say for formatting
purposes in the original text file, those delimiters are skipped so the
function can find the start of the next token. Once the start is found,
the new end index will be one past the new starting index.
With the starting location of the next valid token found, the next step is to read the entire text that makes up that token. This involves reading characters until a delimiter is found. Each time the code reads a valid token character, the end index is incremented. Once a delimiter is found, the text between the start index and end index represents the token. So if you were reading a line such as the following, with three white spaces before the first word:

   This is a test

the start index will be 3, since the first three white spaces (indexes 0 through 2) are skipped and the “T” is at index 3, and the end index will be 7, which is one past the final “s” in “This.” The next time GetNextToken() is called, the white space between “This” and “is” is skipped, the starting index is set to the “i” in “is,” and, after the function completes, the ending index is set to one past the “s” in “is.” This continues until the TokenStream object reaches the end of the data stream. Once at the end of the data stream, the function will continue to return false on future calls unless the indexes are reset by calling ResetStream().
Once the code has identified the start and end indexes that make up the token, the last step is to return the token’s text. This is done by filling the function’s parameter, a pointer to the string where the token is to be saved, with the characters between the start and end indexes. If NULL is passed to the function, the token is discarded, which can be useful if you want to move past the next token without actually storing it. As long as the function is able to find a token, it returns true; otherwise, it returns false. The first GetNextToken() function is shown in Listing 4.
Listing 4. The First GetNextToken()
bool TokenStream::GetNextToken(std::string *buffer)
{
    m_startIndex = m_endIndex;
    int length = (int)m_data.length();

    // Make sure we are not at the end.
    if(m_startIndex >= length)
        return false;

    // Skip all delimiters.
    while(m_startIndex < length &&
          isValidIdentifier(m_data[m_startIndex]) == false)
    {
        m_startIndex++;
    }

    // The end starts one past the start (for a 1-character token).
    m_endIndex = m_startIndex + 1;

    // If we haven't already reached the end of the data stream.
    if(m_startIndex < length)
    {
        // Read until we reach a delimiter or the end.
        while(m_endIndex < length &&
              isValidIdentifier(m_data[m_endIndex]))
        {
            m_endIndex++;
        }

        // If we are returning this token, save it.
        if(buffer != NULL)
        {
            int size = (m_endIndex - m_startIndex);
            int index = m_startIndex;

            buffer->reserve(size + 1);
            buffer->clear();

            for(int i = 0; i < size; i++)
            {
                buffer->push_back(m_data[index++]);
            }
        }

        return true;
    }

    return false;
}
The overloaded GetNextToken() function has a parameter for a token to search for and a pointer to where the token that follows it should be stored. The function calls the original GetNextToken() until it finds the search token. Once found, GetNextToken() is called again to return the token that immediately follows. Using the “VertexPos 100 50 30” example from above, if you used this function to search for VertexPos, it will return the token after it, which is 100. The overloaded GetNextToken() function is shown in Listing 5.
Listing 5. The Overloaded GetNextToken() Function
bool TokenStream::GetNextToken(std::string *token,
                               std::string *buffer)
{
    std::string tok;

    if(token == NULL)
        return false;

    // Read tokens until...
    while(GetNextToken(&tok))
    {
        // ...we find the one we are looking for, then return the
        // token that immediately follows it.
        if(tok == *token)
            return GetNextToken(buffer);
    }

    return false;
}
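The search-then-return-next behavior can also be illustrated standalone, with std::istringstream doing the tokenizing (a sketch of the idea, not the class’s implementation; the function name is ours):

```cpp
#include <sstream>
#include <string>

// Return the token that immediately follows 'search' in 'text', or
// an empty string if the search token is not found or is last.
std::string GetTokenAfter(const std::string &text,
                          const std::string &search)
{
    std::istringstream stream(text);
    std::string token;

    // Read tokens until we find the one we are searching for.
    while(stream >> token)
    {
        if(token == search)
        {
            std::string next;

            if(stream >> next)
                return next;

            return "";
        }
    }

    return "";
}
```

For example, GetTokenAfter("VertexPos 100 50 30", "VertexPos") evaluates to "100".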
The overloaded GetNextToken() function can be useful when you need the information after a specific token but not the token itself. For example, somewhere in the file you might have a file ID on a line such as the following.

ID 1001

If the application needs to check the validity of the file ID, it could use the overloaded GetNextToken() function to search for ID, which will return the information of real interest, 1001.
The last function of the TokenStream class is the MoveToNextLine() function. This function is useful if you want to read a single line of text at a time from a file. It is similar to the GetNextToken() function, but instead of stopping at a white space, it keeps going until one of the other delimiters is reached, such as a new-line or end-of-file marker. The MoveToNextLine() function is shown in Listing 6.
Listing 6. The TokenStream’s MoveToNextLine() Function
bool TokenStream::MoveToNextLine(std::string *buffer)
{
    int length = (int)m_data.length();

    // Read only if we are not at the end of the data.
    if(m_startIndex < length && m_endIndex < length)
    {
        m_endIndex = m_startIndex;

        // Read the entire line until we reach a new-line character.
        // Spaces are allowed here since we want the full line.
        while(m_endIndex < length &&
              (isValidIdentifier(m_data[m_endIndex]) ||
               m_data[m_endIndex] == ' '))
        {
            m_endIndex++;
        }

        if((m_endIndex - m_startIndex) == 0)
            return false;

        // Return the line's data.
        if(buffer != NULL)
        {
            int size = (m_endIndex - m_startIndex);
            int index = m_startIndex;

            buffer->reserve(size + 1);
            buffer->clear();

            for(int i = 0; i < size; i++)
            {
                buffer->push_back(m_data[index++]);
            }
        }
    }
    else
    {
        return false;
    }

    // Skip past the end-of-line marker so the next read starts on
    // the following line. This handles both "\r\n" and "\n".
    if(m_endIndex < length && m_data[m_endIndex] == '\r')
        m_endIndex++;
    if(m_endIndex < length && m_data[m_endIndex] == '\n')
        m_endIndex++;

    m_startIndex = m_endIndex;

    return true;
}
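The line-then-tokens pattern described earlier, where a whole line is pulled out and then handed to a second tokenizer, can be sketched standalone with std::getline and std::istringstream (again an illustration of the pattern, not the class itself):

```cpp
#include <sstream>
#include <string>
#include <vector>

// Break multi-line text into lines, then break each line into its
// tokens: the same two-pass pattern as MoveToNextLine() feeding a
// second tokenizer.
std::vector< std::vector<std::string> > TokenizeLines(
    const std::string &text)
{
    std::vector< std::vector<std::string> > lines;
    std::istringstream lineStream(text);
    std::string line;

    while(std::getline(lineStream, line))
    {
        std::vector<std::string> tokens;
        std::istringstream tokenStream(line);
        std::string token;

        while(tokenStream >> token)
            tokens.push_back(token);

        lines.push_back(tokens);
    }

    return lines;
}
```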
The demo loads a file called tokens.txt, which is included with the demo’s files, and displays each token on the screen, one at a time, using a loop. The loop continues to call and display the result of GetNextToken() until it returns false, which means there are no more tokens left to read. The main source file from the Token Stream demo is shown in Listing 7. Listing 8 shows the contents of tokens.txt.
Listing 7. The Main Source File for the Token Stream Demo
/*
Token Stream
Ultimate Game Programming with DirectX 2nd Edition
Created by Allen Sherrod
*/
#include <iostream>
#include <string>
#include "TokenStream.h"

using namespace std;

int main(int argc, char *argv[])
{
    cout << "Stream of Tokens..." << endl << endl;

    TokenStream tokenStream(DefaultIsValidIdentifier);
    tokenStream.LoadTokenStream("tokens.txt");

    string token;

    while(tokenStream.GetNextToken(&token))
    {
        cout << token.c_str() << " ";
    }

    cout << endl << endl;

    return 0;
}
Listing 8. The tokens.txt File
Hi hello "wow" ! $%&*
this
is
a
test 100