In a text file exported from a 3D modeling
application, the data is usually presented in a human-readable form. In
this type of file there can be line breaks, spaces, tabs, words,
numbers, curly braces and other symbols, and so on. Regardless of what
information is in the file, it is useful to be able to effectively read
the information you care about. In most binary file formats the data is
tightly packed, with only the data necessary to represent its purpose.
In a text file there can exist comments, new lines, spaces, and so on
that are at times more for the benefit of someone reading the text
file’s contents than for the application loading it.
In this section we will cover the creation of a class that will take a block of text and split it up into separate pieces. In other words, we will create a list of all the words, numbers, and symbols that appear in the text file, without any delimiters. Delimiters are characters such as new lines, spaces, tabs, and any other characters that can appear in the file to mark the end of a piece of text (such as a word). Consider, for example, the following.
"She sells sea shells by the sea shore"
This text is made up of eight individual pieces of
text. These pieces are known as tokens: pieces of text (letters, numbers, and symbols) that are separated by delimiters. A delimiter can be anything you define it to be, but common choices are the examples mentioned earlier: spaces, end-of-file markers, new lines, and so on. In the example above the only delimiters are the white spaces between each word.
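The splitting just described can be sketched with a pair of indexes: one that skips delimiters to find the start of a token, and one that advances to the token's end. The following is a minimal standalone sketch of the idea (not the class built later in this section); for simplicity it treats any character at or below the ASCII space as a delimiter:

```cpp
#include <string>
#include <vector>

// Split text into tokens, treating spaces, tabs, new lines, and any
// other character at or below ASCII 32 as a delimiter.
std::vector<std::string> SplitIntoTokens(const std::string &text)
{
    std::vector<std::string> tokens;
    std::size_t start = 0;

    while(start < text.length())
    {
        // Skip delimiters to find the start of the next token.
        while(start < text.length() && (unsigned char)text[start] <= ' ')
            start++;

        // Advance the end index until the next delimiter.
        std::size_t end = start;
        while(end < text.length() && (unsigned char)text[end] > ' ')
            end++;

        if(end > start)
            tokens.push_back(text.substr(start, end - start));

        start = end;
    }

    return tokens;
}
```

Called on the sentence above, this produces the eight tokens She, sells, sea, shells, by, the, sea, and shore.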
For the class we will create, we want the ability
to extract each token between delimiters. Using the example from
earlier, we want a class with a function we can call to get each word from the text, one at a time. To do this we will define a function that can test each character of the complete text to see if it is a delimiter and not part of a token. While reading, we read each character until we come to such a delimiter, and then we return the token that was read. The class function that does this will be called GetNextToken(). Every time GetNextToken() is called, a new token from the file will be returned. Consider the following example.

VertexPos 100 50 30

If we called GetNextToken() for the first time on the above example, the token VertexPos would be returned. If we were looking for the next vertex position, we would know that the next three calls to GetNextToken()
would return the X, Y, and Z values, which would be 100, 50, and 30. If
we defined a 3D model this way, we could have each vertex position of
each triangle on a line in the text file, and we simply would call one
function, GetNextToken(), to extract each piece of information, one at a time.
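That read-one-token-at-a-time pattern for a vertex line can be illustrated with a standalone sketch that uses std::istringstream in place of the class (the function name ParseVertexPos is ours, invented for this example):

```cpp
#include <sstream>
#include <string>

// Read the keyword token first, then the three position values that
// follow it, from a line such as "VertexPos 100 50 30".
bool ParseVertexPos(const std::string &line, float pos[3])
{
    std::istringstream stream(line);
    std::string keyword;

    // The first token on the line must be the VertexPos keyword.
    if(!(stream >> keyword) || keyword != "VertexPos")
        return false;

    // The next three tokens are the X, Y, and Z values.
    if(!(stream >> pos[0] >> pos[1] >> pos[2]))
        return false;

    return true;
}
```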
This class will be called TokenStream, and it will have functions to load a file’s data into memory or to set the data from an array, to get the next token, and to move to the next line in the file. A file will be loaded by calling LoadTokenStream(), which only needs to take as a parameter the name of the file being loaded. Another way to set the data stream is to set it manually by calling SetTokenStream(), which takes a character pointer to an array of text. That way you can set the data stream from a file or from an array of characters already in the application.
There will also be two GetNextToken()
functions, where the first will return the next token that appears in
the file while the overloaded version will search for a specific token
and return the token that immediately follows. The MoveToNextLine()
function for moving to a new line in the text data will read characters
until a new-line character is found and will return the entire line to
the caller. This can be useful if you have data specified strictly line
by line such as the “VertexPos 100 50 30” example from earlier. If you read the entire line, you could use another TokenStream object to break that line down into individual tokens for further processing.
The class also has a function to reset the token stream, which means moving the reading indexes that are used to read tokens back to the beginning of the data (i.e., setting them to 0). There is also a function pointer that allows the programmer to supply a function used to test whether a character is a valid token character or a delimiter. Since a delimiter can be anything you define it to be, this is useful for reading different types of files, where what you consider a delimiter might change depending on the file being read.
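As a sketch of what such a replacement test might look like, the hypothetical function below (the name is ours) also treats commas as delimiters, which would let the same class tokenize comma-separated data:

```cpp
// Hypothetical character test for comma-separated data: a character
// is part of a token only if it is printable ASCII and not a comma.
// With this test, "10,20,30" would split into three tokens.
bool CSVIsValidIdentifier(char c)
{
    if(c == ',')
        return false;

    // Printable ASCII from ! (33) to ~ (126).
    return c > 32 && c < 127;
}
```

An instance created with this function would then treat both white space and commas as token boundaries.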
There is also a function, called DefaultIsValidIdentifier(), that is not part of the class. This function is set to the class’s function pointer by default and essentially treats white spaces, new lines, end-of-file markers, tabs, and any other character that is not a printable letter, number, or symbol as a delimiter. That way anyone using this class can use the default function instead of always having to write their own to do the same thing.
The class declaration for TokenStream is shown in Listing 1.
The class has member variables for the start and ending indexes that
are used internally for the reading of tokens (more on this coming up)
and a string that holds the entire text data that was set to the token
stream.
Listing 1. The TokenStream Class Declaration
/*
Token Stream
Ultimate Game Programming with DirectX 2nd Edition
Created by Allen Sherrod
*/
#ifndef _TOKEN_STREAM_H_
#define _TOKEN_STREAM_H_

#include <string>

bool DefaultIsValidIdentifier(char c);

class TokenStream
{
    public:
        TokenStream(bool (*IdentifierFuncPtr)(char c));
        ~TokenStream();

        void ResetStream();

        bool LoadTokenStream(char *fileName);
        void SetTokenStream(char *data);

        bool GetNextToken(std::string *buffer);
        bool GetNextToken(std::string *token, std::string *buffer);
        bool MoveToNextLine(std::string *buffer);

    private:
        int m_startIndex, m_endIndex;
        std::string m_data;
        bool (*isValidIdentifier)(char c);
};

#endif
In the class the constructor sets the read indexes to 0 and sets the function pointer. If NULL is passed to the constructor, DefaultIsValidIdentifier(), which is the default function, is used. DefaultIsValidIdentifier() is a simple function that considers everything in the ASCII range from 33 (the ! symbol) through 126 (the ~ symbol) a valid part of a token. This means anything outside that range, such as white space, is considered a delimiter. Therefore, if the character passed as the parameter to this function is a valid token character, the function returns true; otherwise, it returns false because it considers the character a delimiter. The DefaultIsValidIdentifier() function, class constructor, and class destructor are shown in Listing 2, along with the ResetStream() function, which simply sets the two indexes to 0.
Listing 2. The Class Constructor, Destructor, Stream Reset, and Valid Token Check
/*
Token Stream
Ultimate Game Programming with DirectX 2nd Edition
Created by Allen Sherrod
*/
#include <string>
#include <fstream>
#include <cstring>   // For memset(), used by LoadTokenStream().
#include "TokenStream.h"

using namespace std;

bool DefaultIsValidIdentifier(char c)
{
    // ASCII from ! (33) to ~ (126).
    if((int)c > 32 && (int)c < 127)
        return true;

    return false;
}

TokenStream::TokenStream(bool (*IdentifierFuncPtr)(char c))
{
    ResetStream();

    if(IdentifierFuncPtr == NULL)
        isValidIdentifier = DefaultIsValidIdentifier;
    else
        isValidIdentifier = IdentifierFuncPtr;
}

TokenStream::~TokenStream()
{
}

void TokenStream::ResetStream()
{
    m_startIndex = m_endIndex = 0;
}
The next functions we will be looking at, which are shown in Listing 3, are SetTokenStream() and LoadTokenStream(). The SetTokenStream() function resets the stream indexes and sets the text data to the function’s parameter. The LoadTokenStream()
function opens a file, reads its contents, sets the file’s contents to
the class’s data string, deletes the temporary allocated memory that was
used to read from the file, closes the file, and returns. This code is
essentially the same as that in the Files demo but is now being used as
the TokenStream class’s loading function.
Listing 3. The SetTokenStream() and LoadTokenStream() Functions
void TokenStream::SetTokenStream(char *data)
{
    ResetStream();
    m_data = data;
}

bool TokenStream::LoadTokenStream(char *fileName)
{
    ResetStream();

    ifstream fileStream;
    int fileSize = 0;

    fileStream.open(fileName, ifstream::in);

    if(fileStream.is_open() == false)
        return false;

    fileStream.seekg(0, ios::end);
    fileSize = (int)fileStream.tellg();
    fileStream.seekg(0, ios::beg);

    if(fileSize <= 0)
    {
        fileStream.close();
        return false;
    }

    // Allocate one extra byte for the null terminator so the last
    // character of the file is not lost.
    char *buffer = new char[fileSize + 1];
    memset(buffer, 0, fileSize + 1);

    fileStream.read(buffer, fileSize);
    buffer[fileSize] = '\0';
    fileStream.close();

    m_data = buffer;
    delete[] buffer;

    return true;
}
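For comparison, the same read-the-whole-file step can be written without manual buffer management by streaming the file into a std::ostringstream. This is an alternative sketch (the function name is ours), not the version used by the class:

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Read an entire text file into a string. Returns false if the file
// could not be opened or contained no data.
bool ReadEntireFile(const char *fileName, std::string &out)
{
    std::ifstream fileStream(fileName);

    if(!fileStream.is_open())
        return false;

    // rdbuf() streams the file's whole contents into the string stream.
    std::ostringstream contents;
    contents << fileStream.rdbuf();
    out = contents.str();

    return !out.empty();
}
```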
The GetNextToken()
functions are not difficult, but they are where all the work occurs
when you use the class. The first of these two functions starts off by
setting the starting index to the last position of the ending index, and
it goes on to test that we have not reached the end of the text data.
When this function is first called, both the start and end are 0, but as
reading occurs, the starting index is set to wherever the function last
left off, which is at the ending index position.
Assuming there is information to parse, the
function then reads all characters until it reaches a valid token
character. For every character that is a delimiter, the start index is
moved forward. This allows the code to skip all delimiters until it
reaches the start of the next token. Therefore, if the text had a bunch
of white spaces before the data begins, let’s say for formatting
purposes in the original text file, those delimiters are skipped so the
function can find the start of the next token. Once the start is found,
the new end index will be one past the new starting index.
With the starting location of the next valid token found, the next step is to read the entire text that makes up that token. This involves reading characters until a delimiter is found. Each time the code reads a valid token character, the end index is incremented. Once a delimiter is found, the text between the start index and end index represents the token. So if you were reading a line such as the following, with three white spaces before the first word:

   This is a test

the start index will be 3, since the first three white spaces (indexes 0 through 2) are skipped and the “T” is at index 3, and the end index will be 7, which is one past the final “s” in “This.” The next time GetNextToken() is called, the white space between “This” and “is” is skipped, the starting index is set to the “i” in “is,” and, after the function completes, the ending index is set to one past the “s” in “is.” This continues until the TokenStream object reaches the end of the data stream. Once at the end of the data stream, the function will continue to return false on future calls unless the indexes are reset by calling ResetStream().
Once the code has identified the start and end indexes that make up the token, the last step is to return the token’s text. This is done by filling the function’s parameter, a pointer to the string where the token is to be saved, with the characters between the start and end indexes. If NULL is passed to the function, the token is discarded, which can be useful if you want to move past the next token without actually storing it. As long as the function is able to find a token, it returns true; otherwise, it returns false. The first GetNextToken() function is shown in Listing 4.
Listing 4. The First GetNextToken()
bool TokenStream::GetNextToken(std::string *buffer)
{
    m_startIndex = m_endIndex;
    int length = (int)m_data.length();

    // Make sure we are not at the end.
    if(m_startIndex >= length)
        return false;

    // Skip all delimiters.
    while(m_startIndex < length &&
          isValidIdentifier(m_data[m_startIndex]) == false)
    {
        m_startIndex++;
    }

    // The end starts one past the start (for a 1-character token).
    m_endIndex = m_startIndex + 1;

    // If we haven't already reached the end of the data stream.
    if(m_startIndex < length)
    {
        // Read until we reach a delimiter or the end.
        while(m_endIndex < length &&
              isValidIdentifier(m_data[m_endIndex]))
        {
            m_endIndex++;
        }

        // If we are returning this token, save it.
        if(buffer != NULL)
        {
            int size = (m_endIndex - m_startIndex);
            int index = m_startIndex;

            buffer->reserve(size + 1);
            buffer->clear();

            for(int i = 0; i < size; i++)
            {
                buffer->push_back(m_data[index++]);
            }
        }

        return true;
    }

    return false;
}
The overloaded GetNextToken() function has a parameter for a token to search for and a pointer to where the token that follows it should be stored. The function calls the original GetNextToken() until it finds the search token. Once found, GetNextToken() is called again to return the token that immediately follows. Using the “VertexPos 100 50 30” example from above, if you used this function to search for VertexPos, it will return the token after it, which is 100. The overloaded GetNextToken() function is shown in Listing 5.
Listing 5. The Overloaded GetNextToken() Function
bool TokenStream::GetNextToken(std::string *token,
                               std::string *buffer)
{
    std::string tok;

    if(token == NULL)
        return false;

    // Read tokens until...
    while(GetNextToken(&tok))
    {
        // ...we find the one we are looking for, then return the
        // token that immediately follows it.
        if(tok == *token)
            return GetNextToken(buffer);
    }

    return false;
}
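The search-then-return-next behavior can also be illustrated standalone, with std::istringstream doing the tokenizing (a sketch of the idea, not the class’s implementation; the function name is ours):

```cpp
#include <sstream>
#include <string>

// Return the token that immediately follows 'search' in 'text', or
// an empty string if the search token is not found or is last.
std::string GetTokenAfter(const std::string &text,
                          const std::string &search)
{
    std::istringstream stream(text);
    std::string token;

    // Read tokens until we find the one we are searching for.
    while(stream >> token)
    {
        if(token == search)
        {
            std::string next;

            if(stream >> next)
                return next;

            return "";
        }
    }

    return "";
}
```

For example, GetTokenAfter("VertexPos 100 50 30", "VertexPos") evaluates to "100".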
The overloaded GetNextToken() function can be useful when you need the information after a specific token but not the token itself. For example, somewhere in the file you might have a file ID on a line such as the following.

ID 1001

If the application needs to check the validity of the file ID, it could use the overloaded GetNextToken() function to search for ID, which will return the information of real interest, 1001.
The last function of the TokenStream class is the MoveToNextLine() function. This function is useful if you want to read a single line of text at a time from a file. It is similar to the GetNextToken() function, but instead of stopping at a white space, it keeps going until one of the other delimiters is reached, such as a new-line or end-of-file marker. The MoveToNextLine() function is shown in Listing 6.
Listing 6. The TokenStream’s MoveToNextLine() Function
bool TokenStream::MoveToNextLine(std::string *buffer)
{
    int length = (int)m_data.length();

    // Read only if we are not at the end of the data.
    if(m_startIndex < length && m_endIndex < length)
    {
        m_endIndex = m_startIndex;

        // Read the entire line until we reach a new-line character.
        // Spaces are allowed here since we want the full line.
        while(m_endIndex < length &&
              (isValidIdentifier(m_data[m_endIndex]) ||
               m_data[m_endIndex] == ' '))
        {
            m_endIndex++;
        }

        if((m_endIndex - m_startIndex) == 0)
            return false;

        // Return the line's data.
        if(buffer != NULL)
        {
            int size = (m_endIndex - m_startIndex);
            int index = m_startIndex;

            buffer->reserve(size + 1);
            buffer->clear();

            for(int i = 0; i < size; i++)
            {
                buffer->push_back(m_data[index++]);
            }
        }
    }
    else
    {
        return false;
    }

    // Skip past the end-of-line marker so the next read starts on
    // the following line. This handles both "\r\n" and "\n".
    if(m_endIndex < length && m_data[m_endIndex] == '\r')
        m_endIndex++;
    if(m_endIndex < length && m_data[m_endIndex] == '\n')
        m_endIndex++;

    m_startIndex = m_endIndex;

    return true;
}
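The line-then-tokens pattern described earlier, where a whole line is pulled out and then handed to a second tokenizer, can be sketched standalone with std::getline and std::istringstream (again an illustration of the pattern, not the class itself):

```cpp
#include <sstream>
#include <string>
#include <vector>

// Break multi-line text into lines, then break each line into its
// tokens: the same two-pass pattern as MoveToNextLine() feeding a
// second tokenizer.
std::vector< std::vector<std::string> > TokenizeLines(
    const std::string &text)
{
    std::vector< std::vector<std::string> > lines;
    std::istringstream lineStream(text);
    std::string line;

    while(std::getline(lineStream, line))
    {
        std::vector<std::string> tokens;
        std::istringstream tokenStream(line);
        std::string token;

        while(tokenStream >> token)
            tokens.push_back(token);

        lines.push_back(tokens);
    }

    return lines;
}
```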
The demo loads a file called tokens.txt, which is included with the demo’s files, and displays each token on the screen, one at a time, using a loop. The loop continues to call and display the result of GetNextToken() until it returns false, which means there are no more tokens left to read. The main source file from the Token Stream demo is shown in Listing 7. Listing 8 shows the contents of tokens.txt.
Listing 7. The Main Source File for the Token Stream Demo
/*
Token Stream
Ultimate Game Programming with DirectX 2nd Edition
Created by Allen Sherrod
*/
#include <iostream>
#include <string>
#include "TokenStream.h"

using namespace std;

int main(int argc, char *argv[])
{
    cout << "Stream of Tokens..." << endl << endl;

    TokenStream tokenStream(DefaultIsValidIdentifier);
    tokenStream.LoadTokenStream("tokens.txt");

    string token;

    while(tokenStream.GetNextToken(&token))
    {
        cout << token.c_str() << " ";
    }

    cout << endl << endl;

    return 0;
}
Listing 8. The tokens.txt File
Hi hello "wow" ! $%&*
this
is
a
test 100