Windows System Programming : File Pointers & Getting the File Size

7/28/2011 3:33:23 PM

The 64-Bit File System

The Windows NTFS supports 64-bit file addresses so that files can, in principle, be as long as 2⁶⁴ bytes. The 2³²-byte length limit of older 32-bit file systems, such as FAT, constrains file lengths to 4GB (about 4.3 × 10⁹ bytes). This limit is a serious constraint for numerous applications, including large database and multimedia systems, so any complete modern OS must support much larger files.

Files larger than 4GB are sometimes called very large or huge files, although huge files have become so common that we’ll simply assume that any file could be huge and program accordingly.

Needless to say, some applications will never need huge files, so, for many programmers, 32-bit file addresses will be adequate. It is, however, a good idea to start working with 64-bit addresses from the beginning of a new development project, given the rapid pace of technical change and disk capacity growth,^[1] cost improvements, and application requirements.

File Pointers

Windows, just like UNIX, the C library, and nearly every other OS, maintains a file pointer with each open file handle, indicating the current byte location in the file. The next WriteFile or ReadFile operation will start transferring data sequentially to or from that location and increment the file pointer by the number of bytes transferred. Opening the file with CreateFile sets the pointer to zero, indicating the start of the file, and the handle’s pointer is advanced with each successive read or write. The crucial operation required for random file access is the ability to set the file pointer to an arbitrary value, using SetFilePointer and SetFilePointerEx.

The first function, SetFilePointer, is obsolete, as the handling of 64-bit file pointers is clumsy. SetFilePointerEx, one of a number of “extended”^[2] functions, is the correct choice, as it uses 64-bit pointers naturally. Nonetheless, we describe both functions here because SetFilePointer is still common. From here on, when an extended function is supported in NT5 and is actually superior, we mention the nonextended function only in passing.

^[2] The extended functions have an “Ex” suffix and, as would be expected, provide additional functionality. There is no consistency among the extended functions in terms of the nature of the new features or parameter usage. For example, MoveFileEx adds a new flag input parameter, while SetFilePointerEx has LARGE_INTEGER input and output parameters. The registry functions have additional extended functions.

SetFilePointer shows, for the first time, how Windows handles addresses in large files. The techniques are not always pretty, and SetFilePointer is easiest to use with small files.

DWORD SetFilePointer ( HANDLE hFile, LONG lDistanceToMove, PLONG lpDistanceToMoveHigh, DWORD dwMoveMethod)
Return: The low-order DWORD (unsigned) of the new file pointer. The high-order portion of the new file pointer goes to the DWORD indicated by lpDistanceToMoveHigh (if non-NULL). In case of error, the return value is 0xFFFFFFFF.

Parameters

hFile is the handle of an open file with read or write access (or both).

lDistanceToMove is the 32-bit LONG signed distance to move or unsigned file position, depending on the value of dwMoveMethod.

lpDistanceToMoveHigh points to the high-order portion of the move distance. If this value is NULL, the function can operate only on files whose length is limited to 2³²–2. This parameter is also used to receive the high-order return value of the file pointer.^[3] The low-order portion is the function’s return value.

^[3] Windows is not consistent, as can be seen by comparing SetFilePointer with GetCurrentDirectory. In some cases, there are distinct input and output parameters.

dwMoveMethod specifies one of three move modes.

FILE_BEGIN: Position the file pointer from the start of the file, interpreting lDistanceToMove as unsigned.
FILE_CURRENT: Move the pointer forward or backward from the current position, interpreting lDistanceToMove as signed. Positive is forward.
FILE_END: Position the pointer backward or forward from the end of the file.

You can obtain the file length by specifying a zero-length move from the end of file, although the file pointer is changed as a side effect.
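The three move methods can be modeled with a few lines of portable C. This is only a sketch of the arithmetic, not the Windows implementation: the enum names and the function model_new_position are hypothetical stand-ins for FILE_BEGIN, FILE_CURRENT, and FILE_END, and the model works on full 64-bit values. It also assumes that a move to a negative position is rejected.

```c
#include <stdint.h>

/* Hypothetical names for illustration; the real constants are
   FILE_BEGIN, FILE_CURRENT, and FILE_END from <windows.h>. */
enum move_method { MOVE_BEGIN, MOVE_CURRENT, MOVE_END };

/* Model of how the new file pointer is derived from the move method.
   Returns 0 and stores the new position on success, or -1 if the
   request cannot be represented (negative or overflowing position). */
int model_new_position(int64_t current, int64_t file_length,
                       int64_t distance, enum move_method method,
                       int64_t *new_position)
{
    int64_t base;

    switch (method) {
    case MOVE_BEGIN:   base = 0;           break; /* from start of file */
    case MOVE_CURRENT: base = current;     break; /* from current pointer */
    case MOVE_END:     base = file_length; break; /* from end of file */
    default:           return -1;
    }
    if (base < 0) return -1;                      /* invalid inputs */
    if (distance < -base) return -1;              /* would be negative */
    if (distance > INT64_MAX - base) return -1;   /* would overflow */

    *new_position = base + distance;
    return 0;
}
```

A zero-length move with MOVE_END yields the file length, which is the technique just described; a zero-length move with MOVE_CURRENT reports the current position without changing it.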

The method of representing 64-bit file positions causes complexities because the function return can represent both a file position and an error code. For example, suppose that the actual position is location 2³²–1 (that is, 0xFFFFFFFF) and that the call also specifies the high-order move distance. The return value is then indistinguishable from the error value, so invoke GetLastError to determine whether it is a valid file position or whether the function failed, in which case GetLastError returns a value other than NO_ERROR. This explains why file lengths are limited to 2³²–2 when the high-order component is omitted.
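The decision logic can be sketched in portable C as follows. The function decode_set_file_pointer and the MODEL_ constants are hypothetical names introduced for illustration; in a real program the three inputs would be the SetFilePointer return value, the value left in *lpDistanceToMoveHigh, and the result of GetLastError.

```c
#include <stdint.h>

/* Stand-ins for the Windows values; names are hypothetical. */
#define MODEL_INVALID_SET_FILE_POINTER 0xFFFFFFFFu
#define MODEL_NO_ERROR                 0u

/* Interpret the results of a SetFilePointer call made with a
   non-NULL high-order pointer. Returns 1 and stores the 64-bit
   position on success; returns 0 if the call failed. */
int decode_set_file_pointer(uint32_t low_result, int32_t high_result,
                            uint32_t last_error, int64_t *position)
{
    /* 0xFFFFFFFF alone is ambiguous: it is an error only if the
       last-error value is something other than NO_ERROR. */
    if (low_result == MODEL_INVALID_SET_FILE_POINTER &&
        last_error != MODEL_NO_ERROR)
        return 0;

    /* Combine the high-order and low-order halves. */
    *position = (int64_t)(((uint64_t)(uint32_t)high_result << 32) |
                          low_result);
    return 1;
}
```

With a low-order result of 0xFFFFFFFF, a high-order result of 0, and a last-error value of NO_ERROR, the sketch reports success at position 2³²–1; the same low-order result with any other last-error value is reported as a failure.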

Another confusing factor is that the high- and low-order components are separated and treated differently. The low-order address is treated as a call by value and returned by the function, whereas the high-order address is a call by reference and is both input and output. SetFilePointerEx is much easier to use, but, first, we need to describe Windows 64-bit arithmetic.

lseek (in UNIX) and fseek (in the C library) are similar to SetFilePointerEx. Both systems also advance the file position during read and write operations.

64-Bit Arithmetic

It is not difficult to perform the 64-bit file pointer arithmetic, and our example programs use the Windows LARGE_INTEGER 64-bit data type, which is a union of a LONGLONG (called QuadPart) and two 32-bit quantities (LowPart, a DWORD, and HighPart, a LONG). LONGLONG supports all the arithmetic operations. There is also a ULONGLONG data type, which is unsigned. The guidelines for using LARGE_INTEGER data are:

SetFilePointerEx and other functions require LARGE_INTEGER parameters.
Perform arithmetic on the QuadPart component of a LARGE_INTEGER value.
Use the LowPart and HighPart components as required; this is illustrated in an upcoming example.
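As a rough illustration of these guidelines, the following portable sketch performs the arithmetic on a single 64-bit value and only then splits it into two 32-bit halves. The type position_parts and both functions are hypothetical stand-ins so that the fragment builds without <windows.h>; with the real LARGE_INTEGER, the union provides QuadPart, LowPart, and HighPart directly and no shifting is needed.

```c
#include <stdint.h>

/* Stand-in for the two 32-bit halves of a LARGE_INTEGER. */
typedef struct {
    uint32_t low;   /* corresponds to LowPart (a DWORD) */
    int32_t  high;  /* corresponds to HighPart (a LONG) */
} position_parts;

/* Do the arithmetic on the 64-bit quantity (the QuadPart role). */
int64_t record_position(int64_t record_number, int64_t record_size,
                        int64_t header_size)
{
    return record_number * record_size + header_size;
}

/* Split a non-negative 64-bit position into its two halves. */
position_parts split_position(int64_t quad)
{
    position_parts parts;
    parts.low  = (uint32_t)((uint64_t)quad & 0xFFFFFFFFu);
    parts.high = (int32_t)(uint32_t)((uint64_t)quad >> 32);
    return parts;
}
```

For example, record 10,000,000 in a file of 1,000-byte records with an 8-byte header lies at position 10,000,000,008, which does not fit in 32 bits; the split gives a high part of 2 and a low part of 1,410,065,416.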

SetFilePointerEx

SetFilePointerEx is straightforward, requiring a LARGE_INTEGER input for the requested position and a LARGE_INTEGER output for the actual position. The return result is a Boolean to indicate success or failure.

BOOL SetFilePointerEx ( HANDLE hFile, LARGE_INTEGER liDistanceToMove, PLARGE_INTEGER lpNewFilePointer, DWORD dwMoveMethod)

lpNewFilePointer can be NULL, in which case, the new file pointer is not returned. dwMoveMethod has the same values as for SetFilePointer.

Specifying File Position with an Overlapped Structure

Windows provides another way to specify the read/write file position. Recall that the final parameter to both ReadFile and WriteFile is the address of an overlapped structure, and this value has always been NULL in the previous examples. Two members of this structure are Offset and OffsetHigh. You can set the appropriate values in an overlapped structure, and the I/O operation can start at the specified location. The file pointer is changed to point past the last byte transferred, but the overlapped structure values do not change. The overlapped structure also has a handle member used for asynchronous overlapped I/O, hEvent, that must be NULL for now.

The overlapped structure is especially convenient when updating a file record, as the following code fragment illustrates; otherwise, separate SetFilePointerEx calls would be required before the ReadFile and WriteFile calls. The hEvent field is the last of five fields, as shown in the initialization statement. The LARGE_INTEGER data type represents the file position.

OVERLAPPED ov = { 0, 0, 0, 0, NULL };
RECORD r; /* Definition not shown
   but it includes the refCount field. */
LONGLONG n;
LARGE_INTEGER filePos;
DWORD nRead, nWrite;
...
/* Update the reference count in the nth record. */
filePos.QuadPart = n * sizeof(RECORD);
ov.Offset = filePos.LowPart;
ov.OffsetHigh = filePos.HighPart;
ReadFile(hFile, &r, sizeof(RECORD), &nRead, &ov);
r.refCount++; /* Update the record. */
...
WriteFile(hFile, &r, sizeof(RECORD), &nWrite, &ov);

If the file handle was created with the FILE_FLAG_NO_BUFFERING CreateFile flag, then both the file position and the record size (byte count) must be multiples of the disk volume’s sector size. Obtain physical disk information, including sector size, with GetDiskFreeSpace.
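The multiple-of-sector-size rule is easy to check before issuing the I/O. The sketch below is portable C with hypothetical function names; it assumes the sector size has already been obtained (for instance, from GetDiskFreeSpace) and is passed in by the caller.

```c
#include <stdint.h>

/* Returns 1 if both the file position and the byte count are
   multiples of the sector size, 0 otherwise. */
int is_sector_aligned(uint64_t file_position, uint32_t byte_count,
                      uint32_t sector_size)
{
    if (sector_size == 0) return 0;   /* no valid sector size */
    return file_position % sector_size == 0 &&
           byte_count % sector_size == 0;
}

/* Round a size up to the next multiple of the sector size, which is
   one way to pad a record so that it satisfies the rule. */
uint64_t round_up_to_sector(uint64_t size, uint32_t sector_size)
{
    uint64_t remainder;

    if (sector_size == 0) return size;
    remainder = size % sector_size;
    return remainder == 0 ? size : size + (sector_size - remainder);
}
```

With a 512-byte sector, a 1,000-byte record would have to be padded to 1,024 bytes, and every record would then start at a position that is a multiple of 512.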

Note

You can append to the end of the file without knowing the file length. Just specify 0xFFFFFFFF on both Offset and OffsetHigh before performing the write.

Getting the File Size

Determine a file’s size by positioning 0 bytes from the end and using the file pointer value returned by SetFilePointerEx. Alternatively, you can use a specific function, GetFileSizeEx, for this purpose. GetFileSizeEx, like SetFilePointerEx, returns the 64-bit value as a LARGE_INTEGER.

BOOL GetFileSizeEx ( HANDLE hFile, PLARGE_INTEGER lpFileSize)
Return: The file size is in *lpFileSize. A false return indicates an error; check GetLastError.
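The first technique, a zero-length move from the end of the file, might look like the following sketch. The function name file_size_by_seek is hypothetical, and the choice of CreateFileA with read-only access is an assumption made to keep the fragment short. The non-Windows branch uses fseek and ftell from the C library, which play the analogous role, so that the fragment also builds on other systems.

```c
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>

/* Returns the file size in bytes, or -1 on failure. */
long long file_size_by_seek(const char *path)
{
    LARGE_INTEGER zero, size;
    HANDLE h = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE) return -1;

    zero.QuadPart = 0;
    /* Move 0 bytes from the end; the new pointer is the file size. */
    if (!SetFilePointerEx(h, zero, &size, FILE_END))
        size.QuadPart = -1;
    CloseHandle(h);
    return size.QuadPart;
}
#else
/* C library analog: seek 0 bytes from the end, then ask for the
   position. Returns the size in bytes, or -1 on failure. */
long long file_size_by_seek(const char *path)
{
    long size = -1;
    FILE *f = fopen(path, "rb");
    if (f == NULL) return -1;

    if (fseek(f, 0L, SEEK_END) == 0)
        size = ftell(f);
    fclose(f);
    return size;
}
#endif
```

Unlike GetFileSizeEx, this approach changes the file pointer as a side effect, so it suits a handle that is about to be repositioned or closed anyway.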

GetFileSize (now obsolete) and GetFileSizeEx require that the file have an open handle. It is also possible to obtain the length by name. GetCompressedFileSize returns the size of the compressed file, and FindFirstFile gives the exact size of a named file.

Setting the File Size, File Initialization, and Sparse Files

The SetEndOfFile function resizes the file using the current value of the file pointer to determine the length. A file can be extended or truncated. With extension, the contents of the extended region are not defined. The file will actually consume the disk space and user space quotas unless the file is a sparse file. Files can also be compressed to consume less space.

SetEndOfFile sets the physical end of file beyond the current “logical” end. The file’s tail, which is the portion between the logical and physical ends, contains no valid data. You can shorten the tail by writing data past the logical end.

With sparse files, disk space is consumed only as data is written. A file, directory, or volume can be specified to be sparse by the administrator. Also, the DeviceIoControl function can use the FSCTL_SET_SPARSE flag to specify that an existing file is sparse. Program 1 illustrates a situation where a sparse file can be used conveniently. SetFileValidData does not apply to sparse files.

Program 1. RecordAccess: Direct File Access

/* Usage: RecordAccess FileName [nrec]
   If nrec is omitted, FileName must already exist.
   If nrec > 0, FileName is recreated (destroying any existing file)
      and the program exits, having created an empty file.
   If the number of records is large, a sparse file is recommended.
*/
/* This program illustrates:
   1. Random file access.
   2. LARGE_INTEGER arithmetic and using 64-bit file positions.
   3. Record update in place.
   4. File initialization to 0
*/

#include "Everything.h"
#define STRING_SIZE 256
typedef struct _RECORD { /* File record structure */
   DWORD     referenceCount;  /* 0 means an empty record */
   SYSTEMTIME recordCreationTime;
   SYSTEMTIME recordLastRefernceTime;
   SYSTEMTIME recordUpdateTime;
   TCHAR     dataString[STRING_SIZE];
} RECORD;
typedef struct _HEADER { /* File header descriptor */
   DWORD     numRecords;
   DWORD     numNonEmptyRecords;
} HEADER;

int _tmain(int argc, char * argv[])
{
   HANDLE hFile;
   LARGE_INTEGER curPtr;
   DWORD openOption, nXfer, recNo;
   RECORD record;
   TCHAR string[STRING_SIZE], command, extra;
   OVERLAPPED ov = {0, 0, 0, 0, NULL}, ovZero = {0, 0, 0, 0, NULL};
   HEADER header = {0, 0};
   SYSTEMTIME currentTime;
   BOOLEAN headerChange, recordChange;

   openOption = ((argc > 2 && atoi(argv[2]) <= 0) || argc <= 2) ?
             OPEN_EXISTING : CREATE_ALWAYS;
   hFile = CreateFile(argv[1], GENERIC_READ | GENERIC_WRITE,
      0, NULL, openOption, FILE_FLAG_RANDOM_ACCESS, NULL);

   if (argc >= 3 && atoi(argv[2]) > 0) {
       /* Write the header and pre-size the new file */
     header.numRecords = atoi(argv[2]);
     WriteFile(hFile, &header, sizeof(header), &nXfer, &ovZero);
     curPtr.QuadPart = (LONGLONG)sizeof(RECORD) * atoi(argv[2]) +
                      sizeof(HEADER);
     SetFilePointerEx(hFile, curPtr, NULL, FILE_BEGIN);
     SetEndOfFile(hFile);
     return 0;
  }

  /* Read file header: find number of records & non-empty records */
  ReadFile(hFile, &header, sizeof(HEADER), &nXfer, &ovZero);

  /* Prompt the user to read or write a numbered record */
  while (TRUE) {
     headerChange = FALSE; recordChange = FALSE;
     _tprintf(_T("Enter r(ead)/w(rite)/d(elete)/q(uit) Rec#\n"));
     _tscanf(_T("%c%u%c"), &command, &recNo, &extra);
     if (command == 'q') break;
     if (recNo >= header.numRecords) {
        _tprintf(_T("Record Number is too large. Try again.\n"));
        continue;
     }
     curPtr.QuadPart = (LONGLONG)recNo *
        sizeof(RECORD) + sizeof(HEADER);
     ov.Offset = curPtr.LowPart;
     ov.OffsetHigh = curPtr.HighPart;
     ReadFile(hFile, &record, sizeof(RECORD), &nXfer, &ov);
     GetSystemTime(&currentTime); /* To update record time fields */
     record.recordLastRefernceTime = currentTime;
     if (command == 'r' || command == 'd') { /* Report contents */
        if (record.referenceCount == 0) {
           _tprintf(_T("Record Number %d is empty.\n"), recNo);
           continue;
        } else {
           _tprintf(_T("Record Number %d. Reference Count: %d \n"),
              recNo, record.referenceCount);
           _tprintf(_T("Data: %s\n"), record.dataString);
           /* Exercise: Display times. See Program 3-2 */
        }
        if (command == 'd') { /* Delete the record */
           record.referenceCount = 0;
           header.numNonEmptyRecords--;
           headerChange = TRUE;
           recordChange = TRUE;
        }
     } else if (command == 'w') { /* Write record; first time? */
        _tprintf(_T("Enter new data string for the record.\n"));
        _fgetts(string, sizeof(string) / sizeof(TCHAR), stdin);
        /* Don't use _getts(potential buffer overflow) */
        string[_tcslen(string)-1] = _T('\0'); // remove newline
          if (record.referenceCount == 0) {
             record.recordCreationTime = currentTime;
             header.numNonEmptyRecords++;
             headerChange = TRUE;
          }
          record.recordUpdateTime = currentTime;
          record.referenceCount++;
          _tcsncpy(record.dataString, string, STRING_SIZE-1);
          recordChange = TRUE;
       } else {
          _tprintf(_T("Command must be r, w, or d. Try again.\n"));
       }

       /* Update record in place if any contents have changed. */
       if (recordChange)
          WriteFile(hFile, &record, sizeof(RECORD), &nXfer, &ov);
       /* Update the number of non-empty records if required */
       if (headerChange)
          WriteFile(hFile, &header, sizeof(header), &nXfer, &ovZero);
   }

   _tprintf(_T("Computed number of non-empty records is: %d\n"),
      header.numNonEmptyRecords);
   ReadFile(hFile, &header, sizeof(HEADER), &nXfer, &ovZero);
   _tprintf(_T("File %s has %d non-empty records.\nCapacity: %d\n"),
      argv[1], header.numNonEmptyRecords, header.numRecords);

   CloseHandle(hFile);
   return 0;
}

NTFS files and file tails are initialized to zeros for security.

Notice that the SetEndOfFile call is not the only way to extend a file. You can also extend a file using many successive write operations, but this will result in more fragmented file allocation; SetEndOfFile allows the OS to allocate larger contiguous disk units.
