MOBILE

Windows Mobile Security - Kernel Architecture

1/21/2011 7:28:50 PM
Windows Mobile 6 uses the Windows CE 5.2 kernel. The 5.1 version of this kernel served as the basis for Windows Mobile 5. Because the behaviors of the 5.1 and 5.2 CE kernels are so similar.

Windows Mobile 6.1 confused the situation further by tweaking memory management but not updating the actual CE kernel version. Notable differences between the Windows CE 5.x and Windows CE 6.0 kernels will be called out when appropriate. Remember the CE 6.x kernels are not in use for any currently available Windows Mobile devices.

Windows CE was developed specifically for embedded applications and is not based on the Windows NT kernel used in Microsoft’s desktop and server operating systems. Even though the two platforms are different, those familiar with the Windows NT kernel will recognize many of the primitives and concepts used by the Windows CE kernel. The devil remains in the details, and those familiar with how Windows NT manages security are recommended to forget that information promptly. The same security model does not apply in the mobile world.

The Windows CE 5.x kernel is a fully preemptive and multithreaded kernel. Unlike Windows NT, Windows CE is a single-user operating system. Security is handled by assigning each process a trust level that is tracked and managed by the kernel. Common Windows NT primitives such as security descriptors (SDs) and access control lists (ACLs) do not exist.

The Windows CE kernel handles memory management as well as process and thread scheduling. These services are implemented within the nk.exe executable. Other kernel facilities such as the file, graphics, and services subsystems run within their own processes.

Memory Layout

Windows Mobile 6.x uses a unique “slot-based” memory architecture. When a Windows Mobile device boots, the kernel allocates a single 4GB virtual address space that will be shared by all processes. The upper 2GB of this virtual memory (VM) address space is assigned to the kernel. The lower 2GB region is divided into 64 slots, each 32MB large.

The lower 32 slots contain processes, with the exception of Slot 0 and Slot 1. Slot 0 always refers to the current running process, and Slot 1 contains eXecute-in-Place (XiP) dynamic link libraries (DLLs).

DLLs contain library code loaded at application runtime and referenced by multiple running applications. Sharing code with DLLs reduces the number of times that the same code is loaded into memory, reducing the overall memory load on the system. XiP DLLs are a special kind of DLL unique to Windows CE and Windows Mobile. Unlike normal DLLs, XiP DLLs exist in ROM locations and therefore are never loaded into RAM. All DLLs shipped in the ROM are XiP, which goes a long way toward reducing the amount of RAM used to store running program code. Above the process slots is the Large Memory Area (LMA). Memory-mapped file data is stored within the LMA and is accessible to all running processes.

Above the LMA is a series of reserved slots. Windows Mobile 6 has only one reserved slot, Slot 63. This slot is reserved for resource-only DLLs. Windows Mobile 6.1 added four additional slots: Slot 62 for shared heaps, Slots 60 and 61 for large DLLs, and Slot 59 for Device Manager stacks. These additional slots were added to address situations where the device would have actual physical RAM remaining but the process had exhausted its VM address space.

A process’s slot contains non-XiP DLLs, the application’s code, heap, static data, dynamically allocated data, and stack. In Windows Mobile 6.1, some of the non-XiP DLLs may be moved into Slots 60 and 61. Moving the non-XiP DLLs into these slots removes them from the current process’s memory slot, freeing that virtual memory to be used for application data. Even though DLLs are loaded into the same address space for all processes, the Virtual Memory Manager (VMM) prevents applications from modifying shared DLL code.

To maintain the reliability and security of the device, the VMM ensures that each process does not access memory outside of its assigned slot. If processes could read or write any memory on the system, a malicious process could gain access to sensitive information or modify a privileged process’s behavior. Isolating each application into its own slot also protects the system from buggy applications. If an application overwrites or leaks memory, causing a crash, only the faulting application will terminate and other applications will not be affected. The check is performed by verifying the slot number stored in the most significant byte of a memory address.

OEM and other privileged applications can use the SetProcPermissions, MapCallerPtr, and MapPtrProcess APIs to marshal pointers between processes (refer to http://msdn.microsoft.com/en-us/library/aa930910.aspx). These APIs are not exposed as part of the Windows Mobile SDK and are only used by device driver writers. To call these APIs, applications must be running at the Privileged level.

When Windows CE was first introduced, embedded devices with large amounts of RAM were very rare, and it was unlikely that a process could exhaust all of its VM space. Modern devices have much more memory, and the Windows CE 6.0 kernel abolishes the “slot” layout in favor of providing each process with a full 2GB VM address space. This new architecture is much more robust and closely resembles a traditional desktop OS architecture.

Windows CE Processes

A process in Windows CE is a single instance of a running application. The Windows CE 5.x kernel supports up to 32 processes running at any one time. Thirty-two is a bit of a misnomer because the kernel process (nk.exe) always occupies one slot, leaving 31 slots available for user processes. Each process is assigned a “slot” in the kernel’s process table. Some of the slots are always occupied by critical platform services such as the file system (filesys.exe), Device Manager (device.exe), and the Graphical Windowing Environment System (GWES.exe). The Windows CE 6.0 kernel expands this limit to 32,000. However, there are some restrictions that make this number more theoretical then practical. Additionally, the previously mentioned critical platform services have been moved into the kernel to improve performance.

Like in Windows NT, each process has at least one thread, and the process can create additional threads as required. The number of possible threads is limited by the system’s resources. These threads are responsible for carrying out the process’s work and are the primitives used by the kernel to schedule execution. In fact, after execution has begun, the process is really only a container used for display to the user and for referring to a group of threads. The kernel keeps a record of which threads belong to which processes. Some rootkit authors use this behavior to hide processes from Task Manager and other system tools.

Each thread is given its own priority and slice of execution time to run. When the kernel decides a thread has run for long enough, it will stop the thread’s execution, save the current execution context, and swap execution to a new thread. The scheduling algorithm is set up to ensure that every thread gets its fair share of processing time. Certain threads, such as those related to phone functionality, have a higher priority and are able to preempt other running threads when necessary. There are 256 possible priority levels; only applications in privileged mode are able to set the priority level manually.

Services

Some applications start automatically and are always running in the background. These processes, referred to as services, provide critical device functionality or background tasks. Service configuration information is stored within the registry, and services must implement the Service Control interface so that they may be controlled by the Service Configuration Manager (SCM).

Objects

Inside the kernel, system resources are abstracted out as objects. The kernel is responsible for tracking an object’s current state and determining whether or not to grant a process access. Here are some examples of objects:

  • Synchronization objects (Event, Waitable Timer, Mutex, and Semaphore)

  • File objects, including memory mapped files

  • Registry keys

  • Processes and threads

  • Point-to-point message queues

  • Communications devices

  • Sockets

  • Databases

Objects are generally created using the Create*() or Open*() API function (for example, the event-creation API, CreateEvent). The creation functions do not return the actual objects; instead, they return a Win32 handle. Handles are 32-bit values that are passed into the kernel and are used by the Kernel Object Manager (KOM) to look up the actual resource. This way, the kernel knows about and can manage all open references to system objects. Applications use the handle to indicate to the kernel which object they would like to interact with.

Applications can choose to name their objects when creating or referencing them. Unnamed objects are generally used only within a single process or must be manually marshaled over to another process by using an interprocess communication (IPC) mechanism. Named objects provide a much easier means for multiple processes to refer to the same object. This technique is commonly used for cross-process synchronization. Here’s an example:

  1. Process 1 creates an event named FileReadComplete.

  2. The kernel creates the event and returns a handle to Process 1.

  3. Process 2 calls CreateEvent with the same name (FileReadComplete).

  4. The kernel identifies that the event already exists and returns a handle to Process 2.

  5. Process 1 signals the event.

  6. Process 2 is able to see the event has been signaled and starts its portion of the work.

In Windows Mobile, each system object type exists within its own namespace. This is different from Windows NT, where all object types exist within the same namespace. By providing a unique namespace per object type, it is easier to avoid name collisions.

The KOM’s handle table is shared across all processes on the system, and two handles open to the same object in different processes will receive the same handle value. Because Win32 handles are simply 32-bit integers, malicious applications can avoid asking the kernel for the initial reference and simply guess handle values. After guessing a valid handle value, the attacker can close the handle, which may cause applications using the handle to crash. In fact, some poorly written applications do this accidently today by not properly initializing handle values.

To prevent a Normal-level process from affecting Privileged-level processes, the object modification APIs check the process’s trust level before performing the requested operation on the handle’s resource. These time-of-use checks stop low-privileged processes from inappropriately accessing privileged objects, but they do not stop malicious applications from fabricating handle values and passing those handles to higher privileged processes. If the attacker’s application changes the object referred to by the handle, the privileged application may perform a privileged action on an unexpected object. Unfortunately, this attack cannot be prevented with the shared handle table architecture (refer to http://msdn.microsoft.com/en-us/library/bb202793.aspx).

Windows CE 6.0 removes the shared handle table and implements a per-process handle table. Handle values are no longer valid across processes, and it is not possible to fabricate handles by guessing. Therefore, attackers cannot cause handle-based denial-of-service conditions or elevate privileges by guessing handle values and passing them to higher privilege processes.

Remember that Windows Mobile does not support assigning ACLs to individual objects. Therefore, any process, regardless of trust level, can open almost any object. To protect certain key system objects, Windows Mobile maintains a blacklist of objects that cannot be written to by unprivileged processes. This rudimentary system of object protection is decent, but much data is not protected.

Kernel Mode and User Mode

There are two primary modes of execution on Windows CE 5.x devices: user mode and kernel mode. The current execution mode is managed on a per-thread basis, and privileged threads can change their mode using the SetKMode() function. Once a thread is running in kernel mode, the thread can access kernel memory; this is memory in the kernel’s address space above 0x8000000. Accessing kernel memory is a highly privileged operation because the device’s security relies on having a portion of memory that not all processes can modify. Contained within this memory is information about the current state of the device, including security policy, file system, encryption keys, and hardware device information. Reading or modifying any of this memory completely compromises the security of the device.

The concept of kernel mode and user mode is not unique to Windows Mobile—most operating systems have a privileged execution mode. Normally the OS uses special processor instructions to enter and exit kernel mode. Windows Mobile differs because the actual processor execution mode never changes. Windows Mobile only changes the individual thread’s memory mapping to provide a view of all memory when the thread enters kernel mode.

Application threads most often run in user mode. User mode threads do not have total access to the device and must leverage kernel services to accomplish most of their work. For example, a user mode thread wishing to use the file system must send a request through the kernel (nk.exe) to the file system process (FileSys.exe).

The request is carried out by using Windows CE’s system call (syscall) mechanism. Each system API is assigned a unique number that will be used to redirect the system call to the portion of system code responsible for carrying out the work. In many cases, this code is actually implemented in a separate process. The entire syscall mechanism works as follows:

  1. An application thread calls a system API (for example, CreateFile).

  2. The CreateFile method is implemented as a Thunk, which loads the unique number that identifies this call to the system. This number is a 32-bit number representing an invalid memory address.

  3. The Thunk jumps to this invalid memory address, causing the processor to generate a memory access fault.

  4. The kernel’s memory access handler, known as the pre-fetch abort handler, catches this fault and recognizes the invalid address as an encoded system call identifier.

  5. The kernel uses this identifier to look up the corresponding code location in an internal table. The code location is the actual implementation of the system API. In the case of CreateFile, this code resides within the memory space of FileSys.exe.

  6. If the kernel wants to allow the call, it changes the thread’s mode to kernel mode using SetKMode().

  7. The kernel then sets the user mode thread’s instruction pointer to the system API’s code location.

  8. The system API carries out its work and returns. When the process returns, it returns to an invalid address supplied by the kernel. This generates another memory access fault, which is caught by the kernel.

  9. The kernel recognizes the fault as a return address fault and returns execution to the user mode thread. The kernel also sets the thread’s execution mode back to user mode.

Throughout this process, the application code is always running on the same thread. The kernel is performing some tricks to change which process that thread executes in. This includes mapping pointers between the source process and the target process. One potential security problem is that a process could pass pointers to memory that it doesn’t have access to. For example, a malicious program is running in the memory range 0x880CB0C4 and passes a pointer in the range 0xD6S7BE42. After the kernel has performed operations on the malicious pointer, the memory in Process 2 would be modified—a clear security issue. The kernel prevents this attack by checking the top byte of the memory address against the process’s memory slot to verify that the calling process actually has access to the memory.

The system call process is complex and incurs a severe performance penalty due to the time required to handle faults, perform lookups, map memory, and jump to the corresponding locations. If the thread is already executing in kernel mode, the fault-handling process can be bypassed and the thread can jump directly to the system call implementation. Some Windows CE 5.x drivers do this. The Windows CE 6.0 kernel is significantly faster because core services such as Devices.exe, GWES.exe, and FileSystem.exe have been moved into the kernel, reducing the number of syscalls.

Other  
  •  Windows Mobile Security - Introduction to the Platform
  •  iPhone Programming : Table-View-Based Applications - Building a Model
  •  Mobile Application Security : The Apple iPhone - Push Notifications, Copy/Paste, and Other IPC
  •  Mobile Application Security : The Apple iPhone - Networking
  •  Windows Phone 7 Development : Handling Device Exceptions
  •  Registering a Windows Phone Device for Debugging
  •  Programming the Mobile Web : WebKit CSS Extensions (part 5) - Transformations
  •  Programming the Mobile Web : WebKit CSS Extensions (part 4) - Animations
  •  Programming the Mobile Web : WebKit CSS Extensions (part 3) - Transitions
  •  Programming the Mobile Web : WebKit CSS Extensions (part 2) - Reflection Effects & Masked Images
  •  Programming the Mobile Web : WebKit CSS Extensions (part 1) - WebKit Functions & Gradients
  •  Windows Phone 7 Development : Debugging Application Exceptions (part 2) - Debugging a Web Service Exception
  •  Windows Phone 7 Development : Debugging Application Exceptions (part 1) - Debugging Page Load Exceptions
  •  Programming the Mobile Web : JavaScript Libraries
  •  Programming the Mobile Web : Ajax Support
  •  Windows Phone 7 Development : Building a Phone Client to Access a Cloud Service (part 5) - Deploying the Service to Windows Azure
  •  Windows Phone 7 Development : Building a Phone Client to Access a Cloud Service (part 4) - Coding NotepadViewModel
  •  Windows Phone 7 Development : Building a Phone Client to Access a Cloud Service (part 3) - Coding the BoolToVisibilityConvert
  •  Windows Phone 7 Development : Building a Phone Client to Access a Cloud Service (part 2) - Coding MainPage
  •  Windows Phone 7 Development : Building a Phone Client to Access a Cloud Service (part 1) - Building the User Interface
  •  
    Most View
    ASP.NET State Management Techniques : Understanding the Application/Session Distinction
    Kindle Fire HD - Most Advanced 7" Tablet (Part 1)
    Transact-SQL in SQL Server 2008 : Row Constructors
    Lenovo ThinkPad Edge S430 - Classic Business Laptop
    ASP.NET State Management Techniques : Maintaining Session Data
    G-360 And G-550 Power Supply Devices Review (Part 1)
    SQL Server 2008 : Index maintenance
    The Linux Build: Part For Penguins (Part 1)
    Three-resistance Xperia Go
    Home Security On A Budget (Part 2)
    Top 10
    Thermaltake Cases Are Suitable For Everyone’s Budget (Part 7)
    Thermaltake Cases Are Suitable For Everyone’s Budget (Part 6)
    Thermaltake Cases Are Suitable For Everyone’s Budget (Part 5) : Thermaltake Armor Revo
    Thermaltake Cases Are Suitable For Everyone’s Budget (Part 4) : Thermaltake Level 10 GTS
    Thermaltake Cases Are Suitable For Everyone’s Budget (Part 3) : Thermaltake Commander MS-III
    Thermaltake Cases Are Suitable For Everyone’s Budget (Part 2)
    Thermaltake Cases Are Suitable For Everyone’s Budget (Part 1) : Thermaltake Commander MS-I
    LG Optimus L9 - A Cheap Middle Class Android Phone (Part 3)
    LG Optimus L9 - A Cheap Middle Class Android Phone (Part 2)
    LG Optimus L9 - A Cheap Middle Class Android Phone (Part 1)