Windows
Mobile 6 uses the Windows CE 5.2 kernel. The 5.1 version of this kernel
served as the basis for Windows Mobile 5. Because the behaviors of the
5.1 and 5.2 CE kernels are so similar, they are referred to collectively
here as Windows CE 5.x.
Windows
Mobile 6.1 confused the situation further by tweaking memory management
but not updating the actual CE kernel version. Notable differences
between the Windows CE 5.x and Windows CE 6.0 kernels will be called out
when appropriate. Remember that the CE 6.x kernels are not used in any
currently available Windows Mobile devices.
Windows CE was
developed specifically for embedded applications and is not based on the
Windows NT kernel used in Microsoft’s desktop and server operating
systems. Even though the two platforms are different, those familiar
with the Windows NT kernel will recognize many of the primitives and
concepts used by the Windows CE kernel. The devil remains in the
details, and those familiar with how Windows NT manages security are
recommended to forget that information promptly. The same security model
does not apply in the mobile world.
The Windows CE 5.x kernel is a
fully preemptive and multithreaded kernel. Unlike Windows NT, Windows CE
is a single-user operating system. Security is handled by assigning
each process a trust level that is tracked and managed by the kernel.
Common Windows NT primitives such as security descriptors (SDs) and
access control lists (ACLs) do not exist.
The Windows CE kernel
handles memory management as well as process and thread scheduling.
These services are implemented within the nk.exe executable. Other
kernel facilities such as the file, graphics, and services subsystems
run within their own processes.
Memory Layout
Windows Mobile 6.x uses a
unique “slot-based” memory architecture. When a Windows Mobile device
boots, the kernel allocates a single 4GB virtual address space that will
be shared by all processes. The upper 2GB of this virtual memory (VM)
address space is assigned to the kernel. The lower 2GB region is divided
into 64 slots of 32MB each.
The lower 32 slots contain
processes, with the exception of Slot 0 and Slot 1. Slot 0 always
refers to the current running process, and Slot 1 contains
eXecute-in-Place (XiP) dynamic link libraries (DLLs).
DLLs contain library code
loaded at application runtime and referenced by multiple running
applications. Sharing code with DLLs reduces the number of times that
the same code is loaded into memory, reducing the overall memory load on
the system. XiP DLLs are a special kind of DLL unique to Windows CE and
Windows Mobile. Unlike normal DLLs, XiP DLLs exist in ROM locations and
therefore are never loaded into RAM. All DLLs shipped in the ROM are
XiP, which goes a long way toward reducing the amount of RAM used to
store running program code.
Above the process slots is the Large Memory Area (LMA). Memory-mapped
file data is stored within the LMA and is accessible to all running
processes.
Above
the LMA is a series of reserved slots. Windows Mobile 6 has only one
reserved slot, Slot 63. This slot is reserved for resource-only DLLs.
Windows Mobile 6.1 added four additional slots: Slot 62 for shared
heaps, Slots 60 and 61 for large DLLs, and Slot 59 for Device Manager
stacks. These additional slots were added to address situations where
the device would have actual physical RAM remaining but the process had
exhausted its VM address space.
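The slot arithmetic described above can be sketched in a few lines of Python. This is an illustrative model only, not kernel code: the names `slot_base` and `describe_slot` are hypothetical, and the boundaries of the Large Memory Area are inferred from the description (everything between the 32 process slots and the reserved slots).

```python
SLOT_SIZE = 32 * 1024 * 1024           # each slot is 32MB
SLOT_COUNT = 64                        # slots covering the lower 2GB
KERNEL_BASE = SLOT_SIZE * SLOT_COUNT   # start of the kernel's upper 2GB

# Reserved slots as listed in the text for each release.
RESERVED_WM60 = {63: "resource-only DLLs"}
RESERVED_WM61 = {
    63: "resource-only DLLs",
    62: "shared heaps",
    61: "large DLLs",
    60: "large DLLs",
    59: "Device Manager stacks",
}


def slot_base(slot):
    """Return the first virtual address covered by the given slot."""
    if not 0 <= slot < SLOT_COUNT:
        raise ValueError("slot out of range")
    return slot * SLOT_SIZE


def describe_slot(slot, reserved=RESERVED_WM61):
    """Classify a slot number according to the layout described in the text."""
    if not 0 <= slot < SLOT_COUNT:
        raise ValueError("slot out of range")
    if slot == 0:
        return "current process"
    if slot == 1:
        return "XiP DLLs"
    if slot < 32:
        return "process"
    if slot in reserved:
        return reserved[slot]
    # Assumption: whatever lies between the process slots and the
    # reserved slots belongs to the Large Memory Area.
    return "Large Memory Area"
```

Under this model, `describe_slot(60)` reports a large-DLL slot for Windows Mobile 6.1, while `describe_slot(60, RESERVED_WM60)` still falls inside the Large Memory Area, which reflects the extra slots that 6.1 carved out.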
A process’s slot contains
non-XiP DLLs, the application’s code, heap, static data, dynamically
allocated data, and stack. In Windows Mobile 6.1, some of the non-XiP
DLLs may be moved into Slots 60 and 61. Moving the non-XiP DLLs into
these slots removes them from the current process’s memory slot, freeing
that virtual memory to be used for application data. Even though DLLs
are loaded into the same address space for all processes, the Virtual
Memory Manager (VMM) prevents applications from modifying shared DLL
code.
To maintain the reliability
and security of the device, the VMM ensures that each process does not
access memory outside of its assigned slot. If processes could read or
write any memory on the system, a malicious process could gain access to
sensitive information or modify a privileged process’s behavior.
Isolating each application into its own slot also protects the system
from buggy applications. If an application overwrites or leaks memory,
causing a crash, only the faulting application will terminate and other
applications will not be affected. The check is performed by verifying
the slot number stored in the most significant byte of a memory address.
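A minimal sketch of such a slot check follows. It assumes the slot index is simply the address divided by 32MB (the top seven bits of the address, which sit inside the most significant byte), and it deliberately ignores the shared regions such as Slot 1 and the Large Memory Area. The function names are hypothetical and do not correspond to real kernel routines.

```python
SLOT_SHIFT = 25            # 32MB slots are 2**25 bytes each
KERNEL_BASE = 0x80000000   # the upper 2GB belongs to the kernel


def slot_of(address):
    """Return the slot index encoded in the top bits of a user address."""
    if not 0 <= address < KERNEL_BASE:
        raise ValueError("address is outside the slot region")
    return address >> SLOT_SHIFT


def may_access(process_slot, address):
    """Allow access only to Slot 0 (the current process) or the caller's own slot.

    Simplification: shared regions are not modeled here.
    """
    if not 0 <= address < KERNEL_BASE:
        return False
    return slot_of(address) in (0, process_slot)
```

For a process living in Slot 5, an address such as 0x0A001000 passes the check, while 0x0C000000 (Slot 6) and any kernel address are rejected.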
OEM and other
privileged applications can use the SetProcPermissions, MapCallerPtr,
and MapPtrProcess APIs to marshal pointers between processes (refer to http://msdn.microsoft.com/en-us/library/aa930910.aspx).
These APIs are not exposed as part of the Windows Mobile SDK and are
only used by device driver writers. To call these APIs, applications
must be running at the Privileged level.
When Windows CE was
first introduced, embedded devices with large amounts of RAM were very
rare, and it was unlikely that a process could exhaust all of its VM
space. Modern devices have much more memory, and the Windows CE 6.0
kernel abolishes the “slot” layout in favor of providing each process
with a full 2GB VM address space. This new architecture is much more
robust and closely resembles a traditional desktop OS architecture.
Windows CE Processes
A process in Windows CE is a
single instance of a running application. The Windows CE 5.x kernel
supports up to 32 processes running at any one time. The figure of 32 is
slightly misleading because the kernel process (nk.exe) always occupies one
slot, leaving 31 slots available for user processes. Each process is
assigned a “slot” in the kernel’s process
table. Some of the slots are always occupied by critical platform
services such as the file system (filesys.exe), Device Manager
(device.exe), and the Graphical Windowing Environment System (GWES.exe).
The Windows CE 6.0 kernel expands this limit to 32,000. However, there
are some restrictions that make this number more theoretical than
practical. Additionally, the previously mentioned critical platform
services have been moved into the kernel to improve performance.
As in Windows NT, each
process has at least one thread, and the process can create additional
threads as required. The number of possible threads is limited by the
system’s resources. These threads are responsible for carrying out the
process’s work and are the primitives used by the kernel to schedule
execution. In fact, after execution has begun, the process is really
only a container used for display to the user and for referring to a
group of threads. The kernel keeps a record of which threads belong to
which processes. Some rootkit authors use this behavior to hide
processes from Task Manager and other system tools.
Each thread is given its own
priority and slice of execution time to run. When the kernel decides a
thread has run for long enough, it will stop the thread’s execution,
save the current execution context, and swap execution to a new thread.
The scheduling algorithm is set up to ensure that every thread gets its
fair share of processing time. Certain threads, such as those related to
phone functionality, have a higher priority and are able to preempt
other running threads when necessary. There are 256 possible priority
levels; only applications in privileged mode are able to set the
priority level manually.
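The scheduling behavior can be illustrated with a toy model. The sketch below is not the Windows CE scheduler: it assumes that priority 0 is the highest of the 256 levels (an assumption not stated in the text) and that threads at the same level take turns in round-robin order. The `Scheduler` class and its methods are hypothetical.

```python
from collections import deque

PRIORITY_LEVELS = 256  # per the text; 0 is treated as the highest priority here


class Scheduler:
    """Toy priority scheduler with round-robin within each priority level."""

    def __init__(self):
        # One ready queue per priority level.
        self.ready = [deque() for _ in range(PRIORITY_LEVELS)]

    def add(self, name, priority):
        """Make a thread runnable at the given priority."""
        if not 0 <= priority < PRIORITY_LEVELS:
            raise ValueError("priority out of range")
        self.ready[priority].append(name)

    def block(self, name):
        """Remove a thread from the ready queues (for example, while it waits)."""
        for queue in self.ready:
            if name in queue:
                queue.remove(name)
                return

    def next_thread(self):
        """Pick the highest-priority runnable thread and rotate its queue."""
        for queue in self.ready:
            if queue:
                thread = queue.popleft()
                queue.append(thread)  # back of the line for its next slice
                return thread
        return None
```

In this model a runnable high-priority thread (such as a hypothetical phone thread at level 100) is always chosen ahead of two application threads at level 251; once it blocks, the two application threads alternate time slices.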
Services
Some applications start automatically and are always running in the background. These processes, referred to as services,
provide critical device functionality or background tasks. Service
configuration information is stored within the registry, and services
must implement the Service Control interface so that they may be
controlled by the Service Configuration Manager (SCM).
Objects
Inside the kernel, system
resources are abstracted out as objects. The kernel is responsible for
tracking an object’s current state and determining whether or not to
grant a process access. Here are some examples of objects:
Synchronization objects (Event, Waitable Timer, Mutex, and Semaphore)
File objects, including memory mapped files
Registry keys
Processes and threads
Point-to-point message queues
Communications devices
Sockets
Databases
Objects are generally created
using the Create*() or Open*() API functions (for example, the
event-creation API, CreateEvent). The creation functions do not return
the actual objects; instead, they return a Win32 handle. Handles are
32-bit values that are passed into the kernel and are used by the Kernel
Object Manager (KOM) to look up the actual resource. This way, the
kernel knows about and can manage all open references to system objects.
Applications use the handle to indicate to the kernel which object they
would like to interact with.
Applications can choose to
name their objects when creating or referencing them. Unnamed objects
are generally used only within a single process or must be manually
marshaled over to another process by using an interprocess communication
(IPC) mechanism. Named objects provide a much easier means for multiple
processes to refer to the same object. This technique is commonly used
for cross-process synchronization. Here’s an example:
Process 1 creates an event named FileReadComplete.
The kernel creates the event and returns a handle to Process 1.
Process 2 calls CreateEvent with the same name (FileReadComplete).
The kernel identifies that the event already exists and returns a handle to Process 2.
Process 1 signals the event.
Process 2 is able to see the event has been signaled and starts its portion of the work.
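The sequence above can be modeled with a small simulation. This is a sketch of create-or-open semantics only; `EventNamespace` and `Event` are hypothetical stand-ins for the kernel's bookkeeping, not Windows CE APIs, and real code would work with handles and wait functions rather than direct object references.

```python
class Event:
    """Toy event object with a single signaled flag."""

    def __init__(self, name):
        self.name = name
        self.signaled = False


class EventNamespace:
    """Toy stand-in for the kernel's table of named events."""

    def __init__(self):
        self._by_name = {}

    def create_event(self, name):
        """Return (event, already_existed): create on first use, open afterward."""
        existed = name in self._by_name
        if not existed:
            self._by_name[name] = Event(name)
        return self._by_name[name], existed


kernel = EventNamespace()
event_p1, existed_p1 = kernel.create_event("FileReadComplete")  # Process 1 creates
event_p2, existed_p2 = kernel.create_event("FileReadComplete")  # Process 2 opens
event_p1.signaled = True                                        # Process 1 signals
```

Because both calls resolve to the same underlying object, the signal set through Process 1's reference is visible through Process 2's reference.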
In Windows Mobile, each
system object type exists within its own namespace. This is different
from Windows NT, where all object types exist within the same namespace.
By providing a unique namespace per object type, it is easier to avoid
name collisions.
The KOM’s handle table is
shared across all processes on the system, and two handles open to the
same object in different processes will receive the same handle value.
Because Win32 handles are simply 32-bit integers, malicious applications
can avoid asking the kernel for the initial reference and simply guess handle
values. After guessing a valid handle value, the attacker can close the
handle, which may cause applications using the handle to crash. In fact,
some poorly written applications do this accidentally today by not
properly initializing handle values.
To prevent a Normal-level
process from affecting Privileged-level processes, the object
modification APIs check the process’s trust level before performing the
requested operation on the handle’s resource. These time-of-use checks
stop low-privileged processes from inappropriately accessing privileged
objects, but they do not stop malicious applications from fabricating
handle values and passing those handles to higher privileged processes.
If the attacker’s application changes the object referred to by the
handle, the privileged application may perform a privileged action on an
unexpected object. Unfortunately, this attack cannot be prevented with
the shared handle table architecture (refer to http://msdn.microsoft.com/en-us/library/bb202793.aspx).
Windows CE 6.0 removes
the shared handle table and implements a per-process handle table.
Handle values are no longer valid across processes, and it is not
possible to fabricate handles by guessing. Therefore, attackers cannot
cause handle-based denial-of-service conditions or elevate privileges by
guessing handle values and passing them to higher privilege processes.
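The difference between the two designs can be sketched with a pair of toy handle tables. Both classes are hypothetical models written for illustration; the handle values and allocation scheme are assumptions and do not reflect how either kernel actually numbers its handles.

```python
class SharedHandleTable:
    """CE 5.x-style model: one table, handle values valid in every process."""

    def __init__(self):
        self._objects = {}
        self._next = 4

    def open(self, pid, obj):
        handle = self._next
        self._next += 4
        self._objects[handle] = obj
        return handle

    def close(self, pid, handle):
        # No ownership check: any process can close any valid handle value.
        return self._objects.pop(handle, None) is not None


class PerProcessHandleTables:
    """CE 6.0-style model: a handle value only has meaning inside its owner."""

    def __init__(self):
        self._tables = {}

    def open(self, pid, obj):
        table = self._tables.setdefault(pid, {})
        handle = max(table, default=0) + 4
        table[handle] = obj
        return handle

    def close(self, pid, handle):
        # The lookup is scoped to the calling process's own table.
        return self._tables.get(pid, {}).pop(handle, None) is not None
```

With the shared table, a second process that guesses the value returned to the first process can close the object out from under it. With per-process tables, the same guessed value simply fails to resolve in the attacker's own table.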
Remember that Windows Mobile
does not support assigning ACLs to individual objects. Therefore, any
process, regardless of trust level, can open almost any object. To
protect certain key system objects, Windows Mobile maintains a blacklist
of objects that cannot be written to by unprivileged processes. This
rudimentary system of object protection is decent, but much data is not
protected.
Kernel Mode and User Mode
There are two primary modes
of execution on Windows CE 5.x devices: user mode and kernel mode. The
current execution mode is managed on a per-thread basis, and privileged
threads can change their mode using the SetKMode() function. Once a
thread is running in kernel mode, the thread can access kernel memory;
this is memory in the kernel’s address space above 0x80000000. Accessing
kernel memory is a highly privileged operation because the device’s
security relies on having a portion of memory that not all processes can
modify. Contained within this memory is information about the current
state of the device, including security policy, file system, encryption
keys, and hardware device information. Reading or modifying any of this
memory completely compromises the security of the device.
The
concept of kernel mode and user mode is not unique to Windows
Mobile—most operating systems have a privileged execution mode. Normally
the OS uses special processor instructions to enter and exit kernel
mode. Windows Mobile differs because the actual processor execution mode
never changes. Windows Mobile only changes the individual thread’s
memory mapping to provide a view of all memory when the thread enters
kernel mode.
Application threads most
often run in user mode. User mode threads do not have total access to
the device and must leverage kernel services to accomplish most of their
work. For example, a user mode thread wishing to use the file system
must send a request through the kernel (nk.exe) to the file system
process (FileSys.exe).
The request is carried out by
using Windows CE’s system call (syscall) mechanism. Each system API is
assigned a unique number that will be used to redirect the system call
to the portion of system code responsible for carrying out the work. In
many cases, this code is actually implemented in a separate process. The
entire syscall mechanism works as follows:
An application thread calls a system API (for example, CreateFile).
The CreateFile method is implemented as a Thunk, which loads the unique number that identifies this call to the system. This number is a 32-bit number representing an invalid memory address.
The Thunk jumps to this invalid memory address, causing the processor to generate a memory access fault.
The kernel’s memory access handler, known as the pre-fetch abort handler, catches this fault and recognizes the invalid address as an encoded system call identifier.
The kernel uses this identifier to look up the corresponding code location in an internal table. The code location is the actual implementation of the system API. In the case of CreateFile, this code resides within the memory space of FileSys.exe.
If the kernel wants to allow the call, it changes the thread’s mode to kernel mode using SetKMode().
The kernel then sets the user mode thread’s instruction pointer to the system API’s code location.
The system API carries out its work and returns. When the process returns, it returns to an invalid address supplied by the kernel. This generates another memory access fault, which is caught by the kernel.
The kernel recognizes the fault as a return address fault and returns execution to the user mode thread. The kernel also sets the thread’s execution mode back to user mode.
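The fault-driven dispatch in the steps above can be modeled in a few lines. This is a simulation of the idea only: `TRAP_BASE`, the API number 1, and every function name here are hypothetical, and the mode switch and the return-address fault are left out for brevity.

```python
TRAP_BASE = 0xF0000000  # hypothetical base of an unmapped address range


class AccessFault(Exception):
    """Raised when the modeled processor touches an unmapped address."""

    def __init__(self, address):
        super().__init__(hex(address))
        self.address = address


def create_file_impl(path):
    """Stand-in for the real implementation living in the file system process."""
    return "handle-for-" + path


# API number -> implementation (the numbers are made up for this sketch).
API_TABLE = {1: create_file_impl}


def jump(address):
    """Model the processor: jumping to an unmapped address raises a fault."""
    raise AccessFault(address)


def create_file_thunk(path):
    """User-mode stub: encode the API number as an invalid address and jump."""
    jump(TRAP_BASE + 1)


def run_in_user_mode(thunk, *args):
    """Model the kernel's abort handler wrapped around a user-mode call."""
    try:
        return thunk(*args)
    except AccessFault as fault:
        api_number = fault.address - TRAP_BASE
        impl = API_TABLE.get(api_number)
        if impl is None:
            raise  # a genuine fault, not an encoded system call
        return impl(*args)
```

Calling `run_in_user_mode(create_file_thunk, "log.txt")` ends up in `create_file_impl`, while a jump to an address that does not decode to a known API number is re-raised as an ordinary fault.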
Throughout
this process, the application code is always running on the same
thread. The kernel is performing some tricks to change which process
that thread executes in. This includes mapping pointers between the
source process and the target process. One potential security problem is
that a process could pass pointers to memory that it doesn’t have
access to. For example, a malicious program running in the memory
range 0x880CB0C4 passes a pointer in the range 0xD657BE42, which belongs
to a second process. After the kernel has performed operations on the
malicious pointer, the memory in that second process would be modified,
a clear security issue. The kernel prevents
this attack by checking the top byte of the memory address against the
process’s memory slot to verify that the calling process actually has
access to the memory.
The system call process is
complex and incurs a severe performance penalty due to the time required
to handle faults, perform lookups, map memory, and jump to the
corresponding locations. If the thread is already executing in kernel
mode, the fault-handling process can be bypassed and the thread can jump
directly to the system call implementation. Some Windows CE 5.x drivers
do this. The Windows CE 6.0 kernel is significantly faster because core
services such as device.exe, GWES.exe, and filesys.exe have been
moved into the kernel, reducing the number of syscalls.