Resource partitioning has been integrated with the HP-UX kernel since
HP-UX 9.0. Over the years, HP has gradually increased the
functionality; today you can provide a remarkable level of isolation
between applications running in a single copy of HP-UX. The current
version provides both resource isolation, which has been there
from the beginning of resource partitions, and security isolation, the
newest addition. Figure 1
shows how the resource isolation capabilities allow multiple
applications to run in a single copy of HP-UX while ensuring that each
partition gets its share of resources.
Within a single copy of HP-UX, you can create multiple partitions. For each partition you can:
Allocate a CPU entitlement using whole-CPU granularity (processor sets) or sub-CPU granularity (fair share scheduler)
Allocate a block of memory
Allocate disk I/O bandwidth
Assign a set of users and/or application processes that should run in the partition
Create a security compartment around the processes, which ensures that
processes in other compartments cannot communicate with or send signals
to the processes in this Secure Resource Partition
One unique feature of HP's
implementation of resource partitions is that inside the HP-UX kernel,
we instantiate multiple copies of the memory management subsystem and
multiple process schedulers. This ensures that if an application runs
out of control and attempts to allocate excessive amounts of resources,
the system will constrain that application. For example, when we
allocate four CPUs and 8GB of memory to Partition 0 in Figure 2-18,
if the application running in that partition attempts to allocate more
than 8GB of memory, it will start to page, even if there is 32GB of
memory on the system. Similarly, the processes running in that partition
are scheduled on the four CPUs that are assigned to the partition. No
processes from other partitions are allowed to run on those CPUs, and
processes assigned to this partition are not allowed to run on the CPUs
that are assigned to the other partitions. This guarantees that if a
process running in any partition spins out of control, it can't impact
the performance of any application running in any other partition.
A new feature of HP-UX is
security containment. This is essentially the migration of functionality
that has been available in HP VirtualVault for many years into the
standard HP-UX kernel. It is being done in a way that allows customers
to activate each of the security features individually.
The security-containment feature allows users to ensure that processes
and applications running on HP-UX can be isolated from other processes
and applications. Specifically, it is possible to erect a boundary
around a group of processes that insulates those processes from
interprocess communication (IPC) with the rest of the processes on the
system. It is also possible to define access to file systems and network
interfaces. This feature is being integrated with Process Resource
Manager (PRM) to provide Secure Resource Partitions.
Resource Controls
The resource controls available with Secure Resource Partitions include:
CPU controls:
You can allocate CPU resources to a partition with sub-CPU granularity
using the fair share scheduler (FSS) or with whole-CPU granularity using
processor sets (PSETs).
Real memory:
Shares of the physical memory on the system can be allocated to partitions.
Disk I/O bandwidth:
Shares of the bandwidth to any volume group can be allocated to each partition.
More details about what is possible and how these features are implemented are provided below.
CPU Controls
CPU resources can be
allocated to Secure Resource Partitions with sub-CPU granularity or
whole-CPU granularity. Both of these features are implemented inside the
kernel. The sub-CPU granularity capability is implemented by the fair
share scheduler (FSS).
The fair share scheduler is
implemented as a second level of time-sharing on top of the standard
HP-UX scheduler. The FSS allocates a CPU to each partition in large 10ms
time ticks. When a particular partition gets access to a CPU, the
process scheduler for that partition analyzes the process run queue for
that partition and runs those processes using standard HP-UX
process-scheduling algorithms.
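The two-level idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not HP's algorithm: it uses stride-style selection (an assumption of this sketch) to hand out fixed-size ticks in proportion to each partition's shares, and the function name `fss_schedule` and the partition labels are hypothetical. In the FSS as described above, the scheduler for the chosen partition would then pick a process from that partition's run queue for each tick.

```python
from collections import Counter
from fractions import Fraction


def fss_schedule(shares, ticks):
    """Return which partition owns the CPU for each tick.

    shares: dict mapping partition name -> positive integer share count.
    Stride-style selection (an assumption of this sketch): the partition
    with the lowest accumulated "pass" value gets the next tick.
    """
    stride = {name: Fraction(1, s) for name, s in shares.items()}
    passes = dict(stride)
    order = []
    for _ in range(ticks):
        # Ties are broken by name so the result is deterministic.
        name = min(passes, key=lambda n: (passes[n], n))
        order.append(name)
        passes[name] += stride[name]
    return order


if __name__ == "__main__":
    order = fss_schedule({"A": 50, "B": 25, "C": 25}, ticks=100)
    print(Counter(order))
```

Over any long run of ticks, each partition receives CPU time in proportion to its shares, while the order of ticks within that run stays interleaved rather than batched.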
CPU allocation via
processor sets (PSETs) is quite different in that CPU resources are
allocated to each of the partitions on whole CPU boundaries. What this
means is that you assign a certain number of whole CPUs to each
partition rather than a share of them. The scheduler in the partition
will then schedule the processes that are running there only on the CPUs
assigned to the partition. This is illustrated in Figure 2.
Figure 2-19
shows the system split into three partitions. Two run Oracle
instances and the third runs the rest of the processing on the
system. This means that the Oracle processes running in partition 1
will run on the two CPUs assigned to that partition. These processes
will not run on any other CPUs in the system, nor will any processes
from the other partitions run on these two CPUs.
Comparing FSS to
PSETs is best done using an example. If you have an eight-CPU partition
that you wish to assign to three workloads with 50% going to one
workload and 25% going to each of the others, you have the option of
setting up PSETs with the configuration illustrated in Figure 2-19
or setting up FSS groups with 50, 25, and 25 shares. The difference
between the two is that the processes running in partition 1 will either
get 100% of the CPU cycles on two CPUs or 25% of the cycles on all
eight CPUs.
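The arithmetic behind this comparison can be written out as a small sketch. The function and workload names below are hypothetical; the numbers are the ones from the example above (eight CPUs, shares of 50, 25, and 25).

```python
def fss_cpu_equivalents(total_cpus, shares):
    """CPU capacity each FSS group is entitled to when all groups are busy."""
    total_shares = sum(shares.values())
    return {g: total_cpus * s / total_shares for g, s in shares.items()}


def pset_sizes(total_cpus, shares):
    """Whole-CPU PSET sizes with the same ratios, if they divide evenly."""
    total_shares = sum(shares.values())
    sizes = {}
    for group, s in shares.items():
        cpus, remainder = divmod(total_cpus * s, total_shares)
        if remainder:
            # PSETs work on whole-CPU boundaries, so this ratio cannot be met.
            raise ValueError(f"{group}: {s} shares is not a whole number of CPUs")
        sizes[group] = cpus
    return sizes


if __name__ == "__main__":
    shares = {"workload1": 50, "workload2": 25, "workload3": 25}
    print(fss_cpu_equivalents(8, shares))
    print(pset_sizes(8, shares))
```

Both approaches give a 25-share workload the equivalent of two CPUs; the difference is whether that capacity is two dedicated CPUs or a quarter of the time on all eight.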
Memory Controls
In Figure 2-19,
we see that each of the partitions in this configuration also has a
block of memory assigned. This is optional, but it provides another
level of isolation between the partitions. HP-UX 11i introduced a new
memory-control technology called memory resource groups, or MRGs. This
is implemented by providing a separate memory manager for each
partition, all running in a single copy of the kernel. This provides a
very strong level of isolation between the partitions. As an example, if
PSET partition 1 above were allocated two CPUs and 4GB of memory, the
memory manager for partition 1 would manage the memory allocated by the
processes in that partition within the 4GB that was assigned. If those
processes attempt to allocate more than 4GB, the memory manager will
start to page out memory to make room, even though there may be 16GB of
memory available on the system.
The default behavior is
to allow unused memory to be shared between the partitions. In other
words, if the application in partition 1 is only using 2GB of its 4GB
entitlement, then processes in the other partitions can “borrow” the
available 2GB. However, as soon as processes in partition 1 start to
allocate additional memory, the memory that was loaned out will be
retrieved. There is an option on MRGs that allows you to “isolate” the
memory in a partition. What that means is that the 4GB assigned to the
partition will not be loaned out and the partition will not be allowed
to borrow memory from any of the other partitions either.
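The borrowing and isolation behavior can be modeled roughly as follows. This is a simplified sketch, not the MRG implementation: the function name `memory_limit_gb`, the partition names, and the idea of computing a single "limit before paging" are assumptions made for illustration.

```python
def memory_limit_gb(group, entitlement, in_use, isolated=frozenset()):
    """Memory (GB) `group` can use before it has to start paging.

    entitlement: dict partition -> GB assigned
    in_use:      dict partition -> GB currently used
    isolated:    partitions that neither lend nor borrow memory
    """
    own = entitlement[group]
    if group in isolated:
        return own
    # Unused memory of the other, non-isolated partitions can be borrowed.
    borrowable = sum(
        max(entitlement[g] - in_use[g], 0)
        for g in entitlement
        if g != group and g not in isolated
    )
    return own + borrowable


if __name__ == "__main__":
    entitlement = {"p1": 4, "p2": 4, "p3": 8}
    in_use = {"p1": 2, "p2": 4, "p3": 8}
    print(memory_limit_gb("p2", entitlement, in_use))
    print(memory_limit_gb("p2", entitlement, in_use, isolated={"p1"}))
```

In this model, partition 2 can grow to 6GB while partition 1 leaves 2GB idle, but it is held to its own 4GB once partition 1 is isolated. The sketch does not model the retrieval of loaned memory when the lender needs it back.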
Disk I/O Controls
HP-UX supports disk I/O
bandwidth controls for both LVM and VxVM volume groups. You set this up
by assigning each partition a share of the bandwidth to each volume
group. LVM and VxVM each call a routine provided by PRM that
reshuffles the I/O queues to ensure that the bandwidth to the volume
group is allocated in the ratios assigned. For example, if partition 1
has 50% of the bandwidth, the queue will be shuffled to ensure that
every other I/O request comes from processes in that partition.
One thing to note here is
that because this is implemented by shuffling the queue, the controls
are active only when a queue is building, which happens when there is
contention for I/O. This is probably what you want. It normally doesn't
make sense to constrain the bandwidth available to one application when
that bandwidth would go to waste if you did.
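A queue shuffle of this kind can be sketched as below. This is an illustrative model only, not the PRM routine: the function name `shuffle_queue`, the request labels, and the "lowest served-to-share ratio goes next" rule are assumptions. It does show the property described above: the ordering only changes anything when requests from several partitions are queued at once.

```python
from collections import deque
from fractions import Fraction


def shuffle_queue(pending, shares):
    """Reorder queued I/O requests so dispatch follows the share ratios.

    pending: list of (partition, request) in arrival order
    shares:  dict partition -> positive integer share of the bandwidth
    Order within each partition is preserved. If only one partition has
    requests queued, they simply go out in arrival order.
    """
    queues = {}
    for partition, request in pending:
        queues.setdefault(partition, deque()).append(request)
    served = {p: 0 for p in queues}
    out = []
    while any(queues.values()):
        candidates = [p for p, q in queues.items() if q]
        # Pick the partition that has received the least relative to its share.
        p = min(candidates, key=lambda c: (Fraction(served[c], shares[c]), c))
        out.append((p, queues[p].popleft()))
        served[p] += 1
    return out


if __name__ == "__main__":
    pending = [("p2", "a"), ("p2", "b"), ("p3", "c"), ("p1", "d"),
               ("p1", "e"), ("p3", "f"), ("p1", "g"), ("p1", "h")]
    for item in shuffle_queue(pending, {"p1": 50, "p2": 25, "p3": 25}):
        print(item)
```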
Security Controls
The newest feature
added to resource partitions is security containment. With the
introduction of security containment in HP-UX 11i V2, we have integrated
some of this functionality with resource partitions to create Secure
Resource Partitions. There are three major features of the security
containment product:
Compartments
Fine-grained privileges
Role-based access control (RBAC)
These features have
been available in secure versions of HP-UX and Linux but have now been
integrated into the base HP-UX in a way that allows them to be
optionally activated. Let's look at each of these in detail.
Compartments
The purpose of compartments
is to allow you to provide control of the interprocess communication
(IPC), device, and file accesses from a group of processes. This is
illustrated in Figure 3.
The processes in
each compartment can freely communicate with each other and can freely
access files and directories assigned to the partition, but no access to
processes or files in other compartments is permitted unless a rule has
been defined that allows that specific access. Additionally, the
network interfaces, including pseudo-interfaces, are assigned to a
compartment. Communication over the network is restricted to the
interfaces in the local compartment unless a rule is defined that allows
access to an interface in another compartment.
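The default-deny behavior of compartments can be modeled with a small sketch. The class `CompartmentPolicy`, its methods, and the compartment names are hypothetical and do not reflect the actual HP-UX rule syntax; the sketch only covers IPC, not file or network-interface rules.

```python
class CompartmentPolicy:
    """Toy model: IPC is allowed inside a compartment, or by explicit rule."""

    def __init__(self):
        self._compartment_of = {}  # process id -> compartment name
        self._ipc_rules = set()    # (source compartment, target compartment)

    def assign(self, pid, compartment):
        self._compartment_of[pid] = compartment

    def allow_ipc(self, source, target):
        """Add a rule permitting IPC from `source` to `target`."""
        self._ipc_rules.add((source, target))

    def ipc_allowed(self, src_pid, dst_pid):
        src = self._compartment_of[src_pid]
        dst = self._compartment_of[dst_pid]
        return src == dst or (src, dst) in self._ipc_rules


if __name__ == "__main__":
    policy = CompartmentPolicy()
    policy.assign(100, "web")
    policy.assign(101, "web")
    policy.assign(200, "db")
    print(policy.ipc_allowed(100, 101))  # same compartment
    print(policy.ipc_allowed(100, 200))  # no rule defined yet
    policy.allow_ipc("web", "db")
    print(policy.ipc_allowed(100, 200))  # rule now permits it
```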
Fine-Grained Privileges
Traditional HP-UX
provided very basic control of special privileges, such as overriding
permission to access files. Generally speaking, the root user had all
privileges and other users had none. With the introduction of security
containment, the privileges can now be assigned at a very granular
level. There are roughly 30 separate privileges that you can assign.
The combination of
these fine-grained privileges and the role-based access control we
discuss in the next section allows you to assign specific privileges to
specific users when running specific commands. This provides the ability
to implement very detailed security policies. Keep in mind, though,
that the more security you wish to impose, the more time will be spent
getting the configuration set up and tested.
Role-Based Access Controls (RBAC)
In many very
secure environments, customers require the ability to cripple or remove
the root user from the system. This ensures that if there is a
successful break-in and an intruder gains root access, the intruder
can do little or no damage. To provide this, HP has
implemented role-based access control in the kernel. This is integrated
with the fine-grained privileges so that it is possible to define, for
example, a “user admin” role as someone who can create directories
under /home and edit the /etc/passwd file. You can then assign the
“user admin” role to one or more of your system administrators, and they
will be able to create and modify user accounts only, without having to
know the root password.
This is implemented by
defining a set of authorizations and a set of roles that have those
authorizations against a specific set of objects. Another example would
be giving a printer admin authorization to start or stop a particular
print queue.
Implementing these
using roles makes it much easier to maintain the controls over time. As
users come and go, they can be removed from the list of users who have a
particular role, but the role is still there and the other users are
not impacted by that change. You can also add another object to be
managed, like another print queue, and add it to the printer admin role
and all the users with that role will automatically get that
authorization; you will not have to add it to every user. A sample set
of roles is shown in Figure 4.
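The relationship between users, roles, and authorizations can be sketched as a small data model. This is a minimal illustration, not the HP-UX RBAC interface: the class `Rbac`, its methods, the user names, and the operation and object strings are all hypothetical.

```python
class Rbac:
    """Toy model: users hold roles, roles hold (operation, object) pairs."""

    def __init__(self):
        self.role_auths = {}  # role -> set of (operation, object)
        self.user_roles = {}  # user -> set of roles

    def grant(self, role, operation, obj):
        self.role_auths.setdefault(role, set()).add((operation, obj))

    def assign(self, user, role):
        self.user_roles.setdefault(user, set()).add(role)

    def revoke(self, user, role):
        self.user_roles.get(user, set()).discard(role)

    def is_authorized(self, user, operation, obj):
        return any(
            (operation, obj) in self.role_auths.get(role, set())
            for role in self.user_roles.get(user, set())
        )


if __name__ == "__main__":
    rbac = Rbac()
    rbac.grant("printer admin", "start", "queue1")
    rbac.assign("alice", "printer admin")
    rbac.assign("bob", "printer admin")
    # Adding a new print queue to the role covers every user holding it.
    rbac.grant("printer admin", "start", "queue2")
    print(rbac.is_authorized("alice", "start", "queue2"))
    # Removing one user from the role leaves the others untouched.
    rbac.revoke("alice", "printer admin")
    print(rbac.is_authorized("alice", "start", "queue2"))
    print(rbac.is_authorized("bob", "start", "queue2"))
```

Keeping authorizations on the role rather than on each user is what makes both changes a single operation.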
Secure Resource Partitions
An interesting
point about Secure Resource Partitions is that they are really a set of
technologies that are embedded in the HP-UX kernel. These include FSS
and PSETs for CPU control, memory resource groups for memory controls,
LVM and VxVM for disk I/O bandwidth control, and security containment
for process communication isolation.
The product that makes
it possible to define Secure Resource Partitions is Process Resource
Manager (PRM). All of the other technologies allow you to control a
group of processes running on an HP-UX instance. What PRM does is make
it much easier for you to define the controls for any or all of them on
the same set of processes. You do this by defining a group of users
and/or processes, called a PRM group, and then assigning CPU, memory,
disk I/O, and security entitlements for that group of processes. Figure 5 provides a slightly modified view of Figure 2-18, which includes the security isolation in addition to the resource controls.
This diagram
illustrates the ability to control both resources and security
containment with a single solution. One point to make about PRM is that
it doesn't yet allow the configuration of all the features of the
underlying technology. For example, PRM controls groups of processes, so
it doesn't provide the ability to configure the role-based access
control features of the security-containment technology. It does,
however, allow you to define a compartment for the processes to run in
and will also allow you to assign one or more network interfaces to each
partition if you define the security features.
The default behavior of
security compartments is that processes will be able to communicate
with any process running in the same compartment but will not be able to
communicate with any processes running in any other compartment.
However, file access uses standard file system security by default. This
is done to ensure that independent software vendor applications will be
able to run in this environment without modifications and without
requiring the user to configure potentially complex file-system
security policies. However, if you want tighter file-system
security and are willing to configure it, facilities exist to
allow you to do that. For network access, you can assign multiple
pseudo-LAN interfaces (e.g., lan0, lan1, etc.) to a single physical
network interface card. This gives you the ability to have more
pseudo-interfaces and IP addresses than real interfaces. This is useful
for security compartments and SRPs because you can create at least one
pseudo-interface for each compartment, allowing each compartment to have
its own set of IP addresses. The network interface code in the kernel
has been modified to ensure that no two pseudo-interfaces can see each
other's packets even if they are using the same physical interface card.
The security integration
into PRM for Secure Resource Partitions uses the default compartment
definitions, with the exception of network interface rules. Most modern
applications require network access, so this was deemed a requirement.
When using PRM to define an SRP, you have the ability to assign at least
one pseudo-interface to each partition, along with the resource
controls discussed earlier in this section.
User and Process Assignment
Because all the
processes running in all the SRPs are running in the same copy of HP-UX,
it is critical to ensure that users and processes get assigned to the
correct partition as they come and go. In order to simplify this process
across all the SRP technologies, PRM provides an application manager.
This is a daemon that is configured to know what users and applications
should be running in each of the defined SRPs.
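The job of such a daemon can be sketched as a lookup plus a periodic check. This is a hypothetical model, not PRM's application manager: the function names, the record dictionaries, the default partition name `others`, and the rule that an application record takes precedence over a user record are all assumptions of this sketch.

```python
def target_partition(user, executable, app_records, user_records,
                     default="others"):
    """Partition a process belongs in, per the configured records.

    Assumption of this sketch: an application record wins over a user
    record, and unmatched processes fall into `default`.
    """
    if executable in app_records:
        return app_records[executable]
    return user_records.get(user, default)


def misplaced(processes, app_records, user_records):
    """Yield (pid, current, wanted) for processes in the wrong partition.

    processes: iterable of (pid, user, executable, current_partition)
    """
    for pid, user, executable, current in processes:
        wanted = target_partition(user, executable, app_records, user_records)
        if wanted != current:
            yield pid, current, wanted


if __name__ == "__main__":
    app_records = {"/opt/db/bin/server": "db1"}   # hypothetical path
    user_records = {"alice": "dev"}
    processes = [
        (101, "alice", "/opt/db/bin/server", "dev"),
        (102, "alice", "/usr/bin/vi", "dev"),
        (103, "bob", "/usr/bin/vi", "dev"),
    ]
    for move in misplaced(processes, app_records, user_records):
        print(move)
```

A daemon built on this idea would run the check at intervals and move each misplaced process to its intended partition.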
Resource Partition Integration with HP-UX
Because resource partitioning
and PRM were introduced in HP-UX in 1995, this technology is thoroughly
integrated with the operating system. HP-UX functions and tools such as
fork(), exec(), cron, at, login, ps, and GlancePlus are all integrated
and will react appropriately if Secure Resource Partitions are
configured. For example:
Login will query
the PRM configuration for user records and will start the user's shell
in the correct partition based on that configuration.
The
ps command has two command-line options, -P and -R, which respectively
show the PRM partition that each displayed process is in or show only
the processes in a particular partition.
GlancePlus
will group the many statistics it collects for all the processes
running in each partition. You can also use the GlancePlus user
interface to move a process from one partition to another.
The result is that you get a product that has been enhanced many times over the years to provide a robust and complete solution.