8. ProFusion Chipset
The next evolution was the ProFusion chipset, shown in Figure 8.
Jointly developed by HP and Intel, the ProFusion chipset uses a unique
two-port (bus) memory design with 1.6GB/s memory bandwidth
(2 × 800MB/s), allowing simultaneous access to memory on both ports.
The ProFusion Memory Access Controller (MAC) manages two separate
memory controllers on separate memory buses. This technique employs
high-speed synchronous DRAM (SDRAM) interleaved on a cache line basis.
One controller manages all of the odd cache line addresses, and the
other manages all of the even cache line addresses. As a result,
latency is reduced when accessing two consecutive cache lines in
address order, because each line is served by a different controller.
The MAC arbitrates access to memory and the I/O system by the
processors on the two main system buses.
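The odd/even split described above can be illustrated with a minimal sketch. The 32-byte cache line size and the function name `controller_for` are assumptions made for illustration. The text does not state the line size or how the MAC decodes addresses.

```python
# Sketch of cache-line interleaving across two memory controllers.
# Assumption: 32-byte cache lines (the text does not give the line size).
CACHE_LINE_BYTES = 32


def controller_for(address: int, line_bytes: int = CACHE_LINE_BYTES) -> str:
    """Return 'even' or 'odd' depending on which cache line the address is in."""
    line_index = address // line_bytes  # which cache line holds this byte
    return "odd" if line_index % 2 else "even"


if __name__ == "__main__":
    # Walk four consecutive cache lines; the controller alternates each line.
    for addr in range(0, 4 * CACHE_LINE_BYTES, CACHE_LINE_BYTES):
        print(hex(addr), controller_for(addr))
```

Because consecutive lines alternate between controllers, a sequential access pattern keeps both memory buses busy rather than queuing on one.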
The ProFusion Data Interface Buffer (DIB) provides the data path
control and buffering between the AGTL+ buses and memory. The DIB
consists of buffers that sit between the processor and I/O AGTL+ buses
on one side and memory on the other. If the required dual inline memory
module (DIMM) is busy when the MAC initiates a memory cycle, the DIB
temporarily stores the address and forwards it to the memory on the
next available cycle.
9. F8 Chipset
The next evolution of HP system architecture design is the F8
chipset architecture, which is based on the ProFusion chipset. HP
leveraged experience gained with the ProFusion eight-way architecture to
develop the F8 eight-way multiprocessing architecture. The F8
architecture is shown in Figure 9.
HP developed the F8 chipset with a multiport, nonblocking crossbar
switch to optimize efficiency and allow simultaneous access to memory,
processor, and I/O subsystems.
The F8 chipset supports multiple PCI-X bridges and incorporates the HP
embedded PCI Hot Plug controller for high availability in the I/O
subsystem.
9.1. F8 Chipset Architecture
The F8 chipset was designed for even higher performance by optimizing
the crossbar switch component and increasing bandwidths to match the
processing power of the Intel Xeon MP processor.
The design includes a large buffer capable of holding 128 cache lines,
13 read ports, and 4 write ports, which together increase the number of
concurrent transactions in the switch. It also includes a
cache-coherency filter and a patent-pending Guaranteed Snoop Access
algorithm to reduce the amount of cross-bus traffic. All of these
features increase efficiency and improve the scalability of the F8
architecture.
Bus speeds are commonly expressed in megatransfers per second (MT/s). For example, a bus operating at 100MHz and transferring four data packets on each clock cycle delivers 400MT/s.
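The transfer-rate arithmetic can be written out as a short sketch. The 8-byte (64-bit) bus width is an assumption. It is not stated in the text, but it is consistent with the 3.2GB/s per processor bus that the text quotes for the F8 chipset. The function names are illustrative.

```python
# Sketch: from bus clock and transfers per clock to MT/s and peak bandwidth.
# Assumption: an 8-byte (64-bit) data path; the text does not state the width.
BUS_WIDTH_BYTES = 8


def transfers_per_second(clock_hz: int, transfers_per_clock: int) -> int:
    """Data transfers per second on the bus."""
    return clock_hz * transfers_per_clock


def peak_bytes_per_second(clock_hz: int, transfers_per_clock: int,
                          width_bytes: int = BUS_WIDTH_BYTES) -> int:
    """Peak bandwidth: transfers per second times bytes moved per transfer."""
    return transfers_per_second(clock_hz, transfers_per_clock) * width_bytes


if __name__ == "__main__":
    clock = 100_000_000  # 100MHz, from the example in the text
    print(transfers_per_second(clock, 4) / 1e6, "MT/s")
    print(peak_bytes_per_second(clock, 4) / 1e9, "GB/s")
```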
The F8 chipset incorporates several innovations that provide high
bandwidth, among them scalable memory performance, PCI-X bridges, and
Hot Plug RAID Memory.
9.1.1. SCALABLE MEMORY PERFORMANCE
HP engineers ensure scalable memory performance by increasing the
aggregate memory bandwidth to 8.5GB/s, which is 33% more than the
combined bandwidth of the processor buses. This headroom matters
because each processor bus has four times the bandwidth of the Pentium
III Xeon processor bus.
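The headroom figure can be checked with a few lines of arithmetic. The sketch below assumes two processor buses. That count is inferred from the 33% figure and from the 3.2GB/s per-bus bandwidth quoted in this section, not stated outright. The function name is illustrative.

```python
# Sketch: memory bandwidth headroom over the combined processor buses.
MEMORY_GB_S = 8.5           # aggregate memory bandwidth, from the text
PROCESSOR_BUS_GB_S = 3.2    # per processor bus, from the text
NUM_PROCESSOR_BUSES = 2     # assumption inferred from the 33% figure


def headroom_fraction(memory_gb_s: float, bus_gb_s: float, buses: int) -> float:
    """Fraction by which memory bandwidth exceeds combined bus bandwidth."""
    return memory_gb_s / (bus_gb_s * buses) - 1.0


if __name__ == "__main__":
    extra = headroom_fraction(MEMORY_GB_S, PROCESSOR_BUS_GB_S, NUM_PROCESSOR_BUSES)
    print(f"memory headroom: {extra:.0%}")
    # A bus with one quarter of the bandwidth, per the "four times" claim.
    print(f"implied Pentium III Xeon bus: {PROCESSOR_BUS_GB_S / 4:.1f}GB/s")
```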
9.1.2. PCI-X BRIDGES
The F8 chipset can include up to four industry-standard dual PCI-X
bridges in the I/O subsystem, each with embedded PCI Hot Plug
controllers. Because each bridge can support two PCI-X bus segments
operating at speeds up to 100MHz, the chipset can easily accommodate
peripherals using high-speed interconnects, such as Gigabit Ethernet
and Ultra320 SCSI.
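A rough sizing sketch shows why these peripherals fit comfortably. It assumes 64-bit (8-byte) PCI-X segments, which the text does not state. The peripheral rates are nominal peaks: 125MB/s per direction for Gigabit Ethernet and 320MB/s for Ultra320 SCSI. All names in the code are illustrative.

```python
# Sketch: peak PCI-X segment bandwidth versus nominal peripheral rates.
BRIDGES = 4                 # up to four dual PCI-X bridges, from the text
SEGMENTS_PER_BRIDGE = 2     # two bus segments per bridge, from the text
SEGMENT_CLOCK_MHZ = 100     # up to 100MHz, from the text
SEGMENT_WIDTH_BYTES = 8     # assumption: 64-bit PCI-X segments

# Nominal peak rates in MB/s (Gigabit Ethernet figure is per direction).
PERIPHERAL_PEAK_MB_S = {"Gigabit Ethernet": 125, "Ultra320 SCSI": 320}


def segment_peak_mb_s(clock_mhz: int = SEGMENT_CLOCK_MHZ,
                      width_bytes: int = SEGMENT_WIDTH_BYTES) -> int:
    """Peak bandwidth of one bus segment in MB/s."""
    return clock_mhz * width_bytes


def fits_on_segment(peripheral: str) -> bool:
    """True if the peripheral's nominal peak fits within one segment."""
    return PERIPHERAL_PEAK_MB_S[peripheral] <= segment_peak_mb_s()


if __name__ == "__main__":
    print("segments:", BRIDGES * SEGMENTS_PER_BRIDGE)
    print("per-segment peak:", segment_peak_mb_s(), "MB/s")
    for name in PERIPHERAL_PEAK_MB_S:
        print(name, "fits:", fits_on_segment(name))
```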
9.1.3. HOT PLUG RAID MEMORY
Servers that enable Hot Plug RAID Memory use RAID DIMMs to provide
fault tolerance and to allow memory to be replaced or added while the
server is operating. This eliminates unplanned downtime in the case of
a DIMM failure.
When the memory controller in the F8 chipset needs to write data to
memory, it splits the cache line of data into four blocks. The blocks
are then written, or striped, across four memory cartridges. A RAID
engine calculates parity information, which is stored on a fifth
cartridge dedicated to parity. With the four data cartridges and the
parity cartridge, the data subsystem is redundant: if the data from any
DIMM is incorrect or any cartridge is removed, the data can be rebuilt
from the remaining four cartridges.
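The striping and rebuild steps can be sketched as follows. The text says only that a RAID engine calculates "parity". Bytewise XOR parity and a 64-byte cache line are assumptions here, and the function names are hypothetical.

```python
# Sketch: stripe a cache line over four data cartridges plus one parity
# cartridge, then rebuild any single missing cartridge.
# Assumptions: bytewise XOR parity and a 64-byte cache line.
from functools import reduce

DATA_CARTRIDGES = 4


def xor_blocks(blocks):
    """Bytewise XOR of equally sized byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe(cache_line: bytes):
    """Split a cache line into four blocks and append an XOR parity block."""
    if len(cache_line) % DATA_CARTRIDGES:
        raise ValueError("cache line must divide evenly into four blocks")
    size = len(cache_line) // DATA_CARTRIDGES
    blocks = [cache_line[i * size:(i + 1) * size] for i in range(DATA_CARTRIDGES)]
    return blocks + [xor_blocks(blocks)]


def rebuild(cartridges, missing: int) -> bytes:
    """Recompute the block of one missing cartridge from the other four."""
    survivors = [blk for i, blk in enumerate(cartridges) if i != missing]
    return xor_blocks(survivors)


if __name__ == "__main__":
    line = bytes(range(64))
    cartridges = stripe(line)
    # Simulate removing data cartridge 2 and rebuilding its contents.
    print(rebuild(cartridges, 2) == cartridges[2])
```

With XOR parity, any one of the five blocks equals the XOR of the other four, so the same `rebuild` routine covers the loss of a data cartridge or of the parity cartridge.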
9.2. F8 Chipset Advantages
Compared with a cache-coherent Non-Uniform Memory Access (cc-NUMA)
architecture, the F8 symmetric multiprocessing (SMP) architecture
provides (1) a simpler programming model, (2) reduced average latencies
overall, and (3) the ability to use standard operating systems and
applications.
The F8 chipset from HP delivers these additional advantages:
Eliminates potential bottlenecks by using very high bandwidths to match the processing power of the Xeon MP processor
Eliminates potential bottlenecks using optimized crossbar switch capabilities
Expands online replacement capabilities to include Hot Plug RAID Memory
The F8 chipset includes five memory controllers with HP Hot Plug RAID
Memory and a multiport crossbar switch. The F8 chipset supports the
following:
8.5GB/s of aggregate memory bandwidth
3.2GB/s of bandwidth for each processor bus
32GB of Hot Plug RAID Memory (40 DIMMs)
Up to four 100MHz PCI-X bridges with hot-plug support
Eight Intel Xeon MP processors
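The 32GB and 40-DIMM figures are consistent with one cartridge in five holding parity. The sketch below makes that relationship explicit. It assumes the DIMMs are spread evenly across five cartridges and that each DIMM holds 1GB. Neither detail is stated in the text.

```python
# Sketch: usable Hot Plug RAID Memory capacity from the DIMM count.
TOTAL_DIMMS = 40        # from the text
CARTRIDGES = 5          # four data plus one parity, per the RAID description
PARITY_CARTRIDGES = 1   # one cartridge dedicated to parity, from the text
DIMM_GB = 1             # assumption: 1GB per DIMM


def usable_gb(total_dimms: int = TOTAL_DIMMS, cartridges: int = CARTRIDGES,
              parity: int = PARITY_CARTRIDGES, dimm_gb: int = DIMM_GB) -> int:
    """Capacity left for data after setting aside the parity cartridge."""
    dimms_per_cartridge = total_dimms // cartridges  # assumes an even spread
    data_dimms = dimms_per_cartridge * (cartridges - parity)
    return data_dimms * dimm_gb


if __name__ == "__main__":
    print(usable_gb(), "GB usable of", TOTAL_DIMMS * DIMM_GB, "GB installed")
```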