We investigate the latest CPU power-saving wizardry from the magicians at ARM.
The pace of change in the phone and tablet
markets is breathtaking. Three years ago, the flagship desktop CPU was the Core
i7-2600K, and that's still a big-hitter even now. Subsequent Core i7 CPUs are
little more than die-shrinks - a nip or tuck here, an incremental lift to GPU
performance there.
Innovation seems to have stalled. In the
mobile arena, however, innovation is like a rabid dog, off the leash, foaming
at the mouth, snapping at any challenge. Over the last three years, phone and
tablet CPUs have gone from single-core to dual-core to quad-core - and beyond.
Clock speeds have nearly doubled. GPUs have seen similar advances too.
All this has occurred without major
increases in power-draw. Going forward, though, keeping power-draw in check
will prove increasingly tricky. The undisputed king of mobile CPU technology is
ARM, and its current assault on the demon watt is a mechanism dubbed
big.LITTLE.
Big
The first SoC (system-on-chip) to implement
big.LITTLE is Samsung’s Exynos 5 Octa. This is the beating heart of Samsung’s
own Galaxy S4 smartphone, the most powerful handset on the shelves and the
fastest-selling Android device in history.
Sadly, only specific versions of the S4
feature the Octa: the ‘international’ GT-I9500 (1.6GHz) and the South Korean
SHV-300K/L/S (1.8GHz). The USA gets the GT-I9505 and, against expectation, so
does Europe, including the UK. That’s unfortunate, as the SoC there is the
uninspiring Qualcomm Snapdragon 600 – a quad-core A9/A15 hybrid (1.9GHz),
without big.LITTLE. Imported Octa models have already started appearing on eBay
– at a hefty premium!
As the name suggests, the Octa has eight
cores. Or does it? Part of the SoC comprises a cluster of four Cortex-A15
cores. This represents the ‘big’ element. The A15 architecture is ARM’s mobile
flagship, and it's a monster. Across the board it's about 40% faster than the
Cortex-A9, its predecessor and the cornerstone of the Galaxy S III’s Exynos 4 Quad and
dozens of other SoCs. A performance leap like that in a single generation is
remarkable.
One major change over the A9 is a deeper
pipeline, up from eight stages to 15. A deeper pipeline typically means a lower
IPC (instructions per cycle), partly because every mispredicted branch throws
away more half-finished work. That's bad, but a deeper pipeline also typically
allows a higher clock, which in theory more than compensates. Oddly, though, if
you look at the table I've compiled, the Octa's frequency is barely higher than
the Exynos 4 Quad's. Presumably, faster variants will follow.
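To put rough numbers on that trade-off, here is a minimal Python sketch of the textbook performance equation (throughput = IPC × clock), under the simplifying assumption that each mispredicted branch costs one cycle per pipeline stage. The branch statistics, base CPI and clock speeds are invented for illustration; they are not A9 or A15 measurements.

```python
# Textbook model: throughput = clock / CPI (i.e. IPC x clock).
# Assumption: each mispredicted branch stalls for one cycle per pipeline
# stage. All numbers are hypothetical.

def effective_cpi(base_cpi: float, branch_fraction: float,
                  mispredict_rate: float, pipeline_depth: int) -> float:
    """Cycles per instruction, including branch-misprediction stalls."""
    stall_cycles = branch_fraction * mispredict_rate * pipeline_depth
    return base_cpi + stall_cycles


def instructions_per_second(cpi: float, clock_hz: float) -> float:
    """Throughput is IPC x clock, i.e. clock / CPI."""
    return clock_hz / cpi


# Hypothetical workload: 20% of instructions are branches, 5% of those mispredict.
cpi_8_stage = effective_cpi(1.0, 0.20, 0.05, pipeline_depth=8)
cpi_15_stage = effective_cpi(1.0, 0.20, 0.05, pipeline_depth=15)

baseline = instructions_per_second(cpi_8_stage, 1.4e9)
same_clock = instructions_per_second(cpi_15_stage, 1.4e9) / baseline
higher_clock = instructions_per_second(cpi_15_stage, 2.0e9) / baseline
break_even_hz = 1.4e9 * cpi_15_stage / cpi_8_stage

print(f"CPI: 8 stages {cpi_8_stage:.2f}, 15 stages {cpi_15_stage:.2f}")
print(f"15-stage vs 8-stage, both at 1.4GHz: {same_clock:.2f}x")
print(f"15-stage at 2.0GHz vs 8-stage at 1.4GHz: {higher_clock:.2f}x")
print(f"Clock the 15-stage design needs to break even: {break_even_hz / 1e9:.2f}GHz")
```

In this toy model the deeper pipeline loses about 6% per clock, so a clock roughly 6.5% higher is enough to break even, and anything beyond that is a net gain. Real chips are messier: the penalty depends heavily on how good the branch predictor is.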
For now, the Octa gets its grunt from other
A15 optimizations, such as a faster memory bus, improved branch prediction, and
another major design change: an additional execution unit (a pipeline by another
name). Compared to the A9, there are three instead of two. On paper, that's a
50% boost right there.
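The "on paper" 50% is simply the ratio of peak issue widths. A minimal sketch, assuming both cores run at the same clock (1.4GHz here is an arbitrary choice) and keep every execution unit busy on every cycle, which real code rarely manages:

```python
# Peak throughput = instructions issued per cycle x clock speed.
# Widths follow the text (two execution units on the A9, three on the A15);
# the shared 1.4GHz clock is an illustrative assumption.

def peak_instructions_per_second(issue_width: int, clock_hz: float) -> float:
    """Upper bound on throughput if every unit is busy every cycle."""
    return issue_width * clock_hz


a9_peak = peak_instructions_per_second(issue_width=2, clock_hz=1.4e9)
a15_peak = peak_instructions_per_second(issue_width=3, clock_hz=1.4e9)
gain = a15_peak / a9_peak - 1

print(f"A9-style peak:  {a9_peak / 1e9:.1f} billion instructions/s")
print(f"A15-style peak: {a15_peak / 1e9:.1f} billion instructions/s")
print(f"Paper gain at equal clock: {gain:.0%}")
```

Dependencies between instructions, cache misses and mispredicted branches all stop the third unit from being filled every cycle, which is why real code seldom sees the full 50%.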
Little
An increase in performance usually requires
an increase in circuit complexity and the number of transistors. Most often
that results in higher power demands and heat (though this can be mitigated by
shrinking the fabrication node size). Under full load, the Exynos 4 Quad
consumes 4W, pretty much the viable maximum for a mobile SoC. As the table
shows, the ‘big’ element of the Exynos 5 Octa actually breaches 5W. It's more
power-hungry than a typical x86 Intel Atom in a netbook!
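The link between transistors, clocks and watts follows the standard first-order approximation for CMOS switching power, P ≈ α·C·V²·f: activity factor times switched capacitance (which grows with transistor count) times voltage squared times clock frequency. The minimal sketch below ignores leakage, and every value in it is hypothetical rather than a figure for any Exynos part.

```python
# First-order CMOS dynamic (switching) power: P = alpha * C * V^2 * f.
# Leakage is ignored; all input values are hypothetical.

def dynamic_power_watts(activity: float, capacitance_f: float,
                        voltage_v: float, clock_hz: float) -> float:
    """Switching power in watts."""
    return activity * capacitance_f * voltage_v ** 2 * clock_hz


base = dynamic_power_watts(0.2, 1.0e-9, 1.0, 1.0e9)
# Twice the switched capacitance (roughly, twice the transistors), same V and f.
more_transistors = dynamic_power_watts(0.2, 2.0e-9, 1.0, 1.0e9)
# Same design clocked 30% higher, assuming that needs 10% more voltage.
higher_clock = dynamic_power_watts(0.2, 1.0e-9, 1.1, 1.3e9)

print(f"Baseline: {base:.3f}W")
print(f"2x capacitance: {more_transistors / base:.2f}x the power")
print(f"+30% clock, +10% voltage: {higher_clock / base:.2f}x the power")
```

Because voltage enters squared, lowering it is the most potent lever. That is also part of why a smaller fabrication node helps: it typically reduces both the switched capacitance and the operating voltage.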
The Exynos 5 Octa die
Awesome performance is worthless if you can
see your battery meter draining like pop from a bottle, so that brings us to
the Octa's ‘LITTLE’ element. This manifests itself as a cluster of four
Cortex-A7 cores. Somewhere on these pages you'll find schematics of the A7 and
A15 pipelines, and even a blind man can see that the A7's pipeline is vastly
simpler. The Octa die-shot also shows the immense size difference between the
A7 and A15 clusters (though a goodly chunk of the latter's footprint is the 2MB
of L2 cache).
Essentially, the Cortex-A7 is a tweaked
Cortex-A8, foundation of the Exynos 3 Single in the original Galaxy S and of
dozens of SoCs in current budget devices. As such, it features only two
execution units, and the pipeline is an in-order affair (as it is in Intel's Atom).
While this is far less efficient per cycle than an out-of-order pipeline
(as found in the A9, A15, and most x86 CPUs since the mid-1990s), it's also far
less complex. The A7 delivers nearly the same performance as the A8, even with
a much shorter pipeline, yet the four of them in the Octa consume only a
quarter of the power of one A8 on its own. ARM has really delivered the goods.
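That power claim is easy to turn into a per-core figure. A minimal sketch using exact fractions, taking the text's ratios at face value and assuming, as a simplification, that "nearly the same performance" means identical performance:

```python
from fractions import Fraction

# Ratios from the text, normalised so one Cortex-A8 core = 1.
a8_core_power = Fraction(1)
a7_cluster_power = Fraction(1, 4) * a8_core_power  # four A7s: a quarter of one A8
a7_core_power = a7_cluster_power / 4                # so each A7 draws 1/16
# Simplifying assumption: "nearly the same performance" treated as equal.
a7_relative_perf = Fraction(1)

a8_perf_per_watt = Fraction(1) / a8_core_power
a7_perf_per_watt = a7_relative_perf / a7_core_power
advantage = a7_perf_per_watt / a8_perf_per_watt

print(f"Each A7 draws {a7_core_power} of an A8's power")
print(f"A7 performance per watt: {advantage}x that of an A8")
```

Even if the A7 were, say, 10% slower than the A8, its performance per watt would still be 0.9 × 16 = 14.4 times better.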
4 + 4 = 4
So, yes, the Exynos 5 Octa has eight cores,
but as you’ve probably guessed, only four (either the A7 cluster or the A15
cluster) can be active at once. As usual, some marketing shenanigans are afoot.
A similar, earlier technology exists in
Nvidia's current SoC, the Tegra 3. The marketing bods dropped a clanger,
though, as this is sold as a quad-core Cortex-A9 affair, yet the core count is
actually five. The fifth core (also an A9) is designated a ‘companion’, and it
kicks in when a device's power demands are next to zero: when the screen's off,
say, and only background tasks are running. The companion core is limited to
500MHz and, thanks to some manufacturing magic, operates at an ultra-low
voltage. The main cores' top clock speed depends on the specific Tegra 3 model
and the number of active cores. The upcoming Tegra 4, which uses the Cortex-A15
architecture, is also a 4-PLUS-1 design.
While the companion core is in charge, the quad-core cluster is
shut down. The concept works. I know only too well (after casting it aside
in despair when failing yet again at some level in Candy Crush Saga) that the
Nexus 7, Google's Tegra 3 tablet, can be left in standby for days on end and
still have a half-full battery meter when woken up.
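The companion-core decision can be pictured as a simple rule. This is a hypothetical sketch, not Nvidia's actual logic (which lives in its hardware and drivers); the `DeviceState` fields and the `select_tegra3_mode` function are invented for illustration, and only the 500MHz cap comes from the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

COMPANION_MAX_HZ = 500_000_000  # the companion core's 500MHz cap (from the text)


@dataclass
class DeviceState:
    """Hypothetical snapshot of what the power manager can see."""
    screen_on: bool
    foreground_tasks: int


def select_tegra3_mode(state: DeviceState) -> Tuple[str, Optional[int]]:
    """Hypothetical 4-PLUS-1 policy: return the active core set and its clock cap.

    Only one side runs at a time: either the lone companion core or the
    main quad-core cluster.
    """
    if not state.screen_on and state.foreground_tasks == 0:
        # Near-zero demand (background tasks only): quad-core cluster shut down.
        return "companion", COMPANION_MAX_HZ
    # Anything interactive: companion off, main cores take over. Their clock
    # cap varies by Tegra 3 model and active core count, so it isn't modelled.
    return "quad", None


print(select_tegra3_mode(DeviceState(screen_on=False, foreground_tasks=0)))
print(select_tegra3_mode(DeviceState(screen_on=True, foreground_tasks=1)))
```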
ARM's big.LITTLE expands on this '4-PLUS-1'
philosophy. Clearly, the Exynos 5 Octa has four low-draw cores, not just one.
And whereas in the Tegra 3 the companion core is merely auxiliary, with the
quad-core powerhouse handling the majority of duties, in the Octa the cluster
of frugal A7 cores is actually the default. The monster A15 cluster takes
control only when huge performance is required.
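In the Octa's scheme, then, the LITTLE cluster is the resting state and the big cluster is the exception, with only one of the two powered at any moment. Below is a minimal, hypothetical sketch of such a cluster-switching policy; the `ClusterSwitcher` class and its thresholds are illustrative assumptions, not Samsung's or ARM's actual governor. The two thresholds are deliberately far apart (hysteresis), so a load hovering around a single cut-off doesn't make the chip ping-pong between clusters.

```python
class ClusterSwitcher:
    """Hypothetical cluster-migration policy for a 4 + 4 big.LITTLE chip.

    Only one cluster is ever active: the four Cortex-A7 ("LITTLE") cores by
    default, or the four Cortex-A15 ("big") cores under heavy load.
    Thresholds are illustrative assumptions.
    """

    def __init__(self, up_threshold: float = 0.85, down_threshold: float = 0.30):
        if not 0.0 <= down_threshold < up_threshold <= 1.0:
            raise ValueError("need 0 <= down_threshold < up_threshold <= 1")
        self.up_threshold = up_threshold
        self.down_threshold = down_threshold
        self.active = "LITTLE"  # the frugal A7 cluster is the default

    def update(self, load: float) -> str:
        """Take the active cluster's utilisation (0.0 to 1.0); return the cluster to use.

        The same work shows up as a lower utilisation on the faster A15s,
        hence the much lower threshold for dropping back.
        """
        if self.active == "LITTLE" and load > self.up_threshold:
            self.active = "big"     # migrate to the A15s; A7 cluster powered down
        elif self.active == "big" and load < self.down_threshold:
            self.active = "LITTLE"  # back to the A7s; A15 cluster powered down
        return self.active


switcher = ClusterSwitcher()
trace = [0.10, 0.50, 0.95, 0.60, 0.40, 0.20, 0.90]
decisions = [switcher.update(load) for load in trace]
print(decisions)  # ['LITTLE', 'LITTLE', 'big', 'big', 'big', 'LITTLE', 'big']
```

Note that a moderate load of 0.60 keeps the big cluster running once it is active, but would not have woken it from the LITTLE state; that asymmetry is the hysteresis at work.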