
ARM’s Big Little At Large (Part 1)

8/14/2013 6:43:37 PM

We investigate the latest CPU power-saving wizardry from the magicians at ARM

The pace of change in the phone and tablet markets is breathtaking. Three years ago, the flagship desktop CPU was the Core i7-2600K, and that's still a big-hitter even now. Subsequent Core i7 CPUs are little more than die-shrinks - a nip or tuck here, an incremental lift to GPU performance there.

Innovation seems to have stalled. In the mobile arena, however, innovation is like a rabid dog, off the leash, foaming at the mouth, snapping at any challenge. Over the last three years, phone and tablet CPUs have gone from single-core to dual-core to quad-core - and beyond. Clock speeds have nearly doubled. GPUs have seen similar advances too.

All this has occurred without major increases in power-draw. Going forward, though, keeping power-draw in check will prove increasingly tricky. The undisputed king of mobile CPU technology is ARM, and its current assault on the demon watt is a mechanism dubbed big.LITTLE.

Big

The first SoC (system-on-chip) to implement big.LITTLE is Samsung’s Exynos 5 Octa. This is the beating heart of Samsung’s own Galaxy S4 smartphone, the most powerful handset on the shelves and the fastest-selling Android device in history.

Longer battery life


Sadly, only specific versions of the S4 feature the Octa: the ‘international’ GT-I9500 (1.6GHz) and South Korean SHV-E300K/L/S (1.8GHz). The USA gets the GT-I9505 and, against expectation, so does Europe, including the UK. That’s unfortunate, as the SoC there is the uninspiring Qualcomm Snapdragon 600 – a quad-core A9/A15 hybrid (1.9GHz), without big.LITTLE. Imported Octa models have already started appearing on eBay – at a hefty premium!

Samsung Exynos Processor


As the name suggests, the Octa has eight cores. Or does it? Part of the SoC comprises a cluster of four Cortex-A15 cores. This represents the ‘big’ element. The A15 architecture is ARM’s mobile flagship, and it's a monster. Across the board it's about 40% faster than the Cortex-A9, its predecessor, cornerstone of the Galaxy S III’s Exynos 4 Quad and dozens of other SoCs. A performance leap like that in a single generation is remarkable.

One major change over the A9 is a deeper pipeline, up from eight stages to 15. A deeper pipeline typically causes a lower IPC (instructions per cycle). That's bad, but it also typically grants a higher clock, which in theory more than compensates. Oddly, though, if you look at the table I've compiled, the Octa's frequency is barely higher than the Exynos 4 Quad's. Presumably, faster variants will follow.
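The pipeline trade-off can be put as a rough rule of thumb: throughput is IPC multiplied by clock frequency. Here is a minimal Python sketch of that arithmetic. The IPC and clock figures are hypothetical, chosen only to illustrate the point, and are not measurements of the A9 or A15.

```python
def relative_throughput(ipc: float, clock_ghz: float) -> float:
    """Rough throughput in instructions per nanosecond: IPC times clock (GHz)."""
    return ipc * clock_ghz


# Hypothetical figures for illustration only, not real A9/A15 data.
shallow = relative_throughput(ipc=1.0, clock_ghz=1.4)            # short pipeline
deep_same_clock = relative_throughput(ipc=0.9, clock_ghz=1.4)    # deeper, no clock gain
deep_higher_clock = relative_throughput(ipc=0.9, clock_ghz=2.0)  # deeper, faster clock

# A deeper pipeline only pays off if the clock rises enough to offset the lost IPC.
print(deep_same_clock < shallow < deep_higher_clock)
```

With the clock left unchanged, the deeper pipeline comes out behind, which is why a 15-stage design running at barely more than the A9's frequency has to find its gains elsewhere.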

For now, the Octa gets its grunt from other A15 optimizations, such as a faster memory bus, improved branch prediction, and another major design change: an additional execution unit (a pipeline by another name). Compared to the A9, there are three instead of two. On paper, that's a 50% boost right there.

Little

An increase in performance usually requires an increase in circuit complexity and the number of transistors. Most often that results in higher power demands and heat (though this can be mitigated by shrinking the fabrication node size). Under full load, the Exynos 4 Quad consumes 4W, pretty much the viable maximum for a mobile SoC. As the table shows, the ‘big’ element of the Exynos 5 Octa actually breaches 5W. It's more power-hungry than a typical x86 Intel Atom in a netbook!
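To put those wattages in perspective, here is a back-of-the-envelope Python sketch of runtime under sustained full load. The 10Wh battery capacity is an assumption picked for round numbers, not a quoted spec, and the sum ignores the screen, radios and everything else that draws power.

```python
def hours_at_load(battery_wh: float, draw_watts: float) -> float:
    """Hours a battery lasts at a constant power draw (SoC only)."""
    if draw_watts <= 0:
        raise ValueError("draw_watts must be positive")
    return battery_wh / draw_watts


BATTERY_WH = 10.0  # assumed capacity, for illustration only

quad_hours = hours_at_load(BATTERY_WH, 4.0)      # Exynos 4 Quad at 4W
octa_big_hours = hours_at_load(BATTERY_WH, 5.0)  # Octa 'big' cluster at 5W (it draws more)

print(quad_hours, octa_big_hours)
```

Under that assumption, the extra watt costs half an hour of flat-out running before anything else in the phone has taken its share.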

The Exynos 5 Octa die

Awesome performance is worthless if you can see your battery meter draining like pop from a bottle, so that brings us to the Octa's ‘LITTLE’ element. This manifests itself as a cluster of four Cortex-A7 cores. Somewhere on these pages you'll find schematics of the A7 and A15 pipelines, and even a blind man can see that the A7's pipeline is vastly simpler. The Octa die-shot also shows the immense size difference between the A7 and A15 clusters (though a goodly chunk of the latter's footprint is the 2MB of L2 cache).

Essentially, the Cortex-A7 is a tweaked Cortex-A8, foundation of the Exynos 3 Single in the original Galaxy S and of dozens of SoCs in current budget devices. As such, it features only two execution units, and the pipeline is an in-order affair (it is in Intel's Atom too). While this is far less per-cycle efficient than an out-of-order pipeline (as found in the A9, A15, and most x86 CPUs since the mid-1990s), it's also far less complex. The A7 delivers nearly the same performance as the A8, even with a much shorter pipeline, yet the four of them in the Octa consume only a quarter of the power of one A8 on its own. ARM has really delivered the goods.

4 + 4 = 4

So, yes, the Exynos 5 Octa has eight cores, but as you’ve probably guessed, only four – either the A7 cluster or the A15 cluster – can be active at once. As usual, some marketing shenanigans are afoot.

Samsung Exynos 5 Octa Processor


A similar, earlier technology exists in Nvidia's current SoC, the Tegra 3. The marketing bods dropped a clanger, though, as this is sold as a quad-core Cortex-A9 affair, yet the core-count is actually five. The fifth core (also an A9) is designated a ‘companion’, and it kicks in when a device's power demands are next to zero – when the screen's off, say, and only background tasks are running. The companion core is limited to 500MHz and, thanks to some manufacturing magic, operates at an ultra-low voltage. Depending on the specific Tegra 3 model and the number of active cores, ... The upcoming Tegra 4, using the Cortex-A15 architecture, is also a 4-PLUS-1 design.

The Lenovo A820

During such times, the quad-core cluster is shut down. The concept works. I know only too well – after I've cast it aside in despair when failing yet again at some level in Candy Crush Saga – that the Nexus 7, Google's Tegra 3 tablet, can be left in standby for days on end and still have a half-full battery meter when woken up.

ARM's big.LITTLE expands on this '4-PLUS-1' philosophy. Clearly the Exynos 5 Octa has four low-draw cores, not just one. In the Tegra 3 too, the companion core is merely auxiliary, with the quad-core powerhouse handling the majority of duties, yet with the Octa, the cluster of frugal A7 cores is actually the default. The monster A15 cluster takes control only when huge performance is required.
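The switching idea described above can be sketched in a few lines of Python. This is a toy model, not ARM's or Samsung's actual switching logic: the thresholds, the names and the notion of a single normalised 'load' figure are all hypothetical. It assumes the frugal A7 cluster is the default, and uses two separate thresholds so the chip doesn't flip-flop between clusters when the load hovers around one value.

```python
LITTLE, BIG = "A7", "A15"

# Hypothetical thresholds on a 0.0-1.0 load scale, for illustration only.
UP_THRESHOLD = 0.85    # above this, the A7 cluster hands over to the A15s
DOWN_THRESHOLD = 0.30  # below this, the A15 cluster hands back to the A7s


def next_cluster(current: str, load: float) -> str:
    """Pick the cluster for the next interval; only one is ever active."""
    if current == LITTLE and load > UP_THRESHOLD:
        return BIG
    if current == BIG and load < DOWN_THRESHOLD:
        return LITTLE
    return current


def simulate(loads, start=LITTLE):
    """Run a sequence of load samples through the switcher."""
    cluster = start
    history = []
    for load in loads:
        cluster = next_cluster(cluster, load)
        history.append(cluster)
    return history


print(simulate([0.1, 0.5, 0.9, 0.6, 0.2]))
# → ['A7', 'A7', 'A15', 'A15', 'A7']
```

Note how the 0.6 sample leaves the A15 cluster in charge: having woken the monster, the switcher waits for the load to drop well away before handing back to the A7s.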
