Zen 2 Missives – 2019
(I) The Socket Didn’t Have to Change
After languishing through the CPU dark-ages (read: Intel trying real
hard to keep people on 4 cores so they could charge an arm and a leg
for more), the last two years have seen a veritable tsunami of advances in
CPU technologies. AMD’s introduction of an 8-core CPU (the 1700X) began
the rat-race and with both AMD and Intel now pushing high-core-count CPUs,
the big winners here are us! The consumer, the power user, the
programmer, the technically-oriented enthusiast. All of us are the
winners of this new race.
But all is not equal. The situation developing now is primarily related
to power distribution and power consumption. Intel got caught with their
pants down on multiple fronts… they got stuck on their 14nm node and
could only produce minor improvements in power efficiency. And they also
got stuck on a socket with power delivery capabilities a bit lower than
they would have liked. This puts Intel in the unenviable situation of
having to compete against AMD by introducing a higher core-count CPU
(a 10-core) without a commensurate improvement in power efficiency, forcing
a new socket and motherboard upgrade on the Intel world. Intel is taking
power consumption way beyond what most people are actually going to be
willing to push into their machines. It’s a double whammy with Intel on
AMD thought ahead. Their AM4 socket can handle tons more power. More
importantly, AMD is now delivering, on 7nm, performance efficiencies
that are nearly double that of Intel. This means that AMD not only does
not have to change their socket, but their new CPUs will run just fine
on just about ANY AM4 motherboard introduced in the last three years
without even breaking a sweat.
AMD is not going to have to impose a new socket until something major
changes in the memory subsystem, such as a new memory standard that is
incompatible with the DDR4 DIMM socket. Notice I didn’t say electrical,
I said socket. Because the CPU is more or less directly wired to
the DIMM slots.
It is a common misconception that getting the most out of one of these
new Zen 2 CPUs requires a high-end X570 motherboard, with beefier,
higher-end VRMs and other beefy features. But as it turns out, this
is not actually true. The reason is that power efficiency actually keeps
even the high-end 3900X (and later the 3950X) within the power envelope
that older motherboards (B350 and B450 mobos) can deliver. In the case
of the B450, with room to spare. A 3900X gets 95% of its performance
with just 110W in the socket and even low-end B450 motherboards can put
150W into the socket. Low-end B350 motherboards can put at least 100W
in the socket, which is close enough.
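To make the headroom arithmetic concrete, here is a minimal sketch in
Python using only the wattages quoted above. The names (TARGET_WATTS,
BOARD_LIMITS_WATTS, headroom_watts) are made up for illustration, and the
board limits are the rough figures from this section, not measurements of
any specific motherboard.

```python
# Socket power at which a 3900X reaches ~95% of its performance (from the text).
TARGET_WATTS = 110

# Rough socket power limits quoted in the text; real boards vary.
BOARD_LIMITS_WATTS = {
    "low-end B450": 150,
    "low-end B350": 100,  # the text says "at least 100W", so this is a floor
}

def headroom_watts(limit_watts, target_watts=TARGET_WATTS):
    """Watts the board can deliver beyond the target (negative means short)."""
    return limit_watts - target_watts

for board, limit in BOARD_LIMITS_WATTS.items():
    print(f"{board}: {headroom_watts(limit):+d}W relative to {TARGET_WATTS}W")
```

The B450 comes out 40W ahead, and the B350 floor is 10W short, which is
the "close enough" case.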
Did I mention that you can just pop into the BIOS and set the power cap
for the socket to whatever you want? Poor airflow? Old motherboard? It
doesn’t matter. Well, sure, it does matter, just not as much as people
seem to think.
(II) The Physics has Changed with These Smaller Nodes
When you compare the TSMC 7nm node AMD’s Zen 2 CPUs are currently on with
Intel’s 14nm+++(many pluses) node you will notice some major differences
in how the physics of the node works. Intel’s 14nm node is relatively
temperature-agnostic. When you overclock a 9900K you can take temps
right up to the limit, continuing to push more power into the socket
to get those high frequencies. Zen 2 on 7nm doesn’t work this way. On
7nm, temperature has a direct correlation with frequency. And so on
a Zen 2 system if you increase the voltage to push frequency you also
wind up increasing the temperature which retards the maximum possible
stable frequency. In other words, you can’t just push power
into a Zen 2 CPU to get the overclocks you want. It doesn’t work.
On a Zen 2 system the key to overclocking is lower temps, NOT higher power.
Well, if you want to run the bleeding edge and you hit a hard stop on temps
then sure, you can push more power into the socket (as long as you keep
those temps hard-stopped), but this level of overkill just doesn’t net
a whole lot more in the performance department.
Strangely enough this means that you don’t actually need those beefy X570
motherboards with 10+ phases to get a decent overclock, you just need a
good cooler. I’ll bet a lot of people will start seriously thinking about
going sub-ambient as well (at least down to +5C, since going negative has
severe condensation issues). But for the rest of us mere mortals, a decent
tower air cooler gets us almost as much overclock as a good water loop.
When I say good what I mean is that if you really want to overclock you
can easily get to 4.2 GHz on air without pushing massive amounts of power.
The absolute best overclock you will ever get on Zen 2 at non-destructive
voltages (1.3V VCORE, approximately) will be around 4.4 GHz, all-cores.
Higher than that and it won’t be stable or you will have to push too
much voltage. The difference between 4.2 and 4.4 is only 5%. For most
of us, 5% isn’t worth the massive investment in time, effort, and
cooling. Overclocking Zen 2 is easy… you just leave it at stock
settings and maybe bump up the socket power envelope a little, run XMP
on your memory, and you are done, and you get it on any motherboard, even
the cheap ones.
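The 5% figure is just the ratio of the two clocks. A quick sketch in
Python (the function name is hypothetical, and it assumes performance
scales linearly with all-core frequency, which is the best case):

```python
def clock_gain_percent(base_ghz, oc_ghz):
    """Percent frequency gain going from base_ghz to oc_ghz."""
    return (oc_ghz - base_ghz) / base_ghz * 100.0

gain = clock_gain_percent(4.2, 4.4)
print(f"4.2 GHz -> 4.4 GHz is a {gain:.1f}% bump")
```

The result lands a little under 5%, and real workloads rarely scale
perfectly with clock, so the practical gain is smaller still.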
A second major difference in the physics of the TSMC 7nm node versus
Intel’s 14nm is the single-core boost. Whereas you can push more volts
into a 9900K (almost) regardless of how many cores are loaded, the smaller
7nm node cannot take it. High-current all-core boosts require a lower
voltage to avoid destroying the chip through electromigration, whereas a
low-current single-core boost allows a higher voltage (up to 1.45V).
So while Intel overclockers can always look for ways to push more power
into the socket (at least on 14nm+++*), AMD overclockers wind up between
a rock and a hard place fairly quickly with ambient cooling. Heat matters.
The physics are just different. This isn’t a bad thing, by the way.
It means that very high performance systems can be built more cheaply.
Intel will be in the same boat soon enough, because this difference in
physics seems to be a side effect of getting smaller. Intel’s 10nm node
is rumored to be limited to 4.1 GHz or so. And later on we may see even
more limitations on clocks. The only way to really scale from here on
will be with IPC and by adding cores.
Nobody is interested in pushing 200W into their socket, which puts Intel
at a dead-end on 14nm. They may try to push more cores on 14nm, but the
power consumption makes it non-competitive. Remember that.
(III) Power Density, Caches, and IPC
As nodes get smaller, power density increases. Radically. That is,
the smaller size of the chips more than offsets the improved
power efficiency. From an absolute performance point of view we do get
that power efficiency. But from a power-density point of view we do not.
The transistors are packed more tightly. The power density is heading
up and not down.
Higher power densities mean higher temperature gradients going from the
transistors to the socket to the cooling solution. No amount of
cooling can completely compensate for this gradient.
Only lower frequencies can help here, and much larger caches. Why the
larger caches? Because CPU caches have relatively low power densities.
They take up chip real-estate but actually improve the overall power
density problem. So every new architecture from here forwards is going
to have much larger CPU caches. We’ve seen this with Zen 2 where each
CPU chiplet (8 cores) sports 32MB of L3 cache. This means that the 3900X
and the 3950X both have 64MB of L3 cache. A 64-core TR3 or EPYC will have
256MB of L3 cache.
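The cache totals follow directly from the chiplet count. A small sketch
in Python, assuming two chiplets for the 3900X and 3950X and eight for a
64-core part (the chiplet counts are implied by the totals above, and the
names are illustrative):

```python
L3_PER_CHIPLET_MB = 32  # per 8-core Zen 2 chiplet, from the text

# Chiplet counts are assumptions consistent with the totals quoted above.
CHIPLETS = {"3900X": 2, "3950X": 2, "64-core TR3/EPYC": 8}

def total_l3_mb(chiplets):
    """Total L3 in MB for a part built from the given number of chiplets."""
    return chiplets * L3_PER_CHIPLET_MB

for part, n in CHIPLETS.items():
    print(f"{part}: {n} chiplets x {L3_PER_CHIPLET_MB}MB = {total_l3_mb(n)}MB L3")
```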
Up until now, Intel has always sported small caches on their consumer
chips. They dribble out 6MB here, 8MB there, and they reserve the large
caches for their expensive Xeon behemoths… it’s a really poor showing
by Intel, frankly. Increasing IPC requires increasing the CPU cache
sizes, and possibly even adding an even larger L4 cache to the die.
This is no longer a knob that Intel can skimp on, not with AMD putting
64MB of L3 into its high-end consumer chips with Zen 2. As already
indicated, increasing cache size is one of the most important ways a
CPU can scale IPC up.
(IV) I/O Infrastructure and Bandwidth in the New World
Forget PCIe-v4, or v5, or v6… well, no. Don’t forget about them, they
are important. Just not quite as important as people are probably
thinking. The most important aspect of AMD’s X570 chipset is not the
PCIe-v4 support it has going into the PCIe connectors, it’s the 4-lanes
of (effectively) infinity-fabric (basically a rejiggered PCIe-v4) going
from the CPU to the chipset. And it is the 20 lanes of PCIe-v4 heading
out of the CPU just waiting to be fed into expanders.
Why? Because very few PCIe cards actually need PCIe-v4’s bandwidth. Even
the fancy new PCIe-v4 M.2 SSDs… it’s already overkill. What is important
here is the lane expansion that is possible, not pushing 5GBytes/sec from
a single device.
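For a sense of scale, here is a rough per-direction bandwidth calculation
in Python. It assumes the PCIe 4.0 figures of 16 GT/s per lane with
128b/130b encoding and ignores packet and protocol overhead, so real
throughput is lower. The function names are made up for this sketch.

```python
PCIE4_GT_PER_S = 16.0            # raw rate per lane, in GT/s
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line encoding

def lane_gbytes_per_s():
    """Approximate usable GBytes/sec for one PCIe 4.0 lane, one direction."""
    return PCIE4_GT_PER_S * ENCODING_EFFICIENCY / 8

def link_gbytes_per_s(lanes):
    """Approximate usable GBytes/sec for a link of the given width."""
    return lanes * lane_gbytes_per_s()

for lanes in (1, 4, 20):
    print(f"x{lanes}: {link_gbytes_per_s(lanes):.1f} GBytes/sec")
```

An x4 link lands just under 8 GBytes/sec, so a 5GBytes/sec SSD already
fits inside it, and the 20 CPU lanes together offer several times that
to spread across devices.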
Paired with this vast new I/O capability is M.2 and U.2, embodying a
wonderful new interface standard called NVMe that Intel couldn’t
intentionally hamstring like they did AHCI and SATA (in order to favor
SAS). This plus SSDs could spell doom for consumer-vs-commercial
separation of the SSD markets. For an SSD the only thing that matters
in terms of market separation is its endurance… and endurance is a lot
harder to gimmick than an interface standard. Rejoice folks! The age of
incredible I/O bandwidth has arrived and we are already being overwhelmed
(V) Intel Will Catch-up, Consumers Will Still Win
Don’t get me wrong, Intel will catch up. Eventually. But the days of
Intel’s domination of the CPU are over. TSMC is not being bankrolled by
AMD, they are being bankrolled by the likes of Apple, Google, and others.
Samsung and TSMC both have a lot to lose if they get behind. Domination
of the fabrication node is a monopoly that Intel has definitely lost.
This means that from here on out the CPU race between AMD and Intel is
going to remain relatively neck and neck. That is my belief anyhow.
Intel is quickly losing its monopoly (it will still take half a decade
for commercial market shares to equalize)… we are going to wind up
with a 50-50 split.
I would like to say that ARM is there waiting in the wings but… it
really isn’t. The fab has become equal, but CPU design is still in the
realm of the gods. It is a long, iterative process, and the ARM
architecture (even the 64-bit arch) is still way behind in many respects.
ARM is relevant, but I wouldn’t worry about a third wheel inserting itself
in the AMD-Intel competition any time soon.