Zen 2 Missives – 2019

Matthew Dillon

(I) The Socket Didn’t Have to Change

    After languishing through the CPU dark-ages (read: Intel trying real
    hard to keep people on 4 cores so they could charge an arm and a leg
    for more), the last two years have seen a veritable tsunami of advances in
    CPU technologies. AMD’s introduction of an 8-core CPU (the 1700X) began
    the rat-race and with both AMD and Intel now pushing high-core-count CPUs,
    the big winners here are us! The consumer, the power user, the
    programmer, the technically-oriented enthusiast. All of us are the
    winners of this new race.

    But all is not equal. The situation developing now is primarily related
    to power distribution and power consumption. Intel got caught with their
    pants down on multiple fronts… they got stuck on their 14nm node and
    could only produce minor improvements in power efficiency. And they also
    got stuck on a socket with power delivery capabilities a bit lower than
    they would have liked. This puts Intel in the unenviable situation of
    having to compete against AMD by introducing a higher core-count CPU
    (a 10-core) without a commensurate improvement in power efficiency, forcing
    a new socket and motherboard upgrade on the Intel world. Intel is taking
    power consumption way beyond what most people are actually going to be
    willing to push into their machines. It’s a double whammy with Intel on
    the losing end.

    AMD thought ahead. Their AM4 socket can handle tons more power. More
    importantly, AMD is now delivering, on 7nm, performance efficiencies
    that are nearly double that of Intel. This means that AMD not only does
    not have to change their socket, but their new CPUs will run just fine
    on just about ANY AM4 motherboard introduced in the last three years
    without even breaking a sweat.
    AMD is not going to have to impose a new socket until something major
    changes in the memory subsystem, such as a new memory standard that is
    incompatible with the DDR4 DIMM socket. Notice I didn’t say electrical,
    I said socket, because the CPU is more or less directly wired to
    the DIMM slots.

    It is a common misconception that getting the most out of one of these
    new Zen 2 CPUs requires a high-end X570 motherboard with beefier VRMs
    and other premium features. As it turns out, this is not true. The
    reason is that Zen 2’s power efficiency keeps even the high-end 3900X
    (and later the 3950X) within the power envelope that older B350 and
    B450 motherboards can deliver, in the case of the B450 with room to
    spare. A 3900X gets 95% of its performance
    with just 110W in the socket and even low-end B450 motherboards can put
    150W into the socket. Low-end B350 motherboards can put at least 100W
    in the socket, which is close enough.
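
    The headroom argument above is simple arithmetic. Here is a minimal
    Python sketch using only the rough wattages quoted in this section. The
    names are made up for illustration, and the numbers are the approximate
    figures from the text, not measurements of any particular board.

```python
# Approximate socket power figures quoted in the text, in watts.
# These are ballpark numbers, not measurements; names are illustrative.
SOCKET_POWER_W = {"low-end B350": 100, "low-end B450": 150}
NEEDED_FOR_95_PERCENT_W = 110  # 3900X at roughly 95% of its performance


def headroom(board_w, needed_w=NEEDED_FOR_95_PERCENT_W):
    """Return spare watts; a negative value means a shortfall."""
    return board_w - needed_w


for board, watts in SOCKET_POWER_W.items():
    print(f"{board}: {headroom(watts):+d} W versus {NEEDED_FOR_95_PERCENT_W} W")
```

    The B450 comes out about 40W over, and the B350 at most about 10W
    short, which is the "close enough" case.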

    Did I mention that you can just pop into the BIOS and set the power cap
    for the socket to whatever you want? Poor airflow? Old motherboard? It
    doesn’t matter. Well, sure, it does matter, just not as much as people
    seem to think.

(II) The Physics Has Changed with These Smaller Nodes

    When you compare the TSMC 7nm node AMD’s Zen 2 CPUs are currently on with
    Intel’s 14nm+++ (many pluses) node, you will notice some major differences
    in how the physics of the node works. Intel’s 14nm node is relatively
    temperature-agnostic. When you overclock a 9900K you can take temps
    right up to the limit, continuing to push more power into the socket
    to get those high frequencies. Zen 2 on 7nm doesn’t work this way. On
    7nm, the maximum stable frequency drops as temperature rises. And so on
    a Zen 2 system, if you increase the voltage to push frequency you also
    wind up increasing the temperature, which lowers the maximum possible
    stable frequency. In other words, you can’t just push power
    into a Zen 2 CPU to get the overclocks you want. It doesn’t work.

    On a Zen 2 system the key to overclocking is lower temps, NOT higher power.
    Well, if you want to run the bleeding edge and you hit a hard stop on temps
    then sure, you can push more power into the socket (as long as you keep
    those temps hard-stopped), but this level of overkill just doesn’t net
    a whole lot more in the performance department.
    Strangely enough this means that you don’t actually need those beefy X570
    motherboards with 10+ phases to get a decent overclock, you just need a
    good cooler. I’ll bet a lot of people will start seriously thinking about
    going sub-ambient as well (at least down to +5C, since going negative has
    severe condensation issues). But for the rest of us mere mortals, a decent
    tower air cooler gets us almost as much overclock as a good water loop.

    When I say good, what I mean is that if you really want to overclock you
    can easily get to 4.2 GHz on air without pushing massive amounts of power.
    The absolute best overclock you will ever get on Zen 2 at non-destructive
    voltages (1.3V VCORE, approximately) will be around 4.4 GHz, all-cores.
    Higher than that and it won’t be stable or you will have to push too
    much voltage. The difference between 4.2 and 4.4 GHz is only about 5%. For most
    of us, 5% isn’t worth the massive investment in time, effort, and
    cooling. Overclocking Zen 2 is easy… you just leave it at stock
    settings and maybe bump up the socket power envelope a little, run XMP
    on your memory, and you are done, and you get it on any motherboard, even
    the cheap ones.
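
    For the curious, the 5% figure is just the relative clock increase. A
    tiny Python sketch (the function name is mine, purely illustrative),
    which treats clock gain as an upper bound on performance gain, since
    real workloads rarely scale perfectly with frequency:

```python
def percent_gain(base_ghz, target_ghz):
    """Relative clock increase from base to target, in percent."""
    return (target_ghz - base_ghz) / base_ghz * 100.0


# All-core overclock on air (4.2 GHz) versus the practical ceiling (4.4 GHz)
print(f"{percent_gain(4.2, 4.4):.1f}%")  # prints 4.8%
```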

    A second major difference in the physics of the TSMC 7nm node versus
    Intel’s 14nm is the single-core boost. Whereas you can push more volts
    into a 9900K (almost) regardless of how many cores are loaded, you cannot
    do that on the smaller 7nm node. High-current all-core boosts require a
    lower voltage to avoid destroying the chip through electromigration,
    whereas a low-current single-core boost allows a higher voltage (up to
    1.45V).

    So while Intel overclockers can always look for ways to push more power
    into the socket (at least on 14nm+++*), AMD overclockers wind up between
    a rock and a hard place fairly quickly with ambient cooling. Heat matters.
    The physics are just different. This isn’t a bad thing, by the way.
    It means that very high performance systems can be built more cheaply.

    Intel will be in the same boat soon enough, because this difference in
    physics seems to be a side effect of getting smaller. Intel’s 10nm node
    is rumored to be limited to 4.1 GHz or so. And later on we may see even
    more limitations on clocks. The only way to really scale from here on
    will be with IPC and by adding cores.

    Nobody is interested in pushing 200W into their socket, which puts Intel
    at a dead-end on 14nm. They may try to push more cores on 14nm, but the
    power consumption makes it non-competitive. Remember that.

(III) Power Density, Caches, and IPC

    As nodes get smaller, power density increases. Radically. That is,
    the smaller size of the chips more than offsets the improved
    power efficiency. From an absolute performance point of view we do get
    that power efficiency. But from a power-density point of view we do not.
    The transistors are packed more tightly. The power density is heading
    up and not down.

    Higher power densities mean higher temperature gradients going from the
    transistors to the socket to the cooling solution. No amount of
    cooling can completely compensate for this gradient.

    Only lower frequencies can help here, and much larger caches. Why the
    larger caches? Because CPU caches have relatively low power densities.
    They take up chip real-estate but actually improve the overall power
    density problem. So every new architecture from here forwards is going
    to have much larger CPU caches. We’ve seen this with Zen 2 where each
    CPU chiplet (8 cores) sports 32MB of L3 cache. This means that the 3900X
    and the 3950X both have 64MB of L3 cache. A 64-core TR3 or EPYC will have
    256MB of L3 cache.
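
    The cache totals above follow directly from the chiplet count. A short
    Python sketch, assuming 32MB of L3 per chiplet as stated, two chiplets
    in the 3900X and 3950X, and eight 8-core chiplets in a 64-core part
    (the helper name is hypothetical):

```python
L3_PER_CHIPLET_MB = 32  # per Zen 2 CPU chiplet, as stated in the text


def total_l3_mb(chiplets):
    """Total L3 cache in MB for a package with the given chiplet count."""
    return chiplets * L3_PER_CHIPLET_MB


print(total_l3_mb(2))        # 3900X / 3950X, two chiplets: 64
print(total_l3_mb(64 // 8))  # 64-core part, eight chiplets: 256
```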

    Up until now, Intel has always sported small caches on their consumer
    chips. They dribble out 6MB here, 8MB there, and they reserve the large
    caches for their expensive Xeon behemoths… it’s a really poor showing
    by Intel, frankly. Increasing IPC requires increasing the CPU cache
    sizes, and possibly even adding an even larger L4 cache to the die.
    This is no longer a knob that Intel can skimp on, not with AMD putting
    64MB of L3 into its high-end consumer chips with Zen 2. As already
    indicated, increasing cache size is one of the most important ways a
    CPU can scale IPC up.

(IV) I/O Infrastructure and Bandwidth in the New World

    Forget PCIe-v4, or v5, or v6… well, no. Don’t forget about them, they
    are important. Just not quite as important as people are probably
    thinking. The most important aspect of AMD’s X570 chipset is not the
    PCIe-v4 support it has going into the PCIe connectors, it’s the 4 lanes
    of (effectively) infinity-fabric (basically a rejiggered PCIe-v4) going
    from the CPU to the chipset. And it is the 20 lanes of PCIe-v4 heading
    out of the CPU just waiting to be fed into expanders.

    Why? Because very few PCIe cards actually need PCIe-v4’s bandwidth. Even
    for the fancy new PCIe-v4 M.2 SSDs… it’s already overkill. What is important
    here is the lane expansion that is possible, not pushing 5GBytes/sec from
    a single device.
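
    To put rough numbers on that, here is a back-of-the-envelope Python
    sketch of raw link bandwidth. It assumes the standard signalling rates
    (8 GT/s per lane for PCIe-v3, 16 GT/s for PCIe-v4) with 128b/130b line
    encoding, and ignores packet and protocol overhead, so real devices
    will see somewhat less. The function name is illustrative.

```python
def pcie_gbytes_per_sec(gt_per_sec, lanes):
    """Raw one-direction bandwidth in GBytes/sec with 128b/130b encoding.

    Ignores packet and protocol overhead, so this is an upper bound.
    """
    return gt_per_sec * (128 / 130) / 8 * lanes


print(round(pcie_gbytes_per_sec(8, 4), 2))    # PCIe-v3 x4: about 3.94
print(round(pcie_gbytes_per_sec(16, 4), 2))   # PCIe-v4 x4: about 7.88
print(round(pcie_gbytes_per_sec(16, 20), 2))  # 20 lanes of v4: about 39.38
```

    A single x4 link already has room to spare over a 5GBytes/sec device,
    while the 20 CPU lanes together carry close to 40GBytes/sec, which is
    why fanning those lanes out matters more than per-device speed.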

    Paired with this vast new I/O capability is M.2 and U.2, embodying a
    wonderful new interface standard called NVMe that Intel couldn’t
    intentionally hamstring like they did AHCI and SATA (in order to favor
    SAS). This plus SSDs could spell doom for consumer-vs-commercial
    separation of the SSD markets. For an SSD the only thing that matters
    in terms of market separation is its endurance… and endurance is a lot
    harder to gimmick than an interface standard. Rejoice folks! The age of
    incredible I/O bandwidth has arrived and we are already being overwhelmed
    by it!

(V) Intel Will Catch Up, Consumers Will Still Win

    Don’t get me wrong, Intel will catch up. Eventually. But the days of
    Intel’s domination of the CPU are over. TSMC is not being bankrolled by
    AMD, they are being bankrolled by the likes of Apple, Google, and others.
    Samsung and TSMC both have a lot to lose if they fall behind. Domination
    of the fabrication node is a monopoly that Intel has definitely lost.

    This means that from here on out the CPU race between AMD and Intel is
    going to remain relatively neck and neck. That is my belief anyhow.
    Intel is quickly losing its monopoly (it will still take half a decade
    for commercial market shares to equalize)… we are going to wind up
    with a 50-50 split.

    I would like to say that ARM is there waiting in the wings but… it
    really isn’t. The fab has become equal, but CPU design is still in the
    realm of the gods. It is a long, iterative process, and the ARM
    architecture (even the 64-bit arch) is still way behind in many respects.
    ARM is relevant, but I wouldn’t worry about a third wheel inserting itself
    in the AMD-Intel competition any time soon.
