Understanding modern UEFI-based platform boot


To many, the (UEFI-based) boot process is like voodoo; interesting in that it’s something that most of us use extensively but is – in a technical-understanding sense – generally avoided by all but those that work in this space.
In this article, I hope to present a technical overview of how modern PCs boot using UEFI (Unified Extensible Firmware Interface). I won’t be mentioning every detail – honestly my knowledge in this space isn’t fully comprehensive (and hence the impetus for this article-as-a-primer).

Also, I can be taken to task for being loose with some terminology but the general idea is that by the end of this long read, hopefully both the reader – and myself – will be able to make some sense of it all and have more than just a vague inkling about what on earth is going on in those precious seconds before the OS comes up.

This work is based on a combination of info gleaned from my own daily work as a security researcher/engineer at Microsoft, public platform vendor datasheets, UEFI documentation, some fantastic presentations by well-known security researchers + engineers operating in this space, reading source code and black-box research into the firmware on my own machines.
Beyond BIOS: Developing with the Unified Extensible Firmware Interface by Vincent Zimmer et al. is a far more comprehensive resource and I’d implore you to stop reading now and go and read that for full edification (I personally paged through to find the bits interesting to me). The only original bit that you’ll find below is all the stuff that I get wrong (experts; please feel free to correct me and I’ll keep this post alive with errata).

The code/data for most of what we’re going to be discussing below resides in flash memory (usually SPI NOR). The various components are logically separated into a bunch of sections in flash; the UEFI parts are in structures called Firmware Volumes (FVs). Going into the exact layout is unnecessary for what we’re trying to achieve here (an overview of boot), so I’ve left it out.

SEC

SEC Genesis 1
1 In the beginning, the firmware was created.
2 And while it was with form, darkness was indeed upon the face of the deep.
  And the spirit of the BIOS moved upon the face of the flash memory.
3 And the Power Management Controller said, Let there be light: and there was light.
4 And SEC saw the light, that it was good: and proceeded to boot.

Platform Initialization starts at power-on. The first phase in the process is called the SEC (Security) phase.

Before we dive in though, let’s back up for a moment.

Pre-UEFI

A number of components of modern computer platform design exist that would be pertinent for us to familiarize ourselves with.

Contrary to the belief of some, there are numerous units capable of execution on a modern PC platform (usually presenting with disparate architectures).
In days past, there were three main physically separate chips on a classic motherboard – the northbridge (generally responsible for some of the perf-critical work such as the faster comms, memory controller, video), the southbridge (less-perf-critical work such as slower io, audio, various buses) and, of course, the CPU itself.
On modern platforms, the northbridge has been integrated on to the CPU die (IP blocks of which are termed the ‘Uncore’ by Intel, ‘Core’ being the main CPU IP blocks) leaving the southbridge; renamed the PCH (Platform Controller Hub) by Intel – something we’re just going to refer to as the ‘chipset’ here (to cover AMD as well).
Honestly, exactly which features are on the CPU die and which on the chipset die is somewhat fluid and is prone to change generationally (in SoC-based chips both are present on the same die; ‘chiplet’ approaches have separate dies but share the same chip substrate, etc).

Regardless, the pertinent piece of information here is that we have one unit capable of execution – the CPU that has a set of features, and another unit – the chipset that has another set of supportive features.
The CPU is naturally what we want to get up and running such that we can do some meaningful work, but the chipset plays a role in getting us there – to a smaller or larger extent depending on the platform itself (and Intel + AMD take slightly different approaches here).

Let’s try get going again; but this time I’ll attempt to lie a little less: after all, SEC is a genesis but not *the* Genesis.

That honour – on Intel platforms at least – goes to a component of the chipset (PCH) called the CSME (Converged Security and Manageability Engine). A full review of the CSME is outside the scope of this work (if you’re interested, please refer to Yanai Moyal & Shai Hasarfaty’s BlackHat USA ’19 presentation on the same), but what’s relevant to us is its role in the platform boot process.

On power-on, the PMC (Power Management Controller) delivers power to the CSME (incidentally, the PMC has a ROM too – software is everywhere nowadays – but we’re not going to go down that rabbit hole). The CPU is stuck in reset and no execution is taking place over there. The CSME (which is powered by a tiny i486-like IP block), however, starts executing code from its ROM (which is immutably fused on to the chipset die). This ROM code acts as the Root-of-Trust for the entire platform. Its main purpose is to set up the i486 execution environment, derive platform keys, load the CSME firmware off the SPI flash, verify it (against a fused hash of an Intel public key) and execute it. Skipping a few steps in the initial CSME flow – eventually it gets itself to a state where it can involve itself in the main CPU boot flow (CSME Bringup phase).

Firstly the CSME implements an iTPM (integrated TPM) that can be used on platforms that don’t have discrete TPM chips (Intel calls this PTT – Platform Trust Technology). While the iTPM capabilities are invoked during the boot process (such as when Measured Boot is enabled), this job isn’t unique to the CSME and the existence of a dTPM module would render the CSME job here moot.

More important is the CSME’s role in the boot process itself. The level of CSME involvement in the initial stages of host CPU execution depends on what security features are enabled on the platform.

In the most straightforward case (no Verified or Measured Boot – modes of Intel’s Boot Guard), the CSME simply asks the PMC to bring the host CPU out of reset and boot continues with IBB (Initial Boot Block) execution as will be expounded on further below.

When Boot Guard’s Verified Boot mode is enabled, however, a number of steps take place to ensure that the TCB (Trusted Computing Base) can be extended to the UEFI firmware; a fancy way of saying that one component will only agree to execute the next one in the chain after cryptographic verification of that component has taken place (in the case of Verified Boot’s enforcement mode; if we’re just speaking Measured Boot, the verification takes place and TPM PCRs are extended accordingly, but the platform is allowed to continue to boot).

Let’s define some terms (Clark-Wilson integrity policy) because throwing academic terms into anything makes us look smart:

CDI (Constrained Data Item) – trusted data
UDI (Unconstrained Data Item) – untrusted data
TP (Transformation Procedure) – the procedure that will be applied to UDI to turn it into CDI; such as by certifying it with an IVP (Integrity Verification Procedure)

In other words, we take a block of untrusted data (UDI) which can be code/data/config/whatever, run it through a procedure (TP) in the trusted code (CDI) such that it turns that untrusted data into trusted data; the obvious method of transformation being cryptographic verification.

In other other words, trusted code verifies untrusted code and therefore that untrusted code now becomes trusted.
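To make the UDI-to-CDI idea a little more concrete, here is a minimal sketch in C. It is my own illustration and not how any real ROM does it: the `blob`, `trust_state` and `transform` names are made up, and the digest is a toy FNV-1a hash standing in for the SHA-2 plus signature checks that real firmware would use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy 64-bit FNV-1a digest. A stand-in only: NOT cryptographically secure. */
static uint64_t toy_digest(const uint8_t *data, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

typedef enum { UDI, CDI } trust_state;   /* untrusted / trusted */

typedef struct {
    const uint8_t *bytes;
    size_t         len;
    trust_state    state;
} blob;

/*
 * The TP: code that is already trusted promotes a blob from UDI to CDI,
 * but only if the blob matches the digest the trusted code expects.
 * On mismatch the blob stays UDI and must not be executed.
 */
static bool transform(blob *b, uint64_t expected_digest)
{
    assert(b != NULL && b->bytes != NULL);
    if (toy_digest(b->bytes, b->len) != expected_digest) {
        return false;
    }
    b->state = CDI;
    return true;
}
```

The point of the sketch is only the shape: the expected value lives with the already-trusted code, and nothing becomes CDI without passing through the check.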

With that in mind, the basic flow is as follows:

  1. The CSME starts execution from the reset vector of its ROM code. The ROM is assumed to be CDI from the start and hence is the Root-of-Trust
  2. The initial parts of the CSME firmware are loaded off the flash into SRAM (UDI), verified by the ROM (now becoming CDI) and executed
  3. The CPU uCode (microcode) will be loaded. This uCode is considered UDI but is verified by the CPU ROM which acts as the Root-of-Trust for the CPU
  4. Boot Guard is enabled, so the uCode will load a component called the ACM (Authenticated Code Module) (UDI) off the flash, and will, using the CPU signing key (fused into the CPU), verify it
  5. The ACM (now CDI) will request the hash of the OEM IBB signing key from the CSME. The CSME is required here as it has access to the FPF (Field Programmable Fuses) which are burned by the OEM at manufacturing time
  6. The ACM will load the IBB (UDI) and verify it using the OEM key (now CDI).

The CPU knows if Boot Guard is enabled by querying the CSME FPFs for the Boot Guard policy.

Astute readers will notice that there is a type of ‘dual root-of-trust’ going on here; rooted in both the CSME and the CPU.

(Note: I’ve purposefully left out details of how the ACM is discovered on the flash; Firmware Interface Table, etc. as it adds further unnecessary complexity for now. I’ll consider fleshing this out in the future.)

The CPU now continues to boot by executing the IBB; either unverified (in no Boot Guard scenario) or verified. We are back at the other Genesis.

Hold up for a moment (making forward progress is tough, isn’t it??)!

Let’s speak about Measured Boot here for a short moment. In its simplest configuration, this feature basically means that at every step, each component will be measured into the TPM (such that it can be attested to in the future). When Measured Boot is enabled, there is an interesting point to note:
A compromise of the CSME – in its Bringup phase – leads to a compromise of the entire TCB because an attacker controls the IBB signing keys provided to the CPU ACM. A machine that has a dTPM and doesn’t rely on the CSME-implemented iTPM for measurement, could still potentially detect this compromise via attestation. Not so when the iTPM is used (as the attacker controlling CSME potentially controls the iTPM as well).
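For anyone who hasn’t met the TPM ‘extend’ operation before, a rough sketch may help. A PCR is never written directly; the new value is a hash of the old value concatenated with the digest of whatever is being measured. The code below is an assumption-laden toy: `toy_hash` is an 8-byte FNV-1a stand-in (a real SHA-256 PCR bank uses 32-byte digests) and `pcr_extend` is my own name, not a TPM API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DIGEST_LEN 8   /* toy size; a SHA-256 PCR bank uses 32 bytes */

/* Toy FNV-1a hash written out as bytes. NOT cryptographically secure. */
static void toy_hash(const uint8_t *data, size_t len, uint8_t out[DIGEST_LEN])
{
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 0x100000001b3ULL;
    }
    for (size_t i = 0; i < DIGEST_LEN; i++) {
        out[i] = (uint8_t)(h >> (8 * i));
    }
}

/* Extend: pcr = H(pcr || H(component)). The old value is folded in, never replaced. */
static void pcr_extend(uint8_t pcr[DIGEST_LEN], const uint8_t *component, size_t len)
{
    uint8_t buf[2 * DIGEST_LEN];

    assert(pcr != NULL && component != NULL);
    memcpy(buf, pcr, DIGEST_LEN);                  /* old PCR value        */
    toy_hash(component, len, buf + DIGEST_LEN);    /* measurement digest   */
    toy_hash(buf, sizeof buf, pcr);                /* new PCR value        */
}
```

Because each value depends on everything extended before it, the final PCR commits to both the contents and the order of the measured components. Note that nothing here stops the boot; measurement only records.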

Boot Guard (Measured + Verified Boot), IBB, OBB are Intel terms. In respect of AMD, their HVB (Hardware Validated Boot) covers the boot flow in a similar fashion to Intel’s Boot Guard.
The main difference seems to be that the Root-of-Trust is rooted in the Platform Security Processor (PSP) which fills both the role of Intel’s CSME and the ACM. The processor itself is ARM-Cortex-based and sits on the CPU die itself (and not in the PCH as in Intel’s case).
The PSP firmware is still delivered on the flash; it has its own BL – bootloader which is verified from the PSP on-die ROM, analogous to CSME’s Bringup stage. The PSP will then verify the initial UEFI code before releasing the host CPU from reset. AMD also don’t speak about IBB/OBB, rather they talk about ‘segments’, each responsible for verifying the next segment.

SEC

Ok, ok we’re at Genesis for real now!

But wait (again)! What’s this IBB (Initial Boot Block) thing? Wasn’t the SEC phase the start of it all (sans the actual start of it all, as above)? All these terms aren’t confusing enough. At all.

I purposely didn’t open at the top with a ‘top down’ view of the boot verification flow – instead opting to explain organically as we move forward.
We have, however, discussed the initial stage of Verified Boot. We now understand how trust is established in this IBB block (or first segment). We can quickly recap the general Verified Boot flow:

In short, as we have already established, the ACM (which is Intel’s code), verifies the IBB (OEM code). The IBB as CDI will be responsible for verifying the OBB (OEM Boot Block) UDI to transform it into a CDI. The OBB then verifies the next code to run (which is usually the boot manager or other optional 3rd-party EFI images) – as part of UEFI Secure Boot.

So in terms of verification (with the help of CSME): uCode->ACM->IBB->OBB->whatevers-next

Generally, the IBB encapsulates the SEC + Pre-EFI Initialization (PEI) phases – the PEI FV (Firmware Volume).

(The SEC phase is named as such but has relatively little to do with actual ‘security’.)

With no Verified Boot the CPU will start executing the SEC phase from the legacy reset vector (0xfffffff0); directly off the SPI flash (the hardware has the necessary IP to implement a transparent memory-mapped SPI interface to the flash). At the reset vector, the platform can execute only in a constrained state. For example, it has no concept of crucial components such as RAM. Kind of an issue if we want to execute meaningful higher-level code. It is also in Real Mode.

As such, one of the first jobs of SEC is to switch the processor to Protected Mode (because legacy modes aren’t the funnest). It will also configure the memory available in the CPU caches into a CAR (Cache-as-RAM) ‘no-eviction mode’ – via MTRRs. This mode will ensure that any reads/writes to the CPU caches do not land up in an attempt to evict them to primary memory external to the chip. The constraint created here is that the available memory is limited to that available in the CPU caches, but this is usually quite large nowadays; the recent Ryzen 3900X chip that I acquired has a total of 70MB; more than sufficient for holding the entire firmware image in memory + extra for execution environment usages (data regions, heaps + stacks); not that this is actually done.

Another important function of SEC is to perform the initial handling of the various sleep states that the machine could have resumed from and direct to alternate boot paths accordingly. This is absolutely out of scope for our discussion (super complex) – as is anything to do with ACPI; it’s enough to know that it happens (and has a measurable impact on platform security + attack surface).

And because we want to justify the ‘SEC’ phase naming, uCode updates can be applied here.

When executing the SEC from a Verified Boot flow (i.e. after ACM verification of the IBB), it seems to me that the CPU caches must already have been set up as CAR (perhaps by the ACM?); in an ideal world the entire IBB should already be cache-memory resident (if it was read directly off the flash after passing verification, we’d potentially have a physical attack TOCTOU security issue on our hands). I’d hope that the same is true on the AMD side.

After SEC is complete, platform initialization continues with the Pre-EFI Initialization phase (PEI). Each phase requires a hand-off to the next phase which includes a set of structured information necessary for the subsequent phase to do its job. In the case of the SEC, this information includes necessary vectors detailing where the CAR is located, where the BFV (Boot Firmware Volume) can be found mapped into a processor-accessible memory region and some other bits and bobs.

PEI

PEI is comprised of the PEI Foundation – a binary that has no code dependencies, and a set of Pre-EFI Initialization Modules (PEIMs).
The PEI Foundation (PeiCore) is responsible for making sure PEIMs can communicate with each other (via the PPI – PEIM-to-PEIM Interface) and for a small runtime environment providing a number of further services (exposed via the PEI Services Table) to those PEIMs. It also dispatches (invokes/executes) the PEIMs themselves.
The PEIMs are responsible for all aspects of base-hardware initialization, such as primary memory configuration (such that main RAM becomes available), CPU + IO initialization, system bus + platform setup and the init of various other features core to the functioning of a modern computing platform (such as the all-important BIOS status code). Some of the code running here is decidedly non-trivial (for example, I’ve seen a USB stack) and I’ve observed that there are more PEIMs than one would reasonably think there should be; on my MSI X570 platform I count ~110 modules!
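Conceptually, the PPI mechanism boils down to a small database that maps a GUID to an interface pointer: one PEIM publishes an interface, another looks it up by GUID. The sketch below is a toy model of that idea only. The real PEI Services Table exposes services for installing and locating PPIs with richer descriptors and notification support; the `ppi_database`, `install_ppi` and `locate_ppi` names and signatures here are my own simplifications.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t bytes[16]; } guid;

typedef struct {
    guid  id;          /* identifies the interface        */
    void *interface;   /* opaque pointer to its functions */
} ppi_entry;

#define MAX_PPIS 32

typedef struct {
    ppi_entry entries[MAX_PPIS];
    size_t    count;
} ppi_database;

/* Producer side: a PEIM publishes an interface. Returns 0, or -1 if the table is full. */
static int install_ppi(ppi_database *db, const guid *id, void *interface)
{
    assert(db != NULL && id != NULL);
    if (db->count == MAX_PPIS) {
        return -1;
    }
    db->entries[db->count].id = *id;
    db->entries[db->count].interface = interface;
    db->count++;
    return 0;
}

/* Consumer side: another PEIM looks the interface up by GUID. NULL if absent. */
static void *locate_ppi(const ppi_database *db, const guid *id)
{
    assert(db != NULL && id != NULL);
    for (size_t i = 0; i < db->count; i++) {
        if (memcmp(db->entries[i].id.bytes, id->bytes, sizeof id->bytes) == 0) {
            return db->entries[i].interface;
        }
    }
    return NULL;
}
```

A lookup that returns NULL is the interesting case in practice: it usually means the producing PEIM hasn’t been dispatched yet, which is why dispatch order and dependencies matter.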

I’d like to briefly call out the PEIM responsible for main memory discovery and initialization. When it returns to the PEI Foundation, it provides information about the newly-available primary system memory. The PEI Foundation must now switch from the ‘temporary’ CAR memory to the main system memory. This must be done with care (from a security perspective).

PEIMs can also choose to populate sequential data structures called HOBs (Hand-Off Blocks) which include information that may be necessary to consuming code further down the boot stack (e.g. in phases post-PEI). These HOBs must be resident in main system memory.
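Since HOBs are laid out sequentially, consuming them is a matter of walking from one header to the next until the end-of-list marker. Here is a minimal, bounds-checked sketch of such a walk. The header layout (16-bit type, 16-bit length, 32-bit reserved) and the 0xFFFF end-of-list type follow the PI specification as I understand it, but the type names and the `find_hob` helper are mine, not EDK II’s.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HOB_TYPE_END_OF_LIST 0xFFFFu

typedef struct {
    uint16_t type;
    uint16_t length;     /* size of this HOB in bytes, header included */
    uint32_t reserved;
} hob_header;

/*
 * Return a pointer to the first HOB of the wanted type, or NULL.
 * 'limit' bounds the buffer so that a corrupt length field cannot
 * walk us off the end of the list.
 */
static const uint8_t *find_hob(const uint8_t *list, size_t limit, uint16_t wanted)
{
    size_t off = 0;

    assert(list != NULL);
    while (off + sizeof(hob_header) <= limit) {
        hob_header h;
        memcpy(&h, list + off, sizeof h);          /* avoids unaligned reads */
        if (h.type == wanted) {
            return list + off;
        }
        if (h.type == HOB_TYPE_END_OF_LIST) {
            break;
        }
        if (h.length < sizeof h || h.length > limit - off) {
            break;                                 /* malformed: stop walking */
        }
        off += h.length;
    }
    return NULL;
}
```

The length check is there on purpose: the HOB list is produced by one phase and consumed by another, so the consumer is better off not trusting it blindly.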

Before we progress to the next phase, I’d like to return to our topic of trust.

Theoretically, the PEI Foundation is expected to dispatch a verification check before executing any PEIM. The framework itself has no notion of how to establish trust, so it should delegate this to a set of PPIs (potentially serviced by other PEIMs). There is a chicken-and-egg issue here: if some component of the PEI phase should be responsible for establishing trust, what establishes trust in that component? This is all meaningless unless the PEI itself (or a subsection of it) is trusted.
As a reminder, though, we know that the IBB – which encapsulates the SEC+PEI (hopefully, unless the OEM has messed this up) – is verified and is trusted (CDI) when running under Verified Boot, therefore the PEI doesn’t necessarily need to perform its own integrity checks on various PEIMs; or does it? Here you can see the haze that becomes a source of confusion for OEMs implementing security around this – with all the good will in the world. If the IBB is memory resident and has been verified by the ACM and is untouched since verification, a shortcut can be taken and the PEI verifying PEIMs seems superfluous. If, however, PEIMs are loaded from flash as and when they’re needed, they need to be verified before execution and that verification needs to be rooted in the TCB already established by the ACM (i.e. the initial code that it verified as the IBB). If PEI code is XIP (eXecuted In Place), things are even worse and physical TOCTOU attacks become a sad reality.
Without a TCB established via a verified boot mechanism the PEI is self-trusted and becomes the Root-of-Trust. This is referred to as the CRTM – the Core Root of Trust for Measurement (the importance of which will become apparent when we eventually speak about Secure Boot). The PEI is measured into the TPM in PCR0 and can be attested to later on, but without a previously-established TCB, any old Joe can just replace the thing; remotely if the OEM has messed up the secure firmware update procedure or left the SPI flash writable. Oy.

Our flow is now almost ready to exit the PEI phase with the platform set up and have some ‘full-fledged’ code!

Next up is the DXE (Driver eXecution Environment) phase. Before entering DXE, PEI must perform two important tasks.
The first is to verify the DXE. In our Intel parlance, PEI (or at least the part of it responsible for trust) was part of the IBB that was verified by the Boot Guard ACM. Intel’s Boot Guard / AMD’s HVB code has already exited the picture once the IBB (Intel)/1st segment (AMD) starts executing and the OEM is expected to take over the security flow from here (eh). PEI must therefore have some component to verify and measure the OBB/next phase (of which DXE is a part).

On platforms that support Boot Guard, a PEIM (may be named BootGuardPei in your firmware) is responsible for doing this work. This PEIM registers a callback procedure to be called when the PEI phase is ready to exit. When it is called, it is expected to bring the OBB resident and verify it. The same discussion applies to the DXE as did to the PEI above regarding verification of various DXE modules (we’ll discuss what these are shortly). If the entire OBB is brought resident and verified by this PEIM, the OEM may decide to shortcut verification of each DXE module. Alternatively a part of DXE can be made CDI and that can be used to verify each module prior to execution (bringing with it all the security considerations already mentioned). Either way; yet another part of the flow where the OEM can mess things up.

The second, and final, task of the PEI is to set up and execute the DXE environment.

Anyhoo, let’s get right to DXE.

DXE

Similar to PEI, DXE consists of a DXE Foundation – the DXE Core + DXE driver dispatcher (DxeCore) and a number of DXE drivers. We can go down an entire rabbit hole around what’s available to, and exposed by, the DXE phase; yet another huge collection of code (this time I count ~240 modules in my firmware). But as we’re not writing a book, I’ll leave it up to whoever’s interested to delve further as homework.

The DXE Foundation has access to the various PEIM-populated HOBs. These HOBs include all the information necessary to have the entire DXE phase function independently of what has come before it. Therefore, nothing (other than the HOB list) has to persist once DXE Core is up and running and DXE can happily blast over whatever is left of PEI in memory.

The DXE Dispatcher will discover and execute the DXE drivers available in the relevant firmware volume. These drivers are responsible for higher-level platform initialization and services. Some examples include the setting up of System Management Mode (SMM), higher-level firmware drivers such as network, boot disks, thermal management, etc. Similar to what the PEI Foundation does for PEIMs, the DXE Foundation exposes a number of services to DXE drivers (via the DXE Services Table). These drivers are able to register (and lookup+consume) various architectural protocols covering higher-level constructs such as storage, security, RTC, etc.

DXE Core is also responsible for populating the EFI System Table which includes pointers to the EFI Boot Services Table, EFI Runtime Services Table and EFI Configuration Table.

The EFI Configuration Table contains a set of GUID/pointer pairs that correspond to various vendor tables identified by their GUIDs. It’s not really necessary to delve into these for the purposes of our discussion:

typedef struct {
  ///
  /// The 128-bit GUID value that uniquely identifies the system configuration table.
  ///
  EFI_GUID                          VendorGuid;
  ///
  /// A pointer to the table associated with VendorGuid.
  ///
  VOID                              *VendorTable;
} EFI_CONFIGURATION_TABLE;
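For what it’s worth, consuming this table amounts to a linear search by GUID (this is how an OS loader would typically find things like the ACPI tables). The sketch below redefines the types locally with standard C integer types so that it compiles outside an EDK II tree; those stand-in definitions and the `find_vendor_table` helper are mine, and a real consumer would take the array and its entry count from the EFI System Table instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-ins for the EDK II types, so the sketch is self-contained. */
typedef struct {
    uint32_t Data1;
    uint16_t Data2;
    uint16_t Data3;
    uint8_t  Data4[8];
} EFI_GUID;

typedef struct {
    EFI_GUID  VendorGuid;
    void     *VendorTable;
} EFI_CONFIGURATION_TABLE;

/* Field-by-field compare; avoids depending on struct padding. */
static int guid_equal(const EFI_GUID *a, const EFI_GUID *b)
{
    return a->Data1 == b->Data1 &&
           a->Data2 == b->Data2 &&
           a->Data3 == b->Data3 &&
           memcmp(a->Data4, b->Data4, sizeof a->Data4) == 0;
}

/* Return the vendor table registered under 'wanted', or NULL if none is. */
static void *find_vendor_table(const EFI_CONFIGURATION_TABLE *tables,
                               size_t count, const EFI_GUID *wanted)
{
    assert(wanted != NULL);
    for (size_t i = 0; i < count; i++) {
        if (guid_equal(&tables[i].VendorGuid, wanted)) {
            return tables[i].VendorTable;
        }
    }
    return NULL;
}
```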

The EFI Runtime Services Table contains a number of services that are invokable for the duration of system runtime:

typedef struct {
  ///
  /// The table header for the EFI Runtime Services Table.
  ///
  EFI_TABLE_HEADER                Hdr;

  //
  // Time Services
  //
  EFI_GET_TIME                    GetTime;
  EFI_SET_TIME                    SetTime;
  EFI_GET_WAKEUP_TIME             GetWakeupTime;
  EFI_SET_WAKEUP_TIME             SetWakeupTime;

  //
  // Virtual Memory Services
  //
  EFI_SET_VIRTUAL_ADDRESS_MAP     SetVirtualAddressMap;
  EFI_CONVERT_POINTER             ConvertPointer;

  //
  // Variable Services
  //
  EFI_GET_VARIABLE                GetVariable;
  EFI_GET_NEXT_VARIABLE_NAME      GetNextVariableName;
  EFI_SET_VARIABLE                SetVariable;

  //
  // Miscellaneous Services
  //
  EFI_GET_NEXT_HIGH_MONO_COUNT    GetNextHighMonotonicCount;
  EFI_RESET_SYSTEM                ResetSystem;

  //
  // UEFI 2.0 Capsule Services
  //
  EFI_UPDATE_CAPSULE              UpdateCapsule;
  EFI_QUERY_CAPSULE_CAPABILITIES  QueryCapsuleCapabilities;

  //
  // Miscellaneous UEFI 2.0 Service
  //
  EFI_QUERY_VARIABLE_INFO         QueryVariableInfo;
} EFI_RUNTIME_SERVICES;

These runtime services are utilized by the OS to perform UEFI-level tasks. The functionality provided by the vectors in the table above is mostly self-explanatory, e.g. the variable services are used to read/write EFI variables – usually stored in NV (non-volatile) memory – i.e. on the flash. (The Windows Boot Configuration Data (BCD) makes use of this interface for storing variable boot-time settings, for example.)

The EFI Boot Services Table contains a number of services that are invokable by EFI applications until such time as ExitBootServices() – itself an entry in this table – is called:

typedef struct {
  ///
  /// The table header for the EFI Boot Services Table.
  ///
  EFI_TABLE_HEADER                Hdr;

  //
  // Task Priority Services
  //
  EFI_RAISE_TPL                   RaiseTPL;
  EFI_RESTORE_TPL                 RestoreTPL;

  //
  // Memory Services
  //
  EFI_ALLOCATE_PAGES              AllocatePages;
  EFI_FREE_PAGES                  FreePages;
  EFI_GET_MEMORY_MAP              GetMemoryMap;
  EFI_ALLOCATE_POOL               AllocatePool;
  EFI_FREE_POOL                   FreePool;

  //
  // Event & Timer Services
  //
  EFI_CREATE_EVENT                  CreateEvent;
  EFI_SET_TIMER                     SetTimer;
  EFI_WAIT_FOR_EVENT                WaitForEvent;
  EFI_SIGNAL_EVENT                  SignalEvent;
  EFI_CLOSE_EVENT                   CloseEvent;
  EFI_CHECK_EVENT                   CheckEvent;

  //
  // Protocol Handler Services
  //
  EFI_INSTALL_PROTOCOL_INTERFACE    InstallProtocolInterface;
  EFI_REINSTALL_PROTOCOL_INTERFACE  ReinstallProtocolInterface;
  EFI_UNINSTALL_PROTOCOL_INTERFACE  UninstallProtocolInterface;
  EFI_HANDLE_PROTOCOL               HandleProtocol;
  VOID                              *Reserved;
  EFI_REGISTER_PROTOCOL_NOTIFY      RegisterProtocolNotify;
  EFI_LOCATE_HANDLE                 LocateHandle;
  EFI_LOCATE_DEVICE_PATH            LocateDevicePath;
  EFI_INSTALL_CONFIGURATION_TABLE   InstallConfigurationTable;

  //
  // Image Services
  //
  EFI_IMAGE_LOAD                    LoadImage;
  EFI_IMAGE_START                   StartImage;
  EFI_EXIT                          Exit;
  EFI_IMAGE_UNLOAD                  UnloadImage;
  EFI_EXIT_BOOT_SERVICES            ExitBootServices;

  //
  // Miscellaneous Services
  //
  EFI_GET_NEXT_MONOTONIC_COUNT      GetNextMonotonicCount;
  EFI_STALL                         Stall;
  EFI_SET_WATCHDOG_TIMER            SetWatchdogTimer;

  //
  // DriverSupport Services
  //
  EFI_CONNECT_CONTROLLER            ConnectController;
  EFI_DISCONNECT_CONTROLLER         DisconnectController;

  //
  // Open and Close Protocol Services
  //
  EFI_OPEN_PROTOCOL                 OpenProtocol;
  EFI_CLOSE_PROTOCOL                CloseProtocol;
  EFI_OPEN_PROTOCOL_INFORMATION     OpenProtocolInformation;

  //
  // Library Services
  //
  EFI_PROTOCOLS_PER_HANDLE          ProtocolsPerHandle;
  EFI_LOCATE_HANDLE_BUFFER          LocateHandleBuffer;
  EFI_LOCATE_PROTOCOL               LocateProtocol;
  EFI_INSTALL_MULTIPLE_PROTOCOL_INTERFACES    InstallMultipleProtocolInterfaces;
  EFI_UNINSTALL_MULTIPLE_PROTOCOL_INTERFACES  UninstallMultipleProtocolInterfaces;

  //
  // 32-bit CRC Services
  //
  EFI_CALCULATE_CRC32               CalculateCrc32;

  //
  // Miscellaneous Services
  //
  EFI_COPY_MEM                      CopyMem;
  EFI_SET_MEM                       SetMem;
  EFI_CREATE_EVENT_EX               CreateEventEx;
} EFI_BOOT_SERVICES;

These services are crucial for getting any OS boot loader up and running.

Trying to stick to the format of explaining the boot process via the security flow, we now need to speak about Secure Boot.

As ‘Secure Boot’ is often bandied around as the be-all and end-all of boot-time security, if you take anything away from reading this, please let it be an understanding that Secure Boot is not all that is necessary for trusted platform execution. In fact, the part it plays, while crucial, is relatively minor considering everything we’ve learned about Measured+Verified Boot.

Simply put, Secure Boot is this:

Prior to execution of any EFI application, if Secure Boot is enabled, the relevant Secure Boot-implementing DXE driver (SecureBootDXE on my machine) must verify the executable image before launching that application. This requires a number of cryptographic keys:

  • PK – Platform Key: The platform ‘owner’ (alas usually the OEM) issues a key which is written into a secure EFI variable (these variables are only updatable if the update is attempted by an entity that can prove its ownership over the variable. We won’t discuss how this works here; just know: the security around this can be meh). This key must only be used to verify the KEK
  • KEK – Key Exchange Key: One or more keys that are signed by the PK – used to update the current signature databases
  • dbx – Forbidden Signature Database: Database of entries (keys, signatures or hashes) that identify EFI executables that are blacklisted (i.e. forbidden from executing). The database is signed by the KEK
  • db – Signature Database: Database of entries (keys, signatures or hashes) that identify EFI executables that are whitelisted. The database is signed by the KEK
  • [Secure firmware update key: Outside the scope of this discussion]

For example, prior to executing any OEM-provided EFI applications or the Windows Boot Manager, the DXE code responsible for Secure Boot must first check that the EFI image either appears verbatim in the db or is signed with a key present in the db. Commercial machines often come with an OEM-provisioned PK and Microsoft’s KEK and CA already present in the db (much debate over how fair this is).
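The allow/deny decision described above can be sketched as a small policy function. This is a simplified model under loud assumptions: image hashes and signers are reduced to opaque 64-bit IDs, the signature itself is assumed to have been cryptographically checked elsewhere, and every name (`sig_db`, `image_info`, `image_allowed`) is hypothetical. The one property I’m confident is worth showing is that the forbidden database takes precedence over the allowed one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One signature database (db or dbx), reduced to two sets of opaque IDs. */
typedef struct {
    const uint64_t *image_hashes;
    size_t          n_hashes;
    const uint64_t *signer_ids;
    size_t          n_signers;
} sig_db;

typedef struct {
    uint64_t hash;        /* digest of the EFI image                   */
    uint64_t signer;      /* ID of the signing key, valid if is_signed */
    bool     is_signed;   /* ASSUMED already cryptographically checked */
} image_info;

static bool contains(const uint64_t *set, size_t n, uint64_t value)
{
    for (size_t i = 0; i < n; i++) {
        if (set[i] == value) {
            return true;
        }
    }
    return false;
}

/* Deny wins: anything matched by dbx is refused, even if db would allow it. */
static bool image_allowed(const image_info *img, const sig_db *db, const sig_db *dbx)
{
    assert(img != NULL && db != NULL && dbx != NULL);

    if (contains(dbx->image_hashes, dbx->n_hashes, img->hash)) {
        return false;
    }
    if (img->is_signed && contains(dbx->signer_ids, dbx->n_signers, img->signer)) {
        return false;
    }
    if (contains(db->image_hashes, db->n_hashes, img->hash)) {
        return true;      /* image appears verbatim in db */
    }
    if (img->is_signed && contains(db->signer_ids, db->n_signers, img->signer)) {
        return true;      /* signed by a key present in db */
    }
    return false;         /* default deny */
}
```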

Important note: Secure Boot is not designed to defend against an attacker with physical access to a machine (keys are, by design, replaceable).

BDS

The DXE phase doesn’t perform a formal hand-off to the next phase in the UEFI boot process, the BDS (Boot Device Selection) phase; rather DXE is still resident and providing both EFI Boot and EFI Runtime services to the BDS (via the tables described above).
What happens from here can be slightly different depending on what it is that we’re booting (if we’re running some simple EFI application as our end goal – we are basically done already). So let’s carry on our discussion in terms of Microsoft Windows.

As mentioned, when all the necessary DXE drivers have been executed, and the system is now ready to boot an operating system, the DXE code will attempt to launch a boot application. Boot menu entries are specified in EFI variables and EFI boot binaries are usually resident on the relevant EFI system partition. In order to discover + use the system partition, DXE must already (a) have a FAT driver loaded such that it can make sense of the file system (which is FAT-based) and (b) parse the GUID Partition Table (GPT) to discover where the system partition is on disk.
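Step (b) is less mysterious than it sounds. Once the GPT partition entry array has been read off the disk, finding the system partition means scanning for the well-known EFI System Partition type GUID (C12A7328-F81F-11D2-BA4B-00A0C93EC93B). Below is a rough sketch that works on a raw entry array already in memory; reading the GPT header, validating its CRCs and honouring the entry size it declares are all skipped, and `find_esp` is a made-up helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GPT_ENTRY_SIZE 128   /* the common size; the GPT header records the real one */

/*
 * EFI System Partition type GUID as it appears on disk: the first three
 * fields are stored little-endian, the last two byte-for-byte.
 */
static const uint8_t ESP_TYPE_GUID[16] = {
    0x28, 0x73, 0x2A, 0xC1, 0x1F, 0xF8, 0xD2, 0x11,
    0xBA, 0x4B, 0x00, 0xA0, 0xC9, 0x3E, 0xC9, 0x3B
};

static uint64_t read_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--) {
        v = (v << 8) | p[i];
    }
    return v;
}

/*
 * Scan 'count' raw partition entries. On a match, store the partition's
 * starting LBA (entry offset 32) in *first_lba and return 1; else return 0.
 */
static int find_esp(const uint8_t *entries, size_t count, uint64_t *first_lba)
{
    assert(entries != NULL && first_lba != NULL);
    for (size_t i = 0; i < count; i++) {
        const uint8_t *e = entries + i * GPT_ENTRY_SIZE;
        if (memcmp(e, ESP_TYPE_GUID, sizeof ESP_TYPE_GUID) == 0) {
            *first_lba = read_le64(e + 32);
            return 1;
        }
    }
    return 0;
}
```

With the starting LBA in hand, the FAT driver from step (a) takes over and the boot binaries can be opened as ordinary files.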

The first Windows-related code to run (ignoring any Microsoft-provided PEIMs or DXE drivers :)) is the Windows Boot Manager (bootmgrfw.efi). The Windows Boot Manager is the initial boot loader required to get Windows running. It uses the EFI Boot-time Service-provided block IO protocol to transact with the disk (such that it doesn’t need to mess around with working out how to communicate with the hardware itself). Mainly, it’s responsible for selecting the configured Windows boot loader and invoking it (but it does some other stuff like setting up device policies, checking if resuming from hibernation or recovery boot is needed, etc.).

TSL

Directly after BDS is done, we’ve got the TSL (Transient System Load) phase; a fancy way of describing the phase where the boot loader actually brings up the operating system and tears down the unnecessary parts of DXE.
In the Windows world, the Windows Boot Manager will now launch the Windows Boot Loader (winload.efi) – after performing the necessary verification (if Secure Boot is enabled).

The Windows Boot Loader is a heftier beast and performs some interesting work. In the simplest – not-caring-about-anything-security-related – flow, winload.efi is responsible for initializing the execution environment such that the kernel can execute. This includes enabling paging and setting up the Kernel’s page tables, dispatch tables, stacks, etc. It also loads the SYSTEM registry hive (read-only, I believe) and the kernel module itself – ntoskrnl.exe (and once-upon-a-time hal.dll as well). Just before passing control to the NT kernel, winload will call ExitBootServices() to tear down the boot-time services still exposed from DXE (leaving just the runtime services available). It also calls SetVirtualAddressMap() to virtualize the firmware services (i.e. informing the DXE runtime service handler of the relevant virtual address mappings).

Carrying on with our theme of attempting to understand how trusted computing is enabled (now with Windows as the focus), on a machine with Secure Boot enabled, winload will of course only agree to load any images after first ensuring they pass verification policy (and measuring the respective images into the TPM, as necessary).
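Since ‘measuring into the TPM’ comes up repeatedly, here is a minimal sketch of what an extend amounts to: a PCR can never be set directly, only replaced with the hash of its old value concatenated with the new measurement’s digest. The sketch assumes a SHA-256 PCR bank that starts as all zeros; the image byte strings are stand-ins I made up, and the details of what exactly gets hashed for each image (and into which PCR) are deliberately glossed over.

```python
import hashlib

PCR_SIZE = 32  # bytes in a SHA-256 PCR


def extend(pcr, measured_data):
    """TPM-style extend: new_pcr = H(old_pcr || H(measured_data))."""
    digest = hashlib.sha256(measured_data).digest()
    return hashlib.sha256(pcr + digest).digest()


def replay(images):
    """Fold a sequence of measured images into a single PCR value."""
    pcr = bytes(PCR_SIZE)  # assumed reset value: all zeros
    for image in images:
        pcr = extend(pcr, image)
    return pcr


# Placeholder contents standing in for the real binaries.
boot_chain = [b"bootmgfw image", b"winload image", b"ntoskrnl image"]
tampered = [b"bootmgfw image", b"winload image (patched)", b"ntoskrnl image"]

print(replay(boot_chain).hex())
print(replay(boot_chain) == replay(tampered))
```

Because each value depends on everything extended before it, a verifier that knows the expected images can recompute the final value, while changing or reordering any link in the chain yields a different result.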

I’d encourage all Windows 10 users to enable ‘Core Isolation’ (available in the system Settings). This will enable HVCI (Hypervisor-enforced Kernel-mode code integrity) on the system; in turn meaning that the Microsoft Hyper-V hypervisor platform will run, such that VBS (Virtualization-based Security) features enabled by VSM (Virtual Secure Mode) will be available. In this scenario winload is responsible for bringing up the hypervisor, securekernel, etc. but that requires a separate post on its own (and others have done justice to it anyway).

RT

The kernel will now perform its own initialization and set up things just right – loading drivers, etc. – taking us to the stage most people identify as being the ‘end of boot’. The only EFI services still available to the OS are the EFI Runtime Services which the OS will invoke as necessary (e.g. when reading/writing UEFI variables, shutdown, etc.). This part of the UEFI life-cycle is termed the RT (RunTime).

SRTM/DRTM and rambling thoughts

We should now have a rudimentary understanding of the general boot flow.

I do want to back up a bit though and again discuss the verified boot flow and where the pitfalls can lie. Hopefully one can see how relatively complex this all is, and ensuring that players get everything correct is often a challenge. Everything that we’ve discussed until now is part of what we term the SRTM (Static Root-of-Trust for Measurement) flow. This basically means that, from the OS’s perspective, all parts of the flow up until it, itself, boots form part of the TCB (Trusted Computing Base).

Let’s dissect this for a moment.

The initial trust is rooted in the CPU+chipset vendor. In Intel’s case, we have the CPU ROM and the CSME ROM as joint roots-of-trust. Ok, Intel, AMD et al. are pretty well versed in security stuff after all – perhaps we’re happy to trust they have done their jobs here (history says not; but it is getting better with time and hey, we’ve got to trust someone). But once the ACM verifies the IBB, we have moved responsibility to OEM vendor code. Now I’ll be charitable here and say that often this code is godawful from a security perspective. There is a significant amount of code (just count the PEIMs and DXE drivers) sourced from all over the place and often these folk simply don’t have the security expertise to implement things properly. The OEM-owned IBB measures the OEM-owned OBB which measures the OS bootloader. We might trust the OS vendor to also do good work here (again, not foolproof) but we have this black hole of potential security pitfalls for OEM vendors to collapse into. And if UEFI is compromised, it doesn’t matter how good the OS bootloader verification flows are. Basically this thing is only as good as its weakest link; and that, traditionally, has been UEFI.

Let’s identify some SRTM pitfalls.
Starting with the obvious: if the CPU ROM or CSME/PSP ROMs are compromised, everything falls apart (same is of course true with DRTM, described below). I wish I could say that there aren’t issues here with specific vendors, but that would be disingenuous.
Let us assume for now the CPU folk have gotten their act together. We now land ourselves in the IBB or some segment verified by the ACM/PSP. The first pitfall is that the OEM vendor needs to actually present the correct parts of the firmware for verification by the ACM. Sometimes modules are left out and are just happily executed by the PEI (as they’ve also short-circuited PEIM verification). Worse, the ACM requires an OEM key to verify the IBB (hence why it needs the CSME in the first place) – some OEMs haven’t burned in their key properly, or are using test keys, or haven’t set the EOM (End of Manufacturing) fuse, allowing carte blanche attacks against this (and even worse, lack of OEM action here can actually lead to these security features being hijacked to protect malicious code itself). OEMs need to be wary about making sure that when PEI switches over to main system memory a TOCTOU attack isn’t opened up by re-reading modules off SPI and assuming they are trusted. Furthermore, for verified boot to work, there needs to be some PEI module responsible for verifying DXE but if the OEM has stuffed up and the IBB isn’t verified properly at all, then this module can be tampered with and the flow falls apart. Oh and this OEM code could do the whole verification bit correctly and simply do an ‘I’ll just execute this code-that-failed verification and ask it to reset the platform because I’ll tell it that it, itself, failed verification’ (oh yes, this happened). And there are all those complex flows that I haven’t spoken about at all – e.g. resuming from sleep states and needing to protect the integrity of saved contexts correctly. Also, just enabling the disparate security features seems beyond some OEMs – even basic ones like ‘don’t allow arbitrary runtime writes to the flash’ are ignored.
Getting this correct requires deep knowledge of the space. For example, vendors have traditionally not considered the evil maid attack in scope and TOCTOU attacks have been demonstrated against the verification of both the IBB and OBB. Carrying on though, let’s assume that PEI is implemented correctly, what about the DXE module responsible for things down line? Has that been done correctly? Secure Boot has its own complexities, what with its key management and only allowing modification of authenticated UEFI variables by PK owners, etc. etc. I’m not going to go into every aspect of what can go wrong here but honestly we’ve seen quite a lot of demonstrable issues historically.
(To be clear, above I’m speaking about SRTM in terms of what usually goes on in most Windows-based machines today. There are SRTM schemes that do a lot better there – e.g. Google’s Titan and Microsoft’s Cerberus, in which a separate component is interposed between the host/chipset processors + the firmware residing on SPI.)
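The TOCTOU pitfall mentioned above (verify one read of a module, then re-read it off SPI and execute that) is easy to show with a toy model. Everything here is hypothetical: RacyFlash simulates an attacker (say, an interposer on the SPI bus) who serves good contents on the first read and malicious contents afterwards, and load_racy/load_safe are names I’ve invented for the two loader styles. It’s a sketch of the bug class, not of any vendor’s actual code.

```python
import hashlib


class RacyFlash:
    """Toy SPI flash whose contents are swapped after the first read."""

    def __init__(self, good, evil):
        self._good, self._evil = good, evil
        self.reads = 0

    def read(self):
        self.reads += 1
        return self._good if self.reads == 1 else self._evil


def load_racy(flash, expected_hash):
    # BAD: the bytes that get verified are not the bytes that get used.
    if hashlib.sha256(flash.read()).digest() != expected_hash:
        raise ValueError("verification failed")
    return flash.read()  # second, unverified read


def load_safe(flash, expected_hash):
    # BETTER: read once into memory we control, then verify and use that copy.
    image = flash.read()
    if hashlib.sha256(image).digest() != expected_hash:
        raise ValueError("verification failed")
    return image


good, evil = b"trusted PEIM", b"implant"
expected = hashlib.sha256(good).digest()

print(load_racy(RacyFlash(good, evil), expected))
print(load_safe(RacyFlash(good, evil), expected))
```

The racy loader passes verification and still ends up handing back the attacker’s bytes; the safe one only ever returns what it actually checked.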

So folk got together and made an attempt to come up with a way to take the UEFI bits out of the TCB. Of course this code still needs to run; but we don’t really want to trust it for the purposes of verifying and loading the operating system. So DRTM (Dynamic Root-of-Trust for Measurement) was invented. In essence what this is, is a way to supply potentially multiple pieces of post-UEFI code for verification by the folk that we trust (more than the folk we trust less :)) – i.e. the CPU/chipset vendors (Intel/AMD et al). Instead of just relying on the Secure Boot flow which relies on the OEMs having implemented that properly, we just don’t care what UEFI does (it’s assumed compromised in the model). Just prior to executing the OS, we execute another ACM via special SENTER/SKINIT instructions (rooted in the CPU Root-of-Trust just like we had with Boot Guard verifying the IBB). This time we ask this ACM to measure a piece of OS-vendor (or other vendor) code called an MLE (Measured Launch Environment) – all measurements extended into PCRs of the TPM of course such that we can attest to them later on. This MLE – after verification by the ACM – is now trusted and can measure the various OS components etc. – bypassing trust in EFI.
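A toy model may help show why the dynamic measurements don’t depend on what ran beforehand. It rests on assumptions that I’m stating loudly: as I understand the TCG PC Client profile, the DRTM PCRs (17 and up) sit at all 0xFF after power-on and are only reset to zeros by the CPU-initiated launch, whereas static PCRs start at zero. ToyTpm and its methods are invented for illustration, the real launch measures more than just the MLE, and the toy is deliberately generous in letting firmware extend the dynamic PCR at all.

```python
import hashlib

STATIC_PCR = 0    # extended by firmware from power-on (SRTM)
DYNAMIC_PCR = 17  # reset only by the dynamic launch (DRTM)


def extend(pcr, data):
    return hashlib.sha256(pcr + hashlib.sha256(data).digest()).digest()


class ToyTpm:
    def __init__(self):
        # Assumption: static PCRs start as zeros, dynamic ones as all 0xFF.
        self.pcrs = {STATIC_PCR: bytes(32), DYNAMIC_PCR: b"\xff" * 32}

    def firmware_extend(self, index, data):
        # Anything the (possibly compromised) firmware chooses to measure.
        self.pcrs[index] = extend(self.pcrs[index], data)

    def dynamic_launch(self, mle):
        # Models the SENTER/SKINIT path: reset to zeros, then measure the MLE.
        self.pcrs[DYNAMIC_PCR] = extend(bytes(32), mle)


mle = b"measured launch environment"

clean = ToyTpm()
clean.firmware_extend(STATIC_PCR, b"good DXE")
clean.dynamic_launch(mle)

infected = ToyTpm()
infected.firmware_extend(STATIC_PCR, b"evil DXE")
infected.dynamic_launch(mle)

forged = ToyTpm()
forged.firmware_extend(DYNAMIC_PCR, mle)  # firmware pretending to be the launch

print(clean.pcrs[DYNAMIC_PCR] == infected.pcrs[DYNAMIC_PCR])
print(clean.pcrs[DYNAMIC_PCR] == forged.pcrs[DYNAMIC_PCR])
```

In this model the dynamic PCR comes out the same regardless of which DXE code ran, while a plain extend from the non-zero starting value can’t reproduce it. That says nothing about other avenues open to compromised firmware; it only illustrates the measurement side.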

Now here’s my concern:

I’ve heard folk get really excited about DRTM – and rightly so; it’s a step forward in terms of security. However I’d like to speak about some potential issues with the ‘DRTM solves all’ approach. My main concern is that we stop understanding that compromised UEFI can still possibly be damaging to practical security – even in a DRTM-enabled world. SMM is still an area of concern (although there are incoming architectural features that will help address this). But even disregarding SMM, the general purpose operating systems that most of the world’s clients + servers run on were designed in an era before our security field matured. Security has been tacked-on for years with increasing complexity. In a practical sense, even our user-modes still execute code that is privileged in the sense of being able to effect the exact damage on a targeted system that attackers are happy to live with (not to forget that our kernel-modes are pretty permissive as well). Remember, attackers don’t care about security boundaries or domains of trust; they just want to do what they need to do.

As an example, in recent history an actor known as Sednit/Strontium achieved a high degree of persistence on machines by installing DXE implants on platforms that hadn’t correctly secured programmatic write access to the SPI flash. Enabling Secure Boot is ineffectual as it only cares about post-DXE; compromised DXE means compromised Secure Boot. Enabling Measured/Verified Boot could *possibly* have helped in this particular case – if we trust the current UEFI code to do its job – but the confidence of that probably isn’t high being that these platform folk didn’t even disable write access to the SPI (and we’ve seen Boot Guard rendered ineffectual via DXE manipulation – something Sednit would have been able to do here). So let us assume that Sednit would have been able to get their DXE modules running even under Verified Boot. Anyway, the firmware implants attacked the system by mounting the primary NTFS partition, writing their malicious code to the drive, messing around with the SYSTEM registry hive to ensure that their service is loaded at early boot and… done! That’s all that’s necessary to compromise a system such that it can persist over an FnR (Format and Restore).
(BitLocker can also help defend against this particular type of attack in an SRTM flow; but that’s assuming that it’s enabled in the first place – and the CRTM is doing its job – and it’s not configured in an auto unlock mode).

Let’s take this scenario into a DRTM world. UEFI ‘Secure Boot’ compromise becomes somewhat irrelevant with DRTM – the MLE can do the OS verification flow; so that’s great. An attacker can no longer unmeasurably modify the OS boot manager, boot loader, kernel (code regions, at least), securekernel, hv with impunity. These are all very good things. But here’s the thing: UEFI is untrusted in the DRTM model, so in the threat model we assume that the attacker would be able to run their DXE code – like today. They can do that very same NTFS-write to get their user-mode service onto the NTFS volume. Under a default Windows setup – sans a special application control / device guard policy – Windows will happily execute that attacker service (code-signing is completely ineffectual here – that’s a totally broken model for Windows user-mode applications).
Honestly though, a machine that enabled a DRTM flow would hopefully enable BitLocker which should be more effectual here as I’d expect that the DRTM measurements are required before unsealing the BitLocker decryption key (I’m not exactly sure of the implementation details around this), but I wonder to what extent compromised UEFI could mess around with this flow; perhaps by extending the PCRs itself with the values expected from DRTM measurements, unsealing the key, writing its stuff to the drive, reboot? Or, more complexly, launching its own hypervisor to execute the rest of the Windows boot flow under (perhaps trapping SENTER/SKINIT, et al.??). I’d need to give some thought as to what’s possible here, but given the complexity surrounding all this it’s not out of the realm of possibility that compromised UEFI can still be damaging.
Now, honestly, in terms of something Sednit-like writing an executable to disk, a more restrictive (e.g. state-separated) platform that is discerning about what user- and kernel-mode code it lets run (and with which privileges) might benefit significantly from DRTM – although it seems likely one can effect huge damage on an endpoint via corrupting settings files alone – no code exec required; something like how we view data-corruption via DKOM, except this time ‘DCM’ (Direct Configuration Manipulation)?? from firmware.

DRTM is *excellent*, I’m just cautioning against assuming that it’s an automatic ‘fix-all’ today for platform security issues. I feel more industry research around this may be needed to empirically verify the bounds of its merits.

I hope you enjoyed this ‘possibly-a-little-more-than-a-primer’ view into UEFI boot and some of the trust model that surrounds it. Some hugely important, related things I haven’t spoken about include SMM, OROM and secure firmware updates (e.g. Intel BIOS Guard); topics for another time. Please let me know if you found this useful at all (it did take a non-insignificant amount of time to write). Am always happy to receive constructive feedback too.

I’d like to end off with a few links to some offensive research work done by some fantastic folk that I’ve come across over the years – showing what can be done to compromise this flow (some of which I’ve briefly mentioned above). If you know of any more resources, please send them my way and I’ll happily extend this list:
