I posted a proposed countermeasure for the Meltdown and Spectre attacks to the freebsd-security mailing list last night. Having slept on it, I believe the reasoning is sound, but I still want to get input on it.
MAJOR DISCLAIMER: This is an idea I only just came up with last night, and it still needs to be analyzed. There may very well be flaws in my reasoning here.
Countermeasure: Non-Cacheable Sensitive Assets
The crux of the countermeasure is to move sensitive assets (e.g., keys, passwords, cryptographic state, etc.) into a separate memory region, and to mark that region non-cacheable using MTRRs, or equivalent functionality on other architectures. For now, I’ll assume that the rationale for why this should work holds; it is laid out below.
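On x86, MTRRs work on physical address ranges, and a single variable-range MTRR can only describe a naturally aligned, power-of-two-sized region of at least 4 KiB. Processors also provide only a small number of variable ranges (reported in IA32_MTRRCAP), so the sensitive region would realistically be one contiguous block. The sketch below, assuming x86-64 and a known MAXPHYADDR, computes the IA32_MTRR_PHYSBASEn / IA32_MTRR_PHYSMASKn values that would mark such a region uncacheable (memory type UC). `mtrr_encode_uc` is a hypothetical helper; writing the MSRs requires kernel privileges and is not shown.

```c
#include <assert.h>
#include <stdint.h>

#define MTRR_TYPE_UC        0x00ULL      /* memory type field, bits 7:0 of PHYSBASE */
#define MTRR_PHYSMASK_VALID (1ULL << 11) /* valid bit of PHYSMASK */
#define MTRR_MIN_SIZE       4096ULL      /* smallest variable range: 4 KiB */

/* Hypothetical helper: compute the IA32_MTRR_PHYSBASEn / IA32_MTRR_PHYSMASKn
 * values that would mark the physical range [base, base + size) uncacheable.
 * Returns 0 on success, or -1 if the range cannot be described by a single
 * variable-range MTRR. Writing the MSRs (wrmsr) is kernel work, not shown. */
static int mtrr_encode_uc(uint64_t base, uint64_t size, unsigned maxphyaddr,
                          uint64_t *physbase, uint64_t *physmask)
{
    assert(maxphyaddr > 12 && maxphyaddr < 64);
    uint64_t addr_mask = (1ULL << maxphyaddr) - 1;

    /* The size must be a power of two, at least 4 KiB, and fit in the
     * physical address space. */
    if (size < MTRR_MIN_SIZE || (size & (size - 1)) != 0 || size - 1 > addr_mask)
        return -1;
    /* The base must be aligned to the size and be a valid physical address. */
    if ((base & (size - 1)) != 0 || (base & ~addr_mask) != 0)
        return -1;

    *physbase = base | MTRR_TYPE_UC;
    *physmask = (~(size - 1) & addr_mask) | MTRR_PHYSMASK_VALID;
    return 0;
}
```

For example, a 4 KiB region at physical address 0x100000 on a CPU with a 36-bit MAXPHYADDR encodes as PHYSBASE = 0x100000 and PHYSMASK = 0xFFFFFF800. A 12 KiB region, or a base that is not aligned to the region size, is rejected.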
This approach has two significant downsides:
- It requires modifying applications, and it’s susceptible to information leaks from careless programming, sensitive assets that get missed, old code, and other such problems.
- It drastically increases the cost of accessing sensitive assets (every access becomes a main-memory access), which is especially punitive if you end up using sensitive-asset storage as scratchpad space.
The upside of the approach is that it’s compatible with a move toward storing sensitive assets in secure memory or in special devices, such as a TPM, or the flash device I suggested in a previous post.
Programmatically, I could see this looking like a kind of “malloc with attributes” interface, one of the attributes being something like “sensitive”. I’ll save the API design for later, though.
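To make the “malloc with attributes” idea concrete, here is a minimal userspace sketch, assuming POSIX `mmap`/`munmap`. The names `attr_malloc`, `attr_free`, and `ATTR_SENSITIVE` are hypothetical and not an existing FreeBSD interface. Sensitive allocations get their own page-aligned mapping, separate from the ordinary heap, and are scrubbed on free; actually marking those pages non-cacheable needs kernel support and is only indicated by a comment.

```c
#define _DEFAULT_SOURCE /* expose MAP_ANONYMOUS on glibc in strict modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical attribute flags for this sketch. */
enum { ATTR_NONE = 0, ATTR_SENSITIVE = 1 << 0 };

/* Header kept in front of each block so attr_free knows how to release it;
 * the max_align_t member keeps the returned pointer suitably aligned. */
union attr_hdr {
    struct {
        size_t total;   /* size of the underlying allocation, header included */
        unsigned flags; /* attributes requested at allocation time */
    } h;
    max_align_t align;
};

void *attr_malloc(size_t size, unsigned flags)
{
    union attr_hdr *hdr;
    size_t total;

    if (size > SIZE_MAX - sizeof(union attr_hdr))
        return NULL;
    total = sizeof(union attr_hdr) + size;

    if (flags & ATTR_SENSITIVE) {
        /* Sensitive assets get their own page-aligned mapping, never shared
         * with ordinary heap data. A kernel-assisted version would also mark
         * these pages non-cacheable (MTRR/PAT); that step is not shown. */
        long page = sysconf(_SC_PAGESIZE);
        if (page <= 0 || total > SIZE_MAX - (size_t)page)
            return NULL;
        total = (total + (size_t)page - 1) / (size_t)page * (size_t)page;
        void *p = mmap(NULL, total, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return NULL;
        hdr = p;
    } else {
        hdr = malloc(total);
        if (hdr == NULL)
            return NULL;
    }
    hdr->h.total = total;
    hdr->h.flags = flags;
    return hdr + 1;
}

void attr_free(void *ptr)
{
    if (ptr == NULL)
        return;
    union attr_hdr *hdr = (union attr_hdr *)ptr - 1;
    assert((hdr->h.flags & ~(unsigned)ATTR_SENSITIVE) == 0); /* sanity check */

    if (hdr->h.flags & ATTR_SENSITIVE) {
        size_t total = hdr->h.total;
        /* Scrub through a volatile pointer so the stores are not elided. */
        volatile unsigned char *q = (volatile unsigned char *)hdr;
        for (size_t i = 0; i < total; i++)
            q[i] = 0;
        munmap(hdr, total);
    } else {
        free(hdr);
    }
}
```

Keeping the attribute in a flags word leaves room for other attributes later without changing the call signature.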
To explain why this should work, I’ll borrow the term “transient operations” to refer to any instruction, or portion thereof, that is executed but whose effects will eventually be cancelled due to a fault or a mispredicted branch. In architecture terminology, this is called “squashing” the operation. The rationale hinges on three assumptions about how any processor pipeline, and any potential side-channel, must work:
- Execution must obey dependency relationships (I can’t magically acquire data for a dependent computation unless it’s cached somewhere).
- Data which never reaches the CPU core as input to a transient operation cannot make it into any side-channel.
- CPU architects will squash all transient operations as quickly as possible once a fault or mispredicted branch is discovered, so as to free execution units for use on other speculative branches.
The Meltdown attack depends on executing transient operations that consume data loaded from a protected address, injecting information into a side-channel before the fault is detected. The cache and TLB states are critical to this process. For this analysis, assume the cache is virtually indexed (the physically indexed case is covered below). The outcomes break down by whether the given location is in the cache and in the TLB:
- Cache Hit, TLB Hit: You have a race between the TLB and the cache returning. TLBs are typically smaller than caches, so the translation is unlikely to come back after the cache access, and the fault is detected almost immediately.
- Cache Hit, TLB Miss: You have a race between a page-table walk (potentially thousands of cycles) and a cache hit. You get the data back and have a long time to execute transient operations on it before the fault is signalled. This is the main case Meltdown exploits.
- Cache Miss, TLB Hit: The cache fill operation strongly depends on address translation, which signals a fault almost immediately.
- Cache Miss, TLB Miss: The cache fill operation strongly depends on address translation, which signals a fault after a page-table walk. You’re stalled for potentially thousands of cycles, but you cannot fetch the data from memory until the address translation completes.
Note that both cache-miss cases defeat the attack. Since non-cacheable memory can never produce a cache hit, every access to it falls into one of those two cases, so storing sensitive assets in non-cacheable memory should prevent the attack. If the cache is physically indexed, every lookup depends on address translation, and therefore on fault detection, so you’re still safe.
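The case analysis above can be captured in a toy timing model. This is a sketch of the argument, not of any real microarchitecture: the latency constants are illustrative assumptions chosen only to respect the orderings the reasoning relies on (the TLB is no slower than the cache, and a page walk or memory fill is far slower), and `transient_window` is a hypothetical name. It returns how many cycles the loaded value is usable by transient operations before the fault is signalled.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative latencies in cycles: assumptions, not measurements. */
enum {
    LAT_TLB = 1,
    LAT_CACHE = 4,
    LAT_WALK = 1000,
    LAT_MEM = 300,
};

/* Toy model of a load from a protected address. Returns the number of cycles
 * during which the value is available to transient operations before the
 * fault is signalled; 0 means no window, so nothing can leak. */
static int transient_window(bool cache_hit, bool tlb_hit, bool phys_indexed)
{
    /* The fault is known as soon as address translation completes. */
    int fault_at = tlb_hit ? LAT_TLB : LAT_WALK;
    int data_at;

    if (cache_hit && !phys_indexed)
        data_at = LAT_CACHE;            /* virtually indexed hit: in this model,
                                           independent of translation */
    else if (cache_hit)
        data_at = fault_at + LAT_CACHE; /* physically indexed: lookup waits for
                                           translation */
    else
        data_at = fault_at + LAT_MEM;   /* a fill needs the physical address */

    return data_at < fault_at ? fault_at - data_at : 0;
}

/* Non-cacheable memory can never produce a cache hit. */
static int transient_window_uncacheable(bool tlb_hit, bool phys_indexed)
{
    return transient_window(false, tlb_hit, phys_indexed);
}
```

With these constants, only the cache-hit/TLB-miss case on a virtually indexed cache yields a window (996 cycles); every other combination, and every access to non-cacheable memory, yields 0.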
A New Attack?
In my original posting to the FreeBSD lists, I had evidently misunderstood the Spectre attack, and ended up inventing a whole new attack (!!). This attack is also defeated by the non-cacheable memory trick. It works as follows:
- Locate code in another address space that can potentially be used to gather information about sensitive data
- Pre-load registers to force the code to access sensitive information
- Jump to the code, causing your branch predictors to soak up data about the sensitive information
- When the fault kicks you back out, harvest information from branch predictors
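This is defeated by non-cacheable storage as well, by the same reasoning: the sensitive data cannot reach the core before the fault is detected, so it never reaches the branch predictors.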
Aside: this really is a whole new class of attack.
High-Probability Defense Against the Spectre Attack
The actual Spectre attack relies on inducing speculative execution in another process so that it produces cache effects detectable from within the attacker’s process. Non-cacheable storage is not an absolute defense against this; however, it does defeat the attack with very high probability.
The reasoning here is that any branch of speculative execution is highly unlikely to last longer than a full main memory access (which is possibly thousands of cycles). Branch mispredictions in particular will likely last a few dozen cycles at most, and execution of the mispredicted branch will almost certainly be squashed before the data actually arrives. Thus, it can’t make it into any side-channel.
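The timing argument above can be written as a single condition (the symbols are introduced here for illustration, not taken from any measurement): a speculative branch that issues a load of sensitive data at time $t_{\text{issue}}$ and is squashed at time $t_{\text{squash}}$ can only pass the value into a side-channel if the data arrives first.

$$
t_{\text{issue}} + t_{\text{mem}} < t_{\text{squash}}
\iff
t_{\text{mem}} < t_{\text{squash}} - t_{\text{issue}}
$$

For non-cacheable storage, $t_{\text{mem}}$ is a full main-memory access, while the right-hand side is bounded by the lifetime of the mispredicted branch (a few dozen cycles, by the estimate above), so the condition is expected to fail in almost every case.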
This is a potential defense against side-channel attacks based on speculative execution. It works by restoring the dependency between fault detection and memory access to sensitive assets, and by imposing a general access delay on sensitive assets.
It prevents any speculative branch that will eventually fault from accessing sensitive information, since that access necessarily depends on fault detection. This defeats attacks that rely on speculatively executing transient operations on this data before the fault is detected.
This also defeats attacks which observe side-channels manipulated by speculative branches in other processes that can be made to access sensitive data, as the delay makes it extremely unlikely that data will arrive before the branch is squashed.
Assuming the reasoning here is sound, I plan to start implementing this for FreeBSD immediately. This defense is also a rather coarse mechanism, which repurposes MTRRs to effectively mark data as “not to be speculatively executed”. A new architecture such as RISC-V could design more refined mechanisms. Finally, the API for this defense ought to be designed to provide a general mechanism for storing sensitive assets in “a secure location”, which could be non-cacheable memory, a TPM, a programmed flash device, or something else.
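One way to keep that API general is a small backend interface that each kind of “secure location” implements. The sketch below is hypothetical: `struct secure_store_ops`, `mem_ops`, and the plain-memory backend are illustrative names, and the only backend implemented keeps data in an ordinary buffer as a stand-in for non-cacheable memory, a TPM, or a flash device.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical backend interface: each "secure location" (non-cacheable
 * memory, a TPM, a programmed flash device, ...) supplies these operations. */
struct secure_store_ops {
    const char *name;
    int (*put)(void *ctx, const void *data, size_t len);
    int (*get)(void *ctx, void *out, size_t len); /* returns bytes or -1 */
    void (*erase)(void *ctx);
};

struct secure_store {
    const struct secure_store_ops *ops;
    void *ctx; /* backend-private state */
};

/* Stand-in backend: a fixed buffer in ordinary memory. A non-cacheable
 * backend would keep this buffer in uncacheable pages; a TPM or flash
 * backend would forward the calls to the device instead. */
struct mem_ctx {
    unsigned char buf[64];
    size_t len;
};

static int mem_put(void *ctx, const void *data, size_t len)
{
    struct mem_ctx *m = ctx;
    if (len > sizeof m->buf)
        return -1;
    memcpy(m->buf, data, len);
    m->len = len;
    return 0;
}

static int mem_get(void *ctx, void *out, size_t len)
{
    struct mem_ctx *m = ctx;
    if (len < m->len)
        return -1; /* caller's buffer is too small */
    memcpy(out, m->buf, m->len);
    return (int)m->len;
}

static void mem_erase(void *ctx)
{
    struct mem_ctx *m = ctx;
    /* Scrub through a volatile pointer so the stores are not elided. */
    volatile unsigned char *p = m->buf;
    for (size_t i = 0; i < sizeof m->buf; i++)
        p[i] = 0;
    m->len = 0;
}

static const struct secure_store_ops mem_ops = {
    "plain-memory", mem_put, mem_get, mem_erase
};
```

Callers hold a `struct secure_store` and go through its `ops`, so switching from non-cacheable memory to a TPM or flash backend would not change the calling code.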