A Potential Meltdown/Spectre Countermeasure

I posted a proposed countermeasure for the meltdown and spectre attacks to the freebsd-security mailing list last night.  Having slept on it, I believe the reasoning is sound, but I still want to get input on it.

MAJOR DISCLAIMER: This is an idea I only just came up with last night, and it still needs to be analyzed.  There may very well be flaws in my reasoning here.

Countermeasure: Non-Cacheable Sensitive Assets

The crux of the countermeasure is to move sensitive assets (i.e. keys, passwords, crypto state, etc.) into a separate memory region, and mark that region non-cacheable using MTRRs or equivalent functionality on other architectures.  I’ll assume for now that the rationale for why this should work will hold.

This approach has two significant downsides:

  1. It requires modification of applications, and it’s susceptible to information leaks from careless programming, missing sensitive assets, old code, and other such problems.
  2. It drastically increases the cost of accessing sensitive assets (every access becomes a main-memory access), which is especially punitive if sensitive asset storage ends up being used as scratchpad space.

The upside of the approach is that it’s compatible with a move toward storing sensitive assets in secure memory or in special devices, such as a TPM, or the flash device I suggested in a previous post.

Programmatically, I could see this looking like a kind of “malloc with attributes” interface, one of the attributes being something like “sensitive”.  I’ll save the API design for later, though.
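
As a rough illustration of the shape this could take, here is a minimal sketch in C.  Everything in it (the names malloc_attr and MALLOC_ATTR_SENSITIVE, and the fallback implementation) is hypothetical; a real implementation would back sensitive allocations with pages marked non-cacheable.

    #include <stdlib.h>

    /* Hypothetical allocation attributes; MALLOC_ATTR_SENSITIVE would request
     * backing pages from a region marked non-cacheable via MTRRs (or the
     * equivalent mechanism on other architectures). */
    #define MALLOC_ATTR_NONE       0x00u
    #define MALLOC_ATTR_SENSITIVE  0x01u

    void *
    malloc_attr(size_t size, unsigned int attrs)
    {
            /* Placeholder: fall back to ordinary malloc().  A real
             * implementation would satisfy MALLOC_ATTR_SENSITIVE from the
             * non-cacheable region described above. */
            (void)attrs;
            return malloc(size);
    }

    /* Example use: keep a long-lived disk encryption key out of the caches. */
    static unsigned char *
    alloc_master_key(size_t keylen)
    {
            return malloc_attr(keylen, MALLOC_ATTR_SENSITIVE);
    }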

Rationale

In this rationale, I’ll borrow the terminology “transient operations” to refer to any instruction or portion thereof which is being executed, but whose effects will eventually be cancelled due to a fault.  In architecture terminology, this is called “squashing” the operation.  The rationale for why this will work hinges on three assumptions about how any processor pipeline and potential side-channel necessarily must work:

  1. Execution must obey dependency relationships (I can’t magically acquire data for a dependent computation, unless it’s cached somewhere)
  2. Data which never reaches the CPU core as input to a transient operation cannot make it into any side-channel
  3. CPU architects will squash all transient operations as quickly as possible once a fault or mispredicted branch is discovered, so as to recover execution units for use on other speculative branches

Defeating Meltdown

The meltdown attack depends on being able to execute transient operations that depend on data loaded from a protected address in order to inject information into a side-channel before a fault is detected.  The cache and TLB states are critical to this process.  For this analysis, assume the cache is virtually-indexed (see below for the physically-indexed cache case).  Break down the outcomes based on whether the given location is in cache and TLB:

  • Cache Hit, TLB Hit: You have a race between the TLB and cache lookups coming back.  TLBs are typically smaller, so the translation is unlikely to come back after the cache access.  The fault is detected almost immediately.
  • Cache Hit, TLB Miss: You have a race between a page-table walk (potentially thousands of cycles) and a cache hit.  This means you get the data back, and have a long time to execute transient operations.  This is the main case for meltdown.
  • Cache Miss, TLB Hit: The cache fill operation strongly depends on address translation, which signals a fault almost immediately.
  • Cache Miss, TLB Miss: The cache fill operation strongly depends on address translation, which signals a fault after a page-table walk.  You’re stalled for potentially thousands of cycles, but you cannot fetch the data from memory until the address translation completes.

Note that both cache-miss cases defeat the attack.  Thus, storing sensitive assets in non-cacheable memory should prevent the attack.  Now, if your cache is physically-indexed, then every lookup depends on an address translation, and therefore on fault detection, so you’re still safe.
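
To make the dependency argument concrete, here is an illustrative (deliberately non-functional) sketch of the transient sequence the attack relies on; the probe array and kernel address are stand-ins, not working exploit code.  If the secret resides in non-cacheable memory, the first load can only be satisfied by a fill operation that waits on address translation, so the fault is signalled before the dependent probe access can execute.

    #include <stdint.h>
    #include <stddef.h>

    /* Probe array whose cache footprint encodes the leaked byte; it would be
     * flushed from the cache before each attempt. */
    static volatile uint8_t probe[256 * 4096];

    /* Stand-in for a protected kernel address; the load below will fault. */
    static volatile const uint8_t *kernel_ptr =
        (volatile const uint8_t *)0xffffffff80000000UL;

    static void
    transient_sequence(void)
    {
            /* This load faults, but if the target line is already cached the
             * value can be forwarded to the dependent load below before the
             * fault is recognized. */
            uint8_t secret = *kernel_ptr;

            /* Transient use of the secret: which line of probe[] ends up
             * cached encodes the value, forming the side-channel. */
            (void)probe[(size_t)secret * 4096];
    }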

A New Attack?

In my original posting to the FreeBSD lists, I had evidently misunderstood the Spectre attack, and ended up making up a whole new attack (!!).  This attack is still defeated by the non-cacheable memory trick.  The attack works as follows:

  1. Locate code in another address space which can potentially be used to gather sensitive information
  2. Pre-load registers to force the code to access sensitive information
  3. Jump to the code, causing your branch predictors to soak up data about the sensitive information
  4. When the fault kicks you back out, harvest information from branch predictors

This is defeated by the non-cacheable store as well, by the same reasoning.

Aside: this really is a whole new class of attack.

High-Probability Defense Against the Spectre Attack

The actual Spectre attack relies on causing speculative execution in another process to produce cache effects that are detectable within our process.  The non-cacheable store is not an absolute defense against this; however, it does defeat the attack with very high probability.

The reasoning here is that any branch of speculative execution is highly unlikely to last longer than a full main memory access (which is possibly thousands of cycles).  Branch mispredictions in particular will likely last a few dozen cycles at most, and execution of the mispredicted branch will almost certainly be squashed before the data actually arrives.  Thus, it can’t make it into any side-channel.

Summary

This is a potential defense against speculative-execution-based side-channel attacks.  It works by restoring the dependency between fault detection and memory accesses to sensitive assets, at the cost of a general access delay for those assets.

This blocks any speculative branch which will eventually cause a fault from accessing sensitive information, since doing so necessarily depends on fault detection.  This has the effect of defeating attacks that rely on speculative execution of transient operations on this data before the fault is detected.

This also defeats attacks which observe side-channels manipulated by speculative branches in other processes that can be made to access sensitive data, as the delay makes it extremely unlikely that data will arrive before the branch is squashed.

Future

Assuming the reasoning here is sound, I plan to start working on implementing this for FreeBSD immediately.  Additionally, this defense is a rather coarse mechanism which repurposes MTRRs to effectively mark data as “not to be speculatively accessed”.  A new architecture, such as RISC-V, could design more refined mechanisms.  Finally, the API for this defense ought to be designed so as to provide a general mechanism for storage of sensitive assets in “a secure location”, which can be non-cacheable memory, a TPM, a programmed flash device, or something else.

Meltdown/Spectre and Implications for OS Security

This week has seen the disclosure of a devastating vulnerability: meltdown and its sister vulnerability, spectre.  Both are a symptom of a larger problem that has evidently gone unnoticed for the better part of 50 years of computer architecture: the potential for side-channels in processor pipeline designs.

Aside: I will confess that I am rather ashamed that I didn’t notice this, despite having a background in all the knowledge areas that should have revealed it.  It took me all of three sentences of the paper before I realized what was going on.  Then again, this somehow got by every other computer scientist for almost half a century.  The conclusion seems to be that we are, all of us, terrible at our jobs…

A New Class of Attack, and Implications

I won’t go into the details of the attacks themselves; there are already plenty of good descriptions of that, not the least of which is the paper.  My purpose here is to analyze the broader implications, with particular focus on the operating systems security perspective.

To be absolutely clear, this is a very serious class of vulnerabilities.  It demands immediate and serious attention from both hardware architects and OS developers, not to mention people further up the stack.

The most haunting thing about this disclosure is that it suggests the existence of an entire new class of attack, of which the meltdown/spectre attacks are merely the most obvious.  The problem, which has evidently been lurking in processor architecture design for half a century, has to do with two central techniques in processor design:

  1. The fact that processors tend to enhance performance by making commonly-executed things faster.  This is done by a whole host of features: memory caches, trace caches, and branch predictors, to name the more common techniques.
  2. The fact that processor design has followed a general paradigm of altering the order in which operations are executed, whether by overlapping multiple instructions (pipelining), executing multiple instructions in parallel (superscalar), executing them out of order and then reconciling the results (out-of-order), or executing multiple possible outcomes and keeping only one result (speculative).

The security community is quite familiar with the impacts of (1): it is the basis for a host of side-channel vulnerabilities.  Much of the work in computer architecture deals with implementing the techniques in (2) while maintaining the illusion that everything is executed one instruction at a time.  CPUs are carefully designed to preserve this illusion with respect to the processor state defined by the ISA, tracking which operations depend on which and keeping track of the multiple possible worlds that arise from speculation.  What evidently got by us is that the side-channels provided by (1) give an attacker the ability to violate the illusion of executing one instruction at a time.

What this almost certainly means is that this is not the last attack of this kind we will see.  Out-of-order speculative execution happens to be the most straightforward to attack.  However, I can sketch out ways in which even in-order pipelines could potentially be vulnerable.  When we consider more obscure architectural features such as trace-caches, branch predictors, instruction fusion/fission, and translation units[0], it becomes almost a certainty that we will see new attacks in this family published well into the future.

A complicating factor in this is that CPU pipelines are extraordinarily complicated, particularly these days.  (I recall a tech talk I saw about the SPARC processor, and its scope was easily as large as the FreeBSD core kernel, or the HotSpot virtual machine.)  Moreover, these designs are typically closed.  There are a lot of different ways a core can be designed, even for a simple ISA, and we can expect entire classes of these design decisions to be vulnerable.

 

[0]: The x86 architecture tends to have translation units that are significantly more complex than the cores themselves, and are likely to be a nest of side-channels.

OS Security

The implications of this for OS (and even application) security are rather dismal.  The one saving virtue in this is that these attacks are read-only: they can be used to exfiltrate data from any memory immediately accessible to the pipeline, but it’s a pretty safe assumption that they cannot be used to write to memory.

That boon aside, this is devastating to OS security.  As a kernel developer, there aren’t any absolute defenses that we can implement without a complete overhaul of the entire paradigm of modern operating systems and major revisions to the POSIX specification.  (Even then, it’s not certain that this would work, as we can’t know how the processors implement things under the hood.)

Therefore, OS developers are left with a choice among partial solutions.  We can make the attacks harder, but we cannot stop them in all cases.  The only perfect defense is to replace the hardware, and hope nobody discovers a new side-channel attack.

Attack Surface and Vulnerable Assets

The primary function of these kinds of attacks is to exfiltrate data across various isolation boundaries.  The following is the most general description of the capabilities of the attack, as far as I can tell:

  1. Any data that can be loaded into a CPU register can potentially influence the execution of some number of operations before a fault is detected
  2. These execution patterns can affect the performance of future operations, giving rise to a side-channel
  3. These side-channels can be read through various means (note that this need not be done by the same process)

This gives rise to the ability to violate various isolation boundaries (though only for the purposes of observing state):

  • Reads of kernel/hypervisor memory from a guest (aka “meltdown”)
  • Reads of another process’ address space (aka “spectre”)
  • Reads across intra-process isolation mechanisms, such as different VM instances (this attack is not named, but constitutes, among other things, a perfect XSS attack)

A salient feature of these attacks is that they are somewhat slow: as currently described, the attack will incur 2n + 1 cache faults to read out n bits (not counting setup costs).  I do not expect this to last, however.

The most significant danger of this attack is the exfiltration of secret data, particularly small and long-lived data objects.  Examples of key assets include:

  • Master keys for disk encryption (example: the FreeBSD GELI disk encryption scheme)
  • TLS session keys
  • Kerberos tickets
  • Cached passwords
  • Keyboard input buffers, or textual input buffers

More transient information is also vulnerable, though there is an attack window.  The most vulnerable assets are those keys which necessarily must remain in memory and unencrypted for long periods of time.  The best example of this I can think of is a disk encryption key.

Imperfect OS Defense Mechanisms

Unfortunately, operating systems are limited in their ability to adequately defend against these attacks, and moreover, many of these mechanisms incur a performance penalty.

Separate Kernel and User Page Tables

The solution adopted by the Linux KAISER patches is to unmap most of the kernel from user space, keeping only essential CPU metadata and the code to switch page tables mapped.  Unfortunately, this requires a TLB flush for every system call, incurring a rather severe performance penalty.  Additionally, this only protects from access to a kernel address; it cannot stop accesses to other process address spaces or crossing isolation boundaries within a process.

Make Access Attempts to Kernel Memory Non-Recoverable

An idea I proposed on the FreeBSD mailing lists is to cause attempted memory accesses to a kernel address range to result in an immediate cache flush followed by non-recoverable process termination.  This avoids the cost of separate kernel and user page tables, but is not a perfect defense in that there is a small window of time in which the attack can be carried out.

Special Handling of Sensitive Assets

Another potential middle-ground is to handle sensitive kernel assets specially, storing them in a designated location in memory (or better yet, outside of main memory).  If a main-memory store is used, this range alone can be mapped and unmapped when crossing into kernel space, thus avoiding most of the overhead of a TLB flush.  This would only protect assets that are stored in this manner, however; anything else (most notably the stacks for kernel threads) would remain vulnerable.

Userland Solutions

Userland programs are even less able to defend against such attacks than the kernel; however, there are architectural considerations that can be made.

Avoid Holding Sensitive Information for Long Periods

As with kernel-space sensitive assets, one mitigation mechanism is to avoid retaining sensitive assets for long periods of time.  An example of this is GPG, which keeps the user’s keys decrypted only for a short period of time (usually 5 minutes) after they request to use a key.  While this is not perfect, it limits attacks to a brief window, and gives users the ability to adopt practices which ensure that they shut off other means of attack during this window.

Minimize Potential for Remote Execution

While this only works on systems that are effectively single-user, minimizing the degree to which remote code can execute on one’s system closes many vulnerabilities.  This is obviously difficult in the modern world, where most websites pull in JavaScript from dozens if not hundreds of sources, but it can be done with browser plugins like NoScript.

JIT/Interpreter Mitigations

As JIT compilers and interpreters are a common mechanism for limited execution of remote code, they are in a position to attempt to mitigate many of these attacks (there is an LLVM patch out to do this).  This is once again an imperfect defense, as they are on the wrong side of Rice’s theorem.  There is no general mechanism for detecting these kinds of attacks, particularly where multiple threads of execution are involved.

Interim Hardware Mitigations

The only real solution is to wait for hardware updates which fix the vulnerabilities.  This takes time, however, and we would like some interim solution.

One possible mitigation strategy is to store and process sensitive assets outside of main memory.  HSMs and TPMs are an example of this (though not one I’m apt to trust), and the growth of the open hardware movement offers additional options.  One option in particular, which I may put into effect myself, is to use a small FPGA device (such as the PicoEVB) programmed with an open design to serve this role.

More generally, I believe this incident highlights the value of these sorts of hardware solutions, and I hope to see increased interest in developing open implementations in the future.

Summary

The Meltdown and Spectre attacks, severe as they are by themselves, represent the beginning of something much larger.  A new class of attack has come to the forefront, which enables data exfiltration in spite of hardware security mechanisms, and we can and should expect to see more vulnerabilities of this kind in the future.  What makes these vulnerabilities so dangerous is that they directly compromise hardware protection mechanisms on which OS security depends, and thus there is no perfect defense at the OS level against them.

What can and should be done is to adapt and rethink security approaches at all levels.  At the hardware level, a paradigm shift is necessary to avoid vulnerabilities of this kind, as is the consideration that hardware security mechanisms are not as absolute as has been assumed.  At the OS level, architectural changes and new approaches can harden an OS against these vulnerabilities, but they generally cannot repel all attacks.  At the user level, similar architectural changes as well as minimization of attack surfaces can likewise reduce the likelihood and ease of attack, but cannot repel all such attacks.

As a final note, the trend in information security has been an increasing focus on application security, and particularly web application security.  This event, as well as the fact that a relatively simple vulnerability managed to evade detection by the entire field of computer science for half a century, strongly suggests that systems security is not at all the solved problem it is commonly assumed to be, and that new thinking and new approaches are needed in this area.

Design of a Trust System for FreeBSD

About a month ago, I started a discussion on freebsd-hackers and freebsd-security about a system for signed executables, with a focus on signed kernels and kernel modules.  This is part of a larger agenda of mine to equip FreeBSD with OS-level tamper resistance features.

The initial use of this is for signing the kernel and its modules, and checking signatures during the loader process as well as at runtime when kernel modules are loaded.  However, it is desirable to build a system that is capable of growing in likely directions, such as executable and library signing.

This article details the current state of the design of this system.

Desiderata

I originally outlined a number of goals for this system:

  1. Be able to check for a correct cryptographic signature for any kernel or modules loaded at boot time for some platforms (EFI at a minimum)
  2. Be able to check for a correct cryptographic signature for any kernel module loaded during normal operations (whether or not to do this could be controlled by a sysctl, securelevel, or some similar mechanism)
  3. Work with what’s in the base system already and minimize new additions (ideally, just a small utility to sign executables)
  4. Minimize administrative overhead and ideally, require no changes at all to maintain signed kernel/modules
  5. Have a clear path for supporting signed executables/libraries.
  6. The design must support the case where a system builds locally and uses its own key(s) for signing kernels and modules (and anything else), and must allow the administrator complete control over which key(s) are valid for a given system (i.e. no “master keys” controlled by central organizations)
  7. The design must allow for the adoption of new ciphers (there is an inevitable shift to post-quantum ciphers coming in the near future)

I also specified a number of non-goals:

  • Hardware/firmware-based attacks are considered out-of-scope (there is no viable method for defending against them at the OS level)
  • Boot platforms that don’t provide their own signature-checking framework up to loader/kernel can’t be properly secured, and are considered out-of-scope
  • Boot platforms that impose size restrictions prohibiting incorporation of RSA and Ed25519 crypto code (ex. i386 BIOS) are considered out-of-scope
  • GRUB support is desirable, however it is not necessary to support GRUB out-of-the-box (meaning a design requiring reasonable modifications to GRUB is acceptable)

Considerations

There are several considerations that should weigh in on the design.

FreeBSD Base System

Unlike Linux, FreeBSD has a base operating system which contains a number of tools and libraries providing a core set of operating system utilities.  Most notably, the base system contains the OpenSSL (or in some cases, LibreSSL) crypto suite.  This includes an encryption library as well as tools capable of creating and managing key-pairs and other cryptographic data in a variety of formats.

The FreeBSD base system also contains libelf, a library that provides mechanisms for manipulating ELF binaries, as well as the binutils suite, including objcopy, a command-line tool capable of manipulating ELF binaries.

Note that only some of these components (namely the signelf tool) exist at present; the rest exist only as man pages that describe them.

Public-Key Cryptography

The FreeBSD kernel does not currently incorporate code for public-key cryptography, and direct incorporation of OpenSSL into the kernel has proven infeasible.  Additionally, parsing code needs to be incorporated into the kernel for any formats that are used.  Options here include incorporation of code from the NaCl library, which provides a very lightweight implementation of both RSA 4096 and Ed25519, as well as creating a minimal library out of code harvested from OpenSSL or LibreSSL.

A note on elliptic curve cryptography: the state of support for safe elliptic curves is sad.  In my drafts of the man pages, I have mandated that the only acceptable curves are those that satisfy the security properties described by the SafeCurves project.  At this time, these include M-221, E-222, Curve1174, Curve25519, E-382, M-383, Curve383187, Curve41417, Goldilocks-448, M-511, and E-521.  Unfortunately, none of these is supported by OpenSSL at this time, though Curve25519 support is supposedly coming soon.  However, I would prefer to write specs that mandate the right curves (and thus put pressure on crypto libraries) than cave to using bad ones.

Modifications to GRUB

GRUB provides the best option for FreeBSD coreboot support at this time.  It also provides an existing mechanism for signing binaries.  However, this mechanism is deficient in two ways.  First, it relies on external signatures, which would complicate administration and require modification of virtually all installer programs, as well as run the risk of stale signatures.  Second, it relies on the gnupg toolset, which is not part of the FreeBSD base system.  Thus, it is inevitable that GRUB will need to be patched to support the signed executables proposed by this design.  However, we should make efforts to keep the necessary changes as minimal as possible.

Signing and Trust System Design

The signing and trust system consists of a number of components, some of which are standards, some of which are interfaces, and some of which are tools.  The core feature, of course, is the signed ELF convention.  The signelf tool provides a one-stop tool for signing large numbers of executables.  The trust system provides a system-level mechanism for registering and maintaining verification keys that are used to check signatures on kernel modules.  Finally, the portable verification library provides a self-contained code package that can be dropped into the kernel, the loader, or a third-party codebase like GRUB.

Note that this design is not yet implemented, so it may be subject to change.  Also, it has not yet undergone review on the FreeBSD lists, so it should be considered more of a proposal.

Signed ELF Binaries

The ELF format is very flexible, and provides a generic mechanism for storing metadata.  The signed ELF convention utilizes this to store signatures in a special section within the binary itself.  A signed ELF binary contains a section named .sign, which contains a detached PKCS#7 signature in DER encoding for the file.  This signature is computed (and checked) on the entire file, with the .sign section itself being replaced by zero data of equal size and position.

Signing an ELF binary is somewhat involved, as it requires determining the size of a signature, creating a new section (along with its name), recomputing the ELF layout, computing the signature, and writing it into the section.  Checking a signature is considerably simpler: it involves merely copying out the signature, overwriting the .sign section with zeros, and then checking the signature against the entire file.
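
As a sketch of the checking side, assuming the caller has already located the .sign section (e.g. with libelf) and loaded the trusted certificates, verification might look roughly like this with OpenSSL.  The function name and parameters here are illustrative, not the actual signelf code.

    #include <string.h>
    #include <openssl/bio.h>
    #include <openssl/pkcs7.h>
    #include <openssl/x509.h>

    /* image/len: the whole file read into memory (modified in place here).
     * sig_off/sig_len: position and size of the .sign section.
     * signers: candidate signer certificate(s); store: trusted roots. */
    static int
    verify_signed_elf(unsigned char *image, size_t len,
        size_t sig_off, size_t sig_len,
        STACK_OF(X509) *signers, X509_STORE *store)
    {
            const unsigned char *p = image + sig_off;
            PKCS7 *p7;
            BIO *content;
            int ok;

            /* Parse the detached PKCS#7 signature out of the .sign section. */
            p7 = d2i_PKCS7(NULL, &p, (long)sig_len);
            if (p7 == NULL)
                    return (0);

            /* The signature covers the file with .sign itself zeroed out. */
            memset(image + sig_off, 0, sig_len);

            content = BIO_new_mem_buf(image, (int)len);
            ok = PKCS7_verify(p7, signers, store, content, NULL, 0);

            BIO_free(content);
            PKCS7_free(p7);
            return (ok == 1);
    }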

The PKCS#7 format was chosen because it is an established standard which supports detached signatures as well as many other kinds of data.  The signatures generated for signed ELF files are minimal and do not contain certificates, attributes, or other data (a signature for RSA-4096 is under 800 bytes); however, the format is extensible enough to embed other data, allowing for future extensions.

The signelf Tool

Signed ELF binaries can be created and checked by adroit usage of the objcopy and openssl command-line tools.  This is quite tedious, however.  Moreover, there are certain use cases that are desirable, like signing a batch of executables using an ephemeral key, discarding the key, and generating a certificate for verification.  The signelf tool is designed to be a simplified mechanism for signing batches of executables which provides this additional functionality.  It is a fairly straightforward use of libelf and OpenSSL, and should be able to handle the binaries produced by normal compilation.  Additionally, the signelf tool can verify signed ELF files.  The signelf code is currently complete, and works on a kernel as well as modules.

The Trust System

In order to check signatures on kernel modules (and anything else), it is necessary to establish and maintain a set of trusted verification keys in the kernel (as well as in the boot loader).  In order for this system to be truly secure, at least one trust root key must be built into the kernel and/or the boot loader, which can then be used to verify other keys.  The trust system refers to the combination of kernel interfaces, standard file locations, and conventions that manage this.

System Trust Keys and Signing Keys

The (public) verification keys used to check signatures, as well as the (private) signing keys used to generate signatures, are kept in the /etc/trust/ directory.  Verification keys are stored in /etc/trust/certs in the X509 certificate format, and private keys are stored in /etc/trust/keys in the usual private key format.  Both are stored in the PEM encoding (as is standard with many OpenSSL applications).

There is no requirement as to the number, identity, or composition of verification or signing keys.  Specifically, there is not and will never be any kind of mandate for any kind of verification key not controlled by the owner of the machine.  The trust system is designed to be flexible enough to accommodate a wide variety of uses, from machines that only trust executables built locally, to ones that trust executables built on an in-house machine only, to those that trust executables built by a third party (such as the FreeBSD foundation), or any combination thereof.

The preferred convention, however, is to maintain a single, per-machine keypair which is then used to sign any additional verification keys.  This keypair should be generated locally for each machine, and never exported from the machine.

Trust Keys Library

During the buildworld process, keys under /etc/trust/certs will be converted into C code constants and compiled into a static library providing the raw binary data for the keys.  This provides the mechanism for building keys into the kernel, loader, and other components.  These keys are known as trust root keys, as they provide the root set for all trusted keys.
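
As a purely illustrative sketch (the names and layout are hypothetical), the generated code might look something like this, with one array per certificate and a table that the kernel, loader, or other consumers can iterate over:

    #include <stddef.h>

    struct trust_root_key {
            const char          *trk_name;  /* originating file name */
            const unsigned char *trk_data;  /* PEM-encoded X509 certificate */
            size_t               trk_len;
    };

    /* Generated from /etc/trust/certs/local.pem during buildworld. */
    static const unsigned char trust_root_local[] =
            "-----BEGIN CERTIFICATE-----\n"
            /* ...certificate body elided... */
            "-----END CERTIFICATE-----\n";

    const struct trust_root_key trust_root_keys[] = {
            { "local.pem", trust_root_local, sizeof(trust_root_local) - 1 },
    };
    const size_t trust_root_nkeys =
        sizeof(trust_root_keys) / sizeof(trust_root_keys[0]);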

Kernel Trust Interface

The kernel trust interface provides access to the set of verification keys trusted by the kernel.  This consists of an in-kernel interface as well as a user-facing device interface.  The in-kernel interface looks like an ordinary key management system (KMS) interface.  The device interface provides two primary mechanisms: access to the current set of trusted keys and the ability to register new keys or revoke existing ones.

Access to the existing database is accomplished through a read-only device node which simply outputs all of the existing trusted keys in PEM-encoded X509 format.  This formatting allows many OpenSSL applications to use the device node itself as a CA root file.  Updating the key database is accomplished by writing to a second device node.  Writing an X509 certificate signed by one of the existing trusted keys to this device node will cause the key contained in the certificate to be added to the trusted key set.  Writing a certificate revocation list (CRL) signed by a trusted key to the device node will revoke the keys in the revocation list as well as any keys whose signature chains depend on them.  Trust root keys cannot be revoked, however.

This maintains the trusted key set in a state where any trusted key has a signature chain back to a trust root key.
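
As a small example of the first mechanism, an OpenSSL application could point its verification store directly at the read-only node, just as it would at a normal CA bundle file.  The path used below is a placeholder, since no device node name has been fixed by this design yet.

    #include <openssl/ssl.h>

    /* Use the (hypothetical) trusted-key device node the same way an
     * application would use a normal CA bundle file. */
    static int
    use_system_trust(SSL_CTX *ctx)
    {
            return SSL_CTX_load_verify_locations(ctx, "/dev/trust/certs", NULL);
    }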

Portable Verification Library

The final piece of the system is the portable verification library.  This library should resemble a minimal OpenSSL-like API that performs parsing/encoding of the necessary formats (PKCS#7, X509, CRL), or a reduced subset thereof, as well as public-key signature verification.  I have not yet decided whether to create this by harvesting code from OpenSSL/LibreSSL or to write it from scratch (with code from NaCl), but I’m leaning toward harvesting code from LibreSSL.

Operation

The trust system performs two significant roles in the system as planned, and can be expanded to do more things in the future.  First, it ensures that the loader only loads kernels and modules that are signed.  Second, it can serve as a kind of system-wide keyring (hence the device node that looks like a typical PEM-encoded CA root file for OpenSSL applications).  The following is an overview of how it would operate in practice.

Signature Checking in the loader

In an EFI environment, boot1.efi and loader.efi have a chain of custody provided by the EFI secure boot framework.  This is maintained from boot1.efi to loader.efi, because of the use of the EFI loaded image interface.  The continuation of the chain of custody must be enforced directly by loader.efi.  To accomplish this, loader will link against the trust key library at build time to establish root keys.  These in turn can either be used to check the kernel and modules directly, or they can be used to check a per-kernel key (the second method is recommended; see below).

Per-Kernel Ephemeral Keys

The signelf utility was designed with the typical kernel build process in mind.  The kernel and all of its modules reside in a single directory; it’s a simple enough thing to run signelf on all of them as the final build step.  Additionally, signelf can generate an ephemeral key for signing and write out the verification certificate after it finishes.

This gives rise to a use pattern where every kernel is signed with an ephemeral key, and a verification certificate is written into the kernel directory.  This certificate is in turn signed by the local trust root key (signelf does this as part of the ephemeral key procedure).  In this case, the loader first attempts to load the verification certificate for a kernel, then it loads the kernel and all modules.

Signed Configuration Files

The FreeBSD loader relies on several files such as loader.4th, loader.conf, loader.menu, and others that control its behavior in significant ways.  Additionally, one can foresee applications of this system that rely on non-ELF configuration files.  For loader, the simplest solution is to store these files as non-detached PKCS#7 messages (meaning, the message and file contents are stored together).  Thus, loader would look for loader.conf.pk7, loader.4th.pk7, and so on.  A loader built for secure boot would look specifically for the .pk7 files, and would require signature verification in order to load them.

The keybuf Interface

The kernel keybuf interface was added in a patch I contributed in late March 2017.  It is used by GELI boot support to pass keys from the boot phases to the kernel.  However, it was designed to support up to 64 distinct 4096-bit keys without modification; thus it can be used with RSA-4096.  An alternative to linking the trust key library directly into the kernel is to have it receive the trusted root key as a keybuf entry.

This approach has advantages and disadvantages.  The advantage is it allows a generic kernel to be deployed to a large number of machines without rebuilding for each machine.  Specifically, this would allow the FreeBSD foundation to publish a kernel which can make use of local trust root keys.  The primary disadvantage is that the trust root keys are not part of the kernel and thus not guaranteed by the signature checking.  The likely solution will be to support both possibilities as build options.

Key Management

The preferred scheme for trust root keys is to have a local keypair generated on each machine, with the local verification certificate serving as the sole trust root key.  Any vendor keys that might be used would be signed by this keypair and loaded as intermediate keys.  Every kernel build would produce an ephemeral key which would be signed by the local keypair.  Kernel builds originating from an organization would also be signed by an ephemeral key, whose certificate is signed by the organization’s keypair.  For example, the FreeBSD foundation might maintain a signing key, which it uses to sign the ephemeral keys of all kernel builds it publishes.  An internal IT organization might do the same.

It would be up to the owner of a machine whether or not to trust the vendor keys originating from a given organization.  If the keys are trusted, then they are signed by the local keypair.  However, it is always an option to forego all vendor keys and only trust locally-built kernels.

An alternate use might be to have no local signing key, and only use an organizational trust root key.  This pattern is suitable for large IT organizations that produce lots of identical machines off of a standard image.

Conclusion

This design for the trust system and kernel/module signing is a comprehensive system-wide public-key trust management system for FreeBSD.  Its initial purpose is managing a set of keys that are used to verify kernels and kernel modules.  However, the system is designed to address the issues associated with trusted key management in a comprehensive and thorough way, and to leave the door open to many possible uses in the future.

Slides from My IEEE SecDev Talk

I gave a talk at IEEE SecDev on Nov 3 about my vision for how to combine industrial programming language pragmatics with formal methods.  The slides can be found here.

This was a 5-minute talk, but I will be expanding it into a 30-minute talk with more content.

InfoSec and IoT: A Sustainability Analogy

Yesterday saw a major distributed denial-of-service (DDoS) attack against DNS infrastructure that crippled internet access for much of the northeastern US, as well as other areas.  These sorts of attacks are nothing new; in fact, this one came on the anniversary of a similar attack fourteen years ago.  Yesterday’s attack is nonetheless significant, both in its scope and in the role the growing internet of things (IoT) played in it.

The attack was facilitated by the Mirai malware suite, which specifically targets insecure IoT devices, applying a brute-force password attack to gain access to the machines and deploy its malware.  Such an attack would almost certainly fail if directed against machines with appropriate security measures in place and on which passwords had been correctly set.  IoT devices, however, often lack such protections, are often left with their default login credentials, and often go unpatched (after all, who among even the most eager adopters of IoT can say that they routinely log in to every lightbulb in their house to change the passwords and download patches?).  Yesterday, we saw the negative consequences of the proliferation of these kinds of devices.

Public Health and Pollution Analogies

Industry regulation, whether self-imposed or imposed by the state, is a widely-accepted practice among modern societies.  The case for this practice lies in the reality that some actions are not limited in their effect to oneself and one’s customers, but rather have a tangible effect on the entire world.  Bad practices in these areas lead to systemic risks that threaten even those who have nothing to do with the underlying culprits.  In such a situation, industry faces a choice of two options, one of which will eventually come to pass: self-regulate, or have regulations imposed from without.

Two classic examples of such a situation come in the form of public health concerns and environmental pollution.  Both of these have direct analogs to the situation we now face with insecure IoT devices and software (in)security in the broader context.

IoT and Pollution

After the third attack yesterday, I posted a series of remarks on Twitter that gave rise to this article, beginning with “IoT is the carbon emissions of infosec. Today’s incident is the climate change analog. It won’t be the last”.  I went on to criticize the current trend of gratuitously deploying huge numbers of “smart” devices without concern for the information security implications.

The ultimate point I sought to advance is that releasing huge numbers of insecure, connected devices into the world is effectively a form of pollution, and it has serious negative impacts on information security for the entire internet.  We saw one such result yesterday in the form of one of the largest DDoS attacks and the loss of internet usability for significant portions of the US.  As serious as this attack was, however, it could have been far worse.  Such a botnet could easily be used in far more serious attacks, possibly to the point of causing real damage.  And of course, we’ve already seen cases of “smart” devices equipped with cameras being used to surreptitiously capture videos of unsuspecting people, which are then used for blackmail purposes.

These negative effects, like pollution, affect the world as a whole, not just the subset of those who decide they need smart lightbulbs and smart brooms.  They create a swarm of devices ripe for the plucking for malware, which in turn compromises basic infrastructure and harms everyone.  It is not hard to see the analogies between this and a dirty coal-burning furnace contaminating the air, leading to maladies like acid rain and brown-lung.

Platforms, Methodologies, and Public Health

Anyone who follows me on Twitter or interacts with me in person knows I am harshly critical of the current state of software methodologies, Scrum in particular, and of platforms based on untyped languages, NodeJS in particular.  Make no mistake, scrum is snake-oil as far as I’m concerned, and NodeJS is a huge step backward in terms of a programming language and a development platform.  The popularity of both of these has an obvious-enough root cause: the extreme bias towards developing minimally-functional prototypes, or minimum-viable products (MVPs), in Silicon Valley VC lingo.  Scrum is essentially a process for managing “war-room” emergencies, and languages like JavaScript do allow one to throw together a barely-working prototype faster than a language like Java, Haskell, or Rust.  This expedience has a cost, of course: such languages are far harder to secure, to test, and to maintain.

Of course, few consumers really care what sort of language or development methodology is used, so long as they get their product, or at least so the current conventional wisdom goes.  When we consider the widespread information security implications, however, the picture begins to look altogether different.  Put another way, Zuckerberg’s adage “move fast and break things” becomes irresponsible and unacceptable when the potential exists to break the entire internet.

Since the early 1900s, the US has had laws governing healthcare-related products as well as food, drugs, and others.  The reasons for this are twofold: first, to protect consumers who lack insight into the manufacturing process, and second, to protect the public from health crises such as epidemics that arise from contaminated products.  In the case of the Pure Food and Drug Act, the call for this regulation was driven in large part by the extremely poor quality standards of large-scale industrial food processing, as documented in Upton Sinclair’s work The Jungle.

The root cause of the conditions that led to the regulation of food industries and the conditions that have led to the popularization of insecure platforms and unsound development methodologies is, I believe, the same.  The cause is the competition-induced drive to lower costs and production times combined with a pathological lack of accountability for the quality of products and the negative effects of quality defects.  When combined, these factors consistently lead nowhere good.

Better Development Practices and Sustainability

These trends are simply not sustainable.  They serve to exacerbate an already severe information security crisis and, on a long enough timeline, they stand to cause significant economic damage as a result of attacks like yesterday’s, if not more severe attacks that pose a real material risk.

I do not believe government-imposed regulations are a solution to this problem.  In fact, in the current political climate, I suspect such a regulatory effort would end up imposing regulations such as back-doors and other measures that would do more damage to the state of information security than they would help.

The answer, I believe, must come from industry itself and must be led by infosec professionals.  The key is realizing that as is the case with sustainable manufacturing, better development practices are actually more viable and lead to lower eventual costs.  Sloppy practices and bad platforms may cut costs and development times in the now, but in the long run they end up costing much more.  This sort of paradigm shift is neither implausible nor unprecedented.  Driving it is a matter of educating industrial colleagues about these issues and the benefits of more sound platforms and development processes.

Summary

Yesterday’s attack brought the potential for the proliferation of insecure devices and software to have a profound negative effect on the entire world to the forefront.  A key root cause of this is an outdated paradigm in software development that ignores these factors in favor of the short-term view.  It falls to the infosec community to bring about the necessary change toward a more accurate view and more sound and sustainable practices.

FreeBSD OS-Level Tamper-Resilience

I’ve posted about my work on EFI GELI support.  This project is actually the first step in a larger series of changes that I’ve been sketching out since April.  The goal of the larger effort is to implement tamper-resilience features at the OS level for FreeBSD.  The full-disk encryption capabilities provided by GELI boot support represent the first step in this process.

OS-Level Tamper-Resilience

Before I talk about the work I’m planning to do, it’s worth discussing the goals and the rationale for them.  One of the keys to effective security is an accurate and effective threat model; another is identifying the scope of the security controls to be put in place.  This kind of thinking is important for this project in particular, where it’s easy to conflate threats stemming from vulnerable or malicious hardware with vulnerabilities at the OS level.

Regarding terminology: “tamper-resistance” means the ability of a device to resist a threat agent who seeks to gain access to the device while it is inactive (in a suspended or powered-off state) in order to exfiltrate data or install malware of some kind.  I specifically use the term “tamper-resilience” to refer to tamper-resistance features confined to the OS layer to acknowledge the fact that these features fundamentally cannot defeat threats based on hardware or firmware.

Threat Model

In our threat model, we have the following assets:

  • The operating system kernel, modules, and boot programs.
  • Specifically, a boot/resume program to be loaded by hardware, which must be stored as plaintext.
  • The userland operating system programs and configuration data.
  • The user’s data.

We assume a single threat agent with the following capabilities:

  • Access and write to any permanent storage medium (such as a disk) while the device is suspended or powered off.
  • Make copies of any volatile memory (such as RAM) while the device is suspended.
  • Defeat any sort of physical security or detection mechanisms to do so.

Specifically, the following capabilities are considered out-of-scope (they are to be handled by other mechanisms):

  • Accessing the device while powered on and in use.
  • Attacks based on hardware or firmware tampering.
  • Attacks based on things like bug devices, reading EM radiation (van Eck phreaking), and the like.
  • Attacks based on causing users to install malware while using the device.

Thus, the threat model is based on an attacker gaining access to the device while powered-off or suspended and tampering with it at the OS level and up.

It is important to note that hardware/firmware tampering is a real and legitimate threat, and one deserving of effort.  However, it is a separate and parallel concern that requires its own effort.  Moreover, if the OS level has weaknesses, no amount of hardware or firmware hardening can compensate for it.

Tamper-Resilience Plan

The tamper resilience plan is based around the notion of protecting as much data as possible through authenticated encryption, using cryptographic verification to ensure that any part of the boot/resume process whose program must be stored as plaintext is not tampered with, and ensuring that no other data is accessible as plaintext while suspended or powered off.

The work on this breaks down into roughly three phases, one of which I’ve already finished.

Data Protection and Integrity

All data aside from the boot program to be loaded by the hardware (known in FreeBSD as boot1) can be effectively protected at rest by a combination of ZFS with SHA256 verification and the GELI disk encryption scheme.  Full-disk encryption protects data from theft, and combining it with ZFS’ integrity checks based on a cryptographically-secure hash function prevents an attacker from tampering with the contents (this can actually be done even on encrypted data without an authentication scheme in play).

Secure Boot

There is always at least one program that must remain unprotected by full-disk encryption: the boot entry-point program.  Fortunately, the EFI platform provides a mechanism for ensuring the integrity of the boot program.  EFI secure boot uses public-key crypto to allow the boot program to be signed by a private key and verified by a public key that is provided to the firmware.  If the verification fails, then the firmware informs the user that their boot program has been tampered with and aborts the boot.

In an open-source OS like FreeBSD, this presents an effective protection scheme along with full-disk encryption.  On most desktops and laptops, we build the kernel and boot loaders on the machine itself.  We can simply store a machine-specific signing key on the encrypted partition and use it to sign the boot loader for that machine.  The only way an attacker could forge the signature would be to gain access to the signing key, which is stored on an encrypted partition.  Thus, the attacker would have to already have access to the encrypted volume in order to forge a signature and tamper with the boot program.

To achieve the baseline level of protection, we need to ensure that the plaintext boot program is signed, and that it verifies the signature of a boot stage that is stored on an encrypted volume.  Because of the way the EFI boot process works, it is enough to sign the EFI boot1 and loader programs.  The loader program is typically stored on the boot device itself (which would be encrypted), and loaded by the EFI LOAD_IMAGE_PROTOCOL interface, which performs signature verification.  Thus, it should be possible to achieve baseline protection without having to modify boot1 and loader beyond what I’ve already done.

There is, of course, a case for doing signature verification on the kernel and modules.  One can even imagine signature verification on userland programs.  However, this is out-of-scope for the discussion here.

Secure Suspend/Resume

Suspend/resume represents the most significant tamper weakness at the present.  Suspend/resume in FreeBSD is currently only implemented for the suspend-to-memory sleep state.  This means that an attacker who gains access to the device while suspended effectively has access to the device at runtime.  More specifically, they have all of the following:

  • Access to the entire RAM memory state
  • Sufficient data to decrypt all mounted filesystems
  • Sufficient data to decrypt any encrypted swap partitions
  • Possibly the signing key for signing kernels

There really isn’t a way to protect a system that’s suspended to memory.  Even if you were to implement what amounts to suspend-to-disk by unmounting all filesystems and writing the kernel and all programs out to encrypted disk storage, you still resume by starting execution at a specified memory address.  The attacker can just implant malware in that process if they have the ability to tamper with RAM.

Thus, the only secure way to do suspend/resume is to tackle suspend-to-disk support for FreeBSD.  Of course, it also has to be done securely.  The scheme I have in mind for doing so looks something like this:

  • Allow users to specify a secure suspend partition and set a resume password.  This can be done with a standard GELI partition.
  • Use the dump functionality to write out the entire kernel state to the suspend partition (because we intend to resume, we can’t do the usual trick of dumping to the swap space, as we need the data that’s stored there)
  • Alternatively, since the dump is being done voluntarily, it might be possible to write out to a filesystem (normally, dumps are done in response to a kernel panic, so the filesystem drivers are assumed to be corrupted).
  • Have the suspend-to-disk functionality sign the dumped state with a resume key (this can be the signing key for boot1, or it can be another key that’s generated during the build process)
  • Make boot1 aware of whatever it needs to know for detecting when resuming from disk and have it request a password, load the encrypted dumped state, and resume.

There are, of course, a lot of issues to be resolved in doing this sort of thing, and I imagine it will take quite some time to implement fully.

Going Beyond

Once these three things are implemented, we’d have a baseline of tamper-resilience in FreeBSD.  Of course, there are ways we could go further.  For one, signed kernels and modules are a good idea.  There has also been talk of a signed executable and libraries framework.

Current Status

My GELI EFI work is complete and waiting for testing before going through the integration process.  There are already some EFI signing utilities in existence.  I’m currently testing too many things to feel comfortable about trying out EFI signing (and I want to have a second laptop around before I do that kind of thing!); however, I plan on getting the baseline signed boot1 and loader scheme working, then trying to alter the build process to support automatically generating signed boot1 and loader programs.

The kernel crypto framework currently lacks public-key crypto support, and it needs some work anyway.  I’ve started working on a design for a new crypto library with which I intend to replace the boot_crypto code in my GELI work and eventually the code in the kernel.  I’ve also heard of others working on integrating LibreSSL.  I view this as a precursor to the more advanced work like secure suspend/resume and kernel/module signing.

However, we’re currently in the middle of the 11 release process and there are several major outstanding projects (my GELI work, the i915 graphics work).  In general, I’m reluctant to move forward until those things calm down a bit.

Design Sketch for LiCl: A Lightweight Cryptography Library

There has been a lot of work on better cryptography libraries in the wake of a number of OpenSSL bugs.  One of the major steps forward in this realm is NaCl, or the Networking and Cryptography Library.  NaCl aims to address the fact that most older crypto libraries are quite difficult to use, and misuse is often the source of vulnerabilities.

In my recent work on FreeBSD, I ran into the kernel crypto code.  It is worth mentioning that older crypto code, particularly kernel crypto frameworks, tends to hearken back to days when things were different than they are now.  For one, strong crypto was classified as a munition, and exporting it from various countries ran afoul of international arms trafficking laws.  Second, CPUs were much slower back then, crypto represented a more significant overhead, and most hardware crypto devices were attached to the PCI bus.  In the modern world, we have Bernstein v. United States (publication of crypto is protected free speech), CPUs are much faster, and hardware crypto typically takes the form of special CPU instructions, not devices that have to be accessed through kernel interfaces.

This state of affairs tends to lead to fairly fragmented crypto codebases, which is exactly what the FreeBSD kernel crypto codebase looks like.  Moreover, it completely lacks any public-key algorithms, which are necessary for kernel and driver signing.  Lastly, existing userland crypto libraries tend not to fare so well when being converted into kernel libraries, as they tend to rely on userland utilities to operate.

LiCl Goals

To address this, I recently started working on ideas for a lightweight, embeddable crypto library I’m calling LiCl.  The name of course is the chemical symbol for lithium chloride: a salt similar to sodium chloride (NaCl).  An interpretation of the name could be “lightweight interoperable crypto library”, though it occurred to me that “Lego-inspired crypto library” also works, as the design involves building cryptosystems out of “blocks”.

LiCl aims to produce a lightweight crypto library that is easy to use and also easy to drop into any application (userland, kernel, or embedded).  It has several design goals, which I’ll discuss here.

Control over Crypto through Policies

Aspects of the library should be governed by policies, which can be set both at library build time and by any application that uses the library.  Policies should range from fine-grained rules like “don’t use these specific algorithms” up to things like “don’t use hardware random number generators” or “only use safecurves-approved ECC”.  If done right, this also captures the configuration options necessary to say things like “don’t use anything that depends on POSIX userland”.

This is done in the implementation through a variety of C preprocessor definitions that control which implementations are present in the library, and which can be used by an application.
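
As a rough illustration of how this could look at build time, consider the following sketch.  Every macro name here is a hypothetical placeholder; the actual configuration knobs haven’t been designed yet.

    /* Hypothetical policy knobs, set at library build time or by the
     * application before including the LiCl headers.  None of these
     * names are final. */
    #define LICL_POLICY_NO_HW_RANDOM      1   /* no hardware RNGs               */
    #define LICL_POLICY_SAFECURVES_ONLY   1   /* only safecurves-approved ECC   */
    #define LICL_POLICY_NO_POSIX          1   /* no POSIX userland dependencies */

    /* Inside the library, implementations would be compiled out accordingly: */
    #if !defined(LICL_POLICY_NO_POSIX)
    /* ... the POSIX urandom source gets built ... */
    #endif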

NaCl-Style Easy Interfaces

NaCl is designed to eliminate many of the bugs that arise from improper use of crypto by providing the simplest possible interface through its “box” functions.  This works for NaCl because it targets a specific use case: a crypto interface for network applications.
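
For reference, the classic NaCl C interface boils down to a handful of calls.  This is shown from memory and slightly abbreviated; consult the NaCl documentation for the exact buffer and padding requirements.

    #include <string.h>
    #include "crypto_box.h"        /* NaCl public-key authenticated encryption */

    unsigned char their_pk[crypto_box_PUBLICKEYBYTES];     /* peer's public key     */
    unsigned char my_pk[crypto_box_PUBLICKEYBYTES];
    unsigned char my_sk[crypto_box_SECRETKEYBYTES];
    unsigned char nonce[crypto_box_NONCEBYTES];
    unsigned char m[crypto_box_ZEROBYTES + 64];             /* zero-padded plaintext */
    unsigned char c[sizeof(m)];                             /* ciphertext buffer     */

    crypto_box_keypair(my_pk, my_sk);
    memset(m, 0, crypto_box_ZEROBYTES);                     /* required zero padding */
    /* ... copy the message into m + crypto_box_ZEROBYTES, obtain the peer's
     * public key, and pick a fresh nonce ... */
    crypto_box(c, m, sizeof(m), nonce, their_pk, my_sk);    /* encrypt + authenticate */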

LiCl, on the other hand, aims to provide a more general toolbox.  Thus, it needs a way to build up a NaCl-style box out of components.  As we’ll see, I have a plan for this.

Curate Crypto, Don’t Implement It

Most of LiCl will be code devoted to assembling the external crypto interfaces.  The actual crypto implementations will be curated from BSD-compatible-licensed or public-domain sources.  I may, of course, run into some algorithms that require direct implementation, but I intend to write crypto code myself only as a last resort.

Design Sketch

My plans for LiCl draw to some degree on programming-language concepts: objects describing the components of a crypto scheme form an AST-like structure, which is then used to generate a NaCl-style interface.  I’ll go into the various components I’ve worked out, and how they all fit together.

It should be remembered that this is a design in progress; I’m still considering alternatives.

The User-Facing Crypto Interfaces

Right now, there are six user-facing interfaces, all of which take the form of structs of function pointers, each of which takes a user data pointer (in other words, the standard method for doing object-oriented programming in C).  The exact length of the user data depends on the components from which the interface was built.  The six interfaces are as follows:

  • Symmetric-key encryption (stream cipher or block cipher with a mode)
  • Symmetric-key authenticated encryption
  • Symmetric-key authentication (keyed hashing, i.e. a MAC)
  • Public-key encryption
  • Public-key authenticated encryption (encryption with signature checking)
  • Public-key authentication (signature verification)

These interfaces represent the combination of multiple crypto methods to create a complete package that should handle all the details in a secure fashion.  The result is that we can support encryption/decryption and signing/verification in a NaCl box-like interface.
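
As a sketch of what one of these could look like (every name here is a hypothetical placeholder; the actual API is still to be designed), the symmetric-key authenticated encryption interface might be a struct along these lines:

    #include <stddef.h>

    /* Hypothetical sketch of one user-facing interface; none of these
     * names are final. */
    typedef struct licl_sym_auth_enc {
            /* Encrypt msg and append an authentication tag; 0 on success. */
            int (*encrypt)(struct licl_sym_auth_enc *self,
                           unsigned char *out, size_t *outlen,
                           const unsigned char *msg, size_t msglen,
                           const unsigned char *nonce);
            /* Verify the tag and decrypt; fails if authentication fails. */
            int (*decrypt)(struct licl_sym_auth_enc *self,
                           unsigned char *out, size_t *outlen,
                           const unsigned char *ct, size_t ctlen,
                           const unsigned char *nonce);
            /* Opaque per-instance state; its size depends on the components
             * from which the interface was built. */
            void *data;
    } licl_sym_auth_enc_t;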

Creating User-Facing Interfaces

A user creates one of the above interfaces by assembling components, each of which represents some cryptographic primitive or method (for example, a hash function, or a block cipher mode).  The key is ensuring that users assemble these components in a way that is valid and secure.  This will be guaranteed by a “build cryptosystem” function that performs a consistency check on the specification it’s given.  For example, it shouldn’t let you encrypt an already-authenticated message (MAC-then-encrypt) when the safe composition is encrypt-then-MAC.  Another reason for this check is hardware crypto support: hardware implementations may impose various limits on how the primitives they provide can be used.

I come from a programming-language background, so I like to think about this in those terms.  The “build cryptosystem” function acts similarly to a compiler, and the rules are similar to a type system.  The key is figuring out exactly what the “types” are.  That is an ongoing task, but it starts with working out what the basic component model looks like.  I have a good start on that, and have identified several kinds of components.

Components

Ultimately, we’ll build up a cryptosystem out of components.  A component is essentially a “block” of crypto functionality, which may itself be built out of other components.  For example, a keystore may require a random source.  I’ve sketched a list of components so far, and will discuss each one here:

Random Sources

Random sources are essential in any cryptosystem.  In LiCl, I want to support an HSM-style interface for key generation and storage, so it’s necessary to provide a random source for generating keys.  There are also concerns such as padding that require random bits.  Random sources are the only components in the GitHub repo at the moment, and the only implementation is the POSIX urandom source.  The first curation task is to identify a high-quality software random number generator implementation that’s BSD/MIT-licensed or public domain.
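
A random source could be as simple as the following sketch (hypothetical names again; the version in the repo may look different):

    #include <stddef.h>

    /* Hypothetical random-source interface. */
    typedef struct licl_random_src {
            /* Fill buf with len random bytes; 0 on success. */
            int (*get_bytes)(struct licl_random_src *self,
                             unsigned char *buf, size_t len);
            void *data;     /* implementation state, e.g. a file descriptor
                             * for a POSIX /dev/urandom source */
    } licl_random_src_t;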

Keystores

LiCl’s interfaces are built around an assumption that there’s a layer of indirection between keys and their representation in memory.  This is done to enable use of HSMs and other hardware crypto.  A keystore interface represents such an indirection.

Keystores have a notion of an external representation for keys.  In the “direct” keystore implementation, this is the same as the actual representation; in an HSM-based keystore, it might be an ID number.  Keystores provide the ability to generate keys internally, add keys to the store, delete keys, and extract a key given its external representation.

The only implementation so far is the “direct” keystore, which is just a passthrough interface.  It requires a random source for its keygen functionality.
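
Following that description, a keystore interface might look roughly like this.  All names are hypothetical, and it reuses the random-source type sketched above for key generation.

    /* Hypothetical keystore interface.  "ext" is the external key
     * representation: the raw key bytes for the direct keystore, perhaps
     * an ID number for an HSM-backed one. */
    typedef struct licl_keystore {
            int (*keygen)(struct licl_keystore *self,
                          unsigned char *ext, size_t extlen);
            int (*add)(struct licl_keystore *self,
                       const unsigned char *key, size_t keylen,
                       unsigned char *ext, size_t extlen);
            int (*del)(struct licl_keystore *self,
                       const unsigned char *ext, size_t extlen);
            int (*extract)(struct licl_keystore *self,
                           const unsigned char *ext, size_t extlen,
                           unsigned char *key, size_t keylen);
            licl_random_src_t *rand;   /* needed by the direct keystore's keygen */
            void *data;
    } licl_keystore_t;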

Arbitrary-Size Operations

One major building block is the ability to perform a given operation on arbitrary-sized data.  This is innate in some primitives, such as stream ciphers and hash functions.  In others, it involves things like modes of operation and padding.

This is where the type-like aspects begin to become visible.  For example, the GCM block cipher mode takes a fixed-size symmetric-key encryption operation and produces an arbitrary-sized symmetric-key authenticated encryption operation.  We could write this down in a semi-formal notation as “symmetric enc fixed size (n) -> symmetric auth enc variable block size (n)”.  Padding operations would eliminate the restriction on input size, and could be written as “algo variable block size (n), randsrc -> algo variable”.

Of course, we wouldn’t write down this notation anywhere in the actual implementation (except maybe in the documentation).  In the code, it would all be represented as data structures.
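
One plausible encoding (purely illustrative; the names are placeholders) is a small descriptor type that captures the algorithm kind and its size restrictions:

    #include <stddef.h>

    /* Hypothetical descriptors for the semi-formal notation above. */
    typedef enum {
            LICL_SYM_ENC, LICL_SYM_AUTH_ENC,
            LICL_PK_ENC, LICL_PK_AUTH_ENC, LICL_HASH
    } licl_algo_kind_t;

    typedef enum {
            LICL_FIXED_SIZE,        /* exactly block_size bytes         */
            LICL_VARIABLE_BLOCK,    /* any multiple of block_size bytes */
            LICL_VARIABLE           /* any size at all                  */
    } licl_size_kind_t;

    typedef struct licl_algo_type {
            licl_algo_kind_t kind;
            licl_size_kind_t size;
            size_t           block_size;   /* the "n" in the notation; 0 if unused */
    } licl_algo_type_t;

    /* GCM: { LICL_SYM_ENC, LICL_FIXED_SIZE, n }
     *   -> { LICL_SYM_AUTH_ENC, LICL_VARIABLE_BLOCK, n } */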

Ultimately, we’d need to assemble components to get an arbitrary-sized operation with no input block size restriction.  We’d also need to match the algorithm type of the scheme we’re trying to build (so if we want authenticated symmetric key encryption, we need to ensure that’s what we build).

MAC Schemes and Signing

MAC schemes and signing algorithms both take a hash function and an encryption scheme and produce an authenticated encryption scheme.  Signing algorithms also require a public-key crypto scheme.  In the semi-formal notation, a MAC scheme might look something like this: “symmetric enc variable, hash -> symmetric auth enc variable”.
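
In the same hypothetical encoding sketched earlier, each component would declare what it consumes and what it produces, and the consistency check would match these signatures up:

    /* Hypothetical component "type signature", using the descriptor types
     * sketched above. */
    typedef struct licl_component_sig {
            licl_algo_type_t inputs[4];    /* e.g. { symmetric enc variable, hash } */
            size_t           ninputs;
            licl_algo_type_t output;       /* e.g. symmetric auth enc variable      */
    } licl_component_sig_t;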

Ciphers and Hashes

Ciphers and hash functions are, of course, the basic building blocks of all this, and they may have different characteristics.  Block ciphers might be written as “symmetric enc fixed size (n)”; an authenticated stream cipher would be written as “symmetric auth enc variable”; a hash function is simply “hash”.

Putting it All Together

Ultimately, the “build cryptosystem” functions will take a tree-like structure as an argument that describes how to combine the various components into a cryptosystem.  They perform a consistency check on the whole tree to ensure that everything is put together correctly, then fill in a cryptosystem structure with all the implementation functions and data necessary to make it work.
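
In very rough terms, usage might look like the following.  Every name here is a placeholder (including the spec structure and the build function), and it reuses the interface, keystore, and random-source types sketched earlier.

    /* Hypothetical usage sketch: describe the cryptosystem as a spec tree,
     * then let the build function check it and fill in the interface. */
    licl_spec_t spec = {
            .kind   = LICL_SPEC_SYM_AUTH_ENC,
            .cipher = LICL_SPEC_AES128,        /* fixed-size block cipher          */
            .mode   = LICL_SPEC_GCM,           /* lifts it to a variable-size AEAD */
            .keys   = &direct_keystore,        /* keystore component               */
            .rand   = &urandom_source          /* random-source component          */
    };

    licl_sym_auth_enc_t sys;
    if (licl_build_cryptosystem(&sys, &spec) != 0) {
            /* the spec was inconsistent, e.g. an invalid composition */
    }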

Going Forward

With the design I’ve described, it should be possible to build a crypto library that serves the needs of kernel and systems developers while also making it easier to use crypto correctly.

The biggest remaining question is whether this design can effectively accommodate the legacy interfaces that kernel developers must support.  It seems at least plausible that the component-assembly model can handle this: after all, even legacy systems are ultimately just assembling crypto primitives in a particular way, and if a particular system can’t be modeled with the components LiCl provides, it should be possible to implement new components within the model I’ve described.

The repository for the project is here; however, there isn’t much there at this point.