Design of a Trust System for FreeBSD

About a month ago, I started a discussion on freebsd-hackers and freebsd-security about a system for signed executables, with a focus on signed kernels and kernel modules.  This is part of a larger agenda of mine to equip FreeBSD with OS-level tamper resistance features.

The initial use of this is signing the kernel and its modules, and checking signatures both during the loader process and at runtime when kernel modules are loaded.  However, it is desirable to build a system that is capable of growing in likely directions, such as executable and library signing.

This article details the current state of the design of this system.

Desiderata

I originally outlined a number of goals for this system:

  1. Be able to check for a correct cryptographic signature for any kernel or modules loaded at boot time for some platforms (EFI at a minimum)
  2. Be able to check for a correct cryptographic signature for any kernel module loaded during normal operations (whether or not to do this could be controlled by a sysctl, securelevel, or some similar mechanism)
  3. Work with what’s in the base system already and minimize new additions (ideally, just a small utility to sign executables)
  4. Minimize administrative overhead and ideally, require no changes at all to maintain signed kernel/modules
  5. Have a clear path for supporting signed executables/libraries.
  6. The design must support the case where a system builds locally and uses its own key(s) for signing kernels and modules (and anything else), and must allow the administrator complete control over which key(s) are valid for a given system (i.e. no “master keys” controlled by central organizations)
  7. The design must allow for the adoption of new ciphers (there is an inevitable shift to post-quantum ciphers coming in the near future)

I also specified a number of non-goals:

  • Hardware/firmware-based attacks are considered out-of-scope (there is no viable method for defending against them at the OS level)
  • Boot platforms that don’t provide their own signature-checking framework up to loader/kernel can’t be properly secured, and are considered out-of-scope
  • Boot platforms that impose size restrictions prohibiting incorporation of RSA and ED25519 crypto code (ex. i386 BIOS) are considered out-of-scope
  • GRUB support is desirable; however, it is not necessary to support GRUB out-of-the-box (meaning a design requiring reasonable modifications to GRUB is acceptable)

Considerations

There are several considerations that should weigh in on the design.

FreeBSD Base System

Unlike Linux, FreeBSD has a base system: a set of tools and libraries developed and shipped together with the kernel that provide the core operating system utilities.  Most notably, the base system contains the OpenSSL (or in some cases, LibreSSL) crypto suite.  This includes an encryption library as well as tools capable of creating and managing key-pairs and other cryptographic data in a variety of formats.

The FreeBSD base system also contains libelf, a library that provides mechanisms for manipulating ELF binaries, as well as the binutils suite, including objcopy, whose command-line tools are likewise capable of manipulating ELF binaries.

Note that only some of the components described in this design (namely the signelf tool) exist at present; the rest exist only as man pages that describe them.

Public-Key Cryptography

The FreeBSD kernel does not currently incorporate code for public-key cryptography, and direct incorporation of OpenSSL into the kernel has proven infeasible.  Additionally, parsing code needs to be incorporated into the kernel for any formats that are used.  Options here include incorporating code from the NaCl library, which provides a very lightweight implementation of Ed25519, or creating a minimal library out of code harvested from OpenSSL or LibreSSL.

A note on elliptic curve cryptography: the state of support for safe elliptic curves is sad.  In my drafts of the man pages, I have mandated that the only acceptable curves are those that satisfy the security properties described by the SafeCurves project.  At this time, these include M-221, E-222, Curve1174, Curve25519, E-382, M-383, Curve383187, Curve41417, Goldilocks-448, M-511, and E-521.  Unfortunately, none of these is supported by OpenSSL at this time, though Curve25519 support is supposedly coming soon.  However, I would rather write specs that mandate the right curves (and thus put pressure on crypto libraries) than cave to using bad ones.

Modifications to GRUB

GRUB provides the best option for FreeBSD coreboot support at this time.  It also provides an existing mechanism for signing binaries.  However, this mechanism is deficient in two ways.  First, it relies on external signatures, which would complicate administration and require modification of virtually all installer programs, as well as run the risk of stale signatures.  Second, it relies on the gnupg toolset, which is not part of the FreeBSD base system.  Thus, it is inevitable that GRUB will need to be patched to support the signed executables proposed by this design.  However, we should make efforts to keep the necessary changes as minimal as possible.

Signing and Trust System Design

The signing and trust system consists of a number of components, some of which are standards, some of which are interfaces, and some of which are tools.  The core feature, of course, is the signed ELF convention.  The signelf tool provides a one-stop tool for signing large numbers of executables.  The trust system provides a system-level mechanism for registering and maintaining verification keys that are used to check signatures on kernel modules.  Finally, the portable verification library provides a self-contained code package that can be dropped into the kernel, the loader, or a third-party codebase like GRUB.

Note that this design is not yet implemented, so it may be subject to change.  Also, it has not yet undergone review on the FreeBSD lists, so it should be considered more of a proposal.

Signed ELF Binaries

The ELF format is very flexible, and provides a generic mechanism for storing metadata.  The signed ELF convention utilizes this to store signatures in a special section within the binary itself.  A signed ELF binary contains a section named .sign, which contains a detached PKCS#7 signature in DER encoding for the file.  This signature is computed (and checked) on the entire file, with the .sign section itself being replaced by zero data of equal size and position.

Signing an ELF binary is somewhat involved, as it requires determining the size of a signature, creating a new section (along with its name), recomputing the ELF layout, computing the signature, and writing it into the section.  Checking a signature is considerably simpler: it involves merely copying the signature, overwriting the .sign section with zeros, and then checking the signature against the  entire file.

The PKCS#7 format was chosen because it is an established standard which supports detached signatures as well as many other kinds of data.  The signatures generated for signed ELF files are minimal and do not contain certificates, attributes, or other data (a signature for RSA-4096 is under 800 bytes); however, the format is extensible enough to embed other data, allowing for future extensions.
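
To make the convention concrete, the following is a minimal sketch of how verification might look in C using libelf and OpenSSL.  The .sign section name and the zero-fill rule come from the convention above; the function name, error handling, and the use of PKCS7_verify against a caller-supplied X509_STORE are illustrative assumptions, not the actual implementation.

    #include <fcntl.h>
    #include <gelf.h>
    #include <libelf.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #include <openssl/bio.h>
    #include <openssl/pkcs7.h>
    #include <openssl/x509.h>

    /* Verify a signed ELF file against a caller-supplied trust store.
     * Returns 0 if the signature checks out, -1 otherwise. */
    int
    verify_signed_elf(const char *path, X509_STORE *trusted)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return (-1);

        /* Read the whole file; the signature covers every byte of it. */
        off_t len = lseek(fd, 0, SEEK_END);
        unsigned char *buf = malloc(len);
        pread(fd, buf, len, 0);

        /* Use libelf only to locate the offset and size of .sign. */
        elf_version(EV_CURRENT);
        Elf *e = elf_begin(fd, ELF_C_READ, NULL);
        size_t shstrndx;
        elf_getshdrstrndx(e, &shstrndx);

        Elf_Scn *scn = NULL;
        GElf_Shdr shdr;
        off_t sig_off = -1;
        size_t sig_len = 0;
        while ((scn = elf_nextscn(e, scn)) != NULL) {
            gelf_getshdr(scn, &shdr);
            const char *name = elf_strptr(e, shstrndx, shdr.sh_name);
            if (name != NULL && strcmp(name, ".sign") == 0) {
                sig_off = shdr.sh_offset;
                sig_len = shdr.sh_size;
                break;
            }
        }
        elf_end(e);
        close(fd);
        if (sig_off < 0) {
            free(buf);
            return (-1);        /* unsigned binary */
        }

        /* Parse the detached PKCS#7 blob, then zero the section in place,
         * recreating the layout over which the signature was computed. */
        const unsigned char *der = buf + sig_off;
        PKCS7 *p7 = d2i_PKCS7(NULL, &der, sig_len);
        memset(buf + sig_off, 0, sig_len);

        BIO *data = BIO_new_mem_buf(buf, len);
        int ok = (p7 != NULL &&
            PKCS7_verify(p7, NULL, trusted, data, NULL, 0) == 1);

        BIO_free(data);
        PKCS7_free(p7);
        free(buf);
        return (ok ? 0 : -1);
    }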

The signelf Tool

Signed ELF binaries can be created and checked by adroit usage of the objcopy and openssl command-line tools.  This is quite tedious, however.  Moreover, there are certain use cases that are desirable, like signing a batch of executables using an ephemeral key, discarding the key, and generating a certificate for verification.  The signelf tool is designed to be a simplified mechanism for signing batches of executables which provides this additional functionality.  It is a fairly straightforward use of libelf and OpenSSL, and should be able to handle the binaries produced by normal compilation.  Additionally, the signelf tool can verify signed ELF files.  The signelf code is currently complete, and works on a kernel as well as modules.

The Trust System

In order to check signatures on kernel modules (and anything else), it is necessary to establish and maintain a set of trusted verification keys in the kernel (as well as in the boot loader).  In order for this system to be truly secure, at least one trust root key must be built into the kernel and/or the boot loader, which can then be used to verify other keys.  The trust system refers to the combination of kernel interfaces, standard file locations, and conventions that manage this.

System Trust Keys and Signing Keys

The (public) verification keys used to check signatures, as well as the (private) signing keys used to generate signatures, are kept in the /etc/trust/ directory.  Verification keys are stored in /etc/trust/certs as X509 certificates, and private signing keys are stored in /etc/trust/keys.  Both are stored in the PEM encoding (as is standard with many OpenSSL applications).

There is no requirement as to the number, identity, or composition of verification or signing keys.  Specifically, there is not and will never be any kind of mandate for any kind of verification key not controlled by the owner of the machine.  The trust system is designed to be flexible enough to accommodate a wide variety of uses, from machines that only trust executables built locally, to ones that trust executables built on an in-house machine, to those that trust executables built by a third party (such as the FreeBSD Foundation), or any combination thereof.

The preferred convention, however, is to maintain a single, per-machine keypair which is then used to sign any additional verification keys.  This keypair should be generated locally for each machine, and never exported from the machine.

Trust Keys Library

During the buildworld process, keys under /etc/trust/certs will be converted into C code constants and compiled into a static library that provides the raw binary data for the keys.  This provides the mechanism for building keys into the kernel, loader, and other components.  These keys are known as trust root keys, as they provide the root set for all trusted keys.
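
As a rough illustration (nothing below appears in the design documents), the generated constants might take a shape like this; the struct and symbol names are hypothetical.

    #include <stddef.h>

    /* One entry per certificate found under /etc/trust/certs at
     * buildworld time; the array is compiled into a static library that
     * the kernel and loader link against. */
    struct trust_root_key {
        const char          *trk_name;  /* source certificate file name */
        const unsigned char *trk_der;   /* DER-encoded X509 certificate */
        size_t               trk_len;   /* length of the DER blob */
    };

    extern const struct trust_root_key trust_root_keys[];
    extern const size_t                trust_root_nkeys;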

Kernel Trust Interface

The kernel trust interface provides access to the set of verification keys trusted by the kernel.  This consists of an in-kernel interface as well as a user-facing device interface.  The in-kernel interface looks like an ordinary key management system (KMS) interface.  The device interface provides two primary mechanisms: access to the current set of trusted keys and the ability to register new keys or revoke existing ones.

Access to the existing database is accomplished through a read-only device node which simply outputs all of the existing trusted keys in PEM-encoded X509 format.  This formatting allows many OpenSSL applications to use the device node itself as a CA root file.  Updating the key database is accomplished by writing to a second device node.  Writing an X509 certificate signed by one of the existing trusted keys to this device node will cause the key contained in the certificate to be added to the trusted key set.  Writing a certificate revocation list (CRL) signed by a trusted key to the device node will revoke the keys in the revocation list as well as any keys whose signature chains depend on them.  Trust root keys cannot be revoked, however.

This maintains the trusted key set in a state where any trusted key has a signature chain back to a trust root key.
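
A small userland sketch of how the two device nodes might be used.  The node paths below (/dev/trust/certs and /dev/trust/ctl) are placeholders of my own invention; the design does not yet fix the actual names.

    #include <fcntl.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Register a new verification key: write a PEM-encoded X509
     * certificate, signed by an already-trusted key, to the control
     * node.  The kernel rejects certificates without a valid chain
     * back to a trusted key. */
    static int
    trust_add_cert(const char *pem, size_t len)
    {
        int fd = open("/dev/trust/ctl", O_WRONLY);
        if (fd < 0)
            return (-1);
        ssize_t n = write(fd, pem, len);
        close(fd);
        return (n == (ssize_t)len ? 0 : -1);
    }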

Portable Verification Library

The final piece of the system is the portable verification library.  This library should present a minimal OpenSSL-like API that performs parsing and encoding of the necessary formats (PKCS#7, X509, CRL), or a reduced subset thereof, along with public-key signature verification.  I have not yet decided whether to create this by harvesting code from OpenSSL/LibreSSL or to write it from scratch (with code from NaCl), but I’m leaning toward harvesting code from LibreSSL.

Operation

The trust system performs two significant roles in the system as planned, and can be expanded to do more things in the future.  First, it ensures that the loader only loads kernels and modules that are signed.  Second, it can serve as a kind of system-wide keyring (hence the device node that looks like a typical PEM-encoded CA root file for OpenSSL applications).  The following is an overview of how it would operate in practice.

Signature Checking in the loader

In an EFI environment, boot1.efi and loader.efi have a chain of custody provided by the EFI secure boot framework.  This is maintained from boot1.efi to loader.efi, because of the use of the EFI loaded image interface.  The continuation of the chain of custody must be enforced directly by loader.efi.  To accomplish this, loader will link against the trust key library at build time to establish root keys.  These in turn can either be used to check the kernel and modules directly, or they can be used to check a per-kernel key (the second method is recommended; see below).

Per-Kernel Ephemeral Keys

The signelf utility was designed with the typical kernel build process in mind.  The kernel and all of its modules reside in a single directory; it’s a simple enough thing to run signelf on all of them as the final build step.  Additionally, signelf can generate an ephemeral key for signing and write out the verification certificate after it finishes.

This gives rise to a use pattern where every kernel is signed with an ephemeral key, and a verification certificate is written into the kernel directory.  This certificate is in turn signed by the local trust root key (signelf does this as part of the ephemeral key procedure).  In this case, the loader first attempts to load the verification certificate for a kernel, then it loads the kernel and all modules.
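
A rough sketch of the loader-side flow for this pattern follows.  Every function name, the opaque cert type, and the certificate file name are placeholders; only the order of operations (load the per-kernel certificate, check it against the built-in roots, then check the kernel and each module) comes from the design.

    struct cert;                                    /* opaque placeholder */
    int load_cert(const char *, const char *, struct cert **);
    int check_against_trust_roots(struct cert *);
    int verify_signed_elf_file(const char *, const char *, struct cert *);
    int verify_all_modules(const char *, struct cert *);

    /* Hypothetical loader-side verification flow for the ephemeral-key
     * pattern. */
    static int
    load_signed_kernel(const char *kerndir)
    {
        struct cert *kcert;

        /* 1. Load the verification certificate written into the kernel
         *    directory by signelf ("cert.pem" is a placeholder name). */
        if (load_cert(kerndir, "cert.pem", &kcert) != 0)
            return (-1);

        /* 2. The certificate must chain back to a trust root key that
         *    was linked into the loader at build time. */
        if (check_against_trust_roots(kcert) != 0)
            return (-1);

        /* 3. Verify the kernel and every module against the ephemeral
         *    key before handing over control. */
        if (verify_signed_elf_file(kerndir, "kernel", kcert) != 0)
            return (-1);
        return (verify_all_modules(kerndir, kcert));
    }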

Signed Configuration Files

The FreeBSD loader relies on several files such as loader.4th, loader.conf, loader.menu, and others that control its behavior in significant ways.  Additionally, one can foresee applications of this system that rely on non-ELF configuration files.  For loader, the simplest solution is to store these files as non-detached PKCS#7 messages (meaning, the message and file contents are stored together).  Thus, loader would look for loader.conf.pk7, loader.4th.pk7, and so on.  A loader built for secure boot would look specifically for the .pk7 files, and would require signature verification in order to load them.
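
For illustration, here is how a non-detached PKCS#7 configuration file could be verified and its contents recovered, using the userland OpenSSL API (the loader itself would use the portable verification library instead).  The function name and error handling are mine; only the attached-signature format comes from the design.

    #include <openssl/bio.h>
    #include <openssl/pkcs7.h>
    #include <openssl/x509.h>

    /* Returns a memory BIO holding the verified plaintext of a .pk7
     * file (e.g. loader.conf.pk7), or NULL if verification fails. */
    static BIO *
    load_signed_config(const char *path, X509_STORE *trusted)
    {
        BIO *in = BIO_new_file(path, "rb");
        if (in == NULL)
            return (NULL);

        /* DER-encoded PKCS#7 message with the content attached. */
        PKCS7 *p7 = d2i_PKCS7_bio(in, NULL);
        BIO_free(in);
        if (p7 == NULL)
            return (NULL);

        /* With no detached-data BIO supplied, PKCS7_verify checks the
         * embedded content and writes it to 'out' on success. */
        BIO *out = BIO_new(BIO_s_mem());
        if (PKCS7_verify(p7, NULL, trusted, NULL, out, 0) != 1) {
            BIO_free(out);
            out = NULL;
        }
        PKCS7_free(p7);
        return (out);
    }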

The keybuf Interface

The kernel keybuf interface was added in a patch I contributed in late March 2017.  It is used by GELI boot support to pass keys from the boot phases to the kernel.  However, it was designed to support up to 64 distinct 4096-bit keys without modification; thus it can be used with RSA-4096.  An alternative to linking the trust key library directly into the kernel is to have it receive the trusted root key as a keybuf entry.

This approach has advantages and disadvantages.  The advantage is that it allows a generic kernel to be deployed to a large number of machines without rebuilding for each machine.  Specifically, this would allow the FreeBSD Foundation to publish a kernel which can make use of local trust root keys.  The primary disadvantage is that the trust root keys are not part of the kernel and thus not guaranteed by the signature checking.  The likely solution will be to support both possibilities as build options.

Key Management

The preferred scheme for trust root keys is to have a local keypair generated on each machine, with the local verification certificate serving as the sole trust root key.  Any vendor keys that might be used would be signed by this keypair and loaded as intermediate keys.  Every kernel build would produce an ephemeral key which would be signed by the local keypair.  Kernel builds originating from an organization would also be signed by an ephemeral key, whose certificate is signed by the organization’s keypair.  For example, the FreeBSD Foundation might maintain a signing key, which it uses to sign the ephemeral keys of all kernel builds it publishes.  An internal IT organization might do the same.

It would be up to the owner of a machine whether or not to trust the vendor keys originating from a given organization.  If the keys are trusted, then they are signed by the local keypair.  However, it is always an option to forego all vendor keys and only trust locally-built kernels.

An alternate use might be to have no local signing key, and only use an organizational trust root key.  This pattern is suitable for large IT organizations that produce lots of identical machines off of a standard image.

Conclusion

This design for the trust system and kernel/module signing is a comprehensive system-wide public-key trust management system for FreeBSD.  Its initial purpose is managing a set of keys that are used to verify kernels and kernel modules.  However, the system is designed to address the issues associated with trusted key management in a comprehensive and thorough way, and to leave the door open to many possible uses in the future.

InfoSec and IoT: A Sustainability Analogy

Yesterday saw a major distributed denial-of-service (DDoS) attack against the DNS infrastructure that crippled the internet for much of the east coast.  This attack disabled internet access for much of the Northeastern US, as well as other areas.  These sorts of attacks are nothing new; in fact, this attack came on the anniversary of a similar attack fourteen years ago.  Yesterday’s attack is nonetheless significant, both in its scope and also in the role of the growing internet of things (IoT) in the attack.

The attack was facilitated by the Mirai malware suite, which specifically targets insecure IoT devices, applying a brute-force password attack to gain access to the machines and deploy its malware.  Such an attack would almost certainly fail if directed against machines with appropriate security measures in place and on which passwords had been correctly set.  IoT devices, however, often lack such protections, are often left with their default login credentials, and often go unpatched (after all, who among even the most eager adopters of IoT can say that they routinely log in to every lightbulb in their house to change the passwords and download patches?).  Yesterday, we saw the negative consequences of the proliferation of these kinds of devices.

Public Health and Pollution Analogies

Industry regulation, whether self-imposed or imposed by the state, is a widely-accepted practice among modern societies.  The case for this practice lies in the reality that some actions are not limited in their effect to oneself and one’s customers, but rather have a tangible effect on the entire world.  Bad practices in these areas lead to systemic risks that threaten even those who have nothing to do with the underlying culprits.  In such a situation, industry faces a choice of two options, one of which will eventually come to pass: self-regulate, or have regulations imposed from without.

Two classic examples of such a situation come in the form of public health concerns and environmental pollution.  Both of these have direct analogs to the situation we now face with insecure IoT devices and software (in)security in the broader context.

IoT and Pollution

After the third attack yesterday, I posted a series of remarks on Twitter that gave rise to this article, beginning with “IoT is the carbon emissions of infosec. Today’s incident is the climate change analog. It won’t be the last”.  I went on to criticize the current trend of gratuitously deploying huge numbers of “smart” devices without concern for the information security implications.

The ultimate point I sought to advance is that releasing huge numbers of insecure, connected devices into the world is effectively a form of pollution, and it has serious negative impacts on information security for the entire internet.  We saw one such result yesterday in the form of one of the largest DDoS attacks and the loss of internet usability for significant portions of the US.  As serious as this attack was, however, it could be far worse.  Such a botnet could easily be used in far more serious attacks, possibly to the point of causing real damage.  And of course, we’ve already seen cases of “smart” devices equipped with cameras being used to surreptitiously capture videos of unsuspecting people, which are then used for blackmail purposes.

These negative effects, like pollution, affect the world as a whole, not just the subset of those who decide they need smart lightbulbs and smart brooms.  They create a swarm of devices ripe for the plucking for malware, which in turn compromises basic infrastructure and harms everyone.  It is not hard to see the analogies between this and a dirty coal-burning furnace contaminating the air, leading to maladies like acid rain and brown-lung.

Platforms, Methodologies, and Public Health

Anyone who follows me on Twitter or interacts with me in person knows I am harshly critical of the current state of software methodologies, Scrum in particular, and of platforms based on untyped languages, NodeJS in particular.  Make no mistake, scrum is snake-oil as far as I’m concerned, and NodeJS is a huge step backward in terms of a programming language and a development platform.  The popularity of both of these has an obvious-enough root cause: the extreme bias towards developing minimally-functional prototypes, or minimum-viable products (MVPs), in Silicon Valley VC lingo.  Scrum is essentially a process for managing “war-room” emergencies, and languages like JavaScript do allow one to throw together a barely-working prototype faster than a language like Java, Haskell, or Rust.  This expedience has a cost, of course: such languages are far harder to secure, to test, and to maintain.

Of course, few consumers really care what sort of language or development methodology is used, so long as they get their product; or at least, so goes the current conventional wisdom.  When we consider the widespread information security implications, however, the picture begins to look altogether different.  Put another way, Zuckerberg’s adage “move fast and break things” becomes irresponsible and unacceptable when the potential exists to break the entire internet.

Since the early 1900s, the US has had laws governing healthcare-related products as well as food, drugs, and others.  The reasons for this are twofold: first, to protect consumers who lack insight into the manufacturing process, and second, to protect the public from health crises such as epidemics that arise from contaminated products.  In the case of the Pure Food and Drug Act, the call for this regulation was driven in large part by the extremely poor quality standards of large-scale industrial food processing as documented in Upton Sinclair’s work The Jungle.

The root cause of the conditions that led to the regulation of food industries and the conditions that have led to the popularization of insecure platforms and unsound development methodologies is, I believe, the same.  The cause is the competition-induced drive to lower costs and production times combined with a pathological lack of accountability for the quality of products and the negative effects of quality defects.  When combined, these factors consistently lead nowhere good.

Better Development Practices and Sustainability

These trends are simply not sustainable.  They serve to exacerbate an already severe information security crisis, and on a long enough timeline, they stand to cause significant economic damage as a result of attacks like yesterday’s, if not more severe attacks that pose a real material risk.

I do not believe government-imposed regulations are a solution to this problem.  In fact, in the current political climate, I suspect such a regulatory effort would end up imposing regulations such as back-doors and other measures that would do more damage to the state of information security than they would help.

The answer, I believe, must come from industry itself and must be led by infosec professionals.  The key is realizing that, as is the case with sustainable manufacturing, better development practices are actually more viable and lead to lower eventual costs.  Sloppy practices and bad platforms may cut costs and development times in the short term, but in the long run they end up costing much more.  This sort of paradigm shift is neither implausible nor unprecedented.  Driving it is a matter of educating industry colleagues about these issues and the benefits of more sound platforms and development processes.

Summary

Yesterday’s attack brought to the forefront the potential for the proliferation of insecure devices and software to have a profound negative effect on the entire world.  A key root cause of this is an outdated paradigm in software development that ignores these factors in favor of the short-term view.  It falls to the infosec community to bring about the necessary change toward a more accurate view and more sound and sustainable practices.

FreeBSD OS-Level Tamper-Resilience

I’ve posted about my work on EFI GELI support.  This project is actually the first step in a larger series of changes that I’ve been sketching out since April.  The goal of the larger effort is to implement tamper-resilience features at the OS level for FreeBSD.  The full-disk encryption capabilities provided by GELI boot support represent the first step in this process.

OS-Level Tamper-Resilience

Before I talk about the work I’m planning to do, it’s worth discussing the goals and the rationale for them.  One of the keys to effective security is an accurate and effective threat model; another is identifying the scope of the security controls to be put in place.  This kind of thinking is important for this project in particular, where it’s easy to conflate threats stemming from vulnerable or malicious hardware with vulnerabilities at the OS level.

Regarding terminology: “tamper-resistance” means the ability of a device to resist a threat agent who seeks to gain access to the device while it is inactive (in a suspended or powered-off state) in order to exfiltrate data or install malware of some kind.  I specifically use the term “tamper-resilience” to refer to tamper-resistance features confined to the OS layer to acknowledge the fact that these features fundamentally cannot defeat threats based on hardware or firmware.

Threat Model

In our threat model, we have the following assets:

  • The operating system kernel, modules, and boot programs.
  • Specifically, a boot/resume program to be loaded by hardware, which must be stored as plaintext.
  • The userland operating system programs and configuration data.
  • The user’s data.

We assume a single threat agent with the following capabilities:

  • Read from and write to any permanent storage medium (such as a disk) while the device is suspended or powered off.
  • Make copies of any volatile memory (such as RAM) while the device is suspended.
  • Defeat any sort of physical security or detection mechanisms to do so.

Specifically, the following capabilities are considered out-of-scope (they are to be handled by other mechanisms):

  • Accessing the device while powered on and in use.
  • Attacks based on hardware or firmware tampering.
  • Attacks based on things like bug devices, reading EM radiation (van Eck phreaking), and the like.
  • Attacks based on causing users to install malware while using the device.

Thus, the threat model is based on an attacker gaining access to the device while powered-off or suspended and tampering with it at the OS level and up.

It is important to note that hardware/firmware tampering is a real and legitimate threat, and one deserving of effort.  However, it is a separate and parallel concern that requires its own effort.  Moreover, if the OS level has weaknesses, no amount of hardware or firmware hardening can compensate for it.

Tamper-Resilience Plan

The tamper resilience plan is based around the notion of protecting as much data as possible through authenticated encryption, using cryptographic verification to ensure that any part of the boot/resume process whose program must be stored as plaintext is not tampered with, and ensuring that no other data is accessible as plaintext while suspended or powered off.

The work on this breaks down into roughly three phases, one of which I’ve already finished.

Data Protection and Integrity

All data aside from the boot program to be loaded by the hardware (known in FreeBSD as boot1) can be effectively protected at rest by a combination of ZFS with SHA256 verification and the GELI disk encryption scheme.  Full-disk encryption protects data from theft, and combining it with ZFS’ integrity checks based on a cryptographically-secure hash function prevents an attacker from tampering with the contents (this can actually be done even on encrypted data without an authentication scheme in play).

Secure Boot

There is always at least one program that must remain unprotected by full-disk encryption: the boot entry-point program.  Fortunately, the EFI platform provides a mechanism for ensuring the integrity of the boot program.  EFI secure boot uses public-key crypto to allow the boot program to be signed by a private key and verified by a public key that is provided to the firmware.  If the verification fails, then the firmware informs the user that their boot program has been tampered with and aborts the boot.

In an open-source OS like FreeBSD, this presents an effective protection scheme along with full-disk encryption.  On most desktops and laptops, we build the kernel and boot loaders on the machine itself.  We can simply store a machine-specific signing key on the encrypted partition and use it to sign the boot loader for that machine.  The only way an attacker could forge the signature would be to gain access to the signing key, which is stored on an encrypted partition.  Thus, the attacker would have to already have access to the encrypted volume in order to forge a signature and tamper with the boot program.

To achieve the baseline level of protection, we need to ensure that the plaintext boot program is signed, and that it verifies the signature of a boot stage that is stored on an encrypted volume.  Because of the way the EFI boot process works, it is enough to sign the EFI boot1 and loader programs.  The loader program is typically stored on the boot device itself (which would be encrypted), and loaded by the EFI LOAD_IMAGE_PROTOCOL interface, which performs signature verification.  Thus, it should be possible to achieve baseline protection without having to modify boot1 and loader beyond what I’ve already done.

There is, of course, a case for doing signature verification on the kernel and modules.  One can even imagine signature verification on userland programs.  However, this is out-of-scope for the discussion here.

Secure Suspend/Resume

Suspend/resume represents the most significant tamper weakness at present.  Suspend/resume in FreeBSD is currently only implemented for the suspend-to-memory sleep state.  This means that an attacker who gains access to the device while suspended effectively has access to the device at runtime.  More specifically, they have all of the following:

  • Access to the entire RAM memory state
  • Sufficient data to decrypt all mounted filesystems
  • Sufficient data to decrypt any encrypted swap partitions
  • Possibly the signing key for signing kernels

There really isn’t a way to protect a system that’s suspended to memory.  Even if you were to implement what amounts to suspend-to-disk by unmounting all filesystems and writing the kernel and all program state out to encrypted disk storage, you would still resume by starting execution at a specified memory address.  The attacker can simply implant malware in that process if they have the ability to tamper with RAM.

Thus, the only secure way to do suspend/resume is to tackle suspend-to-disk support for FreeBSD.  Of course, it also has to be done securely.  The scheme I have in mind for doing so looks something like this:

  • Allow users to specify a secure suspend partition and set a resume password.  This can be done with a standard GELI partition.
  • Use the dump functionality to write out the entire kernel state to the suspend partition (because we intend to resume, we can’t do the usual trick of dumping to the swap space, as we need the data that’s stored there)
  • Alternatively, since the dump is being done voluntarily, it might be possible to write out to a filesystem (normally, dumps are done in response to a kernel panic, so the filesystem drivers are assumed to be corrupted).
  • Have the suspend-to-disk functionality sign the dumped state with a resume key (this can be the signing key for boot1, or it can be another key that’s generated during the build process)
  • Make boot1 aware of whatever it needs to know for detecting when resuming from disk and have it request a password, load the encrypted dumped state, and resume.

There are, of course, a lot of issues to be resolved in doing this sort of thing, and I imagine it will take quite some time to implement fully.

Going Beyond

Once these three things are implemented, we’d have a baseline of tamper-resilience in FreeBSD.  Of course, there are ways we could go further.  For one, signed kernels and modules are a good idea.  There has also been talk of a signed executable and libraries framework.

Current Status

My GELI EFI work is complete and waiting for testing before going through the integration process.  There are already some EFI signing utilities in existence.  I’m currently testing too many things to feel comfortable about trying out EFI signing (and I want to have a second laptop around before I do that kind of thing!); however, I plan on getting the baseline signed boot1 and loader scheme working, then trying to alter the build process to support automatically generating signed boot1 and loader programs.

The kernel crypto framework currently lacks public-key crypto support, and it needs some work anyway.  I’ve started working on a design for a new crypto library which I intend to replace the boot_crypto code in my GELI work and eventually the code in the kernel.  I’ve also heard of others working on integrating LibreSSL.  I view this as a precursor to the more advanced work like secure suspend/resume and kernel/module signing.

However, we’re currently in the middle of the 11 release process and there are several major outstanding projects (my GELI work, the i915 graphics work).  In general, I’m reluctant to move forward until those things calm down a bit.

Design Sketch for LiCl: A Lightweight Cryptography Library

There has been a lot of work on better cryptography libraries in the wake of a number of OpenSSL bugs.  One of the major steps forward in this realm is NaCl, or the Networking and Cryptography Library.  NaCl aims to address the fact that most older crypto libraries are quite difficult to use, and misuse is often the source of vulnerabilities.

In my recent work on FreeBSD, I ran into the kernel crypto code.  It is worth mentioning that older crypto code, particularly kernel crypto frameworks, tends to hearken back to days when things were different than they are now.  For one, strong crypto was classified as a munition, and exporting it from various countries ran afoul of international arms trafficking laws.  Second, CPUs were much slower back then, crypto represented a more significant overhead, and most hardware crypto devices were attached to the PCI bus.  In the modern world, we have Bernstein v. United States (publication of crypto is protected free speech), CPUs are much faster, and hardware crypto typically takes the form of special CPU instructions, not devices that have to be accessed through kernel interfaces.

This state of affairs tends to lead to fairly fragmented crypto codebases, which is exactly what the FreeBSD kernel crypto codebase looks like.  Moreover, it completely lacks any public-key algorithms, which are necessary for kernel and driver signing.  Lastly, existing userland crypto libraries tend not to fare so well when being converted into kernel libraries, as they tend to rely on userland utilities to operate.

LiCl Goals

To address this, I recently started working on ideas for a lightweight, embeddable crypto library I’m calling LiCl.  The name of course is the chemical symbol for lithium chloride: a salt similar to sodium chloride (NaCl).  An interpretation of the name could be “lightweight interoperable crypto library”, though it occurred to me that “Lego-inspired crypto library” also works, as the design involves building cryptosystems out of “blocks”.

LiCl aims to produce a lightweight crypto library that is easy to use and also easy to drop into any application (userland, kernel, or embedded).  It has several design goals, which I’ll discuss here.

Control over Crypto through Policies

Aspects of the library should be governed by policies which can be set both at library build time as well as in any application that uses the library.  Policies should be as fine-grained as “don’t use these specific algorithms”, all the way up to things like “don’t use hardware random number generators”, or “only use safecurves-approved ECC”.  If done right, this also captures the configuration options necessary to say things like “don’t use anything that depends on POSIX userland”.

This is done in the implementation through a variety of C preprocessor definitions that control which implementations are present in the library, and which can be used by an application.
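
A sketch of how such policy definitions might look; the macro and symbol names below are hypothetical, since the actual policy set has not been defined yet.

    struct licl_random_source;      /* defined elsewhere in the library */

    #ifndef LICL_POLICY_NO_HW_RAND
    /* Hardware RNG-backed random source, compiled in unless the policy
     * "don't use hardware random number generators" is set. */
    extern const struct licl_random_source licl_random_hwrng;
    #endif

    #ifndef LICL_POLICY_NO_POSIX
    /* POSIX urandom source; excluded by a "no POSIX userland" policy,
     * e.g. for kernel or embedded builds. */
    extern const struct licl_random_source licl_random_urandom;
    #endif

    #ifdef LICL_POLICY_SAFECURVES_ONLY
    /* Only SafeCurves-approved ECC implementations are exposed. */
    #undef LICL_ENABLE_NIST_P256
    #endif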

NaCl-Style Easy Interfaces

NaCl is designed to eliminate many bugs that can arise from improper use of crypto by providing the simplest possible interface through its “box” functions.  This works for NaCl because it aims only to provide a crypto interface for network applications.

LiCl, on the other hand, aims to provide a more general toolbox.  Thus, it needs a way to build up a NaCl-style box out of components.  As we’ll see, I have a plan for this.

Curate Crypto, Don’t Implement It

Most of LiCl will be the code devoted to assembling the external crypto interfaces.  The actual crypto implementations themselves will be curated from various BSD-compatible licensed or public-domain sources.  Now, of course, I may run into some algorithms that require direct implementation; however, I intend to write crypto code myself only as a last resort.

Design Sketch

My plans for LiCl actually draw on programming language concepts to some degree, where objects describing components of a crypto scheme represent an AST-like structure that is used to generate a NaCl-style interface.  I’ll go into the various components I’ve worked out, and how they all fit together.

It should be remembered that this is a design in progress; I’m still considering alternatives.

The User-Facing Crypto Interfaces

Right now, there are six user-facing interfaces, all of which take the form of structs with function pointers, each of which takes a user data pointer (in other words, the standard method for doing object-oriented programming in C).  The exact length of the user data depends on the components from which the interface was built.  The six interfaces are as follows:

  • Symmetric-key encryption (stream cipher or block cipher with a mode)
  • Symmetric-key authenticated encryption
  • Symmetric-key authentication (hashing with a salt)
  • Public-key encryption
  • Public-key authenticated encryption (encryption with signature checking)
  • Public-key authentication (signature verification)

These interfaces represent the combination of multiple crypto methods to create a complete package that should handle all the details in a secure fashion.  The result is that we can support encryption/decryption and signing/verification in a NaCl box-like interface.
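
As an example of the general shape, a symmetric-key authenticated encryption interface might look something like the struct below.  The field and type names are hypothetical; only the struct-of-function-pointers-plus-user-data pattern comes from the design.

    #include <stddef.h>
    #include <stdint.h>

    struct licl_sym_auth_enc {
        /* Encrypt and authenticate 'len' bytes of 'plain'. */
        int (*encrypt)(void *data, const uint8_t *plain, size_t len,
            uint8_t *cipher, size_t *cipherlen);
        /* Verify and decrypt; fails if authentication does not hold. */
        int (*decrypt)(void *data, const uint8_t *cipher, size_t len,
            uint8_t *plain, size_t *plainlen);
        /* Opaque per-instance state; its size depends on the components
         * from which the interface was assembled. */
        void *data;
    };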

Creating User-Facing Interfaces

A user creates one of the above interfaces by assembling components, each of which represents some cryptographic primitive or method (for example, a hash function, or a block cipher mode).  The key is ensuring that users assemble these components in a way that is valid and secure.  This will be guaranteed by a “build cryptosystem” function that performs a consistency check on the specification it’s given.  For example, it shouldn’t allow you to encrypt an already-authenticated message (i.e. MAC-then-encrypt rather than encrypt-then-MAC).  Another reason for this check is to support hardware crypto, which may impose various limits on how the primitives those implementations provide can be used.

I come from a programming language background, so I like to think about this in those terms.  The “build cryptosystem” function acts similarly to a compiler, and the rules are similar to a type system.  The key here is figuring out exactly what the “types” are.  This is an ongoing task, but it starts with figuring out what the basic component model looks like.  I have a good start on that, and have identified several kinds of components.
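
A hypothetical shape for the “build cryptosystem” step, continuing the illustrative names from the previous sketch: the caller describes the components, and the build function type-checks the combination before filling in a usable interface.

    struct licl_component;          /* a cipher, mode, hash, keystore, ... */
    struct licl_sym_auth_enc;       /* interface type from the sketch above */

    /* A (simplified) specification tree for an authenticated symmetric
     * encryption scheme. */
    struct licl_spec {
        const struct licl_component *cipher;   /* e.g. a block cipher */
        const struct licl_component *mode;     /* e.g. an AEAD mode */
        const struct licl_component *randsrc;  /* for IVs and padding */
    };

    /* Consistency-check the specification (rejecting invalid or insecure
     * combinations, and respecting any limits imposed by hardware
     * implementations) and construct the interface.  Returns 0 on
     * success, nonzero if the combination does not type-check. */
    int licl_build_sym_auth_enc(const struct licl_spec *spec,
        struct licl_sym_auth_enc **out);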

Components

Ultimately, we’ll build up a cryptosystem out of components.  A component is essentially a “block” of crypto functionality, which itself may be built out of other components.  For example, a keystore may require a random source.  I’ve sketched a list of components so far, and will discuss each one here:

Random Sources

Random sources are essential in any cryptosystem.  In LiCl, I want to support an HSM-style interface for key generation and storage, so it’s necessary to provide a random source for generating keys.  There are also concerns such as padding that require random bits.  Random sources are the only component type in the GitHub repo at the moment, and the only implementation is the POSIX urandom source.  The first curation task is to identify a high-quality software random number generator implementation that’s BSD/MIT licensed or public domain.

Keystores

LiCl’s interfaces are built around an assumption that there’s a layer of indirection between keys and their representation in memory.  This is done to enable use of HSMs and other hardware crypto.  A keystore interface represents such an indirection.

Keystores have a notion of an external representation for keys.  In the “direct” keystore implementation, this is the same as the actual representation; in an HSM-based keystore, it might be an ID number.  Keystores provide the ability to generate keys internally, add keys to the store, delete keys, and extract a key given its external representation.

The only implementation so far is the “direct” keystore, which is just a passthrough interface.  It requires a random source for its keygen functionality.
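
A hypothetical sketch of what the keystore interface might look like; the “direct” keystore would implement these operations as simple passthroughs, while an HSM-backed keystore would map external representations to device key IDs.  All names are illustrative.

    #include <stddef.h>
    #include <stdint.h>

    struct licl_keystore {
        /* Generate a key internally and return its external
         * representation (the raw key for the direct keystore, an ID
         * for an HSM-backed one). */
        int (*keygen)(void *data, size_t keybytes,
            uint8_t *extrep, size_t *extlen);
        /* Add an existing key to the store. */
        int (*add)(void *data, const uint8_t *key, size_t keylen,
            uint8_t *extrep, size_t *extlen);
        /* Delete a key given its external representation. */
        int (*del)(void *data, const uint8_t *extrep, size_t extlen);
        /* Extract the raw key; a hardware-backed store may refuse. */
        int (*extract)(void *data, const uint8_t *extrep, size_t extlen,
            uint8_t *key, size_t *keylen);
        void *data;     /* backend state, e.g. the keygen random source */
    };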

Arbitrary-Size Operations

One major building block is the ability to perform a given operation on arbitrary-sized data.  This is innate in some primitives, such as stream ciphers and hash functions.  In others, it involves things like modes of operation and padding.

This is where the type-like aspects begin to become visible.  For example, the GCM block cipher mode takes a fixed-size symmetric-key encryption operation and produces an arbitrary-sized symmetric-key authenticated encryption operation.  We could write this down in a semi-formal notation as “symmetric enc fixed size (n) -> symmetric auth enc variable block size(n)”.  Padding operations would eliminate the restriction on input size, and could be written as “algo variable block size (n), randsrc -> algo output variable output block size (n)”.

Of course, we wouldn’t write down this notation anywhere in the actual implementation (except maybe in the documentation).  In the code, it would all be represented as data structures.

Ultimately, we’d need to assemble components to get an arbitrary-sized operation with no input block size restriction.  We’d also need to match the algorithm type of the scheme we’re trying to build (so if we want authenticated symmetric key encryption, we need to ensure that’s what we build).
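
One plausible way to encode the semi-formal notation as data; the enum and struct names here are invented for illustration.  The GCM rule from above becomes a single constant:

    enum licl_alg_kind {
        LICL_SYM_ENC,               /* symmetric encryption */
        LICL_SYM_AUTH_ENC,          /* symmetric authenticated encryption */
        LICL_PK_ENC,                /* public-key encryption */
        /* ... */
    };

    enum licl_size_class {
        LICL_FIXED_SIZE,            /* exactly n bytes */
        LICL_VARIABLE_BLOCK,        /* any multiple of n bytes */
        LICL_VARIABLE,              /* any size at all */
    };

    struct licl_alg_type {
        enum licl_alg_kind   kind;
        enum licl_size_class size;
    };

    /* A component's "type": what it consumes and what it produces. */
    struct licl_component_type {
        struct licl_alg_type input;
        struct licl_alg_type output;
    };

    /* GCM: "symmetric enc fixed size (n) ->
     *       symmetric auth enc variable block size (n)" */
    static const struct licl_component_type licl_gcm_type = {
        .input  = { LICL_SYM_ENC,      LICL_FIXED_SIZE },
        .output = { LICL_SYM_AUTH_ENC, LICL_VARIABLE_BLOCK },
    };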

MAC Schemes and Signing

MAC schemes and signing algorithms both take a hash function and an encryption scheme and produce an authenticated encryption scheme.  Signing algorithms also require a public-key crypto scheme.  In the semi-formal notation, a MAC scheme might look something like this: “symmetric enc variable, hash -> symmetric auth enc variable”.

Ciphers and Hashes

Ciphers are of course the basic building blocks of all this.  Ciphers may have different characteristics.  Block ciphers might be written as “symmetric enc fixed size(n)”.  An authenticated stream cipher would be written as “symmetric auth enc variable”.

Putting it All Together

Ultimately, the “build cryptosystem” functions will take a tree-like structure as an argument that describes how to combine all the various components to build a cryptosystem.  They then perform a consistency check on the whole tree to ensure that everything is put together correctly and then fill up a cryptosystem structure with all the implementation functions and data necessary to make it work.

Going Forward

With the design I’ve described, it should be possible to build a crypto library that will serve the needs of kernel and systems developers, but will also make it easier to use crypto in a manner that is correct.

The biggest remaining question is whether this design can effectively deal with the legacy interfaces that kernel developers must deal with.  However, it seems at least plausible that the model of assembling components should be able to handle this.  After all, even legacy systems are ultimately just assembling crypto primitives in a particular way; if a particular system can’t be modeled by the components LiCl provides, it should be possible to implement new components within the model I’ve described.

The repository for the project is here; however, there isn’t much there at this point.

Design Sketch for a Quantum-Safe Encrypted Filesystem

Almost all disk encryption systems today follow a similar design pattern.  Symmetric-key block ciphers are used, with the initialization vector being derived entirely from the index of the block to which the data is written.  Oftentimes, the disk is broken up into sections, each of which has its own key.  The point, however, is that the key and IV are static across any number of writes.

This preserves the atomicity of writes and allows the design to work at the block layer as opposed to the filesystem layer.  However, it also restricts the modes of operation to those that are strong against reuse of IVs.  Typically, this means CBC mode.  This block-level design also makes it quite difficult to integrate a MAC.  Modes like AES-XTS go some distance toward mitigating this, and the problem can be mitigated completely by using a filesystem with inherent corruption-resistance like ZFS.

The problem is that this scheme completely prohibits the use of stream ciphers such as ChaCha20 or modes of operation such as CTR or OFB that produce stream cipher-like behavior.  This would be a footnote but for recent results that demonstrate a quantum period-finding attack capable of breaking basically all modes other than CTR or OFB.  This suggests that to implement quantum-safe encrypted storage, we need to come up with a scheme capable of using stream ciphers.

The Problem of Disk Encryption

The fundamental problem with using stream ciphers for block-layer disk encryption stems from the fact that the initialization vector (and ideally the key) must be changed every time the block is written, and this key must be available at an arbitrarily later time in order to read.

In general, there are basically three ways to manage keys in the context of disk encryption:

  1. Derive the key and IV from the block index
  2. Store the keys in a separate location on disk, look this up when needed
  3. Alter the interface for block IO to take a key as a parameter

Most current disk encryption schemes use option 1; however, this ends up reusing IVs, which prohibits the use of stream ciphers (and modes like CTR and OFB).  Option 2 guarantees that we have a unique IV (and key, if we want it) every time we write to a given block; we simply change keys and record this in our key storage.  The price we pay for this is atomicity: every incoming block write requires a write to two separate disk blocks.  This effectively undermines even atomic filesystems like ZFS.  The only example of this sort of scheme of which I am aware is FreeBSD’s older GBDE system.

Option 3 punts the problem to someone else.  Of course, this means they have to solve the problem somehow.  The only way this ends up not being wholly equivalent to options 1 or 2 is if the consumer of the block-layer interface (the filesystem) somehow organizes itself in a way that a key and IV are always readily available whenever a read or write is about to take place.  This, of course, requires addressing full-disk encryption in the filesystem itself.  It also places certain demands on the design of the filesystem.

Atomic Snapshot Filesystems

Atomic filesystems are designed in such a way that all I/O operations appear to be atomic.  With regard to writes, this means that the sort of filesystem corruption that necessitates tools like fsck cannot happen.  Either an operation takes place, or it does not.

Of course, a given write operation may actually perform many block writes; however, the filesystem’s on-disk data structures are carefully designed in such a way that one single write causes all of the operations that lead up to it to “take effect” at once.  Typically, this involves building up a number of “shadow” objects representing the new state, then switching over to them in a single write.

Note that in this approach, we effectively get snapshots for free.  We have a data structure consisting of a mutable spine that points to a complex but immutable set of data structures.  We never overwrite anything until the single operation that updates the mutable spine, causing the operation to take effect.

Atomic Key/IV Updates

The atomic snapshot filesystem design provides a way to effectively change the keys and IVs for every node of a filesystem data structure every time it is written.  Because we are creating a shadow data structure, then installing it with a single write, it is quite simple to generate new keys or IVs every time we create a node in this shadow structure.  Conversely, because the filesystem is atomic, and every node contains the keys and IVs for any node to which it points, anyone traversing the filesystem always has the information they need to decrypt any object they can reach.

This scheme has advantages over conventional disk or filesystem encryption.  Unlike conventional disk encryption, each filesystem object has its own key and IV, and these are uniquely generated every time a write takes place.  Nothing about the key and IV can be inferred by any attacker looking at an arbitrary disk block.  Unlike conventional filesystem encryption which typically only encrypts file contents, everything is encrypted.
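
To make the idea concrete, a hypothetical on-disk node layout for such a filesystem might look like the following.  The sizes and field names are mine; the essential point is that every pointer to a child carries the fresh key and IV under which that child was encrypted when the shadow tree was built.

    #include <stdint.h>

    #define FS_KEYLEN   32          /* e.g. a ChaCha20 or AES-256 key */
    #define FS_IVLEN    16

    struct fs_child_ptr {
        uint64_t block;             /* on-disk location of the child */
        uint8_t  key[FS_KEYLEN];    /* fresh key, regenerated per write */
        uint8_t  iv[FS_IVLEN];      /* fresh IV, never reused */
    };

    struct fs_node {
        uint64_t            nchildren;
        struct fs_child_ptr children[];     /* encrypted child objects */
    };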

Possible ZFS Extension

The ZFS filesystem is a highly advanced filesystem and volume management scheme that provides fully atomic operations and snapshots.  I am admittedly not familiar enough with its workings to know with absolute certainty whether the scheme I describe above could be added to it, but I am fairly confident that it could.  I am also aware that ZFS provides an encryption system already, but I am also fairly confident that it is not equivalent to the scheme I describe above.

ZFS would also need to be extended to support a broader range of ciphers and modes of operation to take advantage of this scheme.  Support for CTR and OFB modes is absolutely essential, of course.  I would also recommend support for ciphers beyond AES.  Camellia and ChaCha20 would make good additions, among others.

Conclusion

Quantum-safe disk encryption is arguably not as critical to develop as quantum-safe encryption for network communications.  With network communications, it is reasonable to assume that all traffic is being recorded and will be subject to quantum attacks once those become available.  The same is not true of disk storage.  However, the technology does need to be developed, and the recent results about the period-finding attack on symmetric cipher modes demonstrate a workable attack against nearly all disk encryption schemes.

I would urge all filesystem projects to consider the scheme I’ve laid out and integrate concerns for quantum-safe encryption into their design.

As a final note, should anyone from Illumos run across this blog, I’d be more than willing to discuss more details of this scheme with them.

Reactions to Burr-Feinstein and Congressional Hearings

The relationship of government and technology has been cast to the forefront in the past two weeks, with the official introduction of the Burr-Feinstein anti-encryption bill, comments made by a US Attorney about banning “import of open-source encryption software”, and two congressional hearings on technological issues: one by the committee on energy and commerce, and one by the committee on oversight and government reform.  All of this points to a need for greater understanding of the issues surrounding strong encryption, both in the context of this debate as well as in the government at large.

Strong Encryption is Indispensable

Strong encryption is a technological necessity for building and operating computing and communication systems in the modern world.  It is simply not feasible and in many cases not possible to design these systems securely without building in strong encryption at a fundamental level.  We are seeing an increase in the attacks against computing and communication infrastructure, and there is no reason to believe this trend will stop in the foreseeable future.  Simply put, strong encryption is indispensable.

To fully understand the issue, however, we need to explore the specifics in greater detail.

Role of Strong Encryption in Secure Systems

Strong encryption plays a vital role in protecting information in modern computing and communication systems.  Cryptography deals with methods of secure communication over insecure channels.  Because of the scale, the distribution, and the inherent physics of modern communication and computing technology, it is simply not feasible (and in many cases, not even possible) to design and deploy “secure” channels and computing devices.

For example, it would be prohibitively expensive to replace the telecommunications grid with physically secure and shielded land-lines; moreover, this physical security system would be so large as to require its own “secure” communication channels.  Wireless communication, on the other hand, can’t be secured by physical means at all.  Similarly, physically securing every computing device is not even remotely possible, particularly with the proliferation of mobile devices.  Finally, strong encryption is critical for protecting systems from threats like malicious insiders, physical theft or assault, persistent threats, and attackers who are able to breach the outer defenses.

Even with physical security, there are still systems that inherently rely on strong encryption to function.  Authentication systems, which provide a means of securely identifying oneself, inherently depend on the ability to present unforgeable credentials and to communicate and store those credentials in a manner that prevents theft.  Basic authentication mechanisms rely on encryption to communicate passwords and store them securely.  Advanced authentication mechanisms such as the Kerberos protocol, certificate authentication, and CHAP protocols incorporate strong encryption on a more fundamental level, relying on its properties as part of their design.  These systems are especially high-value targets, as they serve as the “gatekeepers” to other parts of the system.  If an attacker is able to forge or steal authentication materials, they can gain arbitrary access to the system.

Necessity of Increased Use of Strong Encryption

Despite several assertions in the ongoing debates of “rapidly advancing technologies” and “going dark”, strong encryption is nothing new.  The methods and ciphers have existed for decades, and various protocols and technologies have been using them for the better part of twenty years.  Indeed, in certain applications such as banking, medical, and payment processing, use of encryption is mandated by law.  Even when there are no statutory requirements, strong encryption has been used for decades in many applications to mitigate the civil liability risk of data loss.

Prior to 2013, areas such as commodity operating systems, mobile devices, communication protocols, and cloud storage had been lagging behind the aforementioned higher-risk domains in terms of their use of strong encryption for security.  This was driven largely by a lack of perceived need.  However, the increasing interconnectedness of devices and systems, coupled with a steady increase in the number, scope, and sophistication of cyberattacks and a rise in attacks sponsored by organized crime, corporate, and nation-state entities, has driven vendors to build strong encryption into new products by default.  This is not criminals “going dark”.  Rather, it is the world-at-large reacting to an increasingly hostile climate by shoring up its defenses.

This strengthening of defenses is necessary; the data breaches of 2015 are quite literally too numerous to cite here and affected everything from major retailers to critical government systems.  This trend is expected to continue if not increase.  Because attackers tend to target the weak links in a system, we can expect systems that fail to employ strong encryption in their design to become targets for attacks.  Moreover, because of the increasing interconnectivity of devices and sophistication of attacks, we can expect these systems to become entry-points for multi-stage attacks and persistent infiltration.

The Fallacy of Secure Back-Doors

The notion of a secure back-door or “golden key” is a theme that has surfaced again and again in the ongoing debate on encryption.  Moreover, this notion played a central role in the similar debate that took place in the 1990’s.

In 1994, there was a push to legislate the Escrowed Encryption Standard (EES) as the only legally-usable crypto and to ban unescrowed encryption.  The EES hardware implementation was named “Clipper”, and was designed to provide the very sort of back-door access to encrypted traffic that has been the subject of recent debates.  This push lost its momentum when researchers discovered critical flaws in the escrow mechanism.  A very recent attempt by the British GCHQ to design a similar system has been found to have similar flaws.

In the mid-2000’s, the NSA introduced a surreptitious back-door into the Dual-EC random-number generation standard.  This back-door was designed to allow the NSA to reconstruct the stream of random numbers generated by the algorithm, thus allowing them to decrypt traffic.  Third-party researchers speculated about the vulnerability and developed working exploits, and the Snowden documents ultimately revealed it to be the result of a deliberate effort by the NSA.  This back-door has been a root cause of at least one high-profile breach: the Juniper ScreenOS vulnerability, which affected a number of high-security networks including the U.S. State and Treasury departments.

These real-world cases demonstrate the practical danger of back-doors.  On a more abstract level, a “secure” back-door is a paradox for the simple fact that any back-door is inherently a vulnerability.  Introduction of covert vulnerabilities into security systems has been one of the leading causes of exploits.  Doing so introduces added complexity and anomalies that an experienced researcher can detect and ultimately find ways to exploit.

Moreover, even if a back-door could be engineered in such a way as to be undetectable, there still remains the problem of protecting the information necessary to exploit it.  Were back-doored encryption to be mandated by law, that information would be invaluable, as it would provide uncontrolled, unmitigated access to every system using the standard.  We can and should expect rival nation-state entities to employ every means to steal this information, and were they to succeed, the result would be a severe national security crisis.

There is a scientific consensus among security researchers that back-doors cannot be engineered in a way that does not introduce severe security risks.  Moreover, it is very telling that agencies such as GCHQ and the NSA have not produced such a system themselves, despite their considerable mathematical and computational resources and decided interest in doing so.  To ignore these facts and attempt to mandate back-doors would introduce critical and systemic vulnerabilities and grave risks to U.S. national security.

The Futility of an Encryption Ban

Even if secure back-doored cryptography were possible and the access materials could somehow be kept secure from attackers, a ban on strong encryption would be futile for the simple fact that it could not be effectively enforced.  It would be impossible to prevent anyone from obtaining the source code of, or at least the knowledge of how to implement, strong crypto even within the U.S., let alone outside of it.

For starters, encryption software is ubiquitous.  Strong crypto has been the subject of extensive academic research for over half a century and has been written about in dozens of textbooks and thousands of research papers.  Exact descriptions of strong encryption algorithms have been published in international standards by multiple bodies.  There are many implementations of these algorithms in both open- and closed-source software used around the world.  Moreover, these algorithms can be printed on a few sheets of paper, or even on a T-shirt.
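
To illustrate just how compact these algorithms are, here is textbook RSA in a few lines of Python.  This is a toy with tiny primes and no padding, useful only to show that the core of the algorithm fits comfortably on a T-shirt; real implementations add far more, but the mathematics is this small.

    # Textbook RSA with toy parameters -- illustrative only, not secure as written.
    p, q = 61, 53                 # real keys use primes hundreds of digits long
    n = p * q                     # public modulus
    phi = (p - 1) * (q - 1)
    e = 17                        # public exponent
    d = pow(e, -1, phi)           # private exponent (modular inverse, Python 3.8+)
    m = 42                        # the "message", encoded as a number less than n
    c = pow(m, e, n)              # encryption: c = m^e mod n
    assert pow(c, d, n) == m      # decryption: m = c^d mod n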

Attempting to ban access to strong encryption is tantamount to attempting to ban the possession and implementation of widespread and pervasive knowledge.  Banning knowledge is as futile as it is misguided, and even if it could work, it would apply only to U.S. persons.  It would not prevent foreigners from obtaining and using knowledge about crypto.  Moreover, there is a long history of case law that would render any such action unconstitutional.  Griswold v. Connecticut arose from an attempt to ban the possession or use of knowledge almost a century ago; more recently, Bernstein v. United States established the publication of source code as a form of free speech, protected by the First Amendment.

Lastly, even if such a ban could stand legally, strong encryption could still be utilized through the related technique of steganography, which provides methods for surreptitiously embedding information inside seemingly innocuous data.  As a simple example, a hidden, encrypted message or file can be disguised as ordinary background noise in an image.  It is easy to see how this can be used to defeat any attempt to enforce a ban on encryption.
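
For the sake of illustration, here is a minimal sketch of the least-significant-bit technique alluded to above, written in plain Python over a buffer of raw "pixel" bytes.  The helper names are hypothetical, and the example omits everything a practical tool would add (encrypting the payload, encoding its length, spreading the bits across the image, and so on).

    import os

    def embed(pixels, secret):
        # Overwrite the least significant bit of each cover byte with one bit
        # of the secret; the resulting change is indistinguishable from noise.
        bits = [(byte >> i) & 1 for byte in secret for i in range(8)]
        assert len(bits) <= len(pixels), "cover data too small"
        out = bytearray(pixels)
        for i, bit in enumerate(bits):
            out[i] = (out[i] & 0xFE) | bit
        return out

    def extract(pixels, length):
        bits = [pixels[i] & 1 for i in range(length * 8)]
        return bytes(sum(bits[i * 8 + j] << j for j in range(8)) for i in range(length))

    cover = bytearray(os.urandom(1024))          # stand-in for raw image data
    stego = embed(cover, b"hidden message")
    assert extract(stego, len(b"hidden message")) == b"hidden message"

A few dozen lines suffice to hide data in plain sight, which is precisely why enforcement of an encryption ban is impractical.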

More fundamentally though, cryptography arises out of mathematics; it is not something we created, but rather something we discovered.  Trying to control the laws of mathematics through legislation is a doomed effort.  Rather, we should focus our efforts on finding ways to make the most of what encryption offers.

Impacts on the U.S. Infosec and Technology Industries

The U.S. information security and technology sectors rely on strong encryption to build secure products and maintain their competitive advantage.  Any ban or restriction on the ability of U.S. companies to use strong encryption in their products will almost certainly have serious negative consequences for these sectors.  This would likely lead to a serious negative impact on the U.S. economy and workforce, as well as national security and technological advantage.

Such a ban would amount to a guarantee that software produced inside the U.S. is insecure, which would create a critical competitive advantage for companies based outside the U.S.  The inability to properly secure software would prevent the information security industry from being able to operate effectively, and we should expect to see those firms immediately begin relocating operations to foreign countries where no such ban exists.  The competitive disadvantage imposed by being unable to produce secure software would likewise drive much of the software and technology sectors to move primary development activities off-shore, if at a slower rate.  The end result of this would not be the sort of universal access by law enforcement that these policies seek to provide, but rather a world where secure software incorporating strong encryption is produced by foreign nations, but not within the U.S.

We can expect that this move by industry would be echoed in the workforce, with the best workers emigrating as soon as possible to avoid negative impacts on their careers, followed by larger migrations driven by a shrinking job pool.  There is already a global shortage of technology workers, and several savvy nations have programs in place to encourage technology workers to immigrate, bringing their talents (and tax revenues) with them.  We could expect more of these sorts of policies should U.S. policy turn against the infosec and technology sectors, as foreign nations seek to capture talent leaving the U.S.  This sort of offshore migration of an entire sector was evident during the 1990’s and early 2000’s, when export of strong crypto from the U.S. was controlled under arms trafficking laws.

This risk to the information security and technology industry, and the potential loss of the U.S.’s technological advantage, was directly referenced multiple times during the energy and commerce hearing.  The industry panel confirmed that this is a concern among industry leaders.  The law enforcement panel dismissed the concern, but offered only a vague counterargument, stating that the demand for U.S. software would not be impacted because of the U.S.’s reputation.  This argument, which asserts that general reputation will somehow override specific, serious, and material concerns about quality, is an example of magical thinking and does not reflect an accurate picture of how reputation works, particularly with regard to technology.

The loss of the U.S. information security and technology sectors, and of the technological advantage the U.S. enjoys as a result of its position within these industries, would have a catastrophic impact on the U.S. economy.  Moreover, the impact on national security would be similarly severe as it became necessary to look abroad for software vendors and security solutions.  Policies that industry leaders agree are likely to lead to this scenario are simply not a risk the U.S. can afford to take.

Relationship of Government and Technology

On a broader scale, we are facing a problem rooted in the relationship of technology and government.  The congressional hearings in particular point to a number of issues in this relationship, ranging from outdated systems, to lack of knowledge and understanding, to a generally disorganized approach.

Encryption is Complex and Requires New Thinking

One of the key difficulties of the issues surrounding encryption is the fact that it is very different from what existing laws and policies have grown accustomed to regulating.  This became evident in the congressional hearings, with representatives and law enforcement officials proposing “real-world” analogies, which do not hold up under more serious scrutiny.

If we are to use analogies to think about encryption and information security, the only really appropriate one is the world of microbiology, where pathogens are ubiquitous, adaptive, and require constant suppression by various immunity mechanisms.  An immune system in this environment is not an extra feature, but an absolute necessity for continued existence.  In such an analogy, the most dangerous pathogens of all are those that target the immune system itself; thus, any additional vulnerability such as a back-door opens up the entire system to such an attack.

More specifically, computational and communication infrastructure is vulnerable to attack because it is automatic, fast, and removed from human judgment.  Institutions like banks can institute security policies for access to assets like safe deposit boxes and vaults that rely on human judgment and that are not susceptible to mass exploitation.  The same is not true of systems protected by encryption: human judgment is far too slow to be a part of any computing process, and attackers can often use exploits against large amounts of data before being detected.

Lastly, the civil rights implications of encryption cannot be overlooked.  Encryption is quite rare among technologies in that it directly protects and supports basic freedoms in an environment that is far less friendly to those freedoms than the physical world in which we live.  While private communications can be conducted and accurate attributions can be made in the physical world, neither of these things is possible over the internet without strong encryption.  With a significant portion of public discourse having moved to computing-based platforms, technologies such as encryption play a key role in protecting basic freedoms.  Moreover, strong encryption is vital for activists living in countries with oppressive governments, state censorship, and discrimination.  We must be careful to ensure that advancing technology does not erode basic rights, and technologies such as strong encryption play a vital role in doing so.

Encryption is a complex subject that cannot be accurately represented by any “real world” phenomenon, and requires effort to understand enough to form effective policy.  Moreover, it is subject to a “weakest link” principle that mandates considerable caution when developing both systems and the policies that govern them.  However, it is essential that we take the time and effort necessary to develop this understanding.

Technological Deficiency of Law Enforcement: A Serious Problem

One of the overarching themes of the congressional hearings, particularly the first, is the apparent technological incompetence of high-level law enforcement officials.  This is a very serious problem, especially with attacks by state-sponsored hackers and organized crime on the rise.

The first panel in the hearing by energy and commerce was composed of high-level law-enforcement officials.  As a whole, these officials demonstrated an apparent lack of knowledge of the basics regarding technology and information security.  Their testimony was of a wholly different tone from some of the press we’ve seen in the course of this debate.  We have seen technically dubious PR, such as the claims about “dormant cyber-pathogens” and the New York Times’ characterization of what sounds like a command-line interface as encryption software.  This sort of malicious PR is no doubt designed to exploit false public perceptions formed from inaccurate depictions of hacking in movies and TV to make its point.

However, I do not believe that was what we saw in the energy and commerce hearing; rather, the law-enforcement officials seemed to be making a genuine testimony, but were simply lacking in the knowledge and competency necessary to make a coherent, factually-correct point.  In one of the more serious examples, one of the panelists responded to a question about the role of encryption in protecting authentication with a comment that authentication was a “firewall issue, not an encryption issue”.  This makes no sense technically (firewalls generally don’t manage authentication, while encryption is central in the design of authentication protocols), and points to a fundamental lack of understanding about how secure systems work.  Another panelist suggested statutory limits on the complexity of passwords.  Simply put, such a policy would be nothing short of an information security catastrophe.

This lack of competence shows in the solutions that were proposed by the panelists, which largely focused on attempting to break encryption outright, or else legislate weaknesses into security systems to facilitate this course of action.  This kind of thinking is common among novices in information security; experienced, knowledgeable actors such as professional hackers do not work this way.  A professional hacker would not attempt to break encryption, but rather would focus on circumventing it through measures such as capturing keys, capturing data in an unencrypted form, social engineering, persistent malware, and forensic analysis.

The appropriate response to this by technologists is not scorn and arrogance, but rather alarm and action.  The testimony in this hearing is evidence of a critical vulnerability in our law-enforcement system and by extension an inability to deal with the very real threats posed by the security problem.  This suggests that law enforcement is in desperate need of assistance to develop the necessary competencies to deal with these issues.  The technology sector can and should make efforts to educate and inform law enforcement, and help develop alternatives that do not weaken our infrastructure and create serious economic and national security risks.

Lack of Consensus within the Government

More generally, the hearings demonstrate a critical lack of consensus within the government as to how to act.  This division was evident among the panelists as well as the representatives questioning them.  Some demonstrate good technical competence, and make technically sound recommendations; others quite plainly do not.

Unsurprisingly, the most technically-competent areas of the government take a position in favor of strong encryption.  The NSA, for example, has voiced support for strong encryption, as has the Secretary of Defense.  Former NSA and DHS heads have likewise voiced support for strong crypto.  A report cited during the oversight and reform hearing recommends (among similar points) that the U.S. Government “should not in any way subvert, weaken, or make vulnerable generally-available commercial software.”

Large sections of the government remain dangerously behind both in terms of technical competence and the state of their systems.  We of course have the technically unsound arguments in favor of the introduction of back-doors and other weaknesses in critical systems.  The oversight and reform hearing also revealed that some areas of the government are running dangerously out-of-date legacy systems, even referencing COBOL and punched-card based systems.  This is a serious problem in a world where state-sponsored hackers are on the rise.

To give credit where due, the Obama administration has begun to make moves to address this.  The founding of the U.S. Digital Service, which seeks to draw talent from industry to address problems within the government, is a step in the right direction.  However, the congressional hearings suggest that we will need to step up these sorts of efforts significantly in order to address these problems effectively.

The Burr-Feinstein Anti-Encryption Bill

The Burr-Feinstein anti-encryption bill (formally, the “Compliance with Court Orders Act”) represents the wrong kind of thinking and policy on the issue of encryption.  The bill mandates that any producer of encryption software must provide access to encrypted data on demand.  While the bill does contain a strange provision stating that it does not mandate or prohibit any design feature, the fact remains that it is impossible to comply with its basic stipulations for any system which includes strong end-to-end encryption.  In spite of its assertion, the bill does effectively prohibit the development and use of these technologies.

As previously discussed, should the bill pass, we should expect severe consequences for the U.S. information security and technology industries, the U.S. economy and workforce, U.S. national security and technological advantage, and our ability to defend against increasing information security threats.  Moreover, the bill’s direction is very much out-of-sync with the recommendations and directions of the most technically competent parts of the government, and would likely undermine their ongoing efforts.

More generally, this bill is simply the wrong direction.  This kind of legislation will not work, as it will not prevent the development of truly secure software outside the U.S., nor can it prevent the use of strong encryption by criminals, state-sponsored hackers, and other extralegal entities.  It does nothing to address the critical lack of technological expertise by critical areas of the government, including law enforcement.  It stands to seriously undermine ongoing and important efforts to strengthen our defenses against a rising tide of attacks, and moreover, it is not at all clear how to comply with the bill’s stipulations while maintaining compliance with existing information security requirements in areas like banking, healthcare, payment processing, and storage of classified data.

Conclusion: Towards Effective Policy

Even though the congressional hearings served to highlight a number of problems, the overall tone was one of Congress taking what I believe to be more or less effective action to understand and address these issues.  Moreover, it was apparent that some members of Congress do possess an astute grasp of the issues surrounding information security and encryption.  Of course, the existence of measures such as the Burr-Feinstein bill and the other problems I’ve mentioned show that we have quite a way to go.

I believe there is a need for the technology sector to take a proactive role in helping to shape these policies.  These issues are extremely complex, and we need to apply our expertise to the problems we are facing to find solutions that won’t cause serious damage to our economy and national security.  There are a number of issues that need to be addressed, including the following:

  • Make addressing the increasing number and sophistication of cyberattacks and vulnerabilities in our infrastructure a policy priority.
  • Address the pervasive presence of vulnerabilities in software as a whole.
  • Proactively replace vulnerable legacy systems and update outdated IT practices within the government.
  • Provide education and training to address the technological deficiencies apparent in law enforcement.
  • Develop techniques, guidance, and equipment to enable law enforcement to capture data in an unencrypted state.
  • Develop a better understanding of the fundamental constraints governing what is possible with regard to encryption and information security.
  • Develop mitigation scenarios and techniques to deal with loss of critical infrastructure due to an exploit.
  • Further encourage and facilitate interaction with industry experts to help the government address these issues effectively.

In closing, one of the most telling remarks in the congressional hearings was the statement by an industry panelist that the state of software security is “a national crisis”.  A crisis of this kind calls for action, and it is critical that we take the necessary steps to understand the issues, so that we may address the crisis effectively.

The Complex Nature of the Security Problem

This article is an elaboration on ideas I originally developed in a post to the project blog for my pet programming language project here.  The ideas remain as valid now (if not more so) as they were eight months ago when I wrote the original piece.

The year 2015 saw a great deal of publicity surrounding a number of high-profile computer security incidents.  While this trend has been ongoing for some time now, the past year marked a point at which the problem entered the public consciousness to the point where it has become a national news item and is likely to be a key issue in the coming elections and beyond.

“The Security Problem” as I have taken to calling it is not a simple issue and it does not have a simple solution.  It is a complex, multi-faceted problem with a number of root causes, and it cannot be solved without adequately addressing each of those causes in turn.  It is also a crucial issue that must be solved in order for technological civilization to continue its forward progress and not slip into stagnation or regression.  If there is a single message I would want to convey on the subject, it is this: the security problem can only be adequately addressed by a multitude of different approaches working in concert, each addressing an aspect of the problem.

Trust: The Critical Element

In late September, I did a “ride-along” of a training program for newly-hired security consultants.  Just before leaving, I spoke briefly to the group, encouraging them to reach out to us and collaborate.  My final words, however, were broader in scope: “I think every era in history has its critical problems that civilization has to solve in order to keep moving forward, and I think the security problem is one of those problems for our era.”

Why is this problem so important, and why would its existence have the potential to block forward progress?  The answer is trust.  Trust, specifically the ability to trust people about whom we know almost nothing and whom, indeed, we may never meet, is arguably the critical element that allows civilization to exist at all.  Consider what might happen, for example, if that kind of trust did not exist: we would be unable to create and sustain basic institutions such as governments, hospitals, markets, banks, and public transportation.

Technological civilization requires a much higher degree of trust.  Consider, for example, the amount of trust that goes into using something as simple as checking your bank account on your phone.  At a very cursory inspection, you trust the developers who wrote the app that allows you to access your account, the designers of the phone, the hardware manufacturers, the wireless carrier and their backbone providers, the bank’s server software and their system administrators, the third-party vendors that supplied the operating system and database software, the scientists who designed the crypto protecting your transactions and the standards organizations who codified it, the vendors who supplied the networking hardware, and this is just a small portion.  You quite literally trust thousands of technologies and millions of people that you will almost certainly never meet, just to do the simplest of tasks.

The benefits of this kind of trust are clear: the global internet and the growth of computing devices has dramatically increased efficiency and productivity in almost every aspect of life.  However, this trust was not automatic.  It took a long time and a great deal of effort to build.  Moreover, this kind of trust can be lost.  One of the major hurdles for the development of electronic commerce, for example, was the perception that online transactions were inherently insecure.

This kind of progress is not permanent, however; if our technological foundations prove themselves unworthy of this level of trust, then we can expect to see stymied progress or in the worst case, regression.

The Many Aspects of the Security Problem

As with most problems of this scope and nature, the security problem does not have a single root cause.  It is the product of many complex issues interacting to produce a problem, and therefore its solution will necessarily involve committed efforts on multiple fronts and multiple complementary approaches to address the issues.  There is no simple cause, and no “magic bullet” solution.

The contributing factors to the security problem range from highly technical (with many aspects in that domain), to logistical, to policy issues, to educational and social.  In fact, a complete characterization of the problem could very well be the subject of a graduate thesis; the exposition I give here is therefore only intended as a brief survey of the broad areas.

Technological Factors

As the security problem concerns computer security (I have dutifully avoided gratuitous use of the phrase “cyber”), it comes as no surprise that many of the contributing factors to the problem are technological in nature.  However, even within the scope of technological factors, we see a wide variety of specific issues.

Risky Languages, Tools, and APIs

Inherently dangerous or risky programming language or API features are one of the most common factors that contribute to vulnerabilities.  Languages that lack memory safety can lead to buffer overruns and other such errors (which are among the most common exploits in systems), and untyped languages admit a much larger class of errors, many of which lead to vulnerabilities like injection attacks.  Additionally, many APIs are improperly designed and lead to vulnerabilities, or are designed in such a way that safe use is needlessly difficult.  Lastly, many tools can be difficult to use in a secure manner.
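
A small example makes the point about risky API use concrete.  The sketch below, using Python’s built-in sqlite3 module with a table and data invented purely for illustration, shows how the same query API is dangerous when driven by string concatenation and safe when driven by parameters:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

    attacker_input = "nobody' OR '1'='1"

    # Risky use: building the query by string concatenation.  The attacker's
    # input is interpreted as SQL, and the query returns every row.
    unsafe = conn.execute(
        "SELECT * FROM users WHERE name = '" + attacker_input + "'").fetchall()

    # Safer use of the same API: a parameterized query treats the input purely
    # as data, so the injection attempt matches nothing.
    safe = conn.execute(
        "SELECT * FROM users WHERE name = ?", (attacker_input,)).fetchall()

    print(unsafe)   # [('alice', 's3cret')] -- data leaked
    print(safe)     # []

The unsafe form is often the more obvious one to write, which is exactly the kind of needless difficulty in safe use that the paragraph above describes.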

We have made some headway in this area.  Many modern frameworks are designed in such a way that they are “safe by default”, requiring no special configuration to satisfy many safety concerns and requiring the necessary configuration to address the others.  Programming language research over the past 30 years has produced many advanced type systems that can make stronger guarantees, and we are starting to see these enter common use through languages like Rust.  My current employer, Codiscope, is working to bring advanced program analysis research into the static program analysis space.  Initiatives like the NSF DeepSpec expedition are working to develop practical software verification methods.

However, we still have a way to go here.  No mature engineering discipline relies solely on testing: civil engineering, for example, accurately predicts the tolerances of a bridge long before it is built.  Software engineering has yet to develop methods with this level of sophistication.

Configuration Management

Modern systems involve a dizzying array of configuration options.  In multi-level architectures, there are many different components interacting in order to implement each bit of functionality, and all of these need to be configured properly in order to operate securely.

Misconfigurations are a very frequent cause of vulnerabilities.  Enterprise software components can have hundreds of configuration options per component, and we often string dozens of components together.  In this environment, it becomes very easy to miss a configuration option or accidentally fail to account for a particular case.  The fact that there are so many possible configurations, most of which are invalid, further exacerbates the problem.

Crypto has also tended to suffer from usability problems.  Crypto is particularly sensitive to misconfigurations: a single weak link undermines the security of the entire system.  However, it can be quite difficult to develop and maintain hardened crypto configurations over time, even for the technologically adept.  The difficulty of setting up software like GPG for non-technical users has been the subject of actual research papers.  I can personally attest to this as well, having guided multiple non-technical people through the setup.

This problem can be addressed, however.  Configuration management tools allow configurations to be set up from a central location, and managed automatically by various services (CFEngine, Puppet, Chef, Ansible, etc.).  Looking farther afield, we can begin to imagine tools that construct configurations for each component from a master configuration, and to apply type-like notions to the task of identifying invalid configurations.  These suggestions are just the beginning; configuration management is a serious technical challenge, and can and should be the focus of serious technical work.
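
As a rough sketch of what “type-like” configuration checking might look like, the following toy validator rejects invalid settings before they are deployed.  The schema, option names, and allowed values are all invented for illustration; a real tool would derive such rules from the components themselves.

    # Hypothetical schema: each option declares what values it may take.
    SCHEMA = {
        "tls_min_version": {"allowed": {"1.2", "1.3"}},
        "cipher":          {"allowed": {"AES256-GCM", "CHACHA20-POLY1305"}},
        "listen_port":     {"range": (1, 65535)},
    }

    def validate(config):
        errors = []
        for key, rule in SCHEMA.items():
            if key not in config:
                errors.append("missing required option: " + key)
                continue
            value = config[key]
            if "allowed" in rule and value not in rule["allowed"]:
                errors.append(key + ": " + repr(value) + " is not an allowed value")
            if "range" in rule and not (rule["range"][0] <= value <= rule["range"][1]):
                errors.append(key + ": " + repr(value) + " is out of range")
        return errors

    # A misconfiguration is caught before deployment rather than discovered
    # in a breach report.
    print(validate({"tls_min_version": "1.0", "cipher": "RC4", "listen_port": 443}))

Even a check this simple catches whole classes of weak-crypto misconfigurations automatically, which is the spirit of the tooling imagined above.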

Legacy Systems

Legacy systems have long been a source of pain for technologists.  In the past, they have represented a kind of debt that is often too expensive to pay off in full, but which exacts a recurring tax on resources in the form of legacy costs (compatibility issues, bad performance, blocked upgrades, unusable systems, and so on).  To those most directly involved in the development of technology, legacy systems tend to be a source of chronic pain; however, from the standpoint of budgets and limited resources, they are often a kind of pain to be managed as opposed to cured, as wholesale replacement is far too expensive and risky to consider.

In the context of security, however, the picture is often different.  These kinds of systems are often extremely vulnerable, having been designed in a time when networked systems were rare or nonexistent.  In this context, they are more akin to rotten timbers at the core of a building.  Yes, they are expensive and time-consuming to replace, but the risk of not replacing them is far worse.

The real danger is that the infrastructure where vulnerable legacy systems are most prevalent (power grids, industrial facilities, mass transit, and the like) is precisely the sort of infrastructure where a breach can do catastrophic damage.  We have already seen an example of this in the real world: the Stuxnet malware was employed to destroy uranium enrichment centrifuges.

Replacing these legacy systems with more secure implementations is a long and expensive proposition, and doing it in a way that minimizes costs is a very challenging technological problem.  However, this is not a problem that can be neglected.

Cultural and Policy Factors

Though computer security is technological in nature, its causes and solutions are not limited solely to technological issues.  Policy, cultural, and educational factors also affect the problem, and must be a part of the solution.

Policy

The most obvious non-technical influence on the security problem is policy.  The various policy debates that have sprung up in the past years are evidence of this; however, the problem goes much deeper than these debates.

For starters, we are currently in the midst of a number of policy debates regarding strong encryption and how we as a society deal with the fact that such a technology exists.  I make my stance on the matter quite clear: I am an unwavering advocate of unescrowed, uncompromised strong encryption as a fundamental right (yes, there are possible abuses of the technology, but the same is true of such things as due process and freedom of speech).  Despite my hard-line pro-crypto stance, I can understand how those that don’t understand the technology might find the opposing position compelling.  Things like golden keys and abuse-proof backdoors certainly sound nice.  However, the real effects of pursuing such policies would be to fundamentally compromise systems and infrastructure within the US and turn defending against data breaches and cyberattacks into an impossible problem.  In the long run, this erodes the kind of trust in technological infrastructure of which I spoke earlier and bars forward progress, leaving us to be outclassed in the international marketplace.

In a broader context, we face a problem here that requires rethinking our policy process.  We have in the security problem a complex technological issue, too complex for even the most astute and deliberative legislator to develop true expertise on through part-time study, but one where the effects of uninformed policy can be disastrous.  In the context of public debate, it does not lend itself to two-sided thinking or simple solutions, and attempting to force it into such a model loses too much information to be effective.

Additionally, the problem goes deeper than issues like encryption, backdoors, and dragnet surveillance.  Much of the US infrastructure runs on vulnerable legacy systems, as I mentioned earlier, and replacing these systems with more secure, modern software is an expensive and time-consuming task.  Moreover, the need to invest in our infrastructure in this way barely registers in public debate, if at all.  However, doing so is essential to fixing one of the most significant sources of vulnerabilities.

Education

Education, or the lack thereof, also plays a key role in the security problem.  Even top-level computer science curricula fail to teach students how to think securely and develop secure applications, or even to impress upon students the importance of doing so.  This is understandable: even a decade ago, the threat level to most applications was nowhere near where it is today.  The world has changed dramatically in this regard in a rather short span of time.  The proliferation of mobile devices and connectedness, combined with a tremendous upturn in the number and sophistication of attacks launched against systems, has led to a very different sort of environment than what existed even ten years ago (when I was finishing my undergraduate education).

College curricula are necessarily a conservative institution; knowledge is expected to prove its worth and go through a process of refinement and sanding off of rough edges before it reaches the point where it can be taught in an undergraduate curriculum.  By contrast, much of the knowledge of how to avoid building vulnerable systems is new, volatile, and thorny: not the sort of thing traditional academia likes to mix into a curriculum, especially in a mandatory course.

Such a change is necessary, however, and this means that educational institutions must develop new processes for effectively educating people about topics such as these.

Culture

While it is critical to have infrastructure and systems built on sound technological approaches, it is also true that a significant number of successful attacks on large enterprises and individuals alike make primary use of human factors and social engineering.  This is exacerbated by the fact that we, culturally speaking, are quite naive about security.  There are security-conscious individuals, of course, but most people are naive to the point that an attacker can typically rely on social engineering with a high success rate in all but the most secure of settings.

Moreover, this naivety affects everything else, ranging from policy decisions to what priorities are deemed most important in product development.  The lack of public understanding of computer security allows bad policy such as back-doors to be taken seriously, and allows insecure and invasive products to thrive on marketing claims that simply don’t reflect reality (SnapChat remains one of the worst offenders in this regard, in my opinion).

The root cause behind this is that cultures adapt even more slowly than the other factors I’ve mentioned, and our culture has yet to develop effective ways of thinking about these issues.  But cultures do adapt; we all remember sayings like “look both ways” and “stop, drop, and roll” from our childhood, both of which teach simple but effective ways of managing more basic risks that arise from technological society.  This sort of adaptation also responds to need.  During my own youth and adolescence, the danger of HIV drove a number of significant cultural changes in a relatively short period of time that proved effective in curbing the epidemic.  While the issues surrounding the security problem represent a very different sort of danger, they are still pressing issues that require an amount of cultural adaptation to address.  A key step in addressing the cultural aspects of the security problem comes down to developing similar kinds of cultural understanding and awareness, and promoting behavior changes that help reduce risk.

Conclusion

I have presented only a portion of the issues that make up what I call the “computer security problem”.  These issues are varied, ranging from deep technological issues obviously focused on security to cultural and policy issues.  There is not one single root cause to the problem, and as a result, there is no one single “silver bullet” that can solve it.

Moreover, if the problem is this varied and complex, then we can expect the solutions to each aspect of the problem to likewise require multiple different approaches coming from different angles and reflecting different ways of thinking.  My own work, for example, focuses on the language and tooling issue, coming mostly from the direction of building tools to write better software.  However, there are other approaches to this same problem, such as sandboxing and changing the fundamental execution model.  All of these angles deserve consideration, and the eventual resolution to that part of the security problem will likely incorporate developments from each angle of approach.

If there is a final takeaway from this, it is that the problem is large and complex enough that it cannot be solved by the efforts or approach of a single person or team.  It is a monumental challenge requiring the combined tireless efforts of a generation’s worth of minds and at least a generation’s worth of time.